Is it possible to add an option to spin every word/phrase when using The Best Spinner? As of right now the "Good" setting skips over several phrases and words. I would like to spin the article as much as possible.
When I use these settings in The Best Spinner program itself I am able to spin almost every word/phrase, compared to the syntax that GSA Content Generator inserts with the "Good" setting.
Another problem I have: scraping content in a language other than English (I set up the proper language in the project settings). I tried to scrape websites and in the log I only see:
"Unable to extract content", even if there is a keyword in the article, etc...
I tried 5 different websites without luck.
Can you implement scraping websites using the "search engine" option but without keywords, using only the given URL?
Hi Sven, has something changed? When I try to export the content now, it used to ask me which Tier I want to export it to, but now it just says GSA Search Engine Ranker. So where does it go? I have a number of projects running and I can't find the content.
Derek
It works on an old scrape but not the new one, i.e. if I choose a scrape I did yesterday it allows me to choose the tier, but not with the one I have just done.
@meph that's what it should do... if you add a website only, it will try to search for the given keyword and extract the data around it. Please show a screenshot of the settings and I will try to reproduce it here.
@barrymoor12 if you haven't saved the project, it will not be visible to ContentGenerator.
crash: can you show the log?
@Sven I messed up the keyword settings - I had a keyword in "Use additional keywords for checking" BUT unchecked the options "Keywords must be in URL" and "Use additional keywords for checking", and that's why I did not get any articles... Is it supposed to skip checking titles/URLs when I uncheck these options?
The app is scraping the website, but only when I add the link as custom and not as a search engine.
I am trying to set up my own source in an ini file, but then the app again replaces my link with %search% etc.
Sample ini file:
[setup]
;name of the source (empty=file name)
name=komputerswiat.pl
;some description
desc=Article Directory
;set a special category in case this is only offering special things
category=Technology
;give a number from 1 to 5 (5 is best) / 0=disabled/not visible
quality=5
;what language is the content in (default en)
lang=pl
;search URL (use %search% as parameter)
url=http://www.komputerswiat.pl/nowosci.aspx
url mask=*komputerswiat.pl/nowosci/*.aspx
1. Is there a way to remove some HTML parts from the article? In filters I can only set up text, but what about HTML?
2. In filters I cannot use Polish characters like "ł" or "ż".
3. There is something wrong with title parsing:
The article has this title (in both the H1 and TITLE tags): "Bezprzewodowe słuchawki douszne - test, recenzja 7 modeli dousznych słuchawek bezprzewodowych" (Polish for "Wireless in-ear headphones - test, review of 7 wireless in-ear headphone models"), but the app always gets only the second part: "test, recenzja 7 modeli dousznych słuchawek bezprzewodowych".
@meph that komputerswiat.pl site uses google as its search engine in a frame, so it will not work with the script as you provided it. You can instead create a search engine entry directly using google with "site:komputerswiat.pl" as a parameter.
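For example, something along these lines (just a minimal sketch reusing the keys from your [setup] above; the exact google search URL and the url mask are assumptions on my side, so adjust as needed):
[setup]
;name of the source (empty=file name)
name=komputerswiat.pl (google)
;some description
desc=Article Directory
;what language is the content in (default en)
lang=pl
;search URL (use %search% as parameter) - here a google "site:" query instead of the site's own framed search
url=http://www.google.com/search?q=site:komputerswiat.pl+%search%
;only keep result links pointing back to the site's article pages (assumption, adjust to the real article URLs)
url mask=*komputerswiat.pl/*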
@meph
1. the html built with your article is just <h> and <p>, and that's added after the parsing
2. fixed in the next update
3. actually the tool does not know where the correct title is; it assumes that <title> is holding it. BUT usually you also see the domain added to the title as well as keywords or other irrelevant things. So it tries to remove that by searching for >>, | or also " - ". In that case, yes, it might have gone wrong. Anyway, the next update improves this.
1. Yes, I know, but sometimes there are unrelated HTML tags like ads or "See also" paragraphs inside <p> but with their own class, so I wanted to remove them by HTML class, as it is the same across the whole site. Or to remove sentences that contain links, etc.
Some other thoughts:
1. With this script (ini file) I wanted to scrape a website's category without using search, just the category listing, so I added the %search% parameter to the URL, which actually does nothing in this case.
But if you could make the "search engine" option work without %search% and check for keywords inside the text body instead, this would allow us to scrape any website's article listings such as tags, categories, etc.
Just using the URL and URL mask.
2. I see that the manual is not ready yet, but I wonder if there is an option to select the article body inside a particular HTML tag like <div id="article">BODY</div> without knowing how it ends (not using content_front and content_back, because there are many other HTML tags inside).
4. I also noticed that sometimes the app does not get text from <ul>, <li>, etc. This is needed if I choose to use the Same Article algorithm and want full articles.
5. Is it possible to skip scraping articles which contain (or do not contain) some text/HTML?
For example, if I want to avoid scraping galleries, can I set it up (in filters) to not parse such an article by entering some unique tag?
1. you can use the filter here as well and tell it to remove a sentence if it starts with "See also" or something.
---
1. you can leave the %search% out of the url, but then the tool needs to find the actual search form and fill it. In your case this is not working as that page does not have its own search form but uses google as a search engine in an iframe.
2. yes, you can do that as well. Have a look at the scraper articles\Free Article Zines.ini
3. YES!!!! I agree
4. well, that part is indeed not fully extracted; give me a sample and I will work on it
5. yes, add "gallery" in it and the action to skip the article
@710fla the latest update has your "Max Usage" option included.
0.61 - new: added new scrapers for Russian language
- new: added Yandex as image/video scraper
- new: improved title extraction
- new: unicode usage in project/filters
- fix: sentences were not correctly extracted in some languages
- new: ability to calculate maximum article usage for spin syntax (experimental)
Comments
title_back=</title>
that should get you the title... but only in the latest update will it NOT cut things after " - ", because it will hopefully find the full title in the body itself.
content_front=<div id="article">
content_back=</div><script type="text/javascript">|</script></div>
without that, it will try to extract everything on its own.
Anyway, the next update will have that engine added, but I use some other settings, as you will see.
https://pastebin.com/RsWFdrQU
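Roughly, those pieces fit together like this (just a rough sketch to show the idea; whether the extraction lines sit in the [setup] section and the exact google URL are assumptions here, so the pastebin above has the exact settings):
[setup]
name=komputerswiat.pl
desc=Article Directory
category=Technology
quality=5
lang=pl
;google "site:" search instead of the site's own framed search (assumption, as above)
url=http://www.google.com/search?q=site:komputerswiat.pl+%search%
url mask=*komputerswiat.pl/*.aspx
;title text ends at this marker
title_back=</title>
;article body starts after this marker...
content_front=<div id="article">
;...and ends at one of these markers (alternatives separated by |)
content_back=</div><script type="text/javascript">|</script></div>
;(assumption: all in one [setup] section - check the pastebin/shipped engines for the exact layout)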