Is it possible to add an option to spin every word/phrase when using The Best Spinner? As of right now the "Good" setting skips over several phrases and words. I would like to spin the article as much as possible.
When I use these settings in The Best Spinner program itself I am able to spin almost every word/phrase, compared to the syntax that GSA Content Generator inserts with the "Good" setting.
Another problem I have: scraping content in a language other than English (I set up the proper language in the project settings). I tried to scrape websites and in the log I only see:
"Unable to extract content", even if there is a keyword in the article, etc...
I tried 5 different websites without luck.
Can you implement scraping websites using the "search engine" option but without keywords, using only the given URL?
Hi Sven, has something changed? When I try to export the content now, it used to ask me which Tier I want to export it to, but now it just says GSA Search Engine Ranker. So where does it go? I have a number of projects running and I can't find the content.
Derek
It works on an old scrape but not the new one, i.e. if I choose a scrape I did yesterday it allows me to choose the tier, but not with the one I have just done.
@meph that's what it should do... if you add a website only, it will try to search for the given keyword and extract the data around it. Please show a screenshot of the settings and I will try to reproduce it here.
@barrymoor12 if you haven't saved the project, it will not be visible to ContentGenerator.
crash: can you show the log?
@Sven I messed up the keyword settings - I had a keyword in "Use additional keywords for checking" BUT unchecked the options "Keywords must be in URL" and "Use additional keywords for checking", and that's why I did not get any articles... Is it supposed to skip checking titles/URLs when I uncheck these options?
The app is scraping the website, but only when I add the link as custom and not as a search engine.
I am trying to set up my own source in an ini file, but then the app again replaces my link with %search% etc.
Sample ini file:
[setup]
;name of the source (empty=file name)
name=komputerswiat.pl
;some description
desc=Article Directory
;set a special category in case this is only offering special things
category=Technology
;give a number from 1 to 5 (5 is best) / 0=disabled/not visible
quality=5
;what language is the content in (default en)
lang=pl
;search URL (use %search% as parameter)
url=http://www.komputerswiat.pl/nowosci.aspx
url mask=*komputerswiat.pl/nowosci/*.aspx
1. Is there a way to remove some HTML parts from the article? In filters I can only set up text, but what about HTML?
2. In filters I cannot use Polish characters like "ł" or "ż".
3. There is something wrong with title parsing:
The article has this title (in both the H1 and TITLE tags): "Bezprzewodowe słuchawki douszne - test, recenzja 7 modeli dousznych słuchawek bezprzewodowych" (Polish for "Wireless in-ear headphones - test, review of 7 wireless in-ear headphone models"), but the app always gets only the second part: "test, recenzja 7 modeli dousznych słuchawek bezprzewodowych".
@meph that komputerswiat.pl site uses google as its search engine in a frame, so it will not work with the script as you provided it. You can instead create a search engine entry directly using google with "site:komputerswiat.pl" as a parameter.
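For example, something along these lines (just a minimal sketch reusing the keys from your [setup] above; the exact google search URL and the url mask are assumptions on my side, so adjust as needed):
[setup]
;name of the source (empty=file name)
name=komputerswiat.pl (google)
;some description
desc=Article Directory
;what language is the content in (default en)
lang=pl
;search URL (use %search% as parameter) - here a google "site:" query instead of the site's own framed search
url=http://www.google.com/search?q=site:komputerswiat.pl+%search%
;only keep result links pointing back to the site's article pages (assumption, adjust to the real article URLs)
url mask=*komputerswiat.pl/*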
@meph
1. the html built with your article is just <h> and <p>, and that's added after the parsing
2. fixed in the next update
3. actually the tool does not know where the correct title is; it assumes that <title> is holding it. BUT usually you also see the domain added to the title as well as keywords or other irrelevant things. So it tries to remove that by searching for >>, | or also " - ". In that case, yes, it might have gone wrong. Anyway, the next update improves this.
1. Yes, I know, but sometimes there are unrelated HTML tags like ads or "See also" paragraphs inside <p> but with their own class, so I wanted to remove them by HTML class, as it is the same across the whole site. Or to remove sentences that contain links, etc.
Some other thoughts:
1. With this script (ini file) I wanted to scrape a website's category without using search, just the category listing, so I added the %search% parameter to the URL, which actually does nothing in this case.
But if you could make the "search engine" option work without %search% and check for keywords inside the text body instead, this would allow us to scrape any website's article listings such as tags, categories, etc.
Just using the URL and URL mask.
2. I see that the manual is not ready yet, but I wonder if there is an option to select the article body inside a particular HTML tag like <div id="article">BODY</div> without knowing how it ends (not using content_front and content_back, because there are many other HTML tags inside).
4. I also noticed that sometimes the app does not get text from <ul>, <li>, etc. This is needed if I choose to use the Same Article algorithm and want full articles.
5. Is it possible to skip scraping articles which contain (or do not contain) some text/HTML?
For example, if I want to avoid scraping galleries, can I set it up (in filters) to not parse such an article by entering some unique tag?
1. you can use the filter here as well and tell it to remove a sentence if it starts with "See also" or something.
---
1. you can leave the %search% out of the url, but then the tool needs to find the actual search form and fill it. In your case this is not working as that page does not have its own search form but uses google as a search engine in an iframe.
2. yes, you can do that as well. Have a look at the scraper articles\Free Article Zines.ini
3. YES!!!! I agree
4. well, that part is indeed not fully extracted; give me a sample and I will work on it
5. yes, add "gallery" in it and the action to skip the article
@710fla the latest update has your "Max Usage" option included.
0.61 - new: added new scrapers for Russian language
- new: added Yandex as image/video scraper
- new: improved title extraction
- new: unicode usage in project/filters
- fix: sentences were not correctly extracted in some languages
- new: ability to calculate maximum article usage for spin syntax (experimental)
Comments
title_back=</title>
that should get you the title... but only in the latest update will it NOT cut things after " - ", because it will hopefully find the full title in the body itself.
content_front=<div id="article">
content_back=</div><script type="text/javascript">|</script></div>
without that, it will try to extract everything on its own.
Anyway, the next update will have that engine added, but I use some other settings, as you will see.
https://pastebin.com/RsWFdrQU
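Roughly, those pieces fit together like this (just a rough sketch to show the idea; whether the extraction lines sit in the [setup] section and the exact google URL are assumptions here, so the pastebin above has the exact settings):
[setup]
name=komputerswiat.pl
desc=Article Directory
category=Technology
quality=5
lang=pl
;google "site:" search instead of the site's own framed search (assumption, as above)
url=http://www.google.com/search?q=site:komputerswiat.pl+%search%
url mask=*komputerswiat.pl/*.aspx
;title text ends at this marker
title_back=</title>
;article body starts after this marker...
content_front=<div id="article">
;...and ends at one of these markers (alternatives separated by |)
content_back=</div><script type="text/javascript">|</script></div>
;(assumption: all in one [setup] section - check the pastebin/shipped engines for the exact layout)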