
Article/Content Generation Software


Comments

  • 710fla710fla ★ #1 GSA SER VERIFIED LIST serpgrow.com
    Thanks @sven!
  • very good, thanks 
  • 710fla710fla ★ #1 GSA SER VERIFIED LIST serpgrow.com
    edited April 2017
    Is it possible to add an option to spin every word/phrase when using The Best Spinner? Right now the "Good" setting skips over several phrases and words. I would like to spin the article as much as possible.

    With these settings in The Best Spinner program itself I am able to spin almost every word/phrase, which is not the case when the syntax is inserted by GSA Content Generator using the "Good" setting.


    Edit: I don't know why clicking the link doesn't work. Just copy and paste the address in your browser.
  • SvenSven www.GSA-Online.de
    @710fla I checked the API again (http://thebestspinner.com/?action=api_info) and didn't see a parameter for "frequency". I added everything that seems useful to configure, though.
  • edited April 2017
    @Sven how can I scrape without a keyword?
    I am trying to scrape a category page from a particular website, but every time the app automatically replaces the URL with a search form etc...

  • example:
  • SvenSven www.GSA-Online.de
    Did you add it as a search engine?
  • edited April 2017
    It replaces it when I use the source as SE.

    Another problem I have: scraping content in a language other than English (I set the proper language in the project settings). I tried to scrape websites, and in the log I only see:
    "Unable to extract content" even if there is a keyword in the article etc...
    I tried 5 different websites without luck.

    Can you implement scraping websites using the "search engine" option but without keywords, using only the given URL?
  • edited April 2017
    Hi Sven, has something changed? When I exported content before, it asked me which tier I wanted to export it to, but now it just says GSA Search Engine Ranker. Where does it go? I have a number of projects running and I can't find the content.

    Derek

    It works on an old scrape but not on a new one, i.e. if I choose a scrape I did yesterday it lets me choose the tier, but not for the one I have just done.
    I will go back and try the wizard.
  • Hi, there seems to be a bug. I can't get it to work now: it starts the scrape and then locks up. I tried twice, and I have to shut it down with Task Manager!!
  • SvenSven www.GSA-Online.de
    @meph that's what it should do... if you add only a website, it will try to search for the given keyword and extract the data around it. Please show a screenshot of the settings and I will try to reproduce it here.
    --
    @barrymoor12 if you haven't saved the project, it will not be visible to ContentGenerator.
    Crash: can you show the log?
  • @Sven I messed up the keyword settings: I had a keyword in "Use additional keywords for checking" BUT had unchecked the options "Keywords must be in URL" and "Use additional keywords for checking", and that's why I did not get any articles... Is it supposed to skip checking titles/URLs when these options are unchecked?

    The app scrapes the website, but only when I add the link as custom and not as a search engine.
    I am trying to set up my own source in an ini file, but then the app again replaces my link with %search% etc.
    Sample ini file:


    
    [setup]
    
    ;name of the source (empty=file name)
    name=komputerswiat.pl
    
    ;some description
    desc=Article Directory
    
    ;set a special category in case this is only offering special things
    category=Technology
    
    ;give a number from 1 to 5 (5 is best) / 0=disabled/not visible
    quality=5
    
    ;what language is the content in (default en)
    lang=pl
    
    ;search URL (use %search% as parameter)
    url=http://www.komputerswiat.pl/nowosci.aspx
    url mask=*komputerswiat.pl/nowosci/*.aspx
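
For readers following along, here is a minimal Python sketch of how a scraper file like the one above could be read, and how a `url mask` might be tested against URLs. It assumes the mask is a plain `*` wildcard pattern (the thread does not spell out the matching rules), and the article URL is made up for illustration. It is not GSA Content Generator's own code.

```python
import configparser
from fnmatch import fnmatchcase

SAMPLE_INI = """\
[setup]
;name of the source (empty=file name)
name=komputerswiat.pl
lang=pl
;search URL (use %search% as parameter)
url=http://www.komputerswiat.pl/nowosci.aspx
url mask=*komputerswiat.pl/nowosci/*.aspx
"""


def load_setup(text: str) -> dict:
    """Read the [setup] section of a scraper ini into a plain dict."""
    # interpolation=None so a literal % (as in %search%) is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["setup"])


def matches_mask(url: str, mask: str) -> bool:
    """Assumption: '*' matches any run of characters, case-insensitively."""
    return fnmatchcase(url.lower(), mask.lower())


setup = load_setup(SAMPLE_INI)
listing_url = setup["url"]
# Hypothetical article URL, only for illustration
article_url = "http://www.komputerswiat.pl/nowosci/sprzet/example.aspx"

print(matches_mask(listing_url, setup["url mask"]))
print(matches_mask(article_url, setup["url mask"]))
```

Under this reading the listing page itself does not fit the mask, while pages below /nowosci/ do. Note that `fnmatch` also treats `?` and `[` as wildcards, so a mask containing those characters would need escaping in a real implementation.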
    
  • edited April 2017
    1. Is there a way to remove some HTML parts from an article? In filters I can only set up text, but what about HTML?
    2. In filters I cannot use Polish characters like "ł" or "ż".
    3. There is something wrong with title parsing:
    The article has the title (in H1 and in the TITLE tag): "Bezprzewodowe słuchawki douszne - test, recenzja 7 modeli dousznych słuchawek bezprzewodowych" but the app always gets only the second part: "test, recenzja 7 modeli dousznych słuchawek bezprzewodowych"
  • SvenSven www.GSA-Online.de
    @meph the komputerswiat.pl site uses Google as its search engine in a frame, so it does not work with the script as you provided it. You can directly create a search engine using Google with "site:komputerswiat.pl" as a parameter.
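
As a rough illustration of the "site:" suggestion, the Python sketch below builds a Google query URL restricted to one domain. The function name is hypothetical, and how such a URL would be entered in a scraper file is not covered in this thread, so treat it only as a picture of what the query looks like.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, keyword: str) -> str:
    """Build a Google search URL limited to one domain via the site: operator."""
    query = f"site:{domain} {keyword}".strip()
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("komputerswiat.pl", "test"))
# prints https://www.google.com/search?q=site%3Akomputerswiat.pl+test
```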
  • SvenSven www.GSA-Online.de
    @meph
    1. The HTML built with your article is just <h> and <p>, and that is added after the parsing.
    2. Fixed in the next update.
    3. Actually the tool does not know where the correct title is; it assumes that <title> holds it. BUT usually you also see the domain added to the title, as well as keywords or other irrelevant things. So it tries to remove that by searching for >>, | or " - ". In that case, yes, it might have gone wrong. Anyway, the next update improves this.
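
To make the title heuristic easier to picture, here is a small Python sketch of one way such a clean-up could behave. Splitting on ">>", "|" and " - " comes from the post above. The "keep the longest part" rule and the check against the page body are assumptions for illustration only, since the tool's real logic is not shown in this thread.

```python
import re

# Separators named in the post: ">>", "|" and " - "
SEPARATOR = re.compile(r"\s*>>\s*|\s*\|\s*|\s+-\s+")


def split_title(title: str) -> list:
    """Split a <title> string into its separator-delimited parts."""
    return [part.strip() for part in SEPARATOR.split(title) if part.strip()]


def pick_title(title: str, body_text: str = "") -> str:
    """Assumed rule: keep the whole title if the body repeats it, else the longest part."""
    title = title.strip()
    if title and title in body_text:
        return title
    parts = split_title(title)
    return max(parts, key=len) if parts else title


raw = ("Bezprzewodowe słuchawki douszne - test, recenzja 7 modeli "
       "dousznych słuchawek bezprzewodowych")

print(pick_title(raw))                          # only the part after " - "
print(pick_title(raw, f"<h1>{raw}</h1>"))       # the full title
```

With no body to compare against, a longest-part rule returns only the second half, which is the kind of truncation reported above. Checking the body first keeps the full title whenever the page repeats it, for example in an H1.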
  • 1. Yes, I know, but sometimes there are unrelated HTML parts like ads or "See also" paragraphs in <p> with their own class, so I wanted to remove them by HTML class, as it is the same across the whole site. Or to remove sentences that contain links etc.

    Some other thoughts:

    1. With this script (ini file) I wanted to scrape a website's category without using search, just the category listing, so I added the %search% parameter to the URL, which actually does nothing in this case:
    url=http://www.komputerswiat.pl/nowosci.aspx?q=%search% (if I don't add it, the app will replace my URL)

    But if you could implement a way for the "search engine" option to not use %search% and to check keywords inside the text body, this would allow us to scrape any website's article listings such as tags, categories etc.,
    just using the URL and URL mask.

    2. I see that the manual is not ready yet, but I wonder if there is an option to select the article body inside a particular HTML tag like <div id="article">BODY</div> without knowing how it ends (not using content_front and content_back, because there are many other HTML tags inside).

    4. I also noticed that sometimes the app does not get text from <ul>, <li> etc. That is necessary to have full articles if I choose to use the Same Article algorithm.

    5. Is it possible to skip articles which contain or do not contain some text/HTML?
    For example, if I want to avoid scraping galleries, can I set it up to not parse such articles by inserting some unique tag (in filters)?
  • SvenSven www.GSA-Online.de
    1. You can use the filter here as well and tell it to remove a sentence if it starts with "See also" or something.
    ---
    1. You can leave %search% out of the URL, but then the tool needs to find the actual search form and fill it in. In your case this does not work, as that page does not have its own search form but uses Google as a search engine in an iframe.
    2. Yes, you can do that as well. Have a look at the scraper articles\Free Article Zines.ini
    3. YES!!!! I agree ;)
    4. Well, that part is indeed not fully extracted; give me a sample and I will work on it.
    5. Yes, add "gallery" to it and set the action to skip the article.
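
The two filter actions mentioned in this reply (remove a sentence that starts with some text, skip an article that contains some text) can be pictured with the Python sketch below. The function name, its parameters and the simple sentence splitting are all assumptions made for illustration and do not reflect how the program's filters are implemented.

```python
import re


def apply_filters(article, drop_sentence_prefixes=("See also",), skip_markers=("gallery",)):
    """Return the filtered article text, or None if the article should be skipped."""
    lowered = article.lower()
    # "Skip article" action: any marker found anywhere rejects the whole article
    if any(marker.lower() in lowered for marker in skip_markers):
        return None
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    # "Remove sentence" action: drop sentences starting with a listed prefix
    kept = [s for s in sentences if not s.startswith(tuple(drop_sentence_prefixes))]
    return " ".join(kept)


print(apply_filters("Good review. See also our other tests. Final verdict."))
print(apply_filters("Photo gallery: 10 pictures."))
```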
  • Here are some <li> items which are not parsed from the article:
    "Ulepszony proces produkcyjny:"...

    But here are some that we do not want (and this already works nicely):
    "Test nośników SSD o pojemności 240-275 GB"
  • SvenSven www.GSA-Online.de
    @710fla latest update has your "Max Usage" option included
     0.61 - new: added new scrapers for Russian language
          - new: added Yandex as image/video scraper
          - new: improved title extraction
          - new: unicode usage in project/filters
          - fix: sentences were not correctly extracted in some languages
          - new: ability to calculate maximum article usage for spin syntax (experimental)
    
  • edited April 2017
    I still can't get proper titles, even if I try to get them from other tags/places. Example:

    I tried:
    title_front=<meta property="og:title" content="
    title_back="

    title_front=<h1>
    title_back=</

    title_front=<title>
    title_back=</

    I always get: "test, recenzja klocków do budowy robota"
    What am I doing wrong? I want to get titles from H1 tag.
  • SvenSven www.GSA-Online.de
    title_front=<title>
    title_back=</title>

    That should get you the title... but only in the latest update should it NOT cut things after "-", because it will hopefully find the full title in the body itself.
  • Sorry to bother you, but it is still not working.
    Example:
    <title>HyperX Cloud Stinger - test, opinie, recenzja słuchawek gamingowych</title>
    <meta property="og:title" content="HyperX Cloud Stinger - test, opinie, recenzja słuchawek gamingowych" />
    <h1>HyperX Cloud Stinger - test gamingowych słuchawek, które nie rujnują domowego budżetu</h1>
    I still get: "test, opinie, recenzja słuchawek gamingowych"
    Tested in 0.62.
    I clear the global and project cache every time...
  • SvenSven www.GSA-Online.de
    show me the complete scraper file you have for this please.
  • edited April 2017
    [setup]

    ;name of the source (empty=file name)
    name=_komputerswiat.pl

    ;some description
    desc=Article Directory

    ;set a special category in case this is only offering special things
    category=Technology

    ;give a number from 1 to 5 (5 is best) / 0=disabled/not visible
    quality=5

    ;what language is the content in (default en)
    lang=pl

    ;search URL (use %search% as parameter)
    url mask=*komputerswiat.pl/testy/*.aspx

    ;user agent=Mozilla/5.0 (Windows NT 6.3; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0Z

    title_front=<title>
    title_back=</title>
  • SvenSven www.GSA-Online.de
    edited April 2017
    you are missing...
    content_front=<div id="article">
    content_back=</div><script type="text/javascript">|</script></div>

    Without that, it will try to extract everything on its own.

    Anyway, the next update will have that engine added, but I use some other settings, as you will see.
    https://pastebin.com/RsWFdrQU
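
For illustration, the front/back markers used in this thread (title_front/title_back, content_front/content_back) can be thought of as "take the text between two strings". The Python sketch below shows that idea. Treating "|" in the back marker as a list of alternatives, and picking whichever alternative appears first, is an assumption based on the content_back line above and is not confirmed behaviour.

```python
def extract_between(html: str, front: str, back: str):
    """Return the text after `front` and before the nearest `back` alternative, or None."""
    start = html.find(front)
    if start == -1:
        return None
    start += len(front)
    ends = []
    # Assumption: "|" separates alternative end markers
    for marker in back.split("|"):
        if marker:
            pos = html.find(marker, start)
            if pos != -1:
                ends.append(pos)
    if not ends:
        return None
    return html[start:min(ends)]


# Shortened, made-up page using the markers from the thread
page = (
    "<title>HyperX Cloud Stinger - test</title>"
    '<div id="article"><p>Body</p></div>'
    '<script type="text/javascript">x</script></div>'
)

print(extract_between(page, "<title>", "</title>"))
print(extract_between(page, '<div id="article">',
                      '</div><script type="text/javascript">|</script></div>'))
```

One consequence of this reading is that a literal "|" could not be used inside an end marker without some form of escaping.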
  • Thank you!
  • Is it possible to include used source urls in the export template?
  • SvenSven www.GSA-Online.de
    for what reason?
  • edited April 2017
    I have a lot of projects and I need them not to reuse the same articles across projects (I have my own solution for that, but it needs the URLs).
  • SvenSven www.GSA-Online.de
    Hmm, sorry, this source gathering is a bit more complicated and I will not add it to the export templates for now. I have, however, put it on my to-do list.