
expired domains

Hi, is it possible to add a feature to scrape articles from expired domains?

The expired domains would be specified by the user.


For example, the method used by the Expired Article Hunter program.

Comments

  • SvenSven www.GSA-Online.de
    That can be done already. You have 3 scrapers that would do this task for you, though it requires good proxies.
  • How can I do this?

    Where do I put the expired domains?
  • SvenSven www.GSA-Online.de
    Hmm no, it works differently. It queries databases of expired domains with your keywords, gets the data, and parses the Web Archive for the lost content.

    You can try importing your expired domains with...
    https://web.archive.org/web/20170101202654if_/http://DOMAIN
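
For anyone wanting to build those URLs in bulk, here is a minimal Python sketch. It assumes a plain-text domains.txt with one bare domain per line (both file names are placeholders, not anything GSA requires); the if_ modifier in the timestamp tells the Wayback Machine to serve the raw archived page without its toolbar.

    # Build Wayback Machine "if_" URLs from a plain list of expired domains.
    # Assumes domains.txt holds one bare domain per line, e.g. example.com
    TIMESTAMP = "20170101202654"  # snapshot date reused from the example URL above

    with open("domains.txt") as f:
        domains = [line.strip() for line in f if line.strip()]

    with open("wayback_urls.txt", "w") as out:
        for domain in domains:
            # if_ serves the archived page as-is, without the Wayback toolbar
            out.write(f"https://web.archive.org/web/{TIMESTAMP}if_/http://{domain}\n")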


  • I mean adding a new method: you add an expired domain and the program extracts the articles from it.

  • I use Scrapebox for that, but lately I find the reliability of web.archive.org terrible to useless.
  • SvenSven www.GSA-Online.de
    The latest update allows you to import a list of expired domains as custom source.
  • I tried this domain:

    https://web.archive.org/web/20181231000000if_/internetguruhub.com/


    But no article was extracted:

    [18:09:17] Starting "Scraping Articles"...
    [18:09:18] Starting "Removing duplicate content"...
    [18:09:18] Starting "Filtering Content"...
    [18:09:18] Starting "Generating Articles"...
    [18:09:18] SameArticle: extracting paragraphs and titles...
    [18:09:18] Amount of words from all data sets: 0
    [18:09:18] Sorry, not enough content to create any article. Try to select more sources or use more keywords.
    [18:09:18] Finished.
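
One way to rule out an empty snapshot before importing a domain is archive.org's public Wayback availability API. A rough Python sketch, with the domain and timestamp taken from the post above:

    # Check whether the Wayback Machine actually holds a snapshot for a
    # domain before feeding it to the scraper.
    import json
    import urllib.request

    def closest_snapshot(domain, timestamp="20181231"):
        url = (f"https://archive.org/wayback/available"
               f"?url={domain}&timestamp={timestamp}")
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        snap = data.get("archived_snapshots", {}).get("closest")
        return snap["url"] if snap and snap.get("available") else None

    print(closest_snapshot("internetguruhub.com"))  # snapshot URL, or None

If this prints None, the Wayback Machine simply has nothing usable for that domain and date, and no scraper setting will change that.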
  • SvenSven www.GSA-Online.de
    it was not even downloading here!?
  • SvenSven www.GSA-Online.de
    I found the problem and will optimize it for next update.
  • I've updated, but still the same problem:

    [22:03:03] Starting "Scraping Articles"...
    [22:03:03] Starting "Removing duplicate content"...
    [22:03:03] Starting "Filtering Content"...
    [22:03:03] Starting "Generating Articles"...
    [22:03:03] MixSentence: extracting sentences and titles from 0 data sets...
    [22:03:03] Amount of words from all data sets: 0
    [22:03:03] Sorry, not enough content to create any article. Try to select more sources or use more keywords.
    [22:03:03] Starting "Inserting Spin syntax"...
    [22:03:03] Finished.
  • SvenSven www.GSA-Online.de
    is that custom source actually enabled? Can you show a screenshot?
  • SvenSven www.GSA-Online.de
    please send me the project backup. I don't get why it is not even scraping for you.
  • SvenSven www.GSA-Online.de
    You need to have keywords added to the project... you didn't have that, so of course it was unable to extract anything based on your keywords.
  • Thank you

    Can articles be scraped without keywords?
  • SvenSven www.GSA-Online.de
    no, because the text to extract should have some relation to the keyword.
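
To see why, here is a toy illustration of keyword-relevance filtering; it is purely hypothetical and not GSA's actual code. Paragraphs with no keyword match are dropped, so an empty keyword list produces exactly the "0 data sets" seen in the logs above.

    # Toy sketch (not GSA's implementation): keep only paragraphs that
    # mention at least one of the project's keywords.
    def filter_paragraphs(paragraphs, keywords):
        keywords = [k.lower() for k in keywords]
        return [p for p in paragraphs
                if any(k in p.lower() for k in keywords)]

    paragraphs = ["Guru tips for internet marketing.", "Unrelated footer text."]
    print(filter_paragraphs(paragraphs, []))            # [] -- nothing matches
    print(filter_paragraphs(paragraphs, ["internet"]))  # keeps the first paragraph

This is also why very common keywords such as "the" match nearly every paragraph, which is the workaround suggested below.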
  • I am not targeting keywords, because I want the largest possible number of articles to use for building backlinks.
  • SvenSven www.GSA-Online.de
    Well, use some common words then, like "the" or "a".
  • Sven said:
    Hmm no, it works differently. It queries databases of expired domains with your keywords, gets the data, and parses the Web Archive for the lost content.

    You can try importing your expired domains with...
    https://web.archive.org/web/20170101202654if_/http://DOMAIN



    So, to get the articles from expired domains, I have to insert "https://web.archive.org/web/20170101202654if_/http://DOMAIN".

    Would that be it, or is there a more accurate way?

