
Expired article hunter alternative?

Hi! I have been looking for months for a good replacement for Expired Article Hunter. This bot scrapes articles from expired domains, but the developer stopped updating it. Does anyone know another way or bot to scrape articles from expired domains?

Thanks in advance!

Comments

  • Sven www.GSA-Online.de
    GSA Content Generator can do it.
  • Veronique89 Belgium
    edited December 2019
    Oh, I didn't know that. In Expired Article Hunter you can upload expired domains and it then scrapes all the articles from them. How does that work with GSA Content Generator? Is there a trial? Thanks for your reply!
  • Sven www.GSA-Online.de
    Simply choose ExpiredDomains as a source and that's it.
  • Thanks so much! Can I also import my own domains or is that impossible (last question - apologies!)?
  • Sven www.GSA-Online.de
    No, but you can import the URLs from archive.org.
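    For reference, a minimal sketch of pulling those URLs yourself from the Wayback Machine's public CDX API (this is not the generator's internal code; it assumes Python 3 with the requests package installed):

    import requests

    def wayback_urls(domain, limit=500):
        """List (timestamp, original URL) pairs for a domain via the Wayback CDX API."""
        params = {
            "url": f"{domain}/*",          # every capture under the domain
            "output": "json",
            "fl": "timestamp,original",    # fields to return
            "filter": "statuscode:200",    # successful captures only
            "collapse": "urlkey",          # one capture per unique URL
            "limit": limit,
        }
        rows = requests.get("http://web.archive.org/cdx/search/cdx",
                            params=params, timeout=60).json()
        return rows[1:]                    # the first row is the field header

    if __name__ == "__main__":
        for timestamp, original in wayback_urls("example.com", limit=20):
            print(timestamp, original)

    The printed URLs can then be collected into a file or list for the import step mentioned above.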
  • Sven said:
    Simply choose ExpiredDomains as a source and that's it.
    Seriously the last question, so sorry :)! Do they come from expireddomains.net, and are the imports unlimited? In Expired Article Hunter the maximum is 9,999 domains, and some keywords on expireddomains.net have more than 50,000 domains. Thanks so much for your reply!
  • Sven www.GSA-Online.de
    It's limited to 10 pages...however, that can be changed in the script when you manually edit it.

    Other sources for expired domains have also been added, like SEDO or PEEW.
  • Could you please share how I can manually edit the limit :p?
  • Sven www.GSA-Online.de
    Edit the following file with e.g. Notepad...

    c:\users\<login>\appdata\roaming\gsa content generator\scraper\article\<script>.ini

    You will see comments for each entry as well.
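    If you would rather script that change than edit it by hand, here is a minimal sketch (assuming Python 3; the key name max_pages is purely hypothetical, so check the comments in your own <script>.ini for the real entry name, and fill in the <login>/<script> placeholders):

    from pathlib import Path

    # The <login> and <script> placeholders must be replaced by hand.
    ini_path = Path(r"c:\users\<login>\appdata\roaming\gsa content generator"
                    r"\scraper\article\<script>.ini")

    new_lines = []
    for line in ini_path.read_text(encoding="utf-8").splitlines():
        # "max_pages" is a hypothetical key used for illustration only.
        if line.lower().startswith("max_pages="):
            new_lines.append("max_pages=50")   # raise the 10-page default
        else:
            new_lines.append(line)             # keep comments and other entries as-is
    ini_path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")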
  • Hi Sven, I did everything as you suggested. However, when only selecting expired domains and Wayback, the Content Generator says there is not enough content to create articles. I know that on expireddomains.net my keyword has more than 10,000 domains. Can you maybe tell me what I am doing wrong? Thanks in advance!
  • Sven www.GSA-Online.de
    Do you use proxies? They seem to ban you very fast if you don't use them.
  • Apologies for the late reply!

    Yes, I am using private proxies, and when multiple sources are checked it creates articles without any problem. However, when I only check ExpiredDomains, it says there is not enough content to create any article :(. Do you have any idea what I am doing wrong?

    What I actually want is the following. For example: my main keyword is curtains. I want to scrape any article from the Wayback archive that contains the word curtains (no matter whether it appears in the URL or not). Simply said: as much expired content for the keyword as possible :).

    Would there be a way to help me out? If necessary: paid.

    Thanks so much for your help! It is much appreciated!
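    Until that is sorted out, here is a rough do-it-yourself sketch of exactly this workflow (assuming Python 3 with requests, and a domains.txt file holding expired domains exported from expireddomains.net; the file name and the curtains keyword are only examples): pull each domain's captures from the Wayback CDX API and keep the pages whose HTML contains the keyword, wherever it appears.

    import requests

    KEYWORD = "curtains"
    CDX = "http://web.archive.org/cdx/search/cdx"

    def archived_pages(domain, limit=100):
        """Yield (timestamp, original URL) pairs for a domain from the CDX API."""
        params = {"url": f"{domain}/*", "output": "json",
                  "fl": "timestamp,original", "filter": "statuscode:200",
                  "collapse": "urlkey", "limit": limit}
        rows = requests.get(CDX, params=params, timeout=60).json()
        yield from rows[1:]                        # the first row is the field header

    with open("domains.txt") as f:                 # one expired domain per line
        domains = [line.strip() for line in f if line.strip()]

    for domain in domains:
        for timestamp, original in archived_pages(domain):
            # the id_ flag requests the raw capture without Wayback's link rewriting
            raw = f"https://web.archive.org/web/{timestamp}id_/{original}"
            html = requests.get(raw, timeout=60).text
            if KEYWORD.lower() in html.lower():    # keyword anywhere in the page
                print("hit:", original)

    The check is done on the raw HTML, so a real run would want to strip the markup first, and throttle or proxy the requests, since web.archive.org rejects overly aggressive clients.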
  • Sven www.GSA-Online.de
    Well, what does the current log look like when you let it search the mentioned sources alone?
  • Here is the log:

    [12:31:29] Starting "Scraping Articles"...
    [12:31:37] Last page: 1 | Results: 0 | URL: https://www.expireddomains.net/domainnamesearch/?q=xanax&start=0
    [12:31:37] Starting "Removing duplicate content"...
    [12:31:37] Starting "Filtering Content"...
    [12:31:37] Starting "Generating Articles"...
    [12:31:37] MixSentence: extracting sentences and titles...
    [12:31:37] Sorry, not enough content to create any article. Try to select more sources or use more keywords.
    [12:31:37] Starting "Inserting Spin syntax"...
    [12:31:37] Finished.
  • Sven www.GSA-Online.de
    Thanks. Looks like they are blocking you...I will try to debug it, but it might have to wait till the holiday season is over since I'm not in the office.
  • Sure, not a problem.
  • 710fla ★ #1 GSA SER VERIFIED LIST serpgrow.com
    You tried changing proxies or scraping without proxies?

    Might need rotating proxies to avoid IP ban
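    To see whether it really is an IP ban, a minimal sketch (assuming Python 3 with requests; the proxy addresses are placeholders for your own list) that fires the search URL from the log through each proxy in turn:

    import requests

    SEARCH = "https://www.expireddomains.net/domainnamesearch/?q=xanax&start=0"
    PROXIES = [                      # placeholders; use your own proxy list
        "http://user:pass@10.0.0.1:8080",
        "http://user:pass@10.0.0.2:8080",
    ]

    for proxy in PROXIES:
        try:
            r = requests.get(SEARCH, timeout=30,
                             proxies={"http": proxy, "https": proxy},
                             headers={"User-Agent": "Mozilla/5.0"})
            # a banned proxy typically gets a non-200 status or a very short
            # block/captcha page instead of the normal result listing
            print(proxy, r.status_code, len(r.text))
        except requests.RequestException as exc:
            print(proxy, "failed:", exc)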
  • Veronique89 Belgium
    edited December 2019
    Yes, I tried both without proxies and even with rotating proxies, but without any luck :(. Thanks for your help!
  • Please don't forget me :).

    Best wishes for 2020!
  • Sven www.GSA-Online.de
    I have it on my to-do list, don't worry...though it has to wait till I'm back in the office.
  • Sure, not a problem. All the best!
  • Sven www.GSA-Online.de
    I tried it now that I'm back in the office...
    It seems I can't even get a single result in the browser now (with or without proxies) either. Though I'm online over a VPN, so I can't tell for sure whether it's also a ban or the site is malfunctioning.
  • Yes, I noticed this too; I think they are doing work on expireddomains.net. It happens from time to time. Usually it works again the next day.

    Thanks for your help! Appreciated :).
  • Hi Sven, it's me again (apologies)! I am aware now that GSA Content Generator scrapes from ExpiredDomains and other sites, but do you know if there even is a scraper that can search archive.org by keyword only (i.e. without the keyword appearing in the URL)? Thanks for helping me out. Please let me know if I need to pay extra for your services.
  • Sven www.GSA-Online.de
    I can try adding a dedicated version for archive.org...though you won't know whether a found site is from an expired domain or not.
  • Sven said:
    I can try adding a dedicated version for archive.org...though you won't know whether a found site is from an expired domain or not.
    I understand the issue, but don't mind. What would be the charge for that?
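    On that caveat: one crude way to filter the archive.org hits afterwards (a sketch assuming Python 3; a failed DNS lookup is only a hint, since a dropped domain can be re-registered or parked at any time) is to check whether the domain still resolves:

    import socket

    def probably_expired(domain):
        """True if the domain no longer resolves in DNS (a rough expiry hint)."""
        try:
            socket.gethostbyname(domain)
            return False              # still resolves, so someone is using it
        except socket.gaierror:
            return True               # no DNS record; likely dropped

    for domain in ["example.com", "this-domain-probably-does-not-exist-12345.com"]:
        print(domain, "expired?", probably_expired(domain))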
  • I'll jump in here, if you allow, apologies @Veronique89 - but I believe it's better to have only one discussion about expired articles.
    @Sven is it possible that there is some error involved with the expired article sources when you switch the language?
    I have just had the program run for a few hours searching for German expired articles, with a long list of financial-related (German) keywords. And while I watched the log I saw a lot of web.archive requests being fulfilled, yet the end result is 0 articles.

  • And if I switch the language to English (with obviously German keywords), it finds a lot of articles, which, however, do seem to be mostly English indeed...
    But now web.archive rejects all my requests (despite me using over 1,000 private proxies). Do they have a way to know?
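    One way to check what the language setting is actually returning (a sketch assuming Python 3 with the third-party langdetect package, installed via pip install langdetect) is to run a detector over the scraped texts and count the results:

    from collections import Counter
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0              # make detection deterministic

    def language_counts(texts):
        """Count detected languages ('de', 'en', ...) over a list of article texts."""
        counts = Counter()
        for text in texts:
            try:
                counts[detect(text)] += 1
            except LangDetectException:   # empty or unrecognisable text
                counts["unknown"] += 1
        return counts

    # placeholder snippets instead of real scraped articles
    print(language_counts(["Die Zinsen steigen wieder stark an.",
                           "Interest rates are rising again."]))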
  • Sven www.GSA-Online.de
    @wolfvanween please send me a log for this