
Regularly save scraping progress

OK, I'm quite pissed, so instead of creating a rage thread, I'll channel this rage into something useful.

For the last couple of days I was scraping contextual URLs for my latest project. At ~95% progress my computer crashed. After turning it back on, I realised that the SER scraper doesn't save the URLs to the custom .txt file in real time. It only saves once the scrape has finished or been aborted.

So how about making it save every ~1-5%? That way, instead of losing the whole list, you would only lose the last 5% or so.

Comments

  • Sven www.GSA-Online.de
    edited May 2014
    Well, it saves every 5 min or every 100+ URLs. How did you feed it with URLs?
  • edited May 2014
    @Sven It didn't for me... my list is completely empty.
    Did you mean 'feed'? If so, I was scraping using GSA footprints + a keyword list.

    edit: Where does SER save the URLs if no custom .txt file is selected?
  • Sven www.GSA-Online.de
    If no custom file is selected, it saves them to Site Lists -> Identified.
  • I have Identified unticked. I only save to custom files and read from previously created lists. Well, I guess it was just bad luck or a bug.
    I'm scraping again now, and it's updating my custom file every ~100 URLs, just as you said it would.

    Thanks for the help Sven.
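Sven's reply describes a count-or-time rule (save every 5 min or every 100+ URLs) rather than the percentage-based saving proposed in the opening post. The following is a minimal Python sketch of that kind of checkpointing, assuming scraped URLs are appended to a plain .txt file one per line. It is not SER's actual code. The class name `CheckpointedUrlWriter`, the file name, and the default thresholds are hypothetical, chosen only to mirror the numbers in the thread.

```python
import os
import tempfile
import time
from pathlib import Path


class CheckpointedUrlWriter:
    """Hypothetical illustration (not SER's code): buffers scraped URLs and
    appends them to a text file once `max_urls` are buffered or `max_seconds`
    have passed since the last save, so a crash loses at most that much."""

    def __init__(self, path, max_urls=100, max_seconds=300.0, clock=time.monotonic):
        self.path = Path(path)
        self.max_urls = max_urls
        self.max_seconds = max_seconds
        self.clock = clock  # injectable so the time rule can be tested
        self.buffer = []
        self.last_flush = clock()

    def add(self, url):
        self.buffer.append(url)
        # Save when either threshold is reached, whichever comes first.
        if (len(self.buffer) >= self.max_urls
                or self.clock() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            # Append mode keeps everything saved so far; fsync asks the OS
            # to push the data to disk so it survives a system crash.
            with self.path.open("a", encoding="utf-8") as f:
                f.write("\n".join(self.buffer) + "\n")
                f.flush()
                os.fsync(f.fileno())
            self.buffer.clear()
        self.last_flush = self.clock()

    def close(self):
        self.flush()  # final save when the scrape finishes

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # also runs if the scrape is aborted by an exception
        return False


if __name__ == "__main__":
    # Demo: 250 fake URLs -> saves after 100 and 200, the last 50 on exit.
    out = Path(tempfile.mkdtemp()) / "scraped_urls.txt"
    with CheckpointedUrlWriter(out) as writer:
        for i in range(250):
            writer.add(f"http://example.com/page{i}")
    print(len(out.read_text(encoding="utf-8").splitlines()))
```

With this design, the worst case after a crash is losing the URLs still sitting in the buffer (fewer than 100, or under 5 minutes' worth), instead of the whole list.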