Feature request: Master parsed root domain list

GSADomination (California)
I've noticed that after scraping for a while, many of the same root domains get crawled again at some point, since I'm finding lots of duplicate emails/URLs. Every day or so I clear out the project so that the URLs/data don't take up too many resources.

I was curious whether there's a way to build a master parsed root domain list so that I'm not recrawling the same domains over and over again?
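Outside the tool itself, the idea can be sketched in a few lines: keep a persistent file of root domains you've already parsed, and drop any newly scraped URL whose domain is in it. This is a minimal sketch, not GSA functionality; the filename `master_root_domains.txt` and the naive "strip `www.`" domain extraction are assumptions (a real setup might use the public suffix list to get the true registrable domain).

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical master-list file; one root domain per line.
MASTER = Path("master_root_domains.txt")

def root_domain(url: str) -> str:
    """Naively reduce a URL to its root domain (hostname minus 'www.').
    For accurate registrable domains, use the public suffix list instead."""
    host = urlparse(url).hostname or ""
    return host.lower().removeprefix("www.")

def filter_new_urls(urls):
    """Return only URLs whose root domain is not yet in the master list,
    and append the newly seen domains to the list on disk."""
    seen = set(MASTER.read_text().split()) if MASTER.exists() else set()
    fresh = []
    for url in urls:
        dom = root_domain(url)
        if dom and dom not in seen:
            seen.add(dom)
            fresh.append(url)
    MASTER.write_text("\n".join(sorted(seen)))
    return fresh
```

Running this between scrape sessions means a domain parsed yesterday is skipped today, so the project can still be cleared regularly without losing the dedup history.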


Comments

  • Sven (www.GSA-Online.de)
    Hmm, you can try adding that to the filter.