Feature request: Master parsed root domain list
GSADomination
California
I've noticed that after scraping for a while, many of the same root domains get crawled again at some point, as I'm finding many duplicate emails/URLs. Every day or so I clear out the project so that the URLs/data don't take up too many resources.
Is there a way to build a master list of already-parsed root domains so that I'm not recrawling the same domains over and over again?
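In the meantime, a workaround along these lines might help: keep a persisted set of root domains that have already been parsed, and filter new URLs against it before feeding them back into the scraper. This is only a sketch, not anything built into the software; the file name is hypothetical, and the root-domain extraction is deliberately naive (a real version should consult the Public Suffix List, e.g. via the `tldextract` package, to handle domains like `.co.uk` correctly):

```python
from urllib.parse import urlparse
from pathlib import Path

MASTER_FILE = Path("master_domains.txt")  # hypothetical location of the master list

def root_domain(url):
    """Naive root-domain extraction: the last two labels of the hostname.
    Good enough for .com/.net-style domains; multi-part TLDs need the
    Public Suffix List instead."""
    host = (urlparse(url).hostname or "").lower()
    host = host.removeprefix("www.")
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def load_master():
    """Read the persisted master list into a set (empty set if absent)."""
    if MASTER_FILE.exists():
        return set(MASTER_FILE.read_text().split())
    return set()

def save_master(seen):
    """Write the master list back out, one root domain per line."""
    MASTER_FILE.write_text("\n".join(sorted(seen)) + "\n")

def filter_new(urls, seen):
    """Yield only URLs whose root domain hasn't been seen before,
    adding each new domain to the seen-set as it goes."""
    for url in urls:
        dom = root_domain(url)
        if dom and dom not in seen:
            seen.add(dom)
            yield url
```

Run before each scraping session: `seen = load_master()`, filter the URL list through `filter_new(urls, seen)`, then `save_master(seen)` afterwards, so duplicates from earlier sessions never get recrawled.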