Scraping for targets - Massive Duplicates
So I have been experimenting with many different ways of scraping for targets using all kinds of footprints. I focus on the Google API and Bing, more so Bing lately.
I've tried limiting results from 1000 down to 100, but it doesn't seem to make much difference in the number of duplicates you end up scraping.
One tool I've found that is capable of finding a lot of UNIQUE targets is SEO List Builder; the only problem is it's really unstable. When it works, though, it's pretty awesome.
What techniques have you come up with to reduce the number of duplicate targets you get in your scrapes? Do you just accept it as an inevitability, or have you found a way to eliminate most of your dupes?
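For reference, the kind of post-scrape cleanup I mean is just collapsing everything down to root domains and keeping the first hit. A minimal Python sketch, assuming the scrape lands in a plain text file with one URL per line (results.txt and results_unique.txt are placeholder names):

# Minimal sketch: dedupe a flat list of scraped URLs by root domain.
# results.txt / results_unique.txt are hypothetical filenames.
from urllib.parse import urlparse

def root_domain(url: str) -> str:
    """Collapse URL variants (scheme, path, leading www.) to one key."""
    url = url.strip()
    if "://" not in url:          # lines without a scheme would otherwise parse with an empty host
        url = "http://" + url
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

seen = set()
unique = []
with open("results.txt", encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        key = root_domain(url)
        if key and key not in seen:
            seen.add(key)
            unique.append(url)

with open("results_unique.txt", "w", encoding="utf-8") as f:
    for url in unique:
        f.write(url + "\n")

print(f"kept {len(unique)} unique targets")

Keying on the root domain rather than the full URL means inner pages of the same site don't count as separate targets, which is usually what you want for a target list.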
Comments
Also, some footprints tend to be very similar, so I try to use no more than one footprint per engine.
And naturally, don't scrape from more than one engine; that's going to get you nothing but dupes most of the time.
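A simple way to handle the leftover dupes across repeat runs is to keep a master list keyed on root domain and only append what's new. A rough Python sketch, assuming one URL per line in both files (master_targets.txt and new_scrape.txt are placeholder names):

# Minimal sketch: drop anything from a fresh scrape that's already in the master list,
# so repeat runs (different footprints, different days) only add new domains.
from urllib.parse import urlparse

def root_domain(url: str) -> str:
    """Collapse URL variants (scheme, path, leading www.) to one key."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

with open("master_targets.txt", encoding="utf-8") as f:
    known = {root_domain(line) for line in f if line.strip()}

fresh = []
with open("new_scrape.txt", encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        key = root_domain(url) if url else ""
        if key and key not in known:
            known.add(key)
            fresh.append(url)

# Append only the genuinely new targets to the master list.
with open("master_targets.txt", "a", encoding="utf-8") as f:
    for url in fresh:
        f.write(url + "\n")

print(f"{len(fresh)} new targets added to master list")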