Scraping for targets - Massive Duplicates
So I have been experimenting with many different ways of scraping for targets using all kinds of footprints. I focus on the Google API and Bing, more so Bing lately.
I've tried limiting results from 1000 down to 100, but it doesn't seem to make much difference in the number of duplicates you end up scraping.
One tool I've found that is capable of finding a lot of UNIQUE targets is SEO List Builder; the only problem is it's really unstable. When it works, though, it's pretty awesome.
What techniques have you come up with to reduce the number of duplicate targets you get in your scrapes? Do you just accept it as an inevitability, or have you found a way to eliminate most of your dupes?
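For reference, the kind of post-scrape cleanup I mean is just collapsing everything down to root domains and keeping the first hit. A minimal Python sketch, assuming the scrape lands in a plain text file with one URL per line (results.txt and results_unique.txt are placeholder names):

# Minimal sketch: dedupe a flat list of scraped URLs by root domain.
# results.txt / results_unique.txt are hypothetical filenames.
from urllib.parse import urlparse

def root_domain(url: str) -> str:
    """Collapse URL variants (scheme, path, leading www.) to one key."""
    url = url.strip()
    if "://" not in url:          # lines without a scheme would otherwise parse with an empty host
        url = "http://" + url
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

seen = set()
unique = []
with open("results.txt", encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        key = root_domain(url)
        if key and key not in seen:
            seen.add(key)
            unique.append(url)

with open("results_unique.txt", "w", encoding="utf-8") as f:
    for url in unique:
        f.write(url + "\n")

print(f"kept {len(unique)} unique targets")

Keying on the root domain rather than the full URL means inner pages of the same site don't count as separate targets, which is usually what you want for a target list.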
Comments
Also, some footprints tend to be very similar, so I try to use no more than one footprint per engine.
And naturally, don't scrape from more than one engine; that's going to get you nothing but dupes most of the time.
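A simple way to handle the leftover dupes across repeat runs is to keep a master list keyed on root domain and only append what's new. A rough Python sketch, assuming one URL per line in both files (master_targets.txt and new_scrape.txt are placeholder names):

# Minimal sketch: drop anything from a fresh scrape that's already in the master list,
# so repeat runs (different footprints, different days) only add new domains.
from urllib.parse import urlparse

def root_domain(url: str) -> str:
    """Collapse URL variants (scheme, path, leading www.) to one key."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

with open("master_targets.txt", encoding="utf-8") as f:
    known = {root_domain(line) for line in f if line.strip()}

fresh = []
with open("new_scrape.txt", encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        key = root_domain(url) if url else ""
        if key and key not in known:
            known.add(key)
            fresh.append(url)

# Append only the genuinely new targets to the master list.
with open("master_targets.txt", "a", encoding="utf-8") as f:
    for url in fresh:
        f.write(url + "\n")

print(f"{len(fresh)} new targets added to master list")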