I use GScraper too, and I think this is because Google bans proxies very fast when they run footprints with inurl:, intitle:, or other special operators. Second, GScraper sorts footprints alphabetically instead of mixing/scrambling them, so you fire lots of near-duplicate footprints in a short time, and that is also a problem in GScraper. Just my thoughts...
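One workaround for the alphabetical-sorting issue is to shuffle the footprint list yourself before importing it. Below is a minimal sketch, assuming your footprints sit in a plain-text file with one footprint per line; the file names are placeholders, not anything built into GScraper.

```python
# A minimal sketch (not a GScraper feature): dedupe and shuffle a plain-text
# footprint list before importing it, so similar footprints aren't fired
# back-to-back. File names here are placeholders.
import random

with open("footprints.txt", encoding="utf-8") as f:
    footprints = [line.strip() for line in f if line.strip()]

footprints = list(dict.fromkeys(footprints))  # drop exact duplicates, keep order
random.shuffle(footprints)                    # break the alphabetical grouping

with open("footprints_shuffled.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(footprints))
```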
If you're using GScraper, then you need to limit the number of advanced-operator footprints you're using. Play around with it a bit and run some tests, and you'll see the URLs/sec increase as you simplify and streamline your platform footprints (one way to filter them is sketched below).
With that said, how long it takes depends on how many searches (keywords × footprints) you feed it. You can let it run for a few hours, or a day, or a week… it's really up to you.
If you streamline your footprints, though, you don't need a lot of scraping to find most of the targets everyone else is using. The 80/20 rule applies here.
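As a concrete (hypothetical) example of the filtering mentioned above: drop any footprint that contains an advanced operator before feeding the list to the scraper. The operator list and file names below are assumptions for illustration.

```python
# A minimal sketch: drop footprints that rely on advanced Google operators
# (inurl:, intitle:, etc.), which tend to get proxies banned much faster.
# The operator list and file names are assumptions for illustration.
OPERATORS = ("inurl:", "intitle:", "intext:", "inanchor:", "site:")

with open("footprints.txt", encoding="utf-8") as f:
    footprints = [line.strip() for line in f if line.strip()]

simple = [fp for fp in footprints
          if not any(op in fp.lower() for op in OPERATORS)]

with open("footprints_simple.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(simple))

print(f"Kept {len(simple)} of {len(footprints)} footprints")
```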
This is slightly related… as of the past few weeks, Google has been banning the private, paid proxies I use for GSA. I get my proxies from buyproxies.org, and either they are all shit, or Google is getting better at banning proxies. What do?
Comments
Hey guys…
I'm doing some scraping for GSA and it's taking AGES to scrape targets…
I'm using footprints from Footprint Factory, so I have about 30k footprints. I'm running at max 1500 threads with a 30 sec timeout…
I'm removing duplicates while scraping, so I think that may be one reason it's taking some time.
Here are my server details…
PowerUp plan
2 CPU cores
60 GB storage
2048 MB RAM
Unmetered bandwidth
Any thoughts on why it's taking so long to complete the scrape?
30k footprints sounds like overkill.
Example: 30k footprints × 100 keywords = 3 million searches.
I'm using 15k keywords, and I have the unlimited proxies from GScraper.
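To put the numbers in this thread in perspective, here is the same back-of-the-envelope math as above, but with the 30k footprints and 15k keywords mentioned here; the queries-per-second rate is purely an assumed figure for illustration.

```python
# Back-of-the-envelope: total queries = footprints x keywords.
# The sustained queries/sec rate is an assumption for illustration only.
footprints = 30_000    # from the original post
keywords = 15_000      # from the follow-up reply
queries_per_sec = 50   # assumed; depends on proxies, threads, timeouts

total_queries = footprints * keywords            # 450,000,000
days = total_queries / queries_per_sec / 86_400  # 86,400 seconds per day
print(f"{total_queries:,} queries -> ~{days:,.0f} days at {queries_per_sec} queries/sec")
```

Even if the assumed rate is off by a lot, hundreds of millions of queries means trimming the footprint list will do far more for finish time than adding threads or server resources.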