I’m doing some scraping for GSA and it’s taking AGES to scrape targets…
I’m using footprints from Footprint Factory, so I have about 30k footprints. I’m running at a max of 1500 threads with a 30 sec timeout…
I’m removing duplicates while scraping, so I think that may be a reason it’s taking some time.
Here are my server details…
PowerUp
2 CPU
60 GB
2048 MB
Unmetered Bandwidth
Any thoughts on why it’s taking so long to complete the scrape?
jpvr90
edited March 2014
proxies?
30k footprints sounds like overkill. example: 30k x 100 keywords = 3 million searches.
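To make the arithmetic above concrete, here is a minimal sketch. The function names and the 50 searches/sec rate are assumptions made up for illustration, not measured GScraper figures.

```python
def total_searches(footprints: int, keywords: int) -> int:
    """Every footprint is paired with every keyword, so the counts multiply."""
    return footprints * keywords


def estimated_hours(searches: int, searches_per_second: float) -> float:
    """Rough run time at a steady rate; real rates vary with proxies and bans."""
    return searches / searches_per_second / 3600


searches = total_searches(30_000, 100)
print(searches)  # 3000000

# ASSUMED_RATE is a hypothetical figure, not a measured scraper speed.
ASSUMED_RATE = 50.0
print(round(estimated_hours(searches, ASSUMED_RATE), 1))  # 16.7
```

Plug in your own footprint and keyword counts and the rate you actually observe to see whether the run time is reasonable for the list size.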
DonCorleone
edited March 2014
How many keywords do you have?
What about your proxies?
Also, don't remove duplicate URLs while scraping. Do one thing at a time.
Kaine thebestindexer.com
edited March 2014
I use it.
I'm sure SER can scrape faster.
And I would like, this is the best solution to be put first on a comment by example.
I do not know why but it does not allocate threads asking for this action.
magix
I use GScraper too, and I think this is because Google very quickly bans proxies that run footprints with inurl, intitle, or other special operators. Second, GScraper sorts footprints alphabetically instead of mixing/scrambling them, so you get lots of duplicate footprints in a short time, and this is also a problem in GScraper. Just my thoughts...
Try public proxies found on forums and other sources.
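If alphabetical ordering really is part of the problem, one workaround would be to scramble the footprint list yourself before importing it. This is a minimal sketch: `scramble_footprints` is a hypothetical helper, the sample footprints are placeholders, and whether shuffling actually reduces bans is untested here.

```python
import random


def scramble_footprints(footprints, seed=None):
    """Return a shuffled copy so similar footprints are not queried back to back."""
    rng = random.Random(seed)
    shuffled = list(footprints)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Placeholder footprints for illustration only, not taken from the thread.
footprints = [
    'inurl:"guestbook"',
    'inurl:"guestbook.php"',
    'intitle:"leave a comment"',
    '"powered by wordpress"',
]
mixed = scramble_footprints(footprints, seed=42)
```

Passing a seed only makes the order repeatable for testing; leave it out for a different order on each run.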
dr0ne
if you're using gscraper then you need to limit the number of advanced operator footprints you are using. play around with it a bit and run some tests and you can see the urls/sec increase as you simplify and streamline your platform footprints.
with that said, how long it takes depends on how many searches (keywords x footprints) you feed it. you can let it run for a few hours, or a day, or a week.. it's really up to you.
if you streamline your footprints though, you don't need a lot of scraping to find most of the targets everyone else is using. 80/20 rule applies here.
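One way to start streamlining is to separate the footprints that use advanced operators from the plain ones, so you can test each group on its own. This is a rough sketch: the operator list only covers the two named in this thread (inurl and intitle), and the helper names and sample footprints are made up for illustration.

```python
# Only the operators mentioned in this thread; extend the tuple as needed.
ADVANCED_OPERATORS = ("inurl:", "intitle:")


def uses_advanced_operator(footprint: str) -> bool:
    """True if the footprint contains any of the listed search operators."""
    text = footprint.lower()
    return any(op in text for op in ADVANCED_OPERATORS)


def split_footprints(footprints):
    """Return (plain, advanced) lists, preserving the original order."""
    plain = [f for f in footprints if not uses_advanced_operator(f)]
    advanced = [f for f in footprints if uses_advanced_operator(f)]
    return plain, advanced


# Placeholder footprints for illustration only.
sample = [
    '"powered by wordpress"',
    'inurl:"guestbook.php"',
    'InTitle:"leave a comment"',
    '"add new comment"',
]
plain, advanced = split_footprints(sample)
print(len(plain), len(advanced))  # 2 2
```

Running the plain group first and comparing urls/sec against the advanced group would show how much the operators are costing you.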
GSAguy123
This is slightly related....as of the past few weeks....Google is now banning the private, paid proxies I use for GSA. I get my proxies from buyproxies.org....and either they are all shit, or Google is getting better at banning proxies. What do?
Comments
15k keywords
I have unlimited proxies from Gscraper