Scrapebox is barely responding
I've just bought Scrapebox and I couldn't help but notice that it's very slow to respond. I've imported a list of KWs (exactly 5k) and combined them with my GSA footprint list, which resulted in a total of 7.795 million keywords in the list.
Once I import my KWs, Scrapebox gets incredibly slow, barely responds and regularly crashes. Admittedly my VPS isn't the best, but this really baffles me. Is the KW list too big? And how can the tool have problems responding to simple tasks that aren't even related to the size of the KW list?
Right now my footprints list is quite big (everything except Video and Video-Adult), because I'd like to get links for my upper tiers, as well as some crappy spam links for my lower tiers.
Any ideas?
Comments
Gscraper might be better for scraping Google, and Scrapebox has its issues, but used properly Scrapebox is still a very good bulk scraper.
@tixxpff your issue is that SB struggles with anything over a million. A KW list of 8 million will cause it no end of issues. You need to tailor your KWs/footprints so that you end up with fewer than 1m results.
To do this you need to split your KW & footprint lists into much smaller chunks, probably about 200-300 KWs and one set of footprints at a time, e.g. Articles only.
In total you want no more than a couple of thousand kw/footprint combinations per scrape; I normally aim for a lot less than that. Then run SB and leave it for a day. If you've got your footprints right, you'll end up with hundreds of thousands of URLs. If you end up with more than 1m, SB may crash, so if it does, have a look at the Harvester_sessions dir in your SB dir and you'll find the URL list in there.
With the right settings and footprints you can use SB to keep SER busy
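A minimal sketch of how that splitting could be scripted outside of SB, assuming a plain keywords.txt and a footprint file with one entry per line (the file names and the 250-KW chunk size are just placeholders); each batch file can then be imported into the harvester one at a time:

```python
# Sketch only: split a keyword list into small chunks and merge each chunk
# with one footprint set (footprint + keyword per line, like SB's merge).
# File names and CHUNK_SIZE are assumptions - adjust to your own files.
from pathlib import Path

KW_FILE = Path("keywords.txt")              # one keyword per line (assumed)
FP_FILE = Path("footprints_articles.txt")   # one footprint per line (assumed)
CHUNK_SIZE = 250                            # roughly the 200-300 KW range above

keywords = [k.strip() for k in KW_FILE.read_text().splitlines() if k.strip()]
footprints = [f.strip() for f in FP_FILE.read_text().splitlines() if f.strip()]

out_dir = Path("sb_batches")
out_dir.mkdir(exist_ok=True)

for i in range(0, len(keywords), CHUNK_SIZE):
    chunk = keywords[i:i + CHUNK_SIZE]
    lines = [f"{fp} {kw}" for fp in footprints for kw in chunk]
    batch = out_dir / f"batch_{i // CHUNK_SIZE:03d}.txt"
    batch.write_text("\n".join(lines))
    print(f"{batch.name}: {len(lines)} footprint/keyword combinations")
```

Running the batches one after another keeps each harvest comfortably under the limits mentioned above.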
Alright then, I guess tomorrow I'll sort out a couple of my footprints. However, as of right now I don't really have a verified list and I'd like to change that.
Would you recommend going through every single platform one by one to stock up? I thought about using only a very few, but very popular, KWs (10-50) and then combining them with all of my platform footprints to get a little bit of everything, you know? Then, once I've built a small verified list I can actually start ranking a couple of small projects with, I'd start scraping every single engine, one by one.
Thoughts?
I do it in rotation and split the KWs alphanumerically, so maybe a-d, then e-h, and so on. I also don't always use KWs - try using aa* ab* ac* ad* etc. (see the sketch below).
You'll need to test and play around, but you'll soon get an idea of what works and what doesn't
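For anyone who wants to script those two ideas, here's a minimal sketch under the same assumptions as above (a plain keywords.txt; the group boundaries and file names are arbitrary): generating aa*-style two-letter stubs and splitting an existing KW list into alphabetical groups like a-d and e-h.

```python
# Sketch only: two-letter wildcard stubs (aa*, ab*, ...) and an alphabetical
# split of an existing keyword list (a-d, e-h, ...). File names are assumptions.
import string
from pathlib import Path

# 1) aa* ... zz* stubs to use instead of real keywords (676 entries)
stubs = [a + b + "*" for a in string.ascii_lowercase for b in string.ascii_lowercase]
Path("wildcard_stubs.txt").write_text("\n".join(stubs))

# 2) split an existing keyword list into alphabetical groups
groups = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
keywords = [k.strip() for k in Path("keywords.txt").read_text().splitlines() if k.strip()]
for letters in groups:
    batch = [kw for kw in keywords if kw[:1].lower() in letters]
    Path(f"kw_{letters[0]}-{letters[-1]}.txt").write_text("\n".join(batch))
```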
You can also get SER building a list whilst you do this - set up a few dummy projects posting to everything with a made up target URL, duplicate them 5 or 10 or 20 times depending on your setup and set it going.
Once you've got a list started, have a read of this thread from @Hinkys:
http://www.blackhatworld.com/blackhat-seo/black-hat-seo-tools/605958-tut-how-easily-build-huge-sites-lists-gsa-ser.html
Although it appears a bit complex, it can generate a lot of targets.
Split your footprints down into smaller chunks and I think you will have better results.
Since I started doing that, SB never crashes for me and I can get several million results with no issues.
What exactly did you mean by:
You can also get SER building a list whilst you do this - set up a few dummy projects posting to everything with a made up target URL, duplicate them 5 or 10 or 20 times depending on your setup and set it going.
Are you suggesting creating dummy projects and letting SER scrape for new targets to post to using a couple of search engines, instead of using a verified list?
@gooner Yes, that totally did the trick. It works so much better now and without any problems at all. Thanks mate.
edit:
Two follow-up questions, since you guys seem to know a thing or two about Scrapebox:
If I stop the harvesting process because I need to replace my proxies (I'm using public proxies), can I resume where I left off, or will SB start again from the very beginning with KW1 + Footprint1?
And secondly, what would you guys consider a good scraping speed on a VPS with decent hardware (I'm not talking about a dedi server with one trillion CPU cores and terabytes of RAM)? Right now I'm at ~35 URLs/s. I'm only using 200 connections because I'm still playing around with the settings a little bit to find out what gives me the best performance.
I've never really stopped SB mid-scrape; any time I have, it's been right at the beginning when I've realised I'd forgotten something, so I just start again.
Your last question is a difficult one because it depends on far too many factors - KWs, footprints, number of results, proxies, and so on.