
Scrapebox is barely responding

edited May 2014 in Other / Off Topic
I've just bought Scrapebox and I couldn't help but notice that it is very slow to respond. I've imported a list of KWs (exactly 5k) and combined them with my GSA footprint list, which resulted in a total of 7.795 million keywords in the list.

Once I imported my KWs, Scrapebox became incredibly slow, barely responds and crashes regularly. Admittedly my VPS is not the best, but this really baffles me. Is the KW list too big? And how can the tool struggle to respond to simple tasks that aren't even related to the number of KWs?

Right now my footprint list is quite big (everything except Video and Video-Adult), because I'd like to get links for my upper tiers, as well as some crappy spam links for my lower tiers.

Any ideas?

Comments

  • Scrapebox is not a tool for bulk scraping. I faced the same problems with Scrapebox (it's really buggy) and had no problems with Gscraper; plus, Scrapebox can be 5-10 times slower than Gscraper.

    I can scrape with Gscraper for X-XX days with no problems; Scrapebox would crash several times in that time. Scrapebox has several useful functions, but when it comes to scraping it looks like a tool for kids, not for serious marketers.
  • davbel UK
    edited May 2014
    @satyr85 total bollocks. Have you actually ever used Scrapebox? From your answers, I'm guessing not. In fact, I can't believe how completely inaccurate your post was.

    Gscraper might be better for scraping Google and Scrapebox has its issues, but if used properly Scrapebox is still a very good bulk scraper.

    @tixxpff your issue is that SB struggles with anything more than a million. So a KW list of 8 million will cause it no end of issues. You need to tailor your KWs/footprints so that you end up with fewer than 1m results.

    To do this you need to split your KW & footprint lists into much smaller chunks, probably about 200-300 KWs and one set of footprints at a time, e.g. use Articles only (there's a rough sketch of this at the end of this post).

    In total you want no more than a couple of thousand KW/footprint combinations per scrape. I normally aim for a lot less than that. Then run SB and leave it for a day. If you've got your footprints right, you'll end up with hundreds of thousands of URLs. If you end up with more than 1m, then SB may crash; if it does, have a look at the Harvester_sessions dir in your SB dir and you'll find the URL list in there.

    With the right settings and footprints you can use SB to keep SER busy
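
    A rough sketch of that splitting in Python (the file names, the Articles footprint file and the 2,000-line cap are just examples - adjust them to your own lists):

        from itertools import product

        MAX_LINES = 2000  # roughly "a couple of thousand KW/footprint combinations per scrape"

        # example input files: one platform's footprints plus your keyword list
        with open("keywords.txt", encoding="utf-8") as f:
            keywords = [line.strip() for line in f if line.strip()]
        with open("footprints_articles.txt", encoding="utf-8") as f:
            footprints = [line.strip() for line in f if line.strip()]

        # pair every footprint with every keyword, then write the result out in
        # small files so each Scrapebox import stays at a few thousand lines
        lines = [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]
        for i in range(0, len(lines), MAX_LINES):
            with open(f"scrape_{i // MAX_LINES + 1:04d}.txt", "w", encoding="utf-8") as out:
                out.write("\n".join(lines[i:i + MAX_LINES]) + "\n")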
  • @davbel I'm actually stunned by how helpful and accurate your reply was. Thanks so much, mate. I already assumed that this might be because of the huge amount of KWs I'm using, but I couldn't quite figure out why, because I thought once they were loaded into SB there'd be no reason for it to still be struggling with the KWs.

    Alright then, I guess tomorrow I'll sort out a couple of my footprints. However, as of right now I don't really have a verified list, and I'd like to change that.
    Would you recommend going through every single platform one by one to stock up? I thought about using only very, very few, but very popular KWs (10-50) and then combining them with all of my platform footprints to get a little bit of everything, you know? Then, once I've built a small verified list that I can actually start ranking a couple of small projects with, I'd start scraping every single engine, one by one.

    Thoughts?
  • @davbel Yes, I tried many times with Scrapebox and had no success because it kept crashing; Gscraper has crashed maybe 1-2 times and I've been using it for over a year. I don't know what bulk means for you; for me it's 100+ million URLs per day (before removing duplicates). That's possible with Scrapebox, but I would need mid-to-high XX instances. Too much work when one instance of Gscraper can do this.
  • davbel UK
    edited May 2014
    @Tixxpff I normally do it platform by platform and tend to focus on the contextuals, so Articles, Social Network and Wikis.

    I do it in rotation and split the KWs alphanumerically, so maybe a-d, then e-h, etc., and I also don't always use KWs - try using aa* ab* ac* ad* and so on (there's a rough sketch of both at the end of this post).

    You'll need to test and play around, but you'll soon get an idea of what works and what doesn't

    You can also get SER building a list whilst you do this - set up a few dummy projects posting to everything with a made up target URL, duplicate them 5 or 10 or 20 times depending on your setup and set it going.

    Once you've got a list started have a read of this thread from @Hinkys:
    http://www.blackhatworld.com/blackhat-seo/black-hat-seo-tools/605958-tut-how-easily-build-huge-sites-lists-gsa-ser.html

    Although it appears a bit complex, it can generate a lot of targets.
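
    A rough sketch of both keyword tricks in Python (the letter ranges and file names are just examples): split a keyword file into alphabetical chunks like a-d, e-h and so on, and generate the two-letter wildcards aa* ab* ... zz*:

        import string

        RANGES = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwxyz"]

        with open("keywords.txt", encoding="utf-8") as f:
            keywords = [line.strip() for line in f if line.strip()]

        # split the keyword list by first letter, e.g. keywords_a-d.txt, keywords_e-h.txt, ...
        for letters in RANGES:
            subset = [kw for kw in keywords if kw[0].lower() in letters]
            with open(f"keywords_{letters[0]}-{letters[-1]}.txt", "w", encoding="utf-8") as out:
                out.write("\n".join(subset) + "\n")

        # two-letter wildcards instead of real keywords: aa* ab* ... zz* (676 entries)
        with open("wildcards.txt", "w", encoding="utf-8") as out:
            for a in string.ascii_lowercase:
                for b in string.ascii_lowercase:
                    out.write(f"{a}{b}*\n")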
  • gooner SERLists.com
    @tixxpff - I found SB performs much better with a smaller footprint/keyword list. Exactly as @davbel says.

    Split your footprints down into smaller chunks and I think you will have better results.
    Since I started doing that, SB never crashes for me and I can get several million results with no issues.

  • edited May 2014
    @davbel Thanks. I'll keep that in mind. For now I started a project with very few KWs, but all footprints, so I can get a little bit of everything. Once that project is done, I'll start scraping platform by platform.

    What exactly did you mean by:
    You can also get SER building a list whilst you do this - set up a few dummy projects posting to everything with a made up target URL, duplicate them 5 or 10 or 20 times depending on your setup and set it going.
    Are you suggesting to create dummy projects and let SER scrape using a couple of engines to search for new targets to post to, instead of using a verified list?



    @gooner Yes, that totally did the trick. It works so much better now and without any problems at all. Thanks mate.


    edit:
    Two follow-up questions, since you guys seem to know a thing or two about Scrapebox:
    If I stop the harvesting process because I need to replace my proxies (I'm using public proxies), can I resume where I left off, or will SB start again from the very beginning with KW1 + Footprint1?

    And secondly, what would you guys consider a good scraping speed (on a VPS with decent hardware; I'm not talking about a dedi server with one trillion CPU cores and terabytes of RAM)? Right now I'm at ~35 URLs/s. I'm using only 200 connections, because I'm still playing around with the settings a little to find out what gives me the best performance.
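    For reference, and assuming the rate held constant (which it probably won't with public proxies), ~35 URLs/s works out to roughly 3 million URLs per day: 35 × 86,400 ≈ 3.0 million.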
  • @Tixxpff yes, you can use SER to build a list too by doing what I described earlier. By doing this you'll be creating your own verified list, which you can then add to by importing SB scrapes or bought 3rd-party lists.

    I've never stopped SB mid-scrape to swap proxies; any time I have stopped it, it's been right at the beginning because I realised I'd forgotten something, so I just started again.

    Your last question is a difficult one because it depends on far too many factors - KWs, footprints, # of results, proxies, etc.


  • Your hardware has very little impact on your scraping speed, especially at such low volumes. It all depends on your proxies... and having Gscraper, of course. Don't get me wrong, I love Scrapebox for all the addons and I use it regularly; I would just never use it for scraping anymore. On the exact same proxy source with the same number of threads, I'm scraping around 10-15 times faster with Gscraper than with Scrapebox. Also, the ability to leave Gscraper running for weeks without intervening is huge, and something that is impossible with Scrapebox.
  • @fakenickahl Yes, I was kinda aware of that before I purchased SB, but because of all the very useful addons and its at least decent scraping speed, I decided to buy SB instead of GS. In the long run, once I've built a steady income and/or upgraded to a better VPS (maybe even a dedi), I'll go with GScraper for sure.