Scrapebox - The Proxy Question

Tixxpff · May 2014

Hi guys,
first of all, I do realise there's a forum dedicated to Scrapebox, but the answers I got there were hardly satisfying, so I decided to rely on you guys

I've recently started scraping by myself and bought Scrapebox and went with a public proxy service. The service is 20$/month and I get 2x ~15k proxies every day. And this is exactly where the problem is.

The public proxies I get burn out pretty fast and therefore I'll have to exchange them at least once every 24h. This is quite annoying and costs a lot of time, because I have to abort the scrape, export unfinished KWs, reload new proxies, load the KWs, etc. And this issue makes it pretty much impossible for me to do big scrapes.

So my question is - Would you guys suggest getting ~40 semi dedicated proxies for scraping and maybe reduce the simultaneous connections per proxy to avoid proxy bans, or is there like a minimum amount for private proxies if I wanted to use them for scraping and should I therefore stay with my public proxy service?
I've read a couple of guides and some even considered amounts as low as 25 proxies to be enough for beginners.

Any suggestions?

gooner · May 2014

If you don't use the proxies for scraping in SER then you can use them in SB.

I have 100 dedicated and i use them in SB 24/7 with 300 threads and never have any problems (They are being used on 3 servers for SER too).

The only problem is yours are semi-dedicated, so other people might be getting them banned already.

fakenickahl · May 2014

Changing proxies is a lot easier in Gscraper. You can just make it re-import a list of proxies every x minutes, and when you get an updated list, you just replace the file from which Gscraper imports from. No down time or other work required.

And @gooner are you really able to scrape with decent speeds on just 100 proxies 24/7? I would have imagined they'd get burnt after a couple of minutes at 300 threads. Are you avoiding footprints with advanced search operators by any chance?

Tixxpff · May 2014

@gooner "If you don't use the proxies for scraping in SER then you can use them in SB."
Well right now I've reduced my proxy package to 10 proxies, because my projects are rather small and I wanted to save some costs.

But I do see your point though. I guess semi private proxies work just fine for SER submissions, but if I only have like ~30-50 semi dedicated proxies and a couple of these get bans, even if it's just a temp. ban then this will severely affect my scraping progress and I might have to stop it, just like I'd have to do it with my public proxy lists right now.

So I guess what you're saying is if I wanted to use private proxies for scraping I should go for fully private proxies and not semi private, right?

@fakenickahl Yes, I've heard about GScraper's proxy handling. Much better than SB. I actually can't believe that a tool as great as SB handles its proxies so incredibly bad. Well, I guess I'll have to live with it until I can afford GScraper or fully private proxies.

Thanks for the input guys.

gooner · May 2014

@fakenickahl - You made a very good point and something i forgot, it does depend on what you scrape and yes i am not using footprints in SB.

I forgot to mention that, @tixxpff ignore what i said before. Under normal circumstances those proxies would be burnt out in SB.

@fakenickahl is absolutely right there.

satyr85 · May 2014

There are few problems with proxy lists:

- list is shared between X users so proxies will die fast.

- proxies are public so everyone can find them.

- most sellers from bhw give you proxies with short life time.

- your proxy seller give you not filtered list. From this 15k i think max 5k is working.

Because of that only way to benefit from lists of proxies is scrape as fast as you can and as much as you can before proxies dies. Scrapebox is slow scraper and thats why you can get not satysfing results with proxy lists and SB.

Only way to go with proxy lists in my opinion is Gscraper. Paid version can:

- change and retry proxy when proxy is dead (it dont retry unlimited times so when all your proxies are suspended some keywords can be skipped).

- delete used keywords when scraping ends.

- harvest few times faster than SB so you can benefit from proxies lists.

- remove duplicates while scraping.

- you dont have to stop scrape to import new proxy list.

You can download Gscraper demo (its old and dont have same functionality like paid version) and see how many results you will get with SB and Gscraper in 1-2 hours.

Tixxpff · May 2014

Alright, I thought I'd give this thread one final comment, just in case anyone ever stumbles upon it and wonders what I ended up doing.

I tried 2 proxy services (which I won't name here, but I can give them out via PM if requested).

One was giving out daily proxy updates, which required me to stop my projects, export my KWs, import the new proxies, reimport my KWs and continue scraping. I guess this method does work, but it wasn't my preferred choice.

Service 2 was giving out a (small) list of proxies through which you would connect to a huge pool of proxies which are constantly updated. So pretty much the same as service 1, but without the annoying proxy changing.

I guess it doesn't really matter for GScraper, since you can import proxies on the fly, but for Scrapebox reimporting proxies and KWs was a huge pain in the ass.

Scrapebox - The Proxy Question

Comments