If you have 20 proxies, reduce connections to 1 and set the delay to 10 seconds. This should keep you from getting banned.
I came up with that number because, from my tests, you can scrape Google about 30 times per hour per proxy with at least a 10-second delay, as long as you have multiple proxies to rotate through. If you have only one proxy, you have to set a much longer delay. So it's just math:
20 (proxies) x 30 (scrapes per proxy per hour) = 600; 600 / 60 (minutes in an hour) = 10 (delay in seconds required)
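A minimal sketch of that setup: one connection, a fixed delay, and a round-robin proxy rotation. The proxy addresses and function names here are hypothetical placeholders, not anything from the post; the actual request call is left as a stub.

```python
import itertools
import time

# Hypothetical proxy addresses -- substitute your real list of 20.
PROXIES = [f"proxy-{i}.example.net:8080" for i in range(20)]
DELAY_SECONDS = 10  # one connection, 10 s between requests

def run_single_connection(urls, proxies=PROXIES, delay=DELAY_SECONDS):
    """Fetch each URL through the next proxy in the rotation, then sleep.

    With 20 proxies and a 10 s delay, any single proxy is reused only
    every 200 s (~18 hits/hour), under the 30-per-hour ceiling.
    """
    rotation = itertools.cycle(proxies)
    used = []
    for url in urls:
        proxy = next(rotation)
        used.append(proxy)
        # ... issue the request for `url` through `proxy` here ...
        time.sleep(delay)
    return used
```

Because the rotation is a simple cycle, proxy number 1 is not touched again until all 20 have been used once.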
If you keep getting banned, I suggest buying at least 60 proxies for scraping. The more the better: you can scrape faster with less delay, and each proxy hits Google less often since there are more to rotate through. Scraping is a brute-force venture these days.
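To see why more proxies allow a shorter delay, here is a back-of-the-envelope helper (my own sketch, not from the post) built on the 30-requests-per-hour-per-proxy ceiling described above. Note it gives a 6-second floor for 20 proxies, so the 10 seconds recommended earlier includes extra safety margin.

```python
PER_PROXY_HOURLY_CAP = 30  # the ~30 requests/hour/proxy limit from the post

def min_delay_seconds(num_proxies, cap=PER_PROXY_HOURLY_CAP):
    """Minimum delay between requests on a single connection so that
    no proxy exceeds `cap` requests per hour when rotated evenly."""
    return 3600 / (num_proxies * cap)
```

With 60 proxies the floor drops to 2 seconds, which is the "scrape faster with less delay" effect.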
Alright! I will try those settings and increase the proxies.
I don't like to do that because it will likely start getting other proxies banned, which interferes with the math. If you need proxy retry, then you need to adjust the connections, the timeout, or both.
You know what: the math I shared is incorrect for this application. I apologize. That formula is for finding the number of simultaneous connections to make for a faster style of scraping. I got confused for a second because there are two formulas I use for scraping (and I don't have to do this often, since my setup has been the same for years).
What that math above actually says is that with 20 proxies you can run 10 connections with a delay of 1 minute, so that each proxy only scrapes 30 times an hour or less.
With the settings I told you originally you definitely won't get banned, but it is going to be slow scraping. I prefer to scrape slowly these days; I get much better results.
So, to recap: no more than 30 requests per hour per proxy, and each proxy must have a delay before hitting Google again. That is why the formula I gave you comes out to 10 at the end: (hopefully) after the delay duration is up, the other 10 proxies will be used, providing even more rest for the 10 proxies you used the first time.
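One way to enforce that per-proxy rest is a cooldown tracker; this is my own sketch, not the poster's code. A 120-second cooldown per proxy works out to at most 30 uses per hour:

```python
import time

class ProxyThrottle:
    """Hand out the proxy that has rested the longest, refusing any
    proxy used within the last `cooldown` seconds."""

    def __init__(self, proxies, cooldown=120.0):
        self.cooldown = cooldown
        # Never-used proxies sort as "idle forever".
        self.last_used = {p: float("-inf") for p in proxies}

    def acquire(self, now=None):
        """Return the least-recently-used proxy, or None if every
        proxy is still cooling down."""
        now = time.monotonic() if now is None else now
        proxy = min(self.last_used, key=self.last_used.get)
        if now - self.last_used[proxy] < self.cooldown:
            return None
        self.last_used[proxy] = now
        return proxy
```

When `acquire` returns `None`, the scraper should simply wait; that pause is exactly the extra delay the recap describes.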
Proxies still get banned this way, however. I use the single-thread method now when scraping with dedicated proxies. Much better results.
Sorry about that, I haven't thought about the math in a long time!
Comments
Really appreciate it! Will try this out!