
what type of proxy do you guys use for scraping?

I use dedicated proxies, but they get banned so quickly!

So I wonder: what proxy type / company do you guys use to scrape with GSA products (mainly GSA Keyword Research)?

Comments

  • edited August 2022
    Dedicated proxies are fine for scraping. Post these things:
    • How many proxies you have
    • Your settings, such as connections number and delay.
    I will help you learn how to set them properly depending on the amount of proxies you have.
  • Thanks a lot for the kind reply!

    I have 20 dedicated proxies for scraping, with connections set to only 20 and delay set to 60.
    I thought these were very safe settings, but I still get blocked after 2~3 rounds of scraping 10~20 keywords.



  • By scraping 10~20 keywords, I mean getting meta info or a full analysis for 10~20 keywords in GSA Keyword Research.
  • edited August 2022
    If you have 20 proxies you need to reduce connections to 1. Set the delay to 10 seconds. This should keep you from getting banned.
    I came up with this number because, from my tests, you can scrape Google 30x per hour with at least a 10-second delay, as long as you have multiple proxies to rotate through. If you have only one proxy, you have to set a long delay. So it's just math:
    20 (proxies) × 30 (scrapes per proxy per hour) = 600; 600 / 60 (minutes in an hour) = 10 (delay in seconds required)
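The arithmetic can be sketched in a few lines (an illustration only - the 30-scrapes-per-proxy-per-hour "safe rate" is the poster's own test result, not a documented Google limit):

```python
# Reproduce the formula from the comment above.
def pool_rate_per_minute(proxies: int, safe_rate_per_hour: int = 30) -> float:
    """Total scrapes the whole proxy pool can make per hour,
    spread evenly over 60 minutes."""
    total_per_hour = proxies * safe_rate_per_hour  # 20 * 30 = 600
    return total_per_hour / 60                     # 600 / 60 = 10

print(pool_rate_per_minute(20))  # prints 10.0
```

Note that, as the correction later in this thread points out, the 10 that comes out of this formula is better read as a per-minute rate across the pool than as a delay in seconds.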
  • Really appreciate it! I will try this out!
  • edited August 2022
    If you keep getting banned, I suggest you buy at least 60 proxies for scraping. The more the better, because you can scrape faster with less delay. Each proxy hits Google less often as well, since there are more to rotate through. Scraping is a brute-force venture these days.
  • Alright! I will try those settings and increase the proxies.
  • Just one more question: do you set a proxy retry?
    If so, how many retries do you use?
  • I don't like to do that because it will likely start getting other proxies banned, which interferes with the math. If you need proxy retry, then you need to adjust connections, timeout, or both.
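One way to keep a retry from hammering the same proxy (a hypothetical sketch - `fetch` here is a stand-in for whatever HTTP call your scraper makes, not a GSA function): retry a failed request on the next proxy in the pool rather than the same one, with a hard cap so failures don't multiply traffic:

```python
def fetch_with_failover(url, proxies, fetch, max_retries=2):
    """Try the request on up to max_retries + 1 *different* proxies.
    fetch(url, proxy) should raise on a ban or timeout. A failed
    proxy is never retried for the same URL, so one bad response
    doesn't burn extra requests on it."""
    last_error = None
    for proxy in proxies[:max_retries + 1]:
        try:
            return fetch(url, proxy)
        except Exception as err:
            last_error = err
    raise last_error
```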
  • edited August 2022
    You know what - the math I shared is incorrect for this application. I apologize. That math is for finding the number of connections to make simultaneously for a faster style of scraping. I got confused for a second because there are two formulas I use for scraping (and I don't have to do this often, because my setup has been the same for years).
    The math above says that with 20 proxies you can do 10 connections with a delay of 1 minute (so that each proxy only scrapes 30x an hour or less).
    With the settings I told you to use originally you definitely won't get banned, but it's going to be slow scraping. I prefer to scrape slowly these days; I get much better results.
    So to recap: no more than 30 requests per hour per proxy, and each proxy must have a delay before hitting Google again. That is why the result of the formula I gave you is 10: (hopefully) after the delay is up, the other 10 proxies will be used, providing even more delay for the 10 proxies used the first time.
    Proxies still get banned like this, however. I use the single-thread method now when scraping with dedicated proxies. Much better results.
    Sorry about that - I haven't thought about the math in a long time!
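The single-thread rotation described here could look something like this (a sketch under the thread's assumptions; the `ProxyRotator` name and 120-second cooldown are illustrative - 120 s between uses of the same proxy is what keeps each proxy at or under 30 requests per hour):

```python
import time

class ProxyRotator:
    """Single-threaded rotation with a per-proxy cooldown.
    cooldown=120 seconds caps each proxy at 30 requests/hour."""
    def __init__(self, proxies, cooldown=120.0):
        self.proxies = list(proxies)
        self.cooldown = cooldown
        self.last_used = {p: 0.0 for p in self.proxies}
        self.i = 0

    def next_proxy(self):
        # Cycle through the pool in order; one request at a time.
        proxy = self.proxies[self.i % len(self.proxies)]
        self.i += 1
        wait = self.cooldown - (time.time() - self.last_used[proxy])
        if wait > 0:
            time.sleep(wait)  # only sleeps if the pool cycled too fast
        self.last_used[proxy] = time.time()
        return proxy
```

With 20 proxies and a 120-second cooldown, the pool sustains at most one request every 6 seconds overall, so the 10-second delay suggested earlier in the thread stays comfortably inside that budget.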


  • Thanks a lot! I will take note of this as well ;)
