GSA PROXY SCRAPER: Does it matter which proxy types you scrape with?

Does it matter which proxy type you use when scraping URLs with Scrapebox for expired domains?
Which type works best?

Comments

  • Sven www.GSA-Online.de
    SOCKS proxies are always best, as they usually do not change their anonymity level. With web proxies you never know.
  • You mentioned anonymity level. I have zero knowledge about proxies, so I'm curious why it matters that you scrape with SOCKS proxies.

    What's the difference compared to using the others? I'm basically scraping directories for expired domains; that's the purpose.
  • Sven www.GSA-Online.de
    A proxy that leaks your real IP is considered a transparent proxy. That's something you do not want, as websites might still block you when they see connections from different proxies carrying the same real IP in the headers.

    With SOCKS-only proxies you can be sure the IP is never leaked this way, because the SOCKS protocol does not insert extra data into the headers you send.
  • kxp United States
    @antonearn - Sites will often try to protect themselves from scraping for a few reasons:

    - They don't want their content stolen (and possibly republished elsewhere)
    - Scraping takes up bandwidth from them.
    - Scraping, if done aggressively enough, can take their website down.

    So they use some measures to avoid clearly automated "attacks." One of those measures is IP blocking. Meaning, they will temporarily block IPs that seem a bit too aggressive.

    Many sites have no protection in place, but others might use Cloudflare, or even some security policies in their web server software, to protect themselves. That's where proxies come in. If you constantly change the IP of your connection, they can't block you, because they can't tell which IP is you and which IP is a new visitor.
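
    As a rough illustration of rotating the IP per request, here is a minimal Python sketch. The proxy addresses are placeholders from a documentation-only range, and `plan_requests` and `build_opener_for` are hypothetical helper names, not part of Scrapebox or GSA Proxy Scraper.

```python
from itertools import cycle
import urllib.request

# Placeholder addresses (documentation range); swap in proxies you actually have.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


def plan_requests(urls, proxies):
    """Pair each URL with the next proxy in a round-robin rotation."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]


def build_opener_for(proxy):
    """Return a urllib opener that sends HTTP and HTTPS traffic through `proxy`.

    Assumes `proxy` is an HTTP proxy given as "host:port".
    """
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    return urllib.request.build_opener(handler)


# Usage (not executed here, since it needs working proxies):
# for url, proxy in plan_requests(["http://example.com/a"], PROXIES):
#     body = build_opener_for(proxy).open(url, timeout=10).read()
```

    Round-robin is only one option; picking a random proxy per request would spread the load in a similar way.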

    Proxy Anonymity

    Now, some proxies leak your originating IP. There are different levels of proxy anonymity. Transparent proxies give the server your IP. Anonymous (Level 2) proxies hide your IP, but the server can tell the connection is coming from a proxy server; it just doesn't know which IP is behind that proxy. Elite (Level 1) proxies hide your IP and give no hint that a proxy server is involved in the visit at all.
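
    One way to picture these three levels is to look at the headers a server actually receives through a proxy, for example on a test page you control that echoes them back. The sketch below assumes that setup. The header names it checks (`Via`, `X-Forwarded-For`, `Forwarded`, `Proxy-Connection`) are common proxy markers chosen for illustration and not an exhaustive list, and `classify_anonymity` is a hypothetical helper.

```python
import re

# Headers that commonly reveal a proxy is involved (illustrative, not exhaustive).
PROXY_MARKERS = ("via", "x-forwarded-for", "forwarded", "proxy-connection")


def classify_anonymity(received_headers, real_ip):
    """Classify a proxy from the headers the target server received.

    received_headers: dict of header name -> value as seen by the server.
    real_ip: your own public IP, as a string.
    """
    headers = {name.lower(): value for name, value in received_headers.items()}

    # Transparent: the real IP shows up somewhere in the headers.
    for value in headers.values():
        tokens = re.split(r'[\s,;="]+', value)
        if real_ip in tokens:
            return "transparent"

    # Anonymous: IP hidden, but a proxy marker header is present.
    if any(marker in headers for marker in PROXY_MARKERS):
        return "anonymous"

    # Elite: no leaked IP and no sign of a proxy in the headers.
    return "elite"
```

    A server can also spot proxies by other means (known proxy IP ranges, for instance), so a header check like this shows only part of the picture.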

    SOCKS vs HTTP

    This is a big, fairly technical subject to explain, so here's a good resource if you want to learn about the differences:

    http://ghostproxies.com/blog/2016/04/difference-http-socks-proxies/

    Scraping Directories

    To answer your question: if you're scraping directories, you might be able to get away with not using proxies. If the directory uses any sort of protection, though, it will probably block you temporarily, and then you'll have to use proxies. I generally use proxies for everything, but that's not necessarily the right way to do things; it's just how I do it. The trade-off with proxies is speed (proxies are slower than connecting directly), and public proxies often perform poorly and burn out quickly because so many other people are using them.
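
    The "go direct first, switch to proxies only when blocked" approach could be sketched like this. Treating HTTP 403 and 429 as "blocked" is an assumption, since sites signal blocks in different ways, and `fetch` is a hypothetical callable you would supply yourself.

```python
# Assumption: these status codes mean the site is blocking us.
BLOCK_STATUSES = {403, 429}


def fetch_with_fallback(url, fetch, proxies):
    """Try a direct request; fall back to proxies only if it looks blocked.

    fetch(url, proxy) is a caller-supplied function that returns an HTTP
    status code; proxy=None means a direct connection.
    Returns (status, proxy_used). proxy_used is None for a direct hit or
    when every proxy failed.
    """
    status = fetch(url, None)
    if status not in BLOCK_STATUSES:
        return status, None  # direct connection worked, no proxy needed

    for proxy in proxies:
        try:
            status = fetch(url, proxy)
        except OSError:
            continue  # dead or burned-out proxy, try the next one
        if status not in BLOCK_STATUSES:
            return status, proxy

    return status, None  # still blocked after trying every proxy
```

    Skipping proxies whenever the direct request succeeds keeps the speed advantage of connecting directly.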