Proxy Judge, SOCKS5 and Other Proxy Questions

Hi Sven,

It's been a long time since the early days when I was a power user of your scraper... I was recently researching proxies for a project of my own, where I need stable proxies in bulk, and found out that GSA has now also developed a Proxy Scraper :) I had the trial installed for less than 5 minutes before I bought it. GSA Software 4 ever :)

I played around with it the whole day and now have some questions:

1. In general, what settings do you recommend for a recent Core i7 with a Gbit connection and 16 GB RAM, settings you would consider stable and reliable? I can't really judge what the optimal settings are, and I don't want to burn a lot of proxies with the wrong ones.

2. I'm not 100% sure how to test proxies seriously and eliminate external factors. I saw that it's possible to define custom tests. Assuming I have a few domains/servers of my own, what's the best way to test proxies? Does it make sense to install my own proxy judge? If yes, what's the best script, and is there one that also supports SSL testing (or any other more advanced checks)?

3. I've looked (for ideas of my own) into https://github.com/chrisiaut/proxycheck_script (yes, I read that it's primarily for the CLI and not multithreaded), and I liked the checks it contains. For example, I know there are malicious proxies that inject their own content into JS and the like. Is it possible to verify this against my own server or servers as well? What would work?

4. I need to access WHOIS information, and none of the bulk APIs provide some of the information I need, so I have to code my own solution; there is no "normal" option besides scraping the info myself. For this I know I need SOCKS5 proxies that can reach port 43. Is it possible to check for this too (without auto-exporting the proxy list from your app and then doing the check via PHP or another language)?

5. I have quite a clean, strong workstation without a lot of clutter. GSA PS crashes quite often, mostly when the GUI has been visible for a few hours. Does it make sense to send you debug logs (and if yes, which ones?), or are there recommended settings I should use? (I'm using Win 10 Pro.)

Thanks in advance for your answers (which could also be in German if you want). I'm happy to see that you're still so active after all these years :)

Cheers,
BF

Comments

  • Some additional thoughts and questions:

    Isn't the weakness of the whole proxy scanning topic that:

    • Public proxies are very unstable and unreliable
    • In the end, it only makes sense to use them for specific niches
    So my targets are:
    • Only Elite proxies, everything else doesn't make sense
    • SSL proxies are preferred, because a lot of stuff is behind HTTPS
    • When non-HTTP services need to be accessed, only SOCKS5 makes sense (SOCKS4, for example, does not, because it doesn't provide the same "features")
    • Using them for Google or other search engines isn't important to me
    Proxy testing in general depends heavily on the moment, and I think it's very important to at least eliminate the counterpart (the server the proxies are tested against) as a source of failure. So my conclusion is that the counterpart needs to be under my own control and needs to check as much as possible and provide as much debug information as possible.
    • I don't think it would be rocket science to develop a little script that provides the necessary information, as long as I know what information is needed at all.
    • But which existing "test templates" should be used (merged?) against it to get the most out of it?
    More advanced stuff could be, for example, passing the proxy itself to the script so that on success the script can directly retest it (even for special uses, which could be controlled via GET parameters). But that's a step further; let's start with the basics: ensuring that they are elite.
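
A rough sketch of the "basics" described above, assuming a judge under your own control that simply reports the request headers it received and the peer address it saw. The three levels follow the common transparent/anonymous/elite convention, not necessarily GSA's definition, and both `PROXY_HEADERS` and `classify_anonymity` are hypothetical names chosen for this illustration:

```python
# Headers that commonly reveal a proxy. This list is an assumption for the
# sketch, not the list any particular tool uses.
PROXY_HEADERS = {"via", "x-forwarded-for", "forwarded", "x-real-ip",
                 "client-ip", "proxy-connection"}


def classify_anonymity(echoed_headers, remote_addr, real_ip):
    """Classify a proxy from what a self-hosted judge saw.

    echoed_headers: dict of request headers as received by the judge
    remote_addr:    peer address the judge saw for the connection
    real_ip:        the tester's own public IP
    """
    headers = {k.lower(): str(v) for k, v in echoed_headers.items()}
    # Transparent: the real IP leaks, as peer address or inside any header
    # value (crude substring match, good enough for a sketch).
    if remote_addr == real_ip or any(real_ip in v for v in headers.values()):
        return "transparent"
    # Anonymous: the IP is hidden, but the request is recognisably proxied.
    if PROXY_HEADERS.intersection(headers):
        return "anonymous"
    # Elite: nothing distinguishes the request from a direct one.
    return "elite"
```

The judge side then only has to print the headers and the remote address; all of the decision logic stays with the checker, where it is easy to change.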

    Besides this, the following is a little unclear:
    • Your proxy scraper detects whether a proxy can resolve a domain to an IP or not. Most of the SOCKS5 proxies I found can't. But what follows from this? Can these proxies only be used for direct IP access, or how should I understand it? If so, please make the list sortable/filterable by this "feature" as well.
    • In the settings I can define which "test templates" to use. But what is the result if, for example, I choose several for anonymity? Do all of them have to pass to generate a success message, or is it enough if one passes (with the others serving only as a fallback)?
    • Reliability seems a very important factor to me. But manual "test sessions" affect the stats, right? So how can I, for example, maintain my pool of longer-lasting public proxies but also scan them from time to time for niche use, without affecting their general stats?
    • For manual testing the test templates can be selected, but if I want to keep the results of the individual test templates apart, is assigning tags on success the only way?
    Sorry for this ton of questions... But once again I see a lot of potential and customization possibilities in a GSA tool. Some of the behaviour isn't 100% clear to me. GSA tools are usually too powerful not to use them properly and get a feel for the important details. Some more detailed answers or links (I did try to research my questions before posting) would help me to really use it :)

    Thanks for any input in advance. Let me know if some of my questions weren't clear.
  • Sven www.GSA-Online.de
    1) Well, it's hard to say what the best options are here. I would keep the default setup and maybe increase the threads a bit, to something that keeps memory usage at a reasonable level (below 500 MB).

    2) Until now I haven't seen the need to set up my own proxy judge. There are plenty out there, and if you select more than one test, the tool runs them until it gets a result. Testing against SSL is easy as long as you use https:// in the test URL.

    3) Things like that can be set up as well. You can define your own tags, and the test script can check for certain things in the output and set the tag or not. Just have a look at some of the test scripts.

    4) You mean you want to check a proxy against something other than just a web page? Yes, that's also possible. I have made an SMTP test script for checking whether a proxy is able to perform email verifications on gmail. You can use that as a reference to script your own tests.

    5) PS should never crash, so yes, I'm interested in everything that helps with finding and fixing bugs.
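
To illustrate the idea behind point 3 outside of the tool: one way to spot a proxy that injects content is to host a static file on your own server, fetch it through the proxy over plain HTTP, and compare its hash with the known-good one. This is only a sketch in Python, not the Proxy Scraper script format (which this thread does not show); `sha256_hex`, `is_tampered` and `fetch_via_proxy` are made-up names.

```python
import hashlib
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_tampered(body: bytes, expected_sha256: str) -> bool:
    """True if the body fetched through the proxy differs from the reference."""
    return sha256_hex(body) != expected_sha256.lower()


def fetch_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> bytes:
    """Fetch an http:// URL through an HTTP proxy given as 'host:port'."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    )
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A test script that sets a tag on success could apply the same comparison: the proxy earns a "clean" tag only when the body matches the reference byte for byte.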

  • Sven www.GSA-Online.de
    Weakness of public proxies:
    They are public and so accessible to anyone who finds them. That means a proxy eventually gets used by a lot of people, which makes it unstable and slow depending on how many people use it.

    Another problem is many people using such a proxy on the same website, e.g. Google, at the same time. That of course causes it to be banned, temporarily or for longer, making it useless for that site or service.
    ----
    Just check the test scripts that give you the data you want. You can select more than one of the same type (e.g. anonymity level); the tool will check until it gets a reply and skip the rest if they would not add more details to the proxy.
    ----
    If a proxy cannot resolve a domain to an IP, it will not work with apps that only pass the domain instead of the IP. Some let you configure it, but most assume it's possible. I will try to add sorting by this proxy information in the next update.
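
In SOCKS5 terms (RFC 1928) the difference looks like this: a client can put the hostname into the CONNECT request (address type 0x03) and leave resolution to the proxy, or resolve the name itself and send the IP (address type 0x01 or 0x04). A proxy that cannot resolve DNS only works with the second form. The helper name `socks5_connect_request` is made up for this sketch:

```python
import ipaddress
import struct


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request for an IP literal or a hostname."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: send the name and let the proxy resolve it.
        name = host.encode("idna")
        addr = b"\x03" + bytes([len(name)]) + name
    else:
        # Already an IP: the proxy needs no DNS lookup at all.
        atyp = b"\x01" if ip.version == 4 else b"\x04"
        addr = atyp + ip.packed
    # VER=5, CMD=1 (CONNECT), RSV=0, then address and port in network order.
    return b"\x05\x01\x00" + addr + struct.pack("!H", port)
```

So for a proxy flagged as unable to resolve domains, the client has to do the lookup locally and connect by IP.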
    ----
    Reliability: This just keeps a record of the tests made and the number of successful/failed tests, and calculates the reliability from that. You can test at any time without influencing that data too much.
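
The exact formula is not given here, so the following is only a guess at the simplest version, passed tests divided by all tests, to show why one extra manual test barely moves the figure once there is some history. `ProxyStats` is a hypothetical name:

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Running count of test outcomes for one proxy."""
    passed: int = 0
    failed: int = 0

    def record(self, success: bool) -> None:
        """Add one test result to the history."""
        if success:
            self.passed += 1
        else:
            self.failed += 1

    @property
    def reliability(self) -> float:
        """Share of passed tests, 0.0 when nothing was tested yet."""
        total = self.passed + self.failed
        return self.passed / total if total else 0.0
```

With 90 passes and 10 failures on record, one more failed manual test changes the ratio from 0.90 to 90/101, so a long history absorbs occasional manual checks.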
    ----
    Yes, a tag can only be assigned if a test was successful.
  • First of all thank you for your answer/time :)

    1. Even with 300 connections and 1.3 million proxies in the list it consumes less than 400 MB. I've seen screenshots from other people using 300 threads too. But I can't really judge whether it is working or not (other than whether it crashes). The bandwidth it uses is very small anyway, and 1 Gbit is more than sufficient. Did you ever test it yourself? What are the indicators that too many threads are in use and the tests no longer work correctly?

    2. Maybe I'm a bit too paranoid about this, but when I think about a lot of people hitting the same judges with a lot of connections, it's quite likely that sooner or later the judges will have problems. As everything is unstable with public proxies anyway, I think my own judge would at least eliminate this factor. What are your preferred proxy judges?

    3. Is there any more advanced documentation about the test scripts? For me, SOCKS5, DNS resolution and HTTPS are especially important. What would you choose for this?

    4. What I actually want to test is whether a proxy can connect to port 43 and read WHOIS information. This isn't possible with web proxies; SOCKS5 is needed. But I don't see any way to check it other than calling a script, passing the proxy information to it and having it return whether it worked or not. Do you see better possibilities?

    5. Since I wrote that, it has never crashed again, without my changing anything or rebooting. The only difference is that I hide GSA SER in the tray area. At the beginning I often watched it for longer periods, and then the GUI crashed. Is there a difference in how you handle GSA PS resources when it's visible versus minimized? Another thing is that I can't import proxies from a file. Does this really work? At first I thought it might be the format, but with "use proxies from clipboard" it imports them without problems. I saw the same problem (if no special format is needed) with proxy sources: I try to import the file, and it tells me how many it already had and that 0 were imported (with my list of over 10K URLs). I also tried running it as administrator. What additional debug information can I provide?

    6. "Just check the test scripts"... Thanks, that helped. Now I get how it works :)

    7. "Try to sort this information"... Sounds very nice :) I was in general surprised how many can't really resolve DNS. Especially SOCKS5.
  • Sven www.GSA-Online.de
    1. Hard to say. I don't have a connection good enough to use 300 threads, but I see no reason why the tool should crash at all. I guess fewer proxies would get found due to lags and timeouts.

    2. I have no preferred one really. As long as they just give back all the headers sent to the page, all is fine.

    3. No docs, as no one has asked for them so far. I can maybe add something on docu.gsa-online.de if there is a need for it. The custom script with reply checking might take a bit of explaining; the rest (normal HTTP checking) is easy to understand by viewing the current scripts.

    4. Sounds OK to me, and easy to do with the current scripting.

    5. No difference between being minimized or not. Import from file works very well. Make sure your file is encoded correctly (just ANSI). If it tells you nothing was imported, that's because everything is already in ;) I haven't been able to find any decent new sources for quite a while now. Keep in mind that new URLs are only checked against the subdomains already added.

    6. ok
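
For reference, the protocol steps that a port 43 check from point 4 has to perform are short. The sketch below is plain Python, not the Proxy Scraper script format, and assumes a SOCKS5 proxy without authentication that resolves hostnames itself; `recv_exact`, `whois_query_over_socks5` and `whois_via_proxy` are made-up names.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data


def whois_query_over_socks5(sock, whois_host, domain):
    """Run a WHOIS query over a socket already connected to a SOCKS5 proxy."""
    # Greeting: version 5, one method offered, 0x00 = no authentication.
    sock.sendall(b"\x05\x01\x00")
    if recv_exact(sock, 2) != b"\x05\x00":
        raise ConnectionError("proxy refused no-auth SOCKS5")
    # CONNECT to whois_host:43 by name (address type 0x03).
    name = whois_host.encode("ascii")
    sock.sendall(b"\x05\x01\x00\x03" + bytes([len(name)]) + name
                 + struct.pack("!H", 43))
    ver, rep, _rsv, atyp = recv_exact(sock, 4)
    if ver != 5 or rep != 0:
        raise ConnectionError(f"SOCKS5 CONNECT failed, reply code {rep}")
    # Skip the bound address and port that follow the reply header.
    if atyp == 1:
        recv_exact(sock, 4 + 2)
    elif atyp == 4:
        recv_exact(sock, 16 + 2)
    elif atyp == 3:
        recv_exact(sock, recv_exact(sock, 1)[0] + 2)
    else:
        raise ConnectionError(f"unknown address type {atyp}")
    # WHOIS (RFC 3912): one query line ending in CRLF, then read until close.
    sock.sendall(domain.encode("ascii") + b"\r\n")
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_via_proxy(proxy_host, proxy_port, whois_host, domain, timeout=15.0):
    """Open the TCP connection to the proxy and run the query."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as s:
        return whois_query_over_socks5(s, whois_host, domain)
```

A proxy passes when the returned text contains something a real WHOIS answer would contain, for example the queried domain name.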

  • Thanks again :)

    1. "due to lags and timeouts" - Exactly my point. But as all resources are available (speed, hardware), I can't evaluate whether this happens or not.

    2. Got it

    3. Exactly, the custom part would be interesting.

    4. Maybe I'm overlooking something extremely simple, but which existing script could be used as a base for this? It sounds fantastic that this isn't hard.

    5. Minimized vs. not: odd, but as long as it runs stably now I'm fine :) (I know this sounds strange, but it was what I experienced, without any user actions alongside it). Found the import problem: the file was UTF-8 encoded... (the default in Sublime Text).
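
For anyone hitting the same import problem, a small sketch that re-encodes a UTF-8 list. It assumes "ANSI" means Windows-1252 and that a byte order mark at the start of the file may be present and should be dropped (neither is confirmed in this thread); `utf8_to_ansi` is a made-up name.

```python
from pathlib import Path


def utf8_to_ansi(src, dst, ansi="cp1252"):
    """Re-encode a UTF-8 text file (with or without BOM) as ANSI."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
    text = Path(src).read_bytes().decode("utf-8-sig")
    # Characters that do not exist in the target code page become "?".
    Path(dst).write_bytes(text.encode(ansi, errors="replace"))
```

Working on raw bytes keeps the original line endings untouched, so only the encoding changes.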
  • edited October 2016
    Some additions related to proxy judges. I quote http://www.proxynova.com/proxy-articles/list-of-proxy-judges/:

    "Setting up a proxy judge is very easy and it can cost nothing so there is no shortage of proxy judges on the Internet and finding them is easy. The problem is that most of the public proxy judges are unreliable because they are hosted on slow or unstable servers, so if a proxy judge gets too popular, then the server slows down or even shuts down. If that happens, then tests done with a proxy judge will either fail or timeout marking the proxy server as dead even if it's alive."
  • Sven www.GSA-Online.de
    4. See test_data\Email Verification (gmail.com).ini
    ---
    Proxy judge: that's why you should select more than one judge in the settings. PS will pick one at random and, if that fails, move on to the next.
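
As an illustration of that fallback (a sketch of the idea only, not PS's actual code), a checker can shuffle the configured judges and stop at the first one that answers. `first_judge_result` and the `run_test` callback are made-up names; `run_test` is assumed to raise `OSError` (which includes timeouts) when a judge is unreachable.

```python
import random


def first_judge_result(judges, run_test, rng=random):
    """Try judges in random order; return (judge, result) for the first reply."""
    order = list(judges)
    rng.shuffle(order)
    for judge in order:
        try:
            result = run_test(judge)
        except OSError:
            # Judge is down or timed out: not the proxy's fault, try the next.
            continue
        if result is not None:
            return judge, result
    # No judge answered, so nothing can be said about the proxy.
    return None
```

With this approach a single overloaded judge no longer marks a working proxy as dead; only a failure against every selected judge does.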