Proxy Judge, SOCKS5 and other Proxy Questions
Hi Sven,
It's been a long time since the early days when I was a power user of your scraper... I was recently researching some proxy stuff for a project of my own, where I need stable proxies in bulk, and found out that GSA has now also developed a Proxy Scraper. The trial was installed for less than 5 minutes before I bought it. GSA Software 4 ever
I was playing around with it the whole day and now have some questions:
1. In general, what settings do you recommend on a newer Core i7 with a gigabit connection and 16 GB RAM that you would consider stable and reliable? I can't really judge what the optimal settings are, and I don't want to burn a lot of proxies with the wrong settings... What would you recommend?
2. I'm not 100% sure how to test proxies seriously and rule out external factors. I saw it's possible to define custom tests. Assuming I've got a few domains/servers, what's the best way to test proxies? Does it make sense to install my own proxy judge? And if yes, what's the best script, and is there one that also supports SSL testing (or any other more advanced stuff)?
3. I've looked (for ideas of my own) into https://github.com/chrisiaut/proxycheck_script (yes, I read that it's primarily for the CLI and not multithreaded), but I liked the idea of the checks it contains. For example, I know there are evil proxies that inject their own content into JS and things like that. So is it possible to verify this as well against my own server or servers? What would work?
4. I need to access whois information, and all the bulk APIs are lacking some of the information I need, so I have to code my own solution, as there is no "normal" option besides scraping the info myself. For this I know I need SOCKS5 proxies that can access port 43. Is it possible to run these checks too (without auto-exporting your app's data and then doing it via PHP or another language)?
5. I have quite a clean, strong workstation without a huge mess. GSA PS crashes quite often, mostly after a few hours when the GUI is visible. Does it make sense to send you debug logs (and if yes, which ones?), or are there recommended settings I have to take care of? (I'm using Win 10 Pro)
Thanks in advance for your answers (which could also be in German if you want). I'm happy to see that you've remained so super active over the years
Cheers,
BF
Comments
Isn't the weakness of the whole proxy scanning topic that:
- I don't think it would be rocket science to develop a little script that provides the necessary information, as long as I know what information is needed at all.
- But which existing "test templates" should be used (merged?) against it to max it out?
More advanced stuff could be, for example, passing the proxy itself to the script so that the script can directly retest it on success (even for special uses, which could be controlled via passed GET parameters). But that's a step further; let's start with the basics: ensuring that they are elite.
- Your proxy scraper detects whether a proxy can resolve a domain to an IP or not. Most SOCKS5 proxies I found can't. But what's the conclusion from this? Can these proxies only be used for direct IP access, or how should I understand it? If this is the case: please make the list sortable/filterable by this "feature" as well.
- In the settings I can define which "test templates" to use. But what are the results in the end if, for example, I choose multiple for anonymity? Do all of them need to pass to generate a success message, or is it enough if one passes (and the others are only a fallback)?
- Reliability seems to me a very important factor. But manual "test sessions" have consequences for the stats, right? So how can I, for example, maintain my pool of longer-lasting public proxies but also scan them from time to time for niche uses without affecting their general stats?
- In general, the test templates can be selected for manual testing, but if I want to separate the results of multiple test templates, is the only option to assign tags in the success case?
Sorry for this ton of questions... But once again I see a lot of potential and customization possibilities in a GSA tool. Some of the behaviour isn't 100% clear to me. GSA tools are normally too powerful not to use them properly and get a feeling for the important details. Some more detailed answers or links (I already tried to research my questions before posting) would help me to really use it. Thanks in advance for any input. Let me know if some of my questions weren't clear.
2) Until now I haven't seen the need to set up my own proxy judge. There are plenty out there, and if you use more than one test, it will keep testing until it gets a result. Testing against SSL is easy as long as you use https:// as the test URL.
3) Things like that can be set up as well. You can define your own tags, and the test script can check for certain things in the output and set the tag or not. Just have a look at some test scripts.
4) You mean you want to check a proxy against something other than just a webpage? Yes, that's also possible. I have made an SMTP test script for checking whether a proxy is able to perform email verifications on Gmail. You can use that as a reference to script your own tests.
5) PS should never crash, so yes, I'm interested in everything that helps with finding and fixing bugs.
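For point 3, the check itself is simple if you serve a fixed file from your own server: fetch it through the proxy and compare it with what you uploaded. Below is a minimal Python sketch of that comparison. It is not written in GSA Proxy Scraper's own test script format, and the function name and tag names are made up for illustration:

```python
import hashlib


def integrity_tags(expected_body: bytes, received_body: bytes) -> list[str]:
    """Compare a known file with the copy that was fetched through a proxy.

    Returns hypothetical tag names describing what was found.
    """
    same = hashlib.sha256(received_body).digest() == hashlib.sha256(expected_body).digest()
    if same:
        return ["unmodified"]
    tags = ["modified"]
    # More <script> tags than in the original is a typical sign of injected JS.
    if received_body.lower().count(b"<script") > expected_body.lower().count(b"<script"):
        tags.append("script-injected")
    return tags
```

Fetching the body through the proxy is left out here; any HTTP client with proxy support would do for that part.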
They are public and so accessible by anyone who finds them. That means a proxy eventually gets used by a lot of people, which makes it unstable and slow depending on how many people use it.
Another problem is the parallel use of such a proxy on the same website, e.g. Google, by many people. That of course causes it to be banned temporarily or longer, making it useless for that site or service.
----
Just check the test scripts that give you the wanted data. You can check more than one of the same type (e.g. anonymity level), and the tool will just test until it gets a reply and skip the rest if they would not add more details to the proxy.
----
If a proxy cannot resolve a domain to an IP, then it will not work for some apps that only pass the domain instead of the IP. Some let you configure it, but most assume it's possible. I will try adding a sort by this proxy information in the next update.
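To make the difference concrete: in SOCKS5 (RFC 1928) the client sends either a domain name (address type 0x03, resolved by the proxy) or a ready-made IP (0x01 for IPv4, 0x04 for IPv6). A rough Python sketch of building that CONNECT request, here for a whois server on port 43; the function names are only illustrative and the network part (greeting, sending, reading) is left out:

```python
import ipaddress
import struct


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request for host:port.

    A domain name is sent as-is, so the proxy has to resolve it.
    An IP literal is sent in packed form, so the proxy needs no DNS.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None
    if ip is None:
        name = host.encode("idna")
        if not 1 <= len(name) <= 255:
            raise ValueError("domain name must be 1..255 bytes")
        addr = b"\x03" + bytes([len(name)]) + name   # ATYP = domain name
    elif ip.version == 4:
        addr = b"\x01" + ip.packed                   # ATYP = IPv4
    else:
        addr = b"\x04" + ip.packed                   # ATYP = IPv6
    # VER=5, CMD=1 (CONNECT), RSV=0, then address and big-endian port.
    return b"\x05\x01\x00" + addr + struct.pack("!H", port)


def socks5_reply_ok(reply: bytes) -> bool:
    """True if the proxy's reply reports success (VER=5, REP=0)."""
    return len(reply) >= 2 and reply[0] == 5 and reply[1] == 0
```

A proxy that fails on `socks5_connect_request("whois.verisign-grs.com", 43)` but accepts the same request with the server's IP would be one of those that can only be used with direct IP access.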
----
Reliability: This just keeps a record of the tests made and the number of successful/failed tests and calculates the reliability from it. You can test at any time without influencing that data too much.
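As a rough illustration of why one more test hardly moves the number: the exact formula PS uses is not stated here, so the sketch below simply assumes reliability is the share of successful tests.

```python
def reliability(successes: int, failures: int) -> float:
    """Share of successful tests (assumed formula, not taken from PS)."""
    total = successes + failures
    return successes / total if total else 0.0


# A proxy with a long history barely changes after one extra failed test.
before = reliability(90, 10)   # 90 of 100 tests passed
after = reliability(90, 11)    # one additional manual test failed
print(f"{before:.3f} -> {after:.3f}")
```

With only a handful of recorded tests, a single manual run would of course shift the value much more.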
----
Yes, a tag can only be assigned if a test was successful.
2. I have no preferred one really. As long as they just give back all the headers sent to the page, all is fine.
3. No documentation, as no one has asked for it so far. I can maybe add something on docu.gsa-online.de if there is a need for it. The custom script with reply checking might take a bit of explaining; the rest (normal HTTP checking) is easy to understand by viewing the current scripts.
4. Sounds OK to me and easy to do with the current scripting.
5. No difference between being minimized or not. Import from file works very well. Make sure your file is encoded correctly (just ANSI). If it tells you nothing was imported, then that is because everything is already in; I have not been able to find any decent new sources for a long time now. Keep in mind that new URLs are only checked against the added subdomains.
6. ok
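Regarding point 2, a judge that "just gives back all the headers" can be very small. Here is a minimal sketch as a Python WSGI app, together with a rough classifier for the echoed output. The `KEY=value` output format, the header list, and the three level names follow a common convention and are assumptions, not what PS expects or uses internally:

```python
import re

# Headers that usually reveal a proxy is in between (assumed, non-exhaustive).
PROXY_HEADERS = {"HTTP_VIA", "HTTP_X_FORWARDED_FOR", "HTTP_FORWARDED",
                 "HTTP_CLIENT_IP", "HTTP_PROXY_CONNECTION"}


def judge_app(environ, start_response):
    """WSGI app: echo the client address and all request headers as text."""
    lines = [f"REMOTE_ADDR={environ.get('REMOTE_ADDR', '')}"]
    for key in sorted(environ):
        if key.startswith("HTTP_"):
            lines.append(f"{key}={environ[key]}")
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def anonymity_level(judge_output: str, real_ip: str) -> str:
    """Classify a proxy from the judge output and your own real IP."""
    pairs = [line.split("=", 1) for line in judge_output.splitlines() if "=" in line]
    for _key, value in pairs:
        # Split into address-like tokens so 1.2.3.4 does not match 11.2.3.45.
        if real_ip in re.split(r"[^0-9A-Fa-f:.]+", value):
            return "transparent"   # the real IP leaked somewhere
    if any(key in PROXY_HEADERS for key, _value in pairs):
        return "anonymous"         # IP hidden, but proxy use is visible
    return "elite"                 # no leak and no proxy headers


# To serve it for a quick local test (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, judge_app).serve_forever()
```

Served behind a web server with TLS, the same app would also answer https:// test URLs.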
1. "due to laggs and timeouts" - Exactly my point. But as all resources are available (speed, hardware) I can't evaluate if this happens or not.
4. Maybe I'm overlooking something extremely simple, but which existing script could be used as a base to do this? Sounds fantastic that this is not hard.
5. Minimized vs. not: Strange, but as long as it runs stably now I'm fine (I know this sounds strange, but it was my reality, and without any user actions alongside it). Found the import problem: the file is UTF-8 encoded... (the default in Sublime Text).
---
Proxy judge: That's why you should select more than one judge in the settings. PS will take one randomly and, if that fails, move on to the next.
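The behaviour described above (random pick, next one on failure, stop at the first reply) can be sketched in a few lines of Python. This is only an illustration of the idea, not PS's actual code; `first_reply` and the `check` callback are hypothetical names:

```python
import random


def first_reply(judges, check, rng=None):
    """Try the judges in random order and stop at the first one that answers.

    check(judge) should return a result, or None if the judge failed.
    Returns (judge, result), or None if every judge failed.
    """
    order = list(judges)
    (rng or random).shuffle(order)
    for judge in order:
        result = check(judge)
        if result is not None:
            return judge, result   # remaining judges are skipped
    return None
```

With a single judge selected there is nothing to fall back to, so a dead judge makes every proxy look dead.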