
Internal Proxy Server

Tools like ScrapeBox cannot handle CONNECT proxies, so you will see different results when using them in that software.
1. Internal Proxy Server

In GSA Proxy Scraper you have the option to enable its own internal proxy server in the options (off by default). Once it is running, you can use the proxy at IP 127.0.0.1 and port 8080 (both defaults can be changed) in any other software. So adding 127.0.0.1:8080 as a proxy allows other tools (not only GSA software) to make use of all proxies within GSA Proxy Scraper.
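
As a rough illustration of what "adding 127.0.0.1:8080 as a proxy" means for a tool outside the GSA family, here is a minimal Python sketch using only the standard library. The host and port are the defaults quoted above; the function names are made up for this example, and nothing here is taken from GSA Proxy Scraper itself.

```python
import urllib.request

def proxy_url(host="127.0.0.1", port=8080):
    """Build the proxy address string from a host and port."""
    return f"http://{host}:{port}"

def build_opener_via_internal_proxy(host="127.0.0.1", port=8080):
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    address = proxy_url(host, port)
    handler = urllib.request.ProxyHandler({"http": address, "https": address})
    return urllib.request.build_opener(handler)

# Usage (only works while the internal proxy server is enabled and listening):
#   opener = build_opener_via_internal_proxy()
#   with opener.open("http://example.com/", timeout=30) as response:
#       print(response.status)
```

From the calling tool's point of view this is a single proxy, even though the outgoing request may leave through any of the proxies behind it.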

- - - - - - - - - - - - - -

@Sven,

So best settings for ScrapeBox seem to be:

- Use Internal Proxy server
- Uncheck CONNECT proxies
- Insert server IP into ScrapeBox
- Have GSA re-scan / test every X minutes and remove bad proxies

Uses:
The proxies will be used for scraping domains: for example, loading the internal links of a lot of pages, then loading the external links in an attempt to find domains.

Filters:
Do not accept anonymous (no elite) proxies? Do no-elite proxies have a higher potential of leaking IP? I'm running this from my home ISP, so I want to make sure I don't get any phone calls asking if I'm running a "botnet". Or are regular anonymous proxies safe because they also don't send your real IP?

Do not accept transparent proxies (CHECK)
Skip suspicious proxies (CHECK)
Skip the following IP-Ranges (CHECK) - I'm in the USA, don't want to take any risks with hitting a honeypot.
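
For readers unsure about the transparent / anonymous / elite distinction asked about above, the sketch below follows the usual convention: a transparent proxy passes your real IP along, an anonymous one hides it but still announces that a proxy is in use, and an elite one does neither. This is an assumption about the common definitions, not a description of how GSA Proxy Scraper labels proxies. The header list is not exhaustive, and the function only handles IPv4 addresses.

```python
import re

# Headers that commonly reveal a proxy in the path (assumed, non-exhaustive).
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")

# Matches dotted IPv4 addresses; IPv6 is deliberately not handled in this sketch.
IPV4_PATTERN = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def classify_anonymity(received_headers, real_ip):
    """Classify a proxy from the headers a test page received through it."""
    lowered = {name.lower(): str(value) for name, value in received_headers.items()}
    # Transparent: the real IP shows up somewhere in the forwarded headers.
    for value in lowered.values():
        if real_ip in IPV4_PATTERN.findall(value):
            return "transparent"
    # Anonymous: the real IP is hidden, but the proxy identifies itself.
    if any(name in lowered for name in PROXY_HEADERS):
        return "anonymous"
    # Elite: no sign of the real IP and no proxy headers.
    return "elite"
```

By this definition a regular anonymous proxy does not send your real IP; the target site can simply tell that a proxy is in use.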

- - - - - - - - - - - - - -

Should I use NoBlock mode? SB + GSA Proxy Scraper are running on a brand new i5-8400 machine with Windows 7.

It's not necessary to "Set proxy in browser" since I'm doing no web browsing on this PC, right?

- - - - - - - - - - - - - -

Lastly, during my trial of GSA Proxy Scraper, I did notice that ScrapeBox had a lot of "This proxy leaks your IP". However, I didn't read about the CONNECT proxies until just now. Is this what was causing this error?

Thanks a lot!!

Comments

  • edited June 2018
    On this note, how do I specify when GSA PS re-tests the proxies? I did a manual re-test and it found a lot that were dead. Does it only re-test after it's gone through the entire "grab and test" phase? I don't see any option for how often to have the proxies re-tested.

    Hmm well that's not very promising. GSA PS just crashed after doing a small test scrape with SB, now it's re-testing all of the 60k proxies I imported, and saved no data about previous testing... Sent bug report with my email.
  • edited June 2018
    Starting to get the hang of the software, used for first SB scrape. It worked and automatically disabled proxies that failed (which ended up being a lot). Curious if there's any particular test in addition to Anonymous to get an idea of what proxies may work for pulling data off websites. I.e. "whatismyipaddress.com" or some other basic test?

    Also. I'm having a really weird bug that's happened multiple times now. The program seems to have some "memory glitch" (memory leak?)? Where it's like frozen in a state of not being able to do anything. I vaguely remember GSA SER having this issue way back in the day. For example, if I press "Quit" on GSA PS, if it's in this "memory glitch" state, it will exit the threads but refuse to quit the program.

    I have to ctrl+alt+delete, which upon re-opening, it's seemingly forgotten the state of the proxies, and draws from a previous state (but the options are saved).
  • SvenSven www.GSA-Online.de
    So many questions, so please hold your breath ;)
    1. When making use of the internal proxy of GSA Proxy Scraper (PS), there is no need to disable CONNECT proxies. Whatever Scrapebox tries as a protocol to communicate with PS will work. It will translate that protocol and even use the CONNECT proxies.

    2. The problem with using the internal proxy server from other tools is the following: It is seen by the tools as only one proxy even though many are used behind that one. So it might slow down things a lot as tools will have to wait between requests to not get banned. If possible, try to import the found proxies to the tools.

    3. Depending on what you use the proxies for, you might have to test them differently. When used in ScrapeBox, you might want to make sure they pass the Google test if used for search queries.

    4. If you have a lot of proxies working on bing/port 80, make sure they are not some kind of Bing cache servers that will only work for Bing and nothing else.

    memory glitch: things like that should really not happen. Maybe it's due to the heavy load you put on that internal proxy?
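
Point 2 above recommends importing the found proxies into the tool itself so that requests are spread over many visible proxies instead of one. A minimal sketch of that idea, assuming the proxies were exported as plain `host:port` lines (the thread does not state the export format, so this is a guess) and using made-up function names:

```python
import itertools

def parse_proxy_lines(text):
    """Parse 'host:port' lines, skipping blank and malformed entries."""
    proxies = []
    for line in text.splitlines():
        host, separator, port = line.strip().rpartition(":")
        # Keep only entries with a host and a numeric port in the valid range.
        if separator and host and port.isdecimal() and 0 < int(port) < 65536:
            proxies.append((host, int(port)))
    return proxies

def rotate(proxies):
    """Return an endless round-robin iterator over the given proxies."""
    return itertools.cycle(proxies)

# Usage: take the next proxy for each outgoing request.
#   pool = rotate(parse_proxy_lines(open("exported_proxies.txt").read()))
#   host, port = next(pool)
```

With the list imported this way, the tool can pace requests per proxy instead of throttling everything behind a single 127.0.0.1 address.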

  • edited June 2018
    1) Oh, excellent.

    2) I'd rather use the Internal Proxy Server with Scrapebox because this system will be running pretty much 24/7 and I don't want to (and am not able to) be constantly testing/importing proxies into SB. I have GSA PS running at 100-150 threads, and SB link extractor running at 100-150 threads. This seems to be a good balance between acquiring new proxies, SB being able to work fine (some errors I just export and re-check them).

    3) The proxies aren't used for scraping at all -- I have some Google proxies I use for acquiring targets to use the SB link-extractor, which are already filtered by TF, etc... and the GSA PS proxies are pretty much only being used for extracting internal/external links from websites in order to find expired domains.

    4) I did notice some Bing-tagged proxies (even though I don't have bing checked) but they were automatically disabled by GSA PS when SB tried to use them.

    5) Yes, it does seem to be a "memory glitch". And it seems very common; it has already happened 3 times within hours of using GSA PS. I remember way back in 2012-2013 GSA had a similar problem. You'd be running a lot of threads, then once they were done, you'd try to end the threads, but they wouldn't stop. Sometimes I remember having to ctrl+alt+delete and all my links would be gone, the "previous state" as mentioned above. I'm not sure if the error report I sent in is related; that was just a hard crash/close. This "memory bug" only happens after running the program for some time.

    For example, just a minute ago I could tell it was happening, because I finished the SB threads... then went to delete some failed proxies in GSA PS, and it wouldn't delete them. I pressed "Delete Highlighted" and it did nothing. So I'm like, "yea this is the memory glitch". I press "Quit", it closes the threads, and doesn't exit the program. That's an example of how/when it happens, although I haven't used it enough to notice any particular pattern that causes it. (I have also now set up automatic export as a fail-safe.)

    It seems to happen whether I'm actively using the Internal server or not (it's happened when SB wasn't even open). I did notice that when Quit works properly, it says "Next schedule 60 minutes," whereas if the glitch happens, the threads exit without the scheduler being able to assign a new time to check the proxies next. This time I saved the proxies to a file thankfully, because it once again restored a previous "state" that had 300 fewer proxies.

    - - - - - - - - - - - -

    Also, if I select "Use search engines to locate proxy lists" and then insert some keywords like "free proxy list" etc, does GSA automatically know how to extract the proxies from the pages? Or is it best to manually search myself and add-in new sources?
  • SvenSven www.GSA-Online.de
    memory glitch: Well this is a different thing really. Once a proxy is in use, PS does not allow deleting. Do you see the thread-count being greater than zero?
  • Okay I'm starting to get the hang of the software. The memory glitch is still there, but, I've learned to partly avoid it affecting me too much by auto-exporting the proxies and also doing manual exports from time to time... and restart GSA PS before I do a link extraction.

    Sadly though, when it happens, I press "Quit" to restart GSA PS, and it exits threads but doesn't quit. I have to ctrl alt delete, and then add + re-test the proxies I manually exported.

    I also seemingly found the limit of the software. Basically I run the software at 100 threads and SB at 200-300, and it seems to work. Any higher, and things do not end up too well!

    Anyways, great piece of software. I've also just purchased Proxy Buddy and think it will be a nice complement to GSA PS.
  • SvenSven www.GSA-Online.de
    Well let us find the glitch together then.
    When you see PS being in a state where threads do not close...simply send a bugreport manually from the Help menu with a note that it's you (this thread url e.g.).
  • edited June 2018
    @sven Alright, I've noticed that the Internal Proxy Server just completely dies with anything over, maybe 100 threads? Why does the internal proxy server use 0% CPU? Does the application not support multi-threading? I'd like to see the internal proxy server be able to consume more CPU in order to send proxies to programs faster...

    Is the number of threads you set in GSA PS relevant to the speed at which the Internal proxy server can process / send new proxies? Because when I'm testing it, if it's not searching for new proxies, the threads stay at "0" even if I'm running 100 threads on SB.

    I've started now to just quit using it entirely, but I really hate that I have to do that, because allowing it to use the CONNECT proxies results in dramatically higher success rates for certain things (likely because not every other SB user is using it).

    Sometimes I'll look at the options in GSA PS, and see the proxies switching crazy fast... others, it'll just be stuck on 1 proxy forever, and this is when I know the program has likely died. Is there a way to make the proxy server faster, to consume more CPU, to be able to perform better?

    - - - - -

    Also, does running GSA + CB in addition to PS + SB somehow affect the Internal proxy server? I have the "server" option on CB completely disabled... but I notice when trying to run SB + PS and GSA + CB, it seems to break the internal proxy server, or maybe it was just coincidence...?
  • SvenSven www.GSA-Online.de
    Each new request made to the proxy server will create another thread (though not counted in the GUI, as it is not relevant to the action there). So it is multi-threaded, and 100 threads or more should not cause any issues. I also didn't see that behavior on my end.
    --
    CB has nothing to do with the proxy server...it is not causing any issues.
    I really don't know what might cause conflicts here. I need more data really to reproduce your behavior on my end.
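
To make the "one thread per request" model described above concrete, here is a small stand-alone Python sketch of a thread-per-connection server. It is only an illustration of the general pattern using the standard `socketserver` module, not GSA Proxy Scraper's actual implementation; the class and function names are invented for the example.

```python
import socket
import socketserver
import threading

class ReportThreadName(socketserver.BaseRequestHandler):
    def handle(self):
        # Each accepted connection runs this method in its own worker thread.
        self.request.recv(1024)
        self.request.sendall(threading.current_thread().name.encode())

def start_server():
    """Start a thread-per-connection server on a free local port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ReportThreadName)
    server.daemon_threads = True  # worker threads will not block shutdown
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(port):
    """Connect once and return the name of the thread that served us."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(b"ping")
        return conn.recv(1024).decode()
```

In this pattern the worker threads are separate from whatever thread counter the application displays for its own scraping and testing jobs, which would be consistent with the count staying at "0" while requests are still being served.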
