
Increase proxy thread loading speed (proxy test)

Kaine thebestindexer.com
edited January 2021 in GSA Proxy Scraper

I need to test millions of proxies (anonymity check only, no other tests); the file is 25 MB.
Since I have a fast fiber connection (a full 1 Gbps), I would like to use around 700 threads, but I am running into a problem.
New threads aren't started fast enough to reach the number of concurrent tests my connection can handle (it could go even higher without a problem).
New test threads are started at roughly the same rate as earlier tests complete, so I can't get past about 250 concurrent connections.
I think the loading thread is not fast enough, or that there should be several of them to compensate, for example 2 or 4 loading threads that share the list of proxies to be tested.
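The ceiling described above matches Little's law: in steady state, the average number of tests in flight equals the rate at which new tests are started multiplied by the average test duration (N = λ × T). The Python sketch below uses hypothetical figures, a launcher that starts 50 tests per second and tests lasting 5 seconds, chosen only to reproduce the observed plateau of about 250; they are not measurements of GSA Proxy Scraper. It also assumes each extra loading thread adds its own start rate, which only holds if the launcher (not the CPU or the network) is the bottleneck. `steady_state_threads` is a hypothetical helper name.

```python
def steady_state_threads(starts_per_second: float, test_seconds: float,
                         loaders: int, thread_limit: int) -> float:
    """Little's law: tests in flight = start rate x test duration, capped by the limit.

    Assumes every loading thread contributes the same independent start rate.
    """
    offered = loaders * starts_per_second * test_seconds
    return min(offered, thread_limit)


# Hypothetical figures: one launcher starts 50 tests/s, each test lasts 5 s.
for loaders in (1, 2, 4):
    print(loaders, steady_state_threads(50, 5, loaders, thread_limit=700))
```

Under these assumptions one launcher never reaches the 700-thread setting, while with four launchers the configured limit becomes the cap instead.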

What do you think? Is this feasible?

I imagine many users are in the same situation (testing only, without scraping), and performance would likely improve considerably.

Sorry for my optimization fanaticism. I would expect a gain of at least 2x to 3x.

EDIT: The number of loading threads could be adjusted automatically by comparing the actual number of running threads with the desired number: if the actual thread count falls more than 10 below the desired count, an additional loading thread is added.

This means that at the very start of the process, the total set of proxies should not be divided (and thus pre-allocated) among the loading threads; instead, each loading thread takes a fixed amount, for example 500 proxies at a time.
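The batch-based scheme from the EDIT can be sketched in Python as a shared source that hands each loading thread a fixed batch (500 proxies here) on demand instead of pre-dividing the whole list, plus the rule for adding another loading thread. Reading the "10" as a margin below the desired thread count is an interpretation, and `SharedSource`, `should_add_loader` and the cap of 8 loaders are hypothetical names and values, not part of GSA Proxy Scraper.

```python
import itertools
import threading
from typing import Iterable, List

BATCH_SIZE = 500   # fixed batch handed to a loading thread (example figure above)
SLACK = 10         # add a loader when running tests are more than 10 below target
MAX_LOADERS = 8    # hypothetical cap on the number of loading threads


class SharedSource:
    """Hands out fixed-size batches of proxies to any number of loading threads."""

    def __init__(self, proxies: Iterable[str]):
        self._it = iter(proxies)
        self._lock = threading.Lock()  # several loaders may ask for a batch at once

    def next_batch(self, size: int = BATCH_SIZE) -> List[str]:
        with self._lock:
            return list(itertools.islice(self._it, size))  # [] once exhausted


def should_add_loader(actual_threads: int, desired_threads: int, loaders: int) -> bool:
    """Add a loading thread while running tests lag the target by more than SLACK."""
    return loaders < MAX_LOADERS and desired_threads - actual_threads > SLACK


# Usage: 1200 dummy proxies are consumed in batches of 500, 500 and 200.
source = SharedSource(f"203.0.113.{i % 256}:{3128 + i}" for i in range(1200))
sizes = []
while batch := source.next_batch():
    sizes.append(len(batch))
print(sizes)
print(should_add_loader(250, 700, loaders=1))
print(should_add_loader(695, 700, loaders=2))
```

Because batches are taken on demand, a loader added later simply starts pulling from wherever the list currently is, so nothing has to be re-split.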

Comments

  • Sven www.GSA-Online.de
    I don't really see room to optimize this. If there are free slots to start threads, they are filled immediately. Maybe the interval for checking for free slots can be optimized, but that's it.

    Please explain exactly what you are doing, and send a bug report from the Help menu if you see it not acting fast enough.
  • Kaine thebestindexer.com
    edited January 2021
    I'm not doing anything special: I import a large URL file, click Test, and select only the anonymity check. The only problem is that it can't start new threads faster than others finish. For me, this limit is around 250.
    I sent you a bug report mentioning this thread, but I don't think it will help much.
  • Sven www.GSA-Online.de
    Please update and report whether the speed has improved (uploading a new version now).
  • Kaine thebestindexer.com
    edited January 2021


    Sorry for the delay, I no longer receive notifications. No, still approximately 250 threads (700 in the settings).

    If I understand correctly, setting 700 threads gives something like 700 available slots. By the time the check for free slots runs and a thread is started, at least one other thread has finished once we are around 250 (finishing = starting).
    Perhaps the check could be disabled until the number of threads defined in the settings has been reached?
    If the check + start cycle cannot be made faster, I think the thread count will then gradually drop again (after reaching the desired setting and re-enabling the check).

    This is why I suggested multiple loading queues, if you want to keep the check as it is.

    Just out of curiosity, and to think in the right direction: I assume you made the check faster, but by how much? 2x, 3x...?
  • Sven www.GSA-Online.de
    OK, I think I've found the bottleneck now. The next update will improve the speed here.
  • Kaine thebestindexer.com
    Thank you :)

    It now reaches the 700 threads I set. There is still some slight instability, such as drops to around 400 threads for 1 or 2 seconds, but then it climbs back up. Most of the time it stays at the selected value. A little something is still missing for stability, but it's already a lot better (about 3x).
  • Sven www.GSA-Online.de
    "missing for stability"? What do you mean by that? The temporary thread drop is normal, as it needs to clear some lists and do other housekeeping that can't be skipped.
  • Kaine thebestindexer.com
    By "missing for stability" I mean staying within ±10 threads of the desired setting (as opposed to drops from 700 to 400, even if they are brief). At this stage it is no longer a serious issue: you have met 90% of my expectations, and this kind of improvement can surely be reused elsewhere.
  • Kaine thebestindexer.com
    edited February 2021
    Hi @Sven

    The more proxies I test simultaneously, the harder it is for PS (Proxy Scraper) to start threads. Does that have anything to do with the GUI?

    There is a bottleneck:


  • Sven www.GSA-Online.de
    Well, at a certain point it is simply not possible to increase the speed further. If you send me the proxies you are testing, I can run the test in a profiler and look for the bottlenecks.
  • Kaine thebestindexer.com
    OK, great. I'll send you the next batch as soon as possible, along with the forum link.
  • Kaine thebestindexer.com
    edited February 2021
    @Sven

    I sent you some problematic URLs to scrape (I think the issue is the sheer quantity, but it should be able to handle it), along with the proxies.

    My impression is that loading hundreds of thousands of untested proxies into the GUI is what causes the problems (freezing, poor performance).
    Why not put them in a simple file and test them gradually?
    Or better, test them without importing them into the GUI, keeping them only in RAM (only the proxies that pass the test get imported)? This would never exceed, say, 100 MB, and if the software crashes, users will know it is their own fault.
    All of this may be linked (too much work on the main thread). If we avoid overloading the GUI by moving untested proxies out of it, performance could increase tenfold.
    In any case, I have verified that the more untested proxies have been imported, the lower the speed, and the slowdown is gradual.
    Remember what increasing the cache did for SER :)
  • Sven www.GSA-Online.de
    The next update will let you choose how to handle such data:
    "add + test" or "test + add".
    The latter will only add good proxies and will probably use fewer resources.
  • Kaine thebestindexer.com
    edited February 2021
    Thank you, Sven. I know you have a lot of work, and with Covid it is not easy. Thank you again for this truly legendary support.
    I had already heard of your reputation when SER was released, and you have always lived up to it. Respect, it must be said.