
Number of threads in relation to Internet Speed

Most tutorials and FAQs suggest that you use between 5 and 10 threads per proxy for link submission and then work your way up. However, these tutorials always assume that you're running SER on a VPS with insane internet speed (1Gbps).
I'm actually running SER on my home computer. But since my home line isn't nearly as fast as a VPS connection would be, I'm wondering if I should reduce the number of threads to avoid timeouts/download fails/etc. I'm not getting a whole lot of those, but occasionally I do.

Right now I'm running with 200-300 threads + simultaneous scraping with SER's built-in scraper (on public proxies). CPU + memory aren't nearly at max. My home line speed is ~17Mbps.
  • Should I consider reducing the threads? And additionally, is my internet speed creating a tight bottleneck for the whole system, or is it fine, as long as I only run a few projects (which is the case right now)?
  • Would renting a VPS with much higher internet speed than my home computer increase my LPM by a lot (even if the hardware is much worse)?


I'm kinda trying to figure out the dominant factors when using SER, i.e. how important internet speed is vs. the number of threads.

Thanks in advance guys.

Comments

  • KaineKaine thebestindexer.com
    Just set a lot of threads and open a web page.

    If it takes too long to open, lower the threads.
  • edited May 2014
    I'm really wondering who came up with the 10 threads per proxy recommendation. It's not like you're scraping more often just because you're using more threads, and most proxy providers allow 100 simultaneous connections per proxy, so it really should be fine to use 100 threads per proxy. However you'll probably run into problems with being blacklisted by IP fairly quickly, and I doubt SER will distribute threads evenly at all times.

    Timeouts and download failed errors can be caused by a lot of factors out of your control. Personally, I'd just keep increasing my threads until I was using as much bandwidth as possible.

    The amount of threads you're running doesn't impact the bandwidth usage, only your amount of threads and timeout setting.

    Don't rent a VPS unless your bandwidth is already being maxed out by SER and you want more.

  • @Kaine Tried it. Actually SER doesn't stay stable above 1000 threads (thread count wise); it's really jumpy and ~1000 seems to be the limit for me. I'm not sure, but I'd say it's due to bandwidth, because websites load really slowly at this thread count.


    @fakenickahl The 5-10 per proxy 'rule' is a recommendation I've read a couple of times. The guys from serlists.com recommend to start with 10 per proxy and then work your way up (which is exactly what I've said in my first post. I never said you should stay at 10T per proxy).
    You've actually stated something which has been running around my head for a while now. TimeOut/DL failed can be caused by so many factors. I've had an awful lot of those while running at 100 threads. So I guess it's not the one and only factor you should take into consideration.

    "The amount of threads you're running doesn't impact the bandwidth usage, only your amount of threads and timeout setting."
    Would you mind elaborating? I don't quite follow. Isn't the timeout setting very dependent on the bandwidth? Say you're running with 1000 threads and a very small timeout setting of let's say 10 seconds. If you're running these settings on a dedi server with 1Gbps speed and on a home computer then you'd get completely different results, because the more threads are running, the less bandwidth is available per thread and therefore the timeout setting needs to be chosen much higher. Or am I misunderstanding the concept of threads/bandwidth/timeout?
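
A rough back-of-the-envelope sketch of the threads/bandwidth/timeout question above. It assumes the worst case, where every thread is downloading at the same moment and the line is split evenly between them. The 100 KB page size and the function name `seconds_per_page` are made up for illustration and are not anything from SER.

```python
def seconds_per_page(link_mbps: float, threads: int, page_kb: float) -> float:
    """Seconds one thread needs for one page if the line is shared evenly.

    Assumption: all threads download simultaneously (worst case).
    """
    per_thread_bits_per_s = link_mbps * 1_000_000 / threads
    page_bits = page_kb * 1000 * 8
    return page_bits / per_thread_bits_per_s


# Hypothetical 100 KB page, 1000 threads, home line vs. 1Gbps server
for link_mbps in (17, 1000):
    t = seconds_per_page(link_mbps, threads=1000, page_kb=100)
    print(f"{link_mbps} Mbps, 1000 threads: {t:.1f} s per 100 KB page")
# 17 Mbps, 1000 threads: 47.1 s per 100 KB page
# 1000 Mbps, 1000 threads: 0.8 s per 100 KB page
```

Under these assumptions a 10 second timeout would fail on the 17Mbps line and pass easily on the 1Gbps one. In practice threads spend a lot of their time waiting on remote servers rather than downloading, so the real picture is probably less pessimistic than this.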
  • KaineKaine thebestindexer.com
    17 Mbps is not much; in the end you should be at around 150-250 max, I think. Don't forget that you're using proxies too.

    And that's counting the list scraper running alongside.
  • @Kaine I don't quite understand your answer, I'm sorry. Would you mind rephrasing it?
    I've tested ~800-1000 threads now and this is pretty much where the limit for my bandwidth is, I guess. At 1000 threads my connection was maxed out. So I guess 800-900 is within the limits of my bandwidth.
  • KaineKaine thebestindexer.com
    edited May 2014
    Too much, I think :)

    800-1000 actions, uploading/downloading over 17 Mbps. You'll have plenty of misses.

    With 100 Mbps I get bad results at 800 threads ... ;)

    I start to lose out above 400.

    The number of threads isn't what matters; look for a good compromise.
  • Oops, I meant to say the amount of projects you are running doesn't affect the bandwidth usage, but rather the amount of threads will. Personally, I'd just keep html timeout at something reasonable and increase threads until your connection speed is maxed out.

    Concerning the threads/proxy recommendation, I was just wondering who came up with that number at first. It really makes little sense to me why the amount of threads you're using should depend on how many proxies you're using.


  • KaineKaine thebestindexer.com
    edited May 2014
    For example, with Zenno you can make a perfect project with 99-100% success.

    But if you use proxies on the same project, success is really bad, something like 10% (one thread per proxy).

    Now imagine multiple threads on a single proxy.
  • edited May 2014
    Mhh.. this is tough. At 200-300 threads I can browse on the web without any noticeable delay at all. As soon as I increase the threads up to 700-1000 it gets very laggy. So my assumption would be that my bandwidth is maxed at somewhere between 700-1000 threads.
    But I just can't believe that running 700-800 threads on a 17Mbps connection is realistic. It just sounds way too much. That'd mean that people with dedicated servers and high end hardware are able to run 3000-5000 threads or even more and I haven't read anything about numbers that high.

    @fakenickahl What do you consider a reasonable timeout setting? I'm at 150-180.
  • edited May 2014
    Sure we have a fast connection on dedicated servers, but you'll run into memory errors waaay before you encounter connection trouble. I'm currently just running 600 threads to avoid "out of memory" errors on verified lists, and I'm using 10-25 mbps. Oh, and I've got 24 GB ram, so that's definitely not the issue.

    Why don't you just monitor your bandwidth usage in resource monitor instead of guessing where the sweet spot is at?

    Oh, and I said a reasonable timeout setting, because I really have no idea what exactly to have it at. I've always used 120 seconds, and haven't paid any more attention to it.
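
For anyone who would rather compute the usage than eyeball it in resource monitor, here is a minimal sketch of the arithmetic. It takes two readings of a byte counter a few seconds apart; where those readings come from is left open (for example `psutil.net_io_counters()` from the third-party psutil package, which is not used here). The function name `link_utilization` and the sample numbers are hypothetical.

```python
def link_utilization(bytes_before: int, bytes_after: int,
                     interval_s: float, link_mbps: float) -> float:
    """Return usage as a percentage of the nominal line speed."""
    bits_per_s = (bytes_after - bytes_before) * 8 / interval_s
    return 100 * bits_per_s / (link_mbps * 1_000_000)


# Hypothetical sample: 9,562,500 bytes moved in 5 seconds on a 17 Mbps line
usage = link_utilization(0, 9_562_500, interval_s=5, link_mbps=17)
print(f"{usage:.0f}% of the line in use")  # 90% of the line in use
```

Sampling this every few seconds while raising threads should show roughly where the line saturates.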
  • @fakenickahl Doesn't CPU overload usually happen way before you run out of memory? Increasing threads barely affects my memory usage, but makes my CPU usage spike quite heavily. I've always been wondering, because people keep preaching that RAM > CPU when it comes to VPS/Dedi hardware.

    Monitoring the bandwidth usage is something I've considered before, but it's surprisingly hard to find a reliable tool for that. But I finally found one and this was quite helpful, to be honest. I increased threads by 50 every time I hit a stable plateau and now I'm at 400 threads and this seems to work just fine. My connection keeps dancing around 80-90% usage with occasional spikes to 100-110%. I think I'll keep it that way to avoid too many spikes which may cause timeouts.

    Regarding the timeout settings - yes, I'm doing kinda the same here haha. I'm keeping it at 180 since I'm still convinced that my bandwidth is the weak link in the system and therefore I don't feel like putting too much pressure on it. If I'm not mistaken, lowering this setting will only increase LPM, which is only necessary if you're running a shit load of projects or a churn & burn campaign and need as many links as fast as possible. Since neither is the case for me, I'll keep it like that.

    This has been a great discussion, mate. I'm very grateful for that. Thanks for taking the time to help me out. I really do appreciate it.
  • Alrighty.. this topic has been and still is quite interesting so I've kept tweaking with all the factors and in my opinion it all comes down to which limit you hit first.

    Explained:
    - Proxies: most providers allow up to 100 connections per proxy, some don't have a limit
    - Threads: the amount of simultaneous tasks your VPS/computer can handle
    - Bandwidth: max amount of data per second (kb/s)

    So what I did is play around with those factors a little. My result was that unless one of these factors had reached its own limit, there was no reason for me to decrease any of the others.

    Example:
    - 10 proxies
    - 17.5 Mbps bandwidth limit
    I then started increasing the amount of threads until my bandwidth was maxed. This was now the limit for my setup.

    Example2:
    - 10 proxies
    - 1Gbps bandwidth limit (VPS)
    - Again, steadily increasing the threads. This time my VPS gave in at a certain point (due to high amount of threads). In this scenario the hardware is the limit.


    If you think about it, it's quite logical. There's no reason not to increase your threads until you either hit your hardware limit, or your bandwidth limit.
    So my conclusion is, do not rely on fixed numbers like xy threads for setup abc. Start increasing your threads until either of your factors hits its limit. This is then the limit for your whole system. Additionally you automatically know what you'd have to change about your system if you wanted to increase overall performance.

    Just my 2cents.
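
The "raise threads until the first limit is hit" approach from the last comment can be sketched roughly as below. This is only an illustration: `measure` is a hypothetical callback that would have to report current usage (0.0 to 1.0) per resource for a given thread count, and the two toy models standing in for a home line and a VPS use invented numbers, not measurements from SER.

```python
from typing import Callable, Dict, Optional, Tuple


def find_thread_limit(measure: Callable[[int], Dict[str, float]],
                      start: int = 100, step: int = 50,
                      max_threads: int = 2000,
                      ceiling: float = 0.9) -> Tuple[Optional[int], Optional[str]]:
    """Step threads up until any resource reaches `ceiling`.

    Returns (last stable thread count, name of the resource that hit first).
    """
    stable = None
    threads = start
    while threads <= max_threads:
        usage = measure(threads)
        over = [name for name, frac in usage.items() if frac >= ceiling]
        if over:
            return stable, over[0]
        stable = threads
        threads += step
    return stable, None  # no limit reached within max_threads


# Toy models with invented numbers (assumptions, not measurements)
def home_line(threads: int) -> Dict[str, float]:
    return {"bandwidth": threads / 450, "cpu": threads / 1000}


def vps(threads: int) -> Dict[str, float]:
    return {"bandwidth": threads / 25000, "cpu": threads / 650}


print(find_thread_limit(home_line))  # (400, 'bandwidth')
print(find_thread_limit(vps))        # (550, 'cpu')
```

Whichever resource name comes back is the part of the setup that would need upgrading to get more out of it, which is the point made in the conclusion above.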
