The network connection itself doesn't have that much influence, but the hardware does. A good router is the top priority here. Many threads need a lot of open connections to be handled. That's done by the router, and if it's a bad one it can't handle them and breaks down (loses the connection).
Another issue is the software. A clean system setup without a firewall/antivirus is optimal here. Everything in between slows things down, as those tools monitor the traffic and eventually can't keep up.
The last thing you should care about is the network speed. Of course it helps to have a faster one here, but I don't see it as such a big issue.
But one thing is correct: the more threads you use, the higher the timeout should be.
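To make the threads-vs-timeout point concrete, here's a rough Python sketch (nothing from SER itself; the thread count, the target URL, and the timeout rule of thumb are all invented for illustration). Each worker holds its own open connection, which is exactly what piles up in a weak router's connection table:

```python
# Illustration only: every worker thread keeps a connection open, so a
# loaded router/NAT table slows all responses, and short timeouts fail first.
import concurrent.futures
import urllib.request

THREADS = 100                       # hypothetical thread count
TIMEOUT = 10 + THREADS // 20        # hypothetical rule: scale timeout with threads

def fetch(url):
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
            return url, resp.status
    except Exception as exc:        # timeouts surface here under load
        return url, exc

urls = ["https://example.com"] * THREADS   # placeholder targets
with concurrent.futures.ThreadPoolExecutor(max_workers=THREADS) as pool:
    for url, result in pool.map(fetch, urls):
        print(url, result)
```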
@fakenickahl - I alluded to this privately with someone else. SER handles processing of freshly scraped lists better than it does verified link lists. I put this down to a verified list of URLs being hit by everyone else, and thus the sites being larger, the pages being larger, the hosting bandwidth processing all of the different users' SER connections, etc. This would obviously not apply to your own verified list.
@Justin that really makes a lot of sense! It never occurred to me that this could be the reason, but doesn't this really only make sense for image comments, blog comments, and guestbooks? I'll try it again soon with these platforms unticked anyway. I'm thinking that an article site, for example, being visited a lot by SER would only make the load time longer instead of increasing CPU and RAM usage.
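Justin's theory above is easy to spot-check outside SER. A hedged sketch, assuming both lists are plain text files with one URL per line (the file names and sample size are placeholders): sample some URLs from each list and compare average response size and fetch time.

```python
# Spot-check: do verified-list targets really serve bigger/slower pages
# than freshly scraped ones?
import random
import time
import urllib.request

def sample_stats(list_file, n=50, timeout=15):
    with open(list_file, encoding="utf-8", errors="ignore") as f:
        all_urls = [u for u in f.read().split() if u.startswith("http")]
    urls = random.sample(all_urls, min(n, len(all_urls)))
    sizes, times = [], []
    for url in urls:
        try:
            start = time.time()
            body = urllib.request.urlopen(url, timeout=timeout).read()
            times.append(time.time() - start)
            sizes.append(len(body))
        except Exception:
            pass  # dead targets are expected in both lists
    return sum(sizes) / max(len(sizes), 1), sum(times) / max(len(times), 1)

for name in ("verified_list.txt", "raw_scrape.txt"):   # hypothetical files
    avg_size, avg_time = sample_stats(name)
    print(f"{name}: avg {avg_size / 1024:.0f} KB in {avg_time:.1f}s")
```

If the verified-list averages come out noticeably higher, that supports the "bigger, busier sites" explanation.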
Are you totally sure about that, guys? I have identical servers; some run scrapes and some run verified lists. The ones running verified lists always have more submissions and a better submitted-to-verified percentage. I mean every single day on every single server. It has never once been the reverse.
@gooner, I'm sure that a larger percentage of my verified lists results in a submission than my scraped lists, and also that my verified lists produce more verified links. My issue is that SER uses way more resources posting to a verified list compared to a freshly scraped list, even though it's doing a lot more submissions per minute on my scraped list.
I just wanted to point out that the verified list takes up a lot more resources than my scraped lists, even though it's doing more submissions per minute on the scraped list. I guess the reason for fewer submissions per minute on my verified lists is that huge use of resources.
@fakenickahl - Gotcha. For me, generally the higher the LPM, the more resources are used, no matter what list I'm running. I'm not saying you are wrong, just that I don't see that on my servers.
I usually use dedis, but I will be running performance tests on a VPS next week, so I'll see if I can reproduce what you are seeing, and hopefully we can all make comparisons to improve performance.
@gooner, I just thought you hadn't properly caught what my issue is, and that's why I tried clarifying. It's interesting that you're not seeing the same as I am; it might be due to a difference in how we are setting up projects. I've experienced the same problem on two dedicated servers now.
@fakenickahl - No worries brother, thanks for clarifying. Yeah, I'm thinking it could be a setting maybe. We are putting together a PDF with the settings we use, so when it's done you can check it out and maybe let me know if you do something differently, and hopefully we'll find the cause.
I am having crazy CPU usage when importing verified lists too. I never thought it could be a verified list problem, but looking back at when I just had RAW scrapes, I had much, much, much lower CPU usage.
I was thinking the problem was either VPS issues or recent SER version issues. Interesting thread though. When I have burnt through this list, I will retry using RAW scrapes and see if that solves the CPU problem.
Now that is odd... On one of my servers I didn't wait to burn through the verified list; I just cleared the targets and added 400k RAW scrapes, and I am now at 10-50% CPU instead of 99% CPU... Very strange.
Just to clarify, I was using the verified list previously, and I imported it 3 different ways over the last week:
1) Import from site list (right-click menu)
2) Pull from folder (options tab)
3) Merged, split, and imported as 25k lists (see the sketch below for one way to do the splitting)
I even tried rotating just 3 projects at a time and still had the CPU problem.
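For anyone who wants to try import method 3, here's one way to do the merge-and-split step in Python (the export folder, file names, and the 25k chunk size are just examples, not necessarily how it was done here):

```python
# Merge several exported target lists, dedupe, and split into 25k-line
# chunks ready for importing one at a time.
import glob

lines = set()
for path in glob.glob("exports/*.txt"):            # hypothetical export folder
    with open(path, encoding="utf-8", errors="ignore") as f:
        lines.update(line.strip() for line in f if line.strip())

CHUNK = 25_000
merged = sorted(lines)
for i in range(0, len(merged), CHUNK):
    with open(f"import_{i // CHUNK:03d}.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(merged[i:i + CHUNK]) + "\n")
```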
That's the same thing I've been seeing @Brumnick, but my RAM usage is also going through the roof on verified lists. It'd be great if we look further into this so we can give Sven something to work with if there is indeed something to fix.
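If we want hard numbers to hand to Sven, something like this little monitor could log SER's CPU and RAM every few seconds, once while a verified list runs and once while a raw scrape runs (a sketch only: it needs `pip install psutil`, and the process-name match is a guess at how SER appears in the task list):

```python
# Log CPU% and RAM (MB) of the SER process to a CSV for later comparison.
import time
import psutil

def find_ser():
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if "search_engine_ranker" in name:     # assumed executable name
            return proc
    return None

proc = find_ser()
with open("ser_usage.csv", "w") as log:
    log.write("time,cpu_percent,ram_mb\n")
    while proc and proc.is_running():
        cpu = proc.cpu_percent(interval=5)     # averaged over 5 seconds
        ram = proc.memory_info().rss / (1024 ** 2)
        log.write(f"{time.time():.0f},{cpu:.1f},{ram:.1f}\n")
        log.flush()
```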
As of yesterday I'm seeing something similar.
I had some very bad CPU issues a few weeks ago (which I thought were due to the VPS), but it seemed to fix itself without me doing anything special. (Though I did untick the "skip for identification" proxy setting, which seemed to help a lot.)
Everything was back to normal, SER was running at 300 threads and 60-90% CPU without ever topping out at 100%. Until yesterday.
When I logged in to the VPS, it was basically frozen with the CPU topped out at 100%. For some reason it now runs REALLY sluggishly, and I can't go over 80-100 threads without using up 100% CPU.
Anyway, here's what I noticed:
When I run even 1 single project (it doesn't matter if it's 1 or 10 projects, the result is the same) using ONLY a verified list at over 100 threads, it uses 100% CPU all the time (while posting). It doesn't seem to matter which project it is; as long as it uses a verified list, it's sluggish.
However, when I run my 3 scraping projects (which are scraping with the "Use URLs linking on same verified URLs" option), they run at 300 threads more or less as expected (around 90% CPU).
RAM, on the other hand, is as normal as it gets; it doesn't go over 300 MB tbh (I'm looking at it right now and it's 160 MB, 99% CPU).
The only thing I don't understand is why this started happening basically overnight. :S
Could someone please explain how the "Skip for identification" proxy setting affects your project? Thanks!
"Skip for identification" means you don't use proxies to identify the type of the website's engine before even posting to it. SER will use your own IP to visit the webpage and check it out.
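SER's internals aren't public, but the idea behind that setting can be sketched like this (the proxy address and the WordPress footprint check are placeholders, not SER's actual logic):

```python
# Concept sketch: with "skip for identification" ticked, the engine-detection
# request goes out directly from your own IP; only the posting uses a proxy.
import urllib.request

PROXY = {"http": "http://127.0.0.1:8080"}      # placeholder proxy

def identify(url):
    # Direct request, no proxy: just look for an engine footprint in the HTML.
    html = urllib.request.urlopen(url, timeout=15).read().decode(errors="ignore")
    return "wordpress" if "wp-content" in html else "unknown"

def post_via_proxy(url, data: bytes):
    # The actual submission still goes through the proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))
    return opener.open(url, data=data, timeout=15)
```

The trade-off: with it ticked, identification is faster and saves proxy bandwidth, but your own IP is exposed to the target sites.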