
SER v15.85 slows to a crawl on large submissions, older versions do not

edited January 11 in Bugs
Hi @Sven,

With version 15.85, using 150 threads and 30 html timeout, after a few minutes, it slows from 150 threads... down to a crawl, only 10-20 threads being used.

At first, I thought this was due to "already parsed," so I enabled "Retry to submit to already failed sites," but while this removed "already parsed," it didn't fix the issue.

So as a last resort, I tested this exact same setup (same PC, same Internet connection, same project) on v14.70. It doesn't slow down at all. I've been running at 150 threads consistently for the past 30 minutes.

- - - - - - - - - -

Recreating this is quite simple. Take a list of 50-100k+ blog comments, import them, set to 150 threads and press go. After a few minutes, SER will slow down to a crawl and only submit at 10-20 threads.

I've tried a ton of different setups / options, and this happens every single time on the latest version, but not on older versions. Note: the server / proxies I'm using are more than capable of running well over 150 threads.

Comments

  • SvenSven www.GSA-Online.de
    Testing this against a build that's 100+ versions old and telling me the latest update is the problem is just not a valid test. Sorry, but tons of things have happened since that very old build. I can not tell what might have made a change.
  • edited January 11
    The only difference I see on a surface level is the way the imported URLs are parsed. The older version seems to parse them in groups of ~300, while the new version parses them in groups of ~3000.

    But this bug is easy to reproduce, even if it's not comparable to an older version. All you have to do is set to 150 threads and import a large list of blog comments (into a single project), within minutes SER will slow to a crawl and only submit at 10-20 threads.

    If you stop the project then start it again, the same thing happens. It will start at 150 threads, then after a few minutes, go down to 10-20 threads. The only way to realistically submit with 150 threads (using a single project) is to stop and start it repeatedly every 10 minutes or so.

    I have a programming background and usually have some idea of what could be causing issues, but on this one, I'm stumped. The best way I can describe it is that SER is capping itself: it's unable (or unwilling) to open more connections. I even started to wonder if this could be OS-related (Windows 10), but since the old versions don't have this issue, it seems to be an SER issue.
  • Also, I am only using CB / Text Captcha Solver for captcha solving, nothing else.
  • SvenSven www.GSA-Online.de
    Run one project, log to file and when you think it drops on threads, write down the time and send me the log file.
    Also if you think it should start more threads, please click on help->create bugreport.
    Also please send the project backup in question.
  • I will do this now, where can I find "log to file"?
  • Nvm found it
  • edited January 11
    Watching it closely, this issue is almost certainly related to the way SER is loading / parsing batches of links.

    - On older versions, groups of ~300-600 were being constantly loaded, which resulted in a constant 150 threads.

    - On newer versions, groups of ~3000-6000 are not added until the existing batch is almost complete. Even though there are plenty of links remaining in the batch, SER caps the threads (progressively lower, see below) until the batch is processed... instead of using 150 threads all the way to the end or loading a new batch.

    The next batches are loaded far too slowly, which causes SER to dip down to 10-30 threads for significant stretches.

    2600/6000 batch processed -> 60 threads
    3400/6000 -> 40 threads
    4000/6000 -> 35 threads
    4300/6000 -> 25 threads
    4500/6000 -> 20 threads
    4700/6000 -> 15 threads
    5100/6000 -> 10 threads

    (Next batch STILL not loaded at this point)

    Next batch was loaded around ~5500/6000, and now I'm back up to 150 threads for a few minutes.

    Next batch:

    1000/3200 -> 70 threads
    1400/3200 -> 50 threads
    1700/3200 -> 35 threads
    1900/3200 -> 25 threads
    2000/3200 -> 10-15 threads

    Next batch (2100/3200):

    Next batch (2000/3100):

    Next batch (2000/3200):

    Last batch (4300/6100):

    Old version - SER stays at a consistent 150 threads.

    Notice the drastic increase in LpM on the old version because it avoids this issue - currently a 300+ LpM increase (+100%) and still rising!



    As the number of parsed links in a batch increases, the thread count is decreased until the next batch is added (which takes far too long). Once the next batch is added, the threads temporarily boost back to 150... only to drop again a few minutes later.

    In the end, SER spends far more time at 10-60 threads than it does at 150. Either the batch adding needs to be sped up, or SER needs to be able to use the full amount of threads until the next batch is added.
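    The "speed up the batch adding" idea above can be sketched as a producer/consumer with a refill watermark: the loader tops up the work queue while workers are still busy, instead of waiting for the current batch to finish. This is a simplified illustration in Python, not SER's actual (closed-source) code; the function name `run`, the `watermark` parameter, and all sizes are hypothetical.

```python
import queue
import threading
import time

def run(batches, num_workers=8, watermark=4):
    """Process URLs from successive batches without letting workers starve."""
    q = queue.Queue()
    done = object()                       # sentinel telling workers to exit
    processed = []

    def loader():
        for batch in batches:             # e.g. groups of ~3000 imported URLs
            for url in batch:
                q.put(url)
            # Top up as soon as the queue drains to the watermark; a naive
            # loader would instead wait for the whole batch to finish, which
            # is exactly when thread counts collapse.
            while q.qsize() > watermark:
                time.sleep(0.001)
        for _ in range(num_workers):
            q.put(done)

    def worker():
        while True:
            item = q.get()
            if item is done:
                break
            processed.append(item)        # stand-in for "submit to this URL"

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    load_thread = threading.Thread(target=loader)
    load_thread.start()
    for w in workers:
        w.start()
    load_thread.join()
    for w in workers:
        w.join()
    return processed
```

    With a low watermark the queue is refilled while workers are still busy, so concurrency stays at `num_workers` the whole time instead of tapering off as each batch drains.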
  • edited January 11
    Upon further testing, the old version does in fact sometimes split the batches into 3000-6000. However, its scheduling (or thread-management) system seems to be far more effective... even with the larger batches, it never dips below 150 threads.



    If this was in the new version, I'd already be down at 10-30 threads, as you can see from the images above.
  • SvenSven www.GSA-Online.de
    OK, thanks for the research... I might have found something that should improve results in the next update.
  • edited January 12
    Sadly the issue still persists. It's maybe... 5% better, but that's about it. I hope there is some way to fix this issue. During my testing last night, I found that the older version was able to achieve ~100% higher LpM than the newer version. I believe it's because of the way the newer version(s) slow down the threads so much, whereas the older versions keep pumping at high threads the entire time.



    Edit: I have actually found some instances where the update does help out quite a bit. Whatever you did, it seems to be on the right track. I will continue testing to see if I can find any other factors that may be contributing to this (e.g. targets having problems).
  • edited January 14
    @Sven

    On the latest update, the speed does seem to be increased, but there's a big problem. SER is trying to re-submit to the same accounts over and over again, even though "Retry to submit to previously failed sites" is unchecked, and "Allow posting on same site again" is unchecked.

    It loads the targets via:

    07:05:51: [+] 922 possible new Target URLs from present accounts.

    Even though it was not instructed to do this. This is especially bad when you're using SER for scraping search engines and posting (to URLs that require accounts), because it doesn't allow any scraping to happen. It keeps re-loading the same target URLs (ones that have accounts) and trying to post to them again.

    I have tested and confirmed this does not happen on v15.86.
  • I'm also getting the problem with SER trying to use present accounts when I don't want it to. Reverting to previous version.
  • SvenSven www.GSA-Online.de
    It will use present accounts and your project settings to submit more articles or create new accounts. Nothing has changed on that logic really.
  • SvenSven www.GSA-Online.de
    found one problem in previous build that was fixed in latest update.
  • edited January 15
    Sven said:
    It will use present accounts and your project settings to submit more articles or create new accounts. Nothing has changed on that logic really.
    Something has definitely changed because it didn't do that in any version (ever) before the most recent 2... Why would anyone want SER to automatically submit more articles when they didn't select "Allow posting on same site again" or "Retry to submit to previously failed sites"?

    There's no way that anyone who uses contextual engines (that require accounts) can use this most recent version for submission.
  • SvenSven www.GSA-Online.de
    that latest update should have fixed this!?
  • edited January 16
    Sven said:
    that latest update should have fixed this!?
    Sadly, it did not fix it... it's still the same thing happening. It's particularly noticeable on contextual targets where an account is created. This didn't happen on 15.86 and all versions beforehand.

    I noticed it because I use SER for scraping and posting to contextual engines. What happens is that SER keeps trying to post to existing accounts, which leaves zero time for scraping. So basically, it keeps importing the same list of targets over and over again, and no scraping ever occurs. The only way I can scrape new targets (using SER) is by using 15.86 or earlier.
  • SvenSven www.GSA-Online.de
    but the code is the same as before...just that it's optimized :/
    please send me the project backup for a closer review.