Target URL cache - number of URLs multiplies in ONE cache within minutes

hans51 · October 2013

can be repeated = YES
since when? = many weeks or even few months

until a FEW weeks ago the bug below happened to the VERY first project ever created
NOW the same happens to the very LAST Tier created

I import target URLs, up to several times every day - usually 1-8 thousand at a time into each unit (project and Tiers).

I noticed that ONE unit (NOW the very LAST Tier, some time ago the VERY first project ever) gets an unnaturally HIGH number of URLs into its cache
unnaturally HIGH = approx the total SUM of all URLs imported into ALL projects / Ts, found in ONE single unit cache

several hours ago, I emptied the cache of the LAST Tier ever created = some 200'000+ URLs had accumulated there from nowhere within the past 2 or so days

then I imported some 7000 NEW target URLs into each unit (project and Ts)
NOW, just hrs later, I find a total of 60'000+ URLs in that LAST T ever created

this behavior is many weeks or even a few months old, just that before it always went into the FIRST ever project, now into the LAST ever unit (T) added

just before I started writing THIS bug report, the URL number in the above-mentioned cache was 60'000+
now it is 130'000+ (roughly DOUBLE) within just some 20 mins WITHOUT having added any new URLs anywhere

Sven · October 2013

And what do the URLs look like? I'm sure it's the project itself that adds new URLs to the queue after creating accounts.

hans51 · October 2013

@sven

100'000 URLs in half an hour added by project itself ???
u joking
with 33 threads and just several dozen URLs processed per hr
ending up with tens of thousands of new URLs ....

these are regular URLs as added from SB = target URLs
ALL units get the same type / quality of target URLs = 1 large URL file = randomized = then split into chunks of equal size
then imported into all units one by one
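For reference, the randomize-and-split step described above can be sketched roughly as follows. This is only an illustration of the preparation done outside SER, not anything SER itself does; the function name `split_into_chunks` and the example URLs are made up for this sketch.

```python
import random

def split_into_chunks(urls, n_units, seed=None):
    """Shuffle the URLs and deal them into n_units lists of near-equal size.

    Hypothetical helper: one list per project / Tier, to be saved and
    imported into each unit separately.
    """
    rng = random.Random(seed)
    shuffled = list(urls)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # Striding by n_units gives chunk sizes that differ by at most one.
    return [shuffled[i::n_units] for i in range(n_units)]

if __name__ == "__main__":
    demo = [f"http://example.com/page{i}" for i in range(10)]
    for unit, chunk in enumerate(split_into_chunks(demo, 3, seed=1), start=1):
        print(unit, len(chunk))
```

Because every URL lands in exactly one chunk, no unit should ever hold a URL that was imported into another unit, which is what makes the overlap seen in the one Tier suspicious.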

the URLs in that ONE T are leaked into it from all the others by SER
leaked, NOT created by the project

and the point is that while each project / T has received thousands of URLs,
always a similar / identical number of very similar high quality,

most "NEW URLs" are never submitted or used on THOSE other Ts or projects
when I run all or several units, the LpM is just about 1+ (less than 2)

but it appears that ALL or almost all of the good new targets imported into each unit end up in that ONE T

when I switch OFF all other projects / Ts
and run only the one with all the targets

then of course MOST are "already parsed" - maybe 90+%
but in between the "already parsed" ones are all the good ones that were supposed to be for ALL OTHER Ts and projects

it was a BUG before
and is one NOW
just that the "target" has changed from very first project to very last T

all other projects or Ts are in no way affected and seem to work normally
currently I have 5 projects and 4 Ts running

ALL other units have a decreasing number of target URLs in cache = the imported number MINUS the processed number = correct processing

currently that one T has 218 457 URLs in its cache
while all other units have between a few dozen and a few thousand (the 8000 recently imported MINUS the ones already processed in the past 12 or so hrs)

as you can see above, the current number is almost double the last reading from when I posted the bug report above
and for MOST of that time that T was quiet = OFF, except for the last approx 1 hr
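To make the "imported MINUS processed" expectation and the leak suspicion checkable, something along these lines could be run on exported URL lists. It is a rough sketch only: it assumes the cache of the suspect T and the import files of each unit are available as plain lists of URLs, and the names `expected_remaining`, `leaked_from` and the unit labels are invented for the example.

```python
def expected_remaining(imported, processed):
    """Cache size a unit should show if it only consumes its own imports."""
    return max(imported - processed, 0)

def leaked_from(suspect_cache, imports_by_unit, suspect_unit):
    """Count URLs in the suspect cache that were imported into OTHER units.

    imports_by_unit maps a unit label to the URLs imported into that unit.
    URLs the suspect unit received itself are not counted as leaked.
    """
    cache = set(suspect_cache)
    own = set(imports_by_unit.get(suspect_unit, ()))
    counts = {}
    for unit, urls in imports_by_unit.items():
        if unit == suspect_unit:
            continue
        counts[unit] = len((set(urls) & cache) - own)
    return counts

if __name__ == "__main__":
    imports = {"P1": ["a", "b"], "T4": ["c"], "P2": ["d", "e"]}
    cache_t4 = ["a", "c", "d", "e", "x"]
    print(expected_remaining(8000, 350))
    print(leaked_from(cache_t4, imports, "T4"))
```

High counts for the other units would support the leak explanation; URLs matching no import file at all would instead point to the project adding them itself.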

Sven · October 2013

So much text and still I don't know what the URLs look like. And what type of settings does this project use to get target URLs? Just by import or also other types? Maybe send me the project backup?

hans51 · October 2013

to get submissions moving again = ALL target URLs deleted and restarted from scratch
too many OTHER vital problems exist in direct import of target URLs to projects (vs import to global list)

for example

1. URLs ending with | (pipe) seem to "block" further use of target URLs - why does SER have a pipe ending the target URLs ?? normally the |PR values are removed and only the plain text URL is found in the target URL cache ...

2. newest problem = with all SE OFF and all global lists OFF = ONLY imported target URLs
SER works and the target URLs never seem to run empty, as was the case some days ago (maybe 1 or 2 upgrades back)
instead there is a large number of submissions and verifications = 1000+ verifications and the target URL cache still full ...
but WHERE do the target URLs come from if ALL SE and global lists are OFF and SER has been running for 12+ hrs

until several days back I used this method of switching ALL target sources OFF to test particular URL lists directly imported into projects / Tiers
and after a while all target URL caches were EMPTY (showing 0 URLs) as intended

3. the above-mentioned number of 218 457 URLs in that cache further increased for a while to 300'000
then after some 12+ hrs of THAT project INACTIVE
AND after a reboot of all machine
ALL = ALL target URLs vanished !! and the target URL cache was empty
before reboot all target URLs still existed
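As a workaround for point 1, the import file could be cleaned before it goes into SER so that no line ends with a pipe or a |PR value. A minimal sketch, assuming one URL per line and that a pipe never belongs to the URL itself; `clean_target` and `clean_lines` are made-up names, not part of SER.

```python
def clean_target(line):
    """Return the plain URL, dropping a trailing '|' or '|PR' suffix.

    Assumption: everything from the first pipe onward is metadata.
    """
    return line.strip().split("|", 1)[0].strip()

def clean_lines(lines):
    """Clean every line and drop the ones that end up empty."""
    cleaned = (clean_target(line) for line in lines)
    return [url for url in cleaned if url]

if __name__ == "__main__":
    sample = ["http://example.com/a|", "http://example.com/b|3\n", "  \n"]
    for url in clean_lines(sample):
        print(url)
```

This only changes what gets imported; it does not explain why SER stores the pipe in the cache in the first place.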

for now ALL www resources are fully busy, but as soon as the same situation comes up again I may send a project backup
