Clean-up not working

Ok, so I had like 300k URLs in the verified site-list. The list was pretty old, and I had also added a few site-lists from other people to it... Things started to go really slow, with lots of "download failed", "no engine matches" and so on. So I said it's time to clean it up.

So I launched the tool and let it run for like 3 days... it pretty much kept my CPU running at 80-100% (it was usually 100%). Then it reached the end of the progress bar, and from those 300k URLs it had identified over 900k (what the hell?), but it wouldn't finish... it kept on identifying the same URL to a bunch of engines and reached 1.4 million identified, so I had to stop it.

When I check the stats for the verified list, I see 1.7 million (I began with 300k)... After I removed duplicates I ended up with 700k. In the whole process, starting with a 300k list, I got a 700k list and another 130k unknown. How is this possible?

Something is definitely broken.

Comments

  • goonergooner SERLists.com
    I believe it's because it may have identified the same URL to different engines, like blog comment and article for example. So you get more links than you started with.

    Personally, I never use cleanup. I think it does more harm than good.

    You are better off moving the verified lists to another folder. Have projects post from that folder, and whatever ends up in verified at the end is a good link.
  • It worked one time in the past... and I trusted it would still work... apparently, I was wrong :(
  • SvenSven www.GSA-Online.de
    Nothing is broken. It's just more accurate now. For example, it adds a WordPress-based site to Wordpress-Article, Blog Comment and so on, since those engines can usually all be used on that site.
  • Ok, so explain why I got 900k identified from a 300k list... and the rest, up to 1.5 million identified (it would have kept going, but I just stopped it), was the same URL being identified over and over
  • SvenSven www.GSA-Online.de
    900k identified from 300k list << I thought I did explain it!?
  • Each url you try and identify can match multiple engines. For example, a url on a wordpress-powered site could match the Wordpress Article, General Blogs, and Trackback engines, maybe more.
  • Alright, but why wasn't it finishing? Past the 900k mark it kept identifying the same URL until I had to abort...

    Dunno what to say man... I'm pretty pissed now. I screwed up my list: instead of actually cleaning out the non-working URLs, I inflated the site-list and everything is so much slower now
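
For anyone following the numbers in this thread, here is a rough Python sketch of why the identified count can be larger than the number of URLs you started with. It is only an illustration and not how the tool stores or processes lists internally. The engine names Wordpress Article, General Blogs and Trackback come from the replies above; the URLs, the "Example Forum Engine" name and the `summarize` helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical identification results as (url, engine) pairs.
# One URL can be listed under several engines, and the same pair
# can show up more than once if a list has been merged or re-run.
identified = [
    ("http://example.com/blog/post-1", "Wordpress Article"),
    ("http://example.com/blog/post-1", "General Blogs"),
    ("http://example.com/blog/post-1", "Trackback"),
    ("http://example.org/forum/thread-9", "Example Forum Engine"),
    ("http://example.com/blog/post-1", "General Blogs"),  # exact duplicate
]


def summarize(pairs):
    """Count raw entries, entries after de-duplication, and distinct URLs."""
    unique_pairs = set(pairs)  # drops exact (url, engine) duplicates
    engines_by_url = defaultdict(set)
    for url, engine in unique_pairs:
        engines_by_url[url].add(engine)
    return {
        "entries": len(pairs),
        "entries_after_dedup": len(unique_pairs),
        "unique_urls": len(engines_by_url),
    }


stats = summarize(identified)
print(stats)
```

In this toy data, 2 distinct URLs turn into 4 entries after removing duplicates, because one URL matches three engines. That is the same kind of gap as 300k URLs becoming 700k entries, though it does not by itself explain why the run never finished.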