Dirty verified lists kill your LPM...

Over time, your verified site lists accumulate links that have gone bad. If you are building links from your site lists and they contain bad links, it will kill your LPM in a big way. A dirty site list is the difference between 20 LPM and 100 LPM, so cleaning out your verified list pays big dividends.

The big problem occurs when you delete projects or import site lists directly. The cleanup functions in SER are tied to actual links in projects: if those links were deleted (deleted project) or never existed in the first place (imported site list), SER can't remove them. Failed downloads appear to hammer your LPM, so if you are getting a lot of "download failed" messages, you probably have a dirty site list.
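SER can't clean what it never built, but you can do a crude dead-link pass outside SER before a list ever hits your projects. Here is a minimal Python sketch, assuming your list is a plain .txt file with one URL per line; the file names, thread count, and timeout are made-up values, not anything SER itself provides.

```python
# Hypothetical outside-of-SER pre-filter: keep only URLs that still respond.
# INPUT_LIST / OUTPUT_LIST are placeholder names for your exported site list.
import concurrent.futures
import requests

INPUT_LIST = "verified_list.txt"
OUTPUT_LIST = "verified_alive.txt"

def is_alive(url: str) -> bool:
    """True if the URL answers with a non-error HTTP status."""
    try:
        # Some servers reject HEAD; a GET fallback would be stricter.
        resp = requests.head(url, timeout=10, allow_redirects=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

with open(INPUT_LIST, encoding="utf-8", errors="ignore") as f:
    urls = [line.strip() for line in f if line.strip()]

with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    keep = list(pool.map(is_alive, urls))

with open(OUTPUT_LIST, "w", encoding="utf-8") as f:
    f.writelines(url + "\n" for url, ok in zip(urls, keep) if ok)
```

A pass like this only removes hard-dead sites (connection errors, 4xx/5xx); it says nothing about whether SER can actually post there, which is the deeper problem discussed in the comments below.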

Comments

  • So what do you suggest? Using the "cleanup" function? Or the dup remover? Because if the latter, I already use that (dup URLs, though not dup domains).
  • @pratik The cleanup functions will work if you haven't imported site lists or deleted projects. The problem with the built-in cleanup functions is that they are tied to links in projects. If a link doesn't exist in a project, the bad link doesn't get cleaned out of the verified list.

    Here's an example: you buy a verified list (which I would recommend) and import it into your verified site list. A good portion of those URLs will be bad, and you won't be able to build links to them. They will now sit in your list, generating "download failed" or "no engine match" messages, which kill your LPM. Since links were never built in your projects with those URLs, they can't be cleaned using the built-in tools.

    Check out my other thread on the right way to import site lists for how to solve the problem.
  • @Sven can you confirm whether it works the way @Satans_Apprentice is describing? If yes, could we get an improved cleanup function? :)
  • goonergooner SERLists.com
    @pratik - There is no possible way to clean up a verified link if the link was imported or the project no longer exists.

    SER has to check that the link is present on the page to confirm it; how can it check that if the project that created it no longer exists? SER doesn't even know what link to look for in that scenario.
  • No, but there could be an option that says "Remove URL from list if there is no possible way to submit a link."
    Something like that: we create a dummy project with every engine selected, let SER run, and voilà, a cleaned list. Of course there will be some false positives, like a website loading slowly, a proxy failing, or a captcha failing, but it's worth losing a couple of good links in exchange for removing thousands of bad URLs from my list.

    SER has a failed list, and that is where those URLs are saved. But we are talking about URLs that SER has already submitted, verified, or identified that are now dead, so that option could be viable.
  • LeeGLeeG Eating your first bourne

    All you need to do is main options > advanced > tools > Clean-Up (check and remove non-working)

    Then do all your global lists

    It checks that links are live and match a recognised engine. If the engine has changed, it sorts that out as well

    It does not matter if you have killed the project that put the links into the lists; it still checks the links

    Over time, websites will die or change the type of platform that is running on them

    I'm looking at about 1 in 3 sites being dead. No idea what percentage has changed engine.

  • goonergooner SERLists.com
    @rodol - That's true, but you can already do something similar... Import the list into a new project, and whatever verified URLs that project accrues is your clean list.

    Other options, where lists or individual links are compared against past lists, are apparently not a good idea because of the memory that would be required to load the whole list, according to Sven. (A memory-lighter way to do that comparison outside SER is sketched below.)
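    For what it's worth, the memory cost can be cut a lot if the comparison is done outside SER: store a short hash of each old URL instead of the full string. A hedged Python sketch, with made-up file names, assuming one URL per line:

```python
# Memory-lighter diff of two huge URL lists: keep 8 bytes per old URL
# instead of the full string. File names here are placeholders.
import hashlib

def url_hash(url: str) -> bytes:
    return hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()

old_hashes = set()
with open("old_list.txt", encoding="utf-8", errors="ignore") as f:
    for line in f:
        url = line.strip()
        if url:
            old_hashes.add(url_hash(url))

# Stream the new list and emit only URLs not present in the old one.
with open("new_list.txt", encoding="utf-8", errors="ignore") as new, \
     open("fresh_only.txt", "w", encoding="utf-8") as out:
    for line in new:
        url = line.strip()
        if url and url_hash(url) not in old_hashes:
            out.write(url + "\n")
```

    That keeps 8 bytes per old URL instead of the full string; for lists in the tens of millions (like the 21 million LeeG mentions below), a sorted external merge would be safer still, but this already beats loading both lists whole.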
  • @leeg I ran it, and there are still a ton of dead links. It works through your projects.
  • LeeGLeeG Eating your first bourne

    It does not run through the projects

    It runs through the global lists

    I'm checking over 21 million links from my global lists at present. That's just the submitted and verified global lists

    Just doing everything. Engines that are no longer used anywhere, etc., are included in those lists

  • goonergooner SERLists.com
    @leeg - That identifies the engines like you said, but identified does not mean you can still post a link there. Trimming verifieds and posting only to viable targets increases LPM a lot: from 70 to 280 in my case.
  • @gooner Cool. So what's the exact process you've adopted for cleaning the list and gaining an extra boost? Would definitely be interested to hear that. :)
  • goonergooner SERLists.com
    @pratik - If you don't have multiple installations and are not buying lists, the easiest thing to do is save your current verified list and remove it from SER. Start a brand new verified list, and once you have 30k or 50k verified links (whatever number you choose), use that list for all projects.

    Once LPM goes low you will need to start from the beginning with a brand new list - that's why it's ideal to have another installation getting a new verified list ready for when the current one has died (i.e. all projects have used all available links in that list).

    Does that make sense? (A rough helper script for this rotation is sketched below.)
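    To make the swap less fiddly, the rotation itself can be scripted. A hedged sketch, assuming your verified list is a folder of .txt files (whatever folder you configured in SER's options; the path below is a placeholder) and that SER is closed while files are moved:

```python
# Archive the current verified-list folder and start a fresh, empty one.
# VERIFIED_DIR is a placeholder - point it at your own site-list folder.
import shutil
import time
from pathlib import Path

VERIFIED_DIR = Path(r"C:\SER\site_lists\verified")

def rotate_verified_list(verified_dir: Path) -> Path:
    """Move the current list aside (timestamped) and recreate an empty dir."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = verified_dir.with_name(f"{verified_dir.name}-{stamp}")
    shutil.move(str(verified_dir), str(archive))
    verified_dir.mkdir()
    return archive

if __name__ == "__main__":
    print("Archived old verified list to:", rotate_verified_list(VERIFIED_DIR))
```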
  • SvenSven www.GSA-Online.de
    What @LeeG said is correct.
  • @sven I ran the cleanup, and there are still a ton of bad links in the verified list.
  • OK, honestly I am a little confused, gooner lol.
  • SvenSven www.GSA-Online.de
    @Satans_Apprentice bad links come up how?
  • goonergooner SERLists.com
    @pratik - Look at it like this:

    - If you let SER scrape, then it scrapes random URLs for random projects - not very efficient.

    - If you post from a verified list, that is better because all projects will use URLs that you know SER can post to. Much more efficient, but run SER for 6 months and you will find your verified list becomes full of dead links. (Clean-up only removes URLs that SER cannot identify, and identified does not mean postable: if you run the "Sort and Identify" function, SER cannot post to most of those URLs even though it identified them.)

    - So maximum speed is gained from using a freshly verified list for all projects. That's all it is, really.

    But it's only really viable if you are doing big numbers with multiple installations.

    Here's a real-life example:

    - List scraped with SB and tested on a VPS

    - After one month, collect all verifieds from the VPS and use them for real projects on the dedicated server. LPM will fly because all links are freshly verified (i.e. working links)

    - Start a new verified list on the VPS, and after one month remove the current list from the dedicated server and put this new verified list from the VPS in its place.

    If you don't believe it's true, go check your verified list, open the contextual engines, and randomly check some links. You will see so many that redirect you to another page (article removed, etc.). SER will identify those URLs as good during clean-up because the engine is still the same.

    But in reality you won't get a link there or it will be deleted soon after verification.

    EDIT:
    Here's an example: I had 36,000 unique domains from verified contextual links. I started a new project and imported all of those links. On the first run I got 1,200 verified - 1,200 from 36,000! I'm running them again to hopefully get a few more, but the rest are dead, so there's no point in continuing to post to them.
  • But what if you used Scrapebox to clean out your list? You would have to save the links you have posted and run them through the SB link checker; when the links are not there anymore, you can delete that site. If that doesn't work, you can at least clean out the sites that are offline with one of the add-ons in SB. Would that work in any way?
  • goonergooner SERLists.com
    @pauliep - It would work, I guess, provided you know the URL that should be on the page you are checking. With multiple projects it gets difficult to keep track of that stuff. (A rough sketch of that kind of check is below.)
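    If you do keep track of the pairs, the check itself is simple. A sketch of that SB-style pass, assuming a hypothetical CSV with two columns, the page URL and the link you expect to find on it:

```python
# For each (page_url, target_url) pair: the page must load, must not
# redirect away, and must still contain the target link.
# The CSV name and format are assumptions for illustration.
import csv
import requests

def link_still_there(page_url: str, target_url: str) -> bool:
    try:
        resp = requests.get(page_url, timeout=15, allow_redirects=True)
    except requests.RequestException:
        return False
    if resp.status_code >= 400:
        return False
    # Catches the "article removed" redirects gooner mentions above:
    # the final URL is no longer the page the link was verified on.
    if resp.url.rstrip("/") != page_url.rstrip("/"):
        return False
    return target_url in resp.text

with open("verified_pairs.csv", newline="", encoding="utf-8") as f:
    for page_url, target_url in csv.reader(f):
        print("OK  " if link_still_there(page_url, target_url) else "DEAD",
              page_url)
```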
  • It isn't just the "download failed" URLs either. If you have URLs that you simply can't build links to, they will get tried repeatedly by projects.
  • Bookmarked, this is awesome! Such a cool trick.
  • I'm trying the cleanup feature. So far it has identified over 44 million links, with a little over 17 million unknown. This has been running for 3 days at 2k threads, and it's only halfway through. :(
  • Why isn't it the case (or maybe it is, @sven) that if a project can't post to a link in a global list after X attempts, the link is removed?
    If that were how it worked, you wouldn't have to worry, because it would clean out your verified list as you run it: anything that doesn't work gets removed (so other projects won't keep trying to submit to it). A rough way to approximate this outside SER is sketched below.
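    As noted in the next comment, SER doesn't appear to do this on its own, but you can approximate it outside SER if you export your failed target URLs. A hedged sketch - the file names and threshold are invented, and it assumes one URL per line with repeats for repeated failures:

```python
# Drop domains from a verified list once they have failed too many times.
# failed_targets.txt / verified_list.txt / MAX_FAILURES are placeholders.
from collections import Counter
from urllib.parse import urlparse

MAX_FAILURES = 3

failures = Counter()
with open("failed_targets.txt", encoding="utf-8", errors="ignore") as f:
    for line in f:
        url = line.strip()
        if url:
            failures[urlparse(url).netloc] += 1

bad_domains = {dom for dom, n in failures.items() if n >= MAX_FAILURES}

with open("verified_list.txt", encoding="utf-8", errors="ignore") as src, \
     open("verified_pruned.txt", "w", encoding="utf-8") as out:
    for line in src:
        url = line.strip()
        if url and urlparse(url).netloc not in bad_domains:
            out.write(url + "\n")
```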
  • BrandonBrandon Reputation Management Pro
    @tsaimllc this has been a debate for a long time and I don't remember the answer. From my memory, sites are never removed from Verified by SER unless you use the cleanup. It would make sense for them to be removed eventually, but I don't think that happens.
  • I am pretty sure it's because SER doesn't use databases, only .txt files. Cleanup works pretty well, but it doesn't clean out all of the links that can't be posted to, only the ones that are dead. Once in a while, I would import my verified list into my projects, empty the verified list, and run it. You wind up with a squeaky clean list. (A small helper for gathering the list for that re-import is sketched below.)
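    Since the lists are just .txt files, gathering everything for that re-import is easy to script. A sketch, with a placeholder folder path, that concatenates and de-duplicates every .txt file in the verified-list folder:

```python
# Collect every URL from the verified-list folder into one de-duplicated
# file suitable for importing into projects. VERIFIED_DIR is a placeholder.
from pathlib import Path

VERIFIED_DIR = Path(r"C:\SER\site_lists\verified")
seen = set()

with open("all_verified_urls.txt", "w", encoding="utf-8") as out:
    for txt in sorted(VERIFIED_DIR.glob("*.txt")):
        for line in txt.read_text(encoding="utf-8", errors="ignore").splitlines():
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                out.write(url + "\n")
```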
  • BrandonBrandon Reputation Management Pro
    @Satans_apprentice absolutely true!