
Verified List Cleaner

Sven:

Could you create a process that would help us get the dead links out of our Verified Lists? For example, you could have SER delete URLs when they fail repeatedly, or move them to another directory. It would be something like a right-click option: Status => Active Cleanup. I experienced what happens when you run SER off a verified list with bad URLs: LPM plummets, and bandwidth usage goes crazy.

Comments

  • Sven www.GSA-Online.de
    Show URLs -> Verified URLs -> Re-Verify.
  • Brandon Reputation Management Pro
    I think he might be referring to the global verified list. In that case: Options > Advanced > Tools > Clean up.
  • @Sven:

    I could be wrong, but if I imported a verified list, or I deleted projects, this won't work since the bad URLs aren't associated with any projects. 

    I imported some verified lists with a lot of dead links, and my LPM went from 70 to 17, and my bandwidth usage skyrocketed, probably due to HTML timeouts. The older the verified list, the worse your LPM will be.
  • What I will be doing shortly is moving my Verified list to Submitted and turning off the search engines, starting with an empty Verified list. When the LPM comes to a halt, I have a clean list. It will work, but it will be painful.
  • @brandon: What will this feature do if I have imported verified lists?
  • @brandon: Thanks!

    I am running it. Damn. Another feature I didn't know about.
  • gooner SERLists.com
    @satans_apprentice - Sven mentioned in another thread that it only cleans URLs that cannot be associated with an engine (like the identify function), so it won't clean most of the dead links: usually the engine is still the same, it's just that your link has been deleted and now redirects to another page.
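
    (For context, "identify" here means footprint matching: the page is fetched and checked against known engine markers, so a page whose backlink was deleted can still identify as its platform. A loose sketch of the idea - the footprints below are illustrative examples, not SER's actual engine definitions:)

    ```python
    # Loose sketch of footprint-based identification; the footprints are
    # illustrative examples, NOT SER's real engine definitions.
    import urllib.request

    FOOTPRINTS = {
        "wordpress": "wp-content",
        "drupal": "Drupal",
        "phpbb": "Powered by phpBB",
    }

    def identify_engine(url, timeout=15):
        """Return the first matching engine name, or None if unreachable / no match."""
        try:
            raw = urllib.request.urlopen(url, timeout=timeout).read(65536)
            html = raw.decode("utf-8", errors="ignore")
        except Exception:
            return None  # unreachable: this URL can't be associated with any engine
        for engine, marker in FOOTPRINTS.items():
            if marker in html:
                return engine  # still identifies, even if your specific link was deleted
        return None
    ```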
  • @sven can you implement my suggestion?
  • Sven www.GSA-Online.de
    @Satans_apprentice I thought this was cleared up? What exactly was your suggestion?
  • @sven see the top of the thread. I ran the cleanup and it's still bad.
  • gooner SERLists.com
    @satans_apprentice - I was asking the same question earlier and it seems the way to do it is this:

    Re-verify links on all projects
    Delete verified list (or copy to some other place if you don't want to lose it).
    Select: Options - Advanced - Tools - Add URLs from project - Verified.

    SER will then copy all those re-verified links into your empty verified list, giving you a nice fresh list of only good links.
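
    If you want a rough dead-host pass outside SER first, a sketch like the one below works (the filenames are examples; note it only proves the page answers, not that your backlink is still on it - exactly the caveat raised below):

    ```python
    # Filter a sitelist .txt down to URLs whose hosts still respond.
    # This only proves the page loads, not that your backlink survived.
    import concurrent.futures
    import urllib.error
    import urllib.request

    INPUT = "sitelist_Article-WordPress.txt"         # example filename
    OUTPUT = "sitelist_Article-WordPress.clean.txt"  # example filename

    def is_alive(url, timeout=15):
        """True if the server answers at all; a 4xx still means the host is up."""
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=timeout)
            return True
        except urllib.error.HTTPError:
            return True   # server responded, just not with 2xx/3xx
        except Exception:
            return False  # timeout, DNS failure, connection refused, ...

    with open(INPUT, encoding="utf-8", errors="ignore") as f:
        urls = [line.strip() for line in f if line.strip()]

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        alive = [u for u, ok in zip(urls, pool.map(is_alive, urls)) if ok]

    with open(OUTPUT, "w", encoding="utf-8") as f:
        f.write("\n".join(alive))

    print(f"{len(alive)}/{len(urls)} URLs still respond")
    ```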


  • @gooner - If the URLs in the verified list were imported, or the projects were deleted, won't you be throwing out good URLs? Doesn't re-verifying the links only apply at the project level?
  • gooner SERLists.com
    edited January 2014
    @satans_apprentice - Yep, that's a good point, but then again if SER can't check a URL against a link it expects to be present, it has no real way of verifying it's live.

    Re-posting to the URLs is the only way to know for sure they're good, I guess.
  • That's why I asked @sven for the function. You would right-click on a project or projects and set Status=Active (clean verified list). SER would build links normally, but if a URL fails a set number of times it gets removed from the list. If a link is verified, it gets moved to a separate folder, eliminating the possibility of good links getting deleted. Eventually the verified list is empty, and you have a clean verified list in another folder. (The logic is sketched below.)
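
    Sketched as illustrative Python (MAX_FAILS, the hook name, and the list objects are all made up for illustration, not SER internals), the requested behaviour would be roughly:

    ```python
    # Illustration only: the behaviour requested above, as a sketch.
    from collections import defaultdict

    MAX_FAILS = 3               # hypothetical threshold
    fail_count = defaultdict(int)

    def on_target_result(url, verified, dirty_list, clean_list):
        """Hypothetical hook, called once per target URL tried during a normal run."""
        if verified:
            dirty_list.discard(url)  # proven good: out of the dirty list...
            clean_list.add(url)      # ...into the separate clean folder
        else:
            fail_count[url] += 1
            if fail_count[url] >= MAX_FAILS:
                dirty_list.discard(url)  # failed repeatedly: drop it for good

    # Eventually dirty_list is empty and clean_list holds only working targets.
    ```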
  • gooner SERLists.com
    Agree it would be awesome. @sven can it be done?
  • @sven - The reason for this request is that we have experienced some very poor performance running dirty verified lists. It absolutely crushes your LPM.

    With a clean list, I can run at 100+ LPM. With a dirty list, 17 LPM or worse.
  • Sven www.GSA-Online.de
    @Satans_apprentice Not possible, as this would require loading the whole file and modifying it. Too much processing time and memory loss.
  • @sven: How about this: can you provide a Status=>Active function that would run through a sitelist in order, so that every URL gets used once? Once every URL has been tried, the process would stop.

    I am cleaning the dirty list by running it from "Submitted" into an empty "Verified" folder. I would love to know when every URL has been tried at least once so I know the process is finished. It doesn't have to be perfect - it can run URLs twice, as long as I know that each one was tried once.
  • Sven www.GSA-Online.de

    1. In Options, disable every search engine and just leave site lists enabled, or even better, start it with Active (site lists only).

    2. Right-click on project -> Import target URLs -> From site lists...

    3. Start.

    Your project will inform you that it finished due to some "important messages".

  • gooner SERLists.com
    Very clever! Nice one @sven
  • simarcom Sherbrooke, Qc, Ca
    So, I went through this thread and still don't know how to keep a clean verified list...?
    @sven : Could you please help me with this?

    I have been cleaning my list for the last 30 hours or so (yes, it's been working non-stop since yesterday).
    I did Options > Advanced > Tools > Clean up.

    The progress bar is full but it keeps on running. If I stop it, will I lose the clean-up that was done?

    Do you know of another way to get this list cleaned up?

  • Brandon Reputation Management Pro
    Remove dupe domains (see the sketch after these steps).

    Create new project, check all platforms.

    Import Verified site list to new project.

    Delete ALL files in Verified folder.

    Run project.

    You'll be left with only newly verified sites in your Verified folder.
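
    (For reference, the "remove dupe domains" step boils down to keeping the first URL seen per domain. A minimal hand-rolled sketch - the filename is an example, and SER has this built in:)

    ```python
    # Keep only the first URL per domain in one sitelist file.
    from urllib.parse import urlparse

    seen_domains = set()
    kept = []
    with open("sitelist_Article-BuddyPress.txt", encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            domain = urlparse(url).netloc.lower()
            if domain and domain not in seen_domains:
                seen_domains.add(domain)
                kept.append(url)  # first URL wins for each domain

    with open("sitelist_Article-BuddyPress.deduped.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(kept))
    ```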
  • simarcom Sherbrooke, Qc, Ca
    Thanks @Brandon !!

    But what if I already have 90 projects running? Should I do a backup, delete and restore after the process?


  • Brandon Reputation Management Pro
    Stop all of those projects and just run this one. Let it finish and after it finishes, let it run for at least a few hours so it can check all emails.

  • simarcom Sherbrooke, Qc, Ca
    Got ya! Thank you very much :o) @Brandon
  • edited May 2014
    It's funny, I have the same problem. I have a verified list - old and new mixed - of about 1.2 million URLs, so I thought it was time to clean it up. I went to Options -> Advanced -> Tools -> Clean Up List, selected the failed list (in my case, the failed folder holds my verified list) and let it run.

    After almost 2 days the progress bar is at the end, but it continues to run through URLs and is currently showing 12 million URLs - which is weird, since the list itself only had 1.2 million after duplicate URLs and duplicate domains had been removed before I started the clean-up.

    See here: 
    http://cl.ly/image/3g3p3p0h1z1i

    So my question now is: clearly something is wrong when it shows a number of checked URLs 10x higher than the actual number of verified URLs in the list, so I want to abort - but will I lose the clean-up results if I do?

    And maybe @sven can double-check why it continues to check URLs even after the 1.2 million URLs that are in the verified list have clearly all been checked.

    Brandon's cleanup process is different and makes sense, but then again the clean-up under Tools should work as well, I guess - at least to get rid of dead URLs, and it also re-identifies the URLs if I remember correctly.

    Update:
    I aborted the process. It finished its threads and then told me it had added those 12 million URLs to my list. Wondering what had happened, I ran dedupe of URLs and dedupe of domains on that failed list again, and it removed 11.x million URLs. So the cleaned list - if it was cleaned at all - is back to 1.2 million.
  • royalmice WEBSITE: https://asiavirtualsolutions.com | SKYPE: asiavirtualsolutions
    What I do with any list I get is import it using the import-and-identify option and select a custom location to save it. Once the process is completed, I either import that list as target URLs, or I run it through the import-and-identify process again, but this time save it in the default Identified location.

    Same with .SL site lists: I import them into the Failed folder, then use a tool called TXT Collector, which combines the files in the different folders into a single .txt file. I then import that single .txt file using import-and-identify, which will normally clean out the dead links.

    I know it is a bit of a time-consuming process, especially if the lists are large, but you end up with a much cleaner list, which translates into more submissions and higher LPM.
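
    (A rough scripted stand-in for the TXT Collector step, in case anyone prefers to merge the folder themselves - the folder path is an example, point it at your own site list folder:)

    ```python
    # Walk a site list folder, merge every .txt into one file,
    # deduping exact URL repeats along the way.
    import glob
    import os

    # Example path only - adjust to wherever your SER site lists live.
    FOLDER = r"C:\Users\you\AppData\Roaming\GSA Search Engine Ranker\site_list-failed"

    seen = set()
    merged = []
    for path in glob.glob(os.path.join(FOLDER, "**", "*.txt"), recursive=True):
        with open(path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                url = line.strip()
                if url and url not in seen:
                    seen.add(url)
                    merged.append(url)

    with open("combined.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(merged))
    ```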
  • @Brandon What should I do? I have a couple of verified .sl files. I did what you said, but what if I want to verify the 2nd .sl file? What should I do then? I can't delete the files in the Verified folder now, as they are 100% verified websites.

    Thanks in advance.
  • edited June 2014
    @yellohello If you're trying to combine both lists, can't you just import the 2nd .sl to a new project and add it to your already-cleaned .sl in your verified folder?

    You can import a .sl at the project level and the URLs will just be added to your already created site list.

    It sounds like you either have to make multiple generic projects to save the verified links

    OR

    Export your list, combine it with the new ones (maybe you can import multiple lists to a project?), then delete them from the verified folder and run them all through one project.

    @sven If it takes too many resources to create a "clean site list" feature, then what does the "Clean up" button in the Advanced > Tools section do?

    When you import a site list (.txt or .sl) at the project level, can you import multiple lists, or does it replace what you've already added?

    Also, I found this tutorial on site list cleanup on BHW. Hope it's OK to share here.

    http://www.blackhatworld.com/blackhat-seo/black-hat-seo/613204-how-keep-gsa-ser-lists-clean-increase-your-lpm.html
  • Sven www.GSA-Online.de
    The Clean Up in SER does not take many resources, as it is done sequentially. If it were done automatically, it would mean loading the whole file, finding the line where the URL is, removing it, and saving the file. Imagine this for every URL SER submits to - very time-consuming and a waste.
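
    (In miniature, the trade-off Sven describes: deleting a single line from a flat sitelist file means rewriting the whole file, so doing it per submission multiplies that cost, while a one-off batch clean pays it once. A sketch with made-up helper names:)

    ```python
    def remove_url_naive(path, dead_url):
        """Per-URL removal: the whole file is read and rewritten for ONE deletion."""
        with open(path, encoding="utf-8", errors="ignore") as f:
            lines = f.read().splitlines()            # load the entire file
        lines = [u for u in lines if u != dead_url]  # scan every line
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))                # write the entire file back

    def remove_urls_batch(path, dead_urls):
        """Batch removal: one read and one write, however many URLs died."""
        dead = set(dead_urls)
        with open(path, encoding="utf-8", errors="ignore") as f:
            lines = [u for u in f.read().splitlines() if u not in dead]
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
    ```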