
Are sites removed if they don't work

Are sites/links on my identified list removed if they don't work when submitted to? Maybe after X number of tries? It seems redundant to keep trying to post to sites that won't work, and my identified list is around 2 million, so it would waste time if they aren't removed (and it doesn't look like they are).

Comments

  • donchino https://pbn.solutions
    There is a feature "Remove non-working" in global settings - Advanced - Tools. This cleans your sitelists. Be aware that it takes a hell of a long time to clean that 2 million list.
  • gooner SERLists.com
    It cleans, but only if the site is no longer online. If it just redirects to another page (as most removed links do), then it won't remove it from the verified list.

    At least that's my understanding of how it works. If that's true then @tsaimllc has made a great request +1 for that.
  • Sven www.GSA-Online.de

    Sites in site lists are never removed unless you do a "Clean Up" in options. And that removes everything for which no engine could be detected.

  • I am running it now. It asked me which lists I want to clean, but are these verified, identified, or what?
  • Sven www.GSA-Online.de
    From all folders.
  • Brandon Reputation Management Pro
    After I clean my list, I reimport the list and get a lot of successful sites. I don't know why, but the cleaning is aggressive.
  • Could be your HTML timeout. The site is there but it's slow, maybe?
  • ron SERLists.com

    @Brandon - You lost me on your comment. Why do you need to re-import the list? And was your comment about the 'aggressive' cleaning a negative or a positive?

    I tried using this cleaner some time back, and after 1 day of running it had only got through maybe 5% of my verified list. I ended up stopping it because my LPM and link building were slowed down.

    I have a question for @sven:

    Let's say someone were to remove all duplicate domains in their verified list. And let's also say they have two posts on a Burning Board forum website, on two completely different pages. After removing duplicate domains, only one domain/URL is left. Will SER be able to find the second inner page that was deleted (or any other postable page on the website)? Or does SER need the exact URL of every inner page if it is going to post?

    The reason I ask all of this is that my number of verified entries has grown very large over time. Lots of dead links from 18 months ago, and lots of multiple posts on the same domain for junk tiers. I would love to trim the file (of course I already removed duplicate URLs). It was always my understanding that if you remove duplicate domains, you really are chopping off a bunch of legitimate links. I just want to make sure I checked in again before I did something rash.

  • edited January 2014
    @ron I was a bit concerned about removing duplicate domains myself (in the past). However, I don't think there is anything to worry about, because of how SER works. I assume all links in verified lists get trimmed to their last paths and checked for engines. For example:


    Links getting checked for an engine:

    In that case, it doesn't matter if you have 10 document links like this; removing the duplicate domains will still find your engine link.
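
    A rough Python sketch of that "trim to the last path" idea (purely illustrative; the function name is mine and this is not SER's actual code):

    from urllib.parse import urlparse, urlunparse

    def trim_to_last_path(url: str) -> str:
        """Drop the final document and query string, keep the enclosing folder."""
        parts = urlparse(url)
        # Keep everything up to and including the last "/" in the path.
        folder = parts.path.rsplit("/", 1)[0] + "/"
        return urlunparse((parts.scheme, parts.netloc, folder, "", "", ""))

    # Ten different document links can all collapse to one folder to check:
    print(trim_to_last_path("http://domain.com/articles/some-post?id=42"))
    # -> http://domain.com/articles/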

    However, there might be some cases where there are different platform installs in different folders, for example
    an article directory at "http://domain.com" and a forum installed on the same domain at "http://domain.com/forum/". There are 2 different link types in this case:
    an article link from "http://domain.com/article=ID" and a forum profile from "http://domain.com/forum/profile/user=ID".

    If you have only these 2 links in your sitelist and remove duplicate domains, it might keep only the article directory link (which is a closer path to the domain root), thus losing the forum link on the same domain.

    If this is the case, removing duplicate domains might shrink your sitelist. But unfortunately I do not know how this works, only @sven knows... though since Sven is a very smart guy, I assume he has thought of this situation.

    My solution for this problem would be: when checking for duplicate domains, always delete the one with the shortest path. In the above example that would mean DELETE "http://domain.com/article=ID" and KEEP the "http://domain.com/forum/profile/user=ID" link in the sitelist. Doing this always keeps the entry with the longest path for each domain in your sitelist.
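
    A minimal sketch of that "keep the deepest path per domain" idea (Python, and just my assumption of how such a de-duplication could work, not how SER actually does it):

    from urllib.parse import urlparse

    def dedupe_keep_deepest(urls):
        """Keep one URL per domain: the one with the most path segments."""
        best = {}  # domain -> (depth, url)
        for url in urls:
            parts = urlparse(url)
            depth = len([seg for seg in parts.path.split("/") if seg])
            if parts.netloc not in best or depth > best[parts.netloc][0]:
                best[parts.netloc] = (depth, url)
        return [u for _, u in best.values()]

    print(dedupe_keep_deepest([
        "http://domain.com/article=ID",
        "http://domain.com/forum/profile/user=ID",
    ]))
    # -> ['http://domain.com/forum/profile/user=ID']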

    Now, when SER grabs that link (http://domain.com/forum/profile/user=ID) it will check all paths for an engine.
    1. http://domain.com/forum/profile/ - no engine detected
    2. http://domain.com/forum/ - FORUM engine detected
    3. http://domain.com/ - ARTICLE engine detected
    So, it will post to both article and forum engines from this link.
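
    In code, the path-walking I'm imagining would look something like this (detect_engine() is only a placeholder for whatever fingerprinting SER really does; none of this is SER's actual code):

    from urllib.parse import urlparse

    def parent_paths(url):
        """Yield successively shorter folder URLs: .../forum/profile/, .../forum/, .../"""
        parts = urlparse(url)
        segments = [seg for seg in parts.path.split("/") if seg]
        # Drop the final segment (the document itself), then walk up towards the root.
        for i in range(len(segments) - 1, -1, -1):
            path = "/" + "/".join(segments[:i]) + ("/" if i else "")
            yield f"{parts.scheme}://{parts.netloc}{path}"

    def detect_engine(url):
        """Placeholder: a real check would fetch the page and fingerprint the platform."""
        return None

    for candidate in parent_paths("http://domain.com/forum/profile/user=ID"):
        engine = detect_engine(candidate)
        print(candidate, "->", engine or "no engine detected")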

    OK, those are my assumptions; I have a bit of programming knowledge, but really @sven should clear this up so we don't worry about it anymore.

    Cheers.
  • Sven www.GSA-Online.de
    @ron you can safely clear duplicate domains. Once the program has identified the correct engine, it can go to every part of the site to leave links.
  • @ron I'm not sure how many threads you are using when you clean up your list, but I learned you can set 1000 or more while doing it and it will work; just raise your timeout a little too, just in case. It goes through the list a hell of a lot faster.
  • Brandon Reputation Management Pro
    @ron here is an example. Here are screenshots of cleaning Moodle only:

    [screenshots: Clean Up results for Moodle]

    Save Unknown > Import URLs and sort in (import unknown URL file)

    Out of 191 URLs that were declared "bad", 55 were reimported and are OK.


    I don't know why it does this; I just know it's best practice to save the Unknown URLs from your initial Clean Up.
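
    For anyone who wants to sanity-check those "bad" URLs outside of SER, a small hypothetical re-check pass could look like this (Python, using the third-party requests library; the function and variable names are mine):

    import requests

    def recheck(urls, timeout=60, retries=2):
        """Retry URLs that a clean-up flagged, with a generous timeout and no proxy."""
        alive = []
        for url in urls:
            for _ in range(retries + 1):
                try:
                    r = requests.head(url, timeout=timeout, allow_redirects=True)
                    if r.status_code < 400:
                        alive.append(url)
                        break
                except requests.RequestException:
                    continue  # slow or flaky this time round, try again
        return alive

    # alive = recheck(unknown_urls_from_cleanup)  # re-import "alive" into SER afterwards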
  • @Brandon that can happen for a number of reasons. Some sites load very slowly, some are down; slow loading + your thousands of threads + a connection that can't handle all those threads makes them load even slower => timeout.
  • Brandon Reputation Management Pro
    @seowizzard I'm pretty sure (not 100% positive) that I've tested it with fewer threads. My timeout is 60 seconds on an extremely fast server.

    Regardless, my point was that best practice would be to reimport them, otherwise you're losing a lot of good URLs.
  • gooner SERLists.com
    @brandon - Did you use proxies for clean up? I suggest you don't.
  • Brandon Reputation Management Pro
    I use proxies the whole time. My point is that you'll lose sites if you just run the clean up. I'm sure running 1 thread with a 300-second timeout and 5 retries would eliminate that, but practically nobody does that.
  • gooner SERLists.com
    @brandon - You lose a lot fewer sites if you check without proxies. I've tested this a few times now and I will never use proxies for that again. Even with numerous retries and a high timeout it still deletes perfectly good links.
  • ron SERLists.com
    edited January 2014
    Thanks @Brandon @seowizzard @sven for your detailed explanations. That makes me a very happy guy to be able to do this safely.

    It is amazing how many duplicate URLs and domains I have accumulated in my list over 18 months. Not to mention the thousands/millions of dead links I have in my files.

    I think if you are going to use the clean list function, you must delete all the duplicate domains and URLs first. My actual unique list (after trimming) was only 20% of the original size; cleaning the full list would have been a huge waste of time.