
Feature Request: remove old non-working domains from our global list

Hi,

I have been using GSA SER for the past 2 years. In that time, as you can imagine, I gathered decent-sized databases; the problem now is that they are too big for me. I need a way to delete [Download Failed] websites. 80%+ of my sites come back with this message because my database is old and it just saved everything. The thing is that it takes me too much time to work with the global lists; it should be faster, but not in my case...
So it would be great if there were an option that checks the platform of each target URL: if it is still phpFox, for example, it stays in the list; if it gets a download failed message and it's not due to a proxy issue, then it gets removed.

My identified list is 298+ MB, and yes, I DID remove duplicate URLs and duplicate domains as well before checking the identified list size!

Let me know what you think @Sven and others.
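
A hedged sketch of what such an option might do (this is not SER's actual implementation; the file name and the "phpfox" footprint below are purely illustrative assumptions, and proxy-error handling is left out): download each URL from a per-engine list, drop it if the download fails, and drop it if the page no longer matches the engine it was identified as.

    # Illustrative clean-up of one per-engine identified list.
    # SITE_LIST and FOOTPRINT are assumptions, not real SER names.
    import requests

    SITE_LIST = "identified_phpfox.txt"              # hypothetical per-engine .txt file
    FOOTPRINT = "phpfox"                             # crude engine footprint for the check

    def still_valid(url):
        try:
            r = requests.get(url, timeout=15)
        except requests.RequestException:
            return False                             # download failed -> remove from list
        return r.ok and FOOTPRINT in r.text.lower()  # engine footprint gone -> remove too

    with open(SITE_LIST, encoding="utf-8", errors="ignore") as f:
        urls = [line.strip() for line in f if line.strip()]

    alive = [u for u in urls if still_valid(u)]

    with open(SITE_LIST, "w", encoding="utf-8") as f:
        f.write("\n".join(alive) + "\n")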

Comments

  • Sven www.GSA-Online.de
    That's on the to-do list already... will try to move it up in priority ;)
  • +1 for me too
    maybe something like a re-verification of URLs elsewhere in SER:
    a ping to all sites in the global list, by category (identified, submitted, verified),
    a simple ping to each site
    started manually as needed
  • @eLeSlash

    While a built-in SER version of a live check would surely be fastest and best,
    there is an instant working solution for SB owners that I just tested (still running) after the above post:

    use the live check addon from SB,
    load the URL lists (from identified, etc.) one by one, one engine at a time,

    run the live check with HIGH connections (it may be 2-3 times faster than submissions or scrapes),
    run,
    then save / overwrite the LIVE links to the original file in SER

    one file at a time.
    My list is only about 3 months old, and most of it is from ScrapeBox in the recent 4-6 weeks,
    yet some 35-50% of the URLs are dead.

    By tomorrow I may be finished, and I will let you know the LpM improvements in efficiency.

    It takes many operations because you have to go file by file,
    but it is an instant solution until the same feature is available as a one-click verify in SER.
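
    A rough Python equivalent of that file-by-file live-check loop (an assumption-laden sketch, not what SB or SER actually do; the folder path and connection count are made up): walk the per-engine .txt lists, check each URL with many parallel connections, and overwrite each file with only the live links.

        # Illustrative live check over per-engine site-list files.
        import os
        import concurrent.futures
        import requests

        SITELIST_DIR = r"C:\GSA\identified"            # hypothetical identified-list folder
        CONNECTIONS = 100                              # "high connections"; tune to taste

        def is_alive(url):
            try:
                return requests.head(url, timeout=10, allow_redirects=True).status_code < 500
            except requests.RequestException:
                return False

        for name in os.listdir(SITELIST_DIR):
            if not name.endswith(".txt"):
                continue
            path = os.path.join(SITELIST_DIR, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                urls = [line.strip() for line in f if line.strip()]
            with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
                keep = [u for u, ok in zip(urls, pool.map(is_alive, urls)) if ok]
            with open(path, "w", encoding="utf-8") as f:   # overwrite with live links only
                f.write("\n".join(keep) + "\n")
            print(f"{name}: kept {len(keep)} of {len(urls)}")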
  • edited October 2013
    Thing is, I have had GSA SER for 2 years now. You can imagine a lot of things changed; I have 443 .txt files in my identified folder.
  • Brandon Reputation Management Pro
    Same with me @eLeSlash, I combine the lists from all of my servers and have the same problem.

    I would prefer to see the sites removed from SER if they fail X times. After X fails they're removed from Verified Site Lists.
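
    A toy sketch of that "remove after X fails" rule (the threshold, data layout, and reset-on-success behaviour are assumptions here, not SER internals): keep a consecutive-failure counter per domain and drop the domain's URLs from the verified list once the counter hits the limit.

        # Illustrative per-domain failure counter for pruning a verified list.
        from urllib.parse import urlparse

        MAX_FAILS = 3
        fail_counts = {}                        # domain -> consecutive failed attempts
        verified = {"http://blog.example.com/post", "http://forum.example.org/thread"}

        def record_attempt(url, success):
            domain = urlparse(url).netloc
            if success:
                fail_counts[domain] = 0         # any success resets the counter
                return
            fail_counts[domain] = fail_counts.get(domain, 0) + 1
            if fail_counts[domain] >= MAX_FAILS:
                for u in [v for v in verified if urlparse(v).netloc == domain]:
                    verified.discard(u)         # drop every verified URL on that domain

        for _ in range(3):                      # three failed posts in a row...
            record_attempt("http://blog.example.com/post", success=False)
        print(verified)                         # ...and only forum.example.org remains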
  • If you have SB, did you do a live check as mentioned above??

    Last night I did most of mine using the SB live check addon;
    depending on the engine, up to 80-90% were dead,
    about 35-40% on average.

    It's a solution while waiting for Sven to have time to implement live checks of the global lists in SER.
  • Sven www.GSA-Online.de
    next version has this
  • *thumbs up*
  • Great addition
  • MrX Germany
    Nice @Sven <3
  • Brandon Reputation Management Pro
    Working great @Sven!

    One question...I have hundreds of thousands of URLs for some platforms. If I tell it to recheck 3 times, will it do all three checks immediately?

    My concern is that if a site is overloaded and doesn't respond now, it might respond in a few hours after the other checks are done. Is there a way to check once, then move it to the back of the line if it fails?
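
    The "back of the line" idea could look roughly like this (purely illustrative; this is not SER's scheduler): a failed URL is re-queued at the end with its attempt count instead of being rechecked right away, so other targets get checked in between.

        # Illustrative deferred-retry recheck queue.
        from collections import deque

        MAX_ATTEMPTS = 3

        def recheck(urls, check):
            queue = deque((u, 1) for u in urls)         # (url, attempt number)
            alive = []
            while queue:
                url, attempt = queue.popleft()
                if check(url):
                    alive.append(url)
                elif attempt < MAX_ATTEMPTS:
                    queue.append((url, attempt + 1))    # retry after everything else
                # else: give up on this URL
            return alive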
  • Sven www.GSA-Online.de
    It does all checks one after the other.
  • Man... it's going to take HOURS to run through my global lists.
  • Sven www.GSA-Online.de
    Guess people always have something to argue about ;/
  • edited October 2013
    I'm not arguing. I hope you didn't take it that way. I love this feature. I already knew it was going to take a long time when you implemented it lol.
  • donchino https://pbn.solutions
    It ran all night and I aborted it in the morning, unticked the identified list in options (and moved all the files away from the original folder), so it would only clean up the successful and verified lists. Now it has run all day and the number of processed links is larger than my successful and verified lists together (checked from Tools -> Statistics). I wonder what it is cleaning up now.. the whole internet lol
  • donchino https://pbn.solutions
    Cause the yellow status bar is showing only 3mm out of 10cm.. so it's def cleaning up the whole internet...
  • donchino https://pbn.solutions
    @Sven I checked now, and under statistics my Successful list has gone up by 20k during the clean-up process. How is this logical?
  • Sven www.GSA-Online.de
    @donchino that's not logical at all, no. Do you use different folders for each site list?
  • donchino https://pbn.solutions
    I haven't changed anything from default setup.
  • donchino https://pbn.solutions
    @Sven to add here.. I was using "disable proxies" and retry 3 times. After the clean-up (which I aborted in the middle), I ran remove duplicates again and it removed 2-3k duplicate URLs and domains. But I had already run it before the clean-up too, so the clean-up process was creating duplicates for some reason. Still, I have 20k more successful entries after running the clean-up.
  • Brandon Reputation Management Pro
    Mine is running fine; it's been running for about 24 hours and is still going. I tested it and verified it's working by importing all verifieds (192k), then checking again this morning (102k), so it removed about 90k non-working domains overnight.
  • donchino https://pbn.solutions
    @Brandon but it didn't create any duplicates for you?

    I don't see why it should work differently for me, as I haven't made any custom modifications to SER; everything is set up as default.
  • Brandon Reputation Management Pro
    @donchino duplicates of what? and where?
  • donchino https://pbn.solutions
    edited October 2013
    It is adding some duplicate URLs and domains to the site lists (saw them when running "remove duplicates" after the clean-up). Now I am running only the verified list, as it is the smallest, and it seems it will get it done. Will also try running the successful one alone.
  • Just wondering if this might work better if it was done as part of the submission process rather than as a batch job?

    i.e. for verified links GSA will check them a few times, and if they persistently fail they will be removed. Could we not just check if a domain is failing each time we try to post to it, and remove it after $x retries?
  • BrandonBrandon Reputation Management Pro
    Duplicates are created through the searching process all day long which is why it's so important to remove dupes on a regular basis.
  • Sven www.GSA-Online.de
    @namdas no, this will not work, as it would require the program to parse the files each time it detects a failed submission. That would cause a total loss in speed.