Remove duplicates improvement

    A small improvement that should be very useful, considering the number of .txt site list files:

    Remove duplicate URLs only for blog comments, trackbacks and image comments.
    Remove duplicate domains for all other CMS.
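
    The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not how GSA SER implements it: the engine names in URL_LEVEL_ENGINES and the function names are hypothetical, and URLs are assumed to include a scheme such as http://. Note that www.example.com and example.com count as different domains in this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical engine labels; the real site list categories may be named differently.
URL_LEVEL_ENGINES = {"blog comment", "trackback", "image comment"}


def dedupe_key(url: str, engine: str) -> str:
    """Return the value used to decide whether two entries are duplicates."""
    url = url.strip()
    if engine.lower() in URL_LEVEL_ENGINES:
        return url  # compare full URLs for comment-style engines
    # For every other engine, compare host names only.
    # Falls back to the raw string if no host can be parsed.
    return (urlsplit(url).hostname or url).lower()


def remove_duplicates(urls, engine):
    """Keep the first entry seen for each key, preserving the original order."""
    seen = set()
    kept = []
    for url in urls:
        if not url.strip():
            continue  # skip blank lines
        key = dedupe_key(url, engine)
        if key not in seen:
            seen.add(key)
            kept.append(url.strip())
    return kept
```

    With this sketch, a blog comment list keeps several pages from the same domain, while any other engine keeps only the first URL per domain.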


  • I like this, because from time to time I have trouble removing the dupes from the bigger files and GSA crashes.
    But I would like it even more to have a feature to remove dupes only for files that are larger or smaller than 10 MB (as an example).
  • Sven
    This was added long ago (but I forgot to update this thread).
  • With all due respect, Sven, it's been on my list to point this out too. A week ago it took me hours to dedupe my lists, bit by bit, on 2 machines.

    What would make it SO much faster is if we could select the text file names by dragging. Right now I have to click checkboxes for ages, 800+ of them, a few at a time, because if I select more than 50 the server grinds to a halt.

    It's just the selection process that needs improving. Select all, select none, and toggle are not enough!
    Yes, I agree. Make it selectable by main platforms and small platforms please, just like the left engines project window. Also allow us to use Shift and Ctrl for multiple selections.
    Just imported another 15 million URLs across 2 installs... can't wait to dedupe them!

    We just need a way to select groups of link lists (maybe by type, like directory etc.) so we don't have to spend all day clicking checkboxes, pretty pretty please :)
  • AlexR Cape Town
    @team74 - totally agree! 

    Check out:

    I also spend all day clicking and selecting projects. :-)
  • Maybe it's time to make a new thread? Right now I have to open the largest files in Notepad++ and dedupe them that way to try to lessen the load for GSA. Maybe I need to use Linux to do this task!
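
    For anyone deduping the largest files outside GSA, a short script can do it without opening the file in an editor. The following is a rough sketch, not a GSA feature: the function name dedupe_file is made up for illustration, it treats lines as duplicates only when they match exactly, and it keeps every unique line in memory, so extremely large lists may still need a different approach.

```python
import os
import tempfile


def dedupe_file(path: str) -> int:
    """Rewrite `path` without duplicate lines; return how many duplicates were dropped."""
    seen = set()
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same folder, then swap it in at the end,
    # so the original list is untouched if the script is interrupted.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", errors="surrogateescape") as out, \
            open(path, encoding="utf-8", errors="surrogateescape") as src:
        for line in src:  # read line by line instead of loading the whole file
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            if entry in seen:
                dropped += 1
                continue
            seen.add(entry)
            out.write(entry + "\n")
    os.replace(tmp_path, path)
    return dropped
```

    Calling dedupe_file on a site list rewrites it in place, keeping the first occurrence of each line in its original order.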
  • AlexR Cape Town
    The other option would be if we could group the projects in the SER project folder, i.e. group the .prj files.
