anyone know of a bulk (multimillion) URL level compare function tool other than xr?
 googlealchemist                
                
                    Anywhere I want
googlealchemist                
                
                    Anywhere I want                
            
                    The url level compare function that scrapebox has, to comepare a new list to an old scrub list to avoid processing any of the same urls again. I need something that can do this with two many million url lists. I can only find tools that do it on a domain level, I need something for specific url level as I dont want to lose inner url of many domains like comments.
thanks
                
                            thanks
Comments
~$ sort urls.txt > sortedurls.txt
~$ uniq sortedurls.txt > dedupedurls.txt
That's it. No regex or sed needed unless you want to trim the urls in some way. You can download cygwin and do this on a windows box too.