
Comparing lists

I'm not sure whether there is a free app, or an app for GSA users, but I noticed that GSA's remove-dupes is much faster than the Scrapebox one. I run into trouble when I have to remove URLs containing something, or compare lists at domain level, when the list is bigger than 1 million records: that's the Scrapebox limitation. It's also a problem when the first list is smaller than 1 million but the other lists I'm comparing against have a few million records; Scrapebox hangs all the time in that case. Can you point me to an app for this? I guess GSA's remove-duplicates module as a separate add-on could help here.
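For anyone landing here with the same problem, below is a rough Python sketch of the kind of compare described above. It is not a GSA or Scrapebox feature, just one possible approach, and the names `domain_of`, `filter_urls`, and the file names are made up for illustration. It assumes one URL per line and that the "remove" list fits in memory as a set of domains; the big list is streamed line by line, so it is never loaded whole. Domain matching here is only on the hostname with a leading `www.` stripped, which is an assumption about what "domain level" should mean.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    url = url.strip()
    # Lines without a scheme (e.g. "example.com/page") need a "//" prefix,
    # otherwise urlsplit treats the host as part of the path.
    if not url.lower().startswith(("http://", "https://", "//")):
        url = "//" + url
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # Malformed line (e.g. a broken IPv6 literal): treat as no domain.
        return ""
    return host[4:] if host.startswith("www.") else host


def filter_urls(main_path, remove_path, out_path, banned_substrings=()):
    """Copy main_path to out_path, skipping URLs whose domain appears in
    remove_path or that contain any of banned_substrings.
    Returns the number of URLs kept."""
    # Only the domains of the "remove" list are held in memory.
    with open(remove_path, encoding="utf-8", errors="ignore") as f:
        banned_domains = {domain_of(line) for line in f if line.strip()}
    banned_domains.discard("")

    kept = 0
    with open(main_path, encoding="utf-8", errors="ignore") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:  # streamed, so the big file is never fully loaded
            url = line.strip()
            if not url:
                continue
            if domain_of(url) in banned_domains:
                continue
            if any(s in url for s in banned_substrings):
                continue
            dst.write(url + "\n")
            kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
# filter_urls("big_list.txt", "remove_list.txt", "cleaned.txt",
#             banned_substrings=("/tag/", "?replytocom="))
```

Since set lookups are constant time on average, the running time should grow roughly linearly with the size of the big list, and memory use depends mainly on how many distinct domains are in the remove list.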

Comments

  • Any ideas on that?
  • goonergooner SERLists.com
    Scrapebox DupRemove addon can handle files larger than 1 million lines.
  • Yes, I know that, but only for the purpose of removing duplicates (btw the GSA remove dupes is much faster). It cannot remove URLs at domain level or URLs containing certain strings.
  • I'm using GScraper to remove dups.
  • goonergooner SERLists.com
    @busek - I'm not sure what you mean exactly. SB and GS do it, and I have also used SER dup remove with sitelists of up to 7.5 million... It takes a long time but it works.
  • Thanks guys for the answers, but as I wrote in my previous 2 posts, I am not looking to remove dupes but to compare two lists and remove URLs from one of them.
  • goonergooner SERLists.com
    @busek - Ohhhhh then you face a whole other challenge. As far as I am aware there is no commercially available software that can do that. I was searching for it as well.
  • Use the Compare plugin in Notepad++.
  • I think Text Wedge does this, but as ^^ says Notepad++ is probably your best bet.
  • But have you tried doing it in Notepad++? I can barely compare two lists containing a few thousand results. I also know that even opening a file with a few million results is hard. But I ran it with a list of under one million, and the second list had only one keyword to compare. When I tried to compare them I got:

    image