Is there a way to dedupe like this?

So far I've scraped and deduped with ScrapeBox. However, when it comes to removing duplicates, SB does the following:

List 1:
SITEA.com
SITEB.com

List 2:
SITEB.com
SITEC.com

After merging and removing dupes, the end file will be this:

List 3:
SITEA.com
SITEB.com
SITEC.com

However, is there also a way to dedupe like this:

List 1:
SITEA.com
SITEB.com

List 2:
SITEB.com
SITEC.com

-> dedupe ->
List 3:
SITEA.com
SITEC.com

In other words, a way to also remove the first occurring instance of a duplicate, so that no copy of a duplicated URL is left in the final file.

In the first example, there was a duplicate (i.e. 2x the same URL) but only one of those was removed in the final file.
In the second example, there was a duplicate (i.e. 2x the same URL) and both URLs were removed.
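
To make the second behaviour concrete, here is a rough Python sketch of what I mean. It runs outside ScrapeBox and is only an illustration: the function name `drop_all_duplicates` is made up, and comparing URLs case-insensitively after trimming whitespace is my own assumption. It merges the lists and keeps only the entries that occur exactly once overall.

```python
from collections import Counter


def drop_all_duplicates(*lists):
    """Merge the lists and keep only entries that occur exactly once overall.

    Assumption: entries are compared case-insensitively after trimming
    surrounding whitespace; blank lines are ignored.
    """
    merged = [url.strip() for lst in lists for url in lst if url.strip()]
    counts = Counter(url.lower() for url in merged)
    # Keep the original order, dropping every copy of anything seen 2+ times.
    return [url for url in merged if counts[url.lower()] == 1]


list1 = ["SITEA.com", "SITEB.com"]
list2 = ["SITEB.com", "SITEC.com"]
print(drop_all_duplicates(list1, list2))  # ['SITEA.com', 'SITEC.com']
```

Note that a URL repeated inside a single list would also be dropped entirely with this approach, not just URLs shared between the two lists.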

Hopefully that makes sense. If you want to know why I want to do this, feel free to ask; to keep this post from getting even longer, I'll leave that out for now.

Would be great if someone had an idea, thanks!