GSA adds new duplicate target URLs and "Remove Duplicate URLs" doesn't work on them

Hi,

I'm testing my newly scraped list and I noticed some weird things. What I do is this: I created a template project, which I duplicate (only options and data), and then I import the target URL list (the list I want to test) and the accounts data. My list consists of about 44k sites, and it contains unique URLs only (not unique domains, just unique URLs).

First of all, here are the settings I use:
image

When I set the template's status to active (the target URL list was empty) and started GSA, it added around 200 redirect URLs to the target URL list and started building links on them.

GSA starts running and everything goes okay for a few minutes. Then I go to "Show remaining target URLs" and see that there are 46k targets now, some of them duplicates.

I try to remove the duplicate URLs and get this response:
image

Then I check the target URL list again:
image
As you can see, it doesn't delete the duplicate URLs.

After 5-10 minutes, I check the target URL list again and I see there are 55k sites now:
image
Again, I delete the duplicates:
image
and check target list again:
image

Even though GSA removed some duplicate targets, there are still some duplicates left, and I don't understand where it gets those extra targets from, since I have search engines disabled.

I had this problem in the past too, so it definitely isn't a bug introduced by a new version.

I would really appreciate it if you could briefly explain to me why all this happens. Am I missing something in the settings?

Thanks in advance!

Comments

  • edited June 2014
    Hi.
    How are you removing duplicates?

    You could try this: enable just one engine and run your experiment with only that engine, say the most frequent one in your list.


  • I delete duplicate URL's from Right click > Import target URL's >  Delete Duplicate URLs

    Well, even without adding my own list, even if I just make a new project and set it to active without adding any target URLs, GSA adds some URL shortener/redirect targets to it and starts building links.
  • edited June 2014
    Well, yes, there are some engines with this behaviour. They are mostly web 2.0 engines, fixed redirect sites, and so on. ;) Hence my suggestion to try with one type of engine.
  • You probably have scheduled posting enabled. If so, that is what fills up your cache, because it registers an account multiple times if you have set it to do so.
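
To check outside GSA whether a list really contains only unique URLs, and how many unique domains it covers, something like the Python sketch below might help. It is only an illustration: the `normalize` rule (lowercased host, no trailing slash, no fragment) is an assumption and not necessarily what GSA's "Delete Duplicate URLs" does, and the example.com/example.org targets are made-up placeholders.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Assumed normalization: lowercase host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def domain(url: str) -> str:
    """Key for domain-level deduplication: the lowercased host part."""
    return urlsplit(url.strip()).netloc.lower()


def dedupe(urls, key=normalize):
    """Keep the first URL seen for each key, preserving the input order."""
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        k = key(url)
        if k not in seen:
            seen.add(k)
            unique.append(url)
    return unique


# Hypothetical sample targets, not taken from the thread.
targets = [
    "http://example.com/page",
    "http://Example.com/page/",
    "http://example.com/other",
    "http://example.org/page",
]

by_url = dedupe(targets)                 # unique URLs
by_domain = dedupe(targets, key=domain)  # unique domains

print(len(targets), len(by_url), len(by_domain))
```

Comparing the counts before and after importing the list would show whether the extra targets come from the list itself or are added by the engines while the project runs.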