
Large discrepancy between total verified and verified lists

edited October 31 in Need Help
I'm a little confused as to why there's such a huge difference in the total number of URLs verified by my project and the total number of URLs in my verified sitelist directory files.

I scraped and filtered lists of target URLs by engine to push through GSA and build verified lists for another project. My identified lists contained around 500k target URLs, and after running the project for around 24 hours GSA shows 63k verified URLs in total. This is the only GSA project I currently have, and it's set up to create 1 link per URL.

I then imported my verified directory into ScrapeBox to double-check that everything was good with the verified lists, but there are fewer than 8,000 URLs across all the sitelist files.

Anyone have any idea why the actual verified lists are missing so many URLs?
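
For anyone wanting to run the same check without ScrapeBox, here is a rough Python sketch that counts unique URLs and unique domains across a folder of list files. It assumes the files are plain text with one URL per line and a `.txt` extension; the function name `summarize_sitelists` and the folder path in the usage comment are made up for illustration, not anything GSA provides.

```python
from pathlib import Path
from urllib.parse import urlsplit


def summarize_sitelists(directory):
    """Return (unique_url_count, unique_domain_count) for all .txt files in a folder.

    Assumption: each file holds one URL per line.
    """
    urls = set()
    domains = set()
    for path in Path(directory).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            url = line.strip()
            if not url:
                continue  # skip blank lines
            urls.add(url)
            try:
                host = urlsplit(url).hostname
            except ValueError:
                host = None  # malformed URL, count it as a URL but not a domain
            if host:
                domains.add(host)
    return len(urls), len(domains)


# Hypothetical usage; point it at wherever your verified lists are saved:
# url_count, domain_count = summarize_sitelists(r"C:\path\to\verified")
# print(url_count, "unique URLs on", domain_count, "unique domains")
```

Comparing the two numbers against the project's verified total and its unique-domain count shows whether the gap comes from repeat posts to the same URLs or domains.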


Comments

  • sickseo London, UK
    Have you removed duplicate domains from your site list? Your screenshot is showing lots of duplicate posts as it's only posted on 5444 unique domains.

    What did you use scrapebox to check? If the links were live or if there were duplicates?
  • edited October 31
    sickseo said:
    Have you removed duplicate domains from your site list? Your screenshot is showing lots of duplicate posts as it's only posted on 5444 unique domains.

    What did you use scrapebox to check? If the links were live or if there were duplicates?
    Duplicate URLs were removed from the target lists, since the project that will use the verified lists limits posting by domain. However, I did make a mistake on the project producing those results: I didn't notice that "per URL" was checked.

    I spotted the "per URL" issue about 12 hours ago and disabled it, but the verified lists still only have a total of 8,300 URLs, even though an additional 20k URLs have been verified since then.
  • sickseo London, UK
    Removing duplicate urls and removing duplicate domains are 2 different things. So make sure to run the "remove duplicate domains" option and make sure that all the right engines are selected before processing it. There are some engines that aren't selected by default.

    The identified list with 500k targets is seriously big. If you've scraped these then I would not expect anywhere near 100% success rate. Scraping normally yields 1% verified links, depending on footprints. So anything above this is considered good.
  • sickseo said:
    Removing duplicate urls and removing duplicate domains are 2 different things. So make sure to run the "remove duplicate domains" option and make sure that all the right engines are selected before processing it. There are some engines that aren't selected by default.

    The identified list with 500k targets is seriously big. If you've scraped these then I would not expect anywhere near 100% success rate. Scraping normally yields 1% verified links, depending on footprints. So anything above this is considered good.
    I’m aware. I intentionally left duplicate domains in, so the target lists were only deduplicated down to unique URLs for different pages. Everything was pre-filtered and sorted by PI.

    The issue I’m trying to convey is that I expected every unique URL with a verified link to be added to the verified lists if it wasn't already there. However, that does not appear to be the case.

    Does anyone know what determines if and when a URL is added to a verified list?
  • sickseo London, UK
    I would expect some engines such as blog comments to have unique urls added as each url can be posted on, so each url would have to be added.

    Other engines such as articles shouldn't work that way as you only need the domain added to the verified list once for the software to post on that domain. There is no benefit in adding multiple article urls from the same site to the verified list.

    This is why you should remove duplicate domains for every engine except blog comments and the few others that are already deselected when you use this option.
  • Got it, thanks
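
To make the URL vs domain distinction from this thread concrete, here is a small Python sketch. It is only an illustration of the idea, not how GSA implements its "remove duplicate URLs" or "remove duplicate domains" options; the function names and sample URLs are made up.

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Keep the first occurrence of each exact URL."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def dedupe_domains(urls):
    """Keep only the first URL seen for each hostname."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None or host in seen:
            continue  # no usable hostname, or domain already kept
        seen.add(host)
        kept.append(url)
    return kept


# Made-up sample: 4 lines, 3 distinct URLs, 2 distinct domains.
targets = [
    "http://example.com/page-1",
    "http://example.com/page-2",
    "http://example.com/page-1",
    "http://example.org/post",
]

print(len(dedupe_urls(targets)))     # 3
print(len(dedupe_domains(targets)))  # 2
```

A list that has only been through URL-level deduplication can still hold many pages per domain. That suits engines like blog comments, where each page is its own target, while for engines like articles one entry per domain is enough.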