500k URLs, 500 verified

I scraped 500k URLs using ScrapeBox and imported them into SER. Captcha solving: CB only, no manual captcha service. 100 threads, 10 private proxies. 12 hours later the list was done, with only 500 verified. Does this sound right?


  • Of course you remembered to remove all the duplicate URLs, right? Also, if I'm not mistaken, you should remove duplicate domains for everything but blog comments and image comments. If you haven't done either, then the result makes sense, as you get tons of duplicates when scraping.
  • Hard to say. It depends on what footprints you used to scrape, whether you cleaned the list beforehand, how many URLs you were promoting, etc.
  • edited January 2014
    I removed duplicate URLs but not duplicate domains. I scraped for articles, wikis, and social. I was promoting 20 URLs. I was trying to build web 2.0s pointing to my web 2.0s, if that makes any sense.
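
For anyone who wants to check how much of a scraped list is duplicates before importing it, here is a rough Python sketch of the two clean-up steps mentioned above (duplicate URLs, then duplicate domains). This is not how ScrapeBox or SER do it internally; the function names and sample URLs are made up for illustration, and it assumes every URL includes its scheme (http:// or https://).

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Drop exact duplicate URLs, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def dedupe_domains(urls):
    """Keep only the first URL seen for each host.

    Assumption: a leading "www." is treated as the same host.
    URLs without a scheme or host are skipped.
    """
    seen = set()
    unique = []
    for url in urls:
        cleaned = url.strip()
        try:
            host = urlsplit(cleaned).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host is None:
            continue
        host = host.lower()
        if host.startswith("www."):
            host = host[4:]
        if host not in seen:
            seen.add(host)
            unique.append(cleaned)
    return unique


# Hypothetical sample list, standing in for a scraped file.
scraped = [
    "http://example.com/wiki/a",
    "http://example.com/wiki/a",
    "http://www.example.com/wiki/b",
    "https://example.org/articles/1",
]

print(len(dedupe_urls(scraped)))     # 3
print(len(dedupe_domains(scraped)))  # 2
```

Per the advice above, the domain pass would only be applied to lists for engines other than blog comments and image comments, where several URLs on the same domain can each be a separate target.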
