
GSA SER URL scraper question

When I use the URL scraper (search online for URLs), does it include a URL in the 'Identified' count only if it's a new URL (not already in a global list)? Or are all scraped URLs counted?

Comments

  • Sven www.GSA-Online.de
    edited October 2014
    It does not check for duplicates. It adds it as a new entry. You have to de-dupe afterwards.
  • So duplicates can be present even within the scrape? (Let's say I have scraped 1000 identified URLs: are they unique or not?)
  • Sven www.GSA-Online.de
    The scraped URLs should be unique, though they might already be in the site lists.
  • Cool. Is it showing unique domains (except for blog posts etc.) or just unique URLs?
    Also, when scraping, SER visits the sites to determine which engine each one runs on, right?
  • Sven www.GSA-Online.de

    1) unique URLs

    2) yes

  • OK. A lot of forum users recommend scraping with ScrapeBox because they say it's faster. Now I can see why: ScrapeBox doesn't visit the URLs and doesn't return unique URLs in the scrape, but SER has to do that work anyway when importing. So scraping with ScrapeBox doesn't seem to make sense; it just adds an extra step (importing). What are your thoughts on this, @Sven?
  • Sven www.GSA-Online.de
    Well, to be honest, I also don't see why ScrapeBox would be a better option. I can't really see why it would be faster either.
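For the "de-dupe afterwards" step, and to show the difference between unique URLs and unique domains, here is a minimal generic Python sketch. It is not SER's own de-dupe feature. It assumes you have exported the scraped URLs and your existing site list as plain Python lists of URL strings, and the helper names `url_key`, `domain_key` and `new_unique` are hypothetical. It keeps only scraped URLs that are neither already in the site list nor repeated earlier in the scrape, comparing either whole URLs or just hostnames.

```python
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    # Hypothetical normalisation: lowercase scheme and host and drop any
    # "#fragment", so trivially different spellings of one URL compare equal.
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{parts.path}{query}"


def domain_key(url: str) -> str:
    # Hostname only, so every page on a site maps to the same key.
    # Note: "www.example.com" and "example.com" are treated as different.
    return urlsplit(url.strip()).hostname or ""


def new_unique(scraped, site_list, key=url_key):
    """Return scraped URLs whose key is neither in site_list nor seen earlier in scraped."""
    seen = {key(u) for u in site_list}
    fresh = []
    for u in scraped:
        k = key(u)
        if k not in seen:
            seen.add(k)
            fresh.append(u.strip())
    return fresh
```

With the default `key=url_key` you get unique URLs (several pages of one forum can all survive); passing `key=domain_key` keeps only the first URL seen per host, which is closer to "unique domains".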