Duplicate URLs

Hello,

I have a question regarding GSA SER ...

When GSA SER removes duplicate URLs, does it remove them from each .txt file containing the URLs individually?

Or does it remove the duplicate URLs across all files in the "Verified URLs" folder?

Allan

Comments

  • SvenSven www.GSA-Online.de
    Accepted Answer
    It removes dupes per file.
  • Accepted Answer
    GSA indexes duplicate content, which takes up additional license count. 
    GSA considers each URL a unique entity (unless the URLs are exactly the same) and compares their content checksums to find duplicates. 
    Remove URLs that have duplicate content.
    GSA has infinite space detection (Crawl and Index > Duplicate Hosts) to configure the number of identical documents required to detect duplicate content. It can also be used to detect repetitive path or query strings. Infinite space detection happens during crawling, so if a URL has already been crawled, a recrawl is necessary before its content is deleted.
    If two content servers serve the same content under different URLs, they can be defined as duplicate hosts (Crawl and Index > Duplicate Hosts). 
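To make the "per file" behavior concrete, here is a small Python sketch (an illustration, not GSA SER's actual code): each .txt file in the folder is de-duplicated on its own, so the same URL can still appear in two different files afterwards. The function names are hypothetical.

```python
from pathlib import Path

def dedupe_file(path: Path) -> int:
    """Remove duplicate lines from one file, preserving first-seen order.
    Returns the number of duplicate lines dropped."""
    lines = path.read_text().splitlines()
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    path.write_text("\n".join(unique) + "\n")
    return len(lines) - len(unique)

def dedupe_folder(folder: Path) -> None:
    # Per-file dedup: every .txt file is cleaned independently, so a URL
    # present in two different files is kept once in each file.
    for txt in sorted(folder.glob("*.txt")):
        dropped = dedupe_file(txt)
        print(f"{txt.name}: removed {dropped} duplicate URLs")
```

Running this over a folder of URL lists removes repeats within each file only; removing duplicates across all files would require collecting the URLs from every file into one shared set first.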