When we use SB and/or other tools to scrape URLs to add into GSA, is there a certain format we have to adhere to? For example, do we need to trim to root or append anything before adding?
Definitely don't trim to root, but you should almost always remove duplicate domains. If you trim to root you will miss things like http://www.example.com/pligginstall/. Removing duplicate domains will lose /pligginstall2/ and /jcowinstall/ on that same domain, but it's unlikely that someone has installed a bunch of open source packages on one site, so chasing those isn't worth the time IMO.
It can be beneficial to run your scrapes in separate batches, so each batch can be deduped the right way. According to Sven, you should remove duplicate URLs for blog & image comment sites and remove duplicate domains for everything else.
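If you'd rather clean a list outside of SB, here is a rough Python sketch of the two dedupe modes mentioned above. It isn't how SB or GSA do it internally; the function names `dedupe_urls` and `dedupe_domains` are made up for illustration, and it assumes every line is a full URL with a scheme (http:// or https://). It also treats www.example.com and example.com as the same domain, which is my assumption, not a rule from Sven.

```python
from urllib.parse import urlsplit


def dedupe_urls(urls):
    """Keep the first occurrence of each exact URL (blog / image comment lists)."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def dedupe_domains(urls):
    """Keep the first URL seen per domain, with its full path left untouched."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        host = urlsplit(url).netloc.lower()
        # Assumption: "www.example.com" and "example.com" count as one domain.
        if host.startswith("www."):
            host = host[4:]
        # Lines without a scheme give an empty host and are skipped here.
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


if __name__ == "__main__":
    scraped = [
        "http://www.example.com/pligginstall/",
        "http://www.example.com/pligginstall2/",
        "http://example.com/jcowinstall/",
        "http://blog.example.org/post-1",
        "http://blog.example.org/post-1",
    ]
    print(dedupe_domains(scraped))  # one URL per domain, path kept
    print(dedupe_urls(scraped))     # only exact repeats removed
```

Note that `dedupe_domains` keeps the first URL as scraped (path included) rather than trimming it to root, so the install path is still there when you import the list.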