Scraping questions

1. Do you guys de-dupe at the URL level or the domain level?
2. How do you import these links into GSA SER to yield the highest number of verified links? Do you import directly into 5 projects and just let it run?
3. Do you have a dedicated machine just for processing these links?
4. If not for the Google proxy ban, are URL footprints always better than word footprints?


  • SvenSven

    1. You can do both by leaving the checkboxes at their defaults. It will remove duplicate domains for engines where the exact URL is not relevant, and remove duplicate URLs for engines like blog comments, keeping the URLs with comment forms intact.

    2. Importing directly improves speed.

    3. no, not me

    4. yes
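
The URL-level vs. domain-level distinction from point 1 can be sketched roughly like this (a minimal illustration only, not GSA SER's actual internals; the function names and sample links are made up for the example):

```python
from urllib.parse import urlparse

def dedupe_urls(urls):
    """URL-level de-dupe: keep each distinct URL once, preserving order."""
    seen = set()
    return [u for u in urls if not (u in seen or seen.add(u))]

def dedupe_domains(urls):
    """Domain-level de-dupe: keep only the first URL seen per domain."""
    seen = set()
    out = []
    for u in urls:
        domain = urlparse(u).netloc.lower()
        if domain not in seen:
            seen.add(domain)
            out.append(u)
    return out

links = [
    "http://example.com/post1",
    "http://example.com/post2",
    "http://other.org/register",
    "http://other.org/register",
]

# URL-level keeps both example.com pages (useful for blog comments);
# domain-level keeps one link per site (useful for registration engines).
print(dedupe_urls(links))
print(dedupe_domains(links))
```

For blog-comment style engines you want the URL-level behavior, since every page is a separate comment target; for engines where you register once per site, the domain-level behavior avoids wasted attempts.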
