Global Website Blacklist - What do you guys use?
Hey guys, just wanted to ask what everyone is using on their global website blacklist to filter out bad results and speed up checking and sending.
Personally, after running this for a while, here are the entries I've found help me the most:
*.gov
*.edu
*.xml
*.pdf
*.zendesk.com
Since I am only scraping US websites, I also include a list of domain extensions for anything outside the US, such as:
*.ca
*.eu
*.co.uk
*.uk
*.jp
*.vn
*.au
*.nl
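In case anyone wants to pre-filter a URL list outside the tool, here is a rough Python sketch of how entries like the ones above could be applied. It assumes each entry is a shell-style wildcard checked against both the hostname and the path of a URL, which may not be exactly how the tool itself matches. The names `BLACKLIST` and `is_blacklisted` are made up for this example, and it expects full URLs including the scheme.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Entries copied from the lists above.
BLACKLIST = [
    "*.gov", "*.edu", "*.xml", "*.pdf", "*.zendesk.com",
    "*.ca", "*.eu", "*.co.uk", "*.uk", "*.jp", "*.vn", "*.au", "*.nl",
]

def is_blacklisted(url, patterns=BLACKLIST):
    """Return True if the URL's hostname or path matches any pattern.

    Assumption: patterns are shell-style wildcards, so "*.gov" matches
    any hostname ending in ".gov" and "*.pdf" any path ending in ".pdf".
    """
    parts = urlsplit(url.lower())  # lowercase so matching ignores case
    host = parts.hostname or ""
    path = parts.path
    return any(
        fnmatchcase(host, pattern) or fnmatchcase(path, pattern)
        for pattern in patterns
    )

urls = [
    "https://www.nasa.gov/missions",
    "https://example.com/about",
    "https://example.com/sitemap.xml",
    "https://www.bbc.co.uk/news",
]
# Keep only the URLs that no blacklist entry matches.
kept = [u for u in urls if not is_blacklisted(u)]
```

Matching the hostname and the path separately keeps a file-type entry like `*.pdf` from depending on what the domain looks like, and a domain entry like `*.zendesk.com` from depending on the page path.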
What is everyone else doing?
Comments
!.aspx
!.gz
*.blogspot.com
*.hub.biz
!rss
!podcast