Global Website Blacklist - What do you guys use?

Hey guys, just wanted to ask what everyone is using on their global website blacklist to filter out bad results and speed up checking and sending.

Personally, after running this, here are the most common entries that have helped me the most:
*.gov
*.edu
*.xml
*.pdf
*.zendesk.com

Since I am only scraping US websites, I also include the domain extensions for anything outside the US, such as:
*.ca
*.eu
*.co.uk
*.uk
*.jp
*.vn
*.au
*.nl
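
For anyone who wants to pre-filter a scraped list outside the tool, here is a rough Python sketch of how the entries above could be applied. This is only my guess at the matching rules, not how the software itself does it: I assume each `*.something` entry is a shell-style wildcard tested against both the hostname and the path, and that every URL includes its scheme (http:// or https://). The names `BLACKLIST`, `is_blacklisted` and `filter_urls` are made up for the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Entries copied from the lists above.
BLACKLIST = [
    "*.gov", "*.edu", "*.xml", "*.pdf", "*.zendesk.com",
    "*.ca", "*.eu", "*.co.uk", "*.uk", "*.jp", "*.vn", "*.au", "*.nl",
]


def is_blacklisted(url, patterns=BLACKLIST):
    """Return True if the URL's hostname or path matches any pattern.

    Assumption: patterns are shell-style wildcards, so "*.gov" matches a
    hostname ending in ".gov" and "*.pdf" matches a path ending in ".pdf".
    """
    parts = urlsplit(url.lower())
    host = parts.hostname or ""  # None when the URL has no scheme
    path = parts.path
    return any(fnmatchcase(host, p) or fnmatchcase(path, p) for p in patterns)


def filter_urls(urls, patterns=BLACKLIST):
    """Keep only the URLs that do not match the blacklist."""
    return [u for u in urls if not is_blacklisted(u, patterns)]


if __name__ == "__main__":
    scraped = [
        "https://www.nasa.gov/news",
        "https://example.com/report.pdf",
        "https://support.zendesk.com/hc",
        "https://example.com/about",
    ]
    for url in filter_urls(scraped):
        print(url)
```

Checking the hostname and the path separately is a deliberate choice here: a country extension like `*.ca` should be caught no matter what page follows the domain, while a file type like `*.pdf` only shows up at the end of the path.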

What is everyone else doing?

Comments

  • A few more patterns I have noticed:

    !.aspx
    !.gz
    *.blogspot.com
    *.hub.biz
    !rss
    !podcast