[3 Feature Requests] Bad Words - Making It Super Useful!
AlexR
Cape Town
I've been analysing my log files today and noticed that a massive number of sites are getting thrown out due to my bad words list. It's not a big list, but a lot of targets are getting rejected because of it.
I know there are 2 options in GSA for bad words:
1) In URL/Domain
2) On page
With a little digging, it seems that it is the second filter (i.e. finding a bad word anywhere on the page, in other comments, etc.) that is causing the difficulty.
FEATURE REQUEST 1 - look at the PAGE TITLE & DESCRIPTION when checking bad words:
I'd like a third option: also look at the PAGE TITLE & DESCRIPTION when checking bad words. That would make this a perfect option to select. Looking only at the URL/Domain is too restrictive in my opinion, while scanning the entire page throws out too many good targets because of 1 bad word being found! (It also seems that the bad words in domain filter is rejecting domains like "blogspot.com" when you have the bad word "gspot" - see below.)
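For illustration, here's roughly how the "gspot"/"blogspot.com" false positive happens with plain substring matching, and how a whole-word check would avoid it. This is just a Python sketch - I don't know how GSA actually does its matching, and the function names here are mine:

```python
import re

def substring_match(bad_words, text):
    # Naive substring check: "gspot" wrongly flags "blogspot.com"
    return any(w in text for w in bad_words)

def word_boundary_match(bad_words, text):
    # Whole-word check: only matches the bad word as a standalone token
    return any(
        re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
        for w in bad_words
    )

print(substring_match(["gspot"], "blogspot.com"))       # True  - false positive
print(word_boundary_match(["gspot"], "blogspot.com"))   # False - correctly kept
```

A word-boundary check like this would stop innocent domains from being rejected just because a bad word happens to appear inside a longer, harmless word.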
FEATURE REQUEST 2 - when it has found xx or more bad words on the page, then it rejects the page.
Have the option to set a bad word trigger amount, i.e. when it has found xx or more bad words on the page, it rejects the page. This way, it can reject pages that have been targeted by bad sites, rather than rejecting a very good site because 1 comment has 1 bad word! I have found this happening far too often - so many good sites have been rejected because of this!
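Something like this, as a rough Python sketch (the function names and the default threshold are just illustrative, not anything GSA actually exposes):

```python
import re

def count_bad_words(page_text, bad_words):
    # Count every whole-word occurrence of any bad word on the page
    text = page_text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(w.lower()) + r"\b", text))
        for w in bad_words
    )

def should_reject(page_text, bad_words, threshold=3):
    # Reject only when the page crosses the configured trigger amount
    return count_bad_words(page_text, bad_words) >= threshold

page = "One stray casino comment on an otherwise good blog about casino nights."
print(should_reject(page, ["casino", "viagra"], threshold=3))  # False - only 2 hits
```

With a threshold of 3, a good page with one or two stray bad words in the comments survives, while a page plastered with them still gets rejected.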
FEATURE REQUEST 3 - Set a .txt file as the source, so you can edit 1 file rather than 100+ projects!:
Allow us to set a standard .txt file as the source for both bad words on page and bad words in domain. This way, we can keep 1 or 2 bad words lists as .txt files and link all 100 projects to them. We can then edit or update 1 .txt file without having to edit 100+ projects. (Yes, I know you can select multiple projects and edit options, but I have different SEs and option settings per project, so it merges those option settings when I select multiple projects.) This would be super super useful!
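A minimal sketch of what I mean, in Python - the file name and the one-word-per-line format are just assumptions on my part:

```python
from pathlib import Path

def load_bad_words(path):
    # One word/phrase per line; blank lines and duplicates are ignored
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return sorted({w.strip().lower() for w in lines if w.strip()})

# Write the shared list once, then every project reads from the same file
Path("badwords.txt").write_text("casino\nviagra\ncasino\n", encoding="utf-8")
print(load_bad_words("badwords.txt"))  # ['casino', 'viagra']
```

Every project pointing at the same file means one edit updates them all, instead of opening 100+ projects one by one.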
Comments
FEATURE REQUEST 1 - right now it looks at all visible text on an HTML page. I don't think a separation would make a difference.
FEATURE REQUEST 2 - sounds useful, yes
with the following words in URL". I also have them in a file now. Anyway, we currently have to trace duplicate entries manually for both bad words and URLs in the filter. It would also be great if the filter could automatically delete duplicate URLs/bad words during the import process, like much other software does.
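A sketch of the kind of dedupe-on-import I mean (Python; the function name is just illustrative):

```python
def dedupe_preserve_order(entries):
    # Drop duplicate bad words/URLs (case-insensitively) while keeping
    # the first-seen order from the imported file
    seen = set()
    result = []
    for entry in entries:
        key = entry.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(entry.strip())
    return result

print(dedupe_preserve_order(["casino", "Casino", "viagra", "casino"]))
# ['casino', 'viagra']
```

Running something like this at import time would remove the need to hunt for duplicate entries by hand afterwards.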
My +1 for a feature to specify how many times bad words must appear on the page before that site is skipped. A page where "sex" appears, for example, 2-3 times isn't necessarily a porn site - it could be some medical-themed site. But where "sex" fills the whole page solid, it's definitely not for SEO.