[3 Feature Requests] Bad Words - Making It Super Useful!
AlexR
Cape Town
I've been analysing my log files today and noticed that a massive number of sites are getting thrown out due to my bad words list. It's not a big list, but a lot of targets are getting rejected because of it.
I know there are 2 options in GSA for bad words:
1) In URL/Domain
2) On page
With a little digging, it seems that it is the second filter (i.e. finding a bad word anywhere on the page, in other comments, etc.) that is causing the difficulty.
FEATURE REQUEST 1 - look at the PAGE TITLE & DESCRIPTION when checking bad words:
I'd like a third option: also look at the PAGE TITLE & DESCRIPTION when checking bad words. That would make this a perfect option to select. Looking only at the URL/Domain is too restrictive in my opinion, while scanning the entire page throws out too many good targets because of 1 bad word being found! (It also seems that the bad words in domain filter is rejecting domains like "blogspot.com" when you have the bad word "gspot" - see below.)
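For illustration, here's roughly how the "gspot"/"blogspot.com" false positive happens with plain substring matching, and how a whole-word check would avoid it. This is just a Python sketch - I don't know how GSA actually does its matching, and the function names here are mine:

```python
import re

def substring_match(bad_words, text):
    # Naive substring check: "gspot" wrongly flags "blogspot.com"
    return any(w in text for w in bad_words)

def word_boundary_match(bad_words, text):
    # Whole-word check: only matches the bad word as a standalone token
    return any(
        re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
        for w in bad_words
    )

print(substring_match(["gspot"], "blogspot.com"))       # True  - false positive
print(word_boundary_match(["gspot"], "blogspot.com"))   # False - correctly kept
```

A word-boundary check like this would stop innocent domains from being rejected just because a bad word happens to appear inside a longer, harmless word.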
FEATURE REQUEST 2 - when it has found xx or more bad words on the page, then it rejects the page.
Have the option to set a bad word trigger amount, i.e. when it has found xx or more bad words on the page, it rejects the page. This way, it can reject pages that have been targeted by bad sites, rather than rejecting a very good site because 1 comment has 1 bad word! I have found this happening far too often - so many good sites have been rejected because of this!
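Something like this, as a rough Python sketch (the function names and the default threshold are just illustrative, not anything GSA actually exposes):

```python
import re

def count_bad_words(page_text, bad_words):
    # Count every whole-word occurrence of any bad word on the page
    text = page_text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(w.lower()) + r"\b", text))
        for w in bad_words
    )

def should_reject(page_text, bad_words, threshold=3):
    # Reject only when the page crosses the configured trigger amount
    return count_bad_words(page_text, bad_words) >= threshold

page = "One stray casino comment on an otherwise good blog about casino nights."
print(should_reject(page, ["casino", "viagra"], threshold=3))  # False - only 2 hits
```

With a threshold of 3, a good page with one or two stray bad words in the comments survives, while a page plastered with them still gets rejected.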
FEATURE REQUEST 3 - Set a .txt file as the source, so you can edit 1 file rather than 100+ projects!:
Allow us to set a standard .txt file as the source for both bad words on page and bad words in domain. This way, we can keep 1 or 2 bad words lists as .txt files and link all 100 projects to them. We can then edit or update 1 .txt file without having to edit 100+ projects. (Yes, I know you can select multiple projects and edit options, but I have different SEs and option settings per project, so it merges those option settings when I select multiple projects.) This would be super super useful!
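A minimal sketch of what I mean, in Python - the file name and the one-word-per-line format are just assumptions on my part:

```python
from pathlib import Path

def load_bad_words(path):
    # One word/phrase per line; blank lines and duplicates are ignored
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return sorted({w.strip().lower() for w in lines if w.strip()})

# Write the shared list once, then every project reads from the same file
Path("badwords.txt").write_text("casino\nviagra\ncasino\n", encoding="utf-8")
print(load_bad_words("badwords.txt"))  # ['casino', 'viagra']
```

Every project pointing at the same file means one edit updates them all, instead of opening 100+ projects one by one.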
Comments
FEATURE REQUEST 1 - right now it looks at all visible text on an HTML page. I don't think a separation would make a difference.
FEATURE REQUEST 2 - sounds useful, yes
with the following words in URL". I also have them in a file now. Anyway, we currently have to trace duplicate entries manually for both bad words and URLs in the filter. It would also be great if the filter could automatically delete duplicate URLs/bad words during the import process, like much other software does.
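A sketch of the kind of dedupe-on-import I mean (Python; the function name is just illustrative):

```python
def dedupe_preserve_order(entries):
    # Drop duplicate bad words/URLs (case-insensitively) while keeping
    # the first-seen order from the imported file
    seen = set()
    result = []
    for entry in entries:
        key = entry.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(entry.strip())
    return result

print(dedupe_preserve_order(["casino", "Casino", "viagra", "casino"]))
# ['casino', 'viagra']
```

Running something like this at import time would remove the need to hunt for duplicate entries by hand afterwards.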
My +1 for a feature to specify how many times bad words must appear on the page before that site is skipped. A page where "sex" appears, for example, 2-3 times isn't necessarily a porn site - it could be some medical-themed site. But where "sex" fills the whole page solid, it's definitely not for SEO.