Filter out non-English characters from collected keywords

Somehow I seem to be gathering a lot of non-English keywords.
I know I can stop collecting keywords from target sites and using them to find new target sites, but I'd like to keep doing this to ensure a broad keyword list.
However, the non-English keywords only slow down the process of finding new sites, because whatever they turn up gets blocked as soon as the page is found not to be in English.
Would it be possible to filter the keywords by language, as well as the target pages? Or at least exclude anchor texts that contain non-English / high-Unicode characters?
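
To illustrate the character-level filter I have in mind, here is a minimal Python sketch. The function names (`is_mostly_ascii`, `filter_keywords`) and the 0.8 threshold are my own assumptions for illustration, not anything the tool currently provides. It only looks at the share of ASCII characters, so it would not catch other languages written in plain Latin letters.

```python
def is_mostly_ascii(keyword: str, threshold: float = 0.8) -> bool:
    """Return True if at least `threshold` of the characters are ASCII.

    Hypothetical helper: the threshold leaves room for the occasional
    accented letter in an otherwise English keyword.
    """
    if not keyword:
        return False
    ascii_count = sum(1 for ch in keyword if ord(ch) < 128)
    return ascii_count / len(keyword) >= threshold


def filter_keywords(keywords):
    """Keep only the keywords that pass the ASCII-share check."""
    return [kw for kw in keywords if is_mostly_ascii(kw)]


collected = ["cheap flights", "дешевые билеты", "café menu", "旅行"]
kept = filter_keywords(collected)
```

With a check like this, keywords in Cyrillic or CJK scripts would be dropped before they are used to search for new sites, while a keyword with a single accented character would still pass.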


