GSA SER Patterns

Hello. I have scraped a lot of links and am going to import them via the platform identifier. Before importing them, I would like to filter out links I don't want imported, such as WordPress article and whois links. Is there a specific pattern that GSA SER uses to identify them? If so, I would like to know what it is so that I can remove those links with another tool before importing them into GSA SER.
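To show the kind of pre-filtering I mean, here is a minimal Python sketch. The patterns in it are placeholders I made up for illustration. They are not the patterns GSA SER actually uses (those are what I am asking about). The function names are hypothetical too.

```python
# Hypothetical placeholder patterns; NOT the ones GSA SER actually uses.
UNWANTED_PATTERNS = ["wordpress", "whois"]


def is_unwanted(url, patterns=UNWANTED_PATTERNS):
    """Return True if any pattern occurs anywhere in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(pattern.lower() in lowered for pattern in patterns)


def filter_links(urls, patterns=UNWANTED_PATTERNS):
    """Drop blank lines and any URL that matches an unwanted pattern."""
    cleaned = (url.strip() for url in urls)
    return [url for url in cleaned if url and not is_unwanted(url, patterns)]


if __name__ == "__main__":
    scraped = [
        "https://example.wordpress.com/2020/01/some-article/",
        "http://whois.example.net/somedomain.com",
        "http://example.org/forum/register.php",
    ]
    for link in filter_links(scraped):
        print(link)
```

Plain substring matching like this is crude and would probably throw away some good links, which is why I would rather use the real patterns if they are known.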

