Country filter still giving me websites not in my region

Hi All,

Any advice here would be appreciated (newb here) - I've set up GSA Form Submit and run a link scraper just to get things started/build a list. I want to target a specific region (for example Germany) and I filtered out all other countries on the "Filter" tab and selected "Germany" only.

Once I've run GSA, I get a list of websites, but when I "check & scrape" it flags 100% of them as "filtered out" because they're all from outside Germany and thus irrelevant to my list. Why would GSA be listing/scraping those sites in the first place if it was honoring the regional country filters?

I updated the filter by going to:
Tools > Project Settings > Filters

"Accept website with the following language" - Only German selected
"Accept only websites from the following countries" - Only Germany selected

Comments

  • Sven www.GSA-Online.de
    So your problem is not that filtering doesn't work, but that sites get added to the project and are filtered out afterwards?
    Sorry, but things like language and region can only be checked later on when the page is downloaded.
  • Thanks Sven, so GSA is going to fetch ALL the website info it can possibly get its hands on, without being selective in any way, and hope that some of it matches my filter criteria? That seems a bit clumsy to me. Surely it could pick up things like the IP address and make some determination before downloading and then scanning GEO etc.?
  • Sven www.GSA-Online.de
    Well, yes and no... the IP match is done before downloading the page, so it does not cause unnecessary traffic.
    BUT in the case of the language, it has to download the page and check the source and text to determine it.
    But imagine this filtering happened right after the import or search engine parsing... you would cause a block on the IP resolve and decrease speed.
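
The order of checks described in this thread can be sketched roughly as follows. This is only an illustration of the idea (a cheap country check on the resolved IP before any download, then a language check that needs the page source), not GSA's actual code. The function `two_stage_filter`, the `FilterResult` type, and the `fake_*` helpers are all hypothetical names made up for this sketch; a real setup would plug in its own DNS/GeoIP lookup, downloader, and language detector.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class FilterResult:
    url: str
    accepted: bool
    reason: str
    downloaded: bool  # True if the page had to be fetched to decide


def two_stage_filter(
    url: str,
    country_of: Callable[[str], Optional[str]],
    fetch: Callable[[str], str],
    language_of: Callable[[str], Optional[str]],
    allowed_countries: Set[str],
    allowed_languages: Set[str],
) -> FilterResult:
    # Stage 1: country check from the host's IP, done BEFORE downloading,
    # so rejected sites cost a lookup but no page traffic.
    country = country_of(url)
    if country not in allowed_countries:
        return FilterResult(url, False, f"country {country} not allowed", False)

    # Stage 2: language can only be judged from the page source and text,
    # so the page has to be downloaded first.
    html = fetch(url)
    language = language_of(html)
    if language not in allowed_languages:
        return FilterResult(url, False, f"language {language} not allowed", True)

    return FilterResult(url, True, "passed both filters", True)


# Hypothetical offline stand-ins so the sketch runs without network access.
FAKE_GEO = {
    "http://example.de/": "DE",
    "http://english.example.de/": "DE",
    "http://example.com/": "US",
}
FAKE_PAGES = {
    "http://example.de/": "<html lang='de'>Hallo Welt</html>",
    "http://english.example.de/": "<html lang='en'>Hello world</html>",
}


def fake_country_of(url: str) -> Optional[str]:
    return FAKE_GEO.get(url)


def fake_fetch(url: str) -> str:
    return FAKE_PAGES[url]


def fake_language_of(html: str) -> Optional[str]:
    return "de" if "lang='de'" in html else "en"


if __name__ == "__main__":
    for u in FAKE_GEO:
        print(two_stage_filter(u, fake_country_of, fake_fetch,
                               fake_language_of, {"DE"}, {"de"}))
```

Note that in this sketch both checks run per URL when the URL is processed, not when it is first scraped or imported. That matches the behaviour reported above: sites still show up in the list and are marked as filtered out afterwards.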