Can GSA ignore the language of a source?
henningnet
Germany
Hey guys, hey Sven,
I tried to download about 450 articles (custom sources via URL) from one site, which is in German, with the "Same Article function".
In the source code it's declared as:
<html class="no-js" lang="en-US"> <!--<![endif]-->
Of course, all the articles are in German. So GSA won't download any articles when it's set to German, because it tells me that the source has an unwanted language. Changing the setting in GSA to "English" then gives me "unwanted unicode language DE/SI etc", so again it does not download the articles.
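To show the mismatch, here is a minimal Python sketch that reads the language a page declares in its html tag. This is only an illustration and not how GSA checks languages internally (I don't know how it does that); the names `LangAttrParser` and `declared_language` and the German sample sentence are made up for this example.

```python
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Hypothetical helper: remembers the lang attribute of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.declared_lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.declared_lang is None:
            self.declared_lang = dict(attrs).get("lang")


def declared_language(html_text):
    """Return the declared lang value of the page, or None if there is none."""
    parser = LangAttrParser()
    parser.feed(html_text)
    return parser.declared_lang


# Sample page: the tag claims en-US, but the (made-up) body text is German.
page = (
    '<html class="no-js" lang="en-US"> <!--<![endif]-->'
    "<body><p>Alle Artikel sind auf Deutsch.</p></body></html>"
)
print(declared_language(page))
```

So the page declares one language in its markup while the text itself is in another, and that would explain why both settings end up rejecting the articles.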
Is there any way to tell GSA to ignore the language? Say I wanted to scrape a multilingual website as well: wouldn't an option like that make sense there too?
Thanks for the help!
Comments
That was fast!
Any idea what's happening here? The filter is empty, the number of articles is set to Max, the number of words is set to 1-30000, and everything else is unchecked.
I'm willing to try this as a beta and check the results I get, if you want.