PDFs and images being scraped when only article sources checked?

September 2022

I only have article text sources enabled, no image sources, but I'm seeing a lot of these PDF URLs being scraped. It doesn't seem to be able to read any of the content, though, as they all say 0 chars?

[11:19:01] Result 1206 - Length of 0 chars is to short. for https://www.domainimscraping.org/wp-content/uploads/2019/12/blahblah.pdf

I'm also seeing a lot of image URLs being scraped, and I'm not sure what the point would be, as there is no text there to scrape?

September 2022

That 0 chars indicates that the URL was filtered out beforehand and not downloaded.
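
To illustrate the idea of filtering before downloading, here is a minimal Python sketch. It is not the tool's actual code: the extension list, the function names `is_filtered` and `fetch_length`, and the `download` callback are all hypothetical. It only shows how a URL that is skipped up front could end up reported with a length of 0 without any request being made.

```python
import os
from urllib.parse import urlparse

# Hypothetical list of extensions to skip; the real tool's filter is not known here.
SKIPPED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif"}


def is_filtered(url):
    """Return True if the URL path ends in an extension we do not want to download."""
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lower()
    return extension in SKIPPED_EXTENSIONS


def fetch_length(url, download):
    """Return the content length for a URL, or 0 if it was filtered before download.

    `download` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns its text content.
    """
    if is_filtered(url):
        # Filtered URLs are never passed to `download`, so no request is made.
        return 0
    return len(download(url))
```

In this sketch, a filtered URL never reaches `download`, so the only cost is the extension check and whatever line gets written to the log.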

September 2022

Sven said:

That 0 chars indicates that the URL was filtered out beforehand and not downloaded.

Thanks, so that's not taking up any additional resources or slowing anything down? Why is it even showing up in the log then?

September 2022

just to be complete
