
PDFs and images being scraped when only article sources are checked?

I only have article text sources enabled, no image sources, but I'm seeing a lot of PDF URLs like this one being scraped. It doesn't seem able to read any of their content, since they all show 0 chars:

[11:19:01] Result 1206 - Length of 0 chars is to short. for https://www.domainimscraping.org/wp-content/uploads/2019/12/blahblah.pdf

I'm also seeing a lot of image URLs being scraped, and I'm not sure what the point would be, since there is no text there to scrape.

Comments

  • Sven (www.GSA-Online.de)
    That 0 chars indicates that the URL was filtered out beforehand and never downloaded.
  • Sven said:
    That 0 chars indicates that the url was filtered before and not downloaded
    Thanks. So that isn't taking up any additional resources or slowing anything down? Why does it even show up in the log, then?
  • Sven (www.GSA-Online.de)
    Just for completeness ;)