Regarding scraping of public domain images

MerryBadger · January 2023

Hi,

I want to scrape high-resolution images from public domain sites. This should be perfectly legal and a lot of fun, but I have one problem: I am not sure how to scrape, for example, every page within one category. Often there will be a site like this:

"Category such and such"

lots of images

<---1,2,3,4,5,6,7,8,9..............234 --->

How do I set it up to scrape all those pages and make sure that it pulls out the full image and not the thumbnail (preferably the highest-resolution version of that image, as sites often offer more than one version for different download preferences)?

Thanks for your attention

All the best

Mr. Badger

Sven · January 2023

There are settings to filter images by file size or by width/height. That's where you can filter out the thumbnails.

There are also settings to require that a URL contains a certain word or path. That's how you can skip parsing and downloading images you don't want.

Then choose ADD->Parse URL, and it should get you all the images you want. If you have further questions, send me the URL and I will have a look.
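
To make the two filters above concrete, here is a rough Python sketch of the logic they describe: keep an image only if it meets a minimum width and height and its URL contains a required word or path. This is not the tool's own code. The names `ImageCandidate` and `keep_image`, the example.com URLs, and the 2000x2000 threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImageCandidate:
    # Hypothetical record for an image found on a page.
    url: str
    width: int
    height: int


def keep_image(img, min_width, min_height, required_substring=""):
    """Return True if the image passes both the size and the URL filter."""
    # Size filter: thumbnails fall below the minimum dimensions.
    big_enough = img.width >= min_width and img.height >= min_height
    # URL filter: an empty required_substring matches every URL.
    url_matches = required_substring in img.url
    return big_enough and url_matches


candidates = [
    ImageCandidate("https://example.com/uploads/owl-thumb.jpg", 300, 373),
    ImageCandidate("https://example.com/uploads/owl-full.jpg", 3000, 3700),
    ImageCandidate("https://example.com/ads/banner.jpg", 4000, 5000),
]

kept = [c.url for c in candidates if keep_image(c, 2000, 2000, "/uploads/")]
print(kept)
```

In this sketch the thumbnail is dropped by the size filter and the banner by the URL filter, so only the full-size image under `/uploads/` remains.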

MerryBadger · January 2023

Hi, I would really like to see an example of how you would set it up. Could you send me a snapshot?

The URL I want (and I want the highest possible resolution) is:

https://freevintageillustrations.com/illustrations/vintage-animal-illustrations/

I hope you can teach me how so I can utilize this great tool in the best possible way.

All the best
Mr. Badger

MerryBadger · January 2023

One more thing: I want to set it to only parse images with a minimum size of 3873x4814 pixels.

Sven · January 2023

The latest update lets you parse this site via keywords.

However, if you do it manually next time, you need to enter the URL like this:

https://freevintageillustrations.com/illustrations/vintage-animal-illustrations/?sf_paged={1-51}

Set the levels to parse to 1.

Set the min/max resolution in the settings.
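
For anyone wondering what the `{1-51}` placeholder stands for, the sketch below shows one plausible reading in Python: the pattern is expanded into one URL per page number. It assumes the range is inclusive at both ends, which is a guess about the tool's behaviour rather than something documented here. The function name `expand_page_range` is hypothetical.

```python
import re


def expand_page_range(pattern):
    """Expand a single {start-end} placeholder into a list of concrete URLs."""
    match = re.search(r"\{(\d+)-(\d+)\}", pattern)
    if match is None:
        # No placeholder: the pattern is already a plain URL.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    # Assumption: both ends of the range are included.
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


urls = expand_page_range(
    "https://freevintageillustrations.com/illustrations/"
    "vintage-animal-illustrations/?sf_paged={1-51}"
)
print(len(urls))
print(urls[0])
print(urls[-1])
```

With levels to parse set to 1, each of these page URLs would be read once and the links on it would not be followed any deeper.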

MerryBadger · February 2023

Hi, the image spider is not responding anymore. Is there a way to reset all the settings to default?

Sven · February 2023

Sorry, no, but we had better try to solve the issue if there is one.

Can you still click buttons in the GUI, or is it frozen? If it is not frozen, click HELP->Create bugreport.
