Help Regarding Hrefer Scraping

March 2025

I don't know what I am doing wrong, but I am unable to scrape a list for GSA SER using Hrefer. I am totally confused about where to put the footprints from GSA SER in Hrefer.

I tried adding keywords to words and footprints to additive words. I am getting 0 links collected.

Can anyone help me with this?

March 2025

Hrefer does work, but I'd recommend using the above selected engines only. Search engines such as Google, Yandex, Baidu and Rambler are best avoided unless you have some very special (expensive) proxies. Best to have them disabled, as they can use up all the threads.

The proxies you use to scrape with make a lot of difference. I've tested with rotating mobile proxies as well as static datacenter and static residential proxies. They all work pretty well. One cheap option is rotating datacenter proxies from Litport: https://litport.net/pricing/datacenter-proxies But you will get better results by spending more on your proxies.

Other than that, you just put your footprints in "additive words" and add a list of keywords to the words file.
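To illustrate why the footprints and keywords go in two separate lists, here is a minimal Python sketch. It assumes the scraper pairs every footprint with every keyword to form one search query per pair; that is an assumption about the general approach, not Hrefer's confirmed internals. The function name build_queries and the sample keywords are hypothetical.

```python
from itertools import product


def build_queries(footprints, keywords):
    """Pair every footprint with every keyword, one search query per pair.

    Assumption: the query is simply "<footprint> <keyword>". The real
    tool may format or order its queries differently.
    """
    return [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]


# Footprints would go in "additive words", keywords in the words file.
footprints = ['inurl:board.php?bo_table=free&wr_id=', '"board.php?bo_table="']
keywords = ["gardening", "fitness"]  # hypothetical sample keywords

queries = build_queries(footprints, keywords)
for q in queries:
    print(q)
```

With 2 footprints and 2 keywords this gives 4 queries, so the number of searches grows quickly as either list gets longer.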

Love your username!

March 2025

@sickseo

Thank you, man, for helping me. I have been reading all your methods, threads and comments in this forum and I am really a great fan. I am implementing almost everything you have shared over here, hence the name "Sickseofan". I will try the above and share data.

March 2025

@sickseo

This is what I am getting now. Links are being collected.

March 2025

Probably best to disable the filter. Look under parsing options in the menu and check the box marked "Disable filtering harvested links filter". Otherwise it will only collect links that match your filter.

Image: https://forum.gsa-online.de/uploads/editor/7f/k67sjljriau4.png

The filter is only useful if you want very specific URLs, usually when using footprints with inurl:

For example, if your footprint is inurl:board.php?bo_table=free&wr_id=

then in your filter you will want to put: board.php?bo_table=

This will make the software collect urls that contain that filter in the url. It will check the urls of the sites that are scraped and filter accordingly.
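The filtering described above can be sketched in Python as a plain substring check on each harvested URL. This is only an illustration under that assumption: Hrefer's actual matching rules may differ (case handling, masks and so on), and the function name apply_url_filter and the example URLs are made up.

```python
def apply_url_filter(urls, url_filter=None):
    """Keep only URLs containing url_filter; keep everything if no filter.

    Assumption: a simple case-sensitive substring match. Passing no
    filter stands in for disabling the filter.
    """
    if not url_filter:
        return list(urls)
    return [u for u in urls if url_filter in u]


# Hypothetical harvested results
harvested = [
    "http://example.com/bbs/board.php?bo_table=free&wr_id=12",
    "http://example.org/blog/some-post/",
    "http://example.net/board.php?bo_table=notice",
]

kept = apply_url_filter(harvested, "board.php?bo_table=")
everything = apply_url_filter(harvested)

print(len(kept), "kept with filter,", len(everything), "kept without")
```

With the filter set, the blog post URL is dropped because it does not contain the filter text. With no filter, all three URLs are saved.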

If your footprints are in quotes, for example: "board.php?bo_table=" then it will scrape a different set of sites including blog comments or other T2 sites that other users use to backlink gnuboard sites. The search engine will return results where this footprint is found anywhere on the page.

In this case you want to disable the filter to make sure you collect all the links from the SERPs. Most of the scraped sites will not have "board.php?bo_table=" in the URL, so with the filter enabled Hrefer will not save those links.

Generally speaking, I normally disable the filter to collect everything related to the footprint. Then I import the target URLs into a GSA SER project and let the software sort which links work and which ones don't.

It's quite surprising what GSA SER will find as a working link.

July 22

Glad you got it working, @Sickseofan. @sickseo's advice is exactly right.

Hrefer's URL filter is too strict for most GSA SER footprints (especially quoted text footprints). Disabling it is the standard practice. Just remember to pair this with a solid rotating proxy setup in Hrefer to avoid search engine IP bans during the scrape, and let GSA SER handle the actual link verification afterward. It is a much cleaner workflow.
