Help Regarding Hrefer Scraping
I don't know what I am doing wrong, but I am unable to scrape a list for GSA SER using Hrefer. I am totally confused about where to put the footprints from GSA SER in Hrefer.
I tried adding keywords to words and footprints to additive words, but I am getting 0 links collected.
Can anyone help me with this?
Comments
Hrefer does work, but I'd recommend using the above selected engines only. Search engines such as Google, Yandex, Baidu and Rambler are best avoided unless you have some very special (expensive) proxies. It is best to leave them disabled, as they can use up all the threads.
The proxies you scrape with make a lot of difference. I've tested with rotating mobile proxies as well as static datacenter and static residential proxies. They all work pretty well. One cheap option is rotating datacenter proxies from Litport (https://litport.net/pricing/datacenter-proxies), but you will get better results by spending more on your proxies.
Other than that, you just put your footprints in "additive words" and add a list of keywords to the words file.
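To picture what that setup amounts to, here is a rough Python sketch of pairing footprints with keywords to form search queries. This is only an illustration and not Hrefer's actual code: the function name build_queries, the sample keywords, and the "footprint then keyword" ordering are all my assumptions.

```python
from itertools import product

def build_queries(keywords, footprints):
    """Pair every footprint with every keyword, giving one query per pair.

    Assumption: the footprint comes first, then the keyword. The real tool
    may order or join them differently.
    """
    return [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]

# Hypothetical sample data: footprints go in "additive words",
# keywords go in the words file.
keywords = ["gardening", "fishing"]
footprints = ['inurl:board.php?bo_table=free&wr_id=', '"board.php?bo_table="']

queries = build_queries(keywords, footprints)
for q in queries:
    print(q)
```

The point is simply that the number of queries is footprints times keywords, so an empty words file or an empty additive words list leaves nothing to search for.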
Love your username!
The filter option is only useful if you want very specific URLs, usually when using footprints with inurl:
For example, if your footprint is inurl:board.php?bo_table=free&wr_id=
then in your filter you will want to put: board.php?bo_table=
This makes the software collect only URLs that contain the filter string. It checks the URLs of the scraped sites and filters accordingly.
If your footprints are in quotes, for example: "board.php?bo_table=" then it will scrape a different set of sites including blog comments or other T2 sites that other users use to backlink gnuboard sites. The search engine will return results where this footprint is found anywhere on the page.
In this case you want to disable the filter to make sure you collect all the links from the SERPs. Most of the scraped sites will not have "board.php?bo_table=" in the URL, so with the filter enabled Hrefer would not save those links.
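The difference between filter on and filter off can be sketched in a few lines of Python. Again, this is only a model of the behaviour described above, assuming a plain substring match on the URL; apply_filter and the example.com style URLs are made up for illustration and are not taken from Hrefer.

```python
def apply_filter(urls, url_filter=None):
    """Keep only URLs that contain url_filter.

    An empty or missing filter models the "filter disabled" case,
    where every scraped URL is kept.
    """
    if not url_filter:
        return list(urls)
    return [u for u in urls if url_filter in u]

# Hypothetical scrape results for the quoted footprint "board.php?bo_table="
scraped = [
    "http://example.com/bbs/board.php?bo_table=free&wr_id=12",
    "http://example.org/blog/some-post/",
    "http://example.net/board.php?bo_table=notice",
]

filtered = apply_filter(scraped, "board.php?bo_table=")  # filter enabled
unfiltered = apply_filter(scraped)                       # filter disabled

print(len(filtered), "kept with the filter,", len(unfiltered), "kept without it")
```

With the filter on, the blog post URL is dropped even though the footprint appeared somewhere on that page, which is exactly the kind of link you lose with quoted footprints.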
Generally speaking, I normally disable the filter to collect everything related to the footprint. Then import target urls into a GSA SER project and let the software sort which links work and which ones don't.
It's quite surprising what GSA SER will find as a working link.