
Help Regarding Hrefer Scraping

I don't know what I am doing wrong, but I am unable to scrape a list for GSA SER using Hrefer. I am confused about where to put the GSA SER footprints in Hrefer.

I tried adding keywords to "words" and footprints to "additive words", but I am getting 0 links collected.

Can anyone help me with this?

Comments

  • sickseo London, UK


    Hrefer does work, but I'd recommend using the above selected engines only. Search engines such as Google, Yandex, Baidu and Rambler are best avoided unless you have some very special (expensive) proxies. Best to leave them disabled, as they can use up all the threads.

    The proxies you scrape with make a lot of difference. I've tested with rotating mobile proxies as well as static datacenter and static residential proxies. They all work pretty well. One cheap option is rotating datacenter proxies from Litport (https://litport.net/pricing/datacenter-proxies), but you will get better results by spending more on your proxies.

    Other than that, you just put your footprints in "additive words" and add a list of keywords to the words file. 
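As a rough illustration of why both files matter, the sketch below pairs every footprint with every keyword to produce one search query each. Hrefer's exact query construction is not described in this thread, so the pairing rule, the function name `build_queries` and the sample keywords are all assumptions made for illustration.

```python
from itertools import product


def build_queries(footprints, keywords):
    """Pair every footprint with every keyword (assumed behaviour, not Hrefer's documented logic)."""
    return [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]


# Footprints would go in "additive words"; keywords would go in the words file.
footprints = ["inurl:board.php?bo_table=", '"board.php?bo_table="']
keywords = ["gardening", "travel"]  # hypothetical keywords

queries = build_queries(footprints, keywords)
for query in queries:
    print(query)
```

If either list is empty, no queries are produced at all, which is one plausible way to end up with 0 links collected.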

    Love your username!

  • Thank you, man, for helping me. I have been reading all your methods, threads and comments on this forum and am really a great fan. I am implementing almost everything you have shared here, hence the name "Sickseofan". I will try the above and share my data.



  • This is what I am getting now; links are being collected.


  • sickseo London, UK
    edited March 20
    Probably best to disable the filter. Look under parsing options in the menu and check the box marked "Disable filtering harvested links filter". Otherwise it will only collect links that match your filter.


    The filter is only useful if you want very specific URLs, usually when using footprints with inurl:

    For example, if your footprint is inurl:board.php?bo_table=free&wr_id=

    then in your filter you will want to put: board.php?bo_table=

    This makes the software collect only URLs that contain the filter string. It checks the URLs of the scraped sites and filters accordingly.

    If your footprints are in quotes, for example "board.php?bo_table=", then it will scrape a different set of sites, including blog comments or other T2 sites that other users use to backlink Gnuboard sites. The search engine will return results where this footprint is found anywhere on the page.

    In this case you want to disable the filter to make sure you collect all the links from the SERPs. Most of the scraped sites will not have "board.php?bo_table=" in the URL, so with the filter enabled Hrefer would not save those links.
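The difference between the two settings can be sketched as a plain substring check on each harvested URL. This is only an approximation of the behaviour described above, not Hrefer's actual code; `filter_links` and the example.com URLs are made up for illustration.

```python
def filter_links(urls, url_filter=None):
    """Keep only URLs containing url_filter; keep everything when the filter is disabled."""
    if not url_filter:
        return list(urls)
    return [url for url in urls if url_filter in url]


# Hypothetical harvested results: one Gnuboard URL and one blog page that
# merely mentions the footprint somewhere in its text.
harvested = [
    "http://example.com/bbs/board.php?bo_table=free&wr_id=12",
    "http://blog.example.org/2023/post-about-forums/",
]

with_filter = filter_links(harvested, "board.php?bo_table=")
without_filter = filter_links(harvested)

print(len(with_filter), "links kept with the filter enabled")
print(len(without_filter), "links kept with the filter disabled")
```

With the filter enabled, the blog page is dropped even though the search engine returned it for the quoted footprint.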

    Generally speaking, I normally disable the filter to collect everything related to the footprint, then import the target URLs into a GSA SER project and let the software sort out which links work and which don't.

    It's quite surprising what GSA SER will find as a working link.
