
Not many verified links

Hello, I have been searching for the answer to this question on many forums and I hope I get one here.

My question is this: I use ScrapeBox to get a list of sites to enter into GSA. This is how I do it: I get the footprints from GSA (say, article footprints) and copy them. I go to ScrapeBox, enter a list of keywords, and merge that keyword list with the footprint file I just got from GSA. Then I scrape and scrape and scrape. In the end I have a huge list of URLs that I need to do something with. The first thing I do is remove duplicates, so now I have a deduped list of URLs. Do I use this list in GSA as-is?

Another way I have heard of doing this is to remove duplicates, then trim to last, then finally trim to root, and use that list in GSA.

I am confused about which method to use. Can someone please explain in detail how to correctly scrape with ScrapeBox and which settings to use to clean the list in GSA? I would really appreciate it. Thanks in advance.
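For reference, here is a rough Python sketch of what the three cleaning operations mentioned above (remove duplicates, trim to last, trim to root) actually do to a URL list. The example URLs are placeholders, not real scrape results; ScrapeBox performs the equivalent steps internally.

```python
from urllib.parse import urlsplit

def dedupe(urls):
    # Remove exact duplicate URLs while keeping the original order.
    seen = set()
    out = []
    for u in urls:
        if u not in seen:
            seen.add(u)
            out.append(u)
    return out

def trim_to_last(url):
    # Drop everything after the last "/",
    # e.g. .../article/view/123 -> .../article/view/
    base, _, _ = url.rpartition("/")
    return base + "/"

def trim_to_root(url):
    # Keep only scheme + domain,
    # e.g. http://example.com/some/page -> http://example.com/
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

# Placeholder scrape output with one duplicate:
scraped = [
    "http://example.com/article/view/123",
    "http://example.com/article/view/123",
    "http://example.com/article/view/456",
]
deduped = dedupe(scraped)                          # two URLs remain
roots = dedupe(trim_to_root(u) for u in deduped)   # one root domain remains
```

Trimming to root collapses many deep URLs into a handful of domains, which is why the two methods in the post produce very different lists.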


  • goonergooner
    You can just import them straight into projects; there's no real need to mess with them.

    Or, if you want to be sure, set up two projects: on the first, import the list as-is, and on the second, trim to root first.

    Then see which gets more links.
  • I have done it the way you suggest above, and with a list of 45k (just to try it) I ended up with 20 verified links, which just doesn't seem right to me. I would have thought I'd get a lot more. When you are cleaning your list in GSA, what site do you use for the URL? I just did a search, found a related wiki, and used that plus a few keywords and anchors. I brought in a new article and emails and got 20 verified links, which really made me mad. Something else is wrong, but I don't know what that something is. I am for sure doing something wrong. Suggestions? Thanks
  • goonergooner
    Well, aside from the obvious things: check that your proxies work, make sure your timeout is high enough, select all engines, use good emails, etc.

    My guess would be that you haven't scraped great targets. The default footprints used in SER are sometimes not the best. You need to trim them down to find the ones that get the best results, and the only way to do that is to scrape them individually and test how many URLs each one returns.

    Once you have that nailed down you can look at finding better footprints than those in SER.
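To make the "test footprints individually" advice above concrete, here is a minimal sketch of ranking footprints by scrape yield. It assumes you have already run each footprint separately in ScrapeBox and collected its results; the footprints and URLs below are made-up placeholders.

```python
from urllib.parse import urlsplit

# Placeholder data: footprint -> URLs it scraped.
# In practice you would load these from the result
# file each individual ScrapeBox run produced.
results = {
    '"powered by wordpress"': [
        "http://a.com/post/1",
        "http://a.com/post/2",
        "http://b.com/post/1",
    ],
    '"leave a comment" inurl:blog': [
        "http://a.com/post/1",
    ],
}

def unique_domains(urls):
    # Count distinct domains rather than raw URLs, since many
    # URLs on the same site only yield one registration target.
    return {urlsplit(u).netloc for u in urls}

# Best-yielding footprints first.
ranked = sorted(
    results.items(),
    key=lambda kv: len(unique_domains(kv[1])),
    reverse=True,
)
for footprint, urls in ranked:
    print(f"{footprint}: {len(unique_domains(urls))} unique domains")
```

Footprints near the bottom of the ranking are the ones worth dropping before your next big scrape.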
  • ronron
    edited May 2014

    I think @gooner can chime in here with more exact percentages, but I believe an 'average' rate for turning a scrape into verifieds is about 2%.

    What your data is telling me is that your issue is on the scrape. You simply need to learn better methods to do it. And there is nothing like investing in your own knowledge to get better.

    I would recommend the course from @donaldbeck here:

    I bought it, I liked it, and he did an excellent job with the videos. I think that is a great starting point for you.

    p.s. Scraping is an art form. You can only get better through trial and error. The better you become at scraping, the more you will help yourself. This isn't one of those deals where you ask a question and get an answer. It's one of those deals where you just need some good guidance to get started in the correct direction, and then just get better from experience.

  • goonergooner
    @ron is right, in the beginning 2% is a respectable number.

    Once you get better you can look for 5% or higher verifieds from a scrape (excluding engines that always yield a low percentage like Wordpress or a high percentage like XpressEngine).
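Some rough arithmetic puts the 2% benchmark above against the numbers from the original post (45k scraped, 20 verified):

```python
scraped = 45_000   # size of the deduped list from the original post
verified = 20      # verified links actually obtained

actual_rate = verified / scraped * 100   # actual verified rate, in percent
expected_at_2pct = scraped * 0.02        # verifieds expected at the 2% benchmark

print(f"actual: {actual_rate:.3f}% vs expected at 2%: {expected_at_2pct:.0f} links")
```

Twenty verifieds out of 45k is roughly 0.04%, around fifty times below the 2% benchmark, which is why the replies point at the scrape quality rather than normal variance.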
