Not many verified links
Hello, I have been searching for the answer to this question on many forums, and I hope I get one here. I use ScrapeBox to build a list of sites to import into GSA. Here is how: I get the footprints from GSA (say, the article footprints) and copy them. In ScrapeBox I enter a list of keywords and merge it with the file of footprints I just copied from GSA. I scrape and scrape and scrape, and in the end I have a huge list of URLs that I need to do something with. The first thing I do is remove duplicates, so now I have a deduplicated list of URLs. My question is: do I use this list of URLs in GSA as-is?
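For anyone who wants to see what the merge and dedupe steps do mechanically, here is a minimal Python sketch. It is not ScrapeBox itself: the query format `<footprint> <keyword>` and the helper names `merge_queries` and `dedupe_urls` are assumptions for illustration, and the footprints, keywords, and URLs are made up.

```python
from itertools import product


def merge_queries(footprints, keywords):
    """Pair every footprint with every keyword, one search query per pair.

    Assumption: queries take the form "<footprint> <keyword>"; ScrapeBox's
    own merge may format them differently.
    """
    return [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]


def dedupe_urls(urls):
    """Drop blank lines and exact duplicate URLs, keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


if __name__ == "__main__":
    # Hypothetical footprints and keywords, for illustration only
    footprints = ['"submit article"', "inurl:articles"]
    keywords = ["gardening", "fishing"]
    for query in merge_queries(footprints, keywords):
        print(query)

    # Hypothetical scraped results containing one duplicate
    scraped = [
        "http://example.com/blog/post-1.html",
        "http://example.com/blog/post-1.html",
        "http://example.org/articles/view.php?id=7",
    ]
    print(dedupe_urls(scraped))
```

Every footprint is paired with every keyword, so 2 footprints and 2 keywords give 4 queries; the list grows quickly as you add more of either.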
Another way I have heard of is to remove duplicates, then trim to last, then finally trim to root, and then use that list in GSA.
I am confused about which method to use.
Can someone please explain in detail how to scrape correctly with ScrapeBox, and which settings to use to clean the list in GSA? I would really appreciate it. Thanks in advance,
Steve
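For the alternative route in the question (dedupe, trim to last, trim to root), here is a rough Python sketch of what trimming does to a URL list. Assumption: "trim to last" is taken to mean cutting each URL back to its last folder; ScrapeBox's exact rules may differ. The function names are hypothetical. The list is deduplicated again after trimming, since trimming turns many different URLs into the same one.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_to_root(url):
    """Keep only scheme and host: http://example.com/blog/a.html -> http://example.com/"""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def trim_to_last_folder(url):
    """Drop the query string and everything after the last '/' in the path."""
    parts = urlsplit(url)
    path = parts.path
    folder = path[: path.rfind("/") + 1] if "/" in path else "/"
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))


def trim_and_dedupe(urls, trim):
    """Trim every URL, then remove the duplicates the trimming creates."""
    return list(dict.fromkeys(trim(u) for u in urls))


if __name__ == "__main__":
    # Hypothetical scraped URLs, for illustration only
    urls = [
        "http://example.com/blog/post-1.html",
        "http://example.com/blog/post-2.html?ref=x",
        "http://example.com/forum/thread.php?t=5",
    ]
    print(trim_and_dedupe(urls, trim_to_last_folder))
    print(trim_and_dedupe(urls, trim_to_root))
```

In this sketch, trimming to the last folder and then to root gives the same list as trimming to root directly, so the middle step only matters if you keep the last-folder list as well.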
Comments
Or, if you want to be sure, set up two projects: import the list as-is into the first, and trim it to root before importing it into the second.
Then see which gets more links.
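One way to keep score on the two-project test, sketched in Python with made-up numbers. `ProjectResult` and `compare` are hypothetical helpers, and the counts would come from whatever totals you read off each SER project. It picks the winner by verified links, as suggested here, and also prints the verified rate, since a trimmed list is usually much smaller than the raw one.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    name: str
    targets_imported: int
    verified_links: int

    @property
    def verified_rate(self) -> float:
        """Fraction of imported targets that ended up as verified links."""
        if self.targets_imported == 0:
            return 0.0
        return self.verified_links / self.targets_imported


def compare(a: ProjectResult, b: ProjectResult) -> ProjectResult:
    """Return the project with more verified links (ties go to the first)."""
    return a if a.verified_links >= b.verified_links else b


if __name__ == "__main__":
    # Hypothetical totals for the two projects
    as_is = ProjectResult("import as-is", 50_000, 900)
    trimmed = ProjectResult("trim to root", 12_000, 600)
    for p in (as_is, trimmed):
        print(f"{p.name}: {p.verified_links} verified ({p.verified_rate:.1%})")
    print("more links:", compare(as_is, trimmed).name)
```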
My guess is that you haven't scraped good targets. The default footprints in SER are not always the best. You need to narrow them down to the ones that get the best results, and the only way to do that is to scrape each one individually and see how many URLs it returns.
Once you have that nailed down you can look at finding better footprints than those in SER.
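A minimal Python sketch of that per-footprint test: scrape each footprint on its own, then count how many unique URLs each one returned and rank them. `rank_footprints` and the input layout (each footprint mapped to the list of URLs it returned) are assumptions, and the sample data is made up.

```python
def rank_footprints(results, min_unique=1):
    """Rank footprints by how many unique URLs each returned on its own.

    `results` maps footprint -> list of URLs scraped using only that
    footprint. Footprints with fewer than `min_unique` unique URLs are dropped.
    """
    counts = {fp: len(set(urls)) for fp, urls in results.items()}
    # sorted() is stable, so tied footprints keep their input order
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(fp, n) for fp, n in ranked if n >= min_unique]


if __name__ == "__main__":
    # Hypothetical per-footprint scrape results
    results = {
        '"submit article"': ["http://a.com/1", "http://b.com/2", "http://a.com/1"],
        "inurl:articles": ["http://c.com/x"],
        '"powered by example"': [],
    }
    for footprint, n in rank_footprints(results):
        print(f"{n:>5}  {footprint}")
```

Raising `min_unique` is a simple way to drop footprints that barely return anything before you build the full keyword merge.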
I think @gooner can chime in here with more exact figures, but I believe an 'average' percentage of a scrape that ends up as verified links is about 2%.
What your data is telling me is that your issue is with the scrape. You simply need to learn better methods. And there is nothing like investing in your own knowledge to get better.
I would recommend the course from @donaldbeck here: https://forum.gsa-online.de/discussion/7406/ultimate-gsa-ser-list-building-video-guide-video-case-study/p1
I bought it, I liked it, and he did an excellent job with the videos. I think that is a great starting point for you.
p.s. Scraping is an art form. You can only get better through trial and error. The better you become at scraping, the more you will help yourself. This isn't one of those deals where you ask a question and get an answer. It's one of those deals where you just need some good guidance to get started in the correct direction, and then just get better from experience.
Once you get better, you can aim for 5% or higher verifieds from a scrape (excluding engines that always yield an unusually low percentage, like WordPress, or an unusually high one, like XpressEngine).
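To check your own numbers against the rough 2% average and 5%+ target mentioned in this thread, here is a small Python sketch. The per-engine counts and the "Article Script" engine name are hypothetical, `verified_percentage` and `assess` are illustrative helpers, and the thresholds are only the ballpark figures quoted here, not fixed rules.

```python
def verified_percentage(stats, exclude=()):
    """Percent of targets that became verified links.

    `stats` maps engine name -> (targets, verified). Engines listed in
    `exclude` are skipped, e.g. ones that always run unusually low or high.
    """
    kept = [(t, v) for engine, (t, v) in stats.items() if engine not in exclude]
    targets = sum(t for t, _ in kept)
    verified = sum(v for _, v in kept)
    return 100.0 * verified / targets if targets else 0.0


def assess(pct):
    """Rough benchmarks quoted in this thread: ~2% average, 5%+ good."""
    if pct >= 5.0:
        return "good (5% or higher)"
    if pct >= 2.0:
        return "around average (~2%)"
    return "below average"


if __name__ == "__main__":
    # Hypothetical per-engine numbers, for illustration only
    stats = {
        "WordPress": (20_000, 100),
        "XpressEngine": (1_000, 300),
        "Article Script": (9_000, 360),
    }
    overall = verified_percentage(stats)
    core = verified_percentage(stats, exclude={"WordPress", "XpressEngine"})
    print(f"all engines: {overall:.2f}% -> {assess(overall)}")
    print(f"excluding outliers: {core:.2f}% -> {assess(core)}")
```

Excluding the outlier engines changes the picture noticeably, which is why it makes sense to compare like with like before judging a scrape.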