
Using Scrapebox to scrape contextual/relevant lists - advice needed

I'm still a beginner when it comes to SEO in general, and especially to Scrapebox and scraping.

After some consideration, I believe building my own list is the way to go, so I've downloaded and installed Scrapebox and begun figuring out how to scrape a list to use with my GSA projects.

I'm not clear on what strategy to use or how to go about this, so any help in this area would be appreciated.

PLAN A

For instance, I believe one way to build a list is to use footprints in Scrapebox. To that end, I downloaded a massive list of footprints from the internet - about 1,276 of them.

I then merged these with keywords from my site to get a list of queries relevant to my niche and to what my competitors have used. After harvesting, and after removing duplicate URLs and duplicate domains, I ended up with 3,004 websites.
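To make sure I've got the mechanics right, here's a rough Python sketch of the merge and dedupe steps as I understand them. The file names are just placeholders, and Scrapebox does the same merge and "Remove Duplicate Domains" steps natively - this is only to show the idea:

```python
# Sketch: merge footprints with keywords into search queries,
# then keep one harvested URL per domain. File names are placeholders.
from urllib.parse import urlparse

with open("footprints.txt") as f:
    footprints = [line.strip() for line in f if line.strip()]
with open("keywords.txt") as f:
    keywords = [line.strip() for line in f if line.strip()]

# Every footprint is paired with every keyword,
# e.g. '"Powered by WordPress" weight loss' - these are the queries to harvest.
queries = [f"{fp} {kw}" for fp in footprints for kw in keywords]
with open("queries.txt", "w") as f:
    f.write("\n".join(queries))

# After harvesting: keep one URL per domain, mirroring Scrapebox's
# "Remove Duplicate Domains" option.
seen, unique = set(), []
with open("harvested_urls.txt") as f:
    for line in f:
        url = line.strip()
        domain = urlparse(url).netloc.lower()
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
with open("deduped_urls.txt", "w") as f:
    f.write("\n".join(unique))
```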

My plan is simply to import this list to GSA and start building links on them.

PLAN B

My worry is that the ~1,200 footprints I downloaded and merged with my keywords are so generic that I won't find any quality sites, and I don't know how else to go about getting contextual links.

My other idea is to get a list of my competitors and put their URLs into Footprint Factory to generate a list of footprints that way, then put those footprints into Scrapebox to harvest sites for GSA.

I'm thinking this is a way to make the footprints relevant to my site. I'm not actually sure about the next part (sorting the harvested URLs and importing them into GSA), but I think I'll figure that out once I know what I'm doing on this end.
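For what it's worth, here's my rough understanding of what a tool like Footprint Factory does, as a Python sketch. The candidate patterns and the file name are illustrative guesses on my part, not the tool's actual logic:

```python
# Sketch: fetch pages that link to a competitor and count recurring
# platform strings ("Powered by ...", "Leave a Reply", etc.).
# Strings shared by many pages make good footprints to scrape with.
import re
from collections import Counter
import requests

CANDIDATES = [
    r"Powered by [\w .-]+",
    r"Leave a Reply",
    r"You must be logged in to post a comment",
]

with open("competitor_backlink_pages.txt") as f:  # placeholder file name
    urls = [line.strip() for line in f if line.strip()]

counts = Counter()
for url in urls:
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        continue  # skip dead or slow pages
    for pattern in CANDIDATES:
        for match in re.findall(pattern, html):
            counts[match] += 1

# The most common footprints across competitor backlink pages.
for footprint, n in counts.most_common(20):
    print(f'{n:4d}  "{footprint}"')
```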

PLAN C

Buy a list from Loopline, as this seems to be the preferred service around here.

Any input/advice from any experienced members would be greatly appreciated.

tia!

Comments

  • shaun https://www.youtube.com/ShaunMarrs
Crazy busy right now mate, but I just had a quick look at your post and here is a little feedback.

For Plan A: SER has its own footprints available in its footprint studio feature under Tools, so you can get platform-specific ones rather than loads of mixed ones from the internet. Also, don't worry about them being niche relevant. I posted a list of footprints here with their base returns; the higher the number, the better.
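    If it helps, here is a minimal Python sketch of what "keep the footprints with the highest base returns" looks like in practice. The tab-separated input format and the threshold are assumptions; adjust them to however your list is laid out:

```python
# Sketch: sort a footprint list by base returns (roughly, how many
# results a search engine reports for the footprint alone) and drop
# the low performers. Input format is assumed: "footprint<TAB>returns".
rows = []
with open("footprints_with_returns.txt") as f:  # placeholder file name
    for line in f:
        try:
            footprint, returns = line.rsplit("\t", 1)
            rows.append((int(returns.strip()), footprint.strip()))
        except ValueError:
            continue  # skip malformed lines

MIN_RETURNS = 100_000  # arbitrary floor, tune to taste
keep = [fp for n, fp in sorted(rows, reverse=True) if n >= MIN_RETURNS]
with open("footprints_filtered.txt", "w") as f:
    f.write("\n".join(keep))
```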

For Plan B: you can do that if you like, but in my eyes it's a waste of time. The vast majority of those footprints won't be usable by SER, so you'd be wasting time scraping for sites SER never had a chance of posting to.

For Plan C: this is what I currently do, and I'm happy with it.

There is also a Plan D: link extraction. It's crazy effective if you aren't bothered about the niche of the domains, and it can scale exponentially for the first few days.


Although I used to flip-flop between my own lists and purchased lists, I'm pretty set on a purchased list now. I use SER totally differently than I used to, so I don't see the point in investing the time and effort of building my own.

    My priorities would be...

    1 - Buy a list from Loopline and learn how to filter it for what you need.
    2 - Link extraction to build up a base list.
    3 - Footprint scraping.
  • Thanks for your input, Shaun! I was hoping you'd post :)

A few things for me to think about here, especially link extraction. That's a new concept to me - does it involve having a seed list, then running the link extractor on it to pull more links, then bringing those links (after filtering) into SER?

    Any further insight appreciated.

    Thanks again!
  • shaun https://www.youtube.com/ShaunMarrs
Yeah, basically.

You need a basic list of blog/guestbook/image-comment URLs. Put them in the link extractor, select internal links only, and let it run. This will pull a fair few internal links from the blogs. If you're just starting out, your seed list will probably be very small, so run the internal link extraction again on the list the first pass kicked out. I've never actually done this, but in theory it will kick out even more links.
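    If you want to see the internal pass outside of Scrapebox, here is a bare-bones Python sketch of the idea. The file names are placeholders, and the real Link Extractor does this at scale with proxies and threads:

```python
# Sketch: fetch each seed URL and keep only the links that stay
# on the same domain (the "internal" pass).
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def internal_links(url):
    """Return links on `url` that point back to the same domain."""
    domain = urlparse(url).netloc.lower()
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return set()
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for a in soup.find_all("a", href=True):
        absolute = urljoin(url, a["href"])
        if urlparse(absolute).netloc.lower() == domain:
            links.add(absolute)
    return links

with open("seed_list.txt") as f:  # placeholder file name
    seeds = [line.strip() for line in f if line.strip()]

found = set()
for seed in seeds:
    found |= internal_links(seed)
with open("internal_extracted.txt", "w") as f:
    f.write("\n".join(sorted(found)))
```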

Then take the list this pass kicked out, change the radio button to external links, and extract again. Essentially it will now go off and grab all the links those pages link out to. SER can't post to most of them, but I then run this list through PI, and then through a SER/CB rig that does no other job than verify these extractions, plus some keyword scrapes I throw into its folders.
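    The external pass is the same fetch-and-parse loop with the domain test flipped, again with placeholder file names:

```python
# Sketch: keep only the links that leave each page's own domain
# (the "external" pass). These are the URLs that then go through
# PI and a verify-only SER/CB setup.
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def external_links(url):
    """Return links on `url` that point to other domains."""
    domain = urlparse(url).netloc.lower()
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return set()
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for a in soup.find_all("a", href=True):
        absolute = urljoin(url, a["href"])
        target = urlparse(absolute).netloc.lower()
        if target and target != domain:
            links.add(absolute)
    return links

with open("internal_extracted.txt") as f:  # output of the internal pass
    pages = [line.strip() for line in f if line.strip()]

found = set()
for page in pages:
    found |= external_links(page)
with open("external_extracted.txt", "w") as f:
    f.write("\n".join(sorted(found)))
```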

I don't do this anymore as it took too much time and effort, but it was very efficient. I used to have a server from SolidSEOVPS dedicated to this process, and it would eat millions of URLs per day without breaking a sweat.
  • So I'm getting the message - it would be better for me to subscribe to Loopline's list, because setting up an efficient rig to churn out good links for SER this way is pretty involved/costly in itself.

    ok thanks :)