
Scraping General Q: How Can I Get Better Results?

Deeeeeeee the Americas
edited October 15 in GSA Scrape Genie
I have tried scraping a few times, and I have always been discouraged by the results, especially the fact that so many of the sites returned are super-authority sites with few targets to post links on. :(
Is it normal to get lots of URLs that do not fit what you're looking for?
I have noticed this every time I have attempted to scrape targets. Right now I'm using some GSA SER footprints plus a KW, but even years ago when I tried scraping, the same thing happened.
Just need some guidance. Thanks, all.

Comments

  • sickseo London, UK
    Where are you having issues?

    Are you scraping too many sites that don't work with the software? If so, you need to adjust your footprints - maybe test each footprint manually before running it through your scraping bot.

    Or are you just not returning any sites when scraping?

    There is no way to target just authority sites. You have to go through the process of scraping and testing to build the site list. Some will be authority sites, but most will be DR0-DR20 sites.

    These can still have value, but you'll need to run extra campaigns to boost their DR before they're worth much. Just running tiers with your site list will eventually boost the authority of every site in it.
  • Deeeeeeee the Americas
    edited October 15
    sickseo said:
    Where are you having issues?
    Hmm... To be totally honest, I have never had any success with this aspect of SEO. My attempts years ago with Scrapebox were dismal.
    sickseo said:
    Are you scraping too many sites that don't work with the software? If so, you need to adjust your footprints - maybe test each footprint manually before running it through your scraping bot.

    Or are you just not returning any sites when scraping?
    Using GSA Scrape Genie, I've found actual postable target URLs(!) in the last year, with projects that use footprints for a single engine and no KW. That's better than scraping has ever worked for me, but I want a project to be able to use multiple footprints as well, together with KWs.

    What I'm getting now, having imported ALL the Article footprints from the Engine pane in SER plus three KWs (each on a different line), is sites that are not good targets. :worried: Like, literally NONE of them are.
    Is that to be expected?

    I am trying to see if KW "footprint" works better than
    "Footprint""KW"  <--- as I had this
    and
    "Footprint" KW

    Hmm... that seems to be one issue.
    Is that usual as well? In manual searches, I'm getting actual results on Bing with the KW first, followed by the footprint; with the footprint first, there were no results.
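
    Just to be concrete, these are the three query shapes I'm comparing - a rough Python sketch, with placeholder footprint/KW values:

        def query_variants(footprint: str, keyword: str) -> list[str]:
            # The three query shapes under test; which one returns
            # results seems to depend on the search engine.
            return [
                f'{keyword} "{footprint}"',    # KW "footprint" (KW first)
                f'"{footprint}""{keyword}"',   # "Footprint""KW" (no space, as I had it)
                f'"{footprint}" {keyword}',    # "Footprint" KW
            ]

        print(query_variants("Powered by bbPress", "gardening"))  # placeholder values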

    For a few footprints, I'm getting only one result, like a site with a list of footprints. :|
  • sickseo London, UK
    Deeeeeeee said:
    What I'm getting now, having imported ALL the Article footprints from the Engine pane in SER plus three KWs (each on a different line), is sites that are not good targets. :worried: Like, literally NONE of them are.
    Is that to be expected?
    Yes, that sounds about right - simply because most of the default engines don't have working sites anymore. They were spammed to death years ago. You need to fine-tune your list of footprints so that you only use ones that return sites that work.

    You'll need to test each main footprint first, one by one. If it yields no results, adding a keyword on the end will also yield zero results.

    You'll eventually filter out the non-working footprints and end up with a list of footprints that do work. So don't be disheartened by the lack of sites.
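
    If you want to automate that first pass, here's a minimal sketch in Python. I'm assuming DuckDuckGo's HTML endpoint and its "result__a" result markup - both are unofficial and change over time, so verify them manually before relying on this:

        import time
        import requests

        def count_results(query: str) -> int:
            # Rough first-page result count from DuckDuckGo's HTML endpoint.
            # Assumption: organic result links carry class "result__a".
            resp = requests.get(
                "https://html.duckduckgo.com/html/",
                params={"q": query},
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=15,
            )
            return resp.text.count('class="result__a"')

        def filter_footprints(footprints: list[str]) -> list[str]:
            # Keep only footprints that return at least one result on their own.
            # If the bare footprint scrapes nothing, footprint + KW won't either.
            working = []
            for fp in footprints:
                if count_results(f'"{fp}"') > 0:
                    working.append(fp)
                time.sleep(2)  # be polite, or you'll get rate-limited
            return working

        print(filter_footprints(["Powered by bbPress", "Powered by Question2Answer"]))

    The two footprints at the end are only examples - swap in the ones you're testing.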

    For article sites, the only engines you'll still find sites for are:

    gnu board
    dwqa
    osclass
    classipress
    bbpress
    wp foro
    buddypress
    moodle
    joomla k2
    drupal
    question 2 answer
    xpress engine

    If it's not on that list, you won't find sites for it. It's a similar situation for forum, social network and wiki sites - only some of those engines still have working sites available. You'll just have to test each one and see what results you get. You only need to test it once - if you don't scrape anything with the footprint, move on to the next one.

    That's why I invested in the new sernuke engines. The git alikes package yields the most sites - 2000+. No other engine in the software will yield that many sites anymore.

    3 years ago we used to get thousands of gnu board sites - not anymore - it's in the hundreds now.

    7+ years ago we had thousands of joomla k2 sites - not anymore, I'm down to 3 sites lol. Google has made it impossible to scrape these with footprints - they've blocked them.

    These engines are all public link sources - they will go through a cycle of being spammed to death and eventually site owners abandon their sites.

    Different search engines will also behave differently - as you mentioned with Bing - and I see similar things happening with seznam, yandex, aol and duck duck go. They have different search operators, which are worth researching further. What works in Google won't necessarily work in other search engines.

    That's why you should test each footprint manually first, and only automate it once you've confirmed it works. Just keep testing and adjusting your strategies. Sounds like you're on the right track, so keep at it.
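
    For the Bing behaviour you mentioned, you can compare query orders directly. A sketch only - I'm assuming Bing still marks first-page organic results with class "b_algo", which can change at any time:

        import time
        import requests

        def bing_count(query: str) -> int:
            # Count first-page organic results on Bing.
            # Assumption: organic results are wrapped in class "b_algo".
            resp = requests.get(
                "https://www.bing.com/search",
                params={"q": query},
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=15,
            )
            return resp.text.count('class="b_algo"')

        footprint, keyword = "Powered by bbPress", "gardening"  # placeholder values
        for q in (f'{keyword} "{footprint}"', f'"{footprint}" {keyword}'):
            print(repr(q), "->", bing_count(q), "results")
            time.sleep(2)  # pause between requests

    If the KW-first query returns results and the footprint-first one returns zero, that matches what you're seeing - the same terms get parsed differently depending on the order.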
  • Deeeeeeee the Americas
    edited October 15
    sickseo said:

    Yes, that sounds about right - simply because most of the default engines don't have working sites anymore. They were spammed to death years ago. You need to fine-tune your list of footprints so that you only use ones that return sites that work.

    You'll need to test each main footprint first, one by one. If it yields no results, adding a keyword on the end will also yield zero results.

    You'll eventually filter out the non-working footprints and end up with a list of footprints that do work. So don't be disheartened by the lack of sites.

    .....

    That's why you should test each footprint manually first, and only automate it once you've confirmed it works. Just keep testing and adjusting your strategies. Sounds like you're on the right track, so keep at it.

    Ohhhh, OK. I get this now. :relieved: I totally understand. I've been clueless and tried and tried, but just moved on to other things each time because this was all I got. I REALLY appreciate the info - this will keep me eating, I hope. lol Sad about the shrinking Internet -- I think about that a lot.
  • londonseo London, UK
    sickseo said:
    7+ years ago we had thousands of joomla k2 sites - not anymore, I'm down to 3 sites lol. Google has made it impossible to scrape these with footprints - they've blocked them.

    @sickseo - is it that Google has blocked joomla k2 sites, or has Joomla made some changes?
  • sickseo London, UK
    Google has blocked the footprint from returning results. The footprints may work with other search engines, but I haven't found any that do yet.