
Let's get a poll: who scrapes their own lists, who lets GSA scrape, and why?

I've heard compelling arguments for both and I'm still not quite decided. I'm veering towards letting GSA scrape due to the obvious hands-free nature, though I'm currently experimenting with adding lists and blasting. So I'm interested to hear others' opinions on the matter, one way or the other, and the reasons why.

Comments

  • goonergooner SERLists.com
    I think you already know my answer so I won't bother :P lol
  • Scraped 1.6 million from that BHW guide. Ran it and got 34 verified from 15k submitted. Gave up (for now at least) and let SER do its thing instead.
  • lol, sounds similar to my experience, though not quite as bad as that.
  • goonergooner SERLists.com
    I think scraping with @2take2's method is a lot more successful.
  • I usually only feed in lists for lower tiers (blog comments, trackbacks, guestbooks, image comments), for everything else I let SER scrape as it goes.
  • I generally do it both ways: let SER scrape, and scrape my own targets with SB as well.

    @gooner - if you merge in some 'stop' words after you have merged the keywords with the footprints it works even better ;) (rough sketch of the merge after this comment).

    @judderman - that seems a bit low; if I use the link extraction technique I generally get about 10k verified from a list of 1,000,000 targets, but that's just following the steps and not sorting the URLs in any way throughout the process.
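To make that merge step concrete, here's a minimal Python sketch of the idea: pair every footprint with every keyword, then merge stop words into the combined queries to multiply the pool of unique search strings. The file names are assumptions; ScrapeBox's own merge tools do the same job.

```python
from itertools import product

# Assumed input files - one entry per line.
footprints = open("footprints.txt").read().splitlines()
keywords = open("keywords.txt").read().splitlines()
stop_words = open("stop_words.txt").read().splitlines()

queries = set()

# Step 1: merge every footprint with every keyword.
for fp, kw in product(footprints, keywords):
    queries.add(f'{fp} "{kw}"')

# Step 2: merge stop words into the already-merged queries,
# multiplying the number of unique search strings to scrape with.
for q, sw in product(list(queries), stop_words):
    queries.add(f"{q} {sw}")

with open("merged_queries.txt", "w") as out:
    out.write("\n".join(sorted(queries)))
```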
  • goonergooner SERLists.com
    Cheers @2take2, I'll give it a whirl.
  • I was letting SER do its thing but I keep getting proxies banned on the search engines, which kills the LPM. I am experimenting with scraping now.

    I tried the BHW method and got like 5k links the first time. Now trying the other method using ScrapeBox, which is basically to imitate how SER scrapes ("footprint + keyword"). Obviously using different proxies for these.

    I think the only part the 2nd method is missing is language detection, which I haven't figured out yet. So if anyone has a ScrapeBox or GScraper plugin, or any other tool that can take a URL/domain and determine the language of the site, please let me know (a rough standalone approach is sketched below).
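I'm not aware of a ready-made ScrapeBox or GScraper plugin for this, but a standalone script can do a rough job. A minimal Python sketch, assuming the requests, beautifulsoup4 and langdetect libraries and made-up input/output file names:

```python
# pip install requests beautifulsoup4 langdetect
import requests
from bs4 import BeautifulSoup
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def site_language(url, timeout=10):
    """Fetch a page and guess its language code ('en', 'ru', ...)."""
    try:
        html = requests.get(url, timeout=timeout).text
        text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
        return detect(text[:2000])  # a snippet is enough for a guess
    except (requests.RequestException, LangDetectException):
        return None  # unreachable site or too little text to classify

# Keep only URLs that look English.
with open("scraped_urls.txt") as src, open("english_urls.txt", "w") as dst:
    for line in src:
        if site_language(line.strip()) == "en":
            dst.write(line)
```

Fetching every URL is slow, so in practice you'd thread this or only check one page per domain.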
  • I'm new to SER, but the list method is working much better for me. Scraped about 1.2 million URLs with the "extraction technique" and already have 9k verified URLs, and it has only processed 600-700k so far. Stopped because Spamvilla is down (((

    But I haven't had any success with the SER auto scraper. Maybe I'm doing something wrong, but I really don't know what. I use proxies, I use a 200k keyword list, and I ticked the Google, Bing and Yahoo search engines. SER worked for 2 days and found only 150 places to post (it was T1, all contextual).

  • @2Take2 - yeah mate, that's what I was thinking. I still have the list; I'll cut it up into smaller files and see if that helps. I was running it through a junk tier on a test project but gave up in the end. That was a few updates ago, and I'm back up to 40+ LPM now, so I'll try it again. Been a busy few days, moving home etc., so I haven't been into testing much.
  • That is more likely an issue with your filters/options than your scraping method ^
  • OK PP, thanks, will look into it again.
  • Been scrapin' with GScraper for 6 months. It definitely pays off to build your own list. BTW, Spamvilla is a fucking joke. Every day a problem.
  • Always our own scrapes - it's just faster, we get a higher volume and it's more convenient. 
  • Scraping with ScrapeBox using custom footprints and keywords seems to work better, as you get more target sites.
  • My proxies got banned just too quickly, so I started to scrape myself, though with mixed success. Joomla sites gave the worst results, but I did not use advanced operators like inurl: as I only have private proxies.

     If you can suggest a place to get free, tested public proxies, that would be good.
  • I would also be ready to pay a reasonable fee.
  • Brandon on here, I think, mentioned a service which fits what you are talking about, but he didn't say specifically at the time, so try asking him.
  • @peterparker - I am not willing to pay more than $5-10; otherwise I can just use private proxies for the same purpose.
  • edited November 2013
    There is actually an endless number of sites to scrape if you stick to it and create tons of footprints. It takes some practice and persistence. Not bragging here, but I have over 150,000 unique Drupal sites in my database to post to from the past 6 months (not sure what percentage is verified). Tools like GScraper or ScrapeBox make this possible.
  • @sweeppicker - do you use some additional paid services, or just CB? I get tons of unique sites, but most of them use reCAPTCHA or other captcha forms that CB cannot solve.
  • I use Spamvilla and DeathByCaptcha, but I don't always use the latter service because with that many registrations it gets expensive real fast.
  • Scrape with Hrefer > all. Nothing else even comes close. Used them all very extensively. Even custom scrapers aren't much better.
  • edited November 2013
    @sweeppicker - are you just plain scraping to get that list of Drupal sites, or are you looking at extracting external links and then trying to post to those as targets? And have you altered the GSA engine file for Drupal at all? Help 110% appreciated!!!
  • @GodfreyWagstaff - I did both (the extraction step is sketched below). I only targeted English; I can only imagine how big a list you could build targeting all the other languages. I did not alter the GSA engine file for Drupal - I don't know coding.
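For anyone wondering what the external-link extraction step looks like in principle (tools like ScrapeBox automate it), here's a rough Python sketch: fetch pages you already scraped, collect every outbound link pointing at a different domain, and save those as fresh targets. File names are assumptions.

```python
# pip install requests beautifulsoup4
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def external_links(page_url, timeout=10):
    """Return links on page_url that point at other domains."""
    try:
        html = requests.get(page_url, timeout=timeout).text
    except requests.RequestException:
        return set()
    base = urlparse(page_url).netloc
    found = set()
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        url = urljoin(page_url, a["href"])
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc != base:
            found.add(url)
    return found

# Assumed input: URLs you already scraped, one per line.
targets = set()
with open("scraped_urls.txt") as f:
    for line in f:
        targets |= external_links(line.strip())

with open("extracted_targets.txt", "w") as out:
    out.write("\n".join(sorted(targets)))
```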
  • @sweeppicker - how are you only targeting English with your scraping? This is one thing I am having problems with; I am ending up with a bunch of non-English sites in my scraped lists.
  • Just use the English language preference in GScraper. Most of your results will be English-speaking sites. I get some .ru etc., but it's all good. Still useful.
  • Great post... interesting read.
  • 1linklist FREE TRIAL Linklists - VPM of 150+ - http://1linklist.com
    I scrape my own lists. I tend to work with millions of links per day, and GSA is a great link poster, but it is no high-level data-mining utility.

    For the average Joe, ahrefs can be a pretty solid source of LinkLists.
  • I scrape my own lists.