
Let's take a poll: who scrapes their own lists, who lets GSA scrape, and why?

I've heard compelling arguments for both and I'm still not quite decided, though I'm veering towards letting GSA scrape because of the obvious hands-free nature, while currently experimenting with adding lists and blasting. So I'm interested to hear other people's opinions on the matter, one way or the other, and the reasons behind them.


  • goonergooner
    I think you already know my answer so I won't bother :P lol
  • Scraped 1.6 million from that BHW guide. Ran it and got 34 verified from 15k submitted. Gave up (for now at least) and let SER do its thing instead.
  • lol. Sounds similar to my experience, though not quite as bad as that.
  • goonergooner
    I think scraping with @2take2's method is a lot more successful.
  • cherubcherub
    I usually only feed in lists for lower tiers (blog comments, trackbacks, guestbooks, image comments), for everything else I let SER scrape as it goes.
  • I generally do it both ways, let SER scrape, and scrape my own targets with SB as well.

    @gooner - if you merge in some 'stop' words after you have merged the keywords with the footprints, it works even better. ;)

    @judderman - that seems a bit low. If I use the link extraction technique I generally get about 10k verified from a list of 1,000,000 targets, and that's just following the steps and not sorting the URLs in any way throughout the process.
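For anyone unfamiliar with the link extraction idea (harvesting the outbound links from pages you already have, then trying those off-site URLs as new targets), here is a minimal stdlib-only Python sketch. The URLs and class name are made up for illustration; the actual ScrapeBox addon obviously does far more than this:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect external (off-site) link targets from a page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.external = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                host = urlparse(absolute).netloc
                # Keep only links pointing off-site; those are the new targets.
                if host and host != self.base_host:
                    self.external.add(absolute)

# Hypothetical example page with one internal and one external link.
html = '<a href="/about">About</a> <a href="http://other-blog.example/post">x</a>'
parser = LinkExtractor("http://my-verified-target.example/page")
parser.feed(html)
print(sorted(parser.external))  # ['http://other-blog.example/post']
```

Run that over every URL in a verified list, dedupe the collected set, and you have the raw target list to feed back into SER.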
  • goonergooner
    Cheers @2take2 i'll give it a whirl
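For reference, the merge @2Take2 describes (keywords merged with footprints first, then stop words merged in on top) can be sketched like this; the footprints, keywords, and stop words below are placeholders, not a recommended list:

```python
from itertools import product

footprints = ['"powered by drupal"', 'inurl:guestbook']  # placeholder footprints
keywords   = ["fishing", "seo tools"]                    # placeholder keywords
stop_words = ["the", "with"]                             # common stop words

# Step 1: merge keywords with footprints.
queries = [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]

# Step 2: merge stop words into those combined queries, multiplying
# the number of unique searches (and so the number of unique results).
queries += [f"{fp} {kw} {sw}" for fp, kw, sw in product(footprints, keywords, stop_words)]

print(len(queries))  # 2*2 + 2*2*2 = 12 unique queries
```

With realistic list sizes (hundreds of footprints, thousands of keywords, a few dozen stop words) the query count explodes, which is the point: far more unique SERPs to harvest from.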
  • I was letting SER do its thing but I kept getting proxies banned on the search engines, which kills the LPM. I am experimenting with scraping now.

    I tried the BHW method and got like 5k links the first time. Now trying the other method, using ScrapeBox to basically imitate SER in how it scrapes ("footprint + keyword"). Obviously using different proxies for these.

    I think the only part the second method is missing is language detection, which I haven't figured out yet. So if anyone has an SC or GScraper plugin, or any other tool that can take a URL/domain and determine the language of the site, please let me know.
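As a stopgap until a proper tool turns up: many sites declare their language in the HTML itself, so a rough stdlib-only filter is possible. This is just a sketch of that heuristic; pages that declare nothing would need a real statistical detector (e.g. the langdetect library) instead:

```python
import re

def guess_language(html):
    """Best-effort guess at a page's declared language.

    Looks for an <html lang="..."> attribute first, then a
    <meta http-equiv="Content-Language" content="..."> tag.
    Returns a lowercased language code, or None if nothing is declared.
    """
    m = re.search(r'<html[^>]*\blang=["\']?([A-Za-z-]+)', html, re.I)
    if not m:
        m = re.search(r'<meta[^>]*Content-Language[^>]*content=["\']?([A-Za-z-]+)',
                      html, re.I)
    return m.group(1).lower() if m else None

print(guess_language('<html lang="en-GB"><body>...</body></html>'))                      # en-gb
print(guess_language('<html><meta http-equiv="Content-Language" content="ru"></html>'))  # ru
print(guess_language('<html><body>no declaration</body></html>'))                        # None
```

Fetch each scraped URL, run this over the response body, and drop anything that doesn't start with "en". It won't catch every non-English site, but it prunes a lot of them cheaply.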
  • I'm new to SER, but the list method is working much better for me. I scraped about 1.2 million URLs with the "extraction technique" and already got 9k verified URLs, and it has only gone through 600-700k of them so far. Stopped because Spamvilla is down ((( .

    But I haven't had any success with SER's auto scraping. Maybe I'm doing something wrong, but I really don't know what. I use proxies, I use a 200k keyword list, and I ticked the Google, Bing and Yahoo search engines. SER worked for 2 days and found only 150 places to post (it was all T1 contextual).

  • @2Take2 - yeah mate, that's what I was thinking. I still have the list; I'll cut it up into smaller files and see if that helps. I was running it through a junk tier on a test project but gave up in the end. That was a few updates ago, and I'm back up to 40+ LPM now, so I'll try it again. It's been a busy few days, moving home etc., so I haven't been able to test much.
  • That is more likely an issue with your filters/options than with the scraping method ^
  • OK PP, thanks, will look into it again.
  • Been scrapin' with GScraper for 6 months. It definitely pays off to build your own list. BTW, Spamvilla is a fucking joke. Every day a problem.
  • Always our own scrapes - it's just faster, we get a higher volume and it's more convenient. 
  • Scraping with ScrapeBox using custom footprints and keywords seems to work better, as you get more target sites.
  • My proxies got banned just too quickly, so I started to scrape myself, with mixed success though. Joomla sites gave the worst results, but I did not use advanced operators like inurl: as I only have private proxies.

     If you can suggest a place to get free, tested public proxies, that would be good.
  • I would also be ready to pay a reasonable fee.
  • Brandon on here, I think, mentioned a service which fits what you are talking about, but he didn't say specifically at the time, so try asking him.
  • @peterparker I am not willing to pay more than $5-$10, otherwise I can use private proxies for the same purpose.
  • edited November 2013
    There is actually an endless number of sites to scrape if you stick with it and create tons of footprints. It takes some practice and persistence. Not bragging here, but I have over 150,000 unique Drupal sites in my database to post to from the past 6 months (not sure what percentage is verified). Tools like GScraper or ScrapeBox make this possible.
  • @sweeppicker - do you use any additional paid services or just CB? I get tons of unique sites, but most of them use reCAPTCHA or other captcha forms that CB cannot solve.
  • I use Spamvilla and DeathByCaptcha, but I don't always use the latter service because with that many registrations it gets expensive real fast.
  • Scrape with Hrefer > all. Nothing else even comes close. Used them all very extensively. Even custom scrapers aren't much better.
  • edited November 2013
    @sweeppicker - are you just plain scraping to get that list of drupal sites, or are you looking at extracting external links, then trying to post to those as targets? And have you altered the GSA engine file for drupal at all? Help 110% appreciated!!!
  • @GodfreyWagstaff I did both. I only targeted English. I can only imagine how big a list you could build targeting all the other languages. I did not alter the GSA engine file for Drupal; I don't know coding.
  • @sweeppicker how are you only targeting English with your scraping? This is one thing I am having problems with; I end up with a bunch of non-English sites in my scraped lists.
  • Just use the English language preference in GScraper. Most of your results will be English-language sites. I get some .ru etc., but it's all good. Still useful.
  • Great post... interesting read.
  • 1linklist
    I scrape my own lists. I tend to work with millions of links per day, and GSA is a great link poster, but it is not a high-level data-mining utility.

    For the average Joe, ahrefs can be a pretty solid source of LinkLists.
  • I scrape my own lists.