Sven, if we all ask nicely... would you build us a better Scrapebox? Please??
Anybody else hate Scrapebox as much as I do? And Gscraper isn't much better either...
Now, with the new Proxy Scraper out, a GSA Scraper would be the perfect addition to your portfolio - I'd instantly buy any early beta :-)
Comments
You just have to think outside the box
Open the Scrapebox Link Extractor addon and import these URLs. Set the number of threads to 1000, or maybe 500.
Select only internal links and click start.
Once it completes you should have a huge list of URLs. Use Scrapebox's split text file feature (Tools -> Text File Tool) and split the internal links file into chunks of 10,000 lines each.
Now open the Link Extractor addon again, select your preferred thread count, import the internal links text files (the ones you just split into 10,000 lines each), select the external option, and run it.
Every 10,000-URL internal links list can yield roughly 500K to 1 million external links.
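If you want to reproduce roughly what the Link Extractor addon does outside of Scrapebox, here's a minimal Python sketch: it splits a big internal-links file into 10,000-line chunks, fetches each URL, and keeps only the external (other-domain) links. The file names and chunk size are just illustrative assumptions, not anything Scrapebox itself exposes.

```python
# Minimal sketch of the split-then-extract-external-links workflow.
# Standard library only; "internal_links.txt" is an assumed input file.
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

CHUNK_SIZE = 10_000  # mirrors the 10,000-line split recommended above

def split_file(path, chunk_size=CHUNK_SIZE):
    """Split a text file of URLs into chunk_size-line pieces."""
    with open(path, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    chunks = [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]
    for n, chunk in enumerate(chunks, 1):
        with open(f"{path}.part{n}.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(chunk))
    return chunks

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(url, timeout=10):
    """Return the set of external (other-domain) links found on one page."""
    base_host = urlparse(url).netloc
    try:
        html = urllib.request.urlopen(url, timeout=timeout).read().decode("utf-8", "ignore")
    except Exception:
        return set()  # skip pages that time out or error
    parser = LinkCollector()
    parser.feed(html)
    absolute = {urljoin(url, href) for href in parser.links}
    return {link for link in absolute
            if urlparse(link).scheme in ("http", "https")
            and urlparse(link).netloc != base_host}

if __name__ == "__main__":
    for chunk in split_file("internal_links.txt"):
        found = set()
        for url in chunk:
            found.update(external_links(url))
        print(f"chunk of {len(chunk)} internal URLs -> {len(found)} external links")
```

Obviously a single-threaded script like this is nowhere near as fast as the addon running 500-1000 threads; it's only meant to show the mechanics of the internal/external split.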
Why not use proxies?
Since you are not hitting any search engines, your server IP won't get banned. The websites SB is reading won't ban you either, because it is just reading them, NOT registering or submitting backlinks.
You can use 5-10 proxies to hide your original server IP, but I don't, because those proxies don't have the 1 Gbps speed my dedicated server has, and running without them keeps the whole process much faster.
Please note that this link extractor is a RAM eater. I use a 64 GB server.
Don't use internal links lists of more than 10,000 URLs, because once the total external links go past about 5 million it can crash and your whole run will be wasted.
I hope this helps you .
Good Luck
I also use Platform Identifier to make things much easier for SER, so SER isn't busy removing non-identified platforms.
You can also do the same thing with just SER (although the Scrapebox method is faster if you have the time to do it by hand or once you automate it). The SER one is more automated out of the box though.
You set up a single project, load all the verified blog comments into it, untick all "how to find target urls" options other than "use urls linking on same verified..." and just put it in "Search Only" mode.
Optionally you can set up another project that just posts blog comments, and then feed the verifieds from that project to the search project.
But at the end of the day, you're just getting urls that other people are using and you need more than that to build a proper list.
And btw it's funny that this thread popped up here as I just finished writing an almost 2,000-word Scrapebox tutorial.
http://seospartans.com/scrapebox-scraping-tutorial-easy-56-million-links-day/
It was supposed to be much shorter and more compact but hopefully it's still useful.
Every SER campaign in my projects has unique URLs (i.e. URLs that I haven't used before in SER).
You can really go further with some more critical thinking on SER and achieve 300+ VPM on non-verified site lists. I currently get these stats with just 300 threads. I have done a few things to make sure my submission-to-verification ratio is as good as possible. I only use CB as the captcha solver. All imported links are new, non-verified URLs, and the global site list is unticked.
Yeah, it was a known method at one time and it's a very fast way to get those URLs that everyone has and that you would scrape anyway.
And that's some solid VPM (unless you're hitting only blog comments / pingbacks / trackbacks / indexer, which I'm assuming you're not). I'm guessing you only use a heavily filtered platform list?
It's what I used to do, and while I really appreciate that kind of optimization, it feels like you're leaving a lot of links on the table.
What I'm doing right now is gathering a huge identified list with all platforms and then processing that list weekly and using the verified list that comes out of that.
Also, I appreciate the "no verified list" policy you've got there, but I feel like that's shooting yourself in the foot a bit, don't you think? I mean, with so many SER users out there, the sites on your list are going to be used by some people; it's just a matter of how many. Re-using your list a couple of times won't make much difference in those numbers.
The URLs I use don't have more than 5 outbound links (internal links are not counted here). Of course other GSA SER users can hit them by chance, since there are countless unique URLs on the internet, but before they get their hands on even 10% of my URLs I will already have benefited from those links passing their link juice. And if I spam a URL, say, 10 times, it's clear my links won't get as much authority as they would from posting to it just once.
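For anyone curious what that kind of outbound-link filter could look like outside of SER/Scrapebox, here's a rough Python sketch that keeps only pages with 5 or fewer external outbound links. The regex-based href extraction and the file names are illustrative assumptions, not how any of the tools mentioned actually do it.

```python
# Rough sketch of an outbound-link (OBL) filter: keep URLs whose pages link
# out to 5 or fewer external domains. "candidates.txt" is an assumed input file.
import re
import urllib.request
from urllib.parse import urljoin, urlparse

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
MAX_OBL = 5  # maximum external outbound links allowed per page

def count_external_links(url, timeout=10):
    """Count links on the page that point to a different domain."""
    try:
        html = urllib.request.urlopen(url, timeout=timeout).read().decode("utf-8", "ignore")
    except Exception:
        return None  # unreachable pages are skipped
    host = urlparse(url).netloc
    external = set()
    for href in HREF_RE.findall(html):
        link = urljoin(url, href)
        parsed = urlparse(link)
        if parsed.scheme in ("http", "https") and parsed.netloc != host:
            external.add(link)
    return len(external)

if __name__ == "__main__":
    with open("candidates.txt", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    kept = [u for u in urls
            if (n := count_external_links(u)) is not None and n <= MAX_OBL]
    with open("low_obl_urls.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(kept))
    print(f"kept {len(kept)} of {len(urls)} URLs with <= {MAX_OBL} outbound links")
```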
And why think too much about building a verified list when you can make 3-5 million unique URLs every day and sleep soundly without worrying much about footprints, overspammed backlinks, etc.?
Hell, if it's working then more power to you; there's no point in messing with a proven formula.
And if you're using those platforms then those are some really solid numbers. What is your submitted-to-verified ratio, if I may ask? (In other words, what does your LPM look like when you're doing those 226 VPM?)