Sven, if we all ask nicely... would you build us a better Scrapebox? Please??

Does anybody else hate Scrapebox as much as I do? And Gscraper isn't much better either...

Now, with the new Proxy Scraper out, a GSA Scraper would be the perfect addition to your portfolio - I'd instantly buy any early beta :-)


Best Answer

  • Sven www.GSA-Online.de
    edited May 2016 Accepted Answer
    Well, Proxy Scraper has a "Search Engine Parser" in it. Might be worth having a look at that...

Answers

  • @Sven - do so and be very frightened. My programming skills are very limited, but I ran a few tests.

    The common view seems to be that "GSA SER isn't that great at scraping"... and when you compare numbers, that seems true at first sight: ScrapeBox produces much larger result lists in the same timeframe.

    GSA SER is slower, but only because its scraping resources are shared with the program's other tasks. If you put your mind to a serious scraping tool (maybe in combination with PS?) that's able to use connect proxies... that would rock.

    I have the impression that Google is getting more and more serious about blocking Scrapebox - unlike in other scenarios, Bing is a serious option here. Just a thought :)
  • I scrape 5-10 million unique URLs a day with Scrapebox, without any proxies.

    You just have to think out of the box.
  • @ronit Can you share how you got out of the box? Looks like I'm stuck inside a big one. I'm tired of getting proxies burned and can't find any affordable way of scraping links, and you're saying you did it without proxies. That's amazing.
  • @ronit I usually pride myself on being an out-of-the-boxer, but it seems I just can't wrap my mind around SB. I burn all my proxies scraping about 500k a day.

    I'm very interested in some input - if you share your insights with me, you won't regret it, I promise 
    B-)

    (Don't tell me to use engines other than Google... been there. Massive duplicates... low id-rate in GSA SER.)
  • @ronit Or did you literally mean "out of the [scrape]BOX"? Because that's pretty much my line of thinking right now.
  • Get yourself a URL list of, let's say, 10,000 URLs.

    Open the Scrapebox Link Extractor addon and import these URLs. Set the number of threads to 1000, or maybe 500.

    Select internal links only and click Start.

    Once it completes, you should have a huge list of URLs. Use Scrapebox's split text file feature (Tools > text file tool) and split the internal-links text file into files of 10,000 lines each.

    Now open the Link Extractor addon again and select your preferred number of threads. Import one of the internal-links text files (one of the 10,000-line files you just created), select the external option and run it.

    Each 10,000-URL internal-links file can yield 500K to 1 million external links.
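    The internal/external split that the Link Extractor addon performs can be sketched in a few lines of Python. This is a minimal illustration, not the addon's actual logic: it assumes the page HTML has already been downloaded, treats a link as internal only when its host exactly matches the page's host (so subdomains count as external), and uses the hypothetical helper name `split_links`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def split_links(html, page_url):
    """Return (internal, external) link lists for one page.

    Assumption: "internal" means the exact same host as page_url.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    page_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc.lower() == page_host:
            internal.append(link)
        else:
            external.append(link)
    return internal, external
```

    For example, on a page at `https://site.example/blog/post` containing `<a href="/about">` and `<a href="https://other.example/x">`, this returns `["https://site.example/about"]` as internal and `["https://other.example/x"]` as external.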


    Why not use proxies?

    Since you are not querying any search engine, your server IP won't get banned. You also won't get banned by the websites SB is reading, because it is only reading them, NOT registering or submitting backlinks.

    You can use 5-10 proxies to hide your server's IP, but I don't, because those proxies don't have the 1 Gbps connection my dedicated server has, and going direct makes the whole process much faster.

    Please note that the Link Extractor addon is a RAM eater. I use a 64 GB server.

    Don't load more than 10,000 internal-link URLs at a time: once the total number of external links goes past 5 million, the addon can crash and all your time is wasted.
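    If you want to prepare the 10,000-line batches outside Scrapebox, the splitting step is easy to script. A minimal Python sketch, assuming one URL per line in a UTF-8 text file; `chunk_lines`, `split_file` and the `<name>_0001.txt` output naming are hypothetical choices, not Scrapebox's own behaviour:

```python
from itertools import islice
from pathlib import Path


def chunk_lines(lines, size=10_000):
    """Yield lists of at most `size` stripped, non-empty lines."""
    cleaned = (line.strip() for line in lines)
    nonblank = (line for line in cleaned if line)
    while True:
        # islice keeps consuming the same generator, so each call
        # picks up where the previous chunk stopped.
        chunk = list(islice(nonblank, size))
        if not chunk:
            return
        yield chunk


def split_file(src, out_dir, size=10_000):
    """Write src's lines into numbered files of at most `size` lines each."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with src.open(encoding="utf-8") as fh:
        for n, chunk in enumerate(chunk_lines(fh, size), start=1):
            path = out_dir / f"{src.stem}_{n:04d}.txt"
            path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
            paths.append(path)
    return paths
```

    Calling `split_file("internal_links.txt", "chunks")` would write `chunks/internal_links_0001.txt`, `chunks/internal_links_0002.txt` and so on, each capped at 10,000 URLs. Streaming the file line by line keeps memory flat even for very large lists.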


    I hope this helps. Good luck!

    I also use Platform Identifier to make things much easier for SER, so SER isn't busy removing non-identified platforms.
  • @ronit THANK YOU!!! Never tried this approach until... today :)
  • I am really impressed with this method. Thank you for the insights @ronit
  • @ronit that's a really genius, out-of-the-box approach. Thanks!
  • Hinkys posted that method on BHW years ago. It's also basically what SEO List Builder does, but SLB has more customizable options. You need a lot of RAM to run it, though.
  • Hinkys SEOSpartans.com - Catchalls for SER - 30 Day Free Trial
    edited May 2016
    Yeah, it's a great method to get loads of links quickly (especially useful if you're just starting out).

    You can also do the same thing with just SER (although the Scrapebox method is faster if you have the time to do it by hand, or once you automate it). The SER approach is more automated out of the box, though.

    Set up a single project, load all your verified blog comments into it, untick every "how to find target urls" option except "use urls linking on same verified..." and put it in "Search Only" mode.

    Optionally, you can set up another project that only posts blog comments and then feed the verified URLs from that project to the search project.

    But at the end of the day, you're just getting URLs that other people are using, and you need more than that to build a proper list.

    And by the way, it's funny that this thread popped up, as I just finished writing an almost 2,000-word Scrapebox tutorial.

    http://seospartans.com/scrapebox-scraping-tutorial-easy-56-million-links-day/

    It was supposed to be much shorter and more compact but hopefully it's still useful.


  • Sven www.GSA-Online.de
    @ronit isn't that the same function you have in SER under options->advanced->Parse verified URLs (others linking on same URL)?
  • @ronit thanks for the share! going to try it this morning.
  • kijix84
    I figured out the method on my own, although it's obvious that other SB users will be aware of this technique, since the addon used is common and what it does makes sense.

    Hinkys: A very nice article on scraping URLs, thank you for your contribution.
    As I said earlier, I like to use GSA only for backlink submission and verification. Using SER to scrape URLs is a bad idea, so I use SB for its much faster speed and for fresh URLs. I don't let SER save any verified URLs, because I consider them "raped" (over-spammed) URLs, and those aren't good for my money sites in terms of SEO.
    Every SER campaign in my projects has unique URLs (i.e. URLs I haven't used before on SER).
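    Keeping every campaign on URLs that haven't been used before comes down to filtering each new scrape against a record of past targets. A minimal Python sketch, assuming you keep your own list of previously used URLs (SER's internal storage is not modelled here); `normalize` and `fresh_urls` are hypothetical helper names, and the normalization rule (lowercase scheme and host, drop the #fragment) is an assumption:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Comparison key: lowercase scheme and host, fragment removed."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


def fresh_urls(candidates, used):
    """Return candidates, in order and deduplicated, that are not in `used`."""
    seen = {normalize(u) for u in used}
    result = []
    for url in candidates:
        key = normalize(url)
        if key not in seen:
            seen.add(key)  # also drops duplicates within this batch
            result.append(url.strip())
    return result
```

    A set lookup keeps the check fast even against millions of previously used URLs, provided the set fits in RAM.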

    You can really go further with some more critical thinking on SER and achieve 300+ VPM on non-verified site lists. I currently get these stats from just 300 threads. I have done some work to get the best possible submission-to-verification ratio. I only use CB as the captcha solver. All imported links are new, non-verified URLs, and the global site list is unticked.

    Sven: I agree SER has the ability to do the job, but it's much slower, and as I said, it's an insult to make SER find URLs.
  • Hinkys
    @ronit
    Yeah, it was a known method at the time, and it's a very fast way to get the URLs that everyone has and that you would scrape anyway.

    And that's some solid VPM (unless you're hitting only blog comments / pingbacks / trackbacks / indexer, which I'm assuming you're not). I'm assuming you use only a heavily filtered platform list?

    It's what I used to do, and while I really appreciate that kind of optimization, it feels like you're leaving a lot of links on the table.

    What I'm doing right now is gathering a huge identified list with all platforms and then processing that list weekly and using the verified list that comes out of that.

    Also, I appreciate the "no verified list" policy you've got there, but I feel like that's shooting yourself in the foot a bit, don't you think? I mean, with so many SER users out there, the sites on your list are going to be used by other people anyway; it's just a matter of how many. Re-using your list a couple of times won't make much difference to those numbers.
  • Hinkys: It might shock you, but the platforms I use are article, social bookmark, social network, video and wiki, and all engines are CDF. Yes, as mentioned earlier, I use Platform Identifier to sort my engines.

    The URLs I use have no more than 5 outbound links (internal links not counted). Of course, other GSA SER users can spam them by chance, but with trillions and zillions of unique URLs on the internet, by the time they get their hands on even 10% of my URLs I will already have benefited from the link juice those links pass. If I spam a URL, let's say, 10 times, my links clearly won't get as much authority as they would from spamming it once.
    And why think too hard about building a verified list when you can make 3-5 million unique URLs every day and sleep soundly without worrying much about footprints, "raped" backlinks, etc.?
  • Hinkys
    @ronit
    Hell, if it's working, then more power to you; there's no point in messing with a proven formula.

    And if you're using those platforms, then those are some really solid numbers. What is your submitted-to-verified ratio, if I may ask? (In other words, what does your LPM look like when doing those 226 VPM?)
  • I don't remember exactly, but it isn't much higher than VPM. It's probably somewhere between 260 and 280.
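    Taking the figures from this exchange at face value (226 VPM against an LPM of roughly 260-280, both assumed to be averaged over the same time window), the submitted-to-verified ratio asked about works out to roughly 81-87%. A small sketch of that arithmetic; `verify_ratio` is a hypothetical helper name:

```python
def verify_ratio(vpm: float, lpm: float) -> float:
    """Fraction of submissions that get verified, assuming VPM and LPM
    are averaged over the same time window."""
    if lpm <= 0:
        raise ValueError("lpm must be positive")
    return vpm / lpm


# Figures quoted in this thread: 226 VPM, LPM "between 260-280".
for lpm in (260, 280):
    print(f"LPM {lpm}: {verify_ratio(226, lpm):.1%} of submissions verified")
```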
  • @ronit - How do you make such successful non-verified, non-spammed lists if you're reverse engineering other people's spam (the SB method you mentioned initially)?
  • 2Take2 UK
    edited May 2016
    Alternatively, you can just scrape for the default generic data / text footprints that SER leaves when it posts. Obviously you'll get a list that's already been spammed a bit though...