
New Google Scraper About To Launch!

24 Comments

  • steelbone Outside of Boston
    Would like to test drive it if possible :)
  • edited April 2014
    @banditim $50 for 500 million links sounds fair, especially if those are deduped links. Suggestion: how about integrating a web crawler that crawls the entire web? Each page the crawler visits would be matched against a footprint. You could then build a search engine that searches the crawled pages for keywords and filters the results by engine.
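
An aside on the crawler-plus-footprints idea above: a minimal sketch in Python of matching crawled pages against per-engine footprints. The footprint patterns and the sample page are hypothetical illustrations, not part of the actual service:

```python
import re

# Hypothetical engine footprints: each platform is identified by text
# patterns that commonly appear in pages built on it.
FOOTPRINTS = {
    "wordpress": [re.compile(r"powered by wordpress", re.I)],
    "drupal": [re.compile(r'content="Drupal', re.I)],
}

def match_engines(html):
    """Return the engines whose footprints appear in a crawled page."""
    return [
        engine
        for engine, patterns in FOOTPRINTS.items()
        if any(p.search(html) for p in patterns)
    ]

# Tag each crawled page with the engines it matches, so later keyword
# searches can be filtered down to a specific engine.
page = "<footer>Proudly powered by WordPress</footer>"
print(match_engines(page))  # -> ['wordpress']
```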
  • BlazingSEO http://blazingseollc.com/proxy
    @bencrabara - Crawling the 'entire web' is a little overkill, but we definitely do have some extra crawling outside of just users' scrapes, which lets us gather even more footprints and websites for specific engines :)
  • This sounds really fantastic!
    Scraping lists is always a very time- and power-consuming process, so this service could have a real impact!


  • BlazingSEO http://blazingseollc.com/proxy
    @raperez - Thanks very much! We sure hope it does :)
  • I'd like to try your service as well. I'm using GScraper with their monthly proxy service, so a better alternative would be great.
  • Kaine thebestindexer.com
    @BanditIM

    Thank you for not responding; now I know I wouldn't want to work with you ;)
  • BlazingSEO http://blazingseollc.com/proxy
    @Kaine - That's extremely odd -- I had typed out a very long response to you and was awaiting YOUR response haha. No need to get bitter - I'll retype it and send the PM again (not sure what happened to it).
  • Kaine thebestindexer.com
    edited May 2014
    Must be a strange bug, I got nothing ;)

    No need to send it by PM. Post the answer here; we're all friends here.
  • BlazingSEO http://blazingseollc.com/proxy
    @Kaine - I'm sorry, I'm a bit confused with your response... do you not want me to reply now?
  • Kaine thebestindexer.com
    edited May 2014
    My English is very bad, I await your response.
  • Kaine thebestindexer.com
    @BanditIM

    Does this deal interest you?

  • BlazingSEO http://blazingseollc.com/proxy
    @Kaine - I'll look into it all later today, thanks for all the information... just really busy right now :(
  • @BanditIM this is an interesting concept.

    How good will it be at importing massive footprint lists? By massive I mean around the 5M mark (something I am about to start on). And how quickly would it return the results?

    Also, when charging for results, will you be charging for the number after or before the dedupe?
  • Yeah, I'd pay like $30-70 for this after testing it myself or hearing reviews and comparisons with the other two.
  • Is the beta out yet?
  • BlazingSEO http://blazingseollc.com/proxy
    @Flembot - We haven't gotten to the point of testing 5 million, but we allow users to copy/paste or upload files with their respective footprints and/or keywords, so it shouldn't be an issue. Result return speed will vary depending on how many of your search terms are in our cache. If most of your search terms are in our cache, you could get 100 million results back in a matter of minutes :). As for charging for results, to keep the comparisons even with ScrapeBox and GScraper, we will charge before the de-dupe. We obviously know there will be a lot of dupes in there, and will keep that in mind, but we want a direct comparison with the other tools to prove we WILL be cheaper and more efficient than the others.

    @sumusiko - Sweet, great to hear!

    @spammasta - Not yet. We've still got a couple of weeks to make sure the whole system is ready to go. I wanted to make this thread now to get our beta testers signed up. I will stop accepting new beta testers fairly soon.
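
As an aside, the cache-first flow described above might look something like this sketch; the store, the refresh window, and the scrape_google placeholder are assumptions for illustration, not the service's actual code:

```python
import time

# Hypothetical in-memory cache: search term -> (timestamp, result URLs).
# A real service would presumably share a persistent store across users.
CACHE = {}
REFRESH_SECONDS = 7 * 24 * 3600  # assumed 7-day freshness window

def scrape_google(term):
    """Placeholder for the proxy-backed live scrape (assumption)."""
    return []

def get_results(term):
    """Serve cached results while fresh; otherwise scrape and cache."""
    entry = CACHE.get(term)
    if entry and time.time() - entry[0] < REFRESH_SECONDS:
        return entry[1]  # cache hit: results come back almost instantly
    results = scrape_google(term)  # cache miss: do the slow live scrape
    CACHE[term] = (time.time(), results)
    return results

print(get_results('"powered by wordpress" seo'))  # first call misses the cache
```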
  • @BanditIM - interested to see the pricing then, because with GScraper you can use the free version to scrape all day long. With dedupe enabled it is really hard to reach the URL limit.

    Will there be the ability to filter results? For example, I doubt anyone wants to get Google webcache results at all.
  • Really nice idea. Count me in to test it, as always.
  • BlazingSEO http://blazingseollc.com/proxy
    @Ferryman - With the free version you still have to provide your own proxies and server costs though. If you're talking about just using 1 or 2 threads to scrape with your plain IP, I think that's a stretch haha. We may offer something like 10,000 free scrapes for all users each month so it is comparable to that free scraping.

    As for filtering results, we'll add in all those extra features as we progress. They are simple and easy to add in, but just take some time because there are so many of them to add. We want to get the core idea down before getting too far ahead of ourselves. I do have to ask though -- I've been scraping for years and haven't encountered my scraped results coming back with webcache results... does Google somehow show this in the searches? Mind giving an example?


    @Peisithanatos - Great man, thanks for the interest!
  • I am interested too!
  • The caching idea looks great, since Google is getting smarter about proxies every single day.
    It does come down to the refresh rate of the cached queries, though.
    If you set it to something like 2 weeks to 1 month, they will be quite useless.


  • edited May 2014
    @BanditIM - Yes, of course you have to provide your own proxies :D Still, for the $30 mentioned above you get enough reverse proxies to use all day long. For $100 I wouldn't even bother getting the service unless it is really phenomenal.

    About the webcache - weird, I am getting them a lot on GScraper (every third result or so). It doesn't really bother me as I just dedupe according to unique domains.

    It would be nice if there were an option to get daily, weekly, monthly, etc. results so you could just get the fresh ones instead of scraping the same thing over and over again.
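
Filtering out webcache entries and de-duplicating on unique domains, as described in the exchange above, could be as simple as this sketch (the sample URLs are made up):

```python
from urllib.parse import urlparse

def clean_results(urls):
    """Drop Google webcache entries and keep one URL per unique domain."""
    seen = set()
    cleaned = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host == "webcache.googleusercontent.com":
            continue  # skip Google's cached copies of pages
        if host in seen:
            continue  # dedupe on unique domains
        seen.add(host)
        cleaned.append(url)
    return cleaned

urls = [
    "http://webcache.googleusercontent.com/search?q=cache:example.com",
    "http://example.com/blog/post-1",
    "http://example.com/blog/post-2",
    "http://another-site.org/page",
]
print(clean_results(urls))
# ['http://example.com/blog/post-1', 'http://another-site.org/page']
```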


  • BlazingSEO http://blazingseollc.com/proxy
    @derdor - Completely agreed. We plan on monitoring certain popular footprints (e.g. "powered by wordpress") and seeing how often the search results update when using a handful of keywords with those footprints. Right now we're seeing that around 3-10 days will be a good number to start off with for the refresh rate.

    @Ferryman - Regarding reverse proxies, of course you can scrape all day long with those, but $30 gets you like 10 ports... once we do some case studies with certain thread counts, we will have a good idea of what we will need to charge to be competitive. 10 ports or 1,000 ports, it'll all scale pretty evenly, so the number of links you can scrape in a month using that route will be less than the number of links you can get with us at the same price :). Also note, we see our service as a little more premium (but won't charge for it) because it can auto-scrape 24/7 and auto-FTP, something the other software cannot do.
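
One plausible way to arrive at a refresh rate like the 3-10 days mentioned above: rescrape a monitored footprint after some interval and measure how much the result set has churned. A sketch under that assumption; the sample sets are made up:

```python
def churn(old, new):
    """Fraction of the new result set that was not in the old one."""
    if not new:
        return 0.0
    return len(set(new) - set(old)) / len(set(new))

# Hypothetical: two scrapes of the same footprint+keyword a few days apart.
old_results = {"http://a.com", "http://b.com", "http://c.com"}
new_results = {"http://a.com", "http://b.com", "http://d.com"}

print(f"churn: {churn(old_results, new_results):.0%}")
# churn: 33% -> a several-day refresh interval loses little freshness
```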
  • @BanditIM, looking forward to it. :)
  • I want to be on the beta list =).
  • BlazingSEO http://blazingseollc.com/proxy
    Just want to keep everyone in the loop, we haven't forgotten about you guys :). With the upcoming release of our new text captcha system, the scraper has been bumped down a priority, but it's very, very close! Check out the easy-to-use dashboard that'll get you scraping links 24/7 within a couple of minutes:

  • Sounds awesome - I would definitely be interested in this...
  • @banditIM after trying your email service and spin service (too bad you removed it :( ), I'm waiting to get my hands on this. I always had trouble with scraping and proxies, so this will be the better option for me. On a completely unrelated note, the ads in the screenshots show a south Indian actress :p
  • @BanditIM
    If possible, please add me to the beta-tester list.

    Until now I have used ScrapeBox and GScraper.

    Thanks, Marc