@Vijayaraj - Haha, those aren't part of our website; they're hosted on postimg.org. But thanks so much for the encouraging words, we're very excited to get this rolled out!
@momba12 - Correct. We won't ever disclose or sell them as any kind of list; we'll simply make the scraping process faster for all users by giving you the scraped results if another user has already scraped them. Why waste resources scraping the same keywords over and over again?
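For anyone wondering how that kind of shared cache could work in principle, here's a rough Python sketch - the names and the plain-dict storage are just illustration on my part, not our actual implementation:

class ScrapeCache:
    """Tiny sketch of a shared keyword -> results cache."""

    def __init__(self):
        self._results = {}  # keyword -> list of URLs someone already scraped

    def get(self, keyword):
        return self._results.get(keyword.strip().lower())

    def store(self, keyword, urls):
        self._results[keyword.strip().lower()] = list(urls)

def scrape(keyword, cache, run_scraper):
    cached = cache.get(keyword)
    if cached is not None:
        return cached            # another user already scraped this keyword
    urls = run_scraper(keyword)  # otherwise do a fresh scrape
    cache.store(keyword, urls)   # and save it for the next person who asks
    return urls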
@jgf213 - We had to put it behind our new text captcha system because that was a much higher priority, but we're hoping we can roll it out to some beta testers in the next 2-3 weeks.
Quick update everyone - we are very, very close! After running some tests today our system came back with some incredible statistics... an average of 400 links per SECOND! That's right: with just the few servers currently running the system, we will be able to handle over 30 million links a day, and that's not even counting the help of our cache system!
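If you want to sanity-check that number, the back-of-envelope math is simple (a throwaway Python snippet, nothing more):

links_per_second = 400
seconds_per_day = 60 * 60 * 24                      # 86,400
links_per_day = links_per_second * seconds_per_day
print("{:,} links per day".format(links_per_day))   # 34,560,000 - comfortably over 30 million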
We have one last thing left to do, and that's to find enough proxy sources so that we never run out at the volume we expect to use. With that being said, if you have or know of anyone who sells good Google-passed proxies, please shoot me a PM. If it's not a 'well known' or easily findable source, I will make sure to reward you with a lot of free scraping when we go live.
I've never been satisfied with how Scrapebox gets its scraping jobs done at the moment. Willing to try your service ASAP when it goes live, mate. Let me know when you're ready to start.
I am in for the beta tests. I have several servers running hrefer and gscraper 24/7. I would happily try to break your system! LMAO. Dashboard looks nice and user friendly.
One little feature request is a "sieve filter", which is the main reason why I love using hrefer over gscraper. It filters a lot of the crap URLs as it harvests. You can filter in the same way with gscraper, but it requires a lot of extra steps with the final list. It works a lot better if results are filtered automatically as it scrapes - something like the sketch below.
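In case anyone hasn't used hrefer, the sieve is basically just an inline filter applied while harvesting, roughly along these lines (the patterns here are made-up examples of mine, not hrefer's actual sieve rules):

import re

BLOCK_PATTERNS = [
    re.compile(r"\.(jpg|jpeg|png|gif|pdf)(\?|$)", re.I),     # direct file links
    re.compile(r"(facebook|youtube|pinterest)\.com", re.I),  # big sites you can't post to anyway
]

def passes_sieve(url):
    return not any(pattern.search(url) for pattern in BLOCK_PATTERNS)

def harvest(raw_results):
    # Drop junk as it comes in, instead of cleaning up the final list afterwards.
    for url in raw_results:
        if passes_sieve(url):
            yield url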
Just an update for all the new posters: we are testing the system right now, but as you may have seen in the recent SER update, our newest Text Captcha System (like askMeBot) was implemented a couple of days ago. We've been hard at work getting that prepped and working properly so we could go live with it rather than just handing it to beta testers.
Make sure to sign up for our email list, which will contain information regarding the release and updates of the Google scraper (no additional information will be sent - you have my word).
Newest update - we're getting close, everybody! Stats are rolling in and the data is looking good. Our proxy handler needs some additional work, but once that's finished we will be on the hunt for proxy sources that will let us scrape FAST.
If anyone reading this has used GScraper (or Scrapebox), what was the fastest links-per-minute or links-per-second rate you have ever received? We need to know what we have to beat.
Thanks @miren! I forgot to note in my previous post that we are only using a handful of crappy public proxy lists during our testing phase (hence the low links per minute/second). May I ask how many proxies you're using, and how much you're paying for them, to achieve those results? Maybe even share the source you're using here (or by PM if you need to)? Thanks, pal!
@miren @Brandon @Seljo -- Thanks for the input, guys. We're seeing that once we buy the proxy sources we have on our list, we will easily be able to hit these numbers - so that's awesome!
First off, anyone who is truly interested in this system should sign up to the mailing list if you want to be a beta tester. I will not send any promotional emails out -- ONLY emails regarding this system will be sent to you. The list can be found here:
http://bit.ly/newgooglescraper
About the update -- we're confident in being ready for beta users within the next few days. The system is currently scraping at 30 links per second with the couple of public proxy sources we currently use. This theoretically means that more proxy sources = more links per second. Testing will continue throughout the week and another update will be given shortly.
Note: The auto-footprint scraper isn't quite working yet, so it will not be functional in the first beta release. The dashboard for it is there, but the backbone that makes it work is a fairly complex AI project using natural language processing and heuristics, and it's occupying a lot of time right now. It will be done, though.
Comments
With services like these, your proxies will become useless for future use.
You've been begging for proxy sources to make your service work for a long time.
Why would anyone in their right mind tell you their proxy source?
Why would I give you my source that allows me to scrape at 100k per minute?
So you could kill them with your service? LOL, no thanks.