@banditim $50 for 500 million links sounds fair, especially if those were deduped links.
Suggestion: how about integrating a web crawler that crawls the entire web? Each page the crawler visits would be matched against a footprint. You could then build a search engine that searches the crawled pages for keywords and filters the results by engine.
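To illustrate the footprint-plus-keyword idea, here's a minimal sketch of how footprints and keywords could be paired into search queries (hypothetical helper, not how the service actually does it):

```python
# Sketch: pair every footprint with every keyword to form search queries.
# Footprints and keywords below are made-up examples.

def build_queries(footprints, keywords):
    """Combine each footprint with each keyword into one search query."""
    return [f'{fp} "{kw}"' for fp in footprints for kw in keywords]

footprints = ['"powered by wordpress"', 'inurl:guestbook']
keywords = ["gardening", "fitness"]

queries = build_queries(footprints, keywords)
# 2 footprints x 2 keywords -> 4 distinct queries
```

This is why query counts blow up so fast: the list grows as footprints x keywords, which is exactly where caching repeated terms pays off.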
BlazingSEO http://blazingseollc.com/proxy
@bencrabara - Crawling the 'entire web' is a little overkill, but we definitely do some extra crawling outside of just users' scrapes, which lets us gather even more footprints and websites for specific engines.
raperez
This sounds really fantastic! Scraping lists is always a very time- and power-consuming process, so this service could have a real impact!
Thank you for the lack of response; I know now that I won't be doing business with you.
BlazingSEO http://blazingseollc.com/proxy
@Kaine - That's extremely odd -- I had typed out a very long response to you and was awaiting YOUR response haha. No need to get bitter - I'll retype it and send the PM again (not sure what happened to it).
Kaine thebestindexer.com
edited May 2014
Must be a strange bug, I received nothing.
No need to send it by PM. Post the answer here; we're all gentlemen here.
BlazingSEO http://blazingseollc.com/proxy
@Kaine - I'm sorry, I'm a bit confused with your response... do you not want me to reply now?
How good will it be at importing massive footprint lists? By massive I mean around the 5M mark (something I am about to start on). And how quickly would it return the results?
Also, when charging for results, will you be charging for the number after or before the dedupe?
sumusiko
Yeah, I'd pay around $30-70 for this after testing it myself or hearing reviews and comparisons with the other two.
spammasta
Is the beta out yet?
BlazingSEO http://blazingseollc.com/proxy
@Flembot - We haven't gotten to the point of testing 5 million, but we allow users to copy/paste or upload files with their respective footprints and/or keywords, so it shouldn't be an issue. Result return speed will vary depending on how many of your search terms we have in our cache. If most of your search terms are cached, you could get 100 million results back in a matter of minutes. As for charging, to keep the comparisons even with Scrapebox and GScraper, we will charge before dedupe. We obviously know there will be a lot of dupes in there and will keep that in mind, but we want to make an exact comparison with the other tools to prove we WILL be cheaper and more efficient than the others.
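The before-dedupe billing described above basically comes down to counting the raw list while still handing back the unique one. A rough sketch with made-up URLs:

```python
# Sketch: charge on the raw (pre-dedupe) count, return the deduped list.
# URLs are illustrative, not real scrape output.

def dedupe_preserving_order(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

scraped = [
    "http://example.com/page1",
    "http://example.com/page2",
    "http://example.com/page1",  # duplicate hit from a second query
]

billable = len(scraped)                     # charged before dedupe: 3
unique = dedupe_preserving_order(scraped)   # 2 unique URLs delivered
```

Charging on the raw count is what makes the numbers directly comparable with what Scrapebox and GScraper report, since those also count every returned result.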
@spammasta - Not yet. We've still got a couple weeks to make sure the whole system is ready to go. I wanted to make this thread now to get our beta testers signed up. I will stop accepting new beta testers fairly soon.
Ferryman
@BanditIM - Interested to see the pricing then, because with GScraper you can use the free version to scrape all day long. With dedupe enabled it is really hard to reach the URL limit.
Will there be the ability to filter results? For example, I doubt anyone wants to get Google webcache results at all.
Peisithanatos
Really nice idea. Count me in to test it, as always.
BlazingSEO http://blazingseollc.com/proxy
@Ferryman - With the free version you still have to provide your own proxies and server costs though. If you're talking about just using 1 or 2 threads to scrape with your plain IP, I think that's a stretch haha. We may offer something like 10,000 free scrapes per month for all users so it is comparable to that free scraping.
As for filtering results, we'll add in all those extra features as we progress. They are simple and easy to add in, but just take some time because there are so many of them to add. We want to get the core idea down before getting too far ahead of ourselves. I do have to ask though -- I've been scraping for years and haven't encountered my scraped results coming back with webcache results... does Google somehow show this in the searches? Mind giving an example?
The idea of caching looks great, since Google is getting smarter about proxies every single day. But it comes down to the refresh rate of the cached queries. If it's going to be something like 2 weeks to 1 month, they will be quite useless.
Ferryman
edited May 2014
@BanditIM - Yes, of course you have to provide your own proxies. Still, for the $30 mentioned above you get enough reverse proxies to use all day long. For $100 I wouldn't even bother getting the service unless it is really phenomenal.
About the webcache - weird, I am getting them a lot on GScraper (every third result or so). It doesn't really bother me, as I just dedupe down to unique domains.
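For what it's worth, filtering these out yourself is trivial, since Google cache results all live under one hostname. A quick sketch (example URLs are made up):

```python
# Sketch: strip Google webcache entries from a scraped result list.
# Google cache results are served from webcache.googleusercontent.com.

def drop_webcache(urls):
    """Remove Google cache URLs from a list of scraped results."""
    return [u for u in urls if "webcache.googleusercontent.com" not in u]

results = [
    "http://example.com/blog",
    "http://webcache.googleusercontent.com/search?q=cache:example.com/blog",
    "http://example.org/forum",
]

clean = drop_webcache(results)  # the cache entry is dropped
```

Deduping to unique domains afterwards would also catch most of these, since the cached copy and the live page point at the same target domain anyway.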
Would be nice if there were an option to get daily, weekly, monthly, etc. results so you could just get the fresh ones instead of scraping the same thing over and over again.
BlazingSEO http://blazingseollc.com/proxy
@derdor - Completely agreed. We plan on monitoring certain popular footprints (i.e. "powered by wordpress") and seeing how often the search results update when using a handful of keywords with those footprints. Right now we're seeing that 3-10 days will be a good number to start off with for the refresh rate.
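The refresh logic itself is simple: a cached query only gets re-scraped once its age passes the window. A sketch, using an assumed 7-day threshold (the midpoint of the 3-10 day range above):

```python
# Sketch: decide whether a cached query result is stale and needs a
# re-scrape. The 7-day threshold is an assumption, not the real setting.

from datetime import datetime, timedelta

REFRESH_AFTER = timedelta(days=7)

def needs_rescrape(last_scraped: datetime, now: datetime) -> bool:
    """True once the cached result is older than the refresh window."""
    return now - last_scraped > REFRESH_AFTER

now = datetime(2014, 5, 20)
stale = needs_rescrape(datetime(2014, 5, 1), now)    # 19 days old
fresh = needs_rescrape(datetime(2014, 5, 18), now)   # 2 days old
```

Popular footprints could get a shorter window and obscure ones a longer one, which is presumably what the monitoring is meant to calibrate.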
@Ferryman - Regarding reverse proxies, of course you can scrape all day long with those, but $30 gets you something like 10 ports... once we do some case studies on certain thread counts, we will have a good idea of what we need to charge to be competitive. 10 ports or 1,000 ports, it all scales pretty evenly, so the number of links you can scrape in a month going that route will be less than the number you can get with us at the same price. Also note, we see our service as a little more premium (though we won't charge for it) due to the fact it can auto-scrape 24/7 and auto-FTP, something the other software cannot do.
Just want to keep everyone in the loop -- we haven't forgotten about you guys. With the upcoming release of our new text captcha system, the scraper has been bumped down one priority, but it's very, very close! Check out the easy-to-use dashboard that'll have you scraping links 24/7 in a matter of a couple of minutes:
Sounds awesome - I would definitely be interested in this...
Vijayaraj India
@banditIM After trying your email service and spin service (too bad you removed it), I'm waiting to get my hands on this. I've always had trouble with scraping and proxies, so this will be the better option for me. On a completely unrelated note, the ads in the screenshots show a south Indian actress.
@sumusiko - Sweet, great to hear!
@Peisithanatos - Great man, thanks for the interest!