
[PSA] A Few Things To Be Aware Of With OCR Captcha Services.

shaun https://www.youtube.com/ShaunMarrs
edited December 2016 in GSA Search Engine Ranker
I decided to make this post to show you guys what's actually happening with OCR captcha services like Captcha Tronix and the BlazingSEO OCR service, in the hope that it saves some people some money and frees up system resources on their VPSs/servers for better use elsewhere. I am fully aware that OCR technology has limits and that these services may actually help some people with some projects, but I want to do a quick walkthrough of my experience and perhaps open a few eyes as to what's actually happening with them.

I became aware of this a while back with Captcha Tronix and a few other services, but only recently realized that the service offered by BlazingSEO lets the user see the apparent captcha solves in their dashboard, so I could get a better understanding of what's happening.

The first hustle with this type of service, in my opinion, is the whole software threads/captcha threads price point, such as the layout below from Captcha Tronix (there are more services using the same type of layout).

image

Now, the way I understand it, the captcha threads are open threads to their service sending/receiving captcha data, and the software threads are the suggested number of software threads to run with that package. As you can see, the service charges $47.77 per month for 20 active captcha threads with 100 software threads.

Now, my first issue with this is that there is no indication of whether these suggestions assume Captcha Tronix is being used as a primary or secondary captcha solving service, and this massively affects the ratio of software threads you can run to live captcha threads. If it is your primary service, your software thread count will have to be much lower relative to your captcha thread count, as the service will be taking much more of the load. If it is your secondary captcha service, your software thread count can be much higher, as tools like GSA CB will take a fair amount of the load, with services like Captcha Tronix taking either what CB fails on or the captcha types you tell it to send to the service.

My second issue with this type of plan is that it does not take into consideration what you are building with your tools. For example, say I have 100 threads active on my tool and 10 active projects running: 9 of them are building blog comment links and one is building contextual articles. Thread allocation will not really be even between the projects, but for this example let's pretend it is, and let's also pretend the full 100 threads are being used for posting, again to keep the math easier.

So we have 90 threads building blog comment links for you. Many blog comment platforms do not require a captcha to be completed before your submission is made, so there are a fair few software threads here that won't be making any captcha thread requests. Now let's take a look at the 10 contextual article threads. This type of link takes much longer for automated tools to complete due to the number of actions required to submit the link. I have no idea what the average time to complete this link type is, but let's say 10 seconds. So we have 10 software threads that each take 10 seconds to build a link. How many times do you think these threads are going to be presented with captchas? Again, let's presume every registration is successful for ease of math. The tool will probably be presented with a captcha once during account registration, but some sites also require you to solve another captcha before submission, so let's say the 10 threads have 10 captchas to solve for registration and throw in another 3 for submissions. That is 13 in total over the space of 10 seconds, with 20 available captcha threads. Even if this service was your primary captcha service in this example, its thread allocation is way off, and it gets even worse when you bring in tools like CB that have excellent solve rates for things like the Drupal and Mollom captchas, amongst others.

Now, for ease of maths in that example I did not include things like domain timeouts, no engine matches and domains being offline completely. When these are taken into account, their system of software/captcha threads looks even worse, so don't waste your money going for a higher package that you will not need!
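
To put rough numbers on that example, here is a minimal Python sketch. The 13 captchas per 10 seconds and the 20 captcha threads come from the example above. The 2 second solve time per captcha and the function name are assumptions made up purely for illustration, so swap in your own figures.

```python
def captcha_threads_needed(captchas_per_cycle: float,
                           cycle_seconds: float,
                           solve_seconds: float) -> float:
    """Average captcha requests in flight at once.

    Uses Little's law: requests in flight = arrival rate * time per request.
    """
    arrival_rate = captchas_per_cycle / cycle_seconds  # captchas per second
    return arrival_rate * solve_seconds


# 13 captchas every 10 seconds (from the example in the post);
# 2.0 seconds per solve is an ASSUMED figure, not a measured one.
demand = captcha_threads_needed(captchas_per_cycle=13,
                                cycle_seconds=10,
                                solve_seconds=2.0)
plan_threads = 20  # captcha threads in the $47.77 package
print(f"average captcha threads in use: {demand:.1f} of {plan_threads}")
```

Even if you assume a much slower solve time, the demand in this example stays well under the 20 captcha threads in the package, which is the point being made above.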

Now, the second hustle! So many people I talk to fall for this, guys, and I also fell for it until I did some testing a while back. You will have to bear with me here, as I don't know how CB is coded, so I will have to make a few presumptions, but you can test everything I say in this thread for yourself if you see any problems with my methodology.

So this is where my presumptions come in. When CB forwards captchas to these types of services, it contacts the service and sends the image, then the service processes the image and returns a reply. Because it received a reply from the service, CB presumes the answer is correct and updates its success % in the bottom right. CB is unable to double-check this type of captcha, it just has to take the service's word for it, and that's the end of my presumptions.

So I decided to do a little testing with the OCR captcha service offered by BlazingSEO, as it lets the user see the last 50 processed captchas in its dashboard. I set a few projects up and told SER to only send the ReCaptcha captcha type to the service, as there are a number of ReCaptcha types the service claims to have a solve rate for. Now let me repeat: this service may get some captcha types correct and be able to help people, but this is just my experience.

Anyway, here are the first few captchas in the dashboard.

image

As you can see, none of them are correct.

image

Because the service sends a reply to CB, CB thinks it is getting accurate replies, as you can see in this screenshot from the service solve rate on the right-hand side (this will only show up if you have a secondary captcha service enabled). Here is a screenshot of the captchas in the CB log.

image

As you can see, the replies are totally wrong, but the service is telling CB that the captcha is correct, as you can see from the green text in the screenshot. Now, I'm not sure if this next one is a bug or if it is taking advantage of the way CB works.

image

As you can see, the top captcha in the image is marked as correct even though its message says "Captcha Is Unsolvable By Your OCR." So I let the service run for a while longer without seeing a single captcha even close to being solved. Here's more of what it was telling CB was correct.

image
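
A reply like the "Unsolvable" one is at least something that can be caught on your own side. Below is a minimal Python sketch of a sanity check on returned text, assuming you have the replies as plain strings (for example copied out of the CB log). The phrase list, the 3 character minimum and the function name are all illustrative assumptions, not anything CB or the services expose. It can only catch obvious junk, not answers that are simply wrong.

```python
# Phrases that suggest the reply is an error message rather than a solve.
# "unsolvable" comes from the message quoted above; the others are guesses.
ERROR_PHRASES = ("unsolvable", "not supported", "error")


def looks_like_real_answer(reply: str, min_chars: int = 3) -> bool:
    """Return False for replies that are obviously not a captcha answer."""
    text = reply.strip()
    if len(text) < min_chars:  # ASSUMED minimum answer length
        return False
    lowered = text.lower()
    return not any(phrase in lowered for phrase in ERROR_PHRASES)


# Hypothetical replies for illustration only.
replies = ["Captcha Is Unsolvable By Your OCR.", "x7Kq2", "ab"]
for reply in replies:
    verdict = "ok" if looks_like_real_answer(reply) else "suspicious"
    print(f"{reply!r}: {verdict}")
```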

Here are the results in CB. As you can see, it thinks the service got 181 captchas correct with 100% accuracy when in fact it got 0 correct. I felt I had seen enough, so I stopped the test.

image

So there you go, guys. Where people are tripping up here is trusting the 100% solved rate and not taking the time to actually check what these services are sending back to you! I have done it myself. With SER going at full pelt it is crazy hard to read the CB log and check the responses, but doing so can save you cash and also free up system resources to put into building links rather than getting captchas marked wrong. Ever since Eve closed its doors I have had the exact same problem with every one of these services I have tried, so check and double-check whether these are worth your cash!
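
If you want to put a number on the gap, a minimal Python sketch is below. It assumes replies are counted the way the presumption above describes (any reply counts as a solve) and that you then hand-check a sample of answers against the captcha images. The 181 replies and 0 correct are from the test above. Treating the hand-checked sample as the 50 captchas the dashboard shows is an assumption for illustration, as are the function names.

```python
def reported_rate(replies_received: int, requests_sent: int) -> float:
    """Solve rate if every reply is counted as a correct answer."""
    return replies_received / requests_sent if requests_sent else 0.0


def verified_rate(correct_in_sample: int, sample_size: int) -> float:
    """Solve rate from answers a human actually checked against the image."""
    return correct_in_sample / sample_size if sample_size else 0.0


# 181 replies for 181 requests, as shown in the CB results.
reported = reported_rate(replies_received=181, requests_sent=181)
# 0 correct answers; the sample size of 50 is ASSUMED (dashboard size).
verified = verified_rate(correct_in_sample=0, sample_size=50)
print(f"reported: {reported:.0%}, verified: {verified:.0%}")
```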

Comments

  • Woah, that's really bad. I've used Captcha Tronix before and the results were pretty good - I never checked the exact success rate, but I was getting more verifieds than when not using it. Seems like they have some updating to do...
  • 1linklist FREE TRIAL Linklists - VPM of 150+ - http://1linklist.com
    edited December 2016
    Not to get too involved in this thread, but we have been developing our own captcha/OCR service over the last few months (So obviously I'm biased..)

    The fact that they do not return an error message when they get an "Image not supported" or an answer that is under the minimum character count IS gross negligence.

    We had to figure out how to do that simply for internal testing (So we could know how well we were actually solving captchas!)

    For simply wrong answers, no, we cannot really know we got it wrong - Only that the OCR decoded it successfully.

    It has kind of put us in a bind with launching the service - with everyone else sending false solve rates out, we're going to look terribly noncompetitive when we report our results properly.

    To go ahead and play devil's advocate, however, this new type of recaptcha is a tough nut to crack. For old style text/blob/house sign captchas we hit around 70-80% easily. But we can't get more than 20-30% on these street sign style captchas.
  • shaun https://www.youtube.com/ShaunMarrs
    edited December 2016
    1linklist I will try it out if you like and post here with the results of it.

    Do you have a dashboard that shows the last xx captchas processed by your service?
  • 1linklist FREE TRIAL Linklists - VPM of 150+ - http://1linklist.com
    Hey @shaun,

    Not yet - but we will. We have not yet set up a user frontend / UI. Right now we're still tweaking all the OCRs and behind-the-scenes stuff.

    We will be opening up for a public beta soon, so at that time I'll make sure to get you in the door :)
  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    @1linklist  would be interested to try your OCR when it is ready for beta
  • shaun https://www.youtube.com/ShaunMarrs
    1linklist See the difference getting involved in the community makes, people are crying out to beta test for you :).
  • edited December 2016
    Since the last ReCaptcha update Captcha Tronix has become useless. I opened a support ticket to address the ReCaptcha problem. They responded and told me that they were not supporting ReCaptcha at the moment.
  • @Shaun is there any service you advise next to GSA CB? (And for RankerX / Tier1 2captcha). 
  • shaun https://www.youtube.com/ShaunMarrs
    For RX I only use 2captcha. For SER I only use CB right now, as I see no reason to spend cash on the links with the method I use, but I am planning to test a blog commenting method soon that will use 2captcha.
  • It's just because I see many of these captchas:
    https://gyazo.com/34353170cd94e3f09dd2fa205b6aa60b

    They do not seem to be that hard... 
  • shaun https://www.youtube.com/ShaunMarrs
    It's more the retention rate of links. Build out 10,000 on a burner and track the link loss every day for a month.

    You will soon see there's no reason to spend money on SER captchas.
  • I am almost at the 10K :-)
  • 1linklist FREE TRIAL Linklists - VPM of 150+ - http://1linklist.com
    Those are called K-captchas, guys. It's actually one of the holdups with getting our public beta up. Those and Recaptcha are pretty much the only ones really in popular use.

    Funnily enough, we're having a harder time with K-captcha than recaptcha. K-captcha is tough because the characters are all the same color, and are very difficult to "segment" (separate for analysis).
  • I've been working on a captcha OCR solution for the last few months, and I'm pretty confident that most captchas which can be read by humans can be solved with high accuracy by using dedicated OCR algorithms. 

    Here are some numbers I managed to achieve:
    - recaptcha street signs - 80-90% accepted answers
    - kcaptcha 95%+ accuracy
    - phpBB 90%+ accuracy (depending on version)
    - drupal 95%+ accuracy
    - vbulletin 85%+ accuracy
    - myBB 90%+ accuracy
    ... and a few more

    As you can see these are (nearly) human accuracy rates. My solution has just two minor downsides:
    1) the algorithms are computationally expensive
    2) each algorithm is dedicated and can solve only one captcha type, thus it's close to impossible to support thousands of captcha types - it would require developing thousands of algorithms. That's why I focused on the hard captchas and use CB for the easy ones.

    If you want I can throw in some screens from GSA CB or my admin panel and show some results.

    The service is still in private/beta mode - when production ready I plan to put it on the marketplace here. I still have a few open spots for beta testers, so if you'd like to check it out for yourself and get some captchas solved for free, just drop me a message.


  • edited June 2017
    shaun I like your blog, but something smells very fishy here. I know a lot of OCRs suck, including 1captcha. The vendor is always a mess; if you look at his marketplace threads you can see a lot of reputable members asking for refunds and ending up with nothing. Captcha Tronix is far better in that, at the end of the day, if it doesn't work at least you can ask for a refund.

    1linklist is the same person or just his affiliate. You guys are running a great agenda. I was wondering what's happening and now I can see.

    I don't want to say anything else. Just my thoughts, and I have seen you recommending @1linklist at your blog.

    I haven't used their OCR service and will not.

    @anyone - if I am wrong, please check the vendor's services, and you can see what reputable members are saying about them. Or try it yourself and check.

    edited for typos. Pardon me i am from indo.
  • shaun https://www.youtube.com/ShaunMarrs
    @Creed thanks for bringing this to my attention. In all honesty, I haven't used 1Captchas or 1linklist in a while, as I no longer use SER as my exclusive source of T1 links, so I don't really care about the number of unique domains coming out of it anymore and just use GSA CB for my captchas and Looplines list.

    I also noticed their support has slowed right down recently, with me having to chase them up multiple times about problems similar to the ones asiavirtualsolutions had.

    I have flipped all of the generic premium list links on my blog over to Looplines service as I use his list every day and I have reached out to XCaptcha to beta test their service to perhaps flip the OCR links over to them. I won't be flipping them to Captchatronix though as the service was a total waste of time when I used it and the guy is essentially scamming people through his instant link indexing service.
  • @shaun That makes sense. I also use only GSA CB for all my projects. I used Looplines some time ago. Don't you use serengines?
  • shaun https://www.youtube.com/ShaunMarrs
    @Creed nah, I had a beta key but I don't really like it for a few reasons, every now and then I check it out but always turn away from it. My main automated web 2 tool is still Ranker X but as I always say, that is only because it is the best out of a bad bunch of tools.
  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    @shaun
    For RankerX, are you using one of the built-in wizard templates or did you create your own? If your own, mind sharing the layout?
  • shaun https://www.youtube.com/ShaunMarrs
    @royalmice I keep it nice and simple as below for my RX blasts. The thing that I think matters more in RX is the site list, i.e. your site selections. Out of all the targets available for Social Networks and Web 2.0s I use about 30 in total. I used to use Folked from the Social Bookmarks module too, but I dropped it recently as that was the only target I valued from the full module.

    I have another one for tier one that has only one Social Network module in it so that if the account gets banned I don't lose 5 pages of backlinks but the screenshot above is my RX T2.
  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    @shaun nice thanks for sharing --  how long do u run the projects for - i.e (a day, week, month )
  • shaun https://www.youtube.com/ShaunMarrs
    Each project runs for a day. I control link velocity with the number of URLs the project builds links to. For example, the project in the above screenshot will produce around 100 links before alive checking, so it might have something like 20-40 T1 URLs in it, meaning each URL gets 2-5 links per day.
  • 710fla ★ #1 GSA SER VERIFIED LIST serpgrow.com
    @shaun Quick question, do you build GSA links to both RankerX tiers?
  • shaun https://www.youtube.com/ShaunMarrs
    @710fla yea mate. I have moved onto using manual web 2.0s as a T1 now though, as they offer what I would class as better domains and have much better link retention. There was probably about a 4-6 week period between my T1 RX domains and my T1 manual web 2.0 domains being set off, and the ones with manual web 2.0s as T1 have already caught the T1 RX ones in the SERPs and in some cases overtaken them.

    It's not a fair comparison though, as the keywords used for them are totally different, but it defo seems stronger. Also, they are still climbing slower than my sites from last year that only used SER links in their tiers, and although I have a fair few pages on page three of Google now, nothing has got any higher yet, but the sites are still pretty young.

    I use the below diagram right now with pretty heavy customizations on the settings of my tools. SER C - SER Contextuals, SER N - SER Non-Contextuals, and the automated web 2.0s are from RX in my case, but you could use SERE or SeNuke or whatever is available.


  • 710fla ★ #1 GSA SER VERIFIED LIST serpgrow.com
    edited June 2017
    @shaun Thanks for the diagram. I have a similar setup with a small list of high DA 301 shorteners that are made using GSA, since the link retention is pretty good. You're right about the link retention. I stopped building a second tier of contextual links to my contextual GSA links, since it's a waste of resources given that the domain may go down or the page may be removed by the site owner.

    It's interesting how the manual web 2.0s are ranking better than the automated web 2.0s, makes me wonder if Google can somehow tell they are being made using a program.

    Doesn't it leave a footprint if you use the same verified list for both tiers of web 2.0s? For example if you had 1000 unique domains wouldn't it be weird if both the automated and manual web 2.0s have those same 1000 backlinks? I've had sites rank using the same list but I feel like it's only a matter of time until Google starts acting on it.

    For example, say I had a Wordpress parasite site as my main site. I blast it with 100 contextual links throughout the month. Then I blast the tier 1 web 2.0s (4 or 5) with the same 100 domains throughout the month.
  • shaun https://www.youtube.com/ShaunMarrs
    @710fla The manual web 2.0s don't necessarily rank better, they last longer, so the build-up of link juice can pass to the MS. I don't think it's Google detecting it, I think it's the actual web 2.0 service. They are getting better at detecting automation and deleting the account, but a manual web 2.0 sticks to their guidelines, and after the initial link loss in the first week or so they seem to last.

    I know what you mean about using the same list for the two tiers of web 2.0s, and in all honesty I have no idea how Google views it.