[PSA] A Few Things To Be Aware Of With OCR Captcha Services.

So I decided to make this post to show you guys what's actually happening with OCR captcha services like Captcha Tronix and the BlazingSEO OCR service. I hope it saves some people money and frees up system resources on their VPSs/servers for better use elsewhere. I am fully aware that OCR technology has its limits and that these services may genuinely help some people with some projects, but I just want to do a quick walkthrough of my experience and perhaps open some people's eyes to what's actually happening with them.

I became aware of this a while back with Captcha Tronix and a few other services, but only recently realized that the BlazingSEO service lets the user see the claimed captcha solves in their dashboard, so I could get a better understanding of what's happening.
The first hustle with this type of service, in my opinion, is the whole software threads/captcha threads pricing, such as the layout below from CaptchaTronix (other services use the same type of layout too).

Now, the way I understand it, the captcha threads are open threads to their service sending/receiving captcha data, and the software threads are the number of threads they suggest running in your software with that package. As you can see, the service charges $47.77 per month for 20 active captcha threads with a suggested 100 software threads.
My first issue with this is that there is no indication of whether these suggestions assume Captcha Tronix is your primary or secondary captcha solving service, and that massively affects the ratio of software threads you can run to live captcha threads. If it is your primary service, the software thread count will have to be much lower relative to your captcha thread count, as the service will be taking so much more of the load. If it is your secondary captcha service, your software thread count can be much higher, as tools like GSA CB will take a fair amount of the load, with services like Captcha Tronix only handling what CB fails on or the captcha types you tell it to send to the service.
My second issue with this type of plan is that it does not take into consideration what you are building with your tools. For example, say I have 100 threads active on my tool and 10 active projects running: 9 of them are building blog comment links and one is building contextual articles. Thread allocation will not be even between the projects, but for this example let's pretend it is to make the math easier, and let's also pretend the full 100 threads are being used for posting, again to keep the math easy.
So we have 90 threads building blog comment links for you. Many blog comment platforms do not require a captcha for your submission to be made, so a fair few software threads here won't be making any captcha thread requests. Now let's look at the 10 contextual article threads. This type of link takes much longer for automated tools to complete due to the number of actions required to submit it. I have no idea what the average time to complete this link type is, but let's say 10 seconds. So we have 10 software threads each taking 10 seconds to build a link. How many times do you think these threads are going to be presented with captchas? Again, let's presume every registration is successful for ease of math. The tool will probably be presented with a captcha once during account registration, and some platforms also require you to solve another captcha before submission, so let's say there are 10 captchas to solve for the registrations and throw in another 3 for submissions: 13 in total over the space of 10 seconds, with 20 available captcha threads. Even if this service were your primary captcha service, its thread allocation in this example is way off, and it gets even worse when you add tools like CB that have excellent solve rates for things like the Drupal and Mollom captchas, amongst others.
Now, for ease of math, in that example I did not include things like domain timeouts, no engine matches and domains being offline completely. When these are taken into account, their software/captcha thread system looks even worse, so don't waste your money on a higher package that you will not need!
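To put rough numbers on the thread-allocation argument, here is a minimal sketch using Little's law (average busy threads = captcha arrival rate × average time per solve). Only the 13 captchas per 10 seconds comes from the example above. The 5-second round trip per solve, the 30% share forwarded when the service is secondary, and the 1.5× burst headroom are hypothetical values, and the blog comment threads are left out, just as in the example. Timeouts, no engine matches and offline domains would only lower `captchas_per_second` further.

```python
"""Rough estimate of how many captcha threads a setup actually keeps busy.

Little's law: average busy threads = arrival rate x average time per solve.
All defaults are hypothetical except the 13 captchas per 10 seconds taken
from the contextual-article example.
"""
import math
from dataclasses import dataclass


@dataclass
class Workload:
    captchas_per_second: float  # captchas hit by all software threads
    forwarded_fraction: float   # share sent to the OCR service (1.0 = primary)
    seconds_per_solve: float    # assumed round trip to the service


def busy_captcha_threads(w: Workload) -> float:
    """Average number of captcha threads in use at any moment."""
    arrival_rate = w.captchas_per_second * w.forwarded_fraction
    return arrival_rate * w.seconds_per_solve


def threads_to_buy(w: Workload, headroom: float = 1.5) -> int:
    """Round up, with an assumed headroom factor for short bursts."""
    return math.ceil(busy_captcha_threads(w) * headroom)


if __name__ == "__main__":
    rate = 13 / 10  # 13 captchas every 10 seconds
    primary = Workload(rate, forwarded_fraction=1.0, seconds_per_solve=5.0)
    secondary = Workload(rate, forwarded_fraction=0.3, seconds_per_solve=5.0)
    for name, w in (("primary", primary), ("secondary", secondary)):
        print(f"{name}: {busy_captcha_threads(w):.2f} busy, buy {threads_to_buy(w)}")
```

Under these assumptions, even the primary case keeps well under the 20 captcha threads in the package busy, and the secondary case needs only a handful. Plug in your own measured solve time and forwarding share rather than trusting these placeholders.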
Now, the second hustle! So many people I talk to fall for this, guys, and I also fell for it until I did some testing a while back! You will have to bear with me here: I don't know how CB is coded, so I will have to make a few presumptions, but you can test everything I say in this thread for yourself if you see any problems with my methodology.
So this is where my presumptions come in. When CB forwards captchas to these types of services, it contacts the service and sends the image; the service processes the image and returns a reply. Because it receives a reply from the service, CB presumes the answer is correct and updates its success % in the bottom right. CB is unable to double-check this type of captcha; it just has to take the service's word for it. That's the end of my presumptions.
So I decided to do a little testing with the OCR captcha service offered by BlazingSEO, as it lets the user see the last 50 processed captchas in its dashboard. I set a few projects up and told SER to only send the ReCaptcha captcha type to the service, since there are a number of ReCaptcha types the service claims to have a solve rate for. Let me repeat: this service may get some captcha types correct and be able to help people, but this is just my experience.
Anyway, here are the first few captchas in the dashboard.

As you can see, none of them are correct.

Because the service sends a reply to CB, CB thinks it is getting accurate replies, as you can see from the service solve rate on the right-hand side of this screenshot (this only shows up if you have a secondary captcha service enabled). Here is a screenshot of the captchas in the CB log.

As you can see, the replies are totally wrong, but the service is telling CB that the captchas are correct, as shown by the green text in the screenshot. Now, I'm not sure if this next one is a bug or if it is taking advantage of the way CB works.

As you can see, the top captcha in the image is marked as correct even though its message says "Captcha Is Unsolvable By Your OCR." So I let the service run for a while longer without seeing a single captcha even close to being solved. Here's more of what it was telling CB was correct.

Here are the results in CB. As you can see, it thinks the service got 181 captchas correct with 100% accuracy when in fact it got 0 correct. I felt I had seen enough, so I stopped the test.

So there you go, guys. Where people are tripping up here is trusting the 100% solve rate and not taking the time to actually check what these services are sending back to you! I have done it myself; with SER going at full pelt it is crazy hard to watch the CB log and check the responses, but doing so can not only save you cash, it can also free up system resources for building links rather than getting captchas marked wrong. Ever since Eve closed its doors, I have had the exact same problem with every one of these services I have tried, so check and double-check whether they are worth your cash!
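If you want to check a service yourself, the comparison is simple once you have hand-checked a sample. The rate CB displays counts every reply as a solve (per the presumption above), while the real rate counts only replies that match what you read off the image. Below is a minimal sketch; the `Record` layout is hypothetical (neither CB nor the service dashboards export this exact format), the sample data is made up, and the case-insensitive comparison is an assumption you should adjust to the captcha type. In the test above, the first figure would have been 100% and the second 0%.

```python
"""Compare the solve rate a tool displays with what a manual check finds."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    service_answer: Optional[str]  # None if the service returned nothing
    checked_answer: str            # what you read off the image yourself


def reported_rate(records: List[Record]) -> float:
    """Counts every non-empty reply as solved, mirroring how the post
    presumes CB scores a remote service (it cannot verify the text)."""
    if not records:
        return 0.0
    replied = sum(1 for r in records if r.service_answer)
    return replied / len(records)


def verified_rate(records: List[Record]) -> float:
    """Counts only replies matching the hand-checked answer.
    Ignoring case and surrounding spaces is an assumption."""
    if not records:
        return 0.0
    correct = sum(
        1
        for r in records
        if r.service_answer is not None
        and r.service_answer.strip().lower() == r.checked_answer.strip().lower()
    )
    return correct / len(records)


if __name__ == "__main__":
    # Made-up sample: an error message and a wrong answer, both of which
    # a tool that trusts every reply would count as solved.
    sample = [
        Record("Captcha Is Unsolvable By Your OCR.", "4821"),
        Record("qxvrt", "1290"),
        Record("733", "733"),
    ]
    print(f"reported: {reported_rate(sample):.0%}")
    print(f"verified: {verified_rate(sample):.0%}")
```

A few dozen hand-checked captchas are usually enough to show whether the two numbers are anywhere near each other.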
The fact that they do not return an error message when they get an "Image not supported" result or an answer that is under the minimum character count IS gross negligence.
We had to figure out how to do that simply for internal testing (so we could know how well we were actually solving captchas!).
For answers that are simply wrong, no, we cannot really know we got it wrong, only that the OCR decoded it successfully.
It has kind of put us in a bind with launching the service: with everyone else sending out false solve rates, we're going to look terribly uncompetitive when we report our results properly.
To play devil's advocate, however, this new type of recaptcha is a tough nut to crack. For old-style text/blob/house sign captchas we hit around 70-80% easily, but we can't get more than 20-30% on these street sign style captchas.
Not yet, but we will. We have not set up a user frontend/UI yet. Right now we're still tweaking all the OCRs and behind-the-scenes stuff.
We will be opening up a public beta soon, so at that time I'll make sure to get you in the door.
They do not seem to be that hard...
You will soon see there's no reason to spend money on SER captcha a.
Funnily enough, we're having a harder time with K-captcha than recaptcha. K-captcha is tough because the characters are all the same color and are very difficult to "segment" (separate for analysis).
Here are some numbers I managed to achieve:
- recaptcha street signs - 80-90% accepted answers
- kcaptcha 95%+ accuracy
- phpBB 90%+ accuracy (depending on version)
- drupal 95%+ accuracy
- vbulletin 85%+ accuracy
- myBB 90%+ accuracy
... and a few more
As you can see, these are (nearly) human accuracy rates. My solution has just two minor downsides:
1) the algorithms are computationally expensive
2) each algorithm is dedicated and can solve only one captcha type, so it's close to impossible to support thousands of captcha types, since that would require developing thousands of algorithms. That's why I focused on the hard captchas and use CB for the easy ones.
If you want I can throw in some screens from GSA CB or my admin panel and show some results.
The service is still in private beta; when it's production-ready I plan to put it on the marketplace here. I still have a few open spots for beta testers, so if you'd like to check it out for yourself and get some captchas solved for free, just drop me a message.
1linklist is the same person, or these are just his affiliates. You guys are running a great agenda. I was wondering what's happening and now I can see.
I don't want to say anything else. That's just my thought, and I have seen you recommending @1linklist on your blog.
I haven't used their OCR service and will not.
Edited for typos. Pardon me, I am from Indonesia.
I also noticed their support has slowed right down recently, with me having to chase them up multiple times about problems similar to asiavirtualsolutions'.
I have flipped all of the generic premium list links on my blog over to Loopline's service, as I use his list every day, and I have reached out to XCaptcha to beta test their service and perhaps flip the OCR links over to them. I won't be flipping them to Captchatronix, though, as the service was a total waste of time when I used it and the guy is essentially scamming people through his instant link indexing service.
For RankerX, are you using one of the built-in wizard templates or did you create your own? If your own, mind sharing the layout?
I have another one for tier one that has only one Social Network module in it, so that if the account gets banned I don't lose 5 pages of backlinks, but the screenshot above is my RX T2.
It's not a fair comparison, though, as the keywords used for them are totally different, but it defo seems stronger. Also, they are still climbing more slowly than my sites from last year that only used SER links in their tiers. Although I have a fair few pages on page three of Google now, nothing has got any higher yet, but the sites are still pretty young.
I use the diagram below right now, with pretty heavy customization of the settings in my tools. SER C = SER Contextuals, SER N = SER Non-Contextuals, and the automated web 2.0s are from RX in my case, but you could use SERE or SeNuke or whatever is available.
It's interesting how the manual web 2.0s are ranking better than the automated web 2.0s, makes me wonder if Google can somehow tell they are being made using a program.
Doesn't it leave a footprint if you use the same verified list for both tiers of web 2.0s? For example if you had 1000 unique domains wouldn't it be weird if both the automated and manual web 2.0s have those same 1000 backlinks? I've had sites rank using the same list but I feel like it's only a matter of time until Google starts acting on it.
For example, say I had a WordPress parasite site as my main site. I blast it with 100 contextual links throughout the month. Then I blast the tier 1 web 2.0s (4 or 5) with the same 100 domains throughout the month.
I know what you mean about using the same list for the two tiers of web 2.0s and in all honesty I have no idea how Google view it.