I have some 5 million captchas to solve, and I am running some tests with GSA Captcha Breaker. It's good so far -- I'm able to process ~100,000 captchas in 24h, with a success rate of 65%. But what is good can sometimes get better.

Here's the deal: 
* all the 5 million captchas I have to solve are of the same type. 
* by inspecting GSA Captcha Breaker's log I noticed that CB sometimes assigns my captchas to, say, the "email" category, sometimes to "other", and sometimes to "unknown". (Btw, is the log saved anywhere? I couldn't find it.) 

So my question is: can I improve CB's performance (success rate and/or speed) for the captchas I am trying to solve? If so, how? Should I create a new captcha type (perhaps starting from the types currently assigned by CB), then tell CB that the captchas are of that type, and then train CB by solving some unsolved captchas myself? 

I looked at CB's documentation, but it seemed less than ideal. Also, I did look into some tutorials -- the ones by Ozz in this forum -- but perhaps I am missing something very basic as I couldn't even get started.

Any help, tips, pointers are much appreciated.


  • Brandon Reputation Management Pro
    Where are you getting the 65%?
  • s4nt0s Houston, Texas
    Also, maybe post the captcha type that is being detected incorrectly sometimes. 
  • Thanks Brandon -- not sure what "where" means, but here are some examples of what the captchas look like: A3PQ2B and FXNAB4

    Oh, I realize that the 65% I referred to is not correct -- that's the figure when I set my app to try sending each captcha to CB up to 10 times, so the per-attempt rate is probably more like 6.5%

    Please note, I am not only concerned about the success rate; if I can improve speed without compromising accuracy, that's also a big win -- I can always retry failed captchas. So I wonder: would ruling out some captcha types help?
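
The 65% over 10 tries can be converted into a per-attempt rate more carefully than by dividing by 10. The sketch below assumes every retry is an independent attempt with the same chance of success, which may not hold if CB tends to return the same answer for the same image; the function names are made up for illustration.

```python
def per_attempt_rate(cumulative_rate: float, attempts: int) -> float:
    """Return p such that 1 - (1 - p) ** attempts == cumulative_rate.

    Assumes independent attempts with identical success probability.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not 0.0 <= cumulative_rate < 1.0:
        raise ValueError("cumulative_rate must be in [0, 1)")
    return 1.0 - (1.0 - cumulative_rate) ** (1.0 / attempts)


def expected_attempts(p: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


if __name__ == "__main__":
    p = per_attempt_rate(0.65, 10)
    print(f"per-attempt success: {p:.1%}")
    print(f"expected tries per solved captcha: {expected_attempts(p):.1f}")
```

Under that independence assumption the per-attempt rate comes out closer to 10% than to 6.5%, so roughly ten tries per solved captcha on average.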
  • Ok, so I inspected the log more carefully. Here's what happens:

    -- the captchas match 6 types
    -- most of the captchas match the type [Others] (that's the right domain btw)
    -- when the captcha matches a different type ([eMail] DirectBox being the second most common type), it takes CB much more time to solve it (often 20 times more)
    -- according to CB, [Others] has 50% success rate.

    So I guess my questions now are:
    1. if I unmark all captcha types other than [Others], will speed improve without compromising accuracy?
    2. how can I improve the success rate of this particular captcha type? For example, if I solve a few unsolved captchas by hand, will that help CB? If so, how many captchas are we talking about?

    3. Is CB's log *saved* anywhere (or can I save it)? I couldn't find it.
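
To get a rough feel for question 1, the cost of mismatched types can be estimated with a weighted average. This is only a sketch: the 20x slowdown is the figure observed in the log above, while the share of captchas that land on a slow type is a made-up input (the log summary does not give it), and the function name is hypothetical.

```python
def mean_time_factor(slow_share: float, slowdown: float = 20.0) -> float:
    """Average solve time relative to a captcha matched to the fast type.

    slow_share: assumed fraction of captchas matched to a slow type (0..1).
    slowdown:   how many times longer a slow-type captcha takes.
    """
    if not 0.0 <= slow_share <= 1.0:
        raise ValueError("slow_share must be in [0, 1]")
    return (1.0 - slow_share) + slow_share * slowdown


if __name__ == "__main__":
    # Hypothetical shares, purely to show how the average scales.
    for share in (0.05, 0.10, 0.25):
        print(f"{share:.0%} slow -> {mean_time_factor(share):.2f}x average time")
```

Even a 10% share of slow matches would nearly triple the average time per captcha, so restricting CB to the one correct type could plausibly speed things up, provided those captchas then get matched to the fast type.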

  • Sven
    1. Unchecking the types is OK as long as you use the option to treat unchecked types as not present; otherwise, if SER sees that the type is unchecked, it skips it.

    2. You can improve this using the SDK. Double-click on it, add a reasonable number of captchas of the same type, and try to find better filters that make them more readable.

    3. The log is not saved. I don't think that's in any way useful. You can, however, save solved/unsolved captchas.

  • I unchecked all types other than the right one, of course, and both speed and success rate improved dramatically.
