edited February 2013 in Need Help
Perhaps a really dumb question, and maybe there is a manual, but I really can't find it: in the SDK dialog of CB there is a "Tools" menu where I can find "Auto Improve". As far as I understand it, this function trains CB to get better solving rates (?). Which folder does it use for this? Is it the captcha_systems folder, or is that where the already used data is stored?

Another stupid question: are user-added captchas integrated automatically, or where do I have to put or open the GCB files?

These functions look very interesting, but I'm really unsure how to use them. Thanks for any hints ;)


  • Ozz
    edited February 2013
    no, not a dumb question, as in this case you could do more harm than good if you "auto improve" with a small sample size or a bad collection of samples.

    "auto improve" will brute force all captchas IIRC. i never used it, as i just click 'brute force' when i train a captcha i'm working on atm. please read my guides to get a deeper understanding of how you should do it.

    regarding your second question:
    shared captchas are reviewed by sven and then implemented in CB with the next update.
  • Thanks for your answer Ozz, I think this helps me a little bit further. In the meantime I found your (great) SDK thread with more information.

    The last thing I'm asking myself now is whether there is a function to collect correct (!) data and use it to improve the already available captchas. As far as I can tell, this is only possible manually. Would this be something for the "feature request" list? Imagine the power if everybody did this and we could improve the detection rates as a "swarm".
  • Ozz
    edited February 2013
    you mean something like a "sample captchas database", right?
    this is an idea i also had in mind, but on the other hand you don't know whether everybody is renaming the captchas with the right answer.
    because of that i do it by hand, which takes some time and is such stupid work, but then i know that my samples are correct.

    one thing that could happen, for instance, is that the ratio of "solved" and "unknown" captchas isn't right because someone only uploads "unknowns". the ratio is then unbalanced, which will result in a worse solving rate for formerly "solved" captchas.
    another thing that could happen is that someone uploads the "solved" folder for a particular captcha type without reviewing the answers. this will also result in an unbalanced solving rate, as CB would think that the answers it gave while testing were correct when in reality they are not.

    however, when i upload a captcha, CB asks me if i want to upload the samples too, and i agree to that so at least Sven has more samples.
  • AlexR Cape Town
    @ozz - I have a number of people who, if you give them a big file of captchas, could rename them with the correct answers. This way you could have a sample size of 2000 correctly labelled captchas.

    I asked Sven if this would help and he said "Thanks, but no", as he already had a big enough sample size for all captchas and it wouldn't add value.
  • Ozz
    edited February 2013
    2000 of one particular type or 2000 of different types? just upload them if they are different types.
  • AlexR Cape Town
    I have someone I can assign the task of naming captchas correctly, but Sven said he didn't need a bigger database. I'd only do it if it improved captcha solve rates somehow...
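
For anyone who wants to sanity-check a sample set before sharing it, the "solved" vs. "unknown" ratio Ozz describes above could be checked with something like the rough Python sketch below. This is not part of CB or its SDK: the two folder paths, the function names and the 0.3 to 0.7 thresholds are all assumptions made up for illustration, and it only counts files. It cannot tell whether the answers in the "solved" folder are actually correct, so reviewing them by hand is still needed.

```python
from pathlib import Path


def count_samples(folder):
    """Count regular files directly inside folder (0 if the folder is missing)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.iterdir() if entry.is_file())


def solved_share(solved_dir, unknown_dir):
    """Return the fraction of all samples that sit in the solved folder.

    Returns None when both folders are empty or missing.
    """
    solved = count_samples(solved_dir)
    unknown = count_samples(unknown_dir)
    total = solved + unknown
    if total == 0:
        return None
    return solved / total


def looks_balanced(share, low=0.3, high=0.7):
    """Hypothetical rule of thumb: flag sets dominated by one folder.

    The thresholds are illustrative assumptions, not values used by CB.
    """
    return share is not None and low <= share <= high
```

Usage would be along the lines of `looks_balanced(solved_share("samples/solved", "samples/unknown"))`, where both paths are placeholders for wherever your own samples live.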
