SDK for Dummies - How to improve existing captcha definitions

Ozz
edited December 2012 in Need Help
Captcha Breaker has some neat built-in tools to improve the solve rate of a captcha.

Basically, there are two ways to improve them:
1) Fine-tune an existing definition
2) Add another solving process for captchas that CB wasn't able to answer. This means that if 'Process 01' couldn't solve the captcha, you can define 'Process 02' to try to solve it. You can define up to 10 processes for each captcha.
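The process chain in point 2 can be pictured roughly as follows. This is a minimal Python sketch, not CB's internals: the `Process` type, `solve_with_processes` and the rule that any non-empty answer ends the chain are assumptions for illustration (the last one mirrors how this tutorial later treats every captcha with a non-empty result as answered, right or wrong).

```python
from typing import Callable, List, Optional

# Hypothetical model: a "process" takes raw captcha image bytes and returns
# its answer, or an empty string when it gives up.
Process = Callable[[bytes], str]

MAX_PROCESSES = 10  # CB allows up to 10 processes per captcha definition


def solve_with_processes(image: bytes, processes: List[Process]) -> Optional[str]:
    """Try each process in order; the first non-empty answer wins."""
    if len(processes) > MAX_PROCESSES:
        raise ValueError(f"at most {MAX_PROCESSES} processes are allowed")
    for process in processes:
        answer = process(image)
        if answer:  # any non-empty answer stops the chain, right or wrong
            return answer
    return None  # no process produced an answer


# Stand-in processes for the usage example.
def process_01(image: bytes) -> str:
    return ""  # gives up on this captcha


def process_02(image: bytes) -> str:
    return "x7k2"  # produces an answer


print(solve_with_processes(b"...", [process_01, process_02]))  # x7k2
```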


Fine Tuning

It often helps a lot to change some parameters of your existing definition. The most important parameters to take a closer look at are:
- Threshold
- Scale

Other parameters are also worth investigating, but in my opinion these two are the most important. Because they often correlate, fine-tuning one of them frequently leaves the other with little or no effect.

Example A:
- Threshold 50 %
- Scale 200 %
If we fine-tune 'Threshold' and arrive at 45%, that value is optimised for a captcha scaled up to 200%. Most likely you won't notice any change if we then try to fine-tune the 'Scale' parameter.

Example B:
- Threshold 50%
The definition has no 'Scale' parameter yet. Once we have fine-tuned 'Threshold' to 45%, you can try adding the 'Scale' parameter afterwards (the best result you will get is with a scale of 200%). In this case it is worth adding it to the definition and then fine-tuning 'Scale' as well.

Whether or not parameters correlate depends on which parameters are involved and in which order they appear. Because of that, it is often worth trying all kinds of test combinations.
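To see why the order matters, here is a small Python sketch of one-parameter-at-a-time tuning that tries every order and keeps the best result. The `score` function stands in for "run the test and read the solve rate"; the table of solve rates below is made up purely to show that tuning 'Threshold' first and tuning 'Scale' first can end at different settings.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence

Params = Dict[str, int]
Score = Callable[[Params], float]


def tune_one(score: Score, params: Params, name: str, candidates: Sequence[int]) -> Params:
    """Return a copy of params with `name` set to its best-scoring candidate."""
    best = max(candidates, key=lambda value: score({**params, name: value}))
    return {**params, name: best}


def tune_in_order(score: Score, start: Params, order: Sequence[str],
                  grid: Dict[str, List[int]]) -> Params:
    """Fine-tune one parameter at a time, keeping earlier choices fixed."""
    params = dict(start)
    for name in order:
        params = tune_one(score, params, name, grid[name])
    return params


def tune_all_orders(score: Score, start: Params, grid: Dict[str, List[int]]) -> Params:
    """Try every tuning order and keep whichever result scores best."""
    results = [tune_in_order(score, start, order, grid) for order in permutations(grid)]
    return max(results, key=score)


# Made-up solve rates for (threshold %, scale %) pairs, for illustration only.
TOY_RATES = {
    (40, 100): 0.20, (45, 100): 0.22, (50, 100): 0.24,
    (40, 200): 0.26, (45, 200): 0.40, (50, 200): 0.30,
}


def toy_score(p: Params) -> float:
    return TOY_RATES[(p["threshold"], p["scale"])]


grid = {"threshold": [40, 45, 50], "scale": [100, 200]}
start = {"threshold": 50, "scale": 100}
print(tune_in_order(toy_score, start, ["threshold", "scale"], grid))  # {'threshold': 50, 'scale': 200}
print(tune_in_order(toy_score, start, ["scale", "threshold"], grid))  # {'threshold': 45, 'scale': 200}
```

With these toy numbers, tuning 'Threshold' first gets stuck at 30%, while tuning 'Scale' first reaches 40%; `tune_all_orders` simply tries both and keeps the better one.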

This is how fine-tuning works with the help of 'Test focused Filter Parameter' (right-click the parameter -> click 'Test focused...'):
image
In the first run we test a wide range of values; here I used a minimum of 20 and a maximum of 80.
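A coarse first run like this boils down to scoring every value in a range and keeping the best one. A minimal Python sketch: `score` stands in for running the test with the parameter set to that value, and the step of 5 is an assumed choice, since the tutorial only gives the range 20 to 80.

```python
from typing import Callable, List, Tuple


def sweep(score: Callable[[int], float], lo: int, hi: int, step: int) -> List[Tuple[int, float]]:
    """Score every value lo, lo + step, ..., up to hi (inclusive)."""
    return [(value, score(value)) for value in range(lo, hi + 1, step)]


def best_value(results: List[Tuple[int, float]]) -> int:
    """Pick the value with the highest score (the first one wins on ties)."""
    return max(results, key=lambda pair: pair[1])[0]


# Stand-in score that peaks at a threshold of 41; a real run would test captchas.
def toy_score(threshold: int) -> float:
    return -abs(threshold - 41)


results = sweep(toy_score, 20, 80, 5)
print(len(results), best_value(results))  # 13 40
```

A coarse step keeps the first run short; the best value it finds is only a starting point for a narrower, finer run around it.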

The first run gave us just a tiny improvement. Keep in mind, though, that the more sample captchas you have, the better the definitions you can make.
image

Next we try to optimise the value down to the decimals. Once again we use 'Test focused...', but this time with a minimum of 40, a maximum of 42 and a step of just 0.1.
image
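Stepping by 0.1 is where floating-point arithmetic can bite: repeatedly adding 0.1 as a binary float accumulates rounding error, so a hand-written loop may skip or overshoot the upper bound. One way around this in a script of your own (a sketch, not how CB does it internally) is Python's `decimal` module, which represents 0.1 exactly. The helper names are made up for this example.

```python
from decimal import Decimal
from typing import Callable, List


def decimal_range(lo: str, hi: str, step: str) -> List[Decimal]:
    """Values lo, lo + step, ..., hi (inclusive), computed without float drift."""
    start, stop, inc = Decimal(lo), Decimal(hi), Decimal(step)
    if inc <= 0:
        raise ValueError("step must be positive")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value += inc  # exact decimal addition, no accumulated rounding error
    return values


def best_value(score: Callable[[Decimal], float], values: List[Decimal]) -> Decimal:
    """Return the value with the highest score (the first one wins on ties)."""
    return max(values, key=score)


values = decimal_range("40", "42", "0.1")
print(len(values), values[0], values[1], values[-1])  # 21 40 40.1 42.0
```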

Sadly, no additional captchas were solved this time, but with a sizable captcha database this sometimes works wonders.

However, when I tweaked the scale percentage I got 30% of the captchas solved, with a sample size of 50 captchas of this type.
image
This time it was worth testing the 'Scale' parameter, because both the 'remove-objects' filter and the 'remove-dust' filter correlated with the scale size.
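With 50 samples, each captcha is worth 2 percentage points, so 30% means 15 solved captchas. As a rough guide to how much a result like this can move by chance, the usual binomial standard error sqrt(p(1 - p) / n) can be computed alongside the rate. A small Python sketch (the helper names are made up for this example):

```python
import math


def solve_rate(solved: int, total: int) -> float:
    """Solve rate in percent."""
    return 100.0 * solved / total


def points_per_captcha(total: int) -> float:
    """How many percentage points a single captcha is worth."""
    return 100.0 / total


def standard_error_points(solved: int, total: int) -> float:
    """Approximate standard error of the solve rate, in percentage points."""
    p = solved / total
    return 100.0 * math.sqrt(p * (1 - p) / total)


print(solve_rate(15, 50), points_per_captcha(50))  # 30.0 2.0
print(round(standard_error_points(15, 50), 1))     # 6.5
```

At roughly plus or minus 6.5 points, a gain of one or two captchas on a 50-captcha sample is hard to tell apart from noise, which is one more reason why a larger sample database gives better definitions.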

We'll leave that definition for now and try adding another solving process for further improvement.


Add another Process

Captcha Breaker has the unique feature of letting you add another solving process for captchas it couldn't answer in its first process.
Before we start with 'Process 2', we test all captchas with the original definition again and afterwards delete every captcha that got an answer (regardless of whether the answer was correct or not).

To do that you need to click 'Add / Edit / Delete' -> 'Delete' -> 'All with none empty result'.
image
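The 'All with none empty result' deletion splits the samples by whether Process 1 produced any answer at all, correct or not. In script form this is a simple partition; the `results` mapping from file name to answer is a hypothetical stand-in for CB's result list.

```python
from typing import Dict, List, Tuple


def split_by_answer(results: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split sample files into (answered, unanswered) by Process 1's output.

    Correctness is ignored on purpose: a wrong answer still means the
    captcha never reaches Process 2.
    """
    answered = [name for name, answer in results.items() if answer]
    unanswered = [name for name, answer in results.items() if not answer]
    return answered, unanswered


results = {"a.png": "x7k2", "b.png": "", "c.png": "wrong", "d.png": ""}
answered, remaining = split_by_answer(results)
print(remaining)  # ['b.png', 'd.png']
```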

Next we need to increase the 'Process' value to 2.
image

You can either run 'Brute Force' for 'Process 2' on the remaining captchas again or try something different. In this case I added a simple filter for demonstration purposes, and this filter has to be used when brute forcing.
image
When asked whether it should always use the current filter when brute forcing, we say 'Yes' (or 'Ja' if you are using a German OS ;) ).
This run will take some time again, but not as long as the very first 'Brute Force' run, because this time we test with a smaller sample of captchas.

After the extra run finished and this process was fine-tuned, we got the following result for the remaining captchas.
image

You'll see that some captchas were solved that weren't solvable in 'Process 1'. Now we add back all the captchas that were deleted before ('Add / Edit / Delete' -> 'Add' -> sample folder) and click 'Test' to get the results with both processes.
image

As you can see, we improved our captcha definition from a solve rate of 24% to 40% by fine-tuning the original definition and adding an additional process. You can add even more processes (up to 10 overall), but for that you need a very large database of sample captchas.
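Because each captcha is handled by the first process that gives an answer, the correct solves of the individual processes never overlap and simply add up over the full sample. For the figures above, going from 24% to 40% is a gain of 16 percentage points, i.e. about two-thirds more solved captchas (40 / 24 is roughly 1.67). A Python sketch of the arithmetic, with made-up counts since the tutorial doesn't give the per-process numbers:

```python
from typing import List


def combined_solve_rate(total: int, correct_per_process: List[int]) -> float:
    """Overall solve rate when later processes only see captchas earlier ones left unanswered."""
    solved = sum(correct_per_process)
    if solved > total:
        raise ValueError("more correct solves than captchas in the sample")
    return solved / total


# Hypothetical counts: Process 1 solves 50 of 200 captchas correctly,
# Process 2 solves 20 of the ones Process 1 left unanswered.
print(combined_solve_rate(200, [50, 20]))  # 0.35
```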

This example has shown that optimising captcha definitions is worth the time. Hopefully many users will help build a large database of test captchas for each type and try to improve existing definitions or add new ones for the benefit of all.

One last word: everyone can use the SDK on each of their PCs. After the CB trial has expired, the SDK remains fully functional, so you can play with it on your home PC while your Captcha Breaker license runs on your VPS!

Comments

  • Is CB out now?
  • Ozz
    edited January 2013
    No, but I'm sure it will be out next year, haha.

    Now that we beta testers are allowed to post pictures and show the features, you can bet it won't be long before CB is released to the public.

    That's why I wrote these tutorials, so everybody can help improve CB from the moment it is available.

    @all: please don't use this thread for off-topic discussions about CB or its SDK.
  • LeeG Eating your first bourne

    Comes in handy for those of the beta testers that are feeling their way round the software

    I have asked dumb questions where I was looking with my eyes shut

    Ozz has just made it easier to use

    As with any new software, it takes time to learn how to use it and hit the top end submissions, which it is capable of doing with the use of mk1 brain

  • New to CS, but I was curious: why do we need to fine-tune this ourselves? Can't the CS developer just provide an update that improves the existing captcha definitions automatically?
  • @ninjaphp - The abbreviations are very similar: CS vs CB. CB is Captcha Breaker, which is also the name of this subforum.

    The above tutorial is for Captcha Breaker, NOT Captcha Sniper. You will not find the above features in Captcha Sniper.
  • @all please stay on topic
  • AlexR Cape Town
    @ozz - I've been playing around with the CB SDK. Can you add a little section to this thread on how the files work?

    For example, how do I improve the solve rate of Wordpress Blue?

    1) Do I right-click and open it in the SDK editor?
    2) Load some sample images? If so, what sizes? I see there are different image sizes in my Unsolved --> Wordpress Blue folder.
    3) Do I then enter the answers?
    4) What do "use image", "use hash values" and "fixed results" mean (i.e. what are static chars or static images)?
    5) What does Brute force mean?
    6) When you add a filter, how can you test whether it has improved on a captcha? So if I add "normalize", how do I see whether that has improved or worsened things? I.e. can it now detect maybe 2 of the 4 instead of 0 of 4, etc.?



  • Ozz
    edited January 2013
    1) right click -> SDK or just double click
    2) add/edit/delete -> add -> from file or folder (obviously only the captchas of that certain type)
    3) yes -> add/edit -> enter correct answers for all
    <-- don't forget to save them in a new folder and rename them (right click --> save downloaded files to folder and -> rename files according to answers)
    4) hash = letters are always in the same place (static) with the same background
    fixed = the captcha is a fixed image
    5) just click some buttons and test it yourself, damn ;)
    with brute force you test certain types of filters in trial-and-error mode
    6) click test and wait for the result

    Be careful not to add only unsolved captchas: that distorts the results, and the verification rate drops when formerly solved captchas become unsolved due to new filters. Therefore you need a good sample of both solved and unsolved captchas of a given type, in the correct solved/unsolved ratio.

    If your solved folder holds 200 captchas and your unsolved folder 100, the ratio is 2:1.
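The 2:1 figure is just both counts divided by their greatest common divisor. If you want a smaller test set that keeps the same mix, you can draw from each folder in proportion. A Python sketch; the function names and the fixed random seed are illustrative choices, not part of CB.

```python
import random
from math import gcd
from typing import List, Tuple


def solved_unsolved_ratio(n_solved: int, n_unsolved: int) -> Tuple[int, int]:
    """Reduce the two folder sizes to their simplest ratio, e.g. 200:100 -> 2:1."""
    g = gcd(n_solved, n_unsolved)
    if g == 0:
        raise ValueError("both folders are empty")
    return n_solved // g, n_unsolved // g


def ratio_preserving_sample(solved: List[str], unsolved: List[str],
                            size: int, seed: int = 0) -> List[str]:
    """Draw `size` files, keeping the solved/unsolved proportion of the full folders."""
    total = len(solved) + len(unsolved)
    n_solved = round(size * len(solved) / total)
    rng = random.Random(seed)  # fixed seed so the same test set can be rebuilt
    return rng.sample(solved, n_solved) + rng.sample(unsolved, size - n_solved)


solved = [f"solved_{i}.png" for i in range(200)]
unsolved = [f"unsolved_{i}.png" for i in range(100)]
print(solved_unsolved_ratio(len(solved), len(unsolved)))  # (2, 1)
subset = ratio_preserving_sample(solved, unsolved, 30)
print(sum(name.startswith("solved") for name in subset))  # 20
```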
  • AlexR Cape Town
    Thanks @ozz!

    1) Let's say I add a new filter: is there a way to test it on just one captcha? Often it's a bad choice and you can see that quickly on a single captcha, but at the moment it runs through the whole captcha list?

    2) "(obviously only the captchas of that certain type)" - by type, or by size within a type? (I see there are different size folders within type "Wordpress Blue".)


  • Ozz
    edited January 2013
    Here is what I like to do when I need to sort captchas.

    - Copy certain captcha folders into a new folder 'samples'
    - Download and install 'Irfan View'

    image

    image

    - mark and delete all unwanted images OR move all wanted images into a new folder
    <-- with Irfan View you can sort all images by size, height, etc. (right click -> sort...)
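If you'd rather script the sorting than do it by hand in IrfanView, the idea is to read each image's width and height and group files by size. A minimal Python sketch using only the standard library; it assumes PNG files (width and height sit at fixed offsets in the PNG's first chunk, IHDR), so JPEG or GIF captchas would need a different reader or an imaging library. The folder name in the usage comment is hypothetical.

```python
import struct
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def group_by_size(folder: Path) -> Dict[Tuple[int, int], List[str]]:
    """Map each (width, height) to the names of the PNG files with that size."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for path in sorted(folder.glob("*.png")):
        with path.open("rb") as f:
            groups[png_size(f.read(24))].append(path.name)
    return dict(groups)


# Usage (hypothetical folder name):
# for size, names in group_by_size(Path("samples")).items():
#     print(size, len(names))
```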


  • AlexR Cape Town
    edited January 2013
    1) Is there a way to remove lines from a captcha? E.g. if there is a continuous sequence of pixels 200 long and 3 wide, remove it completely? This would help greatly! Often there are a few lines that distort the captcha and reduce the solve rate.
    image
    2) Is it possible to join two lines? Filters often leave a gap in the characters, for example where a line used to cross them. If you could specify "join all lines x px wide by y px long", it would improve the rate again, because it would close the gap that the removed line/object left in the character.
    3) I don't fully understand "use fixed results - only for static images". I have 100 unsolved captchas whose filenames are the manually solved answers.
    a) If I go to "use fixed results", press clear, then calculate, and then press test, it solves at 100%. What filter is it applying? (I'm testing on some Mollom captchas.)
    b) In "use fixed results", when you click on the "images window" and select a different captcha, the filtered window doesn't update, so you can't see the effect.
    4) If a platform has 5 different captcha sizes, e.g. 50x100, 55x100, 60x100, 110x105, etc., how are these handled? Does each get a process? Or do you dump all of the captchas into a single folder?

    Just playing around with the SDK and want to see how I can get it to solve some more captchas. 
  • Ozz
    edited January 2013
    1) "remove dust" and "remove objects" do this, but it won't work with the Mollom captcha because the lines are connected to each character, so character plus line is more than 200px, for example.
    Maybe some kind of angle-based filter could work, e.g. 'only remove lines from 40° to 45°', but I don't know whether that's doable or usable at all; I already asked for it during the beta tests.
    2) the "fill holes" filter, but most probably you'll fill more holes than 'Rocco Sifredi'. An '8', for example, might not be readable anymore after its holes got filled.
    3) fixed results is for static images, i.e. captchas that just use a stack of images which occur again and again without any changes
    a) because the results are fixed, it uses the hash of that image, which results in 100% for that particular image, but only for that particular image
    4) you can separate them into different definitions or keep them together. For example, you can use the "fixed height" filter first and "brute force" them with that filter. Of course there is some room for improvement if separating them makes sense. Or use another process if size A gives no result.
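The 100% in question 3a can be illustrated with a lookup table keyed by an image hash: every captcha you've already answered is found exactly, while an image that differs by even one byte is not found at all. A Python sketch; SHA-256 is an arbitrary choice here, since the thread doesn't say which hash CB uses, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def image_hash(data: bytes) -> str:
    """Hash the raw image bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


def build_fixed_results(samples: Iterable[Tuple[bytes, str]]) -> Dict[str, str]:
    """Map each known image's hash to its manually entered answer."""
    return {image_hash(image): answer for image, answer in samples}


def solve_fixed(table: Dict[str, str], image: bytes) -> Optional[str]:
    """Exact lookup: only byte-identical images get an answer."""
    return table.get(image_hash(image))


table = build_fixed_results([(b"image-a", "apple"), (b"image-b", "bridge")])
print(solve_fixed(table, b"image-a"))  # apple
print(solve_fixed(table, b"image-c"))  # None
```

This is why testing a fixed-results definition on the very captchas it was built from always scores 100%: the test only checks the lookup, not whether the definition generalises to new images.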
  • AlexR Cape Town
    @Ozz - many thanks!

    1) For current definitions, how big is the sample size generally to give the current success/failed %? Is it based on 100, 200, 1000 captchas?

    Still playing around to get a better feel. Just so many things to work on at the moment!
  • Trevor_Bandura 267,647 NEW GSA SER Verified List
    When users make these changes to CB and it improves the solve ratio, does this somehow update the solving definitions for everybody?
  • Sven www.GSA-Online.de
    I manually check the shared captchas and add them if the solve rate or speed is better.
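Sven's rule ("add them if the solve rate or speed is better") can be written as a comparison between two definitions tested on the same captcha set. This Python sketch reads it as: a higher solve count always wins, and speed only decides when the solve counts tie. That tie-break reading is an assumption, and the `DefinitionStats` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionStats:
    solved: int    # correctly solved captchas on a shared test set
    avg_ms: float  # average solve time per captcha in milliseconds


def should_replace(current: DefinitionStats, candidate: DefinitionStats) -> bool:
    """Accept the shared definition if it solves more, or solves as many but faster."""
    if candidate.solved != current.solved:
        return candidate.solved > current.solved
    return candidate.avg_ms < current.avg_ms


current = DefinitionStats(solved=20, avg_ms=150.0)
print(should_replace(current, DefinitionStats(solved=21, avg_ms=300.0)))  # True
print(should_replace(current, DefinitionStats(solved=20, avg_ms=120.0)))  # True
print(should_replace(current, DefinitionStats(solved=19, avg_ms=50.0)))   # False
```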