SDK for Dummies - How to make your own captcha definition

Ozz
edited January 2013 in Need Help
Creating your own captcha definition with the SDK of Captcha Breaker is fairly easy. This post will show you how to create a new definition step by step.


Preparation

First you need a stack of sample captchas. Depending on how complex the captcha type is, you need at least 10-20 for very easy captchas and 100+ for the more complex ones. There are two ways to obtain captchas:
1. save unknown/unsolved captchas to file until you have collected enough samples
2. open a web page that shows the captcha type and
a) copy/paste the image location and download the images with the help of the SDK ('Add / Edit / Delete' -> 'Add' -> 'URL')
b) if that doesn't work, save the captcha by hand with your browser ('Save image as...'). Sometimes it's even necessary to delete your browser cookies each time.
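
If you would rather collect the samples outside the SDK, the idea behind option 2 can be sketched in a few lines of Python. This is only a rough sketch and not part of Captcha Breaker: the function names, the file naming scheme and the '.png' extension are made up for illustration, and it assumes the site serves a fresh captcha on every request to the same URL and needs no cookies.

```python
import os
import urllib.request


def fetch_bytes(url):
    """Download the raw bytes behind a URL (assumes no cookies are needed)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def collect_samples(url, out_dir, count, fetch=fetch_bytes, ext=".png"):
    """Request `url` `count` times and save each distinct image to `out_dir`.

    Assumption: every request returns a new captcha image. Byte-identical
    repeats are skipped, so fewer than `count` files may be written.
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    saved = []
    for _ in range(count):
        data = fetch(url)
        if data in seen:
            continue  # same image again, not a new sample
        seen.add(data)
        # Hypothetical naming scheme: sample_001.png, sample_002.png, ...
        path = os.path.join(out_dir, f"sample_{len(saved) + 1:03d}{ext}")
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

Calling collect_samples with the captcha image URL, a target folder and a count of 100 would fill that folder with up to 100 distinct samples, which can then be loaded through 'Add / Edit / Delete' -> 'Add' -> 'Folder'.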


Create the script

1. Open SDK -> 'File' -> 'New'

2a. Fill out all necessary fields in Captcha Info:
- Name
- Type
2b. Fill out all optional fields:
- Alternative Names
- Description
- Sample URL
- Icon File (which needs to be downloaded)
When done, click 'File' -> 'Save As' to save the basic script to file in your 'captcha_systems' folder of CB

3. Add sample captchas to SDK in Captcha Filter + Test ('Add / Edit / Delete' -> 'Add' -> 'Folder' or 'URL')

4. Edit the answers for captchas ('Add / Edit / Delete' -> 'Enter Correct Answer for All' -> 'All with Edit me' or 'Just Everything') or use a human captcha service if possible ('Use decaptcher')

5. After all answers have been edited, click 'Save downloaded images to folder' (create a new subfolder named 'samples' in your destination folder of the captcha)

6. Go back to 'Captcha Info' tab and click 'Detect from loaded images'
Sometimes it is necessary to complete the 'Charset' with missing characters, but this is rarely needed if you have the answers for a lot of sample captchas.
Save the script to file again.

7. Go to 'Captcha Filter + Test' tab and click 'Brute Force' -> 'Find best solutions' -> 'All Images'
This will take a while; how long depends on the number of sample captchas, their difficulty level and the size/dimension of the captcha type.

If you are satisfied with the result of 'Brute Force', save the definition and you are good to go to use it with your tool of choice.
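
One way to judge whether a result is satisfying is the share of sample captchas that were answered correctly. The SDK shows its own figures; the following is just a small sketch of the arithmetic, with a made-up function name and the assumption that answers are compared without regard to case.

```python
def solve_rate(expected, predicted, case_sensitive=False):
    """Return the percentage of predicted answers that match the expected ones.

    Assumption: both lists are in the same order, one entry per sample image.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        return 0.0
    hits = 0
    for want, got in zip(expected, predicted):
        if not case_sensitive:
            want, got = want.lower(), got.lower()
        if want == got:
            hits += 1
    return 100.0 * hits / len(expected)
```

For example, three correct answers out of four samples give a solve rate of 75%.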

How to further optimise that captcha definition will be covered in another guide.

Comments

  • edited December 2012
    Thanks for this guide. It looks pretty easy. I'm sure the competition between this program and CSX will mostly benefit us as users! and that is a good thing. 
  • In the last screenshot you see the "Processing number 1 filter"

    What does this mean? 

    Can we speed up the brute force process by raising that number? 

    If that's the case will the results still be as accurate?
  • Sven www.GSA-Online.de
    @honor90, that is the filter set you can define. You can define up to 10 different filter sets. The program tries filter sets 1 to 10 in order until one of them produces a solution (the first one is taken). Raising that number does not improve the brute force speed.
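
The "try filter 1 to 10 until one solution is made" behaviour can be pictured as a simple first-success loop. This is a minimal sketch of that idea only, not CB's actual code: a filter set is modelled here as a plain function that returns an answer or None, and all names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: a filter set takes image bytes and returns a solution
# string, or None when it could not produce an answer.
FilterSet = Callable[[bytes], Optional[str]]

MAX_FILTER_SETS = 10  # the limit mentioned in the reply above


def solve_with_fallback(
    image: bytes, filter_sets: Sequence[FilterSet]
) -> Optional[Tuple[int, str]]:
    """Try each filter set in order and return (number, answer) of the first hit."""
    if len(filter_sets) > MAX_FILTER_SETS:
        raise ValueError("at most 10 filter sets are supported")
    for number, filter_set in enumerate(filter_sets, start=1):
        answer = filter_set(image)
        if answer:
            # First solution wins; later filter sets are never run.
            return number, answer
    return None
```

Because later filter sets only run when earlier ones fail, the processing number says which set is currently in use, not how fast the search runs.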
  • Thanks Sven for the clarification! 
  • edited January 2013
    I've added 3 of my own captcha types now, but I can't seem to find them in the main list. They are just titled "Guestbook User Added 1", "Guestbook User Added 2", "Guestbook User Added 3" etc.
  • Sven www.GSA-Online.de
    edited January 2013

    Make sure you saved them in the captcha_system folder and do a refresh,

    then use the search (popup menu).

  • Awesome, thanks Sven, I was saving them in a different folder, everything is showing up just fine now.
  • Sven www.GSA-Online.de
    I will try to put out a warning if that ini is not saved into the right folder.
  • edited February 2013
    Here is a video I made showing me breaking the tblog captcha (before it was shared by me) in CB and in CS. I hope you can see how far superior captcha breaker is to captcha sniper just in usability and the filters it provides.  I was also able to get it up to 88% with using an additional filter.

    There are many parts in the video that are fast forwarded 5x - 10x.

     


  • s4nt0s Houston, Texas
    Cool video. At the end you should have put what the final results were for both software like CS solve rate x % and CB solve rate x %. Kinda hard to follow.

    Either way, awesome video!
  • Now that I saw this vid I will test it with a few types of captchas to see what results I get.
  • LeeG Eating your first bourne

    Dumb question.

    What setting do we use to crack maths type captchas?

  • Sven www.GSA-Online.de
    Just configure it so that it solves the plain characters you see on it. CB will check if this looks like a math thing and return the result after the solve is done.
  • edited October 2013
    Hi, newbie here! A few questions.

    1 - Is there a way to test a processing number other than the first one? When I click test, it only tests processing number 1.
    2 - How do you batch download captcha images? (In one screenshot you put something like %nr% in the URL.)
    3 - What's the difference between the OCR engines? Why do they come up with different results?
    4 - What is the main objective in the SDK in order to facilitate the character recognition? Eliminate the background (dots, dust, lines, etc.)? Differentiate the letters from the rest?

    Regards!
  • Sven www.GSA-Online.de

    1. The program asks you whether to test all or just the current one when you click it.

    2. That depends on the site and captcha. Some sites create a new captcha on the same URL. If that's not the case, see if the URL has parameters that look like numbers; replace those with %nr% and CB will insert a random number. If that does not help, use this...

    Enter just a small part of the captcha URL (without http://) and enter the URL of the webpage that shows the captcha. This will get you a new captcha each time.

    3. Because different engines with different methods are used here; sometimes one works better than the other on different types. The results of all 3 are joined and the best looking one is used as the end result. E.g. if 2 of them reply with ABCD and just one replies with 4BCD, it means that ABCD is more likely correct. Of course several other methods are used to get the most out of all the possible results.

    4. The main objective is to create an image filter list that gets the best OCR/MASK/HASH result ;)
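
Points 2 and 3 of this reply can be illustrated with a short sketch. Neither function is CB's real implementation: the number range used for %nr% is an assumption, and the vote is a plain majority count, whereas the reply says several other methods are involved as well. All names are made up for illustration.

```python
import random
from collections import Counter


def expand_nr(url_template, rng=random):
    """Replace every %nr% placeholder in the URL with one random number.

    Assumption: the range 0..999999 is arbitrary; CB's range is not stated.
    """
    number = str(rng.randrange(10**6))
    return url_template.replace("%nr%", number)


def pick_by_vote(candidates):
    """Return the answer most engines agree on, or None for an empty list.

    Ties go to the earliest candidate in the list.
    """
    if not candidates:
        return None
    counts = Counter(candidates)
    best = max(counts.values())
    for candidate in candidates:
        if counts[candidate] == best:
            return candidate
```

With the example from point 3, pick_by_vote(["ABCD", "4BCD", "ABCD"]) picks "ABCD" because two of the three engines agree on it.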

  • Trying to learn training captcha breaker - so how does the software know (or how can I tell it) which rule to use when. On the example of punBB - there are about 11 types there. How does the software know which one is which?
  • Sven www.GSA-Online.de
    Actually on many things: size of image, image type, used colours, number of seen objects and so on.
  • Can I create a new folder to save the new captcha solutions? I ask this because I do not know how to identify the platform of each image.

    I created a folder called "My Captchas" within the "Unknown" folder and am saving the files there.

    Is that correct?
  • Sven www.GSA-Online.de
    The filename should contain the domain from the URL the captcha was loaded from, so you can have a look there and see what type it may be.