SDK for Dummies - How to make your own captcha definition
Creating your own captcha definition with the SDK of Captcha Breaker is fairly easy. This post will show you how to create a new definition step by step.
Preparation
First you need a stack of sample captchas. Depending on how complex the captcha type is, you need at least 10-20 for very easy captchas and 100+ for the more complex ones. There are two ways to obtain captchas:
1. Save unknown/unsolved captchas to file until you have collected enough samples
2. Open a web page that shows the captcha type and
a) copy/paste the image location and download the images with the help of the SDK (add/edit/delete -> add -> from URL)
b) if that doesn't work, save the captcha by hand with your browser (save image as...). Sometimes it's even necessary to delete your browser cookies each time.
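If you would rather collect samples outside the SDK, option 2 can be scripted. The following is only a minimal sketch, assuming the site serves a fresh captcha image on every request to the same URL and needs no cookies or session. The function names, the file naming scheme and the .png extension are made up for illustration and are not part of the SDK.

```python
import os
import urllib.request


def fetch_bytes(url):
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_samples(url, folder, count, fetch=fetch_bytes):
    """Fetch `url` `count` times and store each response as its own file."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for index in range(1, count + 1):
        data = fetch(url)
        # The .png extension is a guess; use whatever format the site serves.
        path = os.path.join(folder, f"sample_{index:04d}.png")
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)
    return paths
```

Calling save_samples with the captcha image URL, a target folder and a count of 20 would leave 20 numbered files in that folder, which can then be loaded into the SDK from a folder. The fetch parameter is only there so the download step can be swapped out, for example for one that sends cookies.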
Create the script
1. Open SDK -> 'File' -> 'New'
2a. Fill out all necessary fields in Captcha Info:
- Name
- Type
2b. Fill out all optional fields:
- Alternative Names
- Description
- Sample URL
- Icon File (which needs to be downloaded)
When done, click 'File' -> 'Save As' to save the basic script to a file in the 'captcha_systems' folder of CB.
3. Add sample captchas to SDK in Captcha Filter + Test ('Add / Edit / Delete' -> 'Add' -> 'Folder' or 'URL')
4. Edit the answers for captchas ('Add / Edit / Delete' -> 'Enter Correct Answer for All' -> 'All with Edit me' or 'Just Everything') or use a human captcha service if possible ('Use decaptcher')
5. After all answers have been edited, click 'Save downloaded images to folder' (create a new subfolder named 'samples' in the destination folder of the captcha)
6. Go back to 'Captcha Info' tab and click 'Detect from loaded images'
Sometimes you have to complete the 'Charset' by hand with missing characters, but this is rarely needed if you have the answers for a lot of sample captchas.
Save the script to file again.
7. Go to 'Captcha Filter + Test' tab and click 'Brute Force' -> 'Find best solutions' -> 'All Images'
This will take a while; how long depends on the number of sample captchas, the difficulty level and the size/dimension of the captcha type.
If you are satisfied with the result of "Brute Force", save the definition and it is ready to use with your tool of choice.
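Two ideas from the steps above can be illustrated in a few lines: building a charset from the entered answers (step 6) and judging a result by how many samples are solved correctly (step 7). This is only a sketch under assumptions. It does not show how the SDK implements 'Detect from loaded images' or 'Brute Force', and all function names and sample answers are hypothetical.

```python
def detect_charset(answers):
    """Return every distinct character used in the answers, sorted."""
    return "".join(sorted(set("".join(answers))))


def missing_characters(charset, expected):
    """Return the characters of `expected` that never appear in `charset`."""
    return "".join(ch for ch in expected if ch not in charset)


def accuracy(predictions, answers):
    """Fraction of samples whose predicted text matches the known answer."""
    if not answers:
        return 0.0
    hits = sum(1 for guess, truth in zip(predictions, answers) if guess == truth)
    return hits / len(answers)


# Made-up answers for three sample captchas.
answers = ["AB12", "B2C", "CA1"]
charset = detect_charset(answers)
print(charset)
# Digits that no sample has shown yet and might need to be added by hand.
print(missing_characters(charset, "0123456789"))
# One of three made-up predictions is wrong ("CAI" instead of "CA1").
print(accuracy(["AB12", "B2C", "CAI"], answers))
```

The more answered samples you have, the less likely it is that a character is missing from the detected charset, which is why a large sample set rarely needs manual completion.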
How to further optimise that captcha definition will be covered in another guide.
Comments
Make sure you saved them in the captcha_system folder and do a refresh,
then use the search (popup menu).
There are many parts in the video that are fast forwarded 5x - 10x.
Either way, awesome video!
Dumb question.
What setting do we use to crack maths type captchas?
1 - Is there a way to test a processing number other than the first one? Because when I click test, it only tests process number 1.
2 - How do you batch download captcha images? (In one screenshot you put something like %nr% in the URL.)
3 - What's the difference between the OCR engines? Why do they come up with different results?
4 - What is the main objective in the SDK in order to facilitate the character recognition? Eliminate the background (dots, dust, lines, etc.)? Differentiate the letters from the rest?
Regards!
1. The program asks you whether to test all or just the current one when you click it.
2. That depends on the site and captcha. Some will create a new captcha on the same URL. If that's not the case, see if the URL has parameters that look like numbers. Replace such a number with %nr% and CB will insert a random number. If that does not help, use this:
Enter just a small part of the captcha URL (without http://) and enter the URL of the web page that shows the captcha. This will get you a new captcha each time.
3. Because different engines with different methods are used here. Sometimes one works better than the other on certain captcha types. The results of all 3 are combined and the best-looking one is used as the end result. E.g. if 2 reply with ABCD and just one replies with 4BCD, then ABCD is more likely to be correct. Of course several other methods are used to get the most out of all the possible results.
4. The main objective is to create an image filter list that gets the best OCR/MASK/HASH result.
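Answers 2 and 3 can be pictured with a short sketch. It is only an illustration under assumptions: the number range for %nr% is a guess, the URL is a placeholder, and a plain whole-answer majority vote stands in for whatever CB really does when it combines the engine results (the answer above says several other methods are involved). All function names are hypothetical.

```python
import random
from collections import Counter


def fill_nr(url_template, low=0, high=999999, rng=random):
    """Replace every %nr% placeholder with one random integer.

    The range low..high is an assumption, not CB's documented behaviour.
    """
    return url_template.replace("%nr%", str(rng.randint(low, high)))


def majority_answer(results):
    """Return the answer most engines agree on (first seen wins ties)."""
    if not results:
        raise ValueError("need at least one engine result")
    return Counter(results).most_common(1)[0][0]


# Placeholder URL; each call yields a different number in place of %nr%.
print(fill_nr("example.com/captcha.php?id=%nr%"))

# Two engines say ABCD, one says 4BCD, so ABCD wins the vote.
print(majority_answer(["ABCD", "ABCD", "4BCD"]))
```

A vote like this only helps when the engines make different mistakes; if all of them misread the same character, better image filters (answer 4) are the place to improve.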
I created a folder called "My Captchas" within the "Unknown" folder and am saving the files there.
Is that correct?