Where do I find CAPTCHA sources?
Here's to a productive week, GSA pplz!! Need to know where to find example CAPTCHAs to feed the SDK in GSA's Captcha Breaker software package.
I already know that I can find them online by getting lists of sites if I scrape for something specific in the captcha-creating JavaScript, if it's done that way, or even something in the generated HTML, if it's a PHP script making the test.
Or, find a single site if it's a uniquely executed CAPTCHA, and just get many images generated by that same site on refresh.
But what about locally? I know I set GSA to save CAPTCHAs when I started. I also had CS in use for a bit. And now with CB, they're saving as well, I guess?
Comments
Maybe I should also start with CAPTCHAs that aren't uber-challenging?
So far, I've only tried improving a set that was already in CB which had a fairly low solve rate. I think my sample size was too small, but after the brute force with a few new filter profile sets, solve rate apparently was improved. So I saved it.
Is this a REAL improvement, or is it limited b/c my sample size was too small?
In other words, does the larger sample size help CB to solve more variations within that particular CAPTCHA and settings?
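For a rough sense of how much a small sample limits what a measured solve rate can tell you, here is a minimal sketch using a Wilson score interval. It is plain statistics, not anything CB does internally, and the counts in the loop are made-up numbers for illustration only.

```python
import math

def wilson_interval(solved, total, z=1.96):
    """Approximate 95% confidence interval for a solve rate.

    solved / total is the observed rate; z=1.96 corresponds to ~95%.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = solved / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical counts: the same 75% observed rate at three sample sizes.
for solved, total in [(15, 20), (150, 200), (1500, 2000)]:
    low, high = wilson_interval(solved, total)
    print(f"{solved}/{total}: plausible true rate {low:.1%} to {high:.1%}")
```

The observed rate is identical in all three cases, but the interval narrows as the sample grows, so an "improvement" measured on a couple dozen images could easily be noise.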
When trying to improve, do I use as many sources as possible, both solved and unsolved?
Like I said, I think I want to start with something else.
I've gotten NOWHERE. haha But I am also inspired to mess with that captcha generator a lot more, as well as DL and explore the other freeware php generators and see if there are any drastically different ideas I can add.
I just want to mess with them some more.
Yes; the goal is to make terribly maddening captchas. lol
If *I* can visually distinguish them, why can't a user do the same?? I am not trained in captcha deciphering like CB is.* And I still get nearly 100% right.
I admit, this one does take a few seconds of gazing upon to get right, but it's not like IMPOSSIBLE. I guess the goal for hard ones is to make them challenging, but not so challenging that a user will close their browser cursing me. lol Unless I really don't want logins, just want to seem to or if logins are closed to all but the dedicated. Hmm..
I CAN read them, just seems I am not proceeding in the right way to get CB to recognize the characters! It's not seeing ANYthing on OCR1,2, or 3.
*edit: To be fair, I did choose the typefaces so I already know what the letterforms will look like. I wonder how another person would fare.
@Sven, how do I check the CAPTCHAs against an external service? I tried, but CB didn't do anything, so I didn't do it right.
Basically you would do the following:
- 1. load captchas
- 2. make sure they all are correctly answered
- 3. click DETECT
- 4. click on the red label so it also assigns the chars that are probably missing
- 5. go back to the filters and experiment a bit till you find it good looking (remove noise, threshold...)
- 6. right click on filters->auto optimize (only if it already has a solution for one captcha at least)
- 7. click brute force and let it use current filters first and not use all sets (popup answers YES, NO)
- 8. when done let it auto optimize it again
---That's basically the stuff I do for a new captcha.
CB is actually trying to find answers now.
(I am only unsure what you meant about clicking on the red label above.)
The captcha I am working on solving with the 199 samples was by Paul Drain for GPL licensing for use with OSCommerce and ZenCart. I modified it to make OCR harder (before getting into GSA), but not saying in what ways, publicly! Not def trying to give the world ideas, in this regard. lol I'd rather find easy solves out there! The human solvers are getting costlier!
I also worked on a captcha last nite in CB that a user uploaded to the board yesterday. The letterforms all turned out looking squiggly-ended and I had zero success. I didn't do the steps suggested above, tho....
Both are kind of tough. lol
I think I'm going to DL a captcha module and make significant mods to an *easy* captcha (that's new in SOME ways, but still VERY easy), and then solve a puzzle on my beginner level. It'll still be useful b/c it'll be a mod and somehow different than just re-solving one that already has a high solve rate, and the new captcha can be used "olws" in the future.
(like IRL but "on live web sites"? lol)
I'm sure you know some effective methods better than I do for keeping them difficult. For all I know, also, you could destroy the one I made (that "seems" difficult) in ten minutes! lol
To anyone out there reading this and needing help: The red label includes all the known characters in the character set. So, CB gets it from the right answers, but you can add some chars, too.
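To picture what "CB gets it from the right answers" means, here is a small sketch that builds a character set from a list of known-correct answers and lets you add extra characters by hand. The function name and sample answers are hypothetical, and this only illustrates the idea; it is not CB's actual code.

```python
def known_charset(answers, extra=""):
    """Return every distinct character seen in the answers, plus any extras."""
    seen = set("".join(answers)) | set(extra)
    return "".join(sorted(seen))

# Hypothetical answers for a handful of solved samples.
answers = ["ab3k", "k9ab"]
print(known_charset(answers))             # characters observed in the answers
print(known_charset(answers, extra="z"))  # same set with one added by hand
```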
I guess, in part, this is why you need a large sample size; with only four characters per captcha drawn from a set of potentially 70 or more, a small sample can easily miss a few characters entirely.
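A back-of-the-envelope sketch of that probability, assuming (and it is only an assumption) that every character in the captcha is picked uniformly and independently from the full set. Real generators may weight or exclude characters, so treat the numbers as rough.

```python
def chance_char_missing(charset_size, chars_per_captcha, samples):
    """Probability that one specific character never shows up in the samples."""
    draws = chars_per_captcha * samples
    return (1 - 1 / charset_size) ** draws

def expected_missing(charset_size, chars_per_captcha, samples):
    """Expected number of characters from the set that never show up."""
    return charset_size * chance_char_missing(charset_size, chars_per_captcha, samples)

# 70-character set, 4 characters per captcha, at a few sample sizes.
for n in (20, 50, 199):
    print(n, "samples ->", round(expected_missing(70, 4, n), 3), "characters expected unseen")
```

Under these assumptions, a sample of 20 images leaves a large share of the set unseen, while a couple hundred images covers it almost certainly.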
Remember, the character set could be incomplete. I've eliminated some letters in captchas b/c some letters look alike and that really drives users nuts. I guess that again, a large sample size lets you know what you're dealing with.
Captcha Breaker is working on the one uploaded yesterday. So far, the best attempt has yielded a 70.87% success rate, no wait, it's 75.73% already.. now 77.67%!!
Better than zero last nite with the squiggly lines.
*SICK* SDK on CB, Sven! This actually WORKS, and works really well!!!!