@sven - "Not much unless you have a lot of captchas for the captcha type with correct answers and use the SDK with it."
Are you saying that if I get a large database of manually solved captchas, like 1,000 to 2,000, I can just use the SDK to try to find the best solution using brute force?
(it's just you said "unless you have a lot of captchas")
The reason I ask is that I can get a few of my VAs to manually solve many captchas and then run these through the SDK brute force overnight if needed to get a better solve rate.
(What sample size of captchas did you use to create the various definitions in CB?)
A good sample size is required so that you don't create a filter set that only works for the few captchas you added and gives you a false success rate. I guess every captcha type can be improved in one way or another...but only with a good sample size.
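To put a rough number on the "false success rate" point, here is a minimal sketch that computes a Wilson score interval for a measured solve rate. It is not part of CB or the SDK; the function name `wilson_interval` is made up for illustration, and it assumes each captcha is an independent pass/fail trial.

```python
import math

def wilson_interval(solved, total, z=1.96):
    """Approximate 95% Wilson score interval for a solve rate.

    solved: number of captchas answered correctly
    total:  number of captchas tested
    z:      normal quantile (1.96 is roughly 95% confidence)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = solved / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Same measured rate (50%), very different certainty.
for solved, total in [(10, 20), (1000, 2000)]:
    low, high = wilson_interval(solved, total)
    print(f"{solved}/{total}: true rate plausibly between {low:.1%} and {high:.1%}")
```

With only a handful of samples the interval is wide, so a filter set that looks good on them may simply be fitted to those few images.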
as i've done some of the original captcha definitions i might answer that as well.
1) depends on the captcha type and what solving method will be used.
2) depends if the captcha is easy to solve or not. for the easy ones 20 are enough. for the harder ones >100.
but it's not just about brute forcing. often it is recommended to add some filters beforehand and use them with brute force. this needs some experience and understanding of the matter. "brute force" is not a magic button which can turn shit into gold.
if captchas are hard to solve then they are hard to solve. often even for humans.
because of that, just brute forcing it with quadrillions of captchas won't help to get a better solving rate.
if you are lucky then you can improve a 2% captcha to 4% with further processes, for example, unless you have some really good idea how to use the filter settings.
Maybe if there was a manual that came with the product, then people wouldn't have to keep asking how to use it. Sifting through a forum, trying to figure out how to use a program from other people's sufferings and complaints, is no substitute for a manual.
Instead of arrogant and cocky replies, why not make a directory so people can easily find guidance in the forum on how to use stuff?
It seems that we are manually trying different filters and settings but I found that it's really hard to know what is the best order to apply certain filters.
It seems brute force basically does what we do, but much quicker. The issue with it is that it always applies the same order of rules to try and solve it. It also sometimes tries filters that we can see add no value.
Maybe if we could come up with a few ideas on how to improve the brute force feature we could increase solve rates. It's just that it is WAY faster than me at trying the order of filters and different values.
@AlexR: i dunno if that's possible. if i'm not wrong then brute force just takes existing definitions of other captcha types and tests them against the actual captcha type.
so the process of creating and improving captcha definitions is easiest as i've described in the SDK guides.
- create captcha definition
- fine tuning the filters for that definition
- add another process
- fine tuning the filters for that process
- and so on
apart from that it just needs some experience and understanding as i've said above. trial and error is the way to go as there is no general rule for how changing a filter affects the solving rate. so often i thought that my predefined filter must help to improve a captcha, but i didn't get better results with it.
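The add-a-step, test, keep-or-drop routine described above can be sketched as a small greedy loop. This is only an illustration of the workflow, not how the SDK works internally: the filter names, `greedy_build`, and `toy_evaluate` are all hypothetical, and the scores inside `toy_evaluate` are made-up numbers, not measurements. In practice the "evaluate" step would be running the definition against your solved samples in the SDK.

```python
def toy_evaluate(pipeline):
    """Stand-in for 'test the definition on solved samples'.

    Returns a solve rate in whole percent. The numbers are invented
    purely so the example runs; they say nothing about real filters.
    """
    score = 2
    if "remove_lines" in pipeline:
        score += 10
    if ("remove_lines" in pipeline and "threshold" in pipeline
            and pipeline.index("remove_lines") < pipeline.index("threshold")):
        score += 20  # this filter only helps after the lines are gone
    if "blur" in pipeline:
        score -= 5   # a filter that makes things worse
    return score

def greedy_build(candidates, evaluate):
    """Append one filter at a time and keep it only if the score improves."""
    pipeline = []
    best = evaluate(pipeline)
    improved = True
    while improved:
        improved = False
        for name in candidates:
            if name in pipeline:
                continue
            trial = pipeline + [name]
            score = evaluate(trial)
            if score > best:  # keep the change, otherwise discard it
                pipeline, best, improved = trial, score, True
    return pipeline, best

pipeline, rate = greedy_build(["blur", "threshold", "remove_lines"], toy_evaluate)
print(pipeline, f"{rate}%")
```

A loop like this only ever appends steps, so it can miss orderings where an early filter pays off only after a later one is added; that is part of why experience still matters.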
@royalmice: your question to this discussion is exactly what? read the guides about the SDK if you are interested in this. i think they touch everything you need to know about the SDK. maybe Sven can make both guides sticky to this part of the forum.
@Ozz - spent ages in the SDK trying to solve/improve definitions. Just discovered that I can't beat the speed and combinations of the brute force results. That's why I've been trying to brainstorm to focus on this aspect.
How do you know what order is best to apply the filters? It seems some orders work nicely on some definitions, but on others the order is totally different. Are you just trying trial and error?
What about if we could specify a list of filters for the brute force to try (i.e. for certain captchas some filters don't help at all but take forever for it to process)?
As you said "trial and error is the way to go" - surely it would be better if we could automate it somehow as the number of results we could "try and error" would be 1000's of times more!
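As a rough sketch of the idea of handing brute force a restricted filter list and letting it try every order: the example below uses toy text "captchas" and three made-up string filters, purely to show that order matters and that the search can be automated. None of these names exist in CB or the SDK, and real filters work on images, not strings.

```python
from itertools import permutations

# Hypothetical toy filters operating on strings instead of images.
def drop_noise(s):
    return s.replace("#", "")

def zero_to_o(s):
    return s.replace("0", "O")

def drop_digits(s):
    return "".join(c for c in s if not c.isdigit())

def solve_rate(pipeline, samples):
    """Fraction of samples whose filtered text equals the known answer."""
    hits = 0
    for raw, answer in samples:
        out = raw
        for f in pipeline:
            out = f(out)
        hits += (out == answer)
    return hits / len(samples)

def best_order(filters, samples, max_len=None):
    """Try every ordering of every subset (up to max_len) of the given filters."""
    max_len = max_len or len(filters)
    best = ((), solve_rate((), samples))
    for n in range(1, max_len + 1):
        for pipeline in permutations(filters, n):
            rate = solve_rate(pipeline, samples)
            if rate > best[1]:
                best = (pipeline, rate)
    return best

# Manually "solved" toy samples: (raw input, correct answer).
SAMPLES = [("F#0#O7", "FOO"), ("B#4R", "BR"), ("#Z00", "ZOO")]

pipeline, rate = best_order([drop_noise, zero_to_o, drop_digits], SAMPLES)
print([f.__name__ for f in pipeline], rate)
```

The number of orderings grows factorially with the number of filters, so trimming the list to filters that plausibly help is what keeps a search like this practical.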
Just some ideas...when I have a moment, will need to look at it again. Was a month or so ago since I last tried.
The other thing I felt was that maybe we could specify the font of certain captchas so it has a smaller set it's trying to fit the text to. E.g. some of the Mollom captchas use a specific font, as does reCAPTCHA. I think that the OCR code/module (with the font libraries in it) is closely watched by Google and Mollom and they just ensure they use a font that is not in the OCR library. (At least that was my understanding from an article.)
i don't know which order is perfect and as every captcha is different i just use trial and error. delete this and test. add this and test. define some filter that might work well and test and so on. experience combined with trial and error.
this takes time though and often it's just coincidence if things work out as expected (at least for me).
regarding the Mollom fonts i already researched it and couldn't find any good match. there is no good information to be found about that. furthermore i even studied the Tesseract OCR a bit but Sven told me something about a feature he'd like to add to CB so i stopped that, as his solution sounds way better than investing more time in Tesseract.
i guess the main problem with recaptcha is that the fonts are very close to each other, so there will always be a problem for CB to determine how many characters are actually in use.
there might be a solution some day for that but then you don't know how long it will last until google decides to change its algorithm. i stopped wasting my time with recaptcha because of this.
Well despite the disappointing put down @sven, seems this thread has generated some valuable discussion.
Many thanks @royalmice for the support, I'd hate to think as a paying customer (with 3 GSA products) that we were not expected to ask the questions or make the observations that would make these purchases even better value. Asking for a manual to learn how to use the product to its best, which is normally included 99.9% of the time, shouldn't be necessary.
Even more thanks to @Ozz for the links and superb guides. This or an easy to find link (sticky post for example) should be included without asking, that was my point.
Again thanks to everyone for their help here on one of the best forums I've ever been a member of.
Yeah @Mitch, don't take it personal. I've been smacked by @sven, and pretty much everybody has gotten one smack. It always happens when we do or say something kind of dumb...like posting the same question in two or more different threads.
Everyone helps out here. Pretty much every single question gets answered. Just remember that a lot of people are in the Euro timezone, so evening questions for the guys in the states might not get answered until the next day.
Since I joined the forum when it started in August, there has been an update every day. It would be impossible to create a manual as it would be outdated within a week. But I do think a couple new videos would be helpful to all the new faces.
Until that happens, always try to search first, then read as much as you can, and then ask when you hit a wall. We're all here to help.
@Mitch - I created a best practice guide with screenshots and everything. Within 1 month all new settings had been added! It's a big job for someone to keep up a best practice guide with actual settings.
The best solution that I can think of is a live page that lists every feature, what it does, and notes about it. This way, when a new feature is added, it can just be added to the list. You just read through the doc and know what each feature does. It won't show you what combination to use them in, but I suppose everyone uses them differently!
@AlexR that would be fantastic. I understand that we need to develop our own use for the tools but it is incredibly helpful to see what they can do first.
I have read your GSA SER best practice thread and it was a great help. It just shows a complete beginner what can be done and what areas you need to experiment with. I can also appreciate how much work went into putting that together.
I've learned from the forum so many tips so far that had I known earlier would have saved a lot of wasted time and effort (as is the case for everyone I understand) and appreciate that users such as yourself want to contribute to help others climb that learning curve a little bit quicker and with less pain.
So not only is it great to have these incredibly helpful threads, it is also important to have an easy way of finding them. Maybe another section for How To's and Guides @sven?
Thanks again guys, loving the products and loving the community!
@Mitch - Sven wants to focus on programming. If you want something like this, create a wiki or something and list the various features. Then the various users on the forum will list where and where not to use them and what they do. I think within 1 month we'd have all features explained in a central place. Why don't you set up the platform and list the features and we'll start adding definitions to it?
"Sven wants to focus on programming. If you want something like this, create a wiki or something and list the various features."
Ask Sven what he prefers.
Comments
Please stop posting the same again and again.
>What can I do to help improve results?
Not much unless you have a lot of captchas for the captcha type with correct answers and use the SDK with it.
>What are the best settings?
The default ones
>How do I use the SDK?
click the SDK button and play with it ;P
Thanks for the message though, appreciate it.
Best way to set this up?