SER & CS & CB not working together

Comments

  • Ok. 24 hr of CB alone:

    SER says 32420 submitted, 557 verified. 17438 captchas assigned to CB, 2378 to AMB.
    CB says it recognized 11730 of 14938 with avg time of 0.247.

    Am busy right now; I'll total up the results tomorrow and have some comments.

  • Time for a partial summary of 2 of the 4 experimental runs. Before I start let me cover the basics. My SER sits on a VPS with Berman. For the testing I've been running 4 sites with 3 tiers apiece.

    Now let's cover Captcha Breaker vs Captcha Sniper mano a mano:

    CB alone for 24 hrs:

    SER says 32420 submitted, 557 verified.
    17438 captchas assigned to CB, 2378 to AMB.
    CB says it recognized 11730 of 14938 with avg time of 0.247.

    CS alone for 24 hrs:

    30978 submitted, 606 verified.
    From SER's stat bar: 21262 sent to CS, 2266 to AskMeBot
    CS says it solved 10788 with avg time of ~0.538.

    Before I analyze I gotta say I expected CS to win this. I'm not a CS fanboy. My expectation simply derives from CS having been around a lot longer. Captcha solving is a decidedly non-trivial task. You don't see coders hacking around with stuff like this for the sheer fun of it. Experience counts for something.

    A look at the forest says this contest is a tie. The submitted count goes to CB by about 5%. The verified count goes to CS by about 9%. I consider both to be trivial differences that could easily be reversed were I to run another test. For me a tie means CB is the effective winner. CB is the new kid on the block, and getting a tie against more experienced opposition suggests that in the future it will probably overtake CS or at least stay even.
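
    For anyone who wants to check the arithmetic, here's a quick Python sketch run over the numbers reported above. One caveat: the solve-rate denominators come from different sources (CB reports its own "recognized X of Y" count while CS gives no denominator, so I fall back on SER's "sent to CS" figure), so treat that comparison with the same grain of salt as the solve times.

        # Sanity check of the headline numbers reported above.
        cb_submitted, cb_verified = 32420, 557
        cs_submitted, cs_verified = 30978, 606

        sub_edge = (cb_submitted - cs_submitted) / cs_submitted * 100
        ver_edge = (cs_verified - cb_verified) / cb_verified * 100
        print(f"CB leads submissions by {sub_edge:.1f}%")   # ~4.7%
        print(f"CS leads verified by {ver_edge:.1f}%")      # ~8.8%

        # Self-reported solve rates; denominators are NOT from the same source
        cb_rate = 11730 / 14938 * 100   # CB's own count -> ~78.5%
        cs_rate = 10788 / 21262 * 100   # CS solves / SER's "sent to CS" -> ~50.7%
        print(f"CB solve rate ~{cb_rate:.1f}%, CS solve rate ~{cs_rate:.1f}%")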

    My brother-in-law sometimes calls me Mr Qualifier, because he says I can't answer even the simplest of questions in under 35 words. Let's qualify these results a bit. There are a large number of captcha types. I'm sure that on some types CS is better than CB and vice versa. Test results are in some measure dependent on the types of captchas encountered on the platforms and targets. I'd bet that on my first tier my projects don't have much in common with the average SER user's. On the lower tiers my stuff is going to look a lot more like yours. Your mileage may vary.

    Now let's look at the trees.

    The SER stat bar says it assigned nearly 4K more captchas to CS than to CB. I have no idea why this would be. Given the overall results I'm not sure it matters. Maybe Sven can offer a guess on this. Average solve time looks like a big edge to CB. Maybe. It depends on how the solving time is calculated by each program. SER isn't telling us how fast each solver returns an answer; these times come from the solvers themselves. Since the big-picture numbers are similar I'm guessing the solve-time differential isn't really as much as it appears at first glance. Or as my BIL would put it, I'm putting a qualifier on this.

    I had meant to make this exercise a 12 hour test, not 24. I screwed up in the CS testing phase. After testing for 12 hours I didn't take the proper steps in SER to change over to CB and ended up getting another 12 hours of CS. I threw up the numbers for the 2nd 12-hour block and things looked so much better for CS that Sven suggested (half-jokingly I think) that Mikey had contacted me to help goose things along.

    That didn't happen. Mikey has spent some time with me personally and I'm very appreciative. Still, the improvement was stark enough to take a look. In the first 12 hours CS racked up 12055/227; in the 2nd 12 the numbers were 18923/379. Big improvement: submissions up about 57% and verified up about 67%.

    I've been stopping SER, resetting its stats and futzing with the captcha settings every 12 hours. When I started it was on a schedule of 10:15am EST to 10:15pm EST. Now, with the time spent futzing at the breaks, I'm up to about 11:30 as the break point. What I realized is that for EACH solver the prime time was the daytime block, from the AM start to the PM break. Overnight numbers are relatively weak.

    This result may be *far* more important than which solver is used. It suggests that perhaps - I qualify everything don't I?! - the VPS's CPU should be dedicated to other tasks at some point in the overnight hours. If I'm seeing results this strong I'll bet you will too. Implications? I'm using SER to scrape for new targets. Maybe what I should be doing is turning Scrapebox loose on the search engines to find new targets during those hours when SER is less productive. If SER is producing a third fewer submissions during a large 12 hour block, it stands to reason that the performance hit is even more severe in some subsection of that block. That would be the time to run SB to get fresh targets. With the target scraping done for it, SER would be even more productive during the fertile hours of the day.
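
    If you wanted to automate that day/night split, here's a toy Python sketch of the idea. The start_posting()/start_scraping() helpers are hypothetical placeholders - neither SER nor Scrapebox exposes an API that I know of - so in practice you'd wire this logic into the Windows Task Scheduler or a macro tool.

        # Toy sketch: post during prime hours, scrape fresh targets overnight.
        # start_posting()/start_scraping() are placeholders, not real APIs.
        import datetime
        import time

        PRIME_START = 10   # ~10am local: submissions run hot
        PRIME_END = 22     # ~10pm local: overnight output drops off

        def start_posting():
            print("prime time: give SER the CPU for posting")

        def start_scraping():
            print("off hours: let Scrapebox hit the engines for fresh targets")

        while True:
            hour = datetime.datetime.now().hour
            if PRIME_START <= hour < PRIME_END:
                start_posting()
            else:
                start_scraping()
            time.sleep(15 * 60)   # re-check every 15 minutes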

    I haven't even considered the two-tier solver testing. That I should be ready to do tomorrow.
  • Ozz
    edited March 2013
    very nice. thanks for all the effort you put into this.

    one thing i've noticed in your conclusion is that you take "verified links" as a factor. did you set up dummy projects for this or have you tested all captcha tool possibilities within one project? if you did it within one project then i don't think it is possible to know which verified link is from which testing period. there is also some luck involved in how many of your submitted links get verified (and approved by moderation).
    furthermore it obviously depends on some other factors like "what did SER do at this time". did it post more to the easy targets (guestbooks, blog comments for example) or did it post to the more difficult ones like articles or social networks (in terms of target urls to post to, which captcha types are in use, etc.).

    in conclusion i would say that a bulletproof test would have to be done with the same url list for each run, with a typical variety of target urls to post to. as this is very hard to accomplish i think your test was absolutely fine under the circumstances, but you need to keep all its flaws in mind.

    your test also shows that CB is faster than CS. this means the CPU won't be occupied as long, which frees up time for other tasks on your machine.
  • Well, verified links are after all the ultimate goal. So taking them into account makes sense. If they weren't important I suppose Sven wouldn't report them.

    Yes it would be nice to segregate which verified links came from which submissions. But that isn't possible unless I run dummy projects. I'm not doing that. I don't have the resources to spare. Considering how close the two tests are in overall submissions do you really think there is going to be a big difference?

    I posted to the same platforms on the same projects in each test. I didn't post to *all* platforms. I think I covered the ground adequately in my post.

    Flaws? My post had nothing *but* qualifications in it.

    As I mentioned, I take the average solve times with a grain of salt. You don't know what the source looks like in either solver. They may not be as comparable as you think. For example, suppose CS starts the clock at the moment of receipt from SER while CB reports solve time measured after type classification. There are a bunch of different ways to code for average time. And given the multi-threaded, multi-step nature of the posting process it may not matter much as long as one tool is not drastically slower. If CB's apparent speed advantage were all that important there would be a bigger edge than 5% in submissions.
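
    To make that concrete, here's a little Python sketch of two plausible ways a solver *could* stamp its average time. Purely illustrative - I have no idea which definition, if either, CB or CS actually uses, and the sleep times are made up.

        # Illustration only: two plausible definitions of "avg solve time".
        import time

        def classify(image):
            time.sleep(0.15)   # stand-in for identifying the captcha type

        def solve(image):
            time.sleep(0.05)   # stand-in for the actual OCR work
            return "abc123"

        def avg_time_from_receipt(images):
            # clock starts the moment the captcha arrives from SER
            total = 0.0
            for img in images:
                t0 = time.time()
                classify(img)
                solve(img)
                total += time.time() - t0
            return total / len(images)

        def avg_time_after_classification(images):
            # clock starts only once the type has been identified
            total = 0.0
            for img in images:
                classify(img)          # this time is never counted
                t0 = time.time()
                solve(img)
                total += time.time() - t0
            return total / len(images)

        imgs = [b"fake captcha"] * 10
        print(avg_time_from_receipt(imgs))           # ~0.20s per captcha
        print(avg_time_after_classification(imgs))   # ~0.05s per captcha

    Same inputs, same work, but the second definition looks four times faster on paper.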

    An improved test might supply SER with target URLs so that scraping doesn't need to be done. If CB is truly faster and appreciably so you'd see a more substantial difference in submission rate then.
  • Sven www.GSA-Online.de

    Thanks a lot for this very good test. My comment about Mikey helping you was of course a joke, don't take that seriously ;) The things @Ozz mentioned are true though.

    When using SER you never know which engine it takes to build links, though I'd say 12h or 24h is enough to have the same targets for all tests. And yes, there are some captcha types that CS solves with a better success rate, but there are way more in CB that get solved better since it uses some algorithms not based on OCR (see all the 100% captcha types).

  • Ozz
    edited March 2013
    of course verified links are important, but not for testing captchas. how do you know which links were verified from which submission/testing period? the verification process could take up to 5 days (typically).

    furthermore the number of verifications is just a result of how many submissions you could make. short term there is too much luck involved in whether you build links to auto-approve sites or moderated sites, for example. because of this the "verification" number doesn't tell you much, and captchas aren't needed to get a verification. captchas are needed for submission, and that is the number you have to keep your eyes on.

    regarding the "flaws", i just meant that your test gave you a very good idea how things might turn out, but as i said many times there is also much luck involved in what SER did in your testing period (which keywords+footprints for finding target sites, which engines were used, etc.). the luck factor can only be minimized with fixed lists or really long-term testing. both things are not easy to accomplish, so your testing method is the best for short term testing if you keep the flaws of that method in mind.

    regarding your "avg. time": you are obviously right that we don't know exactly how the devs defined it. on the other hand the avg. time should just be the span between when the captcha was put into the solver and when the solver sent it back. i believe both tools handle that the same way, as anything else would make little sense to me (i might be wrong about this though??!).
  • Ozz: We can debate how avg time should be calculated. But without code or a set standard it's pointless. We have to go by the results we get.

    re flaws: what got my goat was that I covered a number of qualifications in my post yet you repeated much of the same material as if I'd been a biased observer and hadn't covered any.

    24 hours is a decent length of time. Of course longer would be better. But pollsters can't sample every person and we can't run tests that last forever. It seems like you aren't happy with the results and therefore want to use the length to criticize them.

    I think Sven is probably happy with this. A rough parity means the two products need to be separated on other considerations. Like support, and most importantly future *expectations* of support. Given the experience of an entire portfolio of products, a support forum like this one, a functional self-updating facility and a demonstrated update schedule I'd say Sven's value proposition is not bad even if he's a bit more expensive up front.

    I do agree that testing with pre-scraped lists is an avenue for investigation.
  • no, i think you've misunderstood me or i'm using the wrong words to express my thoughts (english is not my first language).

    i didn't want to criticize your test. the opposite is true, as the effort you put in should be honored. i just wanted to mention that this test has its "weak spots". that's nothing i can blame you for as it's the "nature of the beast", and your method is the best for anyone who has to draw conclusions with limited time to spare for such tests (= almost anyone).

    i'm glad you showed everyone how to make such a test. whether i'm happy or not with your results doesn't matter (why would you think that?). the important thing is that you (and anyone else) make the right decision for your own personal needs based on the results of such tests.
  • AlexR Cape Town
    Thanks for sharing these results. This thread has been a very insightful read!

    I suppose a good way to test would be as follows:
    1) Rent two cheap VPSes.
    2) Have one set up with SER & CS, and the other set up with SER & CB.
    3) Create identical projects with the same settings and content. Just vary the domain.
    4) Let it run for 10 days on autopilot (as that's how SER is meant to be run).

    Then:
    1) Note stats for analysis.
    2) Note crashes/downtime. (this is a factor too...since if CS keeps hanging up the machine, it has a big impact)

    Maybe something to do when I have a little more spare time! :-)