
How can I segment?

1. What is the difference between Segment and SegmentV? I tried both, but I don't understand it.
2. I found another type of UCenter captcha whose characters have fixed positions on the X-axis. Segment and SegmentV don't work on it. How can I segment it?

Comments

  • Ozz
    edited August 2013
    2. To use the Segment filter, make sure to clean that captcha first. Some filters you should consider:
    - negate the captcha
    - shave the top and bottom
    - remove dots/objects and maximum filter
    - scale it to a larger dimension

    I'm not 100% positive that you can get a really clean image. Furthermore, the segment filter might not work as expected for every captcha, because you can't get rid of every dot in the image and some characters are too close to each other.
    However, once you have added some filters and the image looks somewhat clean, do a brute force (with the defined filters!) to see what you get. I got 55% with the usual OCR method on 20 captchas after the first run. I'm sure I can optimize it further with the "Test focused Parameter" feature (right-click a filter) for filters like scale or remove objects, for example.
  • Yes. I have done exactly what you listed and get about a 35% success rate. When I checked why the success rate is lower than I expected, I found that CB does not segment the image correctly. Furthermore, I found that the chars have fixed positions on the X-axis. So I think this is a way to improve it, but I can't get it to segment correctly.
    BTW, what is the difference between Segment and SegmentV?
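
If the characters really do sit at fixed X positions, one generic way to exploit that (outside of Captcha Breaker's own filters) is to cut every image at the same column ranges. The sketch below is only an illustration under that assumption: `column_profile`, `ranges_from_profile` and `slice_fixed_columns` are hypothetical helper names, images are plain lists of rows with 1 for ink and 0 for background, and nothing here reflects how CB works internally.

```python
def column_profile(samples):
    """Sum the ink (1-pixels) per column over all sample images of equal size."""
    width = len(samples[0][0])
    return [sum(row[x] for img in samples for row in img) for x in range(width)]


def ranges_from_profile(profile, min_ink=1):
    """Return inclusive (left, right) column ranges where the profile reaches min_ink."""
    ranges, start = [], None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x                      # a character column run begins
        elif ink < min_ink and start is not None:
            ranges.append((start, x - 1))  # the run ended on the previous column
            start = None
    if start is not None:
        ranges.append((start, len(profile) - 1))
    return ranges


def slice_fixed_columns(img, column_ranges):
    """Cut one image into a sub-image per inclusive (left, right) column range."""
    return [[row[left:right + 1] for row in img] for left, right in column_ranges]


# Two tiny made-up samples whose "characters" occupy columns 1-2 and 5-6.
sample_a = [[0, 1, 1, 0, 0, 1, 1, 0],
            [0, 1, 0, 0, 0, 0, 1, 0],
            [0, 1, 1, 0, 0, 1, 1, 0]]
sample_b = [[0, 1, 1, 0, 0, 1, 0, 0],
            [0, 1, 1, 0, 0, 1, 1, 0],
            [0, 0, 1, 0, 0, 1, 0, 0]]

fixed_ranges = ranges_from_profile(column_profile([sample_a, sample_b]))
pieces = slice_fixed_columns(sample_a, fixed_ranges)
```

Raising `min_ink` would let the ranges survive a few stray noise pixels in the gaps, at the cost of possibly trimming thin character edges.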
  • Sven www.GSA-Online.de

    -segment : requires a b/w image and will segment each black area it finds. If there are more objects in it than the max length of the result, it will remove the smallest parts.

    -segmentV : requires a b/w image and will cut the chars on its own. The chars do not all have to be surrounded by white; they can be joined. It will try to find the best position to cut if required. This is usually way faster than -segment and should be used when possible.
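
To make the difference concrete, here is a rough sketch of the two general ideas described above: splitting by connected black areas versus cutting at the columns with the least ink. This is not Captcha Breaker's actual code; the function names, the 4-connectivity, and the "search near evenly spaced cut points" heuristic are all assumptions chosen for illustration. Images are lists of rows with 1 for black and 0 for white.

```python
from collections import deque


def connected_components(img):
    """Return the black blobs of img as lists of (row, col), using 4-connectivity."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def segment_by_objects(img, max_chars):
    """Keep the max_chars largest blobs; return their (left, right) columns, left to right."""
    comps = sorted(connected_components(img), key=len, reverse=True)[:max_chars]
    return sorted((min(x for _, x in c), max(x for _, x in c)) for c in comps)


def segment_by_columns(img, n_chars):
    """Cut into n_chars pieces at the least-inked column near each evenly spaced boundary."""
    width = len(img[0])
    ink = [sum(row[x] for row in img) for x in range(width)]
    inked = [x for x in range(width) if ink[x] > 0]
    if not inked:
        return []
    left, right = inked[0], inked[-1]
    step = (right - left + 1) / n_chars
    slack = max(1, int(step // 4))          # how far a cut may drift from its ideal column
    cuts = [left]
    for i in range(1, n_chars):
        ideal = left + int(i * step)
        lo = max(cuts[-1] + 1, ideal - slack)
        hi = max(lo, min(right, ideal + slack))
        cuts.append(min(range(lo, hi + 1), key=lambda x: (ink[x], abs(x - ideal))))
    cuts.append(right + 1)
    return [(cuts[i], cuts[i + 1] - 1) for i in range(n_chars)]


# Two 3-pixel-wide "characters" joined by a thin bridge in the middle row.
demo = [[0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]]

by_objects = segment_by_objects(demo, 2)   # the bridge makes this a single blob
by_columns = segment_by_columns(demo, 2)   # the cut lands on the thin bridge
```

On this toy image the blob-based approach cannot separate the joined characters, while the column-based one can, which is the situation where a cutting filter would be expected to help.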

  • Ozz
    edited August 2013
    I have it at 70% with a small sample size of 20.
  • @Ozz, I used a sample of 200 images, and the success rate of your gcb is 48%.
    I will try OCR now :)
    Thanks.
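
The gap between 70% on 20 images and 48% on 200 images is less surprising once sample size is taken into account. As a rough illustration (not something from the thread), the Wilson score interval below estimates how uncertain each figure is; the success counts 14 of 20 and 96 of 200 are simply the quoted percentages converted to counts, and `wilson_interval` is a name chosen here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a success rate of successes/n."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


small = wilson_interval(14, 20)    # 70% on 20 captchas
large = wilson_interval(96, 200)   # 48% on 200 captchas
print("20 samples:  %.1f%% to %.1f%%" % (small[0] * 100, small[1] * 100))
print("200 samples: %.1f%% to %.1f%%" % (large[0] * 100, large[1] * 100))
```

The interval for 20 samples spans roughly 48% to 85%, so it overlaps the much narrower interval for 200 samples; a 20-image test cannot distinguish the two results reliably.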
  • Sven www.GSA-Online.de
    @bluescharp why not share your set so we can all improve things?
  • edited August 2013
    @Sven, I got only a 35% success rate (using mask), much lower than @Ozz :P :P
    These are my settings:
    -negate
    -remove-objects 20
    -shave-top 3
    -shave-bottom 3
    -scale 500%
    -segmentV 0


  • Ozz
    edited August 2013
    can you share the "characters" which were used for those 200 captchas?
    i think my solving rate is lower in reality, because i suppose that not every character was used when creating that definition, as i only tested it with 20 samples.
  • Sven www.GSA-Online.de
    By "set" I meant the images ;)
  • Ozz
    edited August 2013
    never mind, i just added 40 captchas. so with 60 captchas i got a solving rate of 65% with all the characters that are used (3-4 were missing before the correction).


    these are the filters i used:
    -negate
    -shave-top 3
    -shave-bottom 4
    -remove-objects 20
    -scale 215.7%
    -max 1
    -segment 0
    -scale 200.0%
    -blur 0x1
    -unsharp 0x5
    -unsharp 0x5
    -unsharp 0x5
    -threshold 50.6%
    -segment 8

    the filters i bolded were defined by hand before brute forcing (scale was at 200% though, IIRC). after the brute force i fine-tuned with the "Test focused Parameter" feature.

    "Use Mask" won't work that well on this one, because the characters can't be segmented cleanly from each other when they are very close together.