How it's hangin'

Period ending 12-3-97

This week saw some mild ups and downs. Following up on our success with thinning an image, we developed a standard format for our images, which is as follows: we take the black-on-white greyscale jpeg, convert it to binary (only black or white), thin it to a 1-pixel width, crop it as tight as we can, resize it to 25x25 pixels, and then thicken the number a bit. All of this is encapsulated in a function called jpg2std.m.
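We don't reproduce the MATLAB here, but the pipeline can be sketched in Python/NumPy. Everything below is an assumption about what jpg2std.m does: the threshold of 128 is a guess, the thinning step is omitted (a real version would skeletonize the stroke to 1-pixel width before cropping), and the thickening is a simple 4-neighbor dilation.

```python
import numpy as np

def jpg2std(gray, out=25, thresh=128):
    # Binarize: ink (dark) -> 1, paper (light) -> 0. Threshold is a guess.
    img = (np.asarray(gray) < thresh).astype(np.uint8)
    # (A real version would thin the stroke to 1-pixel width here.)
    # Crop as tight as we can around the ink.
    rows, cols = img.any(axis=1), img.any(axis=0)
    img = img[rows.argmax():len(rows) - rows[::-1].argmax(),
              cols.argmax():len(cols) - cols[::-1].argmax()]
    # Nearest-neighbor resize to out x out.
    r = np.arange(out) * img.shape[0] // out
    c = np.arange(out) * img.shape[1] // out
    img = img[np.ix_(r, c)]
    # Thicken the number a bit: dilate ink into its 4-neighbors.
    thick = img.copy()
    thick[1:, :] |= img[:-1, :]; thick[:-1, :] |= img[1:, :]
    thick[:, 1:] |= img[:, :-1]; thick[:, :-1] |= img[:, 1:]
    return thick
```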

Once we have this, we run a function called graygrid.m, which simply divides the standard image into a 5x5 grid (each grid point covering 5x5 pixels) and averages the pixel values within each grid point, returning a value between 0 and 255 for each. This gives us a general idea of what the number looks like. Since the number was thickened, certain points in the 5x5 grid often come out all-white or all-black. Also, since this 5x5 grid is a very general look at the number, it has the potential to gloss over the variations present in fonts.
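The block-averaging step is a one-liner in NumPy; this is a sketch of what graygrid.m presumably computes, assuming the standardized image stores white as 255 and black as 0:

```python
import numpy as np

def graygrid(std_img):
    # Average each 5x5 block of the 25x25 standardized image,
    # giving one gray value (0-255) per grid point.
    a = np.asarray(std_img, dtype=float)
    return a.reshape(5, 5, 5, 5).mean(axis=(1, 3))
```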

With this graygrid function, we next proceeded to come up with a standard graygrid matrix for each digit 0 through 9, which we did using the 10 sample images we already had (5 Courier and 5 Times). Once we had these average digits, it was easy to write a function, evalimg.m, which sums the squared differences (the variance) between a given image's graygrid and each of the average digits, returning the digit that has the least variance, which is hopefully the digit in the image we are testing.
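The matching step amounts to a nearest-template search. Here is a hedged sketch (not the actual evalimg.m), where `templates` maps each digit to its average 5x5 graygrid matrix:

```python
import numpy as np

def evalimg(grid, templates):
    # templates: digit -> average 5x5 graygrid matrix for that digit.
    # Return the digit whose template has the smallest summed squared
    # difference (the report's "variance") from the test grid.
    grid = np.asarray(grid, dtype=float)
    return min(templates, key=lambda d: ((grid - templates[d]) ** 2).sum())
```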

[Figures: the original image; the image standardized with jpg2std; the image after graygrid]

Well, we tested this function on all the images we had used to compile the average digits, and of course they all passed very well. But, as the sage lyricist of Poison reminds us, every rose has its thorn, just like every cowboy sings a sad, sad song.

Trying to press our function to the extreme, we tested it on some handwritten digits, and it correctly identified only 7 out of 20. Not too happy. Well, one thing it was getting wrong was that handwritten 1's were so skinny that when they got resized to 25x25, they looked more like big rectangles than lines. Thus, we made jpg2std recognize when an image's aspect ratio isn't normal (which should only happen with handwritten 1's) and resize it more properly, adding some white space on either side. With this change, our function correctly identified the handwritten 1's, batting 9 out of 20.

Well, never content to sleep, we scanned in a lot more fonts to see how our function did on other typewritten digits. It performed okay, getting all 10 digits right on some fonts and missing quite a few on the funkier ones. This testing was greatly aided by another key function we wrote, parse.m, which grabs the bottom line from a scanned image (that's where the zip code would be) and parses the individual characters into separate images. This will be important when we make our recognition function better, so that we can find the zip code on an envelope and identify its digits.
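One plausible way to do the character-splitting half of that job is to cut at runs of all-blank columns. This is a sketch under that assumption, not the actual parse.m:

```python
import numpy as np

def parse_chars(line_img):
    # line_img: binary image of one line of text, 1 = ink.
    # Split at all-blank columns, yielding one image per character.
    inked = line_img.any(axis=0)
    chars, start = [], None
    for i, c in enumerate(inked):
        if c and start is None:
            start = i                       # a character begins
        elif not c and start is not None:
            chars.append(line_img[:, start:i])  # a character ends
            start = None
    if start is not None:
        chars.append(line_img[:, start:])   # character runs to the edge
    return chars
```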

So where we are now is that we have started looking at which digits the recognition function misses and thinking about why. We noticed a definite correlation between the variances on digits it got right and those on digits it got wrong, and may try to find a dividing point beyond which we don't trust the function's output. Also, we are looking into other features, such as loop counters, line counters, and more, that would distinguish further between numbers such as 7, 1, and 2, or 8 and 5, in ways that the current one does not. These could be used as further criteria and added to our function's decision process. Finally, we need to look into updating our average numbers to reflect the wider range of fonts we're currently looking at.
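The "dividing point" idea amounts to a rejection threshold on the best-match variance. A hypothetical sketch (the cutoff value and the name are ours, not part of the project yet):

```python
import numpy as np

def evalimg_reject(grid, templates, cutoff):
    # Hypothetical extension of the matcher: if even the best match's
    # summed squared difference exceeds the dividing point, return
    # None instead of a guess we shouldn't trust.
    grid = np.asarray(grid, dtype=float)
    scores = {d: ((grid - t) ** 2).sum() for d, t in templates.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= cutoff else None
```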

Now that's a mouthful!
