This week saw some mild ups and downs. Following up on our success with
thinning an image, we developed a standard format for our images, which is
as follows: we take the black-on-white grayscale JPEG, convert it to
binary (only black or white), thin it to a 1-pixel width, crop it as
tightly as we can, resize it to 25x25 pixels, and then thicken the number
a bit.
All of this is encapsulated in a function called jpg2std.m.
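In rough MATLAB, the pipeline looks something like this (a sketch only;
the exact threshold and the amount of thickening are assumptions, not a
transcript of the real jpg2std.m):

    function S = jpg2std(filename)
    % JPG2STD  Standardize a black-on-white grayscale JPEG:
    % binarize, thin, crop tight, resize to 25x25, thicken.
        I  = imread(filename);
        bw = ~im2bw(I, graythresh(I));     % binarize; digit pixels become 1s
        bw = bwmorph(bw, 'thin', Inf);     % thin strokes to 1-pixel width
        [r, c] = find(bw);                 % tightest box around the ink
        bw = bw(min(r):max(r), min(c):max(c));
        bw = imresize(double(bw), [25 25]) > 0.5;  % scale to 25x25
        bw = bwmorph(bw, 'thicken', 1);    % thicken the number a bit
        S  = ~bw;                          % back to black-on-white
    end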
Once we have this, we run a function called graygrid.m, which simply
divides the standard image into a 5x5 grid (each cell being, obviously,
5x5 pixels) and averages the values in each cell, returning a 5x5 matrix
of values between 0 and 255. This gives us a general idea of what the
number looks like. Since the number was thickened, certain cells of the
5x5 grid often come out all-white or all-black. Also, since the 5x5 grid
is such a coarse view of the number, it should gloss over much of the
variation between fonts.
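In sketch form, graygrid.m is little more than a pair of loops (the
scaling of the binary image onto the 0-255 range is our assumption about
the convention):

    function G = graygrid(S)
    % GRAYGRID  Average each 5x5 cell of the 25x25 standard image,
    % giving a 5x5 matrix of values between 0 and 255.
        I = 255 * double(S);               % binary image on a 0-255 scale
        G = zeros(5, 5);
        for i = 1:5
            for j = 1:5
                cell_ij = I(5*i-4:5*i, 5*j-4:5*j);   % one 5x5 cell
                G(i, j) = mean(cell_ij(:));
            end
        end
    end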
With this graygrid function, we next proceeded to come up with a standard
graygrid matrix for each digit 0 through 9, which we did using the 10
sample images we already had (5 Courier and 5 Times). Once we had these
average digits, it was easy to write a function, evalimg.m, which sums
the squared differences (what we call the variance) between a given
image's graygrid and each of the average digit matrices, returning the
digit that has the least variance, which is hopefully the digit in the
image we are testing.
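A sketch of the idea (assuming the ten averages are stacked in a 5x5x10
array, each slice being the mean of the sample graygrids for that digit;
the storage layout is our assumption, not necessarily how it's done):

    function d = evalimg(S, avgs)
    % EVALIMG  Return the digit whose average graygrid is closest to
    % the image's graygrid, by summed squared difference ("variance").
    % avgs(:,:,k) is assumed to hold the average graygrid for digit k-1.
        G = graygrid(S);
        v = zeros(1, 10);
        for k = 1:10
            D    = G - avgs(:, :, k);
            v(k) = sum(D(:).^2);           % variance against digit k-1
        end
        [vmin, best] = min(v);             % smallest variance wins
        d = best - 1;
    end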
[Images: the original image; the same image standardized with jpg2std; the image after graygrid]
Well, we tested this function on all the images we had used to compile
the average digits, and of course they all passed very well. But, as the
sage lyricist of Poison reminds us, every rose has its thorn, just like
every cowboy sings a sad, sad song.
Trying to press our function to the extreme, we tested it on some
handwritten digits, and it only correctly identified 7 out of 20. Not too
happy. Well, one thing it was getting wrong was that handwritten 1's were
so skinny that when they got resized to 25x25, they looked more like big
rectangles than lines. Thus, we made jpg2std recognize when an image's
aspect ratio is abnormal (which should only happen with handwritten 1's)
and resize it more sensibly, preserving the proportions and adding some
black space on either side. With this change, our function correctly
identified the handwritten 1's, batting 9 out of 20.
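The fix amounts to replacing the one-size-fits-all resize step in
jpg2std with something like this sketch (the 3:1 cutoff for "abnormal"
and the padding details are assumed values, not the real code):

    % Replacement for the resize step (bw has digit pixels as 1s):
    [h, w] = size(bw);
    if h > 3*w                             % abnormally skinny: a handwritten 1
        bw   = imresize(double(bw), [25, max(1, round(25*w/h))]) > 0.5;
        pad  = 25 - size(bw, 2);
        left = floor(pad/2);
        bw   = [false(25, left), bw, false(25, pad-left)];  % pad both sides
    else
        bw   = imresize(double(bw), [25 25]) > 0.5;
    end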
Well, never content to sleep, we scanned in a lot more fonts, to see how
our function did on other typewritten digits. It performed okay, getting
all 10 digits right on some fonts, and on the funkier ones, missing quite a
few. This testing was greatly eased by another key function we wrote,
parse.m, which grabs the bottom line from a scanned image (that's where
the zip code would be) and parses the individual characters into
separate images. This will be important when we make our recognition
function better, so that we can find the zip code on an envelope and
identify its digits.
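A sketch of the idea behind parse.m, using row and column ink
projections (the real function may differ in details, such as how it
finds the gaps between text lines):

    function chars = parse(img)
    % PARSE  Grab the bottom text line from a scanned image and split
    % it into one image per character.
        bw = ~im2bw(img, graythresh(img));     % ink pixels as 1s
        rows  = find(sum(bw, 2) > 0);          % rows that contain ink
        jumps = find(diff(rows) > 1);          % blank gaps between lines
        if isempty(jumps)
            bottom = bw(rows(1):rows(end), :); % only one line on the page
        else
            bottom = bw(rows(jumps(end)+1):rows(end), :);  % bottom line
        end
        inked  = sum(bottom, 1) > 0;           % columns that contain ink
        edges  = diff([0, inked, 0]);          % +1 at starts, -1 past ends
        starts = find(edges == 1);
        stops  = find(edges == -1) - 1;
        chars  = cell(1, numel(starts));
        for k = 1:numel(starts)
            chars{k} = ~bottom(:, starts(k):stops(k));  % black-on-white again
        end
    end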
So where we are now is that we have started looking at what digits the
recognition function is missing and thinking about why it does so. We
noticed a definite gap between the variances on digits it got right and
those on digits it got wrong, and may try to find a dividing point above
which we don't trust the function's output. Also, we are looking into
other functions, such as loop counters, line counters, and more, that
would distinguish further between numbers such as 7, 1, and 2, or 8 and
5 in ways that the current one does not. These could be used as further
criteria and added to our function's decision process (a sketch of one
such counter appears below). Finally, we need to look
into updating our average numbers to reflect the wider range of fonts we're
currently looking at.
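As an example of what one of these extra criteria might look like, here
is a sketch of a loop counter built on the Image Processing Toolbox's
Euler number (countloops is a hypothetical name, and it assumes the
digit pixels are 1s and form a single connected piece):

    function n = countloops(bw)
    % COUNTLOOPS  Count the enclosed loops (holes) in a binary digit.
    % bweuler returns (number of objects) - (number of holes), so a
    % single object with n holes has Euler number 1 - n.
        n = 1 - bweuler(bw);
    end

An 8 should give 2, a 0 or 6 should give 1, and a 5 or 7 should give 0,
which is exactly the kind of split our variance measure misses.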