Step 2 - Standardize Image Size and Thickness
Standardizing each image is essential for making a correct identification.
Each image comes directly from the parsing step and can vary widely in
size and stroke thickness. This part of our algorithm compensates for
variations in the size and thickness of the original image in an attempt
to make each number uniform, independent of the font or handwriting
sample.
Pre-format
The scanned images are typically larger than the standard 25x25 pixel
array. In addition, they contain varying line thicknesses, even within
individual numbers. To begin, we use Matlab's thinning algorithm,
bwmorph(img,'thin',Inf), to thin each line in the image down to 1 or 2
pixels. At this point, all images have the same line thickness.
Figures 1 and 3 show the thinning of a font with uneven thickness; for
comparison, Figures 2 and 4 show a handwritten sample thinned to the same
thickness. Once the images share the same thinned thickness (about 1
pixel), we thicken them again so that they do not fall apart or disappear
during the scale reduction. The number of dilations, each performed with
Matlab's bwmorph(img,'dilate',1), is proportional to the size of the
initial image.
A small image (one with few pixels) is therefore thickened less than an
image with many pixels. This prevents a small image from being dilated so
much that its holes close up or its other properties change.
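The thin-then-dilate step can be sketched as follows. This is a minimal illustration, not the code from parse2std.m: the synthetic bar image and the rule of one dilation per 20 pixels of the larger image dimension are assumptions made for the example, since the exact proportionality constant is not stated here. Note that bwmorph(BW,'dilate',n) repeats the dilation n times, which matches calling bwmorph(img,'dilate',1) n times.

```matlab
% Synthetic 60x40 binary image: a thick vertical bar standing in for a
% scanned "1" with heavy strokes.
img = false(60, 40);
img(5:55, 15:24) = true;

% Thin every stroke down to (roughly) one pixel.
thinned = bwmorph(img, 'thin', Inf);

% Hypothetical rule: one dilation per 20 pixels of the larger image
% dimension, with at least one pass, so larger scans get thickened more.
pixelsPerDilation = 20;   % assumed constant, not from the original code
nDilations = max(1, round(max(size(img)) / pixelsPerDilation));

% Apply the dilations (3x3 neighborhood each pass).
thickened = bwmorph(thinned, 'dilate', nDilations);
```

For this 60x40 input the rule yields three dilation passes; a 20x20 input would get only one, which is the "small images are thickened less" behavior described above.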
Aspect Ratio Fix
Handwritten 1s can cause problems during cropping and resizing if they are
not written clearly. If a 1 is slanted, then after resizing and cropping it
will appear as a diagonal line running from one corner to the opposite
corner rather than as a vertical line. To correct for this, we perform an
aspect ratio test. The height-to-width ratio of the original scanned image
of a 1 is typically much larger than that of any other number. This lets us
single out such an image and pad it with blank space so that the 1 does not
stretch across the entire image.
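A minimal sketch of the aspect ratio fix, assuming a hypothetical threshold of 3 on the height-to-width ratio and symmetric padding with blank columns until the image is square, so that the later 25x25 resize preserves the stroke's proportions. Neither the threshold nor the square target comes from the original code; both are illustrative choices.

```matlab
% Synthetic tall, narrow image standing in for a scanned "1".
img = false(50, 8);
img(3:48, 3:6) = true;

aspectThreshold = 3;      % hypothetical value, not from parse2std.m
[h, w] = size(img);
if h / w > aspectThreshold
    % Pad left and right with blank (false) columns until the image is
    % square, so resizing to 25x25 does not stretch the 1 sideways.
    padTotal = h - w;
    padLeft  = floor(padTotal / 2);
    padRight = padTotal - padLeft;
    img = [false(h, padLeft), img, false(h, padRight)];
end
```

Plain concatenation is used here to keep the sketch dependency-free; padarray from the Image Processing Toolbox would do the same job.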
Initial Resize and Thin
Now that the image has been thickened by the standard amount, we resize it
to a 25x25 pixel image using Matlab's imresize(img,[25 25],'nearest').
Because resizing can change the line thickness inconsistently across the
image, we thin the image again to remove these inconsistencies. We then crop
the standard-sized, thinned image to remove all blank background around
the number.
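The resize, re-thin, and crop sequence might look like the sketch below. The ring-shaped test image is synthetic, and cropping to the bounding box of the foreground pixels found with find is one straightforward way to remove the blank border; it is assumed here rather than taken from parse2std.m.

```matlab
% Synthetic 100x100 binary image: an off-center ring (a rough "0").
[x, y] = meshgrid(1:100, 1:100);
r = hypot(x - 40, y - 60);
img = r >= 15 & r <= 25;

% Resize to the standard grid; 'nearest' keeps the pixels binary.
small = imresize(img, [25 25], 'nearest');

% Thin again to remove thickness inconsistencies introduced by resizing.
small = bwmorph(small, 'thin', Inf);

% Crop to the bounding box of the remaining foreground pixels.
[rows, cols] = find(small);
cropped = small(min(rows):max(rows), min(cols):max(cols));
```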

Check Aspect Ratio Again
Cropping reintroduces the problem with the 1: a cropped 1 is a tall,
narrow strip, and if we were to restandardize it at this point, it would
often turn into a big block rather than the thin 1 we expect. For this
reason, we check the aspect ratio again and pad the image if its
height-to-width ratio is large.
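To illustrate why this second check matters, the sketch below resizes a synthetic cropped 1 (a solid 40x3 strip) to 25x25 with and without padding. The padding rule used here (blank columns on both sides until the image is square) is an assumption made for this illustration, not necessarily the rule in parse2std.m.

```matlab
% Synthetic cropped "1": a solid, perfectly vertical 40x3 strip.
cropped = true(40, 3);

% Without padding, nearest-neighbor resizing stretches the strip over the
% entire 25x25 grid, producing a solid block.
unpadded = imresize(cropped, [25 25], 'nearest');

% With padding (assumed rule): add blank columns on both sides until the
% image is square, then resize. The stroke stays a narrow vertical line.
[h, w] = size(cropped);
padLeft  = floor((h - w) / 2);
padRight = h - w - padLeft;
padded = [false(h, padLeft), cropped, false(h, padRight)];
padded = imresize(padded, [25 25], 'nearest');

fprintf('foreground pixels without padding: %d\n', nnz(unpadded));
fprintf('foreground pixels with padding:    %d\n', nnz(padded));
```

The unpadded result is entirely foreground, which is the "big block" failure described above.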
Final Resize and Thicken
With the image essentially in its final form, we do the final rescale to
25x25 using the same imresize call and then thicken the image twice so
that its features are easier to analyze. As can be seen, the similarities
between the standardized images are much more apparent than in the
original images of Figures 1 and 2.
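The final step reduces to two calls. In this sketch the input is a synthetic cropped "7"-like shape, and the two thickening passes are assumed to be dilations, consistent with how thickening is done in the pre-format step.

```matlab
% Synthetic cropped image standing in for the output of the earlier steps:
% a "7"-like shape with a top bar and a right-hand vertical stroke.
cropped = false(20, 12);
cropped(1, :) = true;
cropped(:, end) = true;

% Final rescale to the standard 25x25 grid with nearest-neighbor sampling.
resized = imresize(cropped, [25 25], 'nearest');

% Thicken twice (assumed here to be two 3x3 dilation passes).
standardized = bwmorph(resized, 'dilate', 2);
```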

Matlab Code -- parse2std.m