Step 3 - Variance Computation

In order for our algorithms to correctly identify digits, we had to create standard profiles for the digits 0 through 9. While building these profiles, we realized that the typewritten fonts in our large data bank represent the same digit in a wide variety of ways. Our identification process calculates the variance of an image against each standard profile and picks the digit with the lowest variance. Therefore, broadening the standard profile set to include several representations of the same digit increases the probability that the correct digit produces the lowest variance.
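The following minimal Python sketch shows why extra profiles per digit help: each digit can map to several profiles, and an image is assigned to the digit owning the single closest profile. It is not the project's Matlab code. The function names `variance` and `classify` are hypothetical, and it assumes "variance" means the mean squared cell-by-cell difference between an image grid and a profile grid.

```python
# Hypothetical sketch, not the original Matlab implementation.
# Assumption: "variance" = mean squared difference between corresponding cells
# of the image grid and a standard-profile grid of the same size.

Grid = list[list[float]]


def variance(image: Grid, profile: Grid) -> float:
    """Mean squared cell-by-cell difference between two equally sized grids."""
    diffs = [
        (a - b) ** 2
        for img_row, prof_row in zip(image, profile)
        for a, b in zip(img_row, prof_row)
    ]
    return sum(diffs) / len(diffs)


def classify(image: Grid, profiles: dict[int, list[Grid]]) -> int:
    """Return the digit whose closest profile gives the lowest variance."""
    best_digit, best_score = None, float("inf")
    for digit, variants in profiles.items():
        # Every variant of a digit gets a chance to be the closest match.
        for profile in variants:
            score = variance(image, profile)
            if score < best_score:
                best_digit, best_score = digit, score
    return best_digit


# Tiny 2 x 2 grids for illustration only; real profiles would be 5 x 5.
profiles = {
    1: [[[0, 1], [0, 1]]],
    7: [[[1, 1], [0, 1]], [[1, 1], [1, 0]]],  # two representations of seven
}
image = [[1, 1], [0.9, 0.1]]
print(classify(image, profiles))  # → 7
```

Here the image is far from the first seven profile but almost identical to the second, so a profile set with only one representation per digit would match it less well.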

Creation of Standard Profiles

To complete our standard profile set, we relied on the observation that the general characteristics of a digit were often shared among fonts. After identifying these characteristics and the fonts that shared them, we compiled an "average" profile of the digit using our graygrid Matlab function.

Graygrid takes a 25 × 25 pixel image of a digit and averages the pixels within each non-overlapping 5 × 5 pixel block, producing a 5 × 5 grid of gray levels. The resulting 5 × 5 image often did not resemble the original, but averaging the graygrid results for the same digit across several fonts brought the profile closer to the digit's characteristic shape.
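The block averaging that graygrid performs can be sketched in Python as follows. This is a hypothetical analogue, not the project's Matlab function: it assumes the input is a square list of gray levels whose side is divisible by the block size (25 and 5 in the description above).

```python
# Hypothetical Python analogue of the Matlab graygrid function.
# Assumption: input is a square grid of gray levels (e.g. 25 x 25) and each
# output cell is the mean of one non-overlapping block x block area.


def graygrid(image: list[list[float]], block: int = 5) -> list[list[float]]:
    """Downsample a square image by averaging non-overlapping blocks."""
    n = len(image)
    if n % block != 0 or any(len(row) != n for row in image):
        raise ValueError("image must be square with a side divisible by block")
    out_size = n // block
    result = []
    for bi in range(out_size):
        out_row = []
        for bj in range(out_size):
            # Collect every pixel of the block at block-row bi, block-column bj.
            cells = [
                image[i][j]
                for i in range(bi * block, (bi + 1) * block)
                for j in range(bj * block, (bj + 1) * block)
            ]
            out_row.append(sum(cells) / len(cells))
        result.append(out_row)
    return result


# A 25 x 25 image whose top five rows are dark (1) and the rest light (0),
# roughly the top bar of a seven.
bar = [[1] * 25 for _ in range(5)] + [[0] * 25 for _ in range(20)]
print(graygrid(bar)[0])  # → [1.0, 1.0, 1.0, 1.0, 1.0]
```

Because each output cell only keeps a block mean, fine strokes are blurred, which is consistent with the observation that single-font results often did not resemble the original image.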

Example

Let's take the number seven as an example:

Using the parse2std function, we displayed all 35 versions of the number seven in our font data set.

After a lengthy analysis of these results, we concluded that four main characteristics might distinguish one image of a seven from another.

Figure (four panels): overhang; no overhang; tail hits bottom middle; tail hits bottom left corner.

By averaging the graygrid results of the fonts that exhibited these specific characteristics, we ended up with four standard profiles for the number seven.
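One way to build several profiles for a digit is sketched below in Python, under stated assumptions: every font's seven has already been reduced to a grid (for example by graygrid), each font was tagged by hand with the characteristics it exhibits, and the grids sharing a tag are averaged cell by cell. The tag names and the function `profiles_by_characteristic` are hypothetical; a font may carry more than one tag and then contributes to more than one profile.

```python
# Hypothetical sketch: one averaged profile per hand-assigned characteristic.
# Assumption: each sample is (grid, set of characteristic tags), with all grids
# the same size (5 x 5 after graygrid; 2 x 2 below for brevity).
from collections import defaultdict

Grid = list[list[float]]


def average_grids(grids: list[Grid]) -> Grid:
    """Cell-by-cell mean of equally sized grids."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[i][j] for g in grids) / len(grids) for j in range(cols)]
        for i in range(rows)
    ]


def profiles_by_characteristic(
    samples: list[tuple[Grid, set[str]]],
) -> dict[str, Grid]:
    """Average every grid that carries a given tag, one profile per tag."""
    groups: dict[str, list[Grid]] = defaultdict(list)
    for grid, tags in samples:
        for tag in tags:
            groups[tag].append(grid)
    return {tag: average_grids(grids) for tag, grids in groups.items()}


samples = [
    ([[1, 1], [0, 1]], {"overhang", "tail middle"}),
    ([[1, 1], [1, 1]], {"overhang", "tail left"}),
    ([[0, 1], [0, 1]], {"no overhang", "tail middle"}),
]
profiles = profiles_by_characteristic(samples)
print(profiles["overhang"])  # → [[1.0, 1.0], [0.5, 1.0]]
```

The resulting profiles could then all be listed under the digit seven in the profile set used for variance comparison.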

Other Standard Profiles

Figure: standard profiles for the digits 0, 1, 2, 3, 4, 5, 6, 8 and 9.

