COMP 130 Elements of Algorithms and Computation
Spring 2012

Bayesian Inference Example with Digit Recognition

For electronic digit recognition, the first step is to digitize the image.   For our purposes here, we will use a simple black-or-white digitization on an 8x8 grid of pixels.   Note that a real system would probably use a gray-scale digitization, but that simply complicates our analysis here by increasing the number of possible values per pixel from 2 to 256.
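
As a concrete illustration, here is a minimal sketch of such a digitization in Python, assuming the image has already been reduced to an 8x8 grid of gray-scale intensities from 0 (black) to 255 (white); the function name and the threshold of 128 are arbitrary choices for illustration:

    def digitize(gray_grid, threshold=128):
        # Dark pixels (low intensity) become 1, light pixels become 0.
        return [[1 if pixel < threshold else 0 for pixel in row]
                for row in gray_grid]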

Here's a nicely digitized "8":

[figure: a clean 8x8 digitization of an "8"]

But it could be like this:

[figure: a rougher 8x8 digitization of an "8"]

Or this:

[figure: another rough 8x8 digitization of an "8"]

Is this a "5" or a lousy "8"?

[figure: an ambiguous 8x8 pattern that could be a "5" or an "8"]

Let's think in terms of priors, our original notion of the world.   Initially, we believe that the universe of digits is separated into 10 distinct, non-overlapping pieces:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Given someone's badly handwritten numeral, we ask: what is the probability that it is a 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9?    Obviously, we will use the digit with the highest probability as our guess.   But our original universe says that all the digits are equally probable, so the probability for any digit is a) only 10% and b) the same for every digit, and we can't make a decision about which digit the writer scribbled down.   These, unfortunately, are our priors.

How can we increase our odds of detecting the intended digit?

Solution:  

We'll do an experiment.    The experiment is that we will digitize the handwritten numeral into an 8x8 pixel grid.    This sub-divides our universe into 2^64 possible patterns.  We then try to find the likelihoods, i.e. the conditional probabilities.   These conditional probabilities are of the form:

P(pattern | digit) 

This is the probability of any given pattern (one of the 2^64) given that we already know what the intended digit is.    We do this by measuring the probabilities of each pattern in a training set of data where the intended digit is known for every pattern.    Clearly, we need lots of examples in order to get decent statistics here.
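
Here is one minimal sketch of that measurement, assuming the training set is a list of (pattern, digit) pairs in which each pattern is the 8x8 grid flattened into a hashable tuple of 64 zeros and ones; the data format and function name are assumptions for illustration:

    from collections import defaultdict

    def estimate_likelihoods(training_data):
        # Count how often each pattern appears among each digit's examples.
        pattern_counts = defaultdict(lambda: defaultdict(int))  # digit -> pattern -> count
        digit_totals = defaultdict(int)                         # digit -> total examples
        for pattern, digit in training_data:
            pattern_counts[digit][pattern] += 1
            digit_totals[digit] += 1
        # P(pattern | digit) = (count of pattern for that digit)
        #                      / (total examples of that digit)
        return {digit: {pattern: count / digit_totals[digit]
                        for pattern, count in counts.items()}
                for digit, counts in pattern_counts.items()}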

It is very important to realize that the last pattern shown above would probably show up in (at least) two places in our training data:  as a possible pattern for an "8" and as a possible pattern for a "5".     Given how people hand-write, we expect it to be fairly common for the same pattern to show up as a possibility for multiple digits.

But in the end the question we want to answer is: what is the probability that a given pattern is a given digit?   Luckily, Bayes' Theorem tells us that

P(digit | pattern) = P(pattern | digit) * P(digit) / P(pattern)

At the point when we are trying to determine which digit a given hand-written numeral represents, we already know the digitized pattern that we are working with.    The likelihoods, P(pattern | digit), are known for each digit.   

P(digit) is known by analyzing the training data -- don't assume that it is 10%, because the digits may not be evenly distributed in the training data.

P(pattern) is also known: simply count the occurrences of a given pattern across the entire training data set and divide by the total number of examples.  (Equivalently, P(pattern) is the sum of P(pattern | digit) * P(digit) over all ten digits.)
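
Continuing the sketch above (with the same assumed training-data format), the priors and the pattern probabilities come from the same kind of counting:

    from collections import defaultdict

    def estimate_priors(training_data):
        # P(digit) = fraction of training examples labeled with that digit.
        digit_counts = defaultdict(int)
        for _, digit in training_data:
            digit_counts[digit] += 1
        return {digit: count / len(training_data)
                for digit, count in digit_counts.items()}

    def estimate_pattern_probs(training_data):
        # P(pattern) = fraction of training examples showing that pattern,
        # regardless of which digit they were labeled as.
        pattern_counts = defaultdict(int)
        for pattern, _ in training_data:
            pattern_counts[pattern] += 1
        return {pattern: count / len(training_data)
                for pattern, count in pattern_counts.items()}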

Conclusion:

We calculate P(digit | pattern) for every digit given the pattern we digitized from the handwritten numeral.     Which digit do we guess was intended by the writer?   The one with the highest P(digit | pattern) for our pattern, of course!
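
Putting the pieces together, a minimal classifier might look like the sketch below, reusing the illustrative helpers from above.  One useful observation: since P(pattern) is the same for every digit we compare, we can skip dividing by it and simply pick the digit that maximizes P(pattern | digit) * P(digit):

    def classify(pattern, likelihoods, priors):
        # P(digit | pattern) is proportional to P(pattern | digit) * P(digit);
        # the shared denominator P(pattern) never changes which digit wins.
        best_digit, best_score = None, -1.0
        for digit, prior in priors.items():
            score = likelihoods[digit].get(pattern, 0.0) * prior
            if score > best_score:
                best_digit, best_score = digit, score
        return best_digit

    # Example use, given a list of (pattern, digit) training pairs:
    #   likelihoods = estimate_likelihoods(training_data)
    #   priors = estimate_priors(training_data)
    #   guess = classify(new_pattern, likelihoods, priors)

Note that this sketch scores zero for any pattern never seen in training; a real system would need far more data or a smarter representation to handle that.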


For a far more rigorous and detailed treatment of digit recognition, please see the remainder of Prof. Subramanian's Comp140 lectures on the subject:

Comp200 students:  Part 5.

Comp130 students:  Part 5.