Step 1: Parsing the image
About the image
Well, it should be noted first that we assumed certain things about the
scanned image, in order to make the problem focused more on image
recognition, rather than trying to fix everything that could go wrong. We
assumed that the scan was made without extraneous noise in it, at a
resolution that would show enough detail (e.g. numbers were not 2 pixels
tall) and that the zip code was located on the last line, the five
right-most characters. These assumptions are reasonable in that most
people tend to follow them, anyways. Here is an
envelope we used to test, and which we will use to illustrate our function.
Finding the last line
So first thing we do is read in the file using Matlab's built-in
imread function. After converting everything to either black or
white and inverting the image (Matlab considers white data and black
non-data, while humans tend to work the other way), we have a very large
array.
What we do is sum each of the pixel rows, adding up the number of pixels on
each row, essentially. Those rows with a zero sum obviously have no data
in them. We start at the bottom of the image, moving up, skipping over any zero-sum
rows, until we find a row with data in it. Then we move up until we find a
row with no data in it. There are also special cases in our code in case
the bottom line contains data, and so on. Now we have defined the top and
bottom of the last row in the envelope, as such:
We save the highlighted area into a seperate matrix, just to be clever.
Finding the zip code digits
Finding the zip code once we have the last line is remarkably similar - we
sum all the columns together, and search for areas of data bounded by areas
with no data, each time saving the digits we're peeling off into separate
matrices. It's that easy.
Here is how our parse function would parse the above line:
For good measure, here is our Matlab function
which accomplished all of these amazing feats.