Conclusions

Optimal segmentation is achieved only by tuning the filtering for each individual image/text block of a particular picture.  We were able to adjust our threshold values for our first image (Backstreet Boys) to produce an accurate classification of that image.  However, when we classified another image (Britney & Christina) with the same thresholds, the results were not as good.
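
To make the block-wise setup concrete, the sketch below labels fixed-size blocks of a grayscale image using a feature function and a threshold supplied as parameters.  The block size, helper names, and overall structure are our own illustrative choices, not the report's exact implementation; the need to re-tune `threshold` for every image is precisely the limitation noted above.

```python
def classify_blocks(image, feature, threshold, block_size=32):
    """Label each block_size x block_size block of a grayscale image.

    image     : 2-D NumPy array (grayscale).
    feature   : maps a 2-D block to a scalar score, e.g. the high-band
                wavelet variance sketched further below.
    threshold : blocks scoring above it are labeled 'text', the rest 'picture'.
    """
    labels = {}
    rows, cols = image.shape
    for r in range(0, rows - block_size + 1, block_size):
        for c in range(0, cols - block_size + 1, block_size):
            block = image[r:r + block_size, c:c + block_size].astype(float)
            labels[(r, c)] = 'text' if feature(block) > threshold else 'picture'
    return labels
```

Swapping in a different feature or threshold changes the labeling without touching the loop, which is why per-image tuning is easy to do but still manual.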

Through experimentation, we discovered that Haar wavelets are best at detecting text, and Daubechies wavelets are best at detecting pictures.  Knowing this, we could probably get the best classification result by using the Haar transform in areas that are predominantly text, and the Daubechies transform in areas that are predominantly picture.  If we could determine where on a page text or picture content is more likely, we could produce a better classification.
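
One way to realize this hybrid scheme is sketched below, assuming the PyWavelets library for the 2-D transforms.  Choosing 'db4' for the Daubechies filter and treating a stronger Haar detail response as evidence of text are assumptions made only for illustration; the report does not specify either.

```python
import numpy as np
import pywt

def detail_energy(block, wavelet):
    """Sum of squared first-level detail (high-frequency) coefficients."""
    _, (cH, cV, cD) = pywt.dwt2(block, wavelet)
    return float(sum(np.sum(c ** 2) for c in (cH, cV, cD)))

def pick_wavelet(block):
    """Heuristic: use Haar where it responds more strongly (text-like blocks),
    and Daubechies otherwise (picture-like blocks)."""
    haar_energy = detail_energy(block, 'haar')
    db_energy = detail_energy(block, 'db4')
    return 'haar' if haar_energy > db_energy else 'db4'
```

The chosen wavelet could then drive the block classifier, so that text-heavy regions are analyzed with the Haar transform and picture-heavy regions with the Daubechies transform.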

The criterion we used to classify our images was the variance of the wavelet coefficients in the high-frequency bands.  We were able to do this because the histograms of these coefficients for text and for picture differ in distribution.  If we could analyze the distributions with more advanced statistical methods, we would have a more accurate way of distinguishing between text and picture, and of classifying our images.
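
A minimal sketch of that criterion, again assuming PyWavelets: the feature is the variance of the first-level high-frequency (HL, LH, HH) coefficients, and it plugs directly into the block classifier sketched earlier.  The default wavelet and the threshold value are illustrative, not the project's exact settings.

```python
import numpy as np
import pywt

def highband_variance(block, wavelet='haar'):
    """Variance of the first-level high-frequency coefficients of a block.

    Text blocks tend to produce a wide, heavy-tailed coefficient histogram
    (high variance); smooth picture regions produce a narrow one (low variance).
    """
    _, (cH, cV, cD) = pywt.dwt2(block, wavelet)
    coeffs = np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()])
    return float(np.var(coeffs))

# Example use with the block classifier above (threshold is hypothetical):
# labels = classify_blocks(image, highband_variance, threshold=40.0)
```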

Image segmentation is a good foundation for computer-generated caricature development.  With more advanced statistical methods, we can better analyze the distribution of the wavelet coefficients and better detect differences in texture.  Given a scanned face, we can analyze the wavelet coefficients of the mouth and the nose and develop a technique to distinguish between the two.  Knowing the particular regions of a scanned face (nose, mouth, eyes) will allow us to manipulate the face and produce a caricature.