Re: From pattern recognition to image enhancement software?

Date: Fri Oct 30 16:30:28 1998
Posted By: Steve Czarnecki, senior technical staff member, Lockheed Martin
Area of science: Computer Science
ID: 908407354.Cs
Message:

Let's see... I think there are two questions in there:

1. a generalized "how do you do that?" and
2. "how do you know you're not fooling yourself?"

The two technologies involved are called "image processing" and "pattern 
recognition".  The enhancement of the brick-thrower video is "image 
processing".  Identifying features in an image such as cloud shapes or 
realizing that the brick thrower has a rose tattoo is "pattern 
recognition".

Both of these technologies are broad areas of study in electrical 
engineering and computer science; I won't attempt to summarize these here 
except to hint at some of the techniques.  A Web search on "image 
processing" or "pattern recognition" will yield many, many links to 
university programs, technical societies, and software products related to 
these technologies.

PART I - IMAGE PROCESSING

Image processing is the mathematical manipulation of a digitized image.  
You probably are aware that a computerized image is made up of hundreds 
of thousands, or even a few million, pixels (colored dots) arranged in rows 
and columns.  The color of each pixel is typically described by 3 numbers 
that tell the hue, luminance, and saturation of that pixel's color 
(equivalently, the intensity of red, green, and blue light emanating from 
that pixel).  
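
To make the "3 numbers per pixel" idea concrete, here is a minimal Python sketch that converts one pixel between the two descriptions. It assumes 8-bit red/green/blue values (0 to 255) and uses the standard library's colorsys module, which calls the three quantities hue, lightness, and saturation; the sample pixel and the helper names are made up for illustration.

```python
import colorsys

# One pixel as three 8-bit numbers: (red, green, blue), each 0..255.
# The values are arbitrary, chosen only for illustration.
pixel_rgb = (200, 120, 40)

def rgb_to_hls_8bit(rgb):
    """Convert an 8-bit RGB triple to (hue, lightness, saturation), each 0..1."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hls(r, g, b)

def hls_to_rgb_8bit(hls):
    """Convert (hue, lightness, saturation) back to an 8-bit RGB triple."""
    r, g, b = colorsys.hls_to_rgb(*hls)
    return tuple(round(c * 255) for c in (r, g, b))

pixel_hls = rgb_to_hls_8bit(pixel_rgb)
round_trip = hls_to_rgb_8bit(pixel_hls)

print("RGB:", pixel_rgb)
print("HLS:", pixel_hls)
print("back to RGB:", round_trip)
```

Either triple describes the same color; which one is handier depends on the operation you want to perform.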

Using mathematical operations, different characteristics of the picture can 
be changed.  By this, I don't mean adding rose tattoos to people's arms or 
adding funny cartoon dialog bubbles to a picture of Bill Clinton and Newt 
Gingrich.  Instead, think of the brightness, contrast, and color control 
knobs on a color TV set.  By turning these knobs you adjust the 
characteristics of the picture: the average brightness of the picture 
changes, or the difference between dark and light parts of the picture 
changes, or people's faces go from normal to red to green to purple.  The 
content of the picture hasn't changed, but aspects of how it appears have 
changed.

A color TV does this via analog signal processing; these changes can also 
be accomplished by mathematical operations on the 3 numbers for each pixel.
In addition, other mathematical operations can be defined to sharpen edges 
in the picture (changes in light intensity or color), to deblur the image 
(such as due to an out-of-focus lens), or to reduce noise in the picture 
(by averaging adjacent pixels together or by averaging corresponding pixels 
from multiple frames of a video together).  
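
As a rough illustration of "mathematical operations on the numbers for each pixel", the sketch below imitates the brightness and contrast knobs. To keep it short it assumes a tiny grayscale image (one number per pixel, 0 to 255) stored as a list of rows; the function names, the mid-gray pivot of 128, and the sample values are my own choices, not a standard.

```python
def clamp(value, low=0, high=255):
    """Keep a pixel value inside the displayable range."""
    return max(low, min(high, value))

def adjust_brightness(image, offset):
    """Add the same offset to every pixel (the 'brightness knob')."""
    return [[clamp(p + offset) for p in row] for row in image]

def adjust_contrast(image, gain, pivot=128):
    """Stretch pixel values away from a mid-gray pivot (the 'contrast knob')."""
    return [[clamp(round(pivot + gain * (p - pivot))) for p in row]
            for row in image]

# A made-up 3x3 grayscale image, purely for illustration.
image = [
    [100, 110, 120],
    [110, 128, 140],
    [120, 140, 156],
]

brighter = adjust_brightness(image, 30)
punchier = adjust_contrast(image, 2.0)

for row in punchier:
    print(row)
```

Every pixel is transformed by the same rule, so the arrangement of light and dark in the picture (its content) is untouched; only how it appears changes.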

A combination of these techniques was probably used to sharpen the video 
image and reduce the noise (actually, improve the signal-to-noise ratio) of 
the brick-throwing video.
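
I don't know exactly how the video was processed, but the effect of averaging corresponding pixels from multiple frames is easy to simulate. The sketch below assumes a static scene, frames that are already perfectly aligned, and independent Gaussian noise in each frame; the scene, the noise level, and all names are invented for the demonstration. A noise-free reference exists only because this is a simulation.

```python
import random

def average_frames(frames):
    """Average corresponding pixels across a list of equally sized frames."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]

def rms_error(image, reference):
    """Root-mean-square difference between an image and a reference image."""
    squared = [
        (p - q) ** 2
        for row, ref_row in zip(image, reference)
        for p, q in zip(row, ref_row)
    ]
    return (sum(squared) / len(squared)) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# A made-up noise-free 32x32 scene (a simple repeating gradient).
scene = [[(x * 8 + y * 4) % 256 for x in range(32)] for y in range(32)]

def noisy_copy(clean, sigma=20.0):
    """Simulate one video frame: the scene plus independent Gaussian noise."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in clean]

frames = [noisy_copy(scene) for _ in range(16)]

single_error = rms_error(frames[0], scene)
averaged_error = rms_error(average_frames(frames), scene)

print("noise in one frame:        ", round(single_error, 2))
print("noise after averaging 16:  ", round(averaged_error, 2))
```

With independent noise, averaging N frames should cut the noise by roughly the square root of N, while the scene itself (the signal) is unchanged. That is what "improving the signal-to-noise ratio" means here.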

But none of these can really change the content of the picture, any more 
than twiddling the knobs of your TV set will turn, say, "60 Minutes" into 
"Laverne & Shirley".


PART II - PATTERN RECOGNITION

Chances are you are reading this message through one of the world's finest 
pattern recognizers: your eyes and brain.  When we look at clouds and see 
cats while others see oak trees it's because evolution has wired us to 
learn to recognize patterns.  We look at a certain place on Mars and see a 
monument to an ancient astronaut; we take a closer look two decades 
later and see that it's really just some hills and big rocks. We look at 
the profile of a granite mountain in New Hampshire and see "the man in the 
mountain".  We look at a potato with a funny bump and see Richard Nixon's 
nose.  We see a pattern of dark and light color on a TV screen and say 
"hey, that's a rose tattoo!"  

We see these things because we have learned to interpret the pattern of 
optic nerve impulses caused by photons falling on the rods and cones in our 
retinas.   Someone who has never seen Richard Nixon can look at many 
potatoes and never recognize him.  

To a limited extent, a machine can be built to look for certain patterns of 
numbers associated with a group of pixels.  For example, it's easy to 
define the pattern of numbers that a circle should make.  The fundamental 
concept in pattern recognition is correlation: the machine compares the 
pixels it is presented to a template representing a circle, and it will 
tell you how close to a circle these pixels seem.  Better yet, the machine 
may be built to indicate only those groups of pixels that correspond well 
to the machine's template for a circle.  That's image recognition: we build 
the machine with a template for circles, provide it an image, and it 
compares the pixels to the template, declaring that it "sees" a circle in 
the lower left corner and the center of the image, for example.
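
The template comparison described above can be sketched in a few lines of Python. This is only one simple way to do it: it assumes a filled disc stands in for the "circle", uses the normalized correlation coefficient as the measure of closeness, and picks 0.95 as the cutoff for declaring a match. The test image, the threshold, and all function names are invented for illustration.

```python
def make_disc(size, radius):
    """Template: 1.0 inside a centered disc, 0.0 outside."""
    c = (size - 1) / 2
    return [
        [1.0 if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2 else 0.0
         for x in range(size)]
        for y in range(size)
    ]

def correlation(patch, template):
    """Normalized correlation coefficient (-1..1) of two same-sized grids."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = (sum(v * v for v in da) * sum(v * v for v in db)) ** 0.5
    if denom == 0:
        return 0.0  # a perfectly flat patch has no pattern to compare
    return sum(x * y for x, y in zip(da, db)) / denom

def find_matches(image, template, threshold):
    """Slide the template over the image; report windows that match well."""
    size = len(template)
    hits = []
    for top in range(len(image) - size + 1):
        for left in range(len(image[0]) - size + 1):
            patch = [row[left:left + size] for row in image[top:top + size]]
            score = correlation(patch, template)
            if score >= threshold:
                hits.append((top, left, score))
    return hits

template = make_disc(7, 3)

# A made-up 20x20 scene: a disc in the lower left, a square in the upper right.
image = [[0.0] * 20 for _ in range(20)]
for y in range(7):
    for x in range(7):
        image[10 + y][2 + x] = template[y][x]  # the disc
        image[2 + y][11 + x] = 1.0             # a filled square

hits = find_matches(image, template, threshold=0.95)
for top, left, score in hits:
    print(f"circle-like pattern at row {top}, column {left} (score {score:.2f})")
```

The machine reports the disc and ignores the square, because only the disc correlates strongly with the template. Lowering the threshold makes the machine more willing to "see" circles that aren't there.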

Building a machine to recognize circles, say, is easy.  It's a little 
harder to build it to recognize straight lines (roads, for example).  It's 
harder still to get it to recognize marks on paper as letters of the 
alphabet (optical character recognition).  It's harder yet, but it has been 
done, to get it to recognize fingerprints.  And it's still really, really 
hard to get it to recognize faces.  (And never mind what Hollywood movies 
say...)

In recognizing patterns in an image, two kinds of errors are typically 
considered.  There is the error of falsely declaring a pattern that we're 
looking for to be present when it is not, and there is the error of missing 
the pattern we're looking for when it is indeed present.  This is getting 
at the issue of "are we fooling ourselves?"  We look at a cloud and see a 
cat; this is an error of the first kind.  There is no cat in the clouds, 
but the pattern recognizer in our brain has matched the contours of the 
cloud to our brain's expectation of how a cat looks (or the bumps on a 
potato to how we think Richard Nixon looks).

Conversely, we look at video and see a mark on a brick thrower's arm, but 
can't recognize it; we don't see the pattern which is there, and make an 
error of the second kind.  It's not until we do some image processing to 
enhance the appearance of the image (in other words, to give us a better 
set of glasses to look at the scene), that we can say "aha! a rose 
tattoo!".  For a complex scene such as this video, I'd bet a cup of coffee 
that it was several sets of human eyeballs that recognized a rose tattoo on 
the brick-thrower's arm. 

One must be careful to balance the two kinds of errors: it's innocuous when 
you look at a cloud and see a cat but I look and see nothing but a cloud.  
Something's amiss, though, if you look at clouds and see the entire 
collection of the Louvre, or if your OCR software decides that the result 
is the script to "Macbeth" no matter what document has been scanned.  
Similarly, "none are so blind as they that will not see"; ever had a friend 
point and say, "wow, look at that!" and you say "what? I don't see it" and 
suddenly, there "it" is?  Ever been greeted by an old acquaintance and not 
recognized them?  These are examples of the second kind of error, which is 
not properly recognizing the pattern which is actually present.
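
The balance between the two kinds of errors can be shown with a toy detector that declares "pattern present" whenever a match score reaches a threshold. The scores and the true answers below are hypothetical numbers I made up for the illustration, not measurements from any real recognizer.

```python
def count_errors(scored_examples, threshold):
    """Return (false_alarms, misses) for a detector firing at score >= threshold."""
    false_alarms = sum(
        1 for score, present in scored_examples
        if score >= threshold and not present
    )
    misses = sum(
        1 for score, present in scored_examples
        if score < threshold and present
    )
    return false_alarms, misses

# Hypothetical (match score, pattern really present?) pairs.
examples = [
    (0.95, True), (0.85, True), (0.70, True), (0.55, True),
    (0.80, False), (0.60, False), (0.40, False), (0.20, False),
]

for threshold in (0.30, 0.65, 0.90):
    false_alarms, misses = count_errors(examples, threshold)
    print(f"threshold={threshold:.2f}  "
          f"false alarms={false_alarms}  misses={misses}")
```

A low threshold sees cats in every cloud (many errors of the first kind, few of the second); a high threshold misses the tattoo that is really there (the reverse). Moving the threshold trades one error for the other, which is why the two have to be balanced rather than driven to zero separately.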

Back to your example: "they made out a rose tattoo, and not a four leaf 
clover for instance and they had their man".  I've discussed the technical 
principles involved and the types of errors that can be made; the issue you 
raise lies beyond the realm of science and in the realm of justice.  This is why we 
leave the ultimate decision up to a judge or jury, who sorts out the 
evidence and decides what to believe beyond a reasonable doubt and what to 
discard. 

Steve Czarnecki