MadSci Network: Computer Science
Let's see... I think there are two questions in there: 1. a generalized "how do you do that?" and 2. "how do you know you're not fooling yourself?"

The two technologies involved are called "image processing" and "pattern recognition". The enhancement of the brick-thrower video is "image processing". Identifying features in an image, such as cloud shapes, or realizing that the brick thrower has a rose tattoo is "pattern recognition". Both of these technologies are broad areas of study in electrical engineering and computer science; I won't attempt to summarize them here except to hint at some of the techniques. A Web search on "image processing" or "pattern recognition" will yield many, many links to university programs, technical societies, and software products related to these technologies.

PART I - IMAGE PROCESSING

Image processing is the mathematical manipulation of a digitized image. You probably are aware that a computerized image is composed of hundreds of thousands, or even a few million, pixels (colored dots) arranged in rows and columns. The color of each pixel is typically described by 3 numbers that tell the hue, luminance, and saturation of that pixel's color (equivalently, the intensity of red, green, and blue light emanating from that pixel).

Using mathematical operations, different characteristics of the picture can be changed. By this, I don't mean adding rose tattoos to people's arms or adding funny cartoon dialog bubbles to a picture of Bill Clinton and Newt Gingrich. Instead, think of the brightness, contrast, and color control knobs on a color TV set. By turning these knobs you adjust the characteristics of the picture: the average brightness of the picture changes, or the difference between the dark and light parts of the picture changes, or people's faces go from normal to red to green to purple. The content of the picture hasn't changed, but aspects of how it appears have changed.
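
To make the knob analogy concrete, here is a minimal sketch of brightness and contrast as arithmetic on each pixel's 3 numbers. It assumes pixels are stored as red, green, and blue values from 0 to 255; the function names and the choice of 128 as "mid-gray" are illustrative assumptions, not how any particular TV or software package does it.

```python
# Illustrative sketch only: pixel values are assumed to be 0-255 integers
# for red, green, and blue. All names here are hypothetical.

def clamp(value):
    """Keep a channel value inside the displayable 0-255 range."""
    return max(0, min(255, int(round(value))))

def adjust_pixel(pixel, brightness=0, contrast=1.0):
    """Apply a contrast gain and a brightness offset to one (r, g, b) pixel.

    Contrast stretches each channel away from mid-gray (128);
    brightness then shifts every channel by the same amount.
    """
    return tuple(clamp((c - 128) * contrast + 128 + brightness) for c in pixel)

def adjust_image(image, brightness=0, contrast=1.0):
    """Apply the same adjustment to every pixel of a list-of-rows image."""
    return [[adjust_pixel(p, brightness, contrast) for p in row] for row in image]

# A tiny 2x2 "image": dark gray, light gray, reddish, bluish.
image = [
    [(64, 64, 64), (192, 192, 192)],
    [(200, 50, 50), (50, 50, 200)],
]

brighter = adjust_image(image, brightness=40)  # like turning up the brightness knob
punchier = adjust_image(image, contrast=1.5)   # like turning up the contrast knob

print(brighter[0][0])  # dark gray becomes a lighter gray: (104, 104, 104)
print(punchier[0][0])  # dark gray becomes darker: (32, 32, 32)
```

Every pixel gets the same formula, so the arrangement of the picture is untouched; only how it looks changes.
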
A color TV does this via analog signal processing; the same changes can also be accomplished by mathematical operations on the 3 numbers for each pixel. In addition, other mathematical operations can be defined to sharpen edges in the picture (changes in light intensity or color), to deblur the image (such as blur due to an out-of-focus lens), or to reduce noise in the picture (by averaging adjacent pixels together or by averaging corresponding pixels from multiple frames of a video together). A combination of these techniques was probably used to sharpen the video image and reduce the noise (actually, improve the signal-to-noise ratio) of the brick-throwing video. But none of these can really change the content of the picture, any more than twiddling the knobs of your TV set will turn, say, "60 Minutes" into "Laverne and Shirley".

PART II - PATTERN RECOGNITION

Chances are you are reading this message through one of the world's finest pattern recognizers: your eyes and brain. When we look at clouds and see cats while others see oak trees, it's because evolution has wired us to learn to recognize patterns. We look at a certain place on Mars and see a monument to an ancient astronaut; we take a closer look two decades later and see that it's really just some hills and big rocks. We look at the profile of a granite mountain in New Hampshire and see "the man in the mountain". We look at a potato with a funny bump and see Richard Nixon's nose. We see a pattern of dark and light color on a TV screen and say "hey, that's a rose tattoo!"

We see these things because we have learned to interpret the pattern of optic nerve impulses caused by photons falling on the rods and cones in our retinas. Someone who has never seen Richard Nixon can look at many potatoes and never recognize him.

To a limited extent, a machine can be built to look for certain patterns of numbers associated with a group of pixels. For example, it's easy to define the pattern of numbers that a circle should make.
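
As a rough sketch of what "the pattern of numbers that a circle should make" could look like, the snippet below builds a small grid with 1 wherever a pixel falls near a circle's outline and 0 elsewhere. The function name, the grid size, and the `thickness` tolerance are all assumptions made for illustration.

```python
# Illustrative sketch only: one possible "pattern of numbers" for a circle.
# The name circle_template and its parameters are hypothetical.

def circle_template(size, radius, thickness=0.5):
    """Return a size x size grid: 1 near the circle's outline, 0 elsewhere."""
    center = (size - 1) / 2
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            # Distance of this pixel from the center of the grid.
            distance = ((x - center) ** 2 + (y - center) ** 2) ** 0.5
            # Mark the pixel if it lies within `thickness` of the outline.
            row.append(1 if abs(distance - radius) <= thickness else 0)
        grid.append(row)
    return grid

template = circle_template(size=7, radius=3)
for row in template:
    print(" ".join(str(v) for v in row))
# Prints:
# 0 0 1 1 1 0 0
# 0 1 0 0 0 1 0
# 1 0 0 0 0 0 1
# 1 0 0 0 0 0 1
# 1 0 0 0 0 0 1
# 0 1 0 0 0 1 0
# 0 0 1 1 1 0 0
```
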
The fundamental concept in pattern recognition is correlation: the machine compares the pixels it is presented with to a template representing a circle, and it will tell you how close to a circle these pixels seem. Better yet, the machine may be built to indicate only those groups of pixels that correspond well to the machine's template for a circle. That's image recognition: we build the machine with a template for circles, provide it an image, and it compares the pixels to the template, declaring that it "sees" a circle in the lower left corner and the center of the image, for example.

Building a machine to recognize circles, say, is easy. It's a little harder to build it to recognize straight lines (roads, for example). It's a little harder to get it to recognize marks on a paper as letters of the alphabet (optical character recognition). It's harder yet, but has been done, to get it to recognize fingerprints. On the other hand, it's still really, really hard to get it to recognize faces. (And never mind what Hollywood movies say...)

In recognizing patterns in an image, two kinds of errors are typically considered. There is the error of falsely declaring a pattern that we're looking for to be present when it is not, and there is the error of missing the pattern we're looking for when it is indeed present. This is getting at the issue of "are we fooling ourselves?" We look at a cloud and see a cat; this is an error of the first kind. There is no cat in the clouds, but the pattern recognizer in our brain has matched the contours of the cloud to our brain's expectation of how a cat looks (or the bumps on a potato to how we think Richard Nixon looks). Conversely, we look at video and see a mark on a brick thrower's arm, but can't recognize it; we don't see the pattern which is there, and make an error of the second kind.
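
The template comparison and the two kinds of errors can be sketched together. The snippet below scores a patch of pixels against a circle template using a Pearson correlation (one common way to measure "how close", chosen here as an assumption) and declares a circle when the score clears a threshold. The grids, the names, and the threshold values are all made up for illustration.

```python
# Illustrative sketch only: template matching by correlation, with a threshold.
# CIRCLE, SQUARE, FAINT_CIRCLE, and sees_circle are hypothetical names.

CIRCLE = [          # the machine's template for a circle
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]

SQUARE = [          # not a circle, but similar enough to tempt the machine
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

FAINT_CIRCLE = [    # a real circle with pixels missing and one speck of noise
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]

def correlation(patch, template):
    """Pearson correlation between two equally sized grids, from -1 to 1."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return numerator / denominator if denominator else 0.0

def sees_circle(patch, threshold):
    """Declare a circle only if the patch matches the template well enough."""
    return correlation(patch, CIRCLE) >= threshold

print(round(correlation(SQUARE, CIRCLE), 2))        # about 0.72
print(round(correlation(FAINT_CIRCLE, CIRCLE), 2))  # about 0.54

for threshold in (0.5, 0.8):
    print(threshold, sees_circle(SQUARE, threshold), sees_circle(FAINT_CIRCLE, threshold))
```

With a lenient threshold of 0.5 the machine finds the faint circle but also "sees" a circle in the square, an error of the first kind. With a strict threshold of 0.8 it rejects the square but misses the faint circle, an error of the second kind. In this toy case no single threshold avoids both.
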
With the mark on the brick thrower's arm, it's not until we do some image processing to enhance the appearance of the image (in other words, to give us a better set of glasses to look at the scene) that we can say "aha! a rose tattoo!". For a complex scene such as this video, I'd bet a cup of coffee that it was several sets of human eyeballs that recognized a rose tattoo on the brick-thrower's arm.

One must be careful to balance the two kinds of errors: it's innocuous when you look at a cloud and see a cat but I look and see nothing but a cloud. Something's amiss, though, if you look at clouds and see the entire collection of the Louvre, or if your OCR software decides that the result is the script to "Macbeth" no matter what document has been scanned. Similarly, "none are so blind as they that will not see"; ever had a friend point and say, "wow, look at that!" and you say "what, I don't see it" and suddenly, there "it" is? Ever been greeted by an old acquaintance and not recognized them? These are examples of the second kind of error: failing to recognize a pattern which is actually present.

Back to your example: "they made out a rose tattoo, and not a four leaf clover for instance and they had their man". I've discussed the technical principles involved and the types of errors that can be made; the issue you raise is beyond the realm of science and is one of justice. This is why we leave the ultimate decision up to a judge or jury, who sorts out the evidence and decides what to believe beyond a reasonable doubt and what to discard.

Steve Czarnecki