MadSci Network: Computer Science
Yes, this answer is lengthy. If nothing else, read the bottom of the article, where I've placed a number of resources and suggestions to help keep you from becoming roadkill on the Infobahn. Please read them, do them, and get anybody else you can to do them too.
Although the word 'hacker' now often refers to a computer vandal, it once had another meaning: one who enjoys working with systems and pushing them to their limits. See the entry for 'hacker' in Eric Raymond's excellent Jargon File (which I link to heavily here) for details; Eric even has an emblem for old-style hackers. Those of us who still hold to the old meaning (myself included) use the word 'cracker' for a computer vandal.
Also, if you're expecting to actually find out how to crack into a system, you can move along now - I'm a professional white hat, so all you're going to get from here is a fairly detailed explanation of the theory. Black hat wannabes can look elsewhere....
Now, how to actually answer the question asked... Hmm. There's the ultra-short answer, the short answer, and a reasonably complete answer.
The ultra-short answer: Because computers are very literal creatures, and programmers are only human, and Things Go Wrong.
The short answer: An average "production" computer program can easily run to hundreds of thousands or even millions of lines of code. For instance, the Linux 2.6.0 kernel will probably weigh in at around 5.9 million lines of C code. The GUI (XFree86 4.3.0) is another 17 million or so lines, and the Mozilla browser about the same. The best estimates I've seen for Microsoft products are even higher (25 million for Internet Explorer, and another 30 million for Windows 2000). Nobody can understand all the interactions of all that code. Often, even debugging it well enough that it runs at all is quite difficult:
-- The Hacker and the Ants, chapter 7, Rudy Rucker
Hacking is like building a scale-model cathedral out of toothpicks, except that if one toothpick is out of place the whole cathedral disappears. And then you have to feel around for the invisible cathedral, trying to figure out which toothpick is wrong. Debuggers make it a little easier, but not much, since a truly screwed-up cutting-edge program is entirely capable of screwing up the debugger as well, so that then it's as if you're feeling around for the missing toothpick with a stroke-crippled claw-hand.
But, ah, the dark dream beauty of the hacker grind against the hidden wall that only you can see, the wall that only you wail at, you the programmer, with the brand new tools that you made up as you went along, your special new toothpick lathes and jigs and your real-time scrimshaw shaver, you alone in the dark with your wonderful tools.
Please note that Rucker is using the old meaning of 'hacker' as 'wizard', not 'one who breaks into systems'.
Yes, programming is that hard. If you think writing a 200-page book would be hard, imagine if leaving off a comma on page 17 made chapters 9, 17, and 23 evaporate... So the programmer will screw up. Most of the bugs will never be tripped over, and nobody will care. Some of the bugs will be of the brown paper bag variety, and will hopefully be fixed quickly. Unfortunately, there's a very wide middle ground where things can go wrong in subtle ways that a cracker can use to break into a system.
OK. Having said all that, let's get to the gory details. I apologize if some of the citations make eyes glaze over; I'm including them in case somebody wants to do further research, and many qualify as 'classic papers' in the computer security community.
There are a lot of ways to break into a computer system, and some of them are incredibly fiendish. Ken Thompson (one of the creators of the Unix operating system) described one of the cleverest ever devised in his Turing Award lecture for the Association for Computing Machinery, "Reflections on Trusting Trust". (For the curious, the 'unknown Air Force document' he references is Karger and Schell's work from over 30 years ago; they later wrote a very interesting 30-years-later retrospective.)
Still awake? Eyes glazed over yet?
In actual practice, most of the ways mentioned in the papers above are of practical (rather than theoretical) concern only if you're attacking (or defending) a high-profile target - a bank, a military computer, a large online business, etc. Also, many of them require that you already have some access to the computer (for instance, you have a 'guest' account and want to become 'administrator'). The average cracker is lazy and will go for low-hanging fruit. There's no point in spending 10 minutes demonstrating your skill at picking the lock on the front door if the back door is unlocked... So they use the simplest attack that works. In fact, the vast majority of successful attacks are not the work of talented black hats; they are either fully automated attacks by worms or semi-automated sweeps for vulnerable systems by script kiddies using canned software they barely understand. It's not at all unusual to find a script kiddie who has gained control of anywhere from 10,000 to 50,000 computers, and a group of spammers recently advertised that they had 450,000 computers under their control - a claim that was accepted as quite credible.
Most of the widely used remote attacks fall into a few broad categories: "improper parameter checking", "buffer overflows", "injection attacks", "insecure configuration", and "self-inflicted". Sometimes a particular attack will combine components of two or more categories - one recent exploit discussed on several computer security mailing lists chained six small vulnerabilities, step by step, into something that could completely subvert the computer.
Improper parameter checking attacks include all the cases where the programmer has simply bobbled the verification of information supplied by the attacker. It may be as simple as forgetting to check, on a multiple-choice question with answers A/B/C/D, that the attacker didn't enter 'E and an order of fries', or it may be a subtle combination of slightly bad data that confuses the computer. The vast majority of remote attacks against software bugs are some variant of this, with two main subclasses:
Buffer overflows are a major subclass of parameter checking attacks. Basically, you tell the computer to put 10 pounds of something into a 5 pound bag, and things break. The details are fairly technical, but they boil down to a programmer not checking the length of a "buffer", which is a term for a chunk of storage in a program, usually holding a string of characters. For instance, a programmer may create a 20-character buffer for a name. If they then actually check that the user entered 20 characters or fewer, there's no problem. If they forget to check, however, the attacker can send a very long string for the name (possibly thousands of characters long - one famous exploit involved sending 25 megabytes or so of data). The attacker will very carefully format the string so that, in addition to being a string of characters read into 'name', it is also executable code for the computer (remember that it's all ones and zeroes inside). So the string might be "James A [computer binary code to send me all the system passwords]aaaaaaaaaaaaaaaaaaaaaNNNN". The trick is that there are just enough 'a's that the NNNN lands on a special memory location called the 'return address'. So when the program tries to return from the "read a name" subroutine, instead of going back where it should, it jumps to NNNN, which the attacker has pointed at that binary code to send out all the passwords. The classic paper on the details of how this works is "Smashing the Stack for Fun and Profit" by Aleph One.
Injection attacks are another subclass of parameter checking attacks. They are a little easier to understand, since they don't make your brain hurt quite so much; most of them are variants on the "and a side of fries" trick. For instance, a web site might ask for your customer number, but what the attacker sends is "22343234; and send me a color TV for free". If the attacker gets lucky, at some point the web site will pass that customer number to a database, and the database will also accept an order for a TV....
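To make the "NNNN lands on the return address" step concrete, here is a toy model in Python. It is a simulation, not a real exploit: a bytearray stands in for memory, with a 20-byte name buffer followed immediately by a 4-byte return address, matching the example above. The function names and the layout are hypothetical, and real stack frames also hold saved registers and are guarded by defenses (stack canaries, non-executable stacks) that this model ignores. The filler here is just 'a's, since there is no real CPU to run the attacker's code.

```python
# Toy model only: a bytearray stands in for a stack frame. Real frames
# hold more data (saved registers etc.) and have defenses not modeled here.

NAME_SIZE = 20  # the programmer's 20-character "name" buffer
RET_SIZE = 4    # the "return address" stored right after it


def make_frame(ret_addr: bytes) -> bytearray:
    """Fake frame: NAME_SIZE zero bytes, then the return address."""
    if len(ret_addr) != RET_SIZE:
        raise ValueError("return address must be RET_SIZE bytes")
    return bytearray(NAME_SIZE) + bytearray(ret_addr)


def return_address(frame: bytearray) -> bytes:
    """Read back the bytes where the return address lives."""
    return bytes(frame[NAME_SIZE:NAME_SIZE + RET_SIZE])


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """The bug: copy byte by byte with no check against NAME_SIZE."""
    for i, byte in enumerate(data):
        frame[i] = byte  # bytes past index 19 overwrite the return address


def checked_copy(frame: bytearray, data: bytes) -> None:
    """The fix: refuse input that does not fit in the buffer."""
    if len(data) > NAME_SIZE:
        raise ValueError("name too long")
    frame[:len(data)] = data  # same-length slice, so the frame size is unchanged


if __name__ == "__main__":
    payload = b"a" * NAME_SIZE + b"NNNN"  # filler, then the attacker's address

    victim = make_frame(b"\x10\x20\x30\x40")
    unchecked_copy(victim, payload)
    print(return_address(victim))  # b'NNNN': the return address was overwritten

    safe = make_frame(b"\x10\x20\x30\x40")
    try:
        checked_copy(safe, payload)
    except ValueError as err:
        print("rejected:", err)
    print(return_address(safe) == b"\x10\x20\x30\x40")  # True: untouched
```

The whole fix is the one length comparison in checked_copy, which is exactly the check the programmer in the example forgot.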
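A minimal sketch of the same idea, assuming Python's built-in sqlite3 module, an in-memory database, and a hypothetical customers table. It uses the classic "OR 1=1" variant rather than a "; and send me a TV" second command, because sqlite3's execute() refuses to run more than one statement per call; the underlying mistake (pasting the user's text into the command) is the same. The safe version does the parameter checking described earlier and also passes the value through a ? placeholder.

```python
import re
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (number TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("22343234", "Alice"), ("55511122", "Bob")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, customer_number: str) -> list:
    # BUG: the user's text is pasted straight into the SQL statement, so
    # any extra words they type become part of the command.
    query = "SELECT name FROM customers WHERE number = " + customer_number
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, customer_number: str) -> list:
    # Parameter check first: a customer number is ASCII digits only.
    if not re.fullmatch(r"[0-9]+", customer_number):
        raise ValueError(f"not a customer number: {customer_number!r}")
    # Then pass the value through a ? placeholder, so the database treats
    # it purely as data and it can never change the statement itself.
    query = "SELECT name FROM customers WHERE number = ?"
    return [row[0] for row in conn.execute(query, (customer_number,))]


if __name__ == "__main__":
    conn = make_demo_db()
    attack = "0 OR 1=1"  # "customer 0, or anyone at all"
    print(lookup_unsafe(conn, attack))    # leaks every customer's name
    print(lookup_safe(conn, "22343234"))  # ['Alice']
    try:
        lookup_safe(conn, attack)
    except ValueError as err:
        print("rejected:", err)
```

The two defenses overlap on purpose: the placeholder alone would already stop this attack (the string "0 OR 1=1" simply matches no customer), and the digits-only check also rejects garbage that isn't an attack at all.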
Insecure configuration is the name for all the truly silly things people do when they set up computers - 'Administrator' accounts with no passwords, sharing their C: drive read/write to the entire Internet, and so on. It's not particularly hard to write a program that just wanders the net, going machine to machine, looking for computers to infect. It's a serious problem because almost every computer has issues here, right out of the box. You buy it, you bring it home, you unpack it, you hook it to the Internet - and it's got security problems.
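The "read/write to the entire Internet" mistake has a local cousin on Unix-like systems: files that any user on the machine can modify, which hardening guides such as the CIS benchmarks mentioned below check for. As a rough sketch only, assuming a Unix-like system and Python 3.9 or later, the hypothetical function find_world_writable below walks a directory tree and lists entries with the "others may write" permission bit set. It does not examine network shares, which is the Windows problem described above.

```python
import os
import stat


def find_world_writable(root: str) -> list[str]:
    """List files and directories under root that any user may modify."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISLNK(mode):
                continue  # a link's own mode bits are not what matters
            if mode & stat.S_IWOTH:  # the "others may write" bit
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    import sys

    for path in find_world_writable(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Treat the output as a list of things to review rather than automatic errors: on a typical Unix system some directories, such as /tmp, are world-writable on purpose and protected by the sticky bit.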
Self-inflicted cracks include all those cases where the user has agreed to let the software onto their computer: users click an e-mail attachment, or download a program that includes spyware or a Trojan Horse. Yes, this is a major way for things to get into your computer. Many users will say "ooh shiny [click]" at anything that claims to show dancing hampsters or pictures of some pop star wearing less clothing than she should. Guess what? Usually, that hampster is dropping a virus or similar on your system... (And if you installed KaZaa, did you actually read the user agreement before it installed malware on your computer?)
What can you do to protect yourself?
First off: patch your system. Apple's Mac OS and most Microsoft systems have an "update software" feature. Use it. Most of the 'Internet worms' that make the news exploit holes for which patches had been out anywhere from 1 to 3 months before the worm hit.
Second: take backups. Many computers come with CD-writers now - use them to back up your data. Yes, this can be a pain when you have to feed it 10 CDs to back things up. No, it won't directly help your security, but it will help if you have to reinstall because some malware trashed your system. It also saves your skin if your hard drive suddenly starts making those grinding noises that indicate it just died, or even if you realize a split second too late that something important got dragged onto that trash can icon....
Third - especially important for Microsoft systems: install an anti-virus package. I don't care which; all of the big vendors are quite good. Just make sure to download new virus signatures (sometimes called definitions) regularly; setting it to auto-check once per day would not be unreasonable.
Fourth: almost all computers are shipped with horribly insecure configurations. Tighten down your system: turn off things you don't need, and fix things that are too wide-open. If you need information on what to do, I can recommend the benchmarks from the Center for Internet Security (CIS). If you look at those and they're too complicated for you, find a tech-savvy friend and bribe them with a pizza or something. Not every computer can implement every setting in those benchmarks, but it would help the Internet's security a lot if every computer implemented all the ones it can. (Disclaimer: I helped extensively in developing the Solaris and Linux benchmarks for the CIS.)
Fifth: (Somebody hand me a soapbox, please?) Consider using something other than Microsoft products. They have a long history of security issues, and the fact that 95% of computers run them is itself a major problem, since a single flaw can then be exploited on nearly every machine at once. Possible options include Apple Macs (if you can afford them) and the various BSD and Linux operating systems (personally, I run a heavily tweaked version of the Fedora Linux distribution, but there are plenty of other Linux options, including the Red Hat, SuSE, and Debian distributions).