MadSci Network: Computer Science
Query:

Re: How compilers are made

Date: Thu Dec 14 12:37:40 2000
Posted By: Mike Westerfield, Staff, Computer Science, Byte Works, Inc.
Area of science: Computer Science
ID: 976526168.Cs
Message:

As it turns out, you’re right on both counts. Compilers are usually written in a language that most people know, often the same language they compile. Compilers are also frequently written with tools most people don’t know about. Here’s how it happens:

Modern day compilers are usually written in a fairly common, popular, general-purpose programming language. Most compilers I’ve seen are written in C or Pascal, although I’ve run across compilers written in many other languages. In that sense, compilers are no different from any other large program. Just about any language that can read and write files can be, and has been, used to write a compiler.

On the other hand, compilers are often written using specialized tools. These tools take a program-like description of the computer language as input and produce a kind of database as output. An engine that is compatible with the tool uses the database to decide how to take the text you type and assign meaning to the symbols. It then calls subroutines to create the final code that makes up a complete program. Those subroutines are written in a language that matches the tool.

It doesn’t have to be that way, though. It’s not really all that hard to write a compiler without specialized tools, and many compilers are written without them. The most popular technique is to use recursive descent to analyze the program. If you’re interested in compilers, that’s a good place to start.

The best way to understand all of this is to read about a few small but real compilers, or perhaps even port one of the compilers or implement one from scratch. With that in mind, here are a few places you might start.

The biggest problem with writing a compiler is that they are, well, big. For a full-blown language like C, C++ or Pascal, it generally takes one or two programmers one or two years of full time work to complete a good compiler. That’s not something you’re likely to do as a hobby! On the other hand, you can learn almost all of the compiler techniques the big compilers use with smaller languages. That’s what subset compilers are all about. You can implement a subset compiler in a few weeks of part-time programming.

A fantastic introduction to recursive decent parsers (and recursion in general) is Chapter 5 of "Algorithms + Data Structures = Programs," Wirth, Prentice-Hall, 1976, ISBN 0-13-022418-9.

One of the earliest, and still one of the best, compilers to start with is a small Pascal compiler called Tiny Pascal that originally appeared in Byte Magazine. The articles appeared in the September, October and November issues in 1978. They are reprinted in "The BYTE Book of Pascal," Bliase Liffick, BYTE Books, 1979, ISBN 0-07-037823-1. Your local library should be able to track down a copy. The articles themselves are the best short introduction to compilers I know of. The compiler isn’t all that great, but the articles more than make up for the compiler.

Another cool small compiler is Small C, which originally appeared in the May 1980 issue of Dr. Dobb’s Journal. Ron Cain wrote the original Small C compiler. James Hendrix extended it and wrote a fine book called "The Small- C Handbook," Prentice-Hall, 1984, ISBN 0-8359-7012-4. This has been collected on a CD-ROM; you can find information at http://www.parkway.co.uk/cd/smallc.htm. You can also find several versions of the compiler at http://deltasoft.fife.wa.us/cpm/archive/unofficial/small_c.html.

One of the most ported compilers of all time is the P4 compiler. Unlike the first two examples, this one compiles a large subset of the Pascal language. It is also the basis for several successful commercial implementations, including the UCSD Pascal compiler that helped popularize Pascal. The P4 compiler is written in itself. It creates p-code, sort of like modern Java byte code. You can write an interpreter for the p-code in just about any language, and once you do, the P4 compiler starts to run.

The P4 compiler is in the public domain. You can find it online at http://www.cwi.nl/~steven/pascal.html, among other places. There is also a fine book that explains the rather poorly commented compiler. Look for "Pascal Implementation, The P4 Compiler," Pemberton & Daniels, John Wiley & Sons, ISBN 0-470-27368-0.

Tiny Pascal and Small C are both pretty easy for a beginner to tackle. The P4 compiler, along with the reference book I mentioned, is a bit harder, but you can handle it with some effort. If you want to dig deeper, though, you’ll want a more formal textbook. In my opinion, the finest mid-level compiler book ever written is "Principles of Compiler Design," Aho & Ullman, Addison-Wesley, 1979, ISBN 0-201-00022-9. This is the famous "dragon book." The cover shows a knight armed with the lance of LALR parser generators, the shield of syntax directed translation and a steed armored with data flow analysis attacking a dragon labeled "complexity of compiler design." The back transforms this into a Don Quixote scene, implying that it really isn’t that hard to write a compiler if you use the right tools and techniques. Maybe. Above my desk is a stained glass of a dragon picking his teeth with a broken lance. Bits of armor lay around the dragon. It’s titled, "Sometimes the Dragon Wins!" (I write compilers and interpreters for a living.)

Incidentally, there is a second edition to this book. I like the first better. The second edition covers a lot of topics in more detail, but the first edition is shorter and easier to follow. Get the second edition if you intend to write compilers for a living. If you want an intermediate introduction, though, the first edition has everything you really need.

While I have my favorites, there are a lot of other good compiler resources out there. Most of the books and articles I quoted are rather old. That’s because there were a lot of beginners diving into software development in the late 70’s and early 80’s and some fine writers who tried to give them what they wanted. You can still find all of these through your local library. Most of the recent books cater to a more advanced audience. There’s nothing wrong with them, but you may find the sources I listed easier to understand, even if they don’t go into quite as much detail.

If you just want to look at a compiler without much explanation, try an online search for "compiler source" along with the language you’re interested in. There are dozens of compilers out there that you can download and play with.

There are also a lot of compiler groups online. If you decide to port an existing compiler, or write one of your own, these groups can help a lot. Poke around a bit and find one that suits your interest--and the language you pick!

Mike Westerfield
Byte Works, Inc.


Current Queue | Current Queue for Computer Science | Computer Science archives

Try the links in the MadSci Library for more information on Computer Science.



MadSci Home | Information | Search | Random Knowledge Generator | MadSci Archives | Mad Library | MAD Labs | MAD FAQs | Ask a ? | Join Us! | Help Support MadSci


MadSci Network, webadmin@www.madsci.org
© 1995-2000. All rights reserved.