MadSci Network: Computer Science
Query:

Re: cpu and ram timings

Date: Wed Feb 20 16:09:32 2002
Posted By: David Ehnebuske, Sr. Technical Staff Member, Software, IBM Corporation
Area of science: Computer Science
ID: 1002247811.Cs
Message:

Hi nikhil -

Thanks for sending in your request for help on such an interesting topic. Hardware system performance optimization is a subject well outside my personal expertise, so I asked for help from my colleague Steve VanderWiel, a computer engineer who specializes in estimating and enhancing the performance of future computer systems for IBM. He holds a PhD in electrical engineering, has guest lectured at a couple of universities, and has authored several conference papers and journal articles on methods for improving the performance of large computer systems. Here's his answer.

nikhil -

While it is true that technologies such as RAMBUS and DDR DRAMs can improve the performance of a computer's main memory, these improvements continue to be outstripped by advances in CPU speeds. For example, the fastest DDR memories of which I am aware are clocked at 350 MHz, while CPU speeds have already passed 2.0 GHz. The frequency of a DRAM chip mostly affects how fast data can be moved into and out of the chip. In terms of overall latency, the problem is worse: most DRAM memory cells are accessed asynchronously (the latency of an access is independent of the frequency of the memory chip), so access times have not improved much even as chip frequencies have increased. The rule of thumb for DRAM is a 7% reduction in latency per year. Compare that to Moore's Law, which in its popular form predicts that microprocessor clock frequencies will double every 18 months (unlike DRAMs, a microprocessor's clock frequency is a relatively good predictor of performance within a particular microprocessor family). The result is that CPUs still process data much faster than the data can be supplied from memory. So, if your computer contained only a CPU and a DRAM main memory (whether RAMBUS, DDR, or some other DRAM variant), the bottleneck would definitely be the memory.
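
To get a feel for how quickly this gap compounds, here is a little back-of-the-envelope calculation in Python. The growth rates are just the rules of thumb quoted above, not measurements from real systems:

# Back-of-the-envelope: how fast the CPU/DRAM gap grows.
# Assumptions (rules of thumb from the text, not measurements):
#   - CPU clock frequency doubles every 18 months (~59% per year)
#   - DRAM latency improves about 7% per year

cpu_growth = 2 ** (12 / 18)   # per-year CPU speedup factor, ~1.59
dram_growth = 1 / 0.93        # per-year DRAM latency improvement, ~1.08

gap = 1.0  # relative CPU-to-DRAM speed ratio at year 0
for year in range(1, 6):
    gap *= cpu_growth / dram_growth
    print(f"After year {year}: CPU/DRAM speed gap has grown {gap:.1f}x")

Run over five years, the ratio grows to roughly 7x, which is why memory designers cannot simply wait for faster DRAMs to close the gap.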

The usual solution to this problem is the introduction of caches that sit between the CPU and the DRAMs. A common analogy used to describe how caches work is that of a student (the CPU) writing a paper using reference books (the data) from a library (DRAM memory). When the student starts working on the paper he walks over to the library, checks out the book he needs, takes it home, gets the information he needs from it and then returns the book when he needs another one. The problem is the student usually needs to reference the same book at different times while he writes his paper. Using his current scheme, he has to walk over to the library each time he needs to get a new book and will probably go back for the same books over and over again. To solve the problem, the student installs a bookshelf (a cache) in his house. Now, when he needs a new book he leaves the book he was just using on the bookshelf and retrieves the new book from the library. He continues in this way until his bookshelf is full of books. At this point, he must remove a book from the shelf and return it to the library before he can get a new book. To lessen the probability of returning a book he might need in the near future, he keeps track of when he uses each book and always returns the book that he used least recently.

This is essentially how a cache works in a computer system. Whenever the CPU requests a piece of data for the first time, that data is read from the DRAM main memory and sent to the CPU, and a copy is also placed in the cache. Later, when the CPU again needs this data, it will check the cache before going to main memory and will find the data present (this is called a cache "hit"). The access time to get data out of the cache is much faster than the access time to the main memory DRAMs (just as it is faster to get a book off a shelf in your home than it is to get it from the library). The cache access time is faster for three reasons: 1) the cache is typically placed in closer physical proximity to the CPU, 2) the cache is much smaller than main memory so it is easier to find the data you want, and 3) the cache's smaller size also means that it can be made out of faster, more expensive SRAM (static RAM). The downside of the cache's small size is that, once it is full, new data being brought into the cache forces other data out. This is called "eviction" and it is done in such a way that the displaced data is the least recently used piece of data in the cache (the term for this is an "LRU replacement policy"). Even though we use an LRU policy to evict data from the cache, it may be the case that we need that same data later on. When this happens (or when data is requested for the first time) a "cache miss" occurs and we have to go back out to memory to retrieve the data. Fortunately, cache misses are typically much less frequent than cache hits.
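
If you would like to experiment with this behavior, here is a toy simulation in Python of a fully associative cache with an LRU replacement policy. It is purely illustrative (a real cache is a hardware structure that holds fixed-size lines, not arbitrary program data), but it shows hits, misses, and evictions in action:

from collections import OrderedDict

class LRUCache:
    """Toy fully-associative cache with LRU replacement.

    Illustrative only: real caches hold fixed-size lines in
    hardware, not arbitrary Python keys.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # insertion order tracks recency
        self.hits = self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)     # mark most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[address] = True

# A program that keeps re-reading the same few addresses hits often.
cache = LRUCache(capacity=4)
for address in [1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 1, 2]:
    cache.access(address)
print(f"hits={cache.hits} misses={cache.misses}")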

The next question to ask is "Why are cache misses less frequent than hits?" The answer is that most programs exhibit a property called "locality." Locality means that, at any given time, a program tends to reuse data that it has used in the recent past. This subset of frequently used data changes over time, but it tends to change slowly. The designers of caches understand the principle of locality, and this is why caches are designed to retain the most recently used data when an eviction must be done to make room for new data being brought into the cache.
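
You can see locality at work by running two different access patterns through a toy LRU cache like the one above: a loop that keeps revisiting a small working set hits nearly every time, while uniformly random accesses over a large range almost always miss. Again, this is a sketch, not a model of any real machine:

import random
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Fraction of accesses in `trace` that hit a toy LRU cache."""
    lines, hits = OrderedDict(), 0
    for address in trace:
        if address in lines:
            hits += 1
            lines.move_to_end(address)
        else:
            if len(lines) >= capacity:
                lines.popitem(last=False)
            lines[address] = True
    return hits / len(trace)

# A loop over a small working set exhibits strong locality...
loopy = [i % 8 for i in range(10_000)]
# ...while uniformly random accesses to a large range do not.
random.seed(0)
scattered = [random.randrange(10_000) for _ in range(10_000)]

print(f"loop over 8 addresses: hit rate = {lru_hit_rate(loopy, 64):.1%}")
print(f"random over 10,000:    hit rate = {lru_hit_rate(scattered, 64):.1%}")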

Caches have been enormously successful over the past 30+ years in helping mitigate the performance gap between CPU and DRAM speeds. However, as this gap has continued to grow, caches have had to become both faster and larger to keep systems balanced. To help with this, caches have been integrated onto the same chip as the CPU. This is now so prevalent that people tend to treat caches as an integral part of the CPU. To increase the amount of data cached in the system without making any one cache too large and slow, many systems also provide multiple levels of cache. The first level of cache (usually called the L1) is small but very fast, usually requiring only one or two CPU clock ticks to access. A second-level cache (L2) is common these days and sits between the L1 and memory. The L2 is much larger but slower (maybe 8-12 CPU clock ticks) than the L1. Typically, the L1 and L2 are both integrated on the same chip as the CPU. Some high-performance systems also have an off-chip L3 cache. These L3 caches have a capacity of several megabytes and still longer access times (30+ cycles), but they are still much faster than main memory.
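
A handy way to see why a hierarchy like this pays off is to compute the average memory access time (AMAT). The hit rates below are invented for illustration, and the latencies are picked from the rough ranges I quoted above:

# Average memory access time (AMAT) for a three-level hierarchy.
# AMAT = L1_time + L1_miss_rate * (L2_time + L2_miss_rate * (...))
# All numbers are illustrative, chosen from the ranges in the text.

l1_time, l1_miss = 2, 0.05   # 2-cycle access; 5% of accesses miss
l2_time, l2_miss = 10, 0.20  # 10-cycle access; 20% of L1 misses also miss here
l3_time, l3_miss = 30, 0.50  # 30-cycle access; half of L2 misses also miss here
memory_time = 200            # assumed DRAM latency, in CPU cycles

amat = l1_time + l1_miss * (l2_time + l2_miss * (l3_time + l3_miss * memory_time))
print(f"AMAT = {amat:.2f} cycles (vs {memory_time} cycles with no caches)")

With these made-up numbers the average access costs under 4 cycles instead of 200, which is the whole point of the hierarchy.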

Rather than examining different DRAM technologies, it might be more interesting to explore what kind of memory system (including one or more caches and main memory) makes sense in a modern computer system. For example, given a CPU that makes memory requests every X clock ticks and main memory that has a latency of Y and a bandwidth of Z, what hit rate does the cache need to have to prevent a memory bottleneck? How large does a cache need to be to attain this hit rate? If this sounds like it would make for an interesting project, please let me know and I can provide realistic values for X, Y, Z and any other relevant information. Below, I've included some references that might help. The first reference is a classic computer architecture book used in many beginning computer engineering/science college courses. The next two references are URLs to websites that discuss cache memory and DRAM structures. I would suggest starting with Hennessy and Patterson. This book is an excellent resource and if your local library does not have a copy, tell them to order it because they should. Good luck with your project and please let me know if I can provide any help.
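
To show what I mean, here is one deliberately oversimplified way to frame the question: assume a single cache level, ignore the bandwidth Z and any overlap between computation and memory accesses, and ask what hit rate keeps the average access time at or below the CPU's request interval X. The sample numbers are placeholders, not the realistic values I mentioned:

def required_hit_rate(x_request_interval, cache_time, y_memory_latency):
    """Minimum cache hit rate so the average access time does not
    exceed the CPU's request interval (a deliberately crude model:
    one cache level, no bandwidth limit, no overlap with compute).

    Average access time = cache_time + (1 - hit_rate) * y_memory_latency.
    Setting this <= x_request_interval and solving for hit_rate gives:
    """
    needed = 1 - (x_request_interval - cache_time) / y_memory_latency
    return max(0.0, needed)

# Placeholder values (X = 4 ticks between requests, 2-cycle cache,
# Y = 200-cycle memory); substitute realistic numbers for the project.
print(f"required hit rate: {required_hit_rate(4, 2, 200):.1%}")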

Steve

REFERENCES:

Computer Architecture: A Quantitative Approach, by J. Hennessy and D. Patterson, Morgan Kaufmann Publishers

http://www.ecs.umass.edu/ece/koren/ece668/cache/tutorial.html

http://arstechnica.com/paedia/r/ram_guide/ram_guide.part1-2.html

I hope this helps with your project. Again, thanks for asking about this. I certainly learned something new!

David Ehnebuske
IBM Distinguished Engineer

