MadSci Network: Computer Science
Query:

Re: How does memory caching work?

Date: Sun Apr 8 01:54:06 2001
Posted By: Valdis Kletnieks, Staff, Computing Center, Virginia Tech Computing Center
Area of science: Computer Science
ID: 985781697.Cs
Message:

In the mid-1960s, memory finally became cheap enough (on the order of a million dollars a megabyte, not a dollar a megabyte or so like it is now) that a "cache" memory became feasible. At the time, the 'third generation' of computers was being created. One of the first uses of cache memory I can find was on the IBM 360-195, which ran at the then-astounding speed of 10 million floating point operations per second (10 MFLOPS). IBM had this to say about the cache (which they call buffer storage):

Because of the sequential nature of most programs, a CPE fetch is likely to be followed by succeeding fetches from the same storage block. Access time for subsequent fetches can be considerably reduced by placing the addressed block of main storage in the high-speed buffer storage. Block transfer is controlled by the storage control unit so that use of the buffer storage is not obvious to the programmer.

The increased performance of the Model 195 is due in part to the faster access to instructions and operands provided by the high-speed buffer storage. Whereas normal instruction or operand fetches from main storage require 810 nanoseconds, fetches from the buffer storage require only 162 nanoseconds.

-- From "IBM System/360 and System/370 Model 195 Functional Characteristics" (GA22-6943-4) (Oct 1975), pages 18-19

A few notes: most PCs now use RAM modules for main storage that run at 60 nanoseconds or so, so things have gotten faster since 1967. And for those of you who followed the link to the picture of the 195, the thing with the blinking lights is just the control panel. The actual computer is all the boxes in the background.

OK, enough history of when caches started being used. Now on to the theory...

What the quote from IBM means is that the engineers recognized the concept of "locality of reference". Basically, this means that most programs spend most of their time looping and referencing data located near the previous reference. For instance, if a program has just referenced elements 20, 21, and 22 of an array, it's a very good bet that element 23 is next up to be referenced. Now how to benefit from this? Obviously, when the program hits element 20, we should get elements 21 through 23 handy at the same time.
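
To make that concrete, here's a tiny C fragment (my own illustration - the array and the loops are made up, not anything from IBM) showing the two kinds of locality the hardware is betting on:

    #include <stdio.h>

    int main(void)
    {
        int a[1000];
        long sum = 0;

        /* Spatial locality: a[0], a[1], a[2]... sit next to each other in
           memory, so once one of them is fetched, its neighbours come
           along in the same cache line.                                   */
        for (int i = 0; i < 1000; i++)
            a[i] = i;

        /* Temporal locality: 'sum' (and the loop counter) get touched on
           every single iteration, so they stay cached the whole time.     */
        for (int i = 0; i < 1000; i++)
            sum += a[i];

        printf("%ld\n", sum);
        return 0;
    }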

The cache is usually broken up into handy-sized blocks of memory, 64 or 128 bytes or so, called 'lines'. There may be 1024 or 2048 or so lines. The cache controller then keeps a table of all the lines, recording where in memory each one came from.
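
If you wanted to model that table in software, it might look roughly like this in C (the sizes are just the example figures above; a real controller builds this out of hardware, and the details vary a lot from chip to chip):

    #define LINE_SIZE 64          /* bytes per cache line                   */
    #define NUM_LINES 1024        /* how many lines the cache holds         */

    /* One entry in the controller's table: which block of main memory this
       line currently holds, plus the cached copy of the bytes themselves.  */
    struct cache_line {
        int           valid;      /* is anything in this line yet?          */
        unsigned long tag;        /* which memory block the data came from  */
        unsigned long last_used;  /* bookkeeping for the replacement policy */
        unsigned char data[LINE_SIZE];
    };

    struct cache {
        struct cache_line lines[NUM_LINES];
        unsigned long     clock;  /* counter used to track recency          */
    };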

At each read from memory, the cache controller gets to look first, and checks its table. If it has a copy of that memory location, it's a "cache hit": it makes a note that the line was used, and gives the data to the CPU. Things get more interesting if there's a "cache miss" and the data isn't in the cache. At that point, the cache controller fetches an entire line of 64 or 128 or so bytes from the main memory, gives the CPU its data, and tries to figure out what to do with the rest of the line. Locality of reference says the CPU will probably be asking for more data from this line very soon. Most cache controllers implement "least recently used" or "not recently used" schemes: they pick a line that hasn't been used recently, and replace it with the just-fetched line.
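
Continuing the sketch above (and assuming a fully associative cache with a least-recently-used policy - real caches are usually set-associative and do all of this in parallel hardware, not in a loop), the read path might look like:

    #include <string.h>

    /* A tiny stand-in for main storage, just so the sketch is complete. */
    static unsigned char memory[1 << 20];

    unsigned char cache_read(struct cache *c, unsigned long addr)
    {
        unsigned long tag    = addr / LINE_SIZE;  /* which block of memory   */
        unsigned long offset = addr % LINE_SIZE;  /* which byte in the block */

        /* Cache hit: we already hold a copy of this block. */
        for (int i = 0; i < NUM_LINES; i++) {
            if (c->lines[i].valid && c->lines[i].tag == tag) {
                c->lines[i].last_used = ++c->clock;  /* note that it was used */
                return c->lines[i].data[offset];
            }
        }

        /* Cache miss: pick the least recently used line as the victim... */
        int victim = 0;
        for (int i = 1; i < NUM_LINES; i++)
            if (c->lines[i].last_used < c->lines[victim].last_used)
                victim = i;

        /* ...and replace it with the whole line fetched from main memory. */
        memcpy(c->lines[victim].data, &memory[tag * LINE_SIZE], LINE_SIZE);
        c->lines[victim].valid     = 1;
        c->lines[victim].tag       = tag;
        c->lines[victim].last_used = ++c->clock;

        return c->lines[victim].data[offset];
    }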

Here's a link to a more detailed explanation of the design, with lots of sub-links...

Now we've finally gotten through to where we can answer your original question: "How does it know which data will be used more?". Well... the truth is that it doesn't know. It has no clue what will be used more. It just makes an intelligent guess based on recent history. With any luck, its guess will be correct 95% to 98% of the time, and things will run a lot faster.
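
You can get a feel for why a 95% to 98% hit rate matters by plugging in the Model 195 numbers from the quote above (162 ns on a hit, 810 ns on a miss - the 98% here is just the hoped-for figure, not a measurement):

    #include <stdio.h>

    int main(void)
    {
        double hit_time  = 162.0;  /* ns, from the buffer (cache) storage     */
        double miss_time = 810.0;  /* ns, from main storage                   */
        double hit_rate  = 0.98;   /* hoped-for hit rate, not a measurement   */

        /* Average access time: hits at cache speed, misses at memory speed. */
        double avg = hit_rate * hit_time + (1.0 - hit_rate) * miss_time;
        printf("average access time: %.0f ns\n", avg);  /* about 175 ns */
        return 0;
    }

So even with a couple of misses out of every hundred accesses, memory looks close to five times faster than it would if every fetch went to main storage.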

However, the proper design of cache chips (things like the number of bytes in a line versus the number of lines - is 64x1024 better than 128x512? - or least-recently-used versus not-recently-used replacement, etc.) is still generally considered more art than engineering, and enough programs manage to totally confound the hardware that "thrashing the cache" can be a major performance issue for high-performance computers. For instance, accessing an array with the wrong 'stride' may result in the program using only 4 or 5 of the 1,024 cache lines, and forgetting that Fortran and C access arrays in different subscript orders may result in a cache miss on almost every memory reference.
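
Here's what the stride problem looks like in C (a made-up example, not from any particular program). C stores each row of a two-dimensional array contiguously, so the first pair of loops walks straight through memory, while the second jumps a whole row's worth of bytes between accesses and touches a new cache line almost every time. In Fortran the storage order is reversed (column-major), so the 'good' and 'bad' loops swap:

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    int main(void)
    {
        double sum = 0.0;

        /* Cache-friendly: a[i][0], a[i][1], ... are adjacent in memory,
           so every byte of each fetched cache line gets used.            */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* Cache-hostile: walking down a column skips N * sizeof(double)
           bytes between accesses, so nearly every access is a miss.      */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("%f\n", sum);
        return 0;
    }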

Fortunately, most 'thrash the cache' cases are seen in large number-crunching programs - the average computer isn't doing anything that can manage to get that out of hand. Most programmers aren't able to write code that's "tight" enough to actually have a noticeable cache problem, and the ones good enough to hit that sort of problem learn to fix their code. For most computers, a bigger problem is multiprogramming - if you have multiple programs running, switching from one to another will usually cause a flurry of cache misses until the cache catches up with the new program's locality. However, it's not THAT big a deal - as I'm writing this, the Dell PC I'm using is averaging some 600 context switches per second with no ill effects.

