Big Willy’s Little Cache
Almost lost among the marketing platitudes of IDF were a few interesting new facts about Willamette. When Intel conceived the new core, they apparently decided that data memory access latency was enemy number one. The centerpiece of their attack on latency is the controversial decision to use an 8 KB data cache, smaller than the one in any of their last four releases of P6-based processors. It also stands in stark contrast to the 64 KB data cache in archrival AMD’s K7 Athlon processor line. It is well known that the larger a cache is, the greater the chance that it contains the data item the CPU needs next. As a useful rule of thumb, quadrupling the size of a cache cuts its miss rate in half.
Associativity indicates the number of places in a cache that any particular piece of data could be placed. The more places a data item can go, the less chance that several frequently used pieces of data will fight over the same place(s) in the cache and keep knocking each other out, forcing so-called conflict misses. Associativity also happens to be more effective in reducing the miss rate of small caches than large ones.
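A toy simulation makes the conflict-miss effect concrete. What follows is only a minimal sketch: the 32-byte line size, the LRU replacement policy, and the three-address trace are assumptions chosen for illustration, not details of Willamette or the K7, and `count_misses` is a hypothetical helper name.

```python
from collections import OrderedDict

def count_misses(addresses, cache_bytes, ways, line_bytes=32):
    """Count misses for a byte-address trace in a set-associative LRU cache.

    line_bytes=32 and LRU replacement are illustrative assumptions.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes        # memory line containing the address
        index = line % num_sets          # set that this line maps to
        tag = line // num_sets           # distinguishes lines sharing a set
        resident = sets[index]
        if tag in resident:
            resident.move_to_end(tag)    # hit: mark as most recently used
        else:
            misses += 1
            if len(resident) >= ways:
                resident.popitem(last=False)  # evict the least recently used line
            resident[tag] = True
    return misses

# Three addresses spaced exactly 8 KB apart, touched in rotation 100 times.
trace = [0x0000, 0x2000, 0x4000] * 100

direct_mapped = count_misses(trace, 8 * 1024, ways=1)
two_way = count_misses(trace, 8 * 1024, ways=2)
four_way = count_misses(trace, 8 * 1024, ways=4)
print(direct_mapped, two_way, four_way)  # prints: 300 300 3
```

In this contrived trace all three addresses land in the same set. With one or two places to go they evict each other on every access, while the 4-way cache takes only the three unavoidable first-touch misses.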
Although the Willamette data cache is one-eighth the size of the AMD K7’s data cache, it is more highly associative: 4-way versus 2-way. Because the higher associativity partly offsets the smaller size, the miss rate of Willy’s little data cache is only about 2.2 times that of the much larger data cache in the K7 for most programs run on a PC.
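The arithmetic behind that figure can be sketched from the rule of thumb alone. This is a back-of-the-envelope reading rather than Intel’s own analysis: it assumes the rule (miss rate proportional to one over the square root of cache size) applies across the full 8x size gap, and it attributes whatever remains to associativity. The function name is hypothetical.

```python
import math

def relative_miss_rate(size_ratio):
    """Rule of thumb from the text: miss rate scales as 1/sqrt(cache size)."""
    return 1.0 / math.sqrt(size_ratio)

# Willamette's data cache is 8 KB; the K7's is 64 KB.
size_only = relative_miss_rate(8 / 64)       # miss-rate ratio from size alone
quoted = 2.2                                 # ratio quoted in the text
associativity_factor = quoted / size_only    # remainder credited to 4-way vs 2-way

print(f"size alone: {size_only:.2f}x")                              # size alone: 2.83x
print(f"implied associativity factor: {associativity_factor:.2f}")  # implied associativity factor: 0.78
```

Under these assumptions, size alone would cost Willamette roughly 2.8 times the K7’s miss rate, and the move from 2-way to 4-way recovers a bit more than a fifth of that.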
With all this extra area and more transistors (42 million versus 28 million for Coppermine), why not put in a 32, 64, or even 128 KB data cache? The only reason that makes sense is the war on data access latency. Because Intel wanted the data cache to operate with a 2-cycle latency, they could not make it larger than 8 KB in their 0.18 µm process without placing it on the critical timing path and thus limiting the processor clock rate (demonstration systems at IDF ran at up to 2 GHz). Larger memory arrays have longer access times, and as a result the larger data cache in the K7 Athlon has a latency of 3 clock cycles.
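One way to weigh the trade is the standard average-access-time formula, hit latency plus miss rate times miss penalty. The sketch below is a simplification under stated assumptions: both chips are given the same miss penalty in cycles, clock-rate differences and out-of-order overlap are ignored, and the 2% miss rate and 10-cycle penalty are hypothetical inputs used only to exercise the formula, not measurements. The function names are hypothetical as well.

```python
def average_access_time(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per data access: hit latency plus amortized miss cost."""
    return hit_cycles + miss_rate * miss_penalty_cycles

def break_even_miss_cost(small_hit=2, large_hit=3, miss_ratio=2.2):
    """Large-cache miss cost per access (miss rate x penalty) at which both tie.

    Solves small_hit + miss_ratio * c == large_hit + c for c.
    """
    return (large_hit - small_hit) / (miss_ratio - 1.0)

threshold = break_even_miss_cost()
print(f"break-even miss cost: {threshold:.2f} cycles per access")

# Hypothetical inputs: 2% miss rate for the large cache, 10-cycle miss penalty.
k7_miss_rate, penalty = 0.02, 10
large = average_access_time(3, k7_miss_rate, penalty)
small = average_access_time(2, 2.2 * k7_miss_rate, penalty)
print(f"large cache: {large:.2f} cycles, small cache: {small:.2f} cycles")
```

In this simplified model the small, fast cache comes out ahead as long as the large cache’s misses cost it less than about 0.83 cycles per access on average. Beyond that point the extra misses outweigh the cycle saved on every hit.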