Paul DeMone of RealWorld Technologies has put part I of his Intel Willamette article online. This first part mainly covers the history of Willy and its well-known predecessor, the P6 core used in the Pentium Pro, Pentium II, Pentium III and Celeron. It also gives a broad overview of the Willamette core's architecture:
Willamette is a very deeply pipelined processor. It uses 20 pipe stages to execute integer instructions including the 4 pipe stages associated with fetching uops from the trace cache. If you included the pipeline stages associated with fetching x86 code from the L2 cache, decoding it into uops, and loading uops and program mapping/flow information into the trace cache the total number of pipeline stages probably approaches 30 or more. The branch misprediction penalty appears to be at least 19 clock cycles when the correct path is present in the trace cache. If the trace cache misses, then the branch mispredict penalty is considerably higher. This compares to a minimum branch mispredict penalty of 11 cycles for the P6 core. The P6 uses the two-level Yeh and Patt adaptive branch prediction scheme. Despite the fact that the P6 predicted branches correctly around 90% of the time it still lost about 30% of its potential performance due to branch mispredicts. Although the Willamette will no doubt use more modern branch prediction techniques like gshare and dynamic prediction strategy selection, its huge mispredict penalty will make its performance very sensitive to the efficacy of its branch prediction algorithm(s) on the particular code being run.
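
To get a feel for why the mispredict penalty matters so much, here is a rough back-of-the-envelope model in Python. It is only a sketch, not something taken from the article: the function name `mispredict_loss`, the branch frequency of 20% of all instructions and the ideal CPI of 0.5 are assumptions picked for illustration. Only the 90% prediction accuracy and the 11 and 19 cycle penalties come from the quote above.

```python
def mispredict_loss(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Fraction of potential performance lost to branch mispredicts.

    Simple model: every mispredicted branch adds `penalty_cycles` stall
    cycles on top of an ideal `base_cpi` (cycles per instruction).
    Performance is proportional to 1 / CPI, so the lost fraction is
    stall_cpi / (base_cpi + stall_cpi).
    """
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    return stall_cpi / (base_cpi + stall_cpi)


# ASSUMPTIONS (not from the article): 1 in 5 instructions is a branch,
# and the core would sustain 0.5 cycles per instruction with perfect prediction.
BASE_CPI = 0.5
BRANCH_FRACTION = 0.20

# From the quote: roughly 90% prediction accuracy, minimum penalties of
# 11 cycles (P6) and 19 cycles (Willamette, trace cache hit).
MISPREDICT_RATE = 0.10

for name, penalty in (("P6", 11), ("Willamette", 19)):
    loss = mispredict_loss(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, penalty)
    print(f"{name}: about {loss:.0%} of potential performance lost")
```

With these assumed inputs the model lands close to the 30% loss quoted for the P6, and it suggests a 19 cycle penalty would cost noticeably more at the same prediction accuracy. Change `MISPREDICT_RATE` to see how quickly the loss shrinks when the predictor gets better, which is exactly the sensitivity the quote points at.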