The K7 has a surprisingly simple branch predictor for a CPU with such a long pipeline, where a branch misprediction is much more costly than on a CPU with a shorter pipeline such as the K6. The K7 uses a 2048-entry branch history table (BHT), while the K6 has a large 8192-entry BHT. The 2-bit Smith prediction algorithm is also much weaker than the exotic 2-level GAs predictor used in the K6. Also, the on-die L2 tag only supports up to 512KB of 2-way set associative L2 cache, not 4-way like the PII/III do with much slower off-die tags.
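To make the "2-bit Smith" point concrete, here is a rough Python sketch of a table of 2-bit saturating counters. Only the 2048-entry table size comes from the figures above; the class name, the modulo indexing, the initial counter value and the helper function are my own assumptions for illustration, not how AMD actually wires it up.

```python
class SmithPredictor:
    """Table of 2-bit saturating counters, one per BHT entry."""

    def __init__(self, entries=2048):
        self.entries = entries
        # Counter values 0..3; 0-1 predict "not taken", 2-3 predict "taken".
        # Starting at 1 (weakly not taken) is an assumption for this sketch.
        self.table = [1] * entries

    def _index(self, pc):
        # Assumed indexing: low bits of the branch address.
        return pc % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch: taken 9 times, then falls through, repeated 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
# A branch that strictly alternates taken / not taken.
alternating = [True, False] * 5

loop_misses = count_mispredictions(SmithPredictor(), 0x400, loop_pattern)
alt_misses = count_mispredictions(SmithPredictor(), 0x400, alternating)
```

With these starting values the loop branch is mispredicted 4 times out of 30, but the alternating branch is mispredicted all 10 times out of 10. A predictor that keeps branch history, like a 2-level scheme, can learn that kind of repeating pattern, while a plain 2-bit counter cannot.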
Also, the K7 only has 24 32-bit internal integer registers, which is very few when you realize that most modern RISC architectures have 32 or 40 logical integer registers and EPIC has even more. Also, to save die space and shorten the speed path, AMD did not implement bypass circuitry, so a write has to complete before a read can occur. This is so for both the floating-point and integer registers; it's just that the integer registers have to delay the read access until the write and tag comparisons complete. The 36-entry scheduler in the FPU is split into three groups of 12 entries, one group for each execution pipeline, so if your code isn't using the 3 pipelines evenly, the reorder space will appear to be a bit smaller than 36 entries.
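The effect of splitting the scheduler can be shown with a toy model in Python. It only uses the 3 x 12 split described above; the function name, the pipeline numbering 0 to 2, and the worst-case assumption that nothing issues while the window fills are mine.

```python
def ops_buffered_before_stall(pipe_sequence, pipes=3, entries_per_pipe=12):
    """Count how many ops fit before one pipeline's group runs out of entries.

    Each op is given as the index of the pipeline it needs. The model
    assumes no op leaves the scheduler while it fills (worst case).
    """
    used = [0] * pipes
    accepted = 0
    for pipe in pipe_sequence:
        if used[pipe] == entries_per_pipe:
            break  # this group is full, so the stream stalls here
        used[pipe] += 1
        accepted += 1
    return accepted


balanced = ops_buffered_before_stall([0, 1, 2] * 20)   # even mix of pipelines
skewed = ops_buffered_before_stall([0, 0, 1, 2] * 20)  # pipeline 0 used twice as often
one_pipe = ops_buffered_before_stall([0] * 60)         # everything on one pipeline
```

In this model the balanced stream gets the full 36 entries, the skewed stream stalls after 24 ops, and code that hammers a single pipeline sees only 12 entries.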
Also, judging from the pics on FiringSquad and the projections from MPR, the K7 might have a power consumption approaching 40W, the thermal limit for a Slot 1 type design. The stress limit of the gate oxide is only 1.6V, which is also the K7's working internal voltage, so it may not like the added juice too much. This would disappoint many overclockers who like to increase the voltage.
Hmm... less. A few more weeks and then we'll really know...