Andreas Stiller has posted a short article on the c't site with a fairly in-depth comparison of the Athlon and PIII architectures. Here is an excerpt about the Athlon's FPU performance:
The units of Athlon are reserved for floating point and multimedia thus either for FPU, MMX or 3DNow!. AMD especially optimized the FPU. In the K6 the FPU was the weak point still. It did not allow any pipelining meaning: an instruction always had to wait until the previous one was finished. Therefore the floating-point performance was significantly below Pentium II level. With Athlon things are completely different now. The FPU is 'fully pipelined': after one clock it can already start processing the next FPU instruction. With some restriction this is also true for FMUL and FDIV that are still 'not pipelined' in the Pentium III. On top of that the three units also split the work: one is responsible for regular FPU operations, the second one for multiplication and the third for storing.
AMD also refined the latency times of the FPU - in total the FPU shows up to 50 percent better performance than the one of the Pentium II/III.
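To make the pipelining point in the excerpt concrete, here is a rough back-of-the-envelope sketch of how many clocks a run of independent FPU instructions would take with and without pipelining. The function names, the 4-clock latency and the instruction count are made up for illustration; they are not measured K6, Athlon or Pentium III figures, and the model ignores dependencies between instructions.

```python
def cycles_unpipelined(n_ops: int, latency: int) -> int:
    """Non-pipelined unit: each instruction waits until the previous one is finished."""
    return n_ops * latency


def cycles_pipelined(n_ops: int, latency: int, issue_interval: int = 1) -> int:
    """Pipelined unit: a new instruction can start every `issue_interval` clocks.

    The first result appears after `latency` clocks; each further
    instruction adds only `issue_interval` clocks.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not real CPU data).
    n_ops, latency = 100, 4
    print("unpipelined:", cycles_unpipelined(n_ops, latency))  # 400
    print("pipelined:  ", cycles_pipelined(n_ops, latency))    # 103
```

With these assumed numbers the pipelined unit needs roughly a quarter of the clocks, which is the kind of gap the excerpt describes between a unit that stalls on every instruction and one that can start a new instruction each clock. Real code rarely consists of fully independent instructions, so actual gains are smaller.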