Re-encoding MPEG2 video to the MPEG4 DivX format is where the Pentium 4 really shines, and it shows what you can do with the bandwidth of RDRAM. The nice thing about FlaskMPEG is that it lets you choose between different encoding methods, so you can see the effects of optimizations such as MMX and SSE2. Average performance in fps climbs 5% when using Foster. The improved bandwidth of the i860 Colusa chipset probably accounts for the largest part of this difference. Remarkably, encoding with SSE2 at low quality is far slower than encoding at medium or high quality. We think this is a bug in Flask, as the same thing happened on the P4 system. Below you’ll find a table with the results and a screenshot of the record:
![]() |
Now that we know that Foster has more memory bandwidth available thanks to the 860 chipset, and that with Jackson Technology it can outperform the Pentium 4, we can look at a third way to further improve the performance of the chip: optimizations. Intel has always said that software developed especially for the Pentium 4 will run significantly better. By ‘especially developed’ we are, in this case, not referring to SSE2 routines, but to simple compiler settings. They take little effort to use, and they can generate code that's far better tuned to the specific strengths of a processor.
Intel have been working for some time on version 5 of their compilers and performance analyser. Alongside special options for the Pentium 4, the new software offers support for the IA-64 architecture. Tim Wilkens, the creator of ScienceMark, was kind enough to provide us with some special new versions of his benchmark, built with the latest beta version of this new compiler. ScienceMark is, as the name implies, a scientific benchmark. We can't explain exactly what it does with virtual liquid Argon molecules, but that isn’t the most important thing. The fact that a 1.5GHz Pentium 4 takes almost 2 minutes to finish calculating says enough about the complexity of this software. The optimized executables surprised us, positively:
![]() |
On the left you can see the result of the normal executable, which is a free download from the ScienceMark site. QxW and QaxW are optimized versions, compiled with the latest beta of the Intel 5.0 compiler. Although the source code of the three programs is exactly the same, and they’re essentially doing exactly the same job, the optimized versions finish almost twice as fast as the standard one. A performance gain of almost 100% from a simple recompile promises a lot for the future of the Pentium 4, but it’s not all roses:
![]() |
This has to hurt. Despite the fantastic result in one test, the other one suffers badly. The effect is actually reversed here: QxW and QaxW are almost twice as slow as the original version, which has been optimized for a Pentium III. The explanation is that ScienceMark is a very unusual program from a compiler's point of view, as it relies on instructions that aren't used very often. Tim Wilkens is working with Intel on improving the performance and fixing bugs in the new ScienceMark.
While a more recent version is already better than the one we had at the time of testing, only 2 of the 4 important tests in ScienceMark could be compiled in the ‘optimized’ version, which causes the differences in the end result that you can see below:
![]()

\* = Absent tests cause a lower score
The compiler is still far from finished, but it has shown us some impressive results. While everyday applications won’t easily benefit from the new optimizations as much as ScienceMark does, the difference should still be worthwhile. People who are afraid that the new optimizations will slow down their Pentium III or Athlon systems don’t have to worry: while QxW crashes immediately on these systems, QaxW shows the same performance effects as it does on the P4, only a little less extreme. Moreover, the compiler is able to optimize code for the Athlon processor.
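The difference between the two builds on older processors is easier to picture with a small model. As we understand the two switches, QxW emits only Pentium 4 code, while QaxW emits a specialised path plus a generic fallback and picks one at run time. The Python sketch below merely imitates that behaviour: the function names, the `IllegalInstruction` exception and the `"sse2"` feature string are our own hypothetical stand-ins, not anything produced by the Intel compiler.

```python
# Hypothetical model of the two build styles; not Intel's actual mechanism.

class IllegalInstruction(Exception):
    """Stands in for the crash seen when a CPU meets an unknown instruction."""


def sum_squares_generic(values):
    # Baseline path: assumes nothing about the processor.
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_squares_p4(values):
    # Stand-in for a path that would use Pentium 4 only instructions.
    return float(sum(v * v for v in values))


def run_qxw_style(values, cpu_features):
    # Specialised code only: there is no fallback for other processors.
    if "sse2" not in cpu_features:
        raise IllegalInstruction("specialised path on an unsupported CPU")
    return sum_squares_p4(values)


def run_qaxw_style(values, cpu_features):
    # Both paths are present; the choice is made once, at run time.
    if "sse2" in cpu_features:
        return sum_squares_p4(values)
    return sum_squares_generic(values)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(run_qaxw_style(data, {"mmx", "sse", "sse2"}))  # Pentium 4 style CPU
    print(run_qaxw_style(data, {"mmx", "sse"}))          # Pentium III style CPU
```

Both calls return the same answer, which is the point: the dispatching build stays correct everywhere, at the cost of a larger executable and a check at run time, whereas the specialised-only build trades portability for not having to carry the fallback.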