Besides the dead end of the Netburst philosophy, Intel's server processors had an additional problem: a severe lack of bandwidth. The E7520 'Lindenhurst' chipset, the previous top-of-the-line model for two sockets, had just a single 800MHz bus, meaning that only 1.6GB/s was available for each core in a machine with two dual-core processors. Two months before Woodcrest was announced, Intel improved this situation significantly with the introduction of the Blackford chipset. Its two buses and four memory channels meant that even the first version, with a 1066MHz bus, could deliver 4.3GB/s per core. The gates were opened further for Woodcrest by raising the bus frequency from 1066MHz to 1333MHz. As a result, every core now has roughly three times as much bandwidth at its disposal as at the beginning of this year.
| | Paxville | Dempsey | Woodcrest | Socket 940 | Socket F |
|---|---|---|---|---|---|
| Number of buses | 1 | 2 | 2 | - | - |
| Number of memory channels | 2 | 4 | 4 | 4 | 4 |
| Bandwidth per core | 1.6GB/s | 4.3GB/s | 5.3GB/s | 3.2GB/s | 5.3GB/s |
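The Intel figures for bandwidth per core can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes a 64-bit (8-byte) front-side bus and four cores in total (two sockets with two cores each), and the function name `per_core_bandwidth_gbs` is made up for this example.

```python
# Assumption: the front-side bus is 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8


def per_core_bandwidth_gbs(bus_mhz, buses, cores):
    """Peak bus bandwidth in GB/s, shared evenly among all cores."""
    total_gbs = bus_mhz * 1e6 * BUS_WIDTH_BYTES * buses / 1e9
    return total_gbs / cores


# (bus frequency in MHz, number of buses, total cores in a two-socket machine)
platforms = {
    "Paxville": (800, 1, 4),
    "Dempsey": (1066, 2, 4),
    "Woodcrest": (1333, 2, 4),
}

for name, (mhz, buses, cores) in platforms.items():
    print(f"{name}: {per_core_bandwidth_gbs(mhz, buses, cores):.1f}GB/s per core")
```

These are theoretical peak values; they say nothing about the bandwidth that is actually available in practice.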
The table shows that in theory, Woodcrest and Socket F have the same amount of bandwidth. In practice, however, the bandwidth that is effectively available differs. Opterons have a decentralized architecture (NUMA), which means that each processor has two memory channels of its own and can only reach the remaining memory via its neighbour. This communication between processors runs over a HyperTransport link that delivers 4GB/s in each direction. In the worst case, when a chip needs only data that sits 'over on the other side', the effective bandwidth drops to 2GB/s. For this reason, it is crucial that the operating system schedules threads so that they run close to their data, something that is not always easy to achieve.
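To get a feel for how much remote memory accesses can cost, the figures above can be put into a very simple model. This is a rough sketch, not a measurement: it assumes that the 2GB/s worst-case figure is comparable to the 5.3GB/s per core listed for Socket F, that accesses are purely bandwidth-bound, and that local and remote transfers do not overlap. The function name `effective_bandwidth_gbs` is hypothetical.

```python
LOCAL_GBS = 5.3   # Socket F bandwidth per core, from the table
REMOTE_GBS = 2.0  # worst-case figure for data on the other processor


def effective_bandwidth_gbs(remote_fraction):
    """Average bandwidth when a given fraction of the data is remote.

    Transfer times add up, so the result is a weighted harmonic mean
    of the local and remote bandwidth, not a plain average.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    time_per_gb = remote_fraction / REMOTE_GBS + (1.0 - remote_fraction) / LOCAL_GBS
    return 1.0 / time_per_gb


for fraction in (0.0, 0.25, 0.5, 1.0):
    print(f"{fraction:.0%} remote: {effective_bandwidth_gbs(fraction):.1f}GB/s")
```

Under these assumptions, even a modest share of remote accesses pulls the average down noticeably, which is why thread placement matters so much on NUMA systems.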
Intel's system has its pitfalls too. In every system with multiple sockets, it is essential that the processors' caches remain synchronized. After all, a core must not perform computations on cached data that has since been altered by another core. There are various ways to maintain this so-called 'cache coherence', but all of them require communication between the processors. AMD routes this communication over HyperTransport, so that it has no negative effect on local memory bandwidth, but Intel sends it over the bus, which means that not all bus capacity is available for memory traffic.
Compared to Paxville and Dempsey, Woodcrest does reduce the burden of coherence traffic substantially by having its two cores share a single cache. This removes the need to use the bus for communication within a single socket. In the upcoming 'Clovertown' quad-core, which consists of two dual-core chips, the two chips in the socket will probably have to communicate via the bus, but at least the chipset will be able to avoid making the processor in the other socket wait for this.
Blackford comes in two flavours, the 5000P and the 5000V. The latter is a somewhat cheaper version with some limitations in maximum memory capacity and other features. It supports two memory channels and a maximum of 32GB, instead of four channels and 64GB. So-called 'memory mirroring', which stores data twice in order to detect errors, is supported by the 5000P but not by the 5000V. A similar feature called 'memory RAID' is present in both; it also duplicates data, so that the data can be restored if a chip or module fails. Finally, there is a third chipset, the 5000X 'Greencreek'. Its specifications largely overlap those of the 5000P, but it adds a PCI Express x16 slot for connecting a decent video card, as well as a memory controller optimized for workstations. This offers somewhat higher bandwidth, but at the cost of slightly higher latency.