Computers with x86 processors come in many shapes and sizes, ranging from ultra-slim notebooks for business folks to neon-pimped desktops for gamers. One of the most excessive members of this family is the Sun Fire X4600, a server that can accommodate up to eight dual-core Opterons in its casing. Compared to this machine, all the gear that we tested earlier looks like a bunch of toys. But is it possible to use sixteen cores effectively? And if so, do things run smoothly enough to justify a price tag of more than 35,000 euros? These are the questions we shall be trying to answer in this review.
Each quarter, some 1.8 million x86 servers are sold worldwide. About 95% of these are of the standard type that we have tested before, with one or two processors on board. They are used as web, mail, file, print or proxy/firewall boxes, but also for (light) database and application work. Many smaller companies will never need more than that. Of the remaining 90,000 or so servers, the great majority have four sockets. These machines usually run groupware, ERP, CRM and other 'enterprise' software, which anywhere from many hundreds to a few thousand people work with on a daily basis. That leaves only about 1500 machines with eight or more processors, which are used for the heaviest and/or most critical tasks. In this segment, x86 competes with chips such as the Itanium, Power and UltraSparc, which have been designed specifically for the most demanding applications.
Although the number of heavy-duty x86 servers sold is relatively small, the same rule applies to servers as to other markets: the more expensive the model, the bigger the margin. The purchase of a powerful server usually goes hand in hand with a substantial storage and/or backup system, software and service packages, or other services such as consultancy. Consequently, selling such systems brings in more money than one might expect purely on the basis of the number of machines sold. More than half of the total server revenue (i.e. x86, Itanium and RISC put together) is made in the segment with four or more processors, which comprises only fifteen percent of the market. Note that we are talking about the hardware alone: software and services must still be added.
In just a few years, the Opteron has taken a considerable amount of market share from the once-dominant Xeon. AMD did the most damage in the segment of servers with four or more processors. Worldwide, it commands more than forty percent of this market; in the US the figure is even higher, at more than fifty percent. This contrasts with the Opteron's overall share of a quarter. The reason is that the Opteron can scale seamlessly from two to four processors. Sockets can be tied together with integrated HyperTransport links, and since each chip has its own memory controller, there is always a sufficient supply of bandwidth. Intel's Xeons, on the other hand, can only cooperate via an expensive chipset and have to share a limited amount of bandwidth. At the moment, a system with four Xeons has only 12.8GB/s at its disposal, compared with 42.7GB/s for a four-way Socket F Opteron. And, believe it or not, this is an enormous improvement over the old Xeon chipset, which AMD crushed with ease in the first two years of its server adventure.
|Platform||Bandwidth (GB/s)|
|Xeon MP (2002-2005)||3.2|
|Xeon MP (2005-2007)||12.8|
|Xeon MP (2007-...)||34.1|
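For readers who want to see where figures like these can come from, the following is a rough sketch of the usual peak-bandwidth arithmetic: transfers per second, times bytes per transfer, times the number of channels or buses. The memory and bus configurations below are our own assumptions and are not stated in this article (dual-channel DDR2-667 per Opteron socket; one, two and four 64-bit front side buses at 400, 800 and 1066 MT/s for the three Xeon MP generations). They were chosen because they reproduce the quoted numbers.

```python
def peak_gb_per_s(mega_transfers_per_s, bytes_per_transfer, links):
    """Theoretical peak bandwidth in GB/s for a number of identical links."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * links / 1e9

# Assumption: each Socket F Opteron has two 64-bit (8-byte) DDR2-667 channels.
opteron_per_socket = peak_gb_per_s(667, 8, links=2)
opteron_four_way = 4 * opteron_per_socket

# Assumption: Xeon MP platforms share 64-bit front side buses via the chipset.
xeon_2002 = peak_gb_per_s(400, 8, links=1)   # one shared bus
xeon_2005 = peak_gb_per_s(800, 8, links=2)   # two buses
xeon_2007 = peak_gb_per_s(1066, 8, links=4)  # four buses

for name, value in [
    ("Four-way Opteron (Socket F)", opteron_four_way),
    ("Xeon MP (2002-2005)", xeon_2002),
    ("Xeon MP (2005-2007)", xeon_2005),
    ("Xeon MP (2007-...)", xeon_2007),
]:
    print(f"{name}: {value:.1f} GB/s")
```

Note that the Opteron figure grows with every socket added, because each chip brings its own memory controller, whereas the Xeon figure is fixed by the chipset no matter how many processors share it.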
A few remarks must be made concerning both sides of the battlefield. First of all, AMD's HyperTransport links cannot scale up indefinitely, because a simple 'broadcast' protocol is used to keep the caches of the various cores in sync. This means that each processor is continuously talking to every other processor, even if they are working on completely unrelated tasks. This mutual chatter causes delays, since a core can do nothing but sit and wait with a piece of data until all the other cores have confirmed that it has not been changed. The more chips there are (or rather, the greater the longest distance in the network), the higher the latency becomes. Although the effect is minor in four-socket systems, there are benchmarks for eight sockets where it is clearly noticeable.
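To make the scaling argument a little more tangible, here is a minimal sketch that counts two things for a broadcast scheme: how many other sockets must be probed on every cache miss, and how many hops the farthest reply has to travel. The 'ladder' layout used here (two rows of sockets, linked to their neighbours) is a hypothetical topology picked for illustration; it is not claimed to be the wiring of any particular server.

```python
from collections import deque

def ladder(columns):
    """Hypothetical layout: 2 rows x `columns` sockets, linked to direct neighbours."""
    adj = {(r, c): [] for r in range(2) for c in range(columns)}
    for r, c in adj:
        if c + 1 < columns:          # horizontal link to the next socket in the row
            adj[(r, c)].append((r, c + 1))
            adj[(r, c + 1)].append((r, c))
        if r == 0:                   # vertical link between the two rows
            adj[(0, c)].append((1, c))
            adj[(1, c)].append((0, c))
    return adj

def diameter(adj):
    """Longest shortest path, in hops, between any two sockets."""
    def farthest(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        return max(dist.values())
    return max(farthest(s) for s in adj)

def probes_per_miss(sockets):
    """With broadcast, every other socket is asked on every miss."""
    return sockets - 1

for columns in (2, 4):
    topo = ladder(columns)
    print(len(topo), "sockets:", probes_per_miss(len(topo)),
          "probes per miss, longest route", diameter(topo), "hops")
```

In this toy model, going from four to eight sockets raises the number of probes per miss from 3 to 7 and doubles the longest route from 2 to 4 hops, and the requesting core has to wait for the slowest answer.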
The Xeon, meanwhile, need not be as limp as it is with Intel's own chipset: with the help of IBM's X3 'Hurricane', up to 32 Xeons can cooperate quite effectively. This is achieved by building a network between the processors at the chipset level, something that is built into the Opteron's architecture. The difference is that the X3 chipset manages things somewhat more intelligently and won't let two processors exchange small talk when they are not working on the same data. An attempt by Newisys to make such a filtering chipset for the Opteron unfortunately never made it to market, but rumour now has it that AMD is planning to integrate this technique into the processor.
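The idea of filtering can be sketched in a few lines. What follows is a generic illustration of a directory that remembers which sockets hold a copy of a piece of data, so that only those sockets are probed; it is not a description of how the X3 chipset is actually implemented. The class names and the access trace are made up for the example, and every access is treated as a cache miss for simplicity.

```python
class BroadcastFabric:
    """Every miss probes all other sockets, whether or not they hold the data."""
    def __init__(self, sockets):
        self.sockets = sockets
        self.probes = 0

    def access(self, socket, line):
        self.probes += self.sockets - 1

class FilteredFabric:
    """A directory tracks holders per cache line; only those are probed."""
    def __init__(self, sockets):
        self.sockets = sockets
        self.holders = {}   # cache line -> set of sockets holding a copy
        self.probes = 0

    def access(self, socket, line):
        current = self.holders.setdefault(line, set())
        self.probes += len(current - {socket})  # skip sockets without a copy
        current.add(socket)

# Hypothetical trace: sockets 0 and 1 share line "A", socket 5 works alone on "B".
trace = [(0, "A"), (1, "A"), (5, "B"), (5, "B"), (0, "A")]

broadcast = BroadcastFabric(sockets=8)
filtered = FilteredFabric(sockets=8)
for socket, line in trace:
    broadcast.access(socket, line)
    filtered.access(socket, line)

print("broadcast probes:", broadcast.probes)
print("filtered probes: ", filtered.probes)
```

For this trace the broadcast scheme sends 35 probes against 2 for the filtered one, and socket 5 never bothers anyone because nobody else touches its data. The price is the bookkeeping: the directory has to be stored and kept up to date somewhere, which is the job the chipset takes on.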