Database test: Intel Xeon 'Clovertown' X5355

By the Tweakers.net editorial staff, Thursday 14 December 2006 15:10, views: 45,294

On November 14 Intel introduced its first quad-core server chips in the form of the Xeon 5300 series, also known under the code name Clovertown. Although the chip is technically pretty much identical to the desktop version which we looked at earlier, it runs on a different platform and is meant for other tasks. In this article we look at the influence of the step from dual-core to quad-core on Intel's position and how Clovertown holds up in our database test.

* Old and new

Since the Xeon 5300 'Clovertown' is, with the exception of a faster bus, identical to the Core 2 Extreme QX6700 'Kentsfield' - whose architecture we looked at extensively in this review - this article has a somewhat more light-hearted introduction. In the table below, the new quad-core is compared to Intel's very first microprocessor, the 4004. This processor was in the spotlight again recently, because it was 35 years ago on November 13 that the chip hit the market.

Specification | 4004 | Clovertown | Difference
Introduction | 1971 | 2006 | 35 years later
Price | ~$935* | $1172 | 25% more expensive
Cores | 1 | 4 | 4x as many
Instructions per clock tick | 1 | 16 | 4x as many per core
Clock speed | 108kHz | 2.66GHz | 24,691x higher
Bus frequency | 108kHz | 1333MHz | 12,346x higher
Bandwidth | 54KB/s | 10.6GB/s | 197,530x more
Production technique | 10000nm | 65nm | 154x smaller
Wafer size | 50mm | 300mm | 36x larger surface
Transistors | 2300 | 582 million | 63,261x more per core
Physical size | 12mm² | 286mm² | 6x larger per core
Contact points | 16 | 771 | 48x more
Registers | 4 bit | 64 bit | 16x wider
Address space | 640KB | 16EB | Unimaginable
Instructions | 46 | ~700 | 15x as many
Consumption | 1W | 120W | 30x higher per core
Voltage | 15V | 1.34V | >90% lower
Current | 0.07A | 90A | 1286x higher
* Corrected for inflation - original price: $200. Collectors are paying $1000 for it.

The difference is humongous: with its clock of about a tenth of a megahertz, the 4004 could perform a little over 100,000 instructions per second, while Clovertown can manage almost 43 billion in the same time span. To equal the theoretical computing power of a quad-core Xeon, 395,061 4004s would be needed, and that does not even take into account that one does its math with 4 bits while the other uses 64 bits. We'll spare Intel's first-born a crushing comparison in terms of performance per Watt and per dollar, but it should be clear that the small steps we have seen over the years add up to extreme differences when taken across a few decades.
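
The figures in this comparison can be retraced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the nominal 2.66GHz clock is exactly 8/3 GHz (2666.67MHz), which is an assumption on our part, and it takes the instructions per clock tick from the table above.

```python
from fractions import Fraction

# Assumption: the nominal 2.66GHz clock is exactly 8/3 GHz (2666.67MHz).
CLOVERTOWN_HZ = Fraction(8_000_000_000, 3)
I4004_HZ = 108_000

# Instructions per clock tick, taken from the table:
# 1 for the 4004, 16 for the complete quad-core chip.
I4004_IPC = 1
CLOVERTOWN_IPC = 16

# Theoretical peak instructions per second for each chip.
i4004_ips = I4004_HZ * I4004_IPC
clovertown_ips = CLOVERTOWN_HZ * CLOVERTOWN_IPC

# How much faster the clock ticks, and how many 4004s match one Clovertown.
clock_ratio = CLOVERTOWN_HZ / I4004_HZ
throughput_ratio = clovertown_ips / i4004_ips

print(f"4004:         {i4004_ips:,} instructions/s")
print(f"Clovertown:   {float(clovertown_ips) / 1e9:.1f} billion instructions/s")
print(f"Clock ratio:  {int(clock_ratio):,}x")
print(f"4004s needed: {int(throughput_ratio):,}")
```

Under that assumption the truncated results line up with the 24,691x clock ratio and the 395,061 chips mentioned in the text.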

Summer of servers, SPEC CPU

Last summer was the 'summer of servers' according to Intel, since it renewed its complete offering of server processors. It all started on May 23 with Dempsey, a 65nm dual-core Netburst Xeon that was launched along with a new chipset offering more than twice the bandwidth of the previous generation. A little over a month later, on June 26 to be precise, Woodcrest was released, based on the new Core architecture and with buses that were 25% quicker. July 18 was Itanium's turn, with the introduction of the dual-core Montecito, a chip with 24MB of L3 cache and over 1.7 billion transistors. Another month went by and on August 29 a new Xeon MP came out under the code name Tulsa, with 16MB of L3 cache. Clovertown, the first quad-core Xeon, followed on November 14 as a latecomer.

Intel multi-core processors

Intel put in extra hours in the second half of the past year to overtake the competition. All that AMD managed to do to counter that was introduce the Socket F models (which, except for the 2.8GHz flavour, did little to increase performance) and cut prices. That does not mean that AMD did badly in terms of contracts won and market share, quite the contrary, but it does mean that Intel carries a lot more weight than it has for a long time. Although the quad-core Opteron Barcelona might tip the balance again in six months, Intel is ahead in many benchmarks at the moment.

Before we turn to our own comparative database test, we show the current state of affairs in database country. For completeness: although the 'non-tweakers.net' scores collected below are real-world scores, they have been measured independently. This means that there may be differences in a number of parameters of the various system configurations. It also means that better results are not always fully due to hardware improvements: after all, the accompanying software also tends to get improved. But it is safe to assume that each system builder does its best to achieve the best results, and with this in mind the scores may be considered indicative.

* SPEC CPU

We start with SPECint_rate and SPECfp_rate, two benchmarks designed to measure raw processor performance. The 'INT' (integer) suite consists of a compiler, chess programs, compression and text processing, while the 'FP' (floating point) suite contains, among others, face recognition, neural networks and physics as well as chemistry simulations. All subtests are based on software that is also used 'in real life', but the code has been altered in certain spots to minimize, among other things, hard disk load and improve portability to other platforms.

The addition 'rate' indicates that we are not dealing with a test of a single chip, but of all cores in a system simultaneously, which makes bandwidth a significant factor. 'Peak' indicates that the compiler may be tuned to the maximum extent. A standard run demands that everything is built with the same settings, but for a 'peak' run every individual test may have its own parameters. Since Clovertown servers have a unique configuration of eight cores in two sockets, we compare them to systems that have eight cores in four sockets, as well as configurations with four cores in two sockets. In the integer benchmark we see that the new Xeon beats the competition: the Core architecture had already proved itself to be good at this sort of task, but improving on the best scores of four-socket systems is an impressive achievement.

SPECint_rate_peak2000
Processor | Sockets | Clock | Core | Score
Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 175
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 164
Power5 | 4 | 1.9GHz | - | 147
Itanium 2 9050 | 4 | 1.6GHz | Montecito | 134
Xeon MP 7041 | 4 | 3.0GHz | Paxville | 114
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 200
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 123
Opteron 2280 | 2 | 2.8GHz | Santa Rosa | 90.3
Power5+ | 2 | 2.1GHz | - | 90
Xeon 5080 | 2 | 3.73GHz | Dempsey | 82.8
Xeon DC | 2 | 2.8GHz | Paxville | 59.9
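
To make the comparison between two-socket and four-socket systems more tangible, the scores can be divided by the number of sockets. The sketch below does this for a handful of rows copied from the table above. The per-socket normalisation is our own illustration and not part of the SPEC methodology, and the helper name is hypothetical.

```python
# (processor, sockets, SPECint_rate_peak2000 score), copied from the table above.
results = [
    ("Opteron 8220", 4, 175.0),
    ("Xeon MP 7140", 4, 164.0),
    ("Xeon X5355", 2, 200.0),
    ("Xeon 5160", 2, 123.0),
    ("Opteron 2280", 2, 90.3),
]

def per_socket(score, sockets):
    """Hypothetical helper: throughput score divided by the socket count."""
    return score / sockets

# Rank the systems by per-socket score, highest first.
ranked = sorted(results, key=lambda r: per_socket(r[2], r[1]), reverse=True)

for name, sockets, score in ranked:
    print(f"{name:14s} {sockets} sockets  {per_socket(score, sockets):6.2f} per socket")
```

Seen this way, the two-socket Clovertown system delivers more than twice the per-socket integer throughput of either four-socket system in the list.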

In the FP benchmark the quad-core turns out not to do so well, presumably because of its limited bandwidth, which is something that SPECfp_rate can't get enough of. But we do see that Intel has made steady improvements over the course of the year, from a meagre 40.3 with Paxville to a respectable score of 104. That is still lower than the Opteron (although it would be a narrow victory without compiler magic), but the problem is that AMD still has the step to a quad-core architecture and 128-bit computation units ahead of it, while Clovertown has already taken that step. The Itanium can keep up, but this test remains one of AMD's favourites.

SPECfp_rate_peak2000
Processor | Sockets | Clock | Core | Score
Power5 | 4 | 1.9GHz | - | 249
Itanium 2 9050 | 4 | 1.6GHz | Montecito | 244
Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 178
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 110
Xeon MP 7041 | 4 | 3.0GHz | Paxville | 67.3
Power5+ | 2 | 2.1GHz | - | 149
Itanium 2 9050 | 2 | 1.6GHz | Montecito | 123
Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 119
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 104
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 85.9
Xeon 5080 | 2 | 3.73GHz | Dempsey | 66.5
Xeon DC | 2 | 2.8GHz | Paxville | 40.3

TPC-C, SAP-SD, and SPECjbb2005

SPEC CPU is an interesting test for all types of processor tasks, but there's more to life than pure maths. A common server task is running some type of database, and TPC-C is a benchmark that is often used as a measure of database performance. The test simulates the business processes of a distributor with multiple offices and hundreds of thousands of customers and products. Performance is measured in the number of transactions per minute. Since the test is sensitive to differences in the speed of the storage systems used as well as memory, a price/performance rating is given to make clear when a manufacturer inflates its score artificially by investing a couple of million in hard disks. What isn't reflected in the score is the operating systems and database packages used; these may be chosen freely. Itanium and Power get their best scores running Oracle under HP-UX and running DB2 under AIX, while the x86 chips get tested pretty much exclusively running SQL Server under Windows.

It turns out that Clovertown has the best performance as well as the best price/performance ratio of all two socket servers, but it does not measure up against the big boys with four. The Opteron does not convince at first sight, but whoever takes the absolute prices into consideration instead of performance per dollar, will note that the AMD machines are the cheapest. Incidentally, the same version of SQL Server (64 bit) was used to obtain the Opteron and Clovertown scores. Woodcrest, Dempsey, and Tulsa lack SP1, which might put them at a slight disadvantage.

TPC-C
Processor | Sockets | Clock | Core | Transactions per minute | Price/performance
Power5 | 4 | 1.9GHz | - | 429900 | $4.99
Itanium 2 9050 | 4 | 1.6GHz | Montecito | 359440 | $1.99
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 318407 | $1.88
Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 262989 | $2.09
Xeon MP 7041 | 4 | 3.0GHz | Paxville | 221017 | $8.27
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 240737 | $1.85
Itanium 2 9050 | 2 | 1.6GHz | Montecito | 230569 | $2.63
Power5 | 2 | 1.9GHz | - | 203440 | $3.93
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 169360 | $2.93
Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 139693 | $2.28
Xeon 5080 | 2 | 3.73GHz | Dempsey | 125954 | -
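
The remark about absolute prices can be checked against the table. The sketch below assumes that the dollar figure listed with each score is the price per transaction-per-minute, so that multiplying it by the score approximates the price of the complete system. That reading, and the helper names, are assumptions made for the sake of illustration. Dempsey is left out because no price is listed for it.

```python
# (processor, sockets, transactions per minute, dollars per transaction-per-minute),
# copied from the TPC-C table.
tpcc = [
    ("Power5", 4, 429900, 4.99),
    ("Itanium 2 9050", 4, 359440, 1.99),
    ("Xeon MP 7140", 4, 318407, 1.88),
    ("Opteron 8220", 4, 262989, 2.09),
    ("Xeon MP 7041", 4, 221017, 8.27),
    ("Xeon X5355", 2, 240737, 1.85),
    ("Itanium 2 9050", 2, 230569, 2.63),
    ("Power5", 2, 203440, 3.93),
    ("Xeon 5160", 2, 169360, 2.93),
    ("Opteron 2220", 2, 139693, 2.28),
]

def system_price(tpm, dollars_per_tpm):
    """Assumed total system price: score times price per unit of score."""
    return tpm * dollars_per_tpm

def cheapest(sockets):
    """Return the row with the lowest assumed system price for a socket count."""
    candidates = [row for row in tpcc if row[1] == sockets]
    return min(candidates, key=lambda row: system_price(row[2], row[3]))

for sockets in (2, 4):
    name, _, tpm, rate = cheapest(sockets)
    print(f"{sockets} sockets: {name} at roughly ${system_price(tpm, rate):,.0f}")
```

Under this assumption the Opteron system comes out as the cheapest in both the two-socket and the four-socket group, which matches the observation in the text.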

SAP-SD (Sales & Distribution) is a similar benchmark in that it also simulates business processes, but rather than transactions per minute, the final score is the number of users that can work on the system simultaneously. As the name indicates, this test is aimed more at the well-known ERP package than at the underlying database, but that is not to say that it is insensitive to factors outside the processor.

Unfortunately, prices may not be listed along with the test results. What we can do instead is check the CPU loads to see whether the system was pushed to its limit. In virtually all cases the load is above 95%, with the exception of the four-way Paxville. This means that the processor was the limiting factor in the rest of the results, and there would be little point in adding extra hard disks or memory. Here, too, we see Intel's comeback: the performance for two sockets has more than doubled and for four sockets it has increased by more than half, enough to overtake the Opteron in both cases.

SAP-SD 2-tier
Processor | Sockets | Clock | Core | Users
Itanium 2 9050 | 4 | 1.6GHz | Montecito | 2150
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 2127
Opteron 2280 | 4 | 2.8GHz | Santa Rosa | 1978
Xeon MP 7041 | 4 | 3.0GHz | Paxville | 1345
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 1806
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 1285
Opteron 2218 | 2 | 2.6GHz | Santa Rosa | 1047
Xeon 5080 | 2 | 3.73GHz | Dempsey | 1047
Xeon DC | 2 | 2.8GHz | Paxville | 788
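
The size of Intel's comeback in this test can be worked out from the table. The sketch below assumes that the Paxville rows are the baseline the text refers to, which the article does not state explicitly. The function name is hypothetical.

```python
def gain(new, old):
    """Relative improvement of `new` over `old`, e.g. 0.5 means 50% more users."""
    return new / old - 1

# Intel's generational gains (assumed baseline: the Paxville rows in the table).
two_socket_gain = gain(1806, 788)     # Clovertown vs two-socket Paxville
four_socket_gain = gain(2127, 1345)   # Tulsa vs four-socket Paxville

# Lead over the Opteron rows listed with the same socket count.
two_socket_lead = gain(1806, 1047)
four_socket_lead = gain(2127, 1978)

print(f"Two sockets:  {two_socket_gain:+.0%} vs Paxville, {two_socket_lead:+.0%} vs Opteron")
print(f"Four sockets: {four_socket_gain:+.0%} vs Paxville, {four_socket_lead:+.0%} vs Opteron")
```

With that baseline the two-socket score has indeed more than doubled and the four-socket score has grown by more than half, with a positive margin over the Opteron in both groups.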

The last business benchmark is SPECjbb2005, which simulates a three-tier architecture with the emphasis on the middle tier, the so-called 'business logic', in which all XML processing and such takes place. The test is written entirely in Java, so the performance of the virtual machine is at least as important as that of the software itself. The score is expressed in BOPS (business operations per second). And again, Clovertown dominates with a score that even four-socket systems cannot get close to.

SPECjbb2005
Processor | Sockets | Clock | Core | BOPS
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 178201
Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 143525
Itanium 2 9050 | 4 | 1.6GHz | Montecito | 138382
Power5+ | 4 | 1.65GHz | - | 127851
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 210065
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 130589
Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 80617
Xeon 5080 | 2 | 3.73GHz | Dempsey | 64482
Power5+ | 2 | 1.65GHz | - | 63544
Xeon DC | 2 | 2.8GHz | Paxville | 49233
UltraSparc T1 | 1 | 1.2GHz | Niagara | 74365

SPECweb2005 and summary

The last benchmark that we look at here is SPECweb2005, a test that looks at the performance of machines when put to use as web servers, and examines dynamic page creation (in PHP or JSP), encryption (SSL connections) and web services. The result is a weighted average of the number of simultaneous sessions in three scenarios: banking, shopping and support. The good performance of Niagara is noteworthy here: it is comparable to that of a dual Woodcrest or even a quad Tulsa. It cannot keep up with a dual Clovertown or quad Opteron, though.

SPECweb2005
Processor | Sockets | Clock | Core | Score
Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 20235
Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 14896
[*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 18160
Xeon 5160 | 2 | 3.0GHz | Woodcrest | 13257
Opteron 285 | 2 | 2.6GHz | Italy | 11293
Power5+ | 2 | 1.9GHz | - | 7881
Xeon 5080 | 2 | 3.73GHz | Dempsey | 6400
Xeon DC | 2 | 2.8GHz | Paxville | 5597
UltraSparc T1 | 1 | 1.2GHz | Niagara | 14001

In summary, we can conclude that Intel has made some significant gains during the past year. In the two socket segment we have seen average gains of 24% over the Opteron for the Woodcrest. The only test that Intel doesn't win is SPECfp_rate, but it has managed to bring down the difference considerably. Following Clovertown's introduction, the average performance difference between Xeon and Opteron went up to as much as 79%. Although AMD is certain to reduce that gap with the introduction of its own quad-core, it is at least half a year before that is released. For the time being, Intel can continue to improve its once-tarnished reputation and increase its market share, while the best AMD can do is reduce the damage with lower prices.
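
The article does not say how the average difference was calculated. As a plausibility check, the sketch below takes the arithmetic mean of the Xeon/Opteron score ratios over the six benchmarks, using the two-socket rows from the tables above. Both the averaging method and the function name are assumptions. Note that the Opteron model listed differs between benchmarks.

```python
# (Xeon X5355 score, two-socket Opteron score) per benchmark, from the tables above.
# The Opteron model listed varies per benchmark (2280, 2220, 2218 and 285).
clovertown_vs_opteron = {
    "SPECint_rate_peak2000": (200, 90.3),
    "SPECfp_rate_peak2000": (104, 119),
    "TPC-C": (240737, 139693),
    "SAP-SD 2-tier": (1806, 1047),
    "SPECjbb2005": (210065, 80617),
    "SPECweb2005": (18160, 11293),
}

def average_lead(pairs):
    """Assumed method: arithmetic mean of the score ratios, as a percentage lead."""
    ratios = [xeon / opteron for xeon, opteron in pairs.values()]
    return (sum(ratios) / len(ratios) - 1) * 100

print(f"Average Clovertown lead: {average_lead(clovertown_vs_opteron):.0f}%")
```

This simple mean lands on the 79% quoted for Clovertown. Feeding the Woodcrest rows into the same function gives a figure a little below the quoted 24%, so the published averages may rest on a slightly different selection of results or averaging method.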

In the four-way server segment the story is less clear. Intel has gone from being hopelessly behind to more or less equal performance, winning some tests and losing others. AMD is still ahead in terms of price and power consumption. Although Tulsa may be just about good enough to prevent further erosion of market share, Intel will have to come up with something more convincing to turn the tide, especially since Opteron is also getting four cores next year. Intel has pinned its hopes on Tigerton to fight off AMD's new generation.

4-way | Opteron | Tulsa | Difference
SPECint_rate_peak2000 | 175 | 164 | -6%
SPECfp_rate_peak2000 | 178 | 110 | -38%
TPC-C | 262989 | 318407 | 21%