By the Tweakers editorial staff

Database test: Intel Xeon 'Clovertown' X5355

14-12-2006 • 15:10

Introduction

This article is also available in Dutch.

On November 14 Intel introduced its first quad-core server chips in the form of the Xeon 5300 series, also known under the code name Clovertown. Although the chip is technically pretty much identical to the desktop version which we looked at earlier, it runs on a different platform and is meant for other tasks. In this article we look at the influence of the step from dual-core to quad-core on Intel's position and how Clovertown holds up in our database test.

* Old and new

Since the Xeon 5300 'Clovertown' is, with the exception of a faster bus, identical to the Core 2 Extreme QX6700 'Kentsfield' - whose architecture we looked at extensively in this review - this article has a somewhat more light-hearted introduction. In the table below, the new quad-core is compared to Intel's very first microprocessor, the 4004. This processor was in the spotlight again recently, because it was 35 years ago on November 13 that the chip hit the market.

| | 4004 | Clovertown | Difference |
|---|---|---|---|
| Introduction | 1971 | 2006 | 35 years later |
| Price | ~$935* | $1172 | 25% more expensive |
| Cores | 1 | 4 | 4x as many |
| Instructions per clock tick | 1 | 16 | 4x as many per core |
| Clock speed | 108kHz | 2.66GHz | 24,691x higher |
| Bus frequency | 108kHz | 1333MHz | 12,346x higher |
| Bandwidth | 54KB/s | 10.6GB/s | 197,530x more |
| Manufacturing process | 10000nm | 65nm | 154x smaller |
| Wafer size | 50mm | 300mm | 36x larger surface |
| Transistors | 2300 | 582 million | 63,261x more per core |
| Physical size | 12mm² | 286mm² | 6x larger per core |
| Contact points | 16 | 771 | 48x more |
| Registers | 4 bit | 64 bit | 16x wider |
| Address space | 640KB | 16EB | Unimaginable |
| Instructions | 46 | ~700 | 15x as many |
| Consumption | 1W | 120W | 30x higher per core |
| Voltage | 15V | 1.34V | >90% lower |
| Current | 0.07A | 90A | 1286x higher |

* Corrected for inflation; the original price was $200. Collectors are paying $1000 for it.

The difference is enormous: with its clock of roughly a tenth of a megahertz, the 4004 could perform a little over 100,000 instructions per second, while Clovertown can manage almost 43 billion in the same time span. To equal the theoretical computing power of a quad-core Xeon, 395,061 4004s would be needed, and that ignores the fact that one does its math with 4 bits while the other uses 64. We'll spare Intel's first-born a crushing comparison in terms of performance per watt and per dollar, but it should be clear that the small steps we have seen over the years add up to extreme differences when taken across a few decades.
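
The arithmetic behind these figures can be retraced with a few lines of Python. This is only a sketch of the theoretical peak: the constant names are our own, and the exact Xeon clock of 2,666.67MHz (rather than a flat 2.66GHz) is an assumption that happens to reproduce the numbers quoted above.

```python
# Theoretical peak instruction throughput of the 4004 versus Clovertown.
I4004_CLOCK_HZ = 108_000        # 108kHz, one instruction per clock tick (from the table)
XEON_CLOCK_HZ = 8e9 / 3         # assumption: "2.66GHz" is really 2,666.67MHz
XEON_CORES = 4
XEON_IPC_PER_CORE = 4           # 16 instructions per tick spread over four cores

i4004_ips = I4004_CLOCK_HZ * 1
xeon_ips = XEON_CLOCK_HZ * XEON_CORES * XEON_IPC_PER_CORE
chips_needed = xeon_ips / i4004_ips

print(f"4004:         {i4004_ips:,} instructions per second")
print(f"Clovertown:   {xeon_ips / 1e9:.1f} billion instructions per second")
print(f"4004s needed: {int(chips_needed):,}")
```

With these inputs the Xeon comes out at roughly 42.7 billion instructions per second, which is where the figure of 395,061 4004s comes from.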

Summer of servers, SPEC CPU

Last summer was the 'summer of servers' according to Intel, since the company renewed its complete line of server processors. It all started on May 23 with Dempsey, a 65nm dual-core Netburst Xeon that was launched along with a new chipset offering more than twice the bandwidth of the previous generation. A little over a month later, on June 26 to be precise, Woodcrest was released, based on the new Core architecture and with buses that were 25% faster. Itanium followed on July 18 with the introduction of the dual-core Montecito, a chip with 24MB of L3 cache and over 1.7 billion transistors. Another month went by and on August 29 a new Xeon MP came out under the code name Tulsa, with 16MB of L3 cache. Clovertown, the first quad-core Xeon, followed on November 14 as a late encore.

Intel multi-core processors

Intel has put in extra hours in the second half of the past year to overtake the competition. All that AMD managed to do to counter that was the introduction of the Socket F models (which, except for the 2.8GHz flavour, did little to increase performance) and a price cut. That does not mean that AMD did badly in terms of contracts won and market share, quite the contrary, but it does mean that Intel carries a lot more weight than it has for a long time. Although the quad-core Opteron Barcelona might tip the balance again in six months, Intel is ahead in many benchmarks at the moment.

Before we turn to our own comparative database test, we shall show the current state of affairs in database country. For completeness: although the 'non-tweakers.net' scores collected below are real-world scores, they have been measured independently. This means that there may be differences in a number of parameters of the various system configurations. It also means that better results are not always fully due to hardware improvements: after all, the accompanying software also tends to improve. But it is safe to assume that each system builder does his best to achieve the best results, and with this in mind the scores may be considered indicative.

* SPEC CPU

We start with SPECint_rate and SPECfp_rate, two benchmarks designed to measure raw processor performance. The 'INT' (integer) suite consists of a compiler, chess programs, compression and text processing, while the 'FP' (floating point) suite contains, among others, face recognition, neural networks and physics as well as chemistry simulations. All subtests are based on software that is also used 'in real life', but the code has been altered in certain spots to minimize, among other things, hard disk load and improve portability to other platforms.

The addition 'rate' indicates that we are not dealing with a test of a single chip, but of all cores in a system simultaneously, which makes bandwidth a significant factor. 'Peak' indicates that the compiler may be tuned to the maximum extent. A standard run demands that everything is built with the same settings, but for a 'peak' run every individual test may have its own parameters. Since Clovertown servers have a unique configuration of eight cores in two sockets, we compare them to systems that have eight cores in four sockets, as well as configurations with four cores in two sockets. In the integer benchmark we see that the new Xeon beats the competition: the Core architecture had already proved itself to be good at this sort of task, but improving on the best scores of four-socket systems is an impressive achievement.

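SPECint_rate_peak2000

| Processor | Sockets | Clock | Code name | Score |
|---|---|---|---|---|
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 175 |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 164 |
| Power5 | 4 | 1.9GHz | | 147 |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito | 134 |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville | 114 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 200 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 123 |
| Opteron 2280 | 2 | 2.8GHz | Santa Rosa | 90.3 |
| Power5+ | 2 | 2.1GHz | | 90 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 82.8 |
| Xeon DC | 2 | 2.8GHz | Paxville | 59.9 |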

In the FP benchmark it turns out that the quad-core doesn't do so well, which is presumably due to its limited bandwidth – which is something that SPECfp_rate can't get enough of. But we do see that Intel has made steady improvements over the course of the year, from a meagre 40.3 with the Paxville to a respectable score of 104. That is still lower than the Opteron (although it would be a narrow victory without compiler magic) but the problem is that AMD still has the step to quad-core architecture and 128 bit computation units ahead, while Clovertown has already been there. The Itanium can keep up, but this test remains one of AMD's favourites.

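SPECfp_rate_peak2000

| Processor | Sockets | Clock | Code name | Score |
|---|---|---|---|---|
| Power5 | 4 | 1.9GHz | | 249 |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito | 244 |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 178 |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 110 |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville | 67.3 |
| Power5+ | 2 | 2.1GHz | | 149 |
| Itanium 2 9050 | 2 | 1.6GHz | Montecito | 123 |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 119 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 104 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 85.9 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 66.5 |
| Xeon DC | 2 | 2.8GHz | Paxville | 40.3 |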

TPC-C, SAP-SD, and SPECjbb2005

SPEC CPU is an interesting test for all types of processor tasks, but there's more to life than pure maths. A common server task is running some type of database, and TPC-C is a benchmark that is often used as a measure for database performance. The test simulates the business processes of a distributor with multiple offices and hundreds of thousands of customers and products. Performance is measured in the number of transactions per minute. Since the test is sensitive to differences in the speed of the storage systems used as well as memory, a price/performance rating is given to make clear when a manufacturer inflates his score artificially by investing a couple of million in hard disks. What isn't reflected in the score is the operating systems and database packages used; these may be chosen freely. Itanium and Power get their best scores running Oracle under HP-UX and running DB2 under AIX, while the x86 chips get tested pretty much exclusively running SQL Server under Windows.

It turns out that Clovertown has the best performance as well as the best price/performance ratio of all two socket servers, but it does not measure up against the big boys with four. The Opteron does not convince at first sight, but whoever takes the absolute prices into consideration instead of performance per dollar, will note that the AMD machines are the cheapest. Incidentally, the same version of SQL Server (64 bit) was used to obtain the Opteron and Clovertown scores. Woodcrest, Dempsey, and Tulsa lack SP1, which might put them at a slight disadvantage.

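TPC-C

| Processor | Sockets | Clock | Code name | Transactions/min | Price/performance |
|---|---|---|---|---|---|
| Power5 | 4 | 1.9GHz | | 429900 | $4.99 |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito | 359440 | $1.99 |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 318407 | $1.88 |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 262989 | $2.09 |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville | 221017 | $8.27 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 240737 | $1.85 |
| Itanium 2 9050 | 2 | 1.6GHz | Montecito | 230569 | $2.63 |
| Power5 | 2 | 1.9GHz | | 203440 | $3.93 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 169360 | $2.93 |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 139693 | $2.28 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 125954 | |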
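
The remark about absolute prices can be checked from the table itself. The sketch below assumes that the price/performance figure is the total system price divided by the number of transactions per minute, so that multiplying the two columns gives back an approximate system price. The function names are our own, and Dempsey is left out because no price is listed for it.

```python
# (processor, sockets, transactions per minute, dollars per transaction/minute)
# All values are copied from the TPC-C table above.
results = [
    ("Power5", 4, 429900, 4.99),
    ("Itanium 2 9050", 4, 359440, 1.99),
    ("Xeon MP 7140", 4, 318407, 1.88),
    ("Opteron 8220", 4, 262989, 2.09),
    ("Xeon MP 7041", 4, 221017, 8.27),
    ("Xeon X5355", 2, 240737, 1.85),
    ("Itanium 2 9050", 2, 230569, 2.63),
    ("Power5", 2, 203440, 3.93),
    ("Xeon 5160", 2, 169360, 2.93),
    ("Opteron 2220", 2, 139693, 2.28),
]


def implied_system_price(tpm, price_per_tpm):
    """Approximate system price, assuming price/performance = price / tpm."""
    return tpm * price_per_tpm


def cheapest_by_sockets(rows):
    """Return {sockets: (processor, implied price)} for the cheapest system."""
    cheapest = {}
    for name, sockets, tpm, price_per_tpm in rows:
        price = implied_system_price(tpm, price_per_tpm)
        if sockets not in cheapest or price < cheapest[sockets][1]:
            cheapest[sockets] = (name, price)
    return cheapest


cheapest = cheapest_by_sockets(results)
for sockets, (name, price) in sorted(cheapest.items()):
    print(f"{sockets} sockets: {name} at roughly ${price:,.0f}")
```

Under that assumption the Opteron systems come out as the least expensive configuration in both the two-socket and the four-socket group, even though their price per transaction is not the lowest.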

SAP-SD (Sales & Distribution) is a similar benchmark in that it also simulates business processes, but rather than transactions per minute, the final score is the number of users that can work on the system simultaneously. As the name indicates, this test is aimed more at the well-known ERP package than at the underlying database, but that is not to say that it is not sensitive to factors outside the processor.

Unfortunately, prices may not be listed along with the test results. What we can do instead is check the CPU loads to see whether the system was pushed to its limit. In virtually all cases the load is above 95%, with the exception of the four-way Paxville. This means that the processor was the limiting factor in the rest of the results, and there would be little point in adding extra hard disks or memory. Here, too, we see Intel's comeback: the performance for two sockets has more than doubled and for four sockets it has increased by more than half, enough to overtake the Opteron in both cases.

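SAP-SD 2-tier

| Processor | Sockets | Clock | Code name | Users |
|---|---|---|---|---|
| Itanium 2 9050 | 4 | 1.6GHz | Montecito | 2150 |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 2127 |
| Opteron 2280 | 4 | 2.8GHz | Santa Rosa | 1978 |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville | 1345 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 1806 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 1285 |
| Opteron 2218 | 2 | 2.6GHz | Santa Rosa | 1047 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 1047 |
| Xeon DC | 2 | 2.8GHz | Paxville | 788 |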
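
The claims about Intel's comeback in SAP-SD follow directly from the table. As a rough sketch (the variable names are our own, and we take Paxville as the 'previous' Xeon in both groups), the gains can be computed like this:

```python
# SAP-SD user counts copied from the table above.
# sockets: (previous Xeon "Paxville", current best Xeon, Opteron)
sap_sd = {
    2: (788, 1806, 1047),
    4: (1345, 2127, 1978),
}

gains = {}
for sockets, (old_xeon, new_xeon, opteron) in sap_sd.items():
    gain_over_old = new_xeon / old_xeon - 1     # improvement within Intel's own range
    lead_over_opteron = new_xeon / opteron - 1  # current margin over AMD
    gains[sockets] = (gain_over_old, lead_over_opteron)
    print(f"{sockets} sockets: {gain_over_old:+.0%} over the previous Xeon, "
          f"{lead_over_opteron:+.0%} versus Opteron")
```

For two sockets the gain over Paxville is well above 100%, and for four sockets it is above 50%, which matches the statement that performance more than doubled and increased by more than half respectively.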

The last business benchmark is SPECjbb2005, which simulates a three-tier architecture with the emphasis on the middle tier, the so-called 'business logic', in which all XML processing and such takes place. The test was written entirely in Java, so the performance of the virtual machine is at least as important as that of the software itself. The score is expressed in BOPS: business operations per second. And again, Clovertown dominates with a score that even four-socket systems cannot get close to.

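SPECjbb2005

| Processor | Sockets | Clock | Code name | BOPS |
|---|---|---|---|---|
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 178201 |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 143525 |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito | 138382 |
| Power5+ | 4 | 1.65GHz | | 127851 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 210065 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 130589 |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa | 80617 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 64482 |
| Power5+ | 2 | 1.65GHz | | 63544 |
| Xeon DC | 2 | 2.8GHz | Paxville | 49233 |
| UltraSparc T1 | 1 | 1.2GHz | Niagara | 74365 |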

SPECweb2005 and summary

The last benchmark that we look at here is SPECweb2005, a test that looks at the performance of machines when put to use as web servers, and examines dynamic page creation (in PHP or JSP), encryption (SSL connections) and web services. The result is a weighted average of the number of simultaneous sessions in three scenarios: banking, shopping and support. The good performance of Niagara is noteworthy here: its performance is comparable to that of a dual Woodcrest or even a quad Tulsa. It cannot keep up with a dual Clovertown or quad Opteron though.

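SPECweb2005

| Processor | Sockets | Clock | Code name | Score |
|---|---|---|---|---|
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa | 20235 |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa | 14896 |
| [*] Xeon X5355 | 2 | 2.66GHz | Clovertown | 18160 |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest | 13257 |
| Opteron 285 | 2 | 2.6GHz | Italy | 11293 |
| Power5+ | 2 | 1.9GHz | | 7881 |
| Xeon 5080 | 2 | 3.73GHz | Dempsey | 6400 |
| Xeon DC | 2 | 2.8GHz | Paxville | 5597 |
| UltraSparc T1 | 1 | 1.2GHz | Niagara | 14001 |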

In summary, we can conclude that Intel has made some significant gains during the past year. In the two socket segment we have seen average gains of 24% over the Opteron for the Woodcrest. The only test that Intel doesn't win is SPECfp_rate, but it has managed to bring down the difference considerably. Following Clovertown's introduction, the average performance difference between Xeon and Opteron went up to as much as 79%. Although AMD is certain to reduce that gap with the introduction of its own quad-core, it is at least half a year before that is released. For the time being, Intel can continue to improve its once-tarnished reputation and increase its market share, while the best AMD can do is reduce the damage with lower prices.

In the four-way server segment the story is less clear. Intel has gone from being hopelessly behind to more or less equal performance, winning some tests and losing others. AMD is still ahead in terms of price and power consumption. Although Tulsa may be just about good enough to prevent further erosion of market share, Intel will have to come up with something more convincing to turn the tide, especially since Opteron is also getting four cores next year. Intel has pinned its hopes on Tigerton to fight off AMD's new generation.

| 4-way | Opteron | Tulsa | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 175 | 164 | -6% |
| SPECfp_rate_peak2000 | 178 | 110 | -38% |
| TPC-C | 262989 | 318407 | 21% |
| SAP-SD | 1978 | 2127 | 8% |
| SPECjbb2005 | 143525 | 178201 | 24% |
| SPECweb2005 | 20235 | 14896 | -26% |
| Average | | | -3% |

| 2-way | Opteron | Clovertown | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 90.3 | 200 | 121% |
| SPECfp_rate_peak2000 | 119 | 104 | -13% |
| TPC-C | 139693 | 240737 | 72% |
| SAP-SD | 1047 | 1806 | 72% |
| SPECjbb2005 | 80617 | 210065 | 161% |
| SPECweb2005 | 11293 | 18160 | 61% |
| Average | | | 79% |

| 2-way | Opteron | Woodcrest | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 90.3 | 123 | 36% |
| SPECfp_rate_peak2000 | 119 | 85.9 | -18% |
| TPC-C | 139693 | 169360 | 21% |
| SAP-SD | 1047 | 1285 | 23% |
| SPECjbb2005 | 80617 | 130589 | 63% |
| SPECweb2005 | 11293 | 13257 | 17% |
| Average | | | 24% |

| 2-way | Woodcrest | Clovertown | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 123 | 200 | 63% |
| SPECfp_rate_peak2000 | 85.9 | 104 | 21% |
| TPC-C | 169360 | 240737 | 42% |
| SAP-SD | 1285 | 1806 | 41% |
| SPECjbb2005 | 130589 | 210065 | 61% |
| SPECweb2005 | 13257 | 18160 | 37% |
| Average | | | 44% |
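
For readers who want to retrace how such a summary is put together, here is a small sketch that recomputes the two-way Opteron versus Clovertown comparison. It assumes the average is a plain arithmetic mean of the six percentage differences; the helper name `pct_diff` is our own.

```python
# (Opteron score, Clovertown score) per benchmark, two-socket systems,
# copied from the summary above.
scores = {
    "SPECint_rate_peak2000": (90.3, 200),
    "SPECfp_rate_peak2000": (119, 104),
    "TPC-C": (139693, 240737),
    "SAP-SD": (1047, 1806),
    "SPECjbb2005": (80617, 210065),
    "SPECweb2005": (11293, 18160),
}


def pct_diff(baseline, challenger):
    """Percentage by which the challenger's score differs from the baseline."""
    return (challenger / baseline - 1) * 100


diffs = {name: pct_diff(opteron, xeon) for name, (opteron, xeon) in scores.items()}
average = sum(diffs.values()) / len(diffs)

for name, diff in diffs.items():
    print(f"{name:<24}{diff:+5.0f}%")
print(f"{'Average':<24}{average:+5.0f}%")
```

Swapping in the columns of one of the other comparisons gives the corresponding averages, bearing in mind that small rounding differences are possible.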

Test hardware

Now that we have taken a step back to look at the overall situation, it is time to return to our own reality: the Tweakers.net database. For the purpose of this review, Melrow supplied us with a server with two Xeon X5355 processors and 8GB of FBD667 memory. The machine is a QX208SATA-G3 that has been fitted with Intel's own S5000PSL motherboard. Besides the two Xeons, this 2U rack mount has, as standard, room for six SATA disks in RAID0, RAID1 or RAID10 (upgradeable with an extra controller to eight disks and RAID5), dual gigabit ethernet, a DVD drive and a single 500W power supply. The fact that there is no redundant power supply makes it somewhat more sensitive to power failure than some of the other machines that we have tested, but the stability of Intel's server boards tends to be good.

Melrow dual quad-core Clovertown - Server
Melrow dual quad-core Clovertown - Processors and memory

The X5355 that we received is the top model of the Clovertown series, with a clock speed of 2.66GHz, a 120W TDP and a 1172 dollar price tag. At the moment, there are three cheaper models in circulation with lower clock speed and a more modest TDP of 80W. Two more versions are expected early next year, including a low voltage model which needs only 50W, in other words 12.5W per core. Below, we give an overview of the various quad-cores and a comparison of their prices as opposed to those in the Woodcrest and Opteron hemispheres:

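| Model | Clock | Bus | TDP | Price | Intro |
|---|---|---|---|---|---|
| X5355 | 2.66GHz | 1333MHz | 120W | $1172 | - |
| E5345 | 2.33GHz | 1333MHz | 80W | $851 | - |
| E5335 | 2.0GHz | 1333MHz | 80W | ? | Q1 |
| E5320 | 1.86GHz | 1066MHz | 80W | $690 | - |
| E5310 | 1.6GHz | 1066MHz | 80W | $455 | - |
| L5310 | 1.6GHz | 1066MHz | 50W | ? | Q1 |

| Opteron | Woodcrest | Clovertown |
|---|---|---|
| 1.8GHz $209 | 1.6GHz $209 | |
| 2.0GHz $255 | 1.86GHz $256 | |
| | 2.0GHz $316 | |
| 2.2GHz $377 | | |
| 2.4GHz $450 | 2.33GHz $455 | 1.6GHz $455 |
| 2.6GHz $611 | | |
| | 2.66GHz $690 | 1.86GHz $690 |
| 2.8GHz $786 | | |
| | 3.0GHz $851 | 2.33GHz $851 |
| | | 2.66GHz $1172 |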

Clovertown is clearly the most expensive chip in our test arsenal. In this article we compare it to Intel's top-of-the-line dual-core - the 3.0GHz Woodcrest - because that CPU comes closest in terms of pricing.

Melrow dual quad-core Clovertown - Processor & socket

MySQL 5.0.20a and 5.0.32-bk

The first test case is MySQL 5.0.20a, the same version that we used in all previous reviews. The scaling step from one to two cores does not reveal anything special. The lower clock speed does make Clovertown a little slower than Woodcrest, the top model, but it does not exhibit any unusual behaviour. The step from a single dual-core to two dual-cores also isn't too remarkable compared to earlier experiences. When we pit two dual-cores against a single quad-core, we see that things begin to run somewhat less smoothly. Since only one of the two buses can be used, the available bandwidth has effectively been cut in half. This makes performance go down by about 5% under a heavy load of 25 or more users. The final step to eight cores turns out to be dreadful: MySQL no longer understands what is going on, and performance drops below that of two dual-cores before collapsing altogether. On average, the loss in performance is 22% under heavy loads.

Clovertown vs. Woodcrest - MySQL 5.0.20a

This is not the first time we have seen MySQL exhibit this sort of behaviour: in the article on the Sun T2000 we reported the same problem. The MySQL developers are aware of the problem, but when we searched for a solution with Sun at the time, nothing could be done about it yet. For this test, we went looking for a solution once more, and it turned out that a patch is being readied. It hasn't been officially released, but a MySQL 5.0.32 snapshot from BitKeeper, the system that the developers use to manage their source code, allowed us a sneak preview of how things stand.

The latest MySQL developments are interesting: it appears as if part of the performance with a low number of cores has been traded for stability when more cores are used. On average, using a single core costs us around 10% of the performance as compared to version 5.0.20a, and with two cores this is down to 4%. At four cores, however, we register a gain of 1% and with eight cores things go 36% better. Also, the peak of 616 requests per second for 5.0.32-bk is clearly better than the 498 that we could squeeze out of 5.0.20a. But still, it turns out that MySQL does not fancy heavy loads too much: after the peak the number of requests per second drops sharply again.

MySQL 5.0.20a vs. 5.0.32-bk

PostgreSQL 8.2-dev

PostgreSQL tends to be the model pupil in our tests when it comes to scaling behaviour, and this review is no exception. After the messy diagrams of MySQL it is a relief to see the straight lines of PostgreSQL, which run according to what might be expected: up to four cores, the 2.66GHz Clovertown is a little slower than the 3.0GHz Woodcrest, and we see that the performance of a single quad-core is just a little below that of two dual-cores. However, pitted against two quad-cores, Woodcrest doesn't hold up and we break the 750 requests per second barrier for the first time. On average, under heavy loads Melrow's Clovertown machine is 19% faster than the current database server of our forum, GoT. Although the result is clearly better than those achieved with MySQL, it is a little meagre when the price difference is taken into consideration, as well as the higher TDP and the average gains that we saw in other benchmarks.

Clovertown vs. Woodcrest - PostgreSQL 8.2-dev

Fully loaded, the server consumes 355 Watts. On average, the use of PostgreSQL allowed 448,726 pages to be served in ten minutes, making for a performance/Watt ratio of 1264. Unfortunately, there's nothing out there yet to compare this to. The results we published earlier were obtained using an older version of the Linux kernel, while the new one is clearly better. A rough estimate based on figures that we do have indicates that Clovertown will probably offer slightly better performance per Watt than the 2.66GHz dual-core.
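
The performance/Watt ratio above is simply the number of pages served divided by the power draw. The sketch below retraces it and, purely as an illustration, derives two related figures that are not part of our own results: the average number of pages per second and the energy spent per page, assuming the server drew a constant 355 Watts for the full ten minutes.

```python
pages_served = 448_726   # pages served with PostgreSQL in one ten-minute run
power_watts = 355        # power consumption under full load
run_seconds = 10 * 60

pages_per_watt = pages_served / power_watts
pages_per_second = pages_served / run_seconds
# Assumption: the power draw stays at 355 Watts during the whole run.
joules_per_page = power_watts * run_seconds / pages_served

print(f"Performance/Watt: {pages_per_watt:.0f} pages per Watt")
print(f"Throughput:       {pages_per_second:.0f} pages per second")
print(f"Energy per page:  {joules_per_page:.2f} J")
```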

Conclusion

After lagging behind for years, Intel took back the performance crown with the introduction of the Xeon Woodcrest. AMD did not have a sufficient technical response to the new architecture and had to cut the prices of its Opterons to stay competitive. A calmer management team might have been content with that, but the new, aggressive Intel does not want to leave any doubt that the company is ahead again, and does not allow its competitor any breathing space. The step from two to four cores widens the gap between Xeon and Opteron considerably. Of course, it is not entirely fair to compare Intel's quad-core, which is based on a new architecture, to AMD's current K8 dual-core. But the fact of the matter is that this is precisely the choice customers face during the next six months. That period will see sales of x86 processors worth about 12 billion dollars, so there is a golden opportunity for Intel to win back market share.

AMD has pinned all of its hopes on Barcelona, the quad-core Opteron. According to the latest rumours, it will be introduced at a clock speed of 2.5GHz at most, a little under the speed of Intel's 2.66GHz quad-core. As yet, it is unclear what effect the changes to the 'K8L' design will have on performance, but if we make the simple assumption that it will do the same amount of work per clock tick, it will be at a small disadvantage as far as raw computing power is concerned. However, this is likely to be offset by the L3 cache and the integrated memory controller. Intel will have an answer to this in the form of higher clock speeds, possibly still at 65nm, or else with the coming 45nm production technology. Rumour has it that Intel's 45nm quad-cores will exceed 3.0GHz, which is something AMD will have a hard time following.

AMD K8L-core with improvements

Barcelona has a further significant advantage over Clovertown: it scales to 4 and 8 sockets. The Xeon MP 'Tulsa' can barely stand up to the current Opteron, and will be helpless against the new generation of quad-cores. That is why Intel is putting a lot of work into Tigerton, a Clovertown version that can be used in heavier systems. The biggest problem is not the processor, which is virtually identical, but the chipsets. With four separate 1066MHz buses and a 64MB cache in the northbridge, Intel's own Clarksboro chipset will smooth the communication between the sockets considerably compared to today's standards. Meanwhile, IBM is working on X4, the successor of the X3 'Hurricane', which can scale to 32 sockets by simulating the Opteron's NUMA architecture in the chipset. How these will do compared to Barcelona is anyone's guess.

Clovertown, Kentsfield, and Tigerton

In our benchmark, Clovertown did not offer tremendous gains, and MySQL in particular was initially disappointing. The latest development version partially solves the problems: performance with eight cores is no longer dramatically bad, but on average, under heavy loads the result is no better than that of two dual-cores using the old version of the software. We did register a 19% higher peak, although Woodcrest may also profit from the changes in version 5.0.32-bk, so the gains cannot be attributed solely to the extra cores. PostgreSQL does better, but average gains of 19% are not really something to write home about, given that they are the result of doubling the number of cores in the machine.

Possibly, developments on the hardware side are going too fast for the open source community. But Intel's architecture may also be responsible: with limited bus bandwidth and no shared cache, AMD does not recognize Clovertown as a proper quad-core. We may also have to point the finger at ourselves and accept that this type of server is not intended for the Tweakers.net database. We shall find out whether this holds as soon as we get our hands on a heavier Opteron machine.

* Acknowledgements

Tweakers.net would like to thank Melrow for lending us a Clovertown server, Peter Zaitsev from the MySQL Performance Blog for checking our configuration, ACM and moto-moi for setting up and executing the benchmarks, and Mick de Neeve for the English translation of this review.

* Earlier articles in this series

13-11-2006: Intel Xeon 'Woodcrest' 3,0GHz (Apollo 5)
4-9-2006: Intel Xeon 'Woodcrest' 2,66GHz
30-7-2006: AMD Opteron Socket F 2,4GHz
27-7-2006: Sun UltraSparc T1 vs. AMD Opteron
19-4-2006: Xeon vs. Opteron, single- and dualcore (in Dutch)
