On November 14 Intel introduced its first quad-core server chips in the form of the Xeon 5300 series, also known under the code name Clovertown. Although the chip is technically almost identical to the desktop version we looked at earlier, it runs on a different platform and is meant for other tasks. In this article we look at how the step from dual-core to quad-core affects Intel's position and how Clovertown holds up in our database test.

Since the Xeon 5300 'Clovertown' is, with the exception of a faster bus, identical to the Core 2 Extreme QX6700 'Kentsfield', whose architecture we covered extensively in an earlier review, this article has a somewhat more light-hearted introduction. In the table below, the new quad-core is compared to Intel's very first microprocessor, the 4004. This processor was in the spotlight again recently, because November 13 marked 35 years since the chip hit the market.
| | 4004 | Clovertown | Difference |
|---|---|---|---|
| Introduction | 1971 | 2006 | 35 years later |
| Price | ~$935* | $1172 | 25% more expensive |
| Cores | 1 | 4 | 4x as many |
| Instructions per clock tick | 1 | 16 | 4x as many per core |
| Clock speed | 108kHz | 2.66GHz | 24,691x higher |
| Bus frequency | 108kHz | 1333MHz | 12,346x higher |
| Bandwidth | 54KB/s | 10.6GB/s | 197,530x more |
| Production technique | 10000nm | 65nm | 154x smaller |
| Wafer size | 50mm | 300mm | 36x larger surface |
| Transistors | 2300 | 582 million | 63,261x more per core |
| Physical size | 12mm² | 286mm² | 6x larger per core |
| Contact points | 16 | 771 | 48x more |
| Registers | 4 bit | 64 bit | 16x wider |
| Address space | 640KB | 16EB | Unimaginable |
| Instructions | 46 | ~700 | 15x as many |
| Power consumption | 1W | 120W | 30x higher per core |
| Voltage | 15V | 1.34V | >90% lower |
| Current | 0.07A | 90A | 1286x stronger |
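
The ratios in the Difference column are plain divisions, and the same arithmetic gives the number of 4004s needed to match one Clovertown in peak instruction rate. The sketch below is only an illustration: it assumes that the rounded 2.66GHz in the table stands for 8/3 GHz, an assumption that is not stated in the table but reproduces its clock ratio.

```python
# Figures from the comparison table. Treating 2.66GHz as 8/3 GHz is an
# assumption made here so that the clock ratio matches the table.
I4004 = {"clock_hz": 108e3, "instr_per_clock_per_core": 1, "cores": 1}
CLOVERTOWN = {"clock_hz": 8e9 / 3, "instr_per_clock_per_core": 4, "cores": 4}


def peak_instructions_per_second(chip):
    """Theoretical peak: clock x instructions per clock per core x cores."""
    return chip["clock_hz"] * chip["instr_per_clock_per_core"] * chip["cores"]


clock_ratio = CLOVERTOWN["clock_hz"] / I4004["clock_hz"]
peak_4004 = peak_instructions_per_second(I4004)
peak_clovertown = peak_instructions_per_second(CLOVERTOWN)
chips_needed = peak_clovertown / peak_4004

print(f"4004 peak:        {peak_4004:,.0f} instructions/s")
print(f"Clovertown peak:  {peak_clovertown:,.0f} instructions/s")
print(f"Clock ratio:      {clock_ratio:,.0f}x")
print(f"4004s per Xeon:   {chips_needed:,.0f}")
```

This is a theoretical peak only; it ignores the difference in word width and everything that keeps real code from issuing the maximum number of instructions every cycle.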
The difference is enormous: with its clock of a tenth of a megahertz, the 4004 could perform a little over 100,000 instructions per second, while Clovertown can manage almost 43 billion in the same time span. To equal the theoretical computing power of a quad-core Xeon, 395,061 4004s would be needed, and that does not even take into account that one does its math with 4 bits while the other uses 64 bits. We'll spare Intel's first-born a crushing comparison in terms of performance per Watt and per dollar, but it should be clear that the small steps we have seen over the years add up to extreme differences across a few decades.

Summer of servers, SPEC CPU

Intel put in extra hours in the second half of the past year to overtake the competition. All AMD managed to do in response was to introduce the Socket F models (which, except for the 2.8GHz flavour, did little to increase performance) and cut prices. That does not mean AMD did badly in terms of contracts won and market share, quite the contrary, but it does mean that Intel carries a lot more weight than it has for a long time. Although the quad-core Opteron 'Barcelona' might tip the balance again in six months, Intel is ahead in many benchmarks at the moment.
Before we turn to our own comparative database test, we show the current state of affairs in database country. For completeness: although the scores collected below that do not come from Tweakers.net are real measurements, they were obtained independently of one another. This means that the various system configurations may differ in a number of parameters. It also means that better results are not always entirely due to hardware improvements: after all, the accompanying software tends to improve as well. But it is safe to assume that each system builder does his best to achieve the best results, and with this in mind the scores may be considered indicative.
We start with SPECint_rate and SPECfp_rate, two benchmarks designed to measure raw processor performance. The 'INT' (integer) suite consists of a compiler, chess programs, compression and text processing, while the 'FP' (floating point) suite contains, among others, face recognition, neural networks and physics as well as chemistry simulations. All subtests are based on software that is also used 'in real life', but the code has been altered in certain spots to minimize, among other things, hard disk load and improve portability to other platforms.
The suffix 'rate' indicates that we are not dealing with a test of a single chip, but of all cores in a system simultaneously, which makes bandwidth a significant factor. 'Peak' indicates that the compiler may be tuned to the maximum extent: a standard run demands that everything is built with the same settings, but for a 'peak' run every individual test may have its own parameters. Since Clovertown servers have a unique configuration of eight cores in two sockets, we compare them to systems that have eight cores in four sockets, as well as configurations with four cores in two sockets. In the integer benchmark the new Xeon beats the competition: the Core architecture had already proved itself to be good at this sort of task, but improving on the best scores of four-socket systems is an impressive achievement.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Power5 | 4 | 1.9GHz | |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa |
| Power5+ | 2 | 2.1GHz | |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
| Xeon DC | 2 | 2.8GHz | Paxville |
In the FP benchmark the quad-core doesn't do so well, presumably because of its limited bandwidth, something that SPECfp_rate can't get enough of. Still, Intel has made steady improvements over the course of the year, from a meagre 40.3 with Paxville to a respectable score of 104. That is still lower than the Opteron (although it would be a narrow victory without compiler magic), but the problem is that AMD still has the step to a quad-core architecture and 128-bit computation units ahead of it, while Clovertown has already taken that step. The Itanium can keep up, but this test remains one of AMD's favourites.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Power5 | 4 | 1.9GHz | |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville |
| Power5+ | 2 | 2.1GHz | |
| Itanium 2 9050 | 2 | 1.6GHz | Montecito |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
| Xeon DC | 2 | 2.8GHz | Paxville |
TPC-C, SAP-SD, and SPECjbb2005
It turns out that Clovertown has the best performance as well as the best price/performance ratio of all two-socket servers, but it does not measure up to the big boys with four. The Opteron does not convince at first sight, but whoever looks at absolute prices instead of performance per dollar will note that the AMD machines are the cheapest. Incidentally, the same version of SQL Server (64-bit) was used to obtain the Opteron and Clovertown scores. The Woodcrest, Dempsey, and Tulsa systems lack SP1, which might put them at a slight disadvantage.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Power5 | 4 | 1.9GHz | |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Itanium 2 9050 | 2 | 1.6GHz | Montecito |
| Power5 | 2 | 1.9GHz | |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
SAP-SD (Sales & Distribution) is a similar benchmark in that it also simulates business processes, but the final score is the number of users that can work on the system simultaneously rather than the number of transactions per minute. As the name indicates, this test is aimed more at the well-known ERP package than at the underlying database, but that is not to say that it is insensitive to factors outside the processor.
Unfortunately, prices may not be published along with the test results. What we can do instead is check the CPU loads to see whether the system was pushed to its limit. In virtually all cases the load is above 95%, the exception being the four-way Paxville. This means that the processor was the limiting factor in the other results, and there would be little point in adding extra hard disks or memory. Here, too, we see Intel's comeback: performance has more than doubled for two sockets and increased by more than half for four, enough to overtake the Opteron in both cases.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Itanium 2 9050 | 4 | 1.6GHz | Montecito |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Xeon MP 7041 | 4 | 3.0GHz | Paxville |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Opteron 2218 | 2 | 2.6GHz | Santa Rosa |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
| Xeon DC | 2 | 2.8GHz | Paxville |
The last business benchmark is SPECjbb2005, which simulates a three-tier architecture with the emphasis on the middle tier, the so-called 'business logic', in which all XML processing and the like takes place. The test is written entirely in Java, so the performance of the virtual machine is at least as important as that of the software itself. The score is expressed in BOPS, business operations per second. And again Clovertown dominates, with a score that even four-socket systems cannot get close to.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Itanium 2 9050 | 4 | 1.6GHz | Montecito |
| Power5+ | 4 | 1.65GHz | |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Opteron 2220 | 2 | 2.8GHz | Santa Rosa |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
| Power5+ | 2 | 1.65GHz | |
| Xeon DC | 2 | 2.8GHz | Paxville |
| UltraSparc T1 | 1 | 1.2GHz | Niagara |
SPECweb2005 and summary
The last benchmark that we look at here is SPECweb2005, a test that measures the performance of machines used as web servers and examines dynamic page generation (in PHP or JSP), encryption (SSL connections) and web services. The result is a weighted average of the number of simultaneous sessions in three scenarios: banking, shopping and support. The good performance of Niagara is noteworthy here: it is comparable to that of a dual Woodcrest or even a quad Tulsa. It cannot keep up with a dual Clovertown or a quad Opteron, though.
| Processor | Sockets | Clock speed | Code name |
|---|---|---|---|
| Opteron 8220 | 4 | 2.8GHz | Santa Rosa |
| Xeon MP 7140 | 4 | 3.4GHz | Tulsa |
| Xeon X5355 | 2 | 2.66GHz | Clovertown |
| Xeon 5160 | 2 | 3.0GHz | Woodcrest |
| Opteron 285 | 2 | 2.6GHz | Italy |
| Power5+ | 2 | 1.9GHz | |
| Xeon 5080 | 2 | 3.73GHz | Dempsey |
| Xeon DC | 2 | 2.8GHz | Paxville |
| UltraSparc T1 | 1 | 1.2GHz | Niagara |
In summary, we can conclude that Intel has made significant gains during the past year. In the two-socket segment, Woodcrest gained an average of 24% over the Opteron. The only test that Intel doesn't win is SPECfp_rate, but it has managed to narrow the gap considerably. Following Clovertown's introduction, the average performance difference between Xeon and Opteron went up to as much as 79%. Although AMD is certain to reduce that gap with the introduction of its own quad-core, that chip is at least half a year away. For the time being, Intel can continue to restore its once-tarnished reputation and increase its market share, while the best AMD can do is limit the damage with lower prices.
In the four-way server segment the story is less clear. Intel has gone from being hopelessly behind to more or less equal performance, winning some tests and losing others. AMD is still ahead in terms of price and power consumption. Although Tulsa may be just about good enough to prevent further erosion of market share, Intel will have to come up with something more convincing to turn the tide, especially since the Opteron is also getting four cores next year. Intel has pinned its hopes on Tigerton to fight off AMD's new generation.
| 4-way | Opteron | Tulsa | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 175 | 164 | -6% |
| SPECfp_rate_peak2000 | 178 | 110 | -38% |
| TPC-C | 262989 | 318407 | 21% |
| SAP-SD | 1978 | 2127 | 8% |
| SPECjbb2005 | 143525 | 178201 | 24% |
| SPECweb2005 | 20235 | 14896 | -26% |
| Average | | | -3% |

| 2-way | Opteron | Clovertown | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 90.3 | 200 | 121% |
| SPECfp_rate_peak2000 | 119 | 104 | -13% |
| TPC-C | 139693 | 240737 | 72% |
| SAP-SD | 1047 | 1806 | 72% |
| SPECjbb2005 | 80617 | 210065 | 161% |
| SPECweb2005 | 11293 | 18160 | 61% |
| Average | | | 79% |

| 2-way | Opteron | Woodcrest | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 90.3 | 123 | 36% |
| SPECfp_rate_peak2000 | 119 | 85.9 | -18% |
| TPC-C | 139693 | 169360 | 21% |
| SAP-SD | 1047 | 1285 | 23% |
| SPECjbb2005 | 80617 | 130589 | 63% |
| SPECweb2005 | 11293 | 13257 | 17% |
| Average | | | 24% |

| 2-way | Woodcrest | Clovertown | Difference |
|---|---|---|---|
| SPECint_rate_peak2000 | 123 | 200 | 63% |
| SPECfp_rate_peak2000 | 85.9 | 104 | 21% |
| TPC-C | 169360 | 240737 | 42% |
| SAP-SD | 1285 | 1806 | 41% |
| SPECjbb2005 | 130589 | 210065 | 61% |
| SPECweb2005 | 13257 | 18160 | 37% |
| Average | | | 44% |
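
The percentages in the summary tables can be recomputed from the listed scores. The sketch below does this for the two-socket Opteron versus Clovertown comparison; it assumes that the Average row is the plain arithmetic mean of the per-benchmark differences, which the tables do not state explicitly. The function and variable names are ours.

```python
def percent_difference(baseline, challenger):
    """Relative difference of challenger over baseline, in percent."""
    return (challenger / baseline - 1.0) * 100.0


# (Opteron score, Clovertown score), copied from the two-socket summary table.
opteron_vs_clovertown = {
    "SPECint_rate_peak2000": (90.3, 200),
    "SPECfp_rate_peak2000": (119, 104),
    "TPC-C": (139693, 240737),
    "SAP-SD": (1047, 1806),
    "SPECjbb2005": (80617, 210065),
    "SPECweb2005": (11293, 18160),
}

differences = {
    name: percent_difference(opteron, clovertown)
    for name, (opteron, clovertown) in opteron_vs_clovertown.items()
}
# Assumption: the average is an unweighted mean over the six benchmarks.
average = sum(differences.values()) / len(differences)

for name, diff in differences.items():
    print(f"{name:24s}{diff:+5.0f}%")
print(f"{'Average':24s}{average:+5.0f}%")
```

Swapping in the score pairs from one of the other tables gives the corresponding column. An unweighted mean lets a single outlier such as SPECjbb2005 pull the average up noticeably, which is worth keeping in mind when reading the headline figure.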
Test hardware


The X5355 that we received is the top model of the Clovertown series, with a clock speed of 2.66GHz, a 120W TDP and a price tag of $1172. At the moment, there are three cheaper models in circulation with lower clock speeds and a more modest TDP of 80W. Two more versions are expected early next year, including a low-voltage model that needs only 50W, in other words 12.5W per core. Below, we give an overview of the various quad-cores and a comparison of their prices with those in the Woodcrest and Opteron camps:
| Model | Clock | Bus | TDP | Price | Intro |
|---|---|---|---|---|---|
| X5355 | 2.66GHz | 1333MHz | 120W | $1172 | - |
| E5345 | 2.33GHz | 1333MHz | 80W | $851 | - |
| E5335 | 2.0GHz | 1333MHz | 80W | ? | Q1 |
| E5320 | 1.86GHz | 1066MHz | 80W | $690 | - |
| E5310 | 1.6GHz | 1066MHz | 80W | $455 | - |
| L5310 | 1.6GHz | 1066MHz | 50W | ? | Q1 |
| Opteron | Price | Woodcrest | Price | Clovertown | Price |
|---|---|---|---|---|---|
| 1.8GHz | $209 | 1.6GHz | $209 | | |
| 2.0GHz | $255 | 1.86GHz | $256 | | |
| | | 2.0GHz | $316 | | |
| 2.2GHz | $377 | | | | |
| 2.4GHz | $450 | 2.33GHz | $455 | 1.6GHz | $455 |
| 2.6GHz | $611 | | | | |
| | | 2.66GHz | $690 | 1.86GHz | $690 |
| 2.8GHz | $786 | | | | |
| | | 3.0GHz | $851 | 2.33GHz | $851 |
| | | | | 2.66GHz | $1172 |
Clovertown is clearly the most expensive chip in our test arsenal. In this article we compare it to Intel's top-of-the-line dual-core - the 3.0GHz Woodcrest - because that CPU comes closest in terms of pricing.
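
The 12.5W per core quoted for the low-voltage model is simply the TDP divided over four cores, and the same division can be applied to the list prices. The following sketch does that for the models in the overview table; the helper name `per_core` is ours, and models without an announced price are left as `None` rather than guessed.

```python
CORES_PER_CHIP = 4  # every Clovertown model is a quad-core

# (model, TDP in watts, list price in dollars or None if not yet announced)
models = [
    ("X5355", 120, 1172),
    ("E5345", 80, 851),
    ("E5335", 80, None),
    ("E5320", 80, 690),
    ("E5310", 80, 455),
    ("L5310", 50, None),
]


def per_core(value):
    """Divide a per-chip figure over the four cores; pass None through."""
    return None if value is None else value / CORES_PER_CHIP


per_core_figures = {
    model: (per_core(tdp), per_core(price)) for model, tdp, price in models
}

for model, (watts, dollars) in per_core_figures.items():
    price_text = "unknown" if dollars is None else f"${dollars:.2f}"
    print(f"{model}: {watts:.1f}W per core, {price_text} per core")
```

TDP is a design ceiling rather than measured consumption, so these per-core wattages are upper bounds and not what a core actually draws under load.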

MySQL 5.0.20a and 5.0.32-bk

This is not the first time we have seen MySQL exhibit this sort of behaviour: in the article on the Sun T2000 we reported the same problem. The MySQL developers are aware of the problem, but when we searched for a solution together with Sun at the time, nothing could be done about it yet. For this test we went looking for a solution once more, and it turned out that a patch is being readied. It hasn't been officially released, but a MySQL 5.0.32 snapshot from BitKeeper, the system the developers use to manage their source code, gave us a sneak preview of how things stand.
The latest MySQL developments are interesting: it appears as if part of the performance with a low number of cores has been traded for stability when more cores are used. On average, using a single core costs us around 10% of the performance as compared to version 5.0.20a, and with two cores this is down to 4%. At four cores, however, we register a gain of 1% and with eight cores things go 36% better. Also, the peak of 616 requests per second for 5.0.32-bk is clearly better than the 498 that we could squeeze out of 5.0.20a. But still, it turns out that MySQL does not fancy heavy loads too much: after the peak the number of requests per second drops sharply again.

PostgreSQL 8.2-dev

Fully loaded, the server consumes 355 Watts. On average, the use of PostgreSQL allowed 448,726 pages to be served in ten minutes, making for a performance/Watt ratio of 1264. Unfortunately, there is nothing to compare this to yet: the results we published earlier were obtained with an older version of the Linux kernel, and the new one is clearly better. A coarse estimate based on the figures we do have indicates that Clovertown will probably offer slightly better performance per Watt than the 2.66GHz dual-core.
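
The performance/Watt figure quoted above is the number of pages served in the ten-minute run divided by the power draw under full load. A minimal sketch of that calculation, with variable names of our own choosing:

```python
pages_served = 448_726    # pages served in the ten-minute PostgreSQL run
run_seconds = 10 * 60     # duration of the run
power_draw_watts = 355    # consumption of the fully loaded server

# Ratio as used in the text: pages per ten-minute run, per Watt.
pages_per_watt = pages_served / power_draw_watts

# The same data expressed per second, as an alternative view.
pages_per_second = pages_served / run_seconds

print(f"Pages per Watt (ten-minute run): {pages_per_watt:.0f}")
print(f"Pages per second:                {pages_per_second:.1f}")
```

Because the ratio depends on the length of the run, it is only comparable with results from tests of the same duration.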
Conclusion
AMD has pinned all of its hopes on Barcelona, the quad-core Opteron. According to the latest rumours, it will be introduced at a clock speed of 2.5GHz at most, a little under the speed of Intel's 2.66GHz quad-core. As yet, it is unclear what effect the changes in the 'K8L' design will have on performance, but if we make the simple assumption that it will do the same amount of work per clock tick, it will be at a small disadvantage as far as raw computing power is concerned. However, this is likely to be compensated for by the L3 cache and the integrated memory controller. Intel will have an answer to this in the form of higher clock speeds, possibly still at 65nm, or else with the coming 45nm production technology. Rumour has it that Intel's 45nm quad-cores will exceed 3.0GHz, which is something AMD will have a hard time matching.

Barcelona has a further significant advantage over Clovertown: it scales to 4 and 8 sockets. The Xeon MP 'Tulsa' can barely stand up to the current Opteron, and is helpless against the new generation of quad-cores. That is why Intel is putting a lot of work into Tigerton, a Clovertown version that can be used in heavier systems. The biggest problem is not the processor, which is virtually identical, but the chipsets. With four separate 1066MHz buses and a 64MB cache in the northbridge, Intel's own Clarksboro chipset will smooth the communication between the sockets considerably compared to today's standards. Meanwhile, IBM is working on X4, the successor of the X3 'Hurricane', which can scale to 32 sockets by simulating the Opteron's NUMA architecture in the chipset. How these will do compared to Barcelona is anyone's guess.

In our benchmark, Clovertown did not offer tremendous gains, and MySQL in particular was initially disappointing. The latest development version partially solves the problems: performance with eight cores is no longer dramatically bad, but on average, under heavy load, it is no better than a dual dual-core setup running the old version of the software. We did register a 19% higher peak, although Woodcrest may also profit from the changes in version 5.0.32-bk, so the gains cannot be attributed solely to the extra cores. PostgreSQL does better, but an average gain of 19% is not really something to write home about, given that it is the result of doubling the number of cores in the machine.
Possibly, developments on the hardware side are going too fast for the open source community. But Intel's architecture may also be responsible: because of its limited bus bandwidth and the absence of a shared cache, AMD does not regard Clovertown as a proper quad-core. We may also have to point the finger at ourselves and accept that this type of server is not intended for the Tweakers.net database. But we shall find out whether this holds as soon as we get our hands on a heavier Opteron machine.
Tweakers.net would like to thank Melrow for lending us a Clovertown-server, Peter Zaitsev from the MySQL Performance Blog for checking our configuration, ACM and moto-moi for setting up and executing the benchmarks, and Mick de Neeve for the English translation of this review.
