Original Link: https://www.anandtech.com/show/2294



Introduction

"AMD has no answer to the armada of new Intel's CPUs."

"Penryn will be the final blow."

These two sentences have been showing up on a lot of hardware forums around the Internet. The situation in the desktop is close to desperate for AMD as it can hardly keep pace with the third highest clocked Core 2 Duo CPU, and there are several quad core chips - either high clocked expensive ones or cheaper midrange models - that AMD simply has no answer for at present. As AMD gets closer to the launch of their own quad core, even at a humble 2GHz, Intel let the world know it will deliver a 3GHz quad core Xeon with 12 MB L2 that only needs 80W, and Intel showed that 3.33GHz is just around the corner too. However, there is a reason why Intel is more paranoid than the many hardware enthusiasts.

While most people focus on the fact that Intel's Core CPUs win almost every benchmark in the desktop space, the battle in the server space is far from over. Look at the four socket market for example, also called the 4S space. As we showed in our previous article, the fastest Xeon MP at 3.4GHz is about as fast as the Opteron at 2.6GHz. Not bad at all, but today AMD introduces a 3.2GHz Opteron 8224, which extends AMD's lead in the 4S space. This lead probably won't last for long, as Intel is very close to introducing its newest quad core Xeon MP Tigerton line, but it shows that AMD is not throwing in the towel. Along with the top-end 3.2GHz 8224 (120W), a 3GHz 8222 at 95W, 3.2GHz Opteron 2224 (120W) and 3GHz 2222 (95W) are also being introduced.

The 3.2GHz Opteron 2224 is quite interesting, as it is priced at $873. This is roughly the same price point as the dual core Intel Xeon 5160 at 3GHz and the quad core Intel Xeon E5345 at 2.33GHz (both $851). The contrast with the desktop market is sharp: not one AMD desktop CPU can be found in the higher price ranges. So how does AMD's newest offering compare to the two Intel CPUs? Is it just an attempt at deceiving IT departments into thinking the parts are comparable, or does AMD have an attractive alternative to the Intel CPUs?



A Closer Look at AMD's Newest Offering

AMD thus brings four new CPUs into the server space:
  • A 3.2GHz Opteron which consumes at most 120W, in dual and quad/octal socket versions
  • A 3GHz Opteron 8222 and 2222 which used to consume up to 120W, but which are - thanks to process improvements - now limited to 95W
Both CPUs are still made on the "old" 90 nm SOI process. It seems that AMD's 65 nm investments are all focused on the desktop offerings as well as the newest K10 parts.


The Opteron 2224 package

To position AMD's new Opteron 2224 offering we take a look at the most interesting Intel and AMD CPUs at the current date.

Intel Processor Overview
Intel CPU | Clock | Codename | L2 | L3 | FSB | Mem bandwidth | TDP | Price
Dual NetBurst CPUs
Xeon MP 7140M | 3.4GHz | Tulsa | 2x1MB | 16MB | 200MHz Quad | 6.4 GB/s | 150W | $1,980
Xeon MP 7130M | 3.2GHz | Tulsa | 2x1MB | 8MB | 200MHz Quad | 6.4 GB/s | 150W | $1,391
Xeon MP 7120M | 3GHz | Tulsa | 2x1MB | 4MB | 200MHz Quad | 6.4 GB/s | 95W | $1,117
Quad Core CPUs
Xeon E5355 | 2.66GHz | Clovertown | 2x4MB | - | 333MHz Quad | 21 GB/s | 120W | $1,172
Xeon E5345 | 2.33GHz | Clovertown | 2x4MB | - | 333MHz Quad | 21 GB/s | 80W | $851
Xeon E5320 | 1.86GHz | Clovertown | 2x4MB | - | 266MHz Quad | 17 GB/s | 80W | $690
Dual Core CPUs
Xeon DP 5160 | 3GHz | Woodcrest | 4MB | - | 333MHz Quad | 21 GB/s | 80W | $851
Xeon DP 5150 | 2.66GHz | Woodcrest | 4MB | - | 333MHz Quad | 21 GB/s | 65W | $690
Xeon DP 5148 | 2.33GHz | Woodcrest | 4MB | - | 333MHz Quad | 21 GB/s | 40W | $519

AMD Processor Overview
AMD CPU | Clock | Codename | L2 | L3 | HT | Mem bandwidth | TDP | Price
Eight-Way CPUs
Opteron 8224 SE | 3.2GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 119W | $2,149
Opteron 8222 | 3GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $1,514
Opteron 8220 | 2.8GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $1,165
Opteron 8218 | 2.6GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $873
Opteron 8218 HE | 2.6GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 68W | $1,019
Two-Way CPUs
Opteron 2224 SE | 3.2GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 119W | $873
Opteron 2222 | 3GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $698
Opteron 2220 | 2.8GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $523
Opteron 2218 | 2.6GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 95W | $377
Opteron 2218 HE | 2.6GHz | Santa Rosa | 2x1MB | - | 1000MHz DDR | 10.6 GB/s | 68W | $450

So basically, AMD's 120W Dual core 3.2GHz Opteron has to prove that it is a worthy competitor to two Intel offerings:
  • The dual core Xeon 5160 3GHz with a TDP of 80W
  • The quad core Xeon 5345 2.33GHz with a TDP of 80W
As Intel is offering twice as many cores at lower power consumption, at first sight it seems that it is already game over for AMD. However, this is not necessarily the case. First of all, not all workloads scale well from four to eight cores. Prime examples are the well known MySQL scaling problems, quite a few HPC applications, and even some Java applications. This might seem like a weird statement to the casual benchmark observer, but if you delve a bit deeper you'll find that some benchmarks use several (mostly four) instances running together to make scaling easier. Examples of such an approach are Sysbench, MySQL, and SPECjbb2005.

This means that the software only has to make use of two CPU cores (eight cores needs four instances), which is a lot easier than making use of all eight cores in a single instance. This kind of "multi instance" benchmarking may reflect the way quite a few people and businesses use their servers, but at the same time such benchmarks can paint a picture that is far too optimistic for those people who are only running one application on their server.

Secondly, it might seem like a dual Opteron 2224 needs 80W more, but the reality is different. Intel's Northbridge consumes up to 20W more, and each FB-DIMM needs about 5W more than the DDR modules the AMD platform uses. So the difference is not as big as it might seem at first, something we have shown in a previous article. The AMD CPUs also scale back to 1GHz when running at idle, while the Intel CPUs run at 1.6 or 2GHz when idle. As a result the AMD platforms can consume less power when idle or at lower loads.
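As a rough back-of-the-envelope sketch of that argument: the 20W and 5W figures below are the estimates quoted in the paragraph above, and the DIMM counts are assumptions (four modules matches the test systems in this article). The function name is purely illustrative.

```python
# Rough on-paper power gap between a dual Opteron 2224 SE system and a
# dual Xeon (5160 or E5345) system, using the estimates quoted in the text.

AMD_CPU_TDP_W = 120        # Opteron 2224 SE, per socket (as quoted in the text)
INTEL_CPU_TDP_W = 80       # Xeon 5160 / E5345, per socket
SOCKETS = 2

NORTHBRIDGE_EXTRA_W = 20   # "up to 20W more" for the Intel northbridge
FBDIMM_EXTRA_W = 5         # "about 5W more" per FB-DIMM than per DDR2 module


def platform_delta_w(dimms: int) -> int:
    """Return how many more watts the AMD platform needs on paper."""
    cpu_delta = SOCKETS * (AMD_CPU_TDP_W - INTEL_CPU_TDP_W)
    intel_platform_extra = NORTHBRIDGE_EXTRA_W + dimms * FBDIMM_EXTRA_W
    return cpu_delta - intel_platform_extra


if __name__ == "__main__":
    print("CPU-only TDP gap:", SOCKETS * (AMD_CPU_TDP_W - INTEL_CPU_TDP_W), "W")
    for dimms in (4, 8):  # assumed DIMM counts
        print(f"Gap with {dimms} DIMMs:", platform_delta_w(dimms), "W")
```

These are TDP-style upper bounds, not measurements; the measured figures appear in the power section.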


The Opteron now runs at 3.2GHz

As you can see in the animated gif above, the CPU is capable of scaling back to 1GHz and 1.1V. It can also run at 2.8GHz at 1.275V, and it needs 1.375V to run at 3.2GHz. PowerNow! enables it to run at any frequency between 1.6GHz and 2.8GHz in steps of 200MHz. Keep this in mind when we look at some power figures later on.



Words of Thanks

A lot of people gave us assistance with this project, and we would of course like to thank them.

Kelly Sasso, Crucial Technology


Our experience: Crucial offers excellent support and quality for barebone servers

William H. Lea, Intel US
Jerry R. Baugh, Intel US
Matty Bakkeren, Intel Netherlands
(www.intel.com)

Brett Jacobs, AMD US
Damon Muzny, AMD US
(www.amd.com)

Bob Cramblitt
Larry D. Gray
(www.spec.org)

Benchmark configuration
Here is the list of the different configurations. All servers have been flashed to the latest BIOS, and unless we add any specific comments to the contrary, the BIOS was set to default settings.

Opteron Socket F 1207 Server 1: Tyan Transport TA26 - 2932
Dual Opteron 2222 3GHz / 2224SE 3.2GHz
Tyan Thunder n3600m (S2932) - NVIDIA nForce Pro 3600 chipset
8GB (4x2GB) Crucial Registered DDR2-667 CL5 ECC
NIC: nForce Pro 3600 integrated MAC with Marvell 88E1121 Gigabit Ethernet PHY

Xeon Server 1: Intel "Bensley platform" server
2x Xeon 5160 3GHz or 2x Xeon E5345 at 2.33GHz
Intel Server Board S5000PSL - Intel 5000P Chipset
8GB (4x2GB) Crucial Registered FB-DIMM DDR2-667 CL5 ECC
NIC: Dual Intel PRO/1000 Server NIC
BIOS comment: Hardware prefetching disabled.

Client Configuration: Dual Opteron 850
MSI K8T Master1-FAR
4x512 MB Infineon PC2700 Registered, ECC
NIC: Broadcom 5705

Software
SUSE Linux SLES 10 SP1 (Linux 2.6.16.46-smp)
MySQL 5.0.26 as shipped with SUSE SLES 10 SP1
SPECjbb2005
Sun Hotspot Java JVM 1.5.0_08
3DSMax 9
Cinebench 9.5
WinRAR 3.62



Tyan Transport TA26

It has taken us a while, but we finally have a full blown Socket F server in the labs.


The Tyan TA26's front

The 2U rack-mountable barebone TA26 B2932 supports two Socket F Opterons and didn't have any trouble with our Opteron 3.2GHz parts. (The BIOS was flashed to version 2.0). The exact model in the lab is the B2932T26W8HR which supports eight hot swappable SAS disks and a (1+1) redundant 600W power supply.


The internals of our Tyan Server

The motherboard in the system is the TYAN Thunder n3600M S2932, which is based on the NVIDIA nForce Pro 3600. A total of sixteen DDR2 DIMM sockets support up to 64GB of registered DDR2-667 memory. Two PCI Express x16 slots (x8 electrical) allow interested sysops to turn this server into an SLI gaming machine, but you'll need 2U GeForce cards.... (We are kidding, of course.) It is good to see that there are still two PCI-X 133/100MHz, one PCI-X 100MHz, and one 32-bit PCI slot available as this will protect any previous investments in NICs and storage adapters.

Our experiences were very good with this server: removable components such as fans, heatsinks, disks, and PSUs are very user-friendly and easy to use. We saw only one minor disadvantage: the three fast fans are capable of cooling the 3.2GHz chips, but when one fails the cooling system hits its limits. We didn't experience any crashes, but the CPUs got very hot (70-75°C) with two fans. On the plus side, the automatic fan speed control does a very good job in adjusting fan speed to provide sufficient heat dissipation. There is very little latency: the fan speed almost immediately increases as the CPU throttles up from being idle at 1GHz to full load at 3.2GHz.



The Secret Boost of the Opteron 2224

Socket F Opterons have a small secret weapon: a speed bump offers more than just a faster CPU. To understand this, take a look at the table below. We measured the L2 cache's bandwidth with Lavalys Everest 3.51.

Lavalys Everest 3.51 L2 Bandwidth
CPU | Read (MB/s) | Write (MB/s) | Copy (MB/s)
Dual Xeon 5160 3.0 GHz | 22019 | 17751 | 23628
Xeon E5345 2.33 GHz | 17610 | 14878 | 18291
Opteron 2224 SE 3.2 GHz | 14636 | 12636 | 14630
Opteron 8218HE 2.6 GHz | 11891 | 10266 | 11891

The L2 cache of the Opteron 8218 at 2.6GHz is slower than the Core 2's L2 cache at 2.33GHz. At about 10-11 GB/s it barely matches the theoretical peak bandwidth that DDR2 at 667MHz can deliver (10.6 GB/s), while its exclusive nature also forces it to exchange quite a bit of data with the L1 cache. Now combine this table with the following one, where we measured memory bandwidth.

Lavalys Everest 3.51 Memory Bandwidth
CPU | Read (MB/s) | Write (MB/s) | Copy (MB/s) | Latency (ns)
Dual Xeon 5160 3.0 GHz | 3656 | 2771 | 3800 | 112.2
Xeon E5345 2.33 GHz | 3578 | 2793 | 3665 | 114.9
Opteron 2224 SE 3.2 GHz | 7466 | 6980 | 6863 | 58.9
Opteron 8218HE 2.6 GHz | 6944 | 6186 | 5895 | 64

It is no secret that a higher clocked integrated memory controller can increase the actual delivered bandwidth of the same DDR2 modules. But it also helps that the L2 cache is able to swallow the bandwidth that the memory is capable of delivering. Also notice that without the use of SSE2 instructions, the memory subsystem of the 5000P chipset delivers relatively disappointing amounts of bandwidth. As most applications do not use carefully tuned SSE2 code to get data from memory, this should reflect the real world situation most of the time. And of course, until Intel introduces the Nehalem family, memory latency will continue to be one of the strong points of AMD.
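To put the 10.6 GB/s theoretical peak for DDR2-667 next to the measured Everest numbers, here is a small sketch. It assumes two 64-bit (8-byte) memory channels per Opteron and treats Everest's MB/s as 10^6 bytes per second; both are assumptions that the tables do not spell out.

```python
def peak_bandwidth_mbs(transfers_per_sec: float, bus_bytes: int, channels: int) -> float:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return transfers_per_sec * bus_bytes * channels / 1e6


# DDR2-667: 667 million transfers/s, assumed 8 bytes wide, assumed dual channel.
DDR2_667_PEAK = peak_bandwidth_mbs(667e6, bus_bytes=8, channels=2)

# Measured Everest read bandwidth from the table above (MB/s).
MEASURED_READ = {
    "Opteron 2224 SE 3.2 GHz": 7466,
    "Opteron 8218HE 2.6 GHz": 6944,
}


def fraction_of_peak(measured_mbs: float) -> float:
    """Share of the theoretical DDR2-667 peak that was actually delivered."""
    return measured_mbs / DDR2_667_PEAK


if __name__ == "__main__":
    print(f"DDR2-667 dual channel peak: {DDR2_667_PEAK / 1000:.1f} GB/s")
    for cpu, read in MEASURED_READ.items():
        print(f"{cpu}: {fraction_of_peak(read):.0%} of peak")
```

Under these assumptions the higher clocked part gets noticeably closer to the peak of the same DDR2 modules, which is the point made above.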

Processor Latency Comparison (in clock cycles unless noted)
CPU | L1 | L2 | L3 | min mem | max mem | Absolute latency (ns)
Xeon 5160 3.0 - DDR2 533 | 3 | 14 | - | 69 | 380 | 127
Xeon 5160 3.0 - DDR2 667 | 3 | 14 | - | 67 | 338 | 113
Core 2 Duo 2.933 - DDR2 533 | 3 | 14 | - | 67 | 180 | 61
Quad Xeon E5345 2.33 - DDR2 533 | 3 | 14 | - | 80 | 280 | 120
Quad Xeon E5345 2.33 - DDR2 667 | 3 | 14 | - | 80 | 271 | 116
Xeon 7130M 3.2 - DDR2 400 | 4 | 29 | 109 | 245 | 624 | 195
Opteron 880 2.4 - DDR333 | 3 | 12 | - | 84 | 228 | 95
Opteron 2224 SE - DDR2 667 | 3 | 12 | - | 72 | 189 | 59
Opteron 2218 HE - DDR2 667 | 3 | 12 | - | 62 | 157 | 60

The latency penalty that FB-DIMM introduces is huge. To get an idea, we added the latency measured with a Core 2 Duo 2.933 using 2x 2GB 533MHz DDR2. The staggering conclusion is that registered FB-DIMMs add - in the worst case - about 200 cycles or 66ns of latency. Sure, some of that latency can be attributed to the buffering which is necessary for server memory. Buffered memory contains registers which will actually hold data for one full clock cycle before it's passed on. So this means that registered memory should add about 8ns (2 clock cycles at 266MHz base clock, DDR2-533).
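The cycle and nanosecond figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the "Absolute latency" column is the "max mem" cycle count divided by the core clock, which matches the table but is not stated explicitly.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz


# "max mem" latencies in core clock cycles, taken from the table above.
xeon_5160_ns = cycles_to_ns(380, 3.0)      # Xeon 5160, FB-DIMM DDR2-533
core2_duo_ns = cycles_to_ns(180, 2.933)    # Core 2 Duo, plain DDR2-533

# Worst-case FB-DIMM penalty: the difference in cycles, at the Xeon's 3GHz.
penalty_cycles = 380 - 180
penalty_ns = cycles_to_ns(penalty_cycles, 3.0)

# Expected cost of registered memory: 2 cycles at the 266MHz base clock.
register_ns = cycles_to_ns(2, 0.266)

if __name__ == "__main__":
    print(f"Xeon 5160:  {xeon_5160_ns:.0f} ns")
    print(f"Core 2 Duo: {core2_duo_ns:.0f} ns")
    print(f"FB-DIMM penalty: {penalty_cycles} cycles, about {penalty_ns:.1f} ns")
    print(f"Registered memory alone: about {register_ns:.1f} ns")
```

The gap between the roughly 8ns that registering should cost and the roughly 66ns that was measured is what makes the FB-DIMM penalty stand out.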

The secondary benefit of FB-DIMMs is that motherboards can use more DIMMs per bank, potentially increasing total memory capacity. AMD already gets around this quite easily with up to eight DIMM sockets per CPU socket, however, so this benefit really doesn't materialize in any reasonable form. The bottom line is that while FB-DIMMs were a potentially good idea from a purely theoretical point of view, it is rather obvious that in practice they have some pretty bad consequences.



SPECjbb2005

SPECjbb2005 from SPEC (Standard Performance Evaluation Corporation) evaluates the performance of server side Java by emulating a three-tier client/server system with emphasis on the middle tier. Instead of testing with a possible disk intensive database system, SPECjbb uses tables of objects, implemented by Java Collections, rather than a separate database. A longer description can be found here.

Again, it is not our objective to show the best possible scores. Very few people will take the time to fully tune the JVM and take the risk that some of the ultra aggressive optimizations backfire. So we tested with some decent but rather generic tuning that we could use on all systems. The JVM is Sun's version 1.5.0_08, which allows us to compare scores with previous results.

We tested SPECjbb2005 with four application instances. Using numactl, a clever utility written by Andi Kleen, we were able to bind each Java application to a node on our Tyan server. We didn't bind instances to CPUs on the Intel platforms (though it is possible with taskset) as it gives worse performance. The parameters in bold show the actual JVM optimizations.

On the Opteron we used:
numactl --cpunodebind=$node --membind=$node -- java -cp jbb.jar:check.jar -Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id $x
On the Xeons we used:
java -classpath jbb.jar:check.jar -Xms2g -Xmx2g -Xmn1g -Xss128K -XX:+AggressiveOpts -XX:+UseParallelOldGC -XX:+UseParallelGC spec.jbb.JBBmain -propfile SPECjbb.props -id $x
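The commands above leave $node and $x unspecified. As a sketch of how four instances might be spread over the two NUMA nodes of a dual Opteron, the snippet below only builds the command lines and does not launch anything. The round-robin instance-to-node mapping, the instance IDs starting at 1, and the helper names are all assumptions, not the exact script that was used.

```python
import shlex
from typing import List, Optional

# JVM options copied from the command lines above.
JVM_FLAGS = [
    "-Xms2g", "-Xmx2g", "-Xmn1g", "-Xss128K",
    "-XX:+AggressiveOpts", "-XX:+UseParallelOldGC", "-XX:+UseParallelGC",
]


def jbb_command(instance_id: int, node: Optional[int] = None) -> List[str]:
    """Build one SPECjbb2005 command; prefix numactl when a node is given."""
    java = [
        "java", "-cp", "jbb.jar:check.jar", *JVM_FLAGS,
        "spec.jbb.JBBmain", "-propfile", "SPECjbb.props",
        "-id", str(instance_id),
    ]
    if node is None:          # Xeon case: no binding
        return java
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", "--", *java]


def plan(instances: int = 4, nodes: int = 2) -> List[List[str]]:
    """Assumed mapping: instances 1..N assigned round-robin to nodes 0..nodes-1."""
    return [jbb_command(i, node=(i - 1) % nodes) for i in range(1, instances + 1)]


if __name__ == "__main__":
    for cmd in plan():
        print(shlex.join(cmd))
```

Binding both the CPU node and the memory node keeps each JVM's heap on the memory attached to the socket it runs on.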
Below you can find the final score that SPECjbb2005 reports, which is an average of the last four runs.


Specjbb 4 instances

The impact of binding each instance to a specific node is less dramatic than what we have seen before, but still, the Opteron scored only 42254 without the use of numactl. The Opterons are in a neck-and-neck race with the dual core Intel CPUs. As this kind of transactional Java application depends quite a bit on the memory interface, the slightly lower integer power of the Opteron is hidden by its faster access to the memory. Nevertheless, it is the Xeon E5345 which wins this race as the use of 4 instances allows the Xeon 53xx to scale well.



MySQL Configuration

To get an idea what a typical SLES 10 user will experience, we simply used the MySQL version which is supported by the latest SLES 10 SP1, i.e. MySQL 5.0.26. Unfortunately, this means that we see the typically bad scaling. Therefore we focus on the single CPU, dual core results. It doesn't make sense to use a quad Xeon 5345 here: more than two CPUs give negative scaling as we have reported before. The 2.33GHz Xeon 5345 scored between 700 and 750 queries per second as a result of this. For those who are surprised by this: notice that Intel's own benchmarks use four parallel runs of the Sysbench MySQL benchmark to get higher scores out of MySQL. All testing was done with InnoDB as our storage engine in MySQL 5.0.26. Here is our MySQL configuration:

MySQL Configuration
default-storage-engine InnoDB
skip-external-locking  
skip-locking  
key_buffer 256M
.
table_cache 64
max_allowed_packet 1M
thread_stack 128K
.
sort_buffer_size 2M
read_buffer_size 2M
innodb_buffer_pool_size 1G
.
thread_concurrency 16
innodb_thread_concurrency 16
innodb_additional_mem_pool_size 8MB
read_rnd_buffer_size 8MB
thread_cache 64
max_heap_table 256MB
tmp_table 128MB
.
innodb_log_file_size 250MB
innodb_table_locks 0
innodb_flush_log_at_trx_commit 0
max_user_connections 2000
max_connections 2000

The "query cache" was off, as we wanted to test worst case performance. Our test database is still the same 1GB database. The workload consists of more than 90% selects, mostly a "read intensive" workload. All numbers are expressed in queries per second (Y-axis), and the X-axis shows the number of concurrent accesses.

MySQL results



The Xeon 5160 keeps a 10-14% lead on the Opteron 2224. Our time was limited, and you'll see other versions of MySQL pop up in later reviews. The first results seem to indicate that the difference between the Opteron and Xeon gets smaller.



Render Servers

To get a better idea on how the different server platforms compare, we did some rendering too. Most of our tests (MySQL, DB2, and SPECjbb2005) are very integer intensive; render tests are floating point intensive. We start with a simple Cinebench 9.5 benchmark (on Windows 2003 32-bit), which is based on Maxon's Cinema 4D rendering engine.

Cinebench 9.5 Rendering

Cinebench runs almost perfectly within the caches of all CPUs. The Opteron 2224 does pretty well thanks to its strong x87 FPU. As we have noted before, the Opteron scales slightly better than its Intel competitor. This gets even clearer when you look at the performance of one core. One Opteron 3.2GHz core scores 472 and gets 3.24 times faster when you quadruple the number of cores. One Intel core at 3GHz scores 499 and gets 3.05 times faster with 4 cores. We admit we are nitpicking, but it is interesting nevertheless.
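The single-core scores and speedups quoted above translate into four-core scores and a scaling efficiency as follows. This is a small sketch using only the numbers from the paragraph; "efficiency" here simply means speedup divided by core count.

```python
def multi_core_score(single_core_score: float, speedup: float) -> float:
    """Multi-core score implied by a single-core score and a speedup."""
    return single_core_score * speedup


def scaling_efficiency(speedup: float, cores: int) -> float:
    """Fraction of perfect linear scaling that was achieved."""
    return speedup / cores


CORES = 4
RESULTS = {
    # name: (single-core Cinebench 9.5 score, speedup with four cores)
    "Opteron 3.2GHz": (472, 3.24),
    "Xeon 3GHz": (499, 3.05),
}

if __name__ == "__main__":
    for name, (single, speedup) in RESULTS.items():
        score = multi_core_score(single, speedup)
        eff = scaling_efficiency(speedup, CORES)
        print(f"{name}: about {score:.0f} with {CORES} cores, {eff:.0%} efficiency")
```

The implied four-core scores end up within a fraction of a percent of each other, even though the Intel core is faster on its own.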

3DS Max 9

Cinebench is popular because it is an easy benchmark; 3DS Max, on the other hand, is a very popular application. We tested with 3DS Max version 9, which has been improved to work better with multi-core systems. We used the "architecture" scene, which has been a favorite benchmarking scene for years. All tests were done with 3ds max's default scanline renderer, SSE enabled, and we rendered at HD 720p (1280x720) resolution. We measured the time it takes to render 10 frames from 20 to 29. All results are reported in seconds, lower being better.



3DS Max 9 Architecture 1280x720
As this test has been our standard test for a while, we added a few results from previous tests. As you increase the resolution, multi-core scaling gets better. The reason is that there's a certain amount of overhead required to split a scene into multiple parts. At lower resolutions, the splitting process ends up taking a significant amount of time, so the extra cores are not fully able to stretch their legs.



Software Rendering

Some of you might remember the "Kribi" engine, an ultra-powerful real-time software rendering 3D engine. It seems like madness to invest time in a software 3D engine now that the GeForce 8800 has 128 small FPUs working at 1.35GHz, but software rendering is far from dead. The new Intel Core architecture can perform up to four 64-bit FP (3 sustained) instructions per clock cycle, and now we have cheap quad cores at 2.4GHz. That is a lot of FP power too, if you carefully optimize for it. That is exactly what the people of zVisuel in Lausanne, Switzerland have been doing. There are quite a few advantages to using software rendering. For example, the end result looks the same on every PC and it runs (although potentially faster or slower depending on hardware) on every PC. That is a big advantage for companies where many people use portables.

If you are still not convinced that real-time software rendering can offer great results, take a look at this movie.


An example of what the zVisuel Kribi 3D Engine can create

Eric Bron provided us with a benchmark which is based on real world use by zVisuel's clients. The first benchmark does not use antialiasing.

zVisuel Watch Assembly (no AA)

As we explained here, the new Core architecture has theoretically twice the SSE2 power of the Athlon X2. Extremely carefully optimized SSE2 applications such as the 3D engine of zVisuel show that this leads to a 70% IPC advantage in practice. This shows very nicely why AMD needs the new K10 family: in this case the Athlon 64 architecture is starting to show its age.

We performed the same benchmark, but now antialiasing was applied.

zVisuel Watch Assembly (AA)

AA clearly makes the application more memory intensive. The two quad core Xeons are only 38% faster than one CPU, while they were 50% faster in the previous benchmark. This helps the Opteron to make the gap a little smaller: the Xeon 3GHz is 48% faster clock for clock, instead of 70%.



WinRAR 3.62

Many servers and workstations have to compress a lot of data. WinRAR is one of the most popular compression applications and now features a multi-threaded benchmark.

WinRAR 3.62
CPU | Multi | Single
Dual Xeon E5345 2.33 | 1501 | 522
Dual Opteron 2224 SE | 1259 | 529
Dual Xeon 5160 3.0 | 1236 | 549
Dual Opteron 2222 | 1219 | 471
Dual Opteron 8218HE 2.6 | 1172 | 426
Xeon E5345 2.33 | 1169 | 522
Xeon 5160 3 | 923 | 549

Compression algorithms work on large streams of data, so fast memory access is important. The WinRAR benchmark has a rather high margin of error, but it is still interesting to look at the scaling numbers.

WinRAR Scaling
CPU | Single | Dual | Quad | Octal
Xeon 5345 2.3 GHz | 522 | 901 | 1169 | 1501
Xeon 5160 3 GHz | 549 | 923 | 1236 | N/A
Opteron 2224 SE 3.2 GHz | 529 | 957 | 1259 | N/A

The picture gets clearer as you compare the gains from extra cores in percentages.

WinRAR Scaling - Percentages
CPU | Dual vs. Single | Quad vs. Dual | Octal vs. Quad
Xeon 5345 2.3 GHz | 73% | 30% | 28%
Xeon 5160 3 GHz | 68% | 34% | N/A
Opteron 2224 SE 3.2 GHz | 81% | 32% | N/A

The algorithm does scale somewhat but it is another example of how hard it is to scale well as more cores get added. NUMA architectures like AMD's Opteron have the potential to extract more memory performance, but there's still the problem of properly coding an algorithm to work with NUMA.
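For readers who want to reproduce the percentages, they follow directly from the raw WinRAR scores: each entry is the gain from doubling the number of threads. A minimal sketch, with the scores copied from the scaling table:

```python
# WinRAR scores at 1, 2, 4 (and where available 8) threads.
SCORES = {
    "Xeon 5345 2.3 GHz": [522, 901, 1169, 1501],
    "Xeon 5160 3 GHz": [549, 923, 1236],
    "Opteron 2224 SE 3.2 GHz": [529, 957, 1259],
}


def step_gains(scores):
    """Percentage gain of each score over the previous one, rounded."""
    return [round((after / before - 1) * 100)
            for before, after in zip(scores, scores[1:])]


if __name__ == "__main__":
    for cpu, scores in SCORES.items():
        gains = ", ".join(f"{g}%" for g in step_gains(scores))
        print(f"{cpu}: {gains}")
```

The first doubling buys 68% to 81%, while every later doubling buys only about 30%, which is the diminishing return described above.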



Power

Our AMD system had a different, though similar, power supply than our Intel system. The fan setup was also different, but the peak power consumption of the fans of both systems was very close. If you would like a completely apples-to-apples comparison (or at least as close as we can get), we'll refer to our previous performance/watt measurements which have been done with almost identical systems. Take these Intel versus AMD figures with a grain of salt, but the comparison between the different AMD CPUs is still very interesting.

Power Usage (in watts)
System | SPECjbb | Cinebench (Load) | Idle | PowerNow! Idle | Load vs. Idle Savings | Idle PowerNow/EIST Savings
Dual Xeon 5160 3.0 | 376 | 354 | 248 | 244 | 110 | 4
Dual Xeon E5345 2.33 | 374 | 331 | 248 | 244 | 87 | 4
Dual Opteron 2224 SE | 380 | 409 | 310 | 159 | 250 | 151
Dual Opteron 2222 | 330 | 342 | 259 | 158 | 184 | 101
Dual Opteron 8218HE 2.6 GHz | 279 | 299 | 225 | 155 | 144 | 70

To be fair, we are using somewhat early Intel samples; the current Intel CPUs will probably consume a little less power due to process maturity and other minor tweaks. Still, it is very clear that AMD's CPUs are able to save a lot more when they are not stressed. What kind of power savings may you expect when you buy a lower power Opteron?

Power Savings (in watts)
Comparison | SPECjbb | Cinebench | Idle | PowerNow! Idle
Normal 95W vs. SE 119W | 50 | 67 | 51 | 1
HE 68W vs. Normal 95W | 51 | 43 | 34 | 3

The above table makes a few interesting points:
  • It is quite impressive that the AMD Opteron 2222 is now able to reach 3GHz at 95W. This means that, compared to just 2-3 months ago, you save up to 67W per server and get the same performance (2222 versus the older 2222SE).
  • AMD's PowerNow! Technology is very efficient: it saves you between 150W and 250W depending on system load and configuration. 250W seems impossible, but the three fans of our Tyan TA26 had to run at much higher speeds to cool the CPUs at 3.2GHz than at 1GHz.
  • The gains of Intel's EIST are very limited: the CPUs only throttle back to 2GHz.
The fact that the Opterons consume less power when running SPECjbb2005 versus running Cinebench 9.5 is quite interesting, as the Intel systems actually consume a bit more. Since SPECjbb2005 is a rather memory intensive benchmark, the reason for this difference is quickly found: the extra power consumption of the FB-DIMMs negates the fact that SPECjbb2005 is less CPU intensive than Cinebench. Or if you look at it another way: Intel's system consumes a bit less when running Cinebench as the FB-DIMMs have very little to do. Combined with the latency penalty we have measured, we wouldn't be surprised if Intel relegates FB-DIMMs to a small high-end niche market in the future.
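The derived columns in the two power tables can be recomputed from the four measured values per system. The sketch below assumes, as the numbers suggest, that "Load vs. Idle Savings" is Cinebench load minus PowerNow!/EIST idle and that the second savings column is plain idle minus PowerNow!/EIST idle.

```python
# Measured wall power in watts: (SPECjbb, Cinebench load, idle, idle with PowerNow!/EIST)
POWER_W = {
    "Dual Xeon 5160 3.0": (376, 354, 248, 244),
    "Dual Xeon E5345 2.33": (374, 331, 248, 244),
    "Dual Opteron 2224 SE": (380, 409, 310, 159),
    "Dual Opteron 2222": (330, 342, 259, 158),
    "Dual Opteron 8218HE 2.6 GHz": (279, 299, 225, 155),
}


def savings(system: str):
    """Return (load vs. managed idle, plain idle vs. managed idle) in watts."""
    _specjbb, cinebench, idle, managed_idle = POWER_W[system]
    return cinebench - managed_idle, idle - managed_idle


def model_delta(higher: str, lower: str):
    """Per-column difference in watts between two systems."""
    return tuple(a - b for a, b in zip(POWER_W[higher], POWER_W[lower]))


if __name__ == "__main__":
    for name in POWER_W:
        load_saving, idle_saving = savings(name)
        print(f"{name}: {load_saving} W load vs. idle, {idle_saving} W at idle")
    print("SE vs. normal:", model_delta("Dual Opteron 2224 SE", "Dual Opteron 2222"))
    print("Normal vs. HE:", model_delta("Dual Opteron 2222", "Dual Opteron 8218HE 2.6 GHz"))
```

Note that these are whole-system figures, so the Opteron savings include the lower fan speeds mentioned above.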



Conclusion

The best news for AMD is that the newly launched 8224SE and 8222 will outperform the current Xeon MP by a significant margin. However, AMD will have very little time to enjoy that victory as the new Xeon MP based on the Core architecture is going to launch very soon. That leads us to the dual socket space. Here's a recap of the various benchmarks that we have run.

Performance Comparison
Benchmark | Opteron 3.2GHz vs. DC Xeon 3GHz | Opteron 3GHz vs. DC Xeon 3GHz | Opteron 3GHz vs. QC Xeon 2.33GHz
General applications
WinRAR 3.62 | 8% | 5% | -17%
3D Applications
3DS Max 9 | -11% | -16% | -34%
Cinebench 9 | 0% | -7% | -14%
zVisuel 3D Kribi Engine | -31% | -32% | -40%
Server applications
SPECjbb | 0% | -4% | -30%
MySQL | -12% | N/A | N/A

Intel has a clear lead in the rendering market. If you are rendering complex high resolution images, the quad core Xeon is clearly the best choice. If you are rendering normal resolution pictures, quad core might not really pay off, but the dual core Xeon will still be a bit faster than the Opteron. Both Cinebench and 3ds max have been "mildly" optimized for SSE2, but if you use a carefully SSE2 optimized application the Opteron's lack of SSE power is painfully obvious: the Intel CPUs are up to 70% faster in SSE-heavy code. That is one specific area that Barcelona should remedy in the coming months. If you are in for a new server for your FP intensive applications, it might be interesting to wait a bit and see how Harpertown compares to Barcelona; if you can't wait, right now Intel is the first choice in this market.

When it comes to pure business processing, such as database processing and Java applications, we feel that the answer cannot be given so quickly. If your application is usually under high load, the Intel CPUs are clearly better. They use slightly less power than the Opteron SE and run faster. Especially if your application is based on databases such as DB2, Oracle, and MS SQL server, it is clear that the quad core Xeon still rules. The quad core Xeon may not be a "native quad core" design, but it was surely a brilliant move by Intel. Until AMD's own quad core comes out, this market will be out of reach of AMD.

However, some servers are only stressed during a short period of time or are based on mediocre scaling software like MySQL. In that case, the Opteron 2222 makes a lot of sense. The cores will run at a low and cool 1GHz most of the time and consume very little power. Our Tyan Server saved no less than 184W during the "calm periods" and that is a lot of power. That amount of power has to be multiplied by roughly 1.5 (adding your air conditioning's energy consumption) to calculate the total energy consumption savings, making power savings even more significant. During periods of high load, the Opteron 2222 still offers decent performance at a slightly lower price than the dual core 3.0GHz Xeons.
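To make the air conditioning multiplier concrete, here is a small sketch. The 184W and the factor of 1.5 come from the paragraph above; the number of calm hours per day is a purely hypothetical parameter for illustration and will differ for every server.

```python
COOLING_FACTOR = 1.5        # rough multiplier from the text (server plus air conditioning)
SERVER_SAVINGS_W = 184      # measured Opteron 2222 savings during calm periods


def total_savings_w(server_savings_w: float, cooling_factor: float = COOLING_FACTOR) -> float:
    """Savings at the wall once cooling overhead is included."""
    return server_savings_w * cooling_factor


def annual_savings_kwh(server_savings_w: float, calm_hours_per_day: float) -> float:
    """Yearly energy saved; calm_hours_per_day is a hypothetical input."""
    return total_savings_w(server_savings_w) * calm_hours_per_day * 365 / 1000


if __name__ == "__main__":
    print(f"Total savings: {total_savings_w(SERVER_SAVINGS_W):.0f} W")
    # Hypothetical example: a server that is mostly idle 16 hours a day.
    print(f"Per year at 16 calm hours/day: {annual_savings_kwh(SERVER_SAVINGS_W, 16):.0f} kWh")
```

Multiply the yearly figure by your own electricity price and server count to see whether the lower idle power outweighs the performance gap under load.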

The most interesting thing about AMD's latest launch is probably that AMD now has a 3GHz Opteron that consumes very little when running at low load while it keeps the power consumption reasonable at full load. The Opteron 2224 SE will only interest the people who have already invested in clusters of cheap Socket F servers and who are looking to squeeze more performance out of them. If you haven't made that investment already, there's nothing really new or surprising with the latest launch, so you might be best off waiting a bit longer to see what the future holds.
