If you guys need help with setting up or running the Fluent/LS-DYNA benchmarks let me know.
I see that you don't really spend as much time writing or tweaking it as you do with some of the other programs, and that to me is a little concerning only because I don't think that it is showing the true potential of these processors if you run it straight out-of-the-box (especially with Fluent).
Fluent tends to have a LOT of iterations, but it also tends to short-stroke the CPU (i.e. the time required to complete all of the calculations necessary is less than 1 second, and it therefore doesn't make full use of the computational ability).
Also, the parallelization method (MPICH2 vs. HP MPI) makes a difference in the results.
You want to make sure that the CPUs are fully loaded for a period of time such that at each iteration, there is a noticeable dwell time AT 100% CPU load. Otherwise, it won't really demonstrate the computational ability.
With LS-DYNA, it also makes a difference whether it's SMP parallelization or MPP parallelization as well.
The most baffling part is how Linux could engage 12 CPUs much better than Windows. I am obviously curious about the OS platform for the other tests. Similarly, MS SQL was able to scale well on multiple cores. In this context, I am not sure how we can look at the performance numbers. A badly scaling app or OS could show the 12-core one in a bad light.
I have followed your articles from the early days at Ace's and have a good deal of respect for the technical accuracy of your articles.
It appears that the X5570 scaling between 4 and 8 cores has very little gain in the Oracle Calling Circle benchmark. Furthermore, the 24 cores of MC at 2.2GHz are way behind. Westmere appears to do quite well, but really should not be able to best 8 cores in the X5570 with all else being equal.
I have heard some state that the benchmark is thread bound to a low number of threads (don't know if I am buying this), but surely something fishy is going on here.
It appears that there is either a real world application limit to core scaling on certain types of Oracle database applications (if there are, could you please explain what features an app has when these limits appear), or that the benchmark is flawed in some way.
I have a good amount of experience in Oracle applications and have usually found that more cores and more memory make Oracle happy. My experience seems at odds with your latest benchmarks.
I am starting to suspect the same. I am going to dissect the benchmark soon to see what is up. It is not disk related, or at least that is surely not our biggest problem. Our benchmark might not be far from the truth though; I think Oracle really likes the big L3 cache of the Westmere CPU.
If you have other ideas, mail at johanATthiswebsiteP
You wrote Test-Setup: Xeon Server 1: ASUS RS700-E6/RS4 barebone Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon “Westmere” X5670 2.93 GHz 6x4GB (24GB) ECC Registered DDR3-1333
"Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570."
That is not exactly the reason, I think. The reason is that you populated the second memory bank in both setups. Intel specification: Westmere 1333 MHz CPUs run at 1333 MHz with the second bank populated, while Nehalem 1333 MHz CPUs run at 1066 MHz with the second bank populated.
That could be updated.
Compare tech docs on Intel site: datasheet Xeon 5500 Part 2 and datasheet Xeon 5600 Part 2
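The clock difference described above can be turned into a rough upper bound on the bandwidth gap. The sketch below is only a back-of-the-envelope estimate: it assumes three DDR3 channels per Xeon socket and the standard 64-bit (8-byte) channel width, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, channels: int = 3,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s.

    Assumes every channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes for a 64-bit DDR3 channel) at the given transfer rate.
    """
    return transfers_mt_s * channels * bytes_per_transfer / 1000.0


# Second bank populated: DDR3-1333 stays at 1333 on Westmere,
# but drops to 1066 on Nehalem (per the comment above).
westmere = peak_bandwidth_gb_s(1333)
nehalem = peak_bandwidth_gb_s(1066)
ratio = westmere / nehalem

print(f"Westmere peak: {westmere:.1f} GB/s")
print(f"Nehalem peak:  {nehalem:.1f} GB/s")
print(f"Theoretical gap: {(ratio - 1) * 100:.0f}%")
```

If the downclock with the second bank populated is really what happened, the theoretical ceiling on the gap is about 25%, so a measured 15% difference would be consistent with it.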
3ds Max crashes because of the mental ray renderer. Remove the plugin from loading and Max will start up. It's because mental ray cannot see more than 16 threads (physical or virtual via hyper-threading). Please do test the Max rendering performance. Thanks.
In the final words it states "We estimate that the new Opteron 6174 is about 20% slower than the Xeon 5670 in virtualized servers with very high VM counts." But in the virtualization section I can't seem to figure out what brought you to that conclusion. The VMmark score for the Cisco X5680 system was 35.83@26 tiles. You have the VMmark for the 6176SE at 31, which is dead on with the HP DL385 G7, which got 30.96@22 tiles. I see the X5680 as 15% better at best. And the Cisco X5680 system had 192GB of memory, while the HP 6176SE system had 128GB. What am I missing here?
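For what it's worth, the gap between the two published scores quoted above works out as follows. This is just arithmetic on the numbers in the comment; the variable names are mine, and the tile counts are used only for a per-tile figure.

```python
# VMmark results as quoted in the comment above (score @ tiles).
x5680_score, x5680_tiles = 35.83, 26
opteron_score, opteron_tiles = 30.96, 22

# How much higher the Xeon score is, relative to the Opteron score.
xeon_advantage_pct = (x5680_score / opteron_score - 1) * 100

# How much lower the Opteron score is, relative to the Xeon score.
opteron_deficit_pct = (1 - opteron_score / x5680_score) * 100

print(f"X5680 ahead by {xeon_advantage_pct:.1f}%")
print(f"6176SE behind by {opteron_deficit_pct:.1f}%")
print(f"Score per tile: X5680 {x5680_score / x5680_tiles:.3f}, "
      f"6176SE {opteron_score / opteron_tiles:.3f}")
```

Whichever way you express it, those two specific results are roughly 14 to 16% apart, which is the gap the comment is pointing at.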
I appreciate AMD's lower CPU cost, but on the other hand, Oracle will license me their RDBMS per core, and whether it's an Intel 56xx or an AMD 61xx, I am still paying a factor of 0.5 license per core.
So in the end, I would pay for 6 cores for AMD and 3 cores for Intel. The license price per core is much higher than the hardware price difference.
Any thoughts or solutions on this issue would be appreciated...
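To make the arithmetic concrete, here is a small sketch of the per-core licensing math described above. It assumes the 0.5 core factor mentioned applies to both CPU families; the function name is hypothetical and no actual license prices are used.

```python
def licenses_needed(cores_per_cpu: int, sockets: int = 1,
                    core_factor: float = 0.5) -> float:
    """Number of per-core licenses for a server.

    `core_factor` is the 0.5 multiplier quoted in the comment above;
    treating it as identical for both vendors is an assumption.
    """
    return cores_per_cpu * sockets * core_factor


# Per CPU: 12-core Opteron 61xx vs 6-core Xeon 56xx.
opteron_per_cpu = licenses_needed(12)
xeon_per_cpu = licenses_needed(6)

# Same comparison for a dual-socket server.
opteron_2s = licenses_needed(12, sockets=2)
xeon_2s = licenses_needed(6, sockets=2)

print(f"Per CPU:    Opteron {opteron_per_cpu:g}, Xeon {xeon_per_cpu:g}")
print(f"2S server:  Opteron {opteron_2s:g}, Xeon {xeon_2s:g}")
```

With per-core licensing the license count scales with core count, so the 12-core part needs twice the licenses of the 6-core part no matter how cheap the chip itself is.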
Would it be possible to get the XML parameter files you used in this test? We are currently in a trial phase at my company to see how the current crop of Intel boxes (dual Xeon X5460 procs) holds up against a new z10 system. Did you run Swingbench on the server itself, or did you use a dedicated client to test?
In 1991 I had an AMD 386-40 that kicked the snot out of Intel's pride and joy, the 486DX2-66. Benchmarks were 25%+ across the board over Intel. Then Intel lied to the market and started passing off cull processors as viable options, calling them 25MHz and 50MHz processors, when they were actually processors that failed the benchmarks for 33MHz and 66MHz respectively.
In 1998, when the Win98 beta was released, I was building servers and workstations at a tech company, and again AMD was kicking the snot out of Intel: load times on new system builds, boot time and performance. The Intel chips could not hack it. Then when MS released their actual market version of Win98... all of a sudden you could not even use an AMD processor to run it. You had to wait 2 weeks for MS to come up with an "AMD Patch" to run on an AMD system.
One thing I have seen over 20 years in the industry is that Intel will Lie, Cheat, Steal and Bribe to try and get the upper hand on AMD. Always have... Always Will!!
i've been an amd fan for as long as i can remember. started fixing computers in 1979. used to fix mai basic four minis in the mid-80s that were built on amd bit-slice bipolar cpus on boards that cost 15,000$.
just got 2 opteron 6172 cpus from ebay for what i thought was peanuts (450 $ each) only to discover upon delivery that both had hairline cracks at a 45 degree angle on one corner of the contact pad surface. looking at their web site i could figure i was out on a limb and they would laugh in my face if asked for warranty support on these not-boxed cpus. i know some dumb ass managed to break those cpu corners, and tried to shove the crap to an ebay sucker, but the problem lies deeper, mostly in the g34 socket physical design itself of these otherwise beautiful electronic products. the edge of the metal cover doesn't reach the edge of the fiber board, leaving some unsupported area to be broken by dumb asses mimicking the old days when they could put a 40-pin dip cpu upside-down in its socket. so i'm freshly reviewing my belief system about amd while i figure a solution for this crap-hits-the-fan situation. wish i could have told amd engineers to cover these last millimeters at the bleeding edge. they might say this and that about warranty, i still hold them responsible for this preventable disaster.
58 Comments
564265425722557 - Monday, March 29, 2010 - link
1. Why is the TDP of the 65W ACP Magny-Cours a question mark? And are you sure the TDP of the 80W ACP ones is 115W?
2. The Intel systems have only 24GB of RAM against the 32GB of RAM on the 2S Magny-Cours. That's why the 100GB database test favors the Magny-Cours by a large margin.
JohanAnandtech - Monday, March 29, 2010 - link
AMD told us the TDP values of the Magny-Cours at 80 and 105W ACP. The TDP values of the lower-power versions were not disclosed yet.
And as we disclosed on the benchmark config page, none of the benches uses more than 20 GB. The vApus Mark I uses about 19 GB. The SQL Server test uses much less. While the SQL Server test has to scan through the complete index, it does not access the complete 100 GB of data. There is absolutely no advantage for the Opterons there. We checked.
The fact that we spec the servers like that is a direct consequence of their memory channels (3 and 4). There is not much we can do about that.
Penti - Tuesday, March 30, 2010 - link
How about 4P performance? It's cheap now and it's AMD's whole selling point. I guess you can get a 4P 48-core 128GB system for not that much. How would that compare to, say, a 2P Nehalem 12-core 92GB? Wouldn't they cost about the same? Will it still be competitive against 8-core 2P Nehalem-EX? And how about the 4P (like 6-core versions) Nehalem-EX? How about the 8-core versions of the 6100 series Opterons?
elnexus - Wednesday, March 31, 2010 - link
In answer to cost:
Compare our 2P Xeon 5600-series Workstation: http://elnexus.com/products.aspx?line_id=15514
with our 4P Opteron 6100-series Workstation: http://elnexus.com/products.aspx?line_id=15635
(I hope this isn't condemned as advertising, since it is an attempt to answer a question about price vs performance.)
Note how low priced the 6128 chip is (the default chip included in the base price).
AMD, I think, is running away from Intel if you factor in the price...
Penti - Wednesday, March 31, 2010 - link
Thanks, I don't condemn it as advertising, as this is a new platform, so it's interesting and it's hard to get prices for complete systems yet. Basically, 4P 8-core 6100-series Opterons with 128GB DDR3 ECC REG cost as much as a 2P six-core Xeon (Westmere-EP) with 96GB DDR3 ECC REG. Mainly because you can use cheaper 4GB sticks and still get 128GB. And partly because there's no longer any markup for >2P parts. I guess it accounts for something. Yeah, the 6128 chip costs virtually nothing extra for being 4P compatible. Guess it helps AMD for a lot of workload scenarios. And since you can get 4P in 1U, there's really nothing that speaks against it. Will be interesting to see what the Nehalem-EX can do though.
TitanusComp - Wednesday, April 6, 2011 - link
You can really get a good idea by comparing these two products:
48 Cores:
http://www.titanuscomputers.com/A400-AMD-Workstati...
24 Cores (Quad SLi Capable)
http://www.titanuscomputers.com/X450-Intel-High-Pe...
Now, things to consider, do you need CPU or GPU power?
duploxxx - Monday, March 29, 2010 - link
To make the whole benchmark complete, I think you should ask AMD for some Opteron 6136 chips to get a full review.
duploxxx - Monday, March 29, 2010 - link
and add the 56xx 4-core counterpart, of course
JohanAnandtech - Tuesday, March 30, 2010 - link
We are working on it. Expect an update with new SKUs this month. I would say next week, but I would like to take some time to do some in-depth analysis.
Hacp - Monday, March 29, 2010 - link
Anand,
I want to ask why you are biased against AMD? You should base your tests on price. AMD is selling their 12-core for the price of an Intel 6-core. Compare apples to apples! Do a 12-core vs 6-core comparison and see who wins. Otherwise, you are doing a disservice.
Accord99 - Monday, March 29, 2010 - link
The X5670 is 6-core.
JackPack - Tuesday, March 30, 2010 - link
LOL. Based on price?
Sorry, but you do realize that the majority of these 6-core SKUs will be sold to customers where the CPU represents a small fraction of the system cost?
We're talking $40,000 to $60,000 for a chassis and four fully loaded blades. A couple hundred dollars difference for the processor means nothing. What's important is the performance and the RAS features.
JohanAnandtech - Tuesday, March 30, 2010 - link
Good post. Indeed, many enthusiasts don't fully understand how it works in the IT world. Some parts of the market are very price sensitive and will look at a few hundred dollars more (like HPC, rendering, webhosting), as the price per server is low. A large part of the market won't care at all. If you are paying $30K for a software license, you are not going to notice a few hundred dollars on the CPUs.
Sahrin - Tuesday, March 30, 2010 - link
If that's true, then why did you benchmark the slower parts at all? If it only matters in HPC, then why test it in database? Why would the IDMs spend time and money binning CPUs?
Responding with "Product differentiation and IDM/OEM price spreads" simply means that it *does* matter from a price perspective.
rbbot - Saturday, July 10, 2010 - link
Because those of us with applications running on older machines need comparisons against older systems in order to determine whether it is worth migrating existing applications to a new platform. Personally, I'd like to see more comparisons to even older kit in the 2-3 year range that more people will be upgrading from.
Calin - Monday, March 29, 2010 - link
Some programs were licensed by physical processor chips, others were licensed by logical cores. Is this still correct, and if so, could you explain it based on the software used for benchmarking?
AmdInside - Monday, March 29, 2010 - link
Can we get any Photoshop benchmarks?
JohanAnandtech - Monday, March 29, 2010 - link
I have to check, but I doubt that besides a very exotic operation anything is going to scale beyond 4-8 cores. These CPUs are not made for Photoshop IMHO.
AssBall - Tuesday, March 30, 2010 - link
Not sure why you would be running Photoshop on a high-end server.
Nockeln - Tuesday, March 30, 2010 - link
I would recommend trying to apply some advanced filters on a 200+ GB file.
Especially with the new higher-megapixel cameras, I could easily see how some professionals would fork up the cash if this reduces the time they have to spend in front of the screen waiting on things to process.
wolfman3k5 - Monday, March 29, 2010 - link
Great review! Thanks for the review. When will you guys be reviewing the AMD Phenom II X6 for us mere mortals? I wonder how the Phenom II X6 will stack up against the Core i7 920/930.
Keep up the good work!
ash9 - Tuesday, March 30, 2010 - link
Since SSE4.1 and SSE4.2 are not in AMD's CPUs, it's Anand's way of getting an easy benchmark win, seeing as some of these benchmark tests probably use them:
http://blogs.zdnet.com/Ou/?p=719
August 31st, 2007
SSE extension wars heat up between Intel and AMD
"Microprocessors take approximately five years to go from concept to product and there is no way Intel can add SSE5 to their Nehalem product and AMD can’t add SSE4 to their first-generation 45nm CPU “Shanghai” or their second-generation 45nm “Bulldozer” CPU even if they wanted to. AMD has stated that they will implement SSE4 following the introduction of SSE5 but declined to give a timeline for when this will happen."
asH
mariush - Tuesday, March 30, 2010 - link
One of the best optimized and multi-threaded applications out there is the open source video encoder x264.
Would it be possible to test how well 2x8 and 2x12 AMD configurations work at encoding 1080p video at some very high quality settings?
A workstation with 24 cores from AMD would cost almost as much as a single socket 6 cores system from Intel so it would be interesting to see if the increase in frequency and the additional SSE instructions would be more advantage than the number of cores.
Aclough - Tuesday, March 30, 2010 - link
I wonder if the difference between the Windows and Linux test results is related to the recentish changes in the scheduler? From what I understand the introduction of the CFS in 2.6.23 was supposed to be really good for large numbers of cores, and I'm given to understand that before that the Linux scheduler worked similarly to the recent Windows one. It would be interesting to try running that benchmark with a 2.6.22 kernel or one with the old O(1) patched in.
Or it could just be that Linux tends to be more tuned for throughput whereas Windows tends to be more tuned for low latency. Or both.
Aclough - Tuesday, March 30, 2010 - link
In any event, the place I work for is a Linux shop and our workload is probably most similar to Blender, so we're probably going to continue to buy AMD.
ash9 - Tuesday, March 30, 2010 - link
http://www.egenera.com/pdf/oracle_benchmarks.pdf
"Performance testing on the Egenera BladeFrame system has demonstrated that the platform is capable of delivering high throughput from multiple servers using Oracle Real Application Clusters (RAC) database software. Analysis using Oracle’s Swingbench demonstration tool and the Calling Circle schema has shown very high transactions-per-minute performance from single-node implementations with dual-core, 4-socket SMP servers based on Intel and AMD architectures running a 64-bit-extension Linux operating system. Furthermore, results demonstrated 92 percent scalability on either server type up to at least 10 servers. The BladeFrame’s architecture naturally provides a host of benefits over other platforms in terms of manageability, server consolidation and high availability for Oracle RAC."
nexox - Tuesday, March 30, 2010 - link
It could also be that Linux has a NUMA-aware scheduler, so it'd try to keep data stored in RAM which is connected to the core that's running the thread which needs to access the data. I probably didn't explain that too well, but it'd cut down on memory latency because it would minimize going out over the HT links to fetch data. I doubt that Windows does this, given that Intel hasn't had NUMA systems for very long yet.
I'd sort of like to see more Linux benchmarks, since that's really all I'd ever consider running on data center-class hardware like this, and since apparently Linux performance has very little to do with Windows performance, based on that one test.
yasbane - Wednesday, May 19, 2010 - link
Agreed. I do find it disappointing that they put so few benchmarks for Linux for servers, and so many for Windows.
-C
jbsturgeon - Tuesday, March 30, 2010 - link
I like the review and enjoyed reading it. I can't help but feel the benchmarks are less a comparison of CPUs and more a study of how well the apps can be threaded, as well as the implementation of that threading -- higher-clocked CPUs will be better for serial code and more cores will win for apps that are well threaded. In scientific number crunching (the code I write), more cores always wins (AMD). We do use Fluent too, so thanks for including those benchmarks!!
jbsturgeon - Tuesday, March 30, 2010 - link
Obviously that rule can be altered by a killer memory bus :-).
Cogman - Tuesday, March 30, 2010 - link
It should be noted that the newer Nehalem-based processors have specific AES encryption instructions. The benchmark where the Xeon blows everything out of the water is likely utilizing that instruction set (though, AFAIK, not many real-world applications do).
Hector1 - Tuesday, March 30, 2010 - link
I read that Intel is expected to launch the 8-core Nehalem EX today. It'll be interesting to compare it against the 12-core Magny Cours. Both are on a 45nm process.
spoman - Tuesday, March 30, 2010 - link
You stated "... that kind of bandwidth is not attainable, not even in theory because the next link in the chain, the Northbridge ...".
How does the Northbridge affect memory BW if the memory is connected directly to the processor?
JohanAnandtech - Wednesday, March 31, 2010 - link
Depending on your definition, the northbridge is in the CPU. AMD uses "northbridge" in its own slides to refer to the part where the memory controller etc. resides.
Pari_Rajaram - Tuesday, March 30, 2010 - link
Why don't you add STREAM and LINPACK to your benchmark suites? These are very important benchmarks for HPC.
JohanAnandtech - Wednesday, March 31, 2010 - link
Stream... in the review.
piooreq - Wednesday, March 31, 2010 - link
Hi Johan,
For the last few days I did several tests with Swingbench CC with a similar database configuration, but I achieved somewhat different results. I'm just wondering exactly what settings you used for the CC test itself, I mean when you generate the schema and data for that test. Thanks for the answer.
JohanAnandtech - Thursday, April 1, 2010 - link
Your question is not completely clear to me. What is the info you would like? You can e-mail if you like at johanATthiswebsitePointcomzarjad - Wednesday, March 31, 2010 - link
Can't figure out if hyperthreading were enabled on Intels. Particularly interested in virtualization benchmark with hyperthreading both enabled and disabled. Also of interest would be an Office benchmark with a bunch of small VMs (1.5 to 2GB) to simulate VDI configuration.JohanAnandtech - Thursday, April 1, 2010 - link
Hyperthreading is always on. But we will follow up on that. A VDI based hypervisor tests is however not immediately on the horizon. The people of the VRC project might do that though. Google on the VRC project.zarjad - Friday, April 2, 2010 - link
I understand that HT can be disabled in BIOS and that some benchmarks don't like HT.elnexus - Wednesday, April 21, 2010 - link
I can report that one of my customers, performing intensive image processing, found that DISABLING hyper-threading on a Nehalem-based workstation actually IMPROVED performance considerably.
It seems that certain applications don't like hyper-threading, while others do. I always recommend that my customers perform sensitivity analyses on their computing tasks with HT on and off, and then use whichever is best.
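The on/off sensitivity analysis recommended above could be scored with something like the following. This is only a minimal sketch: the function name `compare_ht`, the use of the median, and the 2% `noise_margin` are assumptions made for illustration, not anything described in the article or the comment. It expects wall-clock times, in seconds, from repeated runs of the same job with HT enabled and with HT disabled.

```python
from statistics import median


def compare_ht(ht_on_seconds, ht_off_seconds, noise_margin=0.02):
    """Compare repeated run times of one job with HT on and HT off.

    Returns (recommendation, ratio), where ratio is the median HT-on time
    divided by the median HT-off time. noise_margin is an assumed tolerance
    below which the two settings are treated as equivalent.
    """
    on = median(ht_on_seconds)
    off = median(ht_off_seconds)
    ratio = on / off  # > 1 means the job finished faster with HT disabled
    if ratio > 1 + noise_margin:
        return "disable HT", ratio
    if ratio < 1 - noise_margin:
        return "enable HT", ratio
    return "no clear difference", ratio
```

The median is used here so that a single slow outlier run does not decide the outcome; several runs per setting are needed for that to mean anything.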
tracerburnout - Wednesday, March 31, 2010 - link
How is it possible that Intel's Xeon X5670 rig returns 19k+ for a score while AMD's magny-cours returns only 2k+?? I only question the results of this benchmark chart because Intel's Xeon X5570 rig returns only around 1k. How can an X5670 be 19x faster than an X5570?? And I doubt the same is true for the magny-cours by being just 10.5% of what the X5670 can do.
(is there an extra '0' by accident in there?)
tracerburnout
proud supporter of AMD, with a few Intel rigs for Linux only
JohanAnandtech - Thursday, April 1, 2010 - link
No, it is just that Sisoft uses the new AES instructions of Westmere. It is a forward-looking benchmark which tests only a small part of a larger website code base. So that 19x speedup will probably result in 10 to 20% of the complete website being 19x faster, and the real performance impact will be a lot smaller. It is interesting though to see how much faster these dedicated SIMD instructions are on these kinds of workloads.
alpha754293 - Thursday, April 1, 2010 - link
If you guys need help with setting up or running the Fluent/LS-DYNA benchmarks let me know.
I see that you don't really spend as much time writing or tweaking it as you do with some of the other programs, and that to me is a little concerning only because I don't think that it is showing the true potential of these processors if you run it straight out-of-the-box (especially with Fluent).
Fluent tends to have a LOT of iterations, but it also tends to short-stroke the CPU (i.e. the time required to complete all of the calculations necessary is less than 1 second and therefore doesn't make full use of the computational ability).
Also, the parallelization method (MPICH2 vs. HP MPI) makes a difference in the results.
You want to make sure that the CPUs are fully loaded for a period of time such that at each iteration, there should be a noticeable dwell time AT 100% CPU load. Otherwise, it won't really demonstrate the computational ability.
With LS-DYNA, it also makes a difference whether it's SMP parallelization or MPP parallelization as well.
k_sarnath - Friday, April 2, 2010 - link
The most baffling part is how Linux could engage 12 CPUs much better than Windows. I am obviously curious about the OS platform for the other tests. Similarly, MS SQL was able to scale well on multiple cores. In this context, I am not sure how we can look at the performance numbers. A badly scaling app or OS could show the 12-core one in a bad light.
OneEng - Saturday, April 3, 2010 - link
Hi Johan,
I have followed your articles from the early days at Ace's and have a good respect for the technical accuracy of your articles.
It appears that the X5570 scaling between 4 and 8 cores has very little gain in the Oracle Calling Circle benchmark. Furthermore, the 24 cores of MC at 2.2GHz are way behind. Westmere appears to do quite well, but really should not be able to best 8 cores in the X5570 with all else being equal.
I have heard some state that the benchmark is thread bound to a low number of threads (don't know if I am buying this), but surely something fishy is going on here.
It appears that there is either a real world application limit to core scaling on certain types of Oracle database applications (if there are, could you please explain what features an app has when these limits appear), or that the benchmark is flawed in some way.
I have a good amount of experience in Oracle applications and have usually found that more cores and more memory make Oracle happy. My experience seems at odds with your latest benchmarks.
Any feedback would be appreciated .... Thanks!
JohanAnandtech - Tuesday, April 6, 2010 - link
I am starting to suspect the same. I am going to dissect the benchmark soon to see what is up. It is not disk related, or at least that is surely not our biggest problem. Our benchmark might not be far from the truth though; I think Oracle really likes the big L3-cache of the Westmere CPU.
If you have other ideas, mail at johanATthiswebsiteP
heliosblitz2 - Wednesday, April 7, 2010 - link
You wrote:
Test-Setup:
Xeon Server 1: ASUS RS700-E6/RS4 barebone
Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon “Westmere” X5670 2.93 GHz
6x4GB (24GB) ECC Registered DDR3-1333
"Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570."
That is not exactly the reason, I think.
The reason is that you populated the second memory bank in both setups.
Intel specification:
Westmere 1333 MHz CPUs run at 1333 MHz with the second bank populated, while
Nehalem 1333 MHz CPUs run at 1066 MHz with the second bank populated.
That could be updated.
Compare tech docs on Intel site: datasheet Xeon 5500 Part 2 and datasheet Xeon 5600 Part 2
Arnold.
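If the premise of the comment above holds (1333 versus 1066 MT/s with the second bank populated), the theoretical gap can be worked out as below. This is a rough sketch under stated assumptions: three memory channels per Xeon socket and the standard 8 bytes per transfer on a 64-bit DDR3 channel. The function name is made up for illustration.

```python
BYTES_PER_TRANSFER = 8  # a DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth of one socket in GB/s (10^9 bytes/s)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


# Assumption: three memory channels per socket on both Xeon generations.
at_1333 = peak_bandwidth_gbs(3, 1333)
at_1066 = peak_bandwidth_gbs(3, 1066)
ratio = at_1333 / at_1066

print(f"1333 MT/s: {at_1333:.1f} GB/s per socket")
print(f"1066 MT/s: {at_1066:.1f} GB/s per socket")
print(f"theoretical advantage: {(ratio - 1) * 100:.0f}%")
```

A memory clock difference of this size would allow up to roughly 25% more peak bandwidth, so it is at least large enough to account for the 15% measured difference quoted from the article.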
gonerogue - Saturday, April 10, 2010 - link
The Viper is a V10 and most certainly not a traditional muscle car ;)
kokotko - Saturday, April 24, 2010 - link
why you are NOT SHARING same "shareable" components - like PSU ??????
NO WONDER THE NUMBERS ARE WORSE ! ! !
blurian589 - Tuesday, May 11, 2010 - link
3ds max crashes because of the mental ray renderer. remove the plugin from loading and max will start up. it's because mental ray cannot see more than 16 threads (physical or virtual via hyper-threading). please do test the max rendering performance. thanks
Desired_Username - Tuesday, June 29, 2010 - link
In the final words it states "We estimate that the new Opteron 6174 is about 20% slower than the Xeon 5670 in virtualized servers with very high VM counts. " But in the virtualization section I can't seem to figure out what brought you to that conclusion. The VMmark score for the Cisco X5680 system was 35.83@26 tiles. You have the VMmark for the 6176SE at 31, which is dead on with the HP DL385 G7, which got 30.96@22 tiles. I see the X5680 15% better at best. And the Cisco X5680 system had 192GB of memory, while the HP 6176SE system had 128GB. What am I missing here?
jeffjeff - Wednesday, September 22, 2010 - link
I appreciate AMD's lower CPU cost but on the other hand, Oracle will license me their RDBMS per core and whether it's an Intel 56xx or AMD 61xx, I am still paying a ratio of .5 license per core.
So in the end, I would pay for 6 cores with AMD and 3 cores with Intel. The price per core is much higher than the hardware price difference.
Any thoughts or solutions on this issue would be appreciated...
Joffrey
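The licensing arithmetic in the comment above can be sketched as follows. It assumes the 0.5 per-core factor quoted there, one 12-core Opteron 61xx against one 6-core Xeon 56xx, and that fractional results are rounded up. The function names are hypothetical, and the license price is left as a parameter because the comment does not give one.

```python
import math


def licenses_needed(cores, core_factor=0.5):
    """Per-core licenses for one CPU; rounding up is an assumption."""
    return math.ceil(cores * core_factor)


def extra_license_cost(price_per_license, amd_cores=12, intel_cores=6):
    """Additional license spend per socket when choosing the 12-core part."""
    extra = licenses_needed(amd_cores) - licenses_needed(intel_cores)
    return extra * price_per_license


amd_licenses = licenses_needed(12)    # 12-core Opteron 61xx
intel_licenses = licenses_needed(6)   # 6-core Xeon 56xx
print(amd_licenses, intel_licenses)
```

The hardware saving per socket would have to exceed `extra_license_cost(price)` for the cheaper CPU to win on total cost, which is the point the comment is making.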
stealthy - Wednesday, November 24, 2010 - link
Would it be possible to get the xml parameter files you have used in this test?
We are currently in a trial phase at my company to see how the current crop of intel boxes (dual Xeon X5460 procs) hold up against a new z10 system.
Did you run the swingbench on the server itself or did you use a dedicated client to test ?
Big_Mr_Mac - Thursday, December 16, 2010 - link
In 1991 I had an AMD 386-40 that kicked the snot out of Intel's pride and joy, the 486DX2-66. Benchmarks were 25%+ across the board over Intel. Then Intel lied to the market and started passing off cull processors as viable options, calling them 25MHz and 50MHz processors when they were actually processors that failed the benchmarks for 33MHz and 66MHz respectively.
In 1998, when the Win98 Beta was released, I was building servers and workstations at a tech company, and again AMD was kicking the snot out of Intel: load times on new system builds, boot time and performance. The Intel chips could not hack it. Then when MS released their actual market version of Win98... all of a sudden you could not even use an AMD processor to run it. You had to wait 2 weeks for MS to come up with an "AMD Patch" to run on an AMD system.
One thing I have seen over 20 years in the industry is that Intel will Lie, Cheat, Steal and Bribe to try and get the upper hand on AMD. Always have....Always Will!!
rautamiekka - Saturday, December 25, 2010 - link
Why the fuck are you testing with WinServer and M$ SQL ? Just reading this makes my blood boil 9 times in a second.
polbel - Saturday, May 21, 2011 - link
i've been an amd fan for as long as i can remember. started fixing computers in 1979. used to fix mai basic four minis in the mid-80s that were built on amd bit-slice bipolar cpus on boards that cost 15,000$.
just got 2 opteron 6172 cpus from ebay for what i thought was peanuts (450 $ each) only to discover upon delivery that both had hairline cracks at a 45 degree angle on one corner of the contact pad surface. looking at their web site i could figure i was out on a limb and they would laugh in my face if i asked for warranty support on these not-boxed cpus. i know some dumb ass managed to break those cpu corners, and tried to shove the crap to an ebay sucker, but the problem lies deeper, mostly in the g34 socket physical design itself of these otherwise beautiful electronic products. the edge of the metal cover doesn't reach the edge of the fiber board, leaving some unsupported area to be broken by dumb asses mimicking the old days when they could put a 40-pin dip cpu upside-down in its socket. so i'm freshly reviewing my belief system about amd while i figure a solution for this crap-hits-the-fan situation. wish i could have told amd engineers to cover these last millimeters at the bleeding edge. they might say this and that about warranty, i still hold them responsible for this preventable disaster.
paul :-)