30 Comments
chrone - Thursday, December 8, 2011 - link
It seems the new Interlagos is outperformed by the previous generation of Opteron. This is bad for AMD. :( I like having more cores in a server; it could be useful for a web server. I hope AMD's next generation will improve its performance per watt.
MrSpadge - Thursday, December 8, 2011 - link
No, but that's what we already knew. What's new is that "If using the optimum settings, the new one can catch up and slightly outperform the old one. The gap to the Xeons closes a bit, too."

MrS
JohanAnandtech - Thursday, December 8, 2011 - link
I quickly added the disclaimer: (quick note: the Blender benchmark on Windows is one of the worst benchmarks for the Opteron Interlagos, so see this as a "worst case" performance point).

So make sure you see this in perspective: it is not one of Interlagos' favorite benchmarks.
Morg. - Monday, December 12, 2011 - link
This is basically the impression those benchmarks give, but it's dead wrong.

For anything Windows, the current scheduler is unable to adapt to the Bulldozer architecture and thus wastes any advantage it brings; that is why you don't get much better performance in any Windows benchmark of either FX or Interlagos CPUs.

That is temporary, as Microsoft will update their scheduler in the future for WS2008, and for W7 in the process.
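To make the module issue concrete, here is a toy sketch (pure illustration, not Windows internals; the module count and placement functions are made up for this example). Each Bulldozer module pairs two integer cores that share one FPU and L2 cache, so a scheduler that fills cores in order can pack two busy threads onto one module while other modules sit idle:

```python
# Toy model of thread placement on a Bulldozer-style chip.
# Cores 0-1 form module 0, cores 2-3 form module 1, and so on;
# paired cores share one FPU and one L2 cache.

MODULES = 4  # e.g. a 4-module, 8-core part (hypothetical layout)

def naive_placement(n_threads):
    """Fill cores in order: threads 0 and 1 land on the same module."""
    return [core // 2 for core in range(n_threads)]

def module_aware_placement(n_threads):
    """Spread threads one per module first, pairing only when forced to."""
    return [t % MODULES for t in range(n_threads)]

if __name__ == "__main__":
    # With 4 busy threads, the naive policy uses only 2 of 4 modules,
    # so each thread shares an FPU; the module-aware policy gives each
    # thread a private FPU and L2.
    print(naive_placement(4))
    print(module_aware_placement(4))
```

Something roughly like the module-aware policy is what the later Windows scheduler updates for Bulldozer aimed to achieve.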
For anything ESX, we can again see how Intel's strategy of helping other vendors customize and adapt to Xeon is paying off.

So there, Interlagos is doing badly because AMD didn't bother to pay someone at VMware to create a power plan adapted to Interlagos, and we can see the default power plan is even worse than the Windows scheduler (it's not just that it doesn't understand modules; it doesn't understand power states either...).

In the hands of an ESX/Interlagos expert, you could have a finely tuned ESX host that would definitely outperform the Istanbul, and most probably the Xeon, given how much faster Interlagos is compared to Istanbul.
Moving on, we get into "stupid" benchmarks like Cinebench, which should be dropped from benchmarking altogether, be it desktop or server, as it is Intel-favored and globally irrelevant to anyone.

Missing from a server benchmark suite are raw performance numbers, some fine-tuned SQL benchmarks (those can't be done in-house at AnandTech, but hey, they could be copied or something), and some real-world virtualization performance-per-watt benchmarks. (No, ESX is not the only hypervisor, and it wasn't even properly tweaked.)
Globally, if anything, the benchmarks presented here on AnandTech say three things:
1) Windows Threading Scheme Sucks
2) ESX power plans suck big time
3) Some benchmarks should be removed (rendering included)
If anyone still has a doubt, go ask Cray why they picked AMD, and you'll understand that not everything is as it's presented in AnandTech benchmarks.
chrone - Thursday, December 8, 2011 - link
Dear Johan,

Is there any PostgreSQL benchmark on the new AMD and Intel CPUs?
JohanAnandtech - Thursday, December 8, 2011 - link
Unfortunately I have very little knowledge of PostgreSQL. The last time we tried (together with a decent PostgreSQL expert), it scaled slightly better than MySQL (>4 cores, <8 cores), but nowhere near MS SQL Server (which can tackle 32-64 cores). The big problem is getting these kinds of databases to work with 8 cores and more.

Short answer: not that I know of :-).
chrone - Thursday, December 8, 2011 - link
Oh okay, thanks for the reply, I really appreciate it. :) Argh, too bad PostgreSQL is not as popular as MySQL. Hehe.
samuraid - Thursday, December 8, 2011 - link
Johan,

Hopefully you and the AnandTech team get a chance to test PostgreSQL again, especially with version 9.2 or newer. There have been some recent scaling improvements to PostgreSQL that might prove interesting:
http://rhaas.blogspot.com/2011/07/read-scaling-out...
Morg. - Monday, December 12, 2011 - link
Johan,

Scaling PostgreSQL is totally possible, but it's indeed reserved for the experts.

If you need someone to help you set up a benchmark, you might want to ask around on the PostgreSQL performance mailing list; I'm pretty sure they'll help you set up your test bed.
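If someone does set up such a test bed, a simple way to probe core scaling is to sweep client counts with pgbench, PostgreSQL's bundled benchmark tool. A minimal sketch (the flag names `-c`, `-j`, `-T` and `-S` are standard pgbench options; the database name, duration and thread cap here are placeholder assumptions):

```python
# Build pgbench invocations for a concurrency sweep. This only constructs
# the commands; running them requires an initialized pgbench database
# (pgbench -i) on a real PostgreSQL server.

def pgbench_command(clients, db="benchdb", seconds=300, select_only=False):
    """Return a pgbench command line for one concurrency level."""
    cmd = ["pgbench",
           "-c", str(clients),          # concurrent client connections
           "-j", str(min(clients, 8)),  # worker threads driving the clients
           "-T", str(seconds)]          # run for a fixed duration (seconds)
    if select_only:
        cmd.append("-S")                # read-only workload (no writes)
    cmd.append(db)
    return cmd

if __name__ == "__main__":
    # Client counts of interest for a two-socket Interlagos box.
    for clients in (4, 8, 16, 32):
        print(" ".join(pgbench_command(clients, select_only=True)))
```

Plotting transactions per second against the client count quickly shows where scaling flattens out.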
PostgreSQL has been scaled far beyond 8 cores in the past, so it shouldn't be an issue for just a two-socket Interlagos.

And please don't ever consider MySQL and PostgreSQL the same kind of database: the first is a toy for web devs, the second is a real, concrete alternative to Oracle for DBAs.
;)
Elite99 - Thursday, December 8, 2011 - link
Read an article in c't (German magazine) which also did some tests with Linux, Interlagos and some (obscure) compiler optimization flags. Andreas Stiller did those tests, I believe.

c't achieved much better benchmark scores than current Xeon CPUs using Linux. Strange to see multiple benchmarks that show entirely different results.
davegraham - Thursday, December 8, 2011 - link
Johan,

Have you been testing with DDR3-1600 ECC/REG?
d
JohanAnandtech - Thursday, December 8, 2011 - link
Yes. 8x 8 GB of DDR3-1600, the RAM that was inside the AMD review server.

davegraham - Thursday, December 8, 2011 - link
Thanks. I've been testing for several months on B2 silicon. I've noticed performance differences between 2K8 R2 SP1 and non-SP1 systems, as well as differences between Win7 SP1 and non-SP1. Linux kernels 3.x and greater tend to perform a bit better (SUSE in particular) than their Windows counterparts.

d
Klimax - Friday, December 9, 2011 - link
Not surprising. When you use arch-specific optimizations, then by definition other archs will not be used optimally (a small exception is the Core arch). I'd say it would be stranger to see both CPUs improve, or stay the same, with arch-specific optimizations.
Morg. - Tuesday, December 13, 2011 - link
I don't think you get it: an Interlagos-optimized build beat a Xeon-optimized one hands down; no surprise though.

But this goes to show that half-assed benchmarks cannot show whether a CPU is better than another one.
Elite99 - Thursday, December 8, 2011 - link
And here's the article and benchmark: http://www.heise.de/artikel-archiv/ct/2011/25/158/

RaggedRaz - Thursday, December 8, 2011 - link
Benchmarks: synthetic, real-world, niche users? Compilers and -O flags? Clearly Bulldozer's failure or success does not reduce to oversimplified metrics. I am a big AMD fan, but clearly they have made some indefensible gambles: a slightly too deep pipeline, lower IPC and, of course, no AMD-optimized compilers! Not to mention the lack of in-house fab capability (sure, their existence depended on spinning off GlobalFoundries, but too late on HKMG, a bulk node behind, killing off Phenom II, betting on TSMC... you get the idea). Aesthetically it is a pleasing design, and hopefully with an iteration or two, Bulldozer + Fusion with a decent GPU and more truly GPGPU-optimized programs/OSes, we'll see some wisdom in their choices. Just hope they don't go bankrupt before we get to a truly heterogeneous computing era...

TiGr1982 - Thursday, December 8, 2011 - link
I fully agree; let's hope they'll survive at least to deliver the "Steamroller"/"Excavator" CPU generations...

Filiprino - Thursday, December 8, 2011 - link
Yeah, AMD f***ed up a bit and for quite a while.davegraham - Thursday, December 8, 2011 - link
Open64 is AMD-optimized and frightfully easy to use. Pretty sure it'll do the job.
Has anyone seen any decent VMware benchmarks? This is the one place I was hoping Bulldozer architecture might end up being useful. I think Intel may have an advantage just because some software charges by # of cores per CPU socket, but AMD could still have a fighting chance otherwise.derbene - Thursday, December 8, 2011 - link
So testing servers means mostly rendering, a bit of encryption, file archiving and VMs to you? What about real server benchmarks?

Take a look at these benchmarks:
http://www.tecchannel.de/server/prozessoren/203825...
Bulldozer looks pretty brilliant compared to Xeon there.
ender8282 - Thursday, December 8, 2011 - link
All I saw there was a single performance-per-watt benchmark. That is useful, but this is a server processor, not an SoC for a smartphone.

derbene - Thursday, December 8, 2011 - link
Then you should look more closely: click "Alle Diagramme" (all diagrams) or the << >> arrows to page through.
JohanAnandtech - Friday, December 9, 2011 - link
None of those benchmarks really covers a large part of the market: SPECjbb-based benchmarks only cover Java workloads that are very light on I/O, and that is a very small part of the market.

SPECint is hardly relevant for the server market either: it is a combination of HPC, video and game-related benchmarks, and it runs all those instances with separate data in parallel.

We tried out two virtualization benchmarks with real-world workloads, as used in real enterprises. Considering that most servers now ship with a hypervisor on top of them, and that I clearly indicated the encryption and file archiving benchmarks carry less weight, I think your comment contains little constructive criticism.
And didn't I indicate that we are working on extensive MS SQL Server 2008 benchmarks?
derbene - Friday, December 9, 2011 - link
I see what you want to say: they all cover only a small part of the market... but taken together they give you a hint. I don't understand why you tested 4 different rendering benchmarks (Cinebench, 3ds Max, Maxwell Render and Blender as well). Don't they have similar load characteristics? How big would the market for this have to be? 50%?

ashrafi - Thursday, December 8, 2011 - link
I was wondering if a quad-Opteron (or even dual-Opteron) 6174 setup would benefit Maya or 3ds Max rendering. If CPU rendering is considered, there should be enough horsepower in such a setup, but I am not sure if Maya or 3ds Max would even support that many cores.

Any experience or ideas?
Drizzt321 - Thursday, December 8, 2011 - link
Single-server benchmarks are great, and can be extremely valuable, but how about benchmarking clusters of servers? For me, Hadoop/HBase would be ideal, but anything that clusters and spreads the load across multiple machines could be interesting. I suppose you could say doing it all on one machine can give you an idea of performance, but it doesn't necessarily stress absolutely every part of a system, since network I/O is (usually) limited, and the behavior of a cluster of more than one machine can differ from a cluster of just one machine.

kallestar - Friday, December 9, 2011 - link
"may become very robust in FP intensive applications once the use of AVX gets widespread"

This is very unlikely, given how constrained the Bulldozer cores are by the shared floating-point unit. If they currently can't cope with SSE instructions (4 32-bit floats per cycle), how are they going to cope with the 8 that AVX uses?
shodanshok - Sunday, December 11, 2011 - link
Great work Johan, it will be very interesting to see the complete benchmark results.

Regarding the FPU, Bulldozer can be more competitive with the use of XOP/FMA instructions, but with "simple" AVX I didn't expect particularly high gains (as a 256-bit AVX instruction is split into two 128-bit macro-operations).

Obviously some operations will be faster, as AVX semantics and functionality are better than existing SSE instructions, but this is true for Intel's AVX implementation as well.