Each module includes 2x integer cores, correct. But the floating point core is "shared-separate", meaning it can be used as two separate 128-bit FPUs or as a single 256-bit FPU.
Thus, each Bulldozer module can run either 3 or 4 threads simultaneously:
- 2x integer + 2x 128-bit FP threads, or
- 2x integer + 1x 256-bit FP thread
It's definitely a dual-core module. It's just that the number of threads it can run is flexible.
The thing to remember, though, is that these are separate hardware pipelines, not mickey-moused hyperthreaded pipelines.
You can get into a long discussion about that. The way I see it, part of the core is "logical/virtual" and the other part is real in Bulldozer. What is the difference between an SMT thread and a CMT thread when they enter the fetch-decode stages? Nothing AFAIK: both instruction streams are interleaved, and they both carry a "thread tag".
The difference is when they are scheduled: in the CMT Bulldozer, the instructions enter a real core with only one context. With SMT, the instructions enter a real core which still interleaves two logical contexts. So that core still consists of two logical cores.
It gets even more complicated when you look at the FP "cores". AFAIK, the FP cores of Interlagos are nothing more than 8 SMT-enabled cores.
The way I see it, the FPU on the Interlagos is this:
It's really a 256-bit wide FPU.
It can't really QUITE separate the ONE physical FPU into two 128-bit wide FPUs; more probably, in reality, it interleaves them (which is really just code for "FPU-starved").
Intel's original HTT had this as a MAJOR problem, because the tests back then could show anything from a -30% to a +30% change in performance. Floating-point intensive benchmarks have ALWAYS suffered the most. To see why, suppose you're writing a calculator using ONLY 8-byte (64-bit) double precision.
NORMALLY, that should mean that you should be able to crunch through four doubles at the same time (four 64-bit values fill a 256-bit register). And that's kinda/sorta true.
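As a rough illustration of that 128- vs 256-bit split (just a sketch of the instruction-level view, not of how AMD's scheduler actually partitions the unit; build with AVX enabled, e.g. -mavx):

    #include <immintrin.h>

    double a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, r[4];

    /* all four doubles in one 256-bit AVX operation */
    void add_256(void) {
        __m256d va = _mm256_loadu_pd(a);
        __m256d vb = _mm256_loadu_pd(b);
        _mm256_storeu_pd(r, _mm256_add_pd(va, vb));
    }

    /* the same work as two independent 128-bit SSE operations,
       which is roughly what two separate 128-bit FP threads would issue */
    void add_128x2(void) {
        _mm_storeu_pd(r,     _mm_add_pd(_mm_loadu_pd(a),     _mm_loadu_pd(b)));
        _mm_storeu_pd(r + 2, _mm_add_pd(_mm_loadu_pd(a + 2), _mm_loadu_pd(b + 2)));
    }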
Now, if you are running two programs, really...I don't think that the CPU, the compiler (well..maybe), the OS, or the program knows that it needs to compile for 128-bit-wide FPUs if you're going to run two instances or two (different) calculators.
So it's resource starved in trying to do the calculation processes at the same time.
For non-FPU-heavy workloads, you can get away with that. But for pretty much the entire scientific/math/engineering (SME) community, it's an 8-core processor or a highly crippled 16-core processor.
Intel's latest HTT seems to have addressed a lot of that, and in practical terms, you can see upwards of 30% performance advantage even with FPU-heavy workloads.
So in some cases, the definition of a core depends on what you're going to be doing with it. For SME/HPC it's good cuz it can do 12 actual cores' worth of work with 8 FPUs (33% more efficient), but it sucks because, unless they come out with a 32-thread/16-core monolithic die, as stated, it's only marginally better than the last one. It's just cheaper. And it's going to get incrementally faster with higher clock speeds.
P.S. Also, like Anand's article about nVidia Optimus:
Context switching, even at the CPU level, while faster, is still costly. Perhaps not nearly as costly as shuffling data around, but it's still pretty costly.
Ouch, this is going to be AMD's Itanium. That is, it has architecture adoption problems that people simply won't build around. Maybe less substantial than IA64, but still a huge performance loss because of underutilized integer units.
I think the way CPU-Z reports it for BD CPUs is correct: each core has 2 FPs, so 8 cores and 16 threads is correct.
Too bad Windows does not understand how to spread the load correctly on an AMD CPU (Windows 7 with Intel HT CPUs works fine and spreads the load correctly; SP1 improves that further, but for Intel CPUs only).
Windows 7 SP1 makes bigger use of core parking and gives better CPU use on Intel CPUs. As I have been seeing on 3 systems, most workloads now stay on the first 2 cores and the other 2 stay parked; on the AMD side it's still broken with Cool'n'Quiet enabled.
Bulldozers do not utilize hyper-threading, which takes a single integer core and can at times put two threads into that single integer core. A Bulldozer module has actual hardware to run two threads at the same time. This would suggest there are two physical cores.
Does it perform like an Intel 16-core (if there were such a thing)? No. But that does not mean that it is not in fact a 16-core device, as the hardware is there. Yes, they share an FPU, but that doesn't mean they are not cores.
Actually, Bulldozer is 16 cores. It has two dedicated integer units and a floating point unit which can act as two 128-bit units or one 256-bit unit for AVX. So you have 2 and 2 per module. Bulldozer does not use hyperthreading.
I'm curious whether CPU-Z polls the hardware for this information or queries a database to fetch it. If it is getting the core and thread count from hardware, it may be configurable. So while the chip itself does not use Hyperthreading, it may be reporting to the OS that it does by default. This would have an impact on performance scaling as well as on power consumption as load increases.
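For what it's worth, the core and thread counts can be read straight from the chip with CPUID rather than from a database. A minimal sketch of that approach (my assumption of how such a tool could do it, GCC/Clang on x86, using the bit-field layout documented for these leaves as far as I know):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;

        /* leaf 1: EBX bits 23:16 = logical processors per package */
        __get_cpuid(1, &eax, &ebx, &ecx, &edx);
        unsigned int logical = (ebx >> 16) & 0xff;

        /* AMD leaf 0x80000008: ECX bits 7:0 = physical core count minus 1 */
        __get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
        unsigned int cores = (ecx & 0xff) + 1;

        printf("%u cores, %u logical processors\n", cores, logical);
        return 0;
    }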
They are integer cores, which share few resources besides the FPU. On the Intel side there are two threads running concurrently (always, @Stuka87) which share a few less resources.
Arguing which one deserves the name "core" and which one doesn't is almost a moot point. However, both designs are not that different regarding integer workloads. They're just using a different amount of shared resources.
People should also keep in mind that a core does not necessarily equal a core. Each Bulldozer core (or half module) is actually weaker than in the Athlon 64 designs. It got some improvements but lost in some other areas. On the other hand, Intel's current integer cores are quite strong and fat - and it's much easier to share resources (between 2 hyperthreaded threads) if you've got a lot of them.
But on the Intel side there are only 4 real cores with HT off or on (on an i7 920 HT seems to give a benefit, but in results for the second-gen 2600K HT seems less important),
whereas on the AMD side there are 4 cores with each core having 2 FPs in them (desktop CPU). The issue is that the FPs are 10-30% slower than a Phenom CPU clocked at the same speed.
I'm not so sure I'd fault AMD too much, because 95% of the people that use their product, in this case, won't go through the effort of upgrading their software to get a significant performance increase, at least at first. Sometimes, you have to "force" people to get out of their rut and use something that's actually better for them.
I freely admit that I don't know much about running business apps; I build gaming computers for personal use. I can't help but think of my Father though, complaining about Vista and Win 7 and how they won't run his old, freeware apps properly. Hey, Dad, get the people that wrote those apps to upgrade them, won't you? It's not Microsoft's fault that they won't bring them up to date.
Backwards compatibility can be a stone around the neck of progress.
I've tended to be disappointed in AMD's recent CPU releases as well, but maybe they really do have an eye focused on the future that will bring better things for us all. If that's the case, though, they need to prove it now, and stop releasing biased press reports that don't hold up when these things are benched outside of their labs.
The problem is that a lot of server folks buy new servers to run their current or older software faster. It is a matter of TCO: they have invested a lot of work into getting web application x.xx to work optimally with interface y.yy and database zz.z. The vendor wants to offer a service, not the latest technology. Only if the service gets added value from the newest technology might they consider upgrading.
And you should tell your dad to run his old software in virtual box :-).
Most of the benchmarks are for rendering: Cinebench, 3DSMax, Maxwell, Blender, etc.
How many enterprises actually do 3D rendering?
Far more common enterprise applications would be RDBMS, data warehouse, OLTP, JVM, app servers, etc.
You touched on some of that in just one virtualization benchmark, vApus. That doesn't make sense either - how many enterprises you know run database servers on VM?
A far more useful review would be running separate benchmarks for OLTP, OLAP, RDBMS, JVM, etc. TPC-C, TPC-E, and TPC-H would be a good place to start.
But the exploding core counts made it as good as impossible.
1. For example, a website that scales to 32 cores easily: most people would be amazed how many websites have trouble scaling beyond 8 cores.
2. Getting an OLTP database to scale to 32 cores is nothing to sneeze at. If your database is small and you run most of it in memory, chances are that you'll get a lot of locks and that it won't scale anyway. If not, you'll need several parallel RAID cards with a lot of SSDs. We might pull that one off (the SSDs), but placing several RAID cards inside a server is most of the time not possible. Once you solve the storage bottleneck, other ones will show up again. Or you need an expensive SAN... which we don't have.
We had OLAP/OLTP and Java benchmarks. And they were excellent benchmarks, but between 8 and 16 cores they started to show decreasing CPU utilization despite using SSDs, tweaking, etc.
Now put yourself in our place. We can either spend weeks/months getting a database/website to scale (and we are not even sure it will make a real, repeatable benchmark), or we can build upon our virtualization knowledge, knowing that most people can't make good use of a native 32-core database anyway (or are bottlenecked by I/O and don't care) and buy their servers to virtualize.
At a certain point, we cannot justify investing loads of time in a benchmark that only interests a few people. Unless you want to pay those people :-). Noticed that some of the publications out there use Geekbench (!) to evaluate a server? Noticed how many publications run virtualization benchmarks?
"That doesn't make sense either - how many enterprises you know run database servers on VM?"
Lots of people. Actually, besides a few massive Oracle OLTP databases, there is no reason any more not to virtualize your databases. SQL Server and MySQL are virtualized a lot. Just by googling you can find plenty of reports of MySQL and SQL Server on top of ESX 4. Since vSphere 4 this has been common practice.
"etc. tppc, tpce, tpch would be a good place to start "
No, not really. None of the professional server buyers I know cares about TPC benches. The only people that mention them are the marketing people and hardware enthusiasts that like to discuss high-end hardware.
So you prefer software that requires $300,000 of storage hardware over a very realistic virtualization benchmark which is driven by real logs of real people?
Your "poor benchmark choice" title is disappointing after all the time that my fine colleagues and I have spent on getting a nice website + groupware virtualization benchmark running, which is stress-tested by vApus using real logs of real people. IMHO, the latter is much more interesting than some inflated TPC benchmarks with storage hardware that only the Fortune 500 can afford. Just IMHO.
While scaling to 32 cores can be problematic for some software, it's worth keeping in mind that the vast majority of dual-socket servers don't have 32 cores.
In fact, a dual-CPU Intel server only has *at most* 12 cores; that's a far cry from 32 cores. PostgreSQL & MySQL have no problem at all scaling to 12 cores and beyond.
Now if AMD decided to make a CPU with crappy per-core performance but has so many cores that most software can't take full advantage of, that's their own fault. It's not like they haven't been warned. Sun tried and failed with the same approach with T2. If AMD is hellbent on making the same mistake, they only have themselves to blame.
My post title is a bit harsh. But it is disappointing to see a review that devotes FOUR separate benchmarks to 3D rendering, an application that the vast majority of enterprises have no use for at all. Meanwhile, the workhorse applications for most enterprises, OLTP, OLAP, and such, received far too little attention.
"In fact, a dual-CPU Intel server only has *at most* 12 cores..."
Incorrect. There is s1567. This allows 2-8 CPUs, with a max. of 8C/16T per CPU... which I'm wondering why AnandTech failed to include in this review.
"You mean the E7-8830 CPU from the E7-8800 series which has prices *starting* at $2280?"
I'm not sure what he meant, but there are E7-2xxx processors for dual socket servers, which are priced much lower than the E7-8xxx processors which are for 8+ socket servers.
I have trouble understanding why people think a review should include research into every other similar product that might be used for the same purpose.
I mean, I can understand ASKING for a review of another specific product, particularly if you've actually done some research on your own and haven't found the information you want, but to imply a review isn't complete because it didn't mention or test another piece of hardware is a bit - unrealistic.
Sabresiberian, a very sincere thank you for being reasonable. :-)
Frankly, I can't imagine a situation where someone would have trouble deciding between a Westmere-EX and an AMD CPU. Most people checking out the Westmere-EX go for the RAS features (dual) or RAS + ultimate high thread performance (quad). In all other cases dual Xeon EP or Opterons make more sense power- and price-wise.
Really? Is it that much trouble to understand that people want to see the latest AMD cpu's compared to the most current generation of Intel hardware? Especially when the previous Intel processor review posted on this site reported on Westmere-EX performance? I have trouble understanding why people wouldn't expect it.
Sorry, but neotiger is totally right; the choice of benchmarks sucks. We are not helped *at all* by your review. What company's 32-core server is being used for 3D rendering, Cinebench, file compression, or TrueCrypt encryption?? You benchmarked it like it was a CPU of the nineties for a home enthusiast.
You are probably right pointing us to http://www.anandtech.com/show/2694 but your benchmarks don't reflect that AT ALL. Where are file compression, encryption, 3D rendering and cinebench in that chart?
Even performance per watt is not very meaningful, because when one purchases a 2-socket or 4-socket server, electricity cost is not an issue. Companies want to simplify deployment with such a system; they want this computer to run as fast as a cluster, in order not to be bound to cluster databases, which are a PAIN. So people want to see scalability of applications to the full core count on this kind of system, not so much performance per watt.
Virtualization is the ONLY sensible benchmark you included.
TPC as suggested is a totally right benchmark; that's the backend and bottleneck for most of the things you see in your charts at http://www.anandtech.com/show/2694 , and the objection about storage is nonsense: just fit a database in a ramdisk (don't tell me you need a database larger than 64GB for a benchmark), export it as a block device, then run the test. And/or use one PCIe-based SSD, which you certainly have.
http://www.anandtech.com/show/2694 mentions software development: how much effort does it require to set up a linux kernel compile benchmark?
http://www.anandtech.com/show/2694 mentions HPC: can you set up a couple of bioinformatics benchmarks such as BLAST (integer computation, memory compares), GROMACS (matrix FPU computations) and Fluent? Please note that none of your tests includes memory compares and FPU work, which are VERY IMPORTANT in HPC. GROMACS and Fluent would cover that hole. Bioinformatics is THE HPC of nowadays, and there are very few websites, if any, which help with the choice of CPUs for HPC computing.
For email servers (37%!) and web servers (14%) also I am sure you can find some benchmarks.
I'm not sure how the discovery of cores running in their power-saving state for far too long is anything new. My 2600k refuses to ramp up clocks while previewing video in a video editor even though a core is pegged at 100%. If I intervene and force it to 3.4ghz, preview framerate jumps from 8 fps to 16fps.
This has been happening for YEARS! My old quad Phenom 2.2ghz did the exact same thing!
It's extremely annoying and pisses me off I can't benefit from the power savings, let alone turbo.
Sounds like you're running Linux or some other strange OS, then. Or you may need a BIOS update. Generally Intel has its power management quite under control. In the AMD camp, physical power state switches often take longer than the impatient OS expects, and thus average frequency is hurt. This was pretty bad for Phenom 1.
win7 home premium x64 and the phenom was with xp 32bit... i haven't found another scenario that causes this, only streaming video that's rendered on-the-fly
You do know that Linux did not have any problems with Phenom I power management, unlike Windows? The same goes for BD now: Linux benchmarks look quite different from the Windows ones, and the gap is not that dramatic there.
This whole review, the only thought I have is that there are no sandy bridge chips in it. When SB based Xeon chips come out I bet that Interlagos will be completely dominated.
Not really. SB chips don't fit in AMD sockets. AMD's installed customer base likes the significant performance increase and power savings from just plugging in a new Opteron 6200/4200.
We have a bunch of 6100 in our data center and the performance has been disappointing. They do no better in single thread performance than old 73xx series Xeons. While this is OK for non-interactive stuff, it really isn't good enough for much else. These results just seem to confirm that the Bulldozer series of processors is over-hyped and that AMD is in danger of becoming irrelevant in the server, mobile and desktop market.
This is exactly what should be fixed now with the turbo, when set correctly. BTW, the 73xx series were not that bad on single-thread performance; it was wide-scale virtualization and I/O throughput which were awful on these systems.
"Let us first discuss the virtualization scene, the most important market." Yea, I don't know about that.
Considering that they've already shipped like some half-a-million cores to the leading supercomputers of the world; where some of them are doing major processor upgrades with this new release; I wouldn't necessarily say that it's the most IMPORTANT market. Important, yes. But MOST important...I dunno.
Looking forward to more HPC benchmark results.
Also, you might have to play with thread schedule/process affinity (masks) to make it work right.
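On Linux that kind of pinning can be done right from the code. A minimal sketch (it assumes the OS enumerates the two cores of a module as adjacent logical CPUs 0/1, 2/3, ... - worth checking against /proc/cpuinfo or hwloc first):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <pthread.h>

    /* allow the calling thread to run on every second core only, i.e. one
       core per module, so an FP-heavy thread gets a whole FPU and L2 to itself */
    static void pin_one_core_per_module(int ncpus) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = 0; c < ncpus; c += 2)
            CPU_SET(c, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
    }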
And yes, by any metric (revenue, servers sold) the virtualization market is the most important one for servers. Depending on the report 60 to 80% of the servers are bought to be virtualized.
I would explain it this way: it is the physical, hardware manifestation of simultaneous multi-threading (SMT). Intel's HTT is SMT.
IBM's POWER (since I think as early as POWER4), Sun/Oracle/UltraDense's Niagara (UltraSPARC T-series), maybe even some of the older Crays were all CMT. (Don't quote me on the Crays though. MIPS died before CMT came out. API WOULD have had it probably IF there had been an EV8).
But the way I see it - remember what a CPU IS: it's a glorified calculator. Nothing else/more.
So, if it can't calculate, then it doesn't really do much good. (And I've yet to see an entirely integer-only program).
Doing integer math is fairly easy and straightforward. Doing floating-point math is a LOT harder. If you check the power consumption while solving a linear algebra equation using Gauss elimination (parallelized or using multiple instances of the solver); I can guarantee you that you will consume more power than if you were trying to run VMs.
So the way I see it, if a CPU is a glorified calculator, then a "core" is where/whatever the FPU is. Everything else is just ancillary at that point.
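To make the Gauss elimination example concrete: the hot loop is nothing but multiplies and subtracts on doubles, roughly 2n^3/3 floating-point operations for an n-by-n system, which keeps the FPUs saturated in a way most server workloads never do. A textbook sketch (forward elimination over an augmented matrix a[n][n+1], no pivoting):

    /* one multiply and one subtract per element touched */
    for (int k = 0; k < n; k++)
        for (int i = k + 1; i < n; i++) {
            double m = a[i][k] / a[k][k];
            for (int j = k; j <= n; j++)   /* column n holds the right-hand side */
                a[i][j] -= m * a[k][j];
        }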
1) Niagara is NOT CMT. It is interleaved multithreading with SMT on top.
I haven't studied the latest Niagaras, but the T1 was a fine-grained multi-threaded CPU. It switched like a gatling gun between threads, and could not execute two threads at the same time.
SPARC T2 and onwards has additional ALU/AGU resources for a half-physical two-thread (four logical) solution per core with a shared scheduler/pipeline, if I remember correctly. That's not when CMT entered the picture according to Sun and Sun engineers, anyway. They regard the T1 as CMT, as it's chip-level multithreading. It's not just a CMP chip anyhow. SMT is just running multiple threads on the CPUs; CMP works the same as SMP on separate sockets. It is not the same as AMD's solution, however.
Firstly, this was a very good article, with a lot of information, especially the bits about the differences between server and desktop workloads.
Secondly, it does seem that you need to tune either the software (power management settings) or the chip (CMT) to get the best results from the processor. So, what advice is AMD offering its customers in terms of this tuning? I wouldn't want to pony up hundreds of dollars and then have to search the web for little titbits like switching off CMT in certain cases, or enabling the High Performance power plan.
Thirdly, why is the BIOS reporting 32 MB of L2 cache instead of 8 MB?
No need for tuning - turbo is OS-independent (unless OS power management explicitly disables it, aka Windows). Just disable the power management on the OS level (= High Performance for Windows) and you are good to go.
Thanks, Johan. I run Hyper-V on Windows Server 2008 R2 SP1 on a Phenom II X6 (my workstation) and have noticed the same CPU issue. I previously fixed it by disabling AMD's Cool'n'Quiet BIOS setting. Switching to High Performance increased my overall power usage by 9 watts but corrected the CPU capping issue you mentioned.
Yet another excellent article from AnandTech. Well done. This is how I don't mind spending 1 hour of my precious evening time.
I'm guessing it's worse considering the increased general cache latency? I'm not sure how the latency, or syscall, is related if at all.
Just curious as when I do lots of compiling in a guest VM (Gentoo doing lots of checking of packages and hardware capabilities each compile) it tends to spend the majority of time in the kernel context.
Just also wanted to add: Before I had a VT-x enabled chip, it was unbearably slow to compile software in a guest VM. I remember measuring latencies of seconds for some operations.
After getting an i7 920 with VT-x, it considerably improved, and most operations are in the hundred or so millisecond range (measured with latencytop).
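Those numbers are easy to reproduce with a trivial probe; a minimal sketch for Linux (time a cheap syscall natively, then inside the guest - with VT-x a plain syscall should not exit to the hypervisor at all, which is consistent with the improvement you saw):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        struct timespec t0, t1;
        const long N = 1000000;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < N; i++)
            syscall(SYS_getpid);               /* bypasses glibc's cached getpid() */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / N;
        printf("%.1f ns per syscall\n", ns);
        return 0;
    }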
Please explain why there is no comparison between the latest AMD processors to Intel's flagship two-way server processors: the Intel Westmere-EX e7-28xx processor family?
Take the gloves off and compare flagship against flagship please, and then scale the results to reflect the price difference if you have to, but there's no good reason not to compare them that I can see. Thanks.
I think you should have done a more thorough VM test than you did. 64GB RAM? We all know single-threaded performance is weak, but I still feel the servers are underutilized in your test.
These CPUs are screaming for heavy multi-threaded workloads. Many VMs. Many vCPUs.
What would the performance be if you had, say, at least 192GB of RAM and 50 (maybe more) VMs on it?
And of course, storage should not be a bottleneck.
I think this is where this 8-module/16-thread CPU would shine: a dual-socket rack/blade, 16 modules/32 threads, loads of RAM and a bunch of VMs.
So is the AMD system running 8GB DDR3-1600 DIMMs or 4GB DDR3-1333? Because you list the same DDR3-1333 model for both systems, and if the server supports 16 DIMMs, well, 16*4 is 64GB.
I have wondered about this: with more cores per socket and virtualisation (organising a new set of servers means buying far less hardware for the same functionality), I'd have thought that in total less server hardware is being purchased. Clearly that isn't the case though; is the money made back from more expensive servers?
Sure, with each new generation of server you need much less hardware to do the same amount of work; however, worldwide, people are looking for servers to do much more work. Each year companies like Google, Facebook, Amazon, Microsoft and Apple add much more computing power than they could get by refreshing their current servers.
Don't forget the big "Cloud" buyers. Facebook has increased its number of servers from 10,000 somewhere in 2008 to 10 times more in 2011. That is one of the reasons why the number of units is still growing.
Seems like the front-page write-up and this article are from different versions:
from the write up: "Each of the 16 integer threads gets their own integer cluster, complete with integer executions units, a load/store unit, and an L1-data cache"
from the article: "Cores (Modules)/Threads 8/16 [...] L1 Data 8x 64 KB 2-way"
What is really surprising is calling them threads (I thought, like the write-up on the front page, that they each had their own independent integer "unit"). If they have their own L1 cache, they are cores as far as I'm concerned. Then again, the article itself seems to suggest just that: they are threads without an independent L1 cache.
PS: I post comments only like once a year -- please don't delete my account. Every time I do, I have to register anew :D
It suits Intel better to call them threads... so writers are ordered... if only the pesky reality did not pop up here and there.
BD 4200 series is a 1-chip, 4-module, 8 (4*2) core, 8 (4*2) thread processor.
BD 6200 series is a 2-chip, 8 (2*4) module, 16 (2*4*2) core, 16 (2*4*2) thread processor.
Xeon 5600 series is an (up to) 1-chip, 6-core, 12(6*2)-thread processor.
One thing that I never see in any reviews is remarks about the fact that more cores with lower IPC have added costs when it comes to licensing. For instance, Oracle, IBM and most other suppliers charge per core. These costs can add up pretty fast; $10,000 per core is not uncommon.
Great review as usual. I found all the new AMD opterons very interesting. Pairing two in a dual socket G34 would make a multitasking monster on the cheap, and quite future proof.
About cores vs. modules vs. hyperthreading: people thinking AMD cores aren't true cores should consider the following:
Adding virtual cores with hyperthreading on Intel platforms doesn't make performance increase by 100% per core, but by less than 50%.
Also, if you look at Intel processor photographs, you won't notice the virtual cores anywhere in the pictures, while in Interlagos/Bulldozer you can clearly spot each core by its shape inside each module. What surprises me is how small they are, but that's an entirely different discussion.
I'm waiting to see the follow-up Linux article. The hints in this one confirm my own experiences. At our company, we're 99% FOSS and when using Centos packages, AMD chips run just as fast as Intel chips since it's all compiled with GCC instead of Intel's "disable faster code when running on AMD processors" compiler. As an example, PostgreSQL on native Centos is just as fast on Thuban compared to Sandy Bridge at the same GHz. And when you then virtualize Centos under Centos+KVM, Thuban is 35% faster. (Nehalem goes from 10% slower natively to 50% slower under KVM!)
The compiler issue might be something to look at in virtualization tests. If you fake an Intel identifier in your VM, optimizations for new instruction sets might kick in.
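The check being described boils down to a CPUID vendor-string comparison that the compiled dispatch code performs at run time, which is why a faked vendor ID can matter more than the actual feature flags. A minimal sketch of such a check (GCC/Clang on x86):

    #include <cpuid.h>
    #include <string.h>

    /* returns 1 only if the vendor string reads "GenuineIntel";
       a dispatcher keyed on this ignores what the CPU can actually do */
    int is_genuine_intel(void) {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13];
        __get_cpuid(0, &eax, &ebx, &ecx, &edx);
        memcpy(vendor,     &ebx, 4);   /* the 12 bytes come back in EBX, EDX, ECX */
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';
        return strcmp(vendor, "GenuineIntel") == 0;
    }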
A fairer comparison would be between the Opteron 6272 ($539 / 8-module) and Xeon E5645 ($579 / 6-core); both common and recent processors.
Yet handpicking the higher clocked Opteron 6276 (for what good reason?) seems to be nothing but an aim to make the new 6200 series seem un-remarkable in both power consumption and performance. The 6272 is cheaper, more common, and would beat the Xeon X5670 in power consumption which half this review is weighted on. Otherwise you should've used the 6282 SE which would compete in performance as well as being the appropriate processor according to your own chart.
Even the chart on Page 1 is designed to make Intel look superior all-around. For what reason would you exclude the Opteron 4274 HE (65W TDP) or the Opteron 4256 EE (35W TDP) from the 'Power Optimized' section?
The ignorance on processor tiers is forgivable even if you're likely paid to write this... but the benchmarks themselves are completely irrelevant. Where's the IIS/Apache/Nginx benchmark? PostgreSQL/SQLite? Facebook's HipHop? Node.js? Java? Something relevant to servers and not something obscure enough to sound professional?
If anyone finds me a madman, let me explain this simply by example. Benchmark choices aside...
If this test were to compare any of the top or middle-tier processors on the "AMD vs. Intel 2-socket SKU Comparison" chart ( http://www.anandtech.com/show/5058/amds-opteron-in... ) with their matching competition, this article would tell a different story in essence. Which does in fact, regardless of how fair the written conclusion may be, make it biased.
Examples:
X5650 vs 6282 SE
E5649 vs 6276
E5645 vs 6272
"Yet handpicking the higher clocked Opteron 6276 (for what good reason?) seems to be nothing but an aim to make the new 6200 series seem un-remarkable in both power consumption and performance"
Do you realize you are blaming AMD? That is the CPU they sent us.
"The 6272 is cheaper, more common, and would beat the Xeon X5670 in power consumption which half this review is weighted on."
The 6272 is nothing more than a lower speed bin of the 6276. It has the same power consumption but slightly lower performance. Performance/watt is thus worse.
"PostgreSQL/SQLite? Facebook's HipHop? Node.js? Java? Something relevant to servers and not something obscure enough to sound professional? "
We use Zimbra, phpBB, Apache, MySQL. What is your point? That we don't include every server software on the planet? If you look around, how many publications are running good, repeatable server benchmarks? If it were as easy as running Cinebench or TrueCrypt, I think everybody would be doing it.
"Even the chart on Page 1 is designed to make Intel look superior all-around. For what reason would you exclude the Opteron 4274 HE (65W TDP) or the Opteron 4256 EE (35W TDP) from the 'Power Optimized' section?"
To be honest, those CPUs were not even in AMD's presentation that we got. We were only briefed about Interlagos.
Did they send you the Xeon X5670 also? I suppose whoever is handling media relations at AMD is either careless or disgruntled, e.g. sending a slightly overclocked processor with a 30% staple that happens to scale unusually badly in terms of power efficiency.
Please just answer this honestly: if you had compared an Opteron 6272 w/ an E5645... would your article present a different story?
Fair as you may have tried to be; you don't have to look far to find a comment here that came to the "BD is a joke" conclusion.
---
Using a phpBB stress test is hardly useful or relevant as a server benchmark, never mind under a VM. Unless configured extensively, it's I/O bound. "Average Response Time" is also irrelevant; how is the reader to know whether your 'response time' doesn't favor processors that are better with single-threaded applications?
Additionally, VMs on a better single-threaded processor will score higher in benchmarks due to the overhead, as parallelism isn't optimized. Yet these results make zero sense in real-world usage. It contradicts the value of VMs: flexible scalability for low-usage applications.
Finally, I'd estimate that less than 5% of servers are virtual (if that). VMs are most popular with web servers, and even there they have a small market share as they only appeal to small clients. Large clients use clusters of dedicated servers; tiny clients use shared dedicated.
Did you even use GCC 4.7 or Open64? In some applications, the new versions yield up to 300% higher performance for Bulldozer.
"if you had compared a Opteron 6272 w/ a E5645 ... would your article present a different story?"
You want us to compare a $551 80W TDP Intel cpu with a $774 115 AMD CPU?
"Unless configured extensively; it's I/O bound." We know how to monitor with ESX top. There is a reason why we have a disk system of 2 SDDs and 6 x 15k SAS disks.
"Average Response Time" is also irrelevant Huh? That is like saying that 0-60 mph acceleration times are irrelevant to sports cars.
"Finally; I'd estimate that less than 5% of servers are virtual (if that)" ....Your estimate unfortunately was true in 2006. We are 2011 now. Your estimate is 10x off, maybe more.
"You want us to compare a $551 80W TDP Intel cpu with a $774 115 AMD CPU?" $539
"The 6272 is nothing more than a lower speedbin of the 6276. It has the same power consumption but slightly lower performance. Performance/wat is thus worse." By your logic; the FX-8120 and FX-8150 have equal power consumption. They don't.
"We know how to monitor with ESX top. There is a reason why we have a disk system of 2 SDDs and 6 x 15k SAS disks." It's still I/O bound unless configured extensively.
"Huh? That is like saying that 0-60 mph acceleration times are irrelevant to sports cars." Yeah; it is if you're measuring the distance traveled by a number of cars. The opteron is obviously slower in handling single requests but it can handle maybe twice as many at the same time. Unless your stress test made every request @ T=0 and your server successfully qued them all, dropped none, and included the que time in the response time... it would favor the xeon immensely. Perhaps it does do all this; which is why I said "how is the reader to know" when you could have just as easily done 'Average Requests Completed Per Second'.
"....Your estimate unfortunately was true in 2006. We are 2011 now. Your estimate is 10x off, maybe more." Very funny. Did the salesman that told you that also recommend these benchmarks? Folklore tells that Google alone has over a million servers, 20X that of Rackspace or ThePlanet, and they aren't running queries on VM's.
" make of lots of DLLs--or in Linux terms, they have more dependencies"
Libraries is the word you're looking for.
I also see the mistake of mixing programming APIs/OS design/Hardware design...
Good software is TLB-aware, uses asynchronous locking where possible, etc., as does hardware, but they are INDEPENDENT. The glue, as you know, is how compiled code is treated at the uCode level. IMO, AMD hardware is fully capable of outperforming Intel hardware, but AMD uCode is incredibly good.
Very interesting review as usual, Johan, thx. It is good to see that there are still people who want to do reviews thoroughly.
While the message is clear on the MS OS for both power and performance, I think it isn't on the VMware side. First of all, it is quite confusing what settings exactly have been used in the BIOS, and to me it doesn't reflect the real final conclusion. If it ain't right then don't post it, in my opinion, and keep it for further review...
I have had a beta version of Interlagos now for about a month, and the performance testing depending on BIOS settings has been very challenging.
When I see your results I have the following thoughts.
Performance: I don't think that the current vAPU2 was able to stress the 2x 16 cores enough; what was the average CPU usage in ESXTOP during these runs? On top of that, looking at the result score and both response times, it is clear that the current BIOS settings aren't optimal in balanced mode. As you already mentioned, the system is behaving strangely. VMware themselves have posted a document for v5 regarding power best practices which clearly mentions that these need to be adapted. http://www.vmware.com/files/pdf/hpm-perf-vsphere5....
To be more precise, balanced has never been the right setting on VMware; the preferred mode has always been high performance, and this is how we run, for example, a 400+ VMware server farm. We would rather use DPM to reduce power than reduce clock speed, since the latter affects total performance and response times much more, mainly on the virtualization platform and OEM BIOS creations (let's say a lack of in-depth fine-tuning and options).
Would like to see new performance results and power when running in high performance mode and according to the new vSphere settings...
"performance: I don't think that the current vAPU2 was able to stress the 2x16core enough, what was the avarage cpu usage in ESXTOP during these runs?"
93-99%.
"On top of that looking at the result score and both response times it is clear that the current BIOS settings aren't optimal in the balanced mode."
Balanced and high performance gave more or less the same performance. It seems that the ESX power manager is much better at managing p-states than the Windows one.
We are currently testing Balanced + c-states. Stay tuned.
Thx for the answers. I read the whole thread; I just wasn't sure that you used the same settings for both Windows and the virtualization tests.
According to VMware you shouldn't use Balanced but rather OS Controlled. I know my BIOS has that option; not sure about the Supermicro one.
Quite a strange result, with ESXTOP above 90% but the same performance results. There just seems to be a further core scaling issue with vAPU2 in the performance results, or it's just not using turbo... We know that the module doesn't have the same performance, but the 10-15% turbo is more than enough to level that difference, which would still leave you with 8 more cores.
When you put the power mode on high performance it should turbo all cores for the full length at 2.6GHz for the 6276. While you mention it results in the same performance, are you sure that the turbo was kicking in? ESXTOP CPU higher than 100%? It should provide more performance...
Conclusion: "Intel gives much better performance/watt and performance in general; BD gives better performance/dollar"
Problem: watts cost dollars, lots of them in the server space, because you need some pretty extreme cooling. Also, absolute performance per unit of physical space matters a lot, because that ALSO costs tons of money.
If a server manages an average of 50% load over all time; the Xeon's supposed superior power-efficiency would pay for itself after only 31 years.
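For anyone who wants to redo that payback math with their own inputs (the numbers below are placeholders, not the actual SKU prices or measured deltas): payback_years = price_premium / (delta_watts x average_load x 8760 h x price_per_kWh / 1000). For example, a hypothetical $300 premium, a 40 W saving at 50% load and $0.10/kWh gives 300 / (40 x 0.5 x 8.76 x 0.10) ≈ 17 years; plug in the real deltas and the answer moves accordingly.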
Of course you're not taking into consideration that this test is pretty much irrelevant to the server market. Additionally, as the author failed to clarify when asked, Anandtech likely didn't use newer compilers which show up to a 100% performance increase in some applications ~ looky; http://www.phoronix.com/scan.php?page=article&...
Good job AMD, you had one thing to do, test your product and make sure it beat competitors at the same price, or gave comparable performance for a lower price.
Idiots like this are exactly why I say the review is biased. How can anyone with the ability to type scan over this review and come to such a conclusion, let alone have the confidence to comment?
Thanks, Johan, for the ungodly amount of time you and your team spent on this review, and thanks also to all the contributors to the comments, which were very useful for getting more context for someone like myself who is not very up to speed with server tech.
The 45 watt and 65 watt Opterons are not mentioned on the front page of the article (but are mentioned in the comments) - are these based on Interlagos?
To me it looks like a big win for AMD - and these benchmarks are not even optimised for the architecture (Linux kernel 3 was not used). Can't wait to see updated benchmarks, something like FreeBSD, or when we get an updated scheduler for the Windows server OSes... should make a big difference.
Really low idle power consumption is nice, and I'm planning to pick one of these up (for home use) to play around with FreeBSD, VMs, etc., just for training purposes.
The other point about Intel's Sandy Bridge Xeons: these are just going to be 8-core 3960Xs, right? Which may not change the current server landscape very much, depending on their prices.
Respect is due, Johan! You did a very useful review under significant limitations. The very best part is to point an unbiased light at a damned interesting CPU. There is an important "next step," which I will address shortly.
As always, just the mention of AMD brings out hysterical attacks. One would think we were talking about Stem Cell research!! There is no real discussion -- it's pitchforks, lit torches, and a stake ready for poor Johan and anyone else ready and willing to consider the mere possibility that AMD have produced worthy technology!!
Computer technology - doing it, anyway - has changed. It's become ALL about the bloody money, and the "culture" of the people doing technology has also changed -- it has become much more cut-throat, there is far less collegiality, and people willing to take risks on projects have become really uncommon. Qualified people doing serious technology just because they can are uncommon.
There is no end to posers (including some on this board), Machiavellian Fortune 500 IT managers, and "Project Managers" who are clueless (there ARE some great IT managers and wonderful PM's but their numbers are shrinking). My hat to those in Open Source - they are the Last Bastion of decency for the sake of decency, and technology merely for the joy of doing it !!
"Back in the day" people seemed really into the technology, solving difficult problems, and making good things happen. There was truly a culture. For example not taking a moment to help someone on the team or otherwise made you a jerk. Development was a craft or an art, and we were all in it together. We are loosing that, and it's become more dog-eat-dog with a kind of mean spirit. What a shame. Many of the comments here are perfect examples -- people who would rather burn down the temple than give a new and challenging technology a good think.
Personally I can't wait to get my hands on a couple of AMD's new CPU's, build a decent server, and carefully work out the issues with patience. These new Opterons are like a whole new tech that may be the beginning of all new territory.
My passion and some professional work is coding at the back end in C/C++ and I'm just beginning to understand CUDA and using GPU's to beef up parallel code. My work is all around (big) data warehousing, cutting edge columnar databases, almost everything running virtual, all the way through to analytics on BI platforms. I do all of that both on MS Server 2008, Solaris and FreeBSD. All that is a perfect environment to test AMD's "new territory."
Probably worth a blog at some point because these processors are new territory and using them well will take some work just keeping track of all the small things that shake out. That's the "next step" that this and other reviews require to really understand AMD's Bulldozers. Doing that well, if AMD is right with these chips, means being able to build some great back-end servers at a much more approachable price; more importantly without paying an "Intel" tax, and in the end having two strong vendors and thereby more freedom to make the best choice for the requirement.
But if you really want to see what the true story is, have a look at AMD's stock price lately, and their server wins. They absolutely smoke Intel on virtualization, and anything that requires a lot of threads. It's not even close. That would be the reason this review pits Interlagos against an Intel processor that costs twice as much.
DigitalFreak - Tuesday, November 15, 2011 - link
Good to see that CPU-Z correctly reports the 6276 as 8 core, 16 thread, instead of falling for AMD's marketing BS.N4g4rok - Tuesday, November 15, 2011 - link
If each module possess two integer cores to a shared floating point core, what's to say that it can't be considered as a practical 16 core?phoenix_rizzen - Tuesday, November 15, 2011 - link
Each module includes 2x integer cores, correct. But the floating point core is "shared-separate", meaning it an be used as two separate 128-bit FPUs or as a single 256 FPU.Thus, each Bulldozer module can run either 3 or 4 threads simultaneously:
- 2x integer + 2x 128-bit FP threads, or
- 2x integer + 1x 256-bit FP threads
It's definitely a dual-core module. It's just that the number of threads it can run is flexible.
The thing to remember, though, is that these are separate hardware pipelines, not mickey-moused hyperthreaded pipelines.
JohanAnandtech - Tuesday, November 15, 2011 - link
You can get into a long discussion about that. The way that I see it, is that part of the core is "logical/virtual", the other part is real in Bulldozer . What is the difference between an SMT thread and CMT thread when they enter the fetch-decode stages? Nothing AFAIK, both instructions are interleaved, and they both have a "thread tag".The difference is when they are scheduled, the instructions enters a real core with only one context in the CMT Bulldozer. With SMT, the instructions enter a real core which still interleave two logical contexts. So the core still consists of two logical cores.
It is gets even more complicated when look at the FP "cores". AFAIK, the FP cores of Interlagos are nothing more than 8 SMT enabled cores.
alpha754293 - Tuesday, November 15, 2011 - link
I think that Johan is partially correct.The way I see it, the FPU on the Interlagos is this:
It's really a 256-bit wide FPU.
It can't really QUITE separate the ONE physical FPUs into two 128-bit wide FPUs, but it more probably in reality, interleaves them (which is really just code for "FPU-starved").
Intel's original HTT had this as a MAJOR problem, because the test back then can range from -30% to +30% performance increase. Floating-point intensive benchmarks have ALWAYS suffered mostly because suppose you're writing a calculator using ONLY 8-byte (64-bit) double precision.
NORMALLY, that should mean that you should be able to crunch through four DWORDs at the same time. And that's kinda/sorta true.
Now, if you are running two programs, really...I don't think that the CPU, the compiler (well..maybe), the OS, or the program knows that it needs to compile for 128-bit-wide FPUs if you're going to run two instances or two (different) calculators.
So it's resource starved in trying to do the calculation processes at the same time.
For non-FPU-heavy workloads, you can get away with that. For pretty much the entire scientific/math/engineering (SME) community; it's an 8-core processor or a highly crippled 16-core processor.
Intel's latest HTT seems to have addressed a lot of that, and in practical terms, you can see upwards of 30% performance advantage even with FPU-heavy workloads.
So in some cases, the definition of core depends on what you're going to be doing with it. For SME/HPC; it's good cuz it can do 12-actual-cores worth of work with 8 FPUs (33% more efficient), but sucks because unless they come out with a 32-thread/16-core monolithic die; as stated, it's only marginally better than the last. It's just cheaper. And going to get incrementally faster with higher clock speeds.
alpha754293 - Tuesday, November 15, 2011 - link
P.S. Also, like Anand's article about nVidia Optimus:Context switching even at the CPU level, while faster, is still costly. Perhaps maybe not nearly as costly as shuffling data around; but it's still pretty costly.
Samus - Wednesday, November 16, 2011 - link
Ouch, this is going to be AMD's Itanium. That is, it has architecture adoption problems that people simply won't build around. Maybe less substantial than IA64, but still a huge performance loss because of underutilized integer units.leexgx - Wednesday, November 16, 2011 - link
think they way CPU-z reporting it for BD cpus is correct each core has 2 FP, so 8 cores and 16 threads is correctto bad windows does not understand how to spread the load correctly on an amd cpu (windows 7 with HT cpus Intel works fine, spreads the load correctly, SP1 improves that more but for Intel cpus only)
windows 7 sp1 makes biger use of core parking and gives better cpu use on Intel cpus as i have been seeing on 3 systems most work loads now stay on the first 2 cores and the other 2 stay parked, on amd side its still broke with cool and quite enabled
Stuka87 - Tuesday, November 15, 2011 - link
So, what is your definition of a core?Bulldozers do not utilize hyper threading, which takes a single integer core and can at times put two threads into that single integer core. A Bulldozer core has actual hardware two run two threads at the same time. This would suggest there are two physical cores.
Does it perform like an intel 16 core (if there was such a thing), no. But that does not mean that it is not in fact a 16 core device. As the hardware is there. Yes they share an FPU, but that doesn't mean they are not cores.
Filiprino - Tuesday, November 15, 2011 - link
Actually, Bulldozer is 16 cores. It has two dedicated integer units and a float point unit which can act as two 128 bit units or one 256 bit unit for AVX. So, you can have 2 and 2 per module.Bulldozer does not use hyperthreading.
Kevin G - Tuesday, November 15, 2011 - link
I'm curious if CPU-Z polls the hardware for this information or if it queries a database to fetch this information. If it is getting the core and thread count from hardware, it maybe configurable. So while the chip itself does not use Hyperthreading, it maybe reporting to the OS that does it by default. This would have an impact in performance scaling as well as power consumption as load increases.MrSpadge - Tuesday, November 15, 2011 - link
They are integer cores, which share few ressources besides the FPU. On the Intel side there are two threads running concurrently (always, @Stuka87) which share a few less ressources.Arguing which one deserves the name "core" and which one doesn't is almost a moot point. However, both designs are nto that different regarding integer workloads. They're just using a different amount of shared ressources.
People should also keep in mind that a core does not neccessaril equal a core. Each Bulldozer core (or half module) is actually weaker than in Athlon 64 designs. It got some improvements but lost in some other areas. On the other hand Intels current integer cores are quite strong and fat - and it's much easier to share ressources (between 2 hyperthreaded treads) if you've got a lot of them.
MrS
leexgx - Wednesday, November 16, 2011 - link
but on Intel side there are only 4 real cores with HT off or on (on an i7 920 seems to give an benefit, but on results for the second gen 2600k HT seems less important)where as on amd there are 4 cores with each core having 2 FP in them (desktop cpu) issue is the FPs are 10-30% slower then an Phenom cpu clocked at the same speed
anglesmith - Tuesday, November 15, 2011 - link
which version of windows 2008 R2 SP1 x64 was used enterprise/datacenter/standard?Lord 666 - Tuesday, November 15, 2011 - link
People who are purchasing SB-E will be doing similar stuff on workstations. Where are those numbers?Kevin G - Tuesday, November 15, 2011 - link
Probably waiting in the pipeline for SB-E base Xeons. Socket LGA-2011 based Xeon's are still several months away.Sabresiberian - Tuesday, November 15, 2011 - link
I'm not so sure I'd fault AMD too much because 95% of the people that their product users, in this case, won't go through the effort of upgrading their software to get a significant performance increase, at least at first. Sometimes, you have to "force" people to get out of their rut and use something that's actually better for them.I freely admit that I don't know much about running business apps; I build gaming computers for personal use. I can't help but think of my Father though, complaining about Vista and Win 7 and how they won't run his old, freeware apps properly. Hey, Dad, get the people that wrote those apps to upgrade them, won't you? It's not Microsoft's fault that they won't bring them up to date.
Backwards compatibility can be a stone around the neck of progress.
I've tended to be disappointed in AMD's recent CPU releases as well, but maybe they really do have an eye focused on the future that will bring better things for us all. If that's the case, though, they need to prove it now, and stop releasing biased press reports that don't hold up when these things are benched outside of their labs.
;)
JohanAnandtech - Tuesday, November 15, 2011 - link
The problem is that a lot of server folks buy new servers to run the current or older software faster. It is a matter of TCO: they have invested a lot of work into getting webapplication x.xx to work optimally with interface y.yy and database zz.z. The vendor wants to offer a service, not a the latest technology. Only if the service gets added value from the newest technology they might consider upgrading.And you should tell your dad to run his old software in virtual box :-).
Sabresiberian - Wednesday, November 16, 2011 - link
Ah I hadn't thought of it in terms of services, which is obvious now that you say it. Thanks for educating me!;)
IlllI - Tuesday, November 15, 2011 - link
amd was shooting to capture 25% of the market? (this was like when the first amd64 chips came out)neotiger - Tuesday, November 15, 2011 - link
Most of the benchmarks are for rendering: Cinebench, 3DSMax, Maxwell, Blender, etc.How many enterprises actually do 3D rendering?
Far more common enterprise applications would be RDBMS, data warehouse, OLTP, JVM, app servers, etc.
You touched on some of that in just one virtualization benchmark, vApus. That doesn't make sense either - how many enterprises you know run database servers on VM?
A far more useful review would be running separate benchmarks for OLTP, OLAP, RDBMS, JVM, etc. tppc, tpce, tpch would be a good place to start
JohanAnandtech - Tuesday, November 15, 2011 - link
I definitely would like to stay close to what people actually use.In fact we did that:
http://www.anandtech.com/show/2694
But the exploding core counts made it as good as impossible.
1. For example, a website that scales to 32 cores easily: most people will be amazed how many websites have trouble scaling beyond 8 cores.
2. Getting an OLTP database to scale to 32 cores is nothing to sneeze at. If your database is small and you run most of it in memory, chances are that you'll get a lot of locks and that it won't scale anyway. If not, you'll need several parallel RAID cards which have a lot of SSDs. We might pull that one off (the SSDs), but placing several RAID cards inside a server is most of the time not possible. once you solve the storage bottleneck, other ones will show up again. Or you need an expensive SAN... which we don't have.
We had an OLAP/ OLTP and Java benchmarks. And they were excellent benchmarks, but between 8 and 16 cores, they started to show decreasing CPU utilization despite using SSDs, tweaking etc.
Now puts yourself in our place. We can either spend weeks/months getting a database/website to scale (and we are not even sure it will make a real repeatable benchmark) or we can build upon our virtualization knowledge knowing that most people can't make good use of a native 32 core database anyway (or are bottlenecked by I/O and don't care anyway), and buy their servers to virtualize.
At a certain point, we can not justify to invest loads of time in a benchmark that only interest a few people. Unless you want to pay those people :-). Noticed that some of the publications out there use geekbench (!) to evaluate a server? Noticed how many publication run virtualization benchmarks?
"That doesn't make sense either - how many enterprises you know run database servers on VM?"
Lots of people. Actually besides a few massive Oracle OLTP databases, there is no reason any more not to virtualized your databases. SQL server and MySQL are virtualized a lot. Just googling you can find plenty of reports of MySQL and SQL server on top of ESX 4. Since vSphere 4 this has been common practice.
"etc. tppc, tpce, tpch would be a good place to start "
No not really. None of the professional server buyers I know cares about TPC benches. The only people that mentione them are the marketing people and hardware enthusiast that like to discuss high-end hardware.
So you prefer software that requires 300.000$ of storage hardware over a very realistic virtualization benchmarks which are benchmarked with real logs of real people?
Your "poor benchmark choice" title is disappoing after all the time that my fine colleagues and me have spend on getting a nice website + groupware virtualization benchmark running which is stresstested by vApus which uses real logs of real people. IMHO, the latter is much more interesting than some inflated TPC benchmarks with storage hardware that only the fortune 500 can afford. Just HMO.
neotiger - Tuesday, November 15, 2011 - link
While scaling to 32 cores can be problematic for some software, it's worth keeping in mind that the vast majority of dual-socket servers don't have 32 cores. In fact, a dual-CPU Intel server only has *at most* 12 cores; that's a far cry from 32 cores. PostgreSQL and MySQL have no problem at all scaling to 12 cores and beyond.
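(If anyone wants to check that claim on their own hardware, a minimal pgbench sweep is enough. The sketch below is a rough illustration, not something from the review: it assumes PostgreSQL and pgbench are installed and that a database named "bench" was initialized beforehand with pgbench -i.)

```python
# Rough sketch: sweep pgbench client counts to see how far an OLTP
# workload scales with cores. Assumes PostgreSQL + pgbench are installed
# and a database named "bench" was initialized with `pgbench -i bench`.
import re
import subprocess

def run_pgbench(clients: int, seconds: int = 60) -> float:
    """Run pgbench and return the reported transactions per second."""
    out = subprocess.run(
        ["pgbench", "-c", str(clients), "-j", str(clients),
         "-T", str(seconds), "bench"],
        check=True, capture_output=True, text=True).stdout
    match = re.search(r"tps = ([\d.]+)", out)
    return float(match.group(1)) if match else float("nan")

if __name__ == "__main__":
    for clients in (1, 4, 8, 16, 32):
        print(f"{clients:2d} clients: {run_pgbench(clients):8.0f} tps")
```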
Now if AMD decided to make a CPU with crappy per-core performance but has so many cores that most software can't take full advantage of, that's their own fault. It's not like they haven't been warned. Sun tried and failed with the same approach with T2. If AMD is hellbent on making the same mistake, they only have themselves to blame.
My post title is a bit harsh. But it is disappointing to see a review that devotes FOUR separate benchmarks to 3D rendering, an application that the vast majority of enterprises have no use for at all. Meanwhile, the workhorse applications for most enterprises, OLTP, OLAP, and such, received far too little attention.
tiro_uspsss - Wednesday, November 16, 2011 - link
"In fact, a dual-CPU Intel server only has *at most* 12 cores..."Incorrect. There is s1567. This allows 2-8 CPUs, with a max. of 8C/16T per CPU......... which I'm wondering why Anandtech failed to include in this review?
s1567 CPUs also have quad channel memory...
I really wish s1567 was included in this review..
Photubias - Wednesday, November 16, 2011 - link
Intel's S1567? You mean the E7-8830 CPU from the E7-8800 series which has prices *starting* at $2280?
-> http://ark.intel.com/products/series/53672
bruce24 - Wednesday, November 16, 2011 - link
"You mean the E7-8830 CPU from the E7-8800 series which has prices *starting* at $2280?"I'm not sure what he meant, but there are E7-2xxx processors for dual socket servers, which are priced much lower than the E7-8xxx processors which are for 8+ socket servers.
Photubias - Thursday, November 17, 2011 - link
You mean the E7-28xx series? http://ark.intel.com/products/series/53670
They are priced a bit lower, is there a comparison you suggest?
Sabresiberian - Wednesday, November 16, 2011 - link
I have trouble understanding why people think a review should include research into every other similar product that might be used for the same purpose. I mean, I can understand ASKING for a review of another specific product, particularly if you've actually done some research on your own and haven't found the information you want, but to imply a review isn't complete because it didn't mention or test another piece of hardware is a bit - unrealistic.
;)
JohanAnandtech - Thursday, November 17, 2011 - link
Sabresiberian, a very sincere thank you for being reasonable. :-) Frankly, I can't imagine a situation where someone would have trouble deciding between a Westmere-EX and an AMD CPU. Most people checking out the Westmere-EX go for the RAS features (dual) or RAS + ultimate high thread performance (quad). In all other cases a dual Xeon EP or Opteron makes more sense power- and price-wise.
JustTheFacts - Thursday, November 17, 2011 - link
Really? Is it that much trouble to understand that people want to see the latest AMD CPUs compared to the most current generation of Intel hardware? Especially when the previous Intel processor review posted on this site reported on Westmere-EX performance? I have trouble understanding why people wouldn't expect it.
geoxx - Friday, December 9, 2011 - link
Sorry, but neotiger is totally right: the choice of benchmarks sucks. We are not helped *at all* by your review. What company's 32-core server is being used for 3D rendering, Cinebench, file compression, TrueCrypt encryption??
You benchmarked it like it was a CPU of the nineties for a home enthusiast.
You are probably right pointing us to http://www.anandtech.com/show/2694 but your benchmarks don't reflect that AT ALL. Where are file compression, encryption, 3D rendering and cinebench in that chart?
Even performance per watt is not very meaningful, because when one purchases a 2-socket or 4-socket server, electricity cost is not an issue. Companies want to simplify deployment with such a system; they want this computer to run as fast as a cluster, in order not to be bound to cluster databases, which are a PAIN. So people want to see scalability of applications to the full core count on this kind of system, not so much performance per watt.
Virtualization is the ONLY sensible benchmark you included.
TPC as suggested is exactly the right benchmark; that's the backend and bottleneck for most of the things you see in your charts at http://www.anandtech.com/show/2694 , and the objection about storage is nonsense: just fit the database in a ramdisk (don't tell me you need a database larger than 64GB for a benchmark), export it as a block device, then run the test. And/or use one PCIe-based SSD, which you certainly have.
http://www.anandtech.com/show/2694 mentions software development: how much effort does it take to set up a Linux kernel compile benchmark?
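(Not much, honestly. A minimal harness is a few lines of Python; the sketch below uses hypothetical paths and assumes a kernel tree that already has a .config plus GNU make on the box, so it is an illustration rather than anything AnandTech actually ran.)

```python
# Sketch of a kernel-compile scaling benchmark. The ./linux path and the
# job counts are illustrative; a configured kernel tree (.config present)
# and GNU make are assumed.
import subprocess
import time

KERNEL_DIR = "./linux"
JOB_COUNTS = (1, 4, 8, 16, 32)

def timed_build(jobs: int) -> float:
    """Clean, rebuild vmlinux with `make -j<jobs>`, and return elapsed seconds."""
    subprocess.run(["make", "-C", KERNEL_DIR, "clean"],
                   check=True, capture_output=True)
    start = time.perf_counter()
    subprocess.run(["make", "-C", KERNEL_DIR, f"-j{jobs}", "vmlinux"],
                   check=True, capture_output=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    for jobs in JOB_COUNTS:
        print(f"-j{jobs:<3} {timed_build(jobs):8.1f} s")
```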
http://www.anandtech.com/show/2694 mentions HPC: can you set up a couple of bioinformatics benchmarks such as BLAST (integer computation, memory compares), GROMACS (matrix FPU computations) and Fluent? Please note that none of your tests includes memory compares and FPU work, which are VERY IMPORTANT in HPC. GROMACS and Fluent would fill that hole. Bioinformatics is THE HPC of nowadays, and there are very few websites, if any, which help with the choice of CPUs for HPC computing.
For email servers (37%!) and web servers (14%), too, I am sure you can find some benchmarks.
Iketh - Tuesday, November 15, 2011 - link
I'm not sure how the discovery of cores running in their power-saving state for far too long is anything new. My 2600K refuses to ramp up clocks while previewing video in a video editor even though a core is pegged at 100%. If I intervene and force it to 3.4GHz, the preview framerate jumps from 8 fps to 16 fps. This has been happening for YEARS! My old quad Phenom at 2.2GHz did the exact same thing!
It's extremely annoying and pisses me off I can't benefit from the power savings, let alone turbo.
MrSpadge - Tuesday, November 15, 2011 - link
Sounds like you're running Linux or some other strange OS, then. Or you may need a BIOS update. Generally Intel has its power management quite under control. In the AMD camp, physical power state switches often take longer than the impatient OS expects, and thus average frequency is hurt. This was pretty bad for Phenom 1.
MrS
Iketh - Tuesday, November 15, 2011 - link
Win7 Home Premium x64, and the Phenom was with XP 32-bit... I haven't found another scenario that causes this, only streaming video that's rendered on-the-fly.
Zoomer - Wednesday, November 16, 2011 - link
You have a 2600k and aren't running it at 4+ GHz?
Iketh - Wednesday, November 16, 2011 - link
4.16 @ 1.32v when encoding, 3.02 @ 1.03v for gaming/internet.
haplo602 - Wednesday, November 16, 2011 - link
You do know that Linux did not have any problems with Phenom I power management, unlike Windows? The same is now true with BD. Linux benchmarks look quite different from Windows, and the gap is not that dramatic there.
BrianTho2010 - Tuesday, November 15, 2011 - link
This whole review, the only thought I have is that there are no Sandy Bridge chips in it. When SB-based Xeon chips come out, I bet that Interlagos will be completely dominated.
Beenthere - Tuesday, November 15, 2011 - link
Not really. SB chips don't fit in AMD sockets. AMD's installed customer base likes the significant performance increase and power savings from just plugging in a new Opteron 6200/4200.
C300fans - Tuesday, November 15, 2011 - link
It will. 2x 6174 (24 cores) perform quite similarly to 2x 6274 (32 cores). WTF
veri745 - Tuesday, November 15, 2011 - link
Shouldn't there be 8 x 2MB L2 for Interlagos instead of just 4x?
ClagMaster - Tuesday, November 15, 2011 - link
A core this complex has, in my opinion, not been optimized to its fullest potential. Expect better performance when AMD introduces later steppings of this core with regard to power consumption and higher clock frequencies.
I have seen this with earlier AMD and Intel cores; this new core will be the same.
C300fans - Tuesday, November 15, 2011 - link
1x i7 3960X or 2x Interlagos 6272? It is up to you. Money cow.
tech6 - Tuesday, November 15, 2011 - link
We have a bunch of 6100s in our data center and the performance has been disappointing. They do no better in single-thread performance than old 73xx series Xeons. While this is OK for non-interactive stuff, it really isn't good enough for much else. These results just seem to confirm that the Bulldozer series of processors is over-hyped and that AMD is in danger of becoming irrelevant in the server, mobile and desktop markets.
mino - Wednesday, November 16, 2011 - link
Actually, for interactive stuff (read: VDI/Citrix/containers) core counts rule the roost.
duploxxx - Thursday, November 17, 2011 - link
This is exactly what should be fixed now with the turbo when set correctly. BTW, the 73xx series were not that bad on single-thread performance; it was wide-scale virtualization and I/O throughput which were awful on these systems.
alpha754293 - Tuesday, November 15, 2011 - link
"Let us first discuss the virtualization scene, the most important market." Yea, I don't know about that.Considering that they've already shipped like some half-a-million cores to the leading supercomputers of the world; where some of them are doing major processor upgrades with this new release; I wouldn't necessarily say that it's the most IMPORTANT market. Important, yes. But MOST important...I dunno.
Looking forward to more HPC benchmark results.
Also, you might have to play with thread scheduling/process affinity (masks) to make it work right.
See the Techreport article.
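(As an illustration of that affinity point: on Linux you can pin a process to one core per module so two heavy threads never share a module's front end. A minimal sketch, assuming Python 3.3+ and an illustrative core numbering where cores 0/1 share module 0, 2/3 share module 1, and so on; check lscpu on the actual box.)

```python
# Sketch: restrict this process to one core per Bulldozer module.
# The even/odd core-to-module mapping is an assumption for illustration;
# verify the real topology with lscpu or /proc/cpuinfo.
import os

def pin_one_core_per_module(num_modules: int = 8) -> None:
    """Keep only the even-numbered core of each module in the affinity mask."""
    cores = {module * 2 for module in range(num_modules)}
    os.sched_setaffinity(0, cores)  # pid 0 = the calling process

if __name__ == "__main__":
    pin_one_core_per_module()
    print("running on cores:", sorted(os.sched_getaffinity(0)))
```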
JohanAnandtech - Thursday, November 17, 2011 - link
Are you talking about the Euler3D benchmark? And yes, by any metric (revenue, servers sold) the virtualization market is the most important one for servers. Depending on the report, 60 to 80% of servers are bought to be virtualized.
alpha754293 - Tuesday, November 15, 2011 - link
Folks: chip-multithreading (CMT) is nothing new. I would explain it this way: it is the physical, hardware manifestation of simultaneous multi-threading (SMT). Intel's HTT is SMT.
IBM's POWER (since I think as early as POWER4), Sun/Oracle/UltraDense's Niagara (UltraSPARC T-series), maybe even some of the older Crays were all CMT. (Don't quote me on the Crays though. MIPS died before CMT came out. API WOULD have had it probably IF there had been an EV8).
But the way I see it - remember what a CPU IS: it's a glorified calculator. Nothing else/more.
So, if it can't calculate, then it doesn't really do much good. (And I've yet to see an entirely integer-only program).
Doing integer math is fairly easy and straightforward. Doing floating-point math is a LOT harder. If you check the power consumption while solving a system of linear equations with Gaussian elimination (parallelized, or using multiple instances of the solver), I can guarantee you that you will consume more power than if you were trying to run VMs.
So the way I see it, if a CPU is a glorified calculator, then a "core" is where/whatever the FPU is. Everything else is just ancillary at that point.
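(A rough way to see the integer-vs-FP point for yourself: a dense linear solve, which is Gaussian elimination under the hood, keeps the FP units saturated in a way most integer workloads never do. The sketch below assumes NumPy with a threaded BLAS; the matrix size and repeat count are arbitrary.)

```python
# Rough sketch: sustained FP load via dense linear solves (LU factorization,
# i.e. Gaussian elimination with pivoting). NumPy with a threaded BLAS is
# assumed; the matrix size and repeat count are arbitrary.
import time
import numpy as np

N = 4096
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

start = time.perf_counter()
for _ in range(10):
    x = np.linalg.solve(A, b)
elapsed = time.perf_counter() - start
print(f"10 solves of a {N}x{N} system: {elapsed:.1f} s")
```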
mino - Wednesday, November 16, 2011 - link
1) POWER is NOT CMT; it always was a VERY (even by RISC standards) wide SMT design. 2) Niagara is NOT CMT. It is interleaved multithreading with SMT on top.
Bulldozer indeed IS the first of its kind, with all the associated advantages (future scaling) and disadvantages (alpha version).
There is a nice debate somewhere on the cpu.arch groups from the original author (think 1990s) of the CMT concept.
JohanAnandtech - Thursday, November 17, 2011 - link
1) Niagara is NOT CMT. It is interleaved multithreading with SMT on top.
I haven't studied the latest Niagaras, but the T1 was a fine-grained multi-threaded CPU. It switched between threads like a Gatling gun, and could not execute two threads at the same time.
Penti - Thursday, November 17, 2011 - link
SPARC T2 and onwards has additional ALU/AGU resources for a half-physical two-thread (four logical) solution per core with a shared scheduler/pipeline, if I remember correctly. That's not when CMT entered the picture according to Sun and Sun engineers, anyway; they regard the T1 as CMT, as it's chip-level. It's not just a CMP chip anyhow. SMT is just running multiple threads on the CPU; CMP is working the same as SMP on separate sockets. It is not the same as AMD's solution, however.
Phylyp - Tuesday, November 15, 2011 - link
Firstly, this was a very good article, with a lot of information, especially the bits about the differences between server and desktop workloads. Secondly, it does seem that you need to tune either the software (power management settings) or the chip (CMT) to get the best results from the processor. So what advice is AMD offering its customers in terms of this tuning? I wouldn't want to pony up hundreds of dollars and then have to search the web for little titbits like switching off CMT in certain cases, or enabling high-performance power management.
Thirdly, why is the BIOS reporting 32 MB of L2 cache instead of 8 MB?
mino - Wednesday, November 16, 2011 - link
No need for tuning - turbo is OS-independent (unless OS power management explicitly disables it, aka Windows). Just disable power management at the OS level (= High Performance for Windows) and you are good to go.
JohanAnandtech - Thursday, November 17, 2011 - link
The BIOS is simply wrong. It should have read 16 MB (2 Orochi dies with 8 MB of L3 each).
gamoniac - Tuesday, November 15, 2011 - link
Thanks, Johan. I run Hyper-V on Windows Server 2008 R2 SP1 on a Phenom II X6 (my workstation) and have noticed the same CPU issue. I previously fixed it by disabling AMD's Cool'n'Quiet BIOS setting. Switching to High Performance instead increased my overall power usage by 9 watts but corrected the CPU capping issue you mentioned. Yet another excellent article from AnandTech. Well done. This is how I don't mind spending 1 hour of my precious evening time.
mczak - Tuesday, November 15, 2011 - link
The L1 data and instruction caches are swapped (instruction is 8x 64 kB 2-way, data is 16x 16 kB 4-way). L2 is 8x 2 MB 16-way.
JohanAnandtech - Thursday, November 17, 2011 - link
Fixed. My apologies.
hechacker1 - Tuesday, November 15, 2011 - link
Curious if those syscalls for virtualization were improved at all. I remember Intel touting that they improved the latency each generation.
http://www.anandtech.com/show/2480/9
I'm guessing it's worse considering the increased general cache latency? I'm not sure how the latency, or syscall, is related if at all.
Just curious as when I do lots of compiling in a guest VM (Gentoo doing lots of checking of packages and hardware capabilities each compile) it tends to spend the majority of time in the kernel context.
hechacker1 - Tuesday, November 15, 2011 - link
Just also wanted to add: before I had a VT-x enabled chip, it was unbearably slow to compile software in a guest VM. I remember measuring latencies of seconds for some operations. After getting an i7 920 with VT-x, it improved considerably, and most operations are in the hundred or so millisecond range (measured with latencytop).
I'm not sure how the latest chips fare.
mino - Wednesday, November 16, 2011 - link
It most likely had to do with you running it on NetBurst (judging by the lack of a VT-x moniker). As much to do with VT-x as with a crappy CPU... with that bus architecture, ah, thank god they are dead.
JustTheFacts - Wednesday, November 16, 2011 - link
Please explain why there is no comparison between the latest AMD processors and Intel's flagship two-way server processors: the Intel Westmere-EX E7-28xx processor family? Lest you forget about them, you can find your own benchmarks of this flagship Intel processor here: http://www.anandtech.com/show/4285/westmereex-inte...
Take the gloves off and compare flagship against flagship please, and then scale the results to reflect the price difference if you have to, but there's no good reason not to compare them that I can see. Thanks.
duploxxx - Thursday, November 17, 2011 - link
Westmere-EX 2-socket is dead; it will be killed by Intel's own platform called Romley, which will have 2P and 4P. It was a stupid platform from the start and overrated by sales/consultants with their so-called huge memory support.
aka_Warlock - Wednesday, November 16, 2011 - link
I think you should have done a more thorough VM test than you did. 64GB of RAM? We all know single-threaded performance is weak, but I still feel the servers are underutilized in your test.
These CPUs are screaming for heavy multi-threaded workloads. Many VMs. Many vCPUs.
What would the performance be if you had, say, at least 192GB of RAM and 50 (maybe more) VMs on it?
And of course, storage should not be a bottleneck.
I think this is where this 8-module/16-thread CPU would shine.
A dual-socket rack/blade: 16 modules/32 threads.
Loads of RAM and a bunch of VMs.
iwod - Wednesday, November 16, 2011 - link
It is power hungry, isn't any better than Intel, and is only slightly cheaper, at the cost of a higher electricity bill. So unless some software optimization magically shows AMD is good at something, I think they are pretty much doomed.
It is like the Pentium 4, except Intel can afford to make one or two mistakes; AMD can't.
mino - Wednesday, November 16, 2011 - link
Then the article served its purpose well.
SunLord - Wednesday, November 16, 2011 - link
So is the AMD system running 8GB DDR3-1600 DIMMs or 4GB DDR3-1333? Because you list the same DDR3-1333 model for both systems, and if the server supports 16 DIMMs, well, 16*4 is 64GB.
JohanAnandtech - Thursday, November 17, 2011 - link
Copy and paste error, fixed. We used DDR3-1600 (Samsung).
Johnmcl7 - Wednesday, November 16, 2011 - link
I have wondered about this: with more cores per socket and virtualisation (organising a new set of servers and buying far less hardware for the same functionality), I'd have thought less server hardware in total is being purchased. Clearly that isn't the case though; is the money made back from more expensive servers?
John
bruce24 - Wednesday, November 16, 2011 - link
Sure, with each new generation of server you need much less hardware to do the same amount of work; however, worldwide, people are looking for servers to do much more work. Each year companies like Google, Facebook, Amazon, Microsoft and Apple add much more computing power than they could get by refreshing their current servers.
mino - Wednesday, November 16, 2011 - link
More workload... also you need at least 3 servers for any meaningful redundancy... even when only needing the power of 1/4 of either of them. BTW, most CPUs sold in the SMB space are a far cry from the 16-core monsters reviewed here...
JohanAnandtech - Thursday, November 17, 2011 - link
Don't forget the big "Cloud" buyers. Facebook has increased its number of servers from 10,000 somewhere in 2008 to 10 times more in 2011. That is one of the reasons why the number of units is still growing.
roberto.tomas - Wednesday, November 16, 2011 - link
It seems like the front-page write-up and this article are from different versions.
From the write-up: "Each of the 16 integer threads gets their own integer cluster, complete with integer executions units, a load/store unit, and an L1-data cache"
from the article: "Cores (Modules)/Threads 8/16 [...] L1 Data 8x 64 KB 2-way"
What is really surprising is calling them threads (I thought, like the write-up on the front page, that they each had their own independent integer "unit"). If they have their own L1 cache, they are cores as far as I'm concerned. Then again, the article itself seems to suggest just that: they are threads without independent L1 cache.
ps> I post comments only like once a year -- please don't delete my account. Every time I do, I have to register anew :D
mino - Wednesday, November 16, 2011 - link
It suits Intel better to call them threads... so writers are ordered... if only the pesky reality did not pop up here and there.
The BD 4200 series is a 1-chip, 4-module, 8(4*2)-core, 8(4*2)-thread processor.
BD 6200 series is a 2-chip, 8(2*4)-module, 16(2*4*2)-core, 16(2*4*2)-thread processor
Xeon 5600 series is an (up to) 1-chip, 6-core, 12(6*2)-thread processor.
Simple as cake. :D
rendroid1 - Wednesday, November 16, 2011 - link
The L1 D-cache should be 1 per thread, 4-way, etc. The L1 I-cache is shared by the 2 threads per "module", and is 2-way, etc.
JohanAnandtech - Thursday, November 17, 2011 - link
Yep, fixed. :-)
Novality77 - Wednesday, November 16, 2011 - link
One thing that I never see in any review is a remark about the fact that more cores with lower IPC have added costs when it comes to licensing. For instance, Oracle, IBM and most other suppliers charge per core. These costs can add up pretty fast; 10,000 per core is not uncommon...
fumigator - Wednesday, November 16, 2011 - link
Great review as usual. I found all the new AMD Opterons very interesting. Pairing two in a dual-socket G34 would make a multitasking monster on the cheap, and quite future proof. About cores vs. modules vs. hyperthreading: people thinking AMD cores aren't true cores should consider the following:
Adding virtual cores via Hyper-Threading on Intel platforms doesn't make performance increase 100% per core, but less than 50%.
Also, if you look at Intel processor photographs, you won't notice the virtual cores anywhere in the pictures.
While in Interlagos/Bulldozer you can clearly spot each core by its shape inside each module. What surprises me is how small they are, but that's for an entirely different discussion.
MossySF - Wednesday, November 16, 2011 - link
I'm waiting to see the follow-up Linux article. The hints in this one confirm my own experiences. At our company we're 99% FOSS, and when using CentOS packages, AMD chips run just as fast as Intel chips since it's all compiled with GCC instead of Intel's "disable faster code when running on AMD processors" compiler. As an example, PostgreSQL on native CentOS is just as fast on Thuban compared to Sandy Bridge at the same GHz. And when you then virtualize CentOS under CentOS+KVM, Thuban is 35% faster. (Nehalem goes from 10% slower natively to 50% slower under KVM!) The compiler issue might be something to look at in virtualization tests. If you fake an Intel identifier in your VM, optimizations for new instruction sets might kick in.
http://www.agner.org/optimize/blog/read.php?i=49#1...
UberApfel - Wednesday, November 16, 2011 - link
Amazingly biased review from AnandTech. A fairer comparison would be between the Opteron 6272 ($539 / 8-module) and the Xeon E5645 ($579 / 6-core), both common and recent processors.
Yet handpicking the higher-clocked Opteron 6276 (for what good reason?) seems to be nothing but an aim to make the new 6200 series seem unremarkable in both power consumption and performance. The 6272 is cheaper, more common, and would beat the Xeon X5670 in power consumption, which half this review is weighted on. Otherwise you should've used the 6282 SE, which would compete in performance as well as being the appropriate processor according to your own chart.
Even the chart on Page 1 is designed to make Intel look superior all-around. For what reason would you exclude the Opteron 4274 HE (65W TDP) or the Opteron 4256 EE (35W TDP) from the 'Power Optimized' section?
The ignorance on processor tiers is forgivable even if you're likely paid to write this... but the benchmarks themselves are completely irrelevant. Where's the IIS/Apache/Nginx benchmark? PostgreSQL/SQLite? Facebook's HipHop? Node.js? Java? Something relevant to servers and not something obscure enough to sound professional?
UberApfel - Wednesday, November 16, 2011 - link
If anyone thinks I'm a madman, let me explain this simply by example. Benchmark choices aside... if this test had compared any of the top or middle-tier processors on the "AMD vs. Intel 2-socket SKU Comparison" chart ( http://www.anandtech.com/show/5058/amds-opteron-in... ) with their matching competition, this article would tell a different story in essence. Which, regardless of how fair the written conclusion may be, makes it biased.
Examples:
X5650 vs 6282 SE
E5649 vs 6276
E5645 vs 6272
JohanAnandtech - Thursday, November 17, 2011 - link
"Yet handpicking the higher clocked Opteron 6276 (for what good reason?) seems to be nothing but an aim to make the new 6200 series seem un-remarkable in both power consumption and performance"Do you realize you are blaming AMD? That is the CPU they sent us.
"The 6272 is cheaper, more common, and would beat the Xeon X5670 in power consumption which half this review is weighted on."
The 6272 is nothing more than a lower speed bin of the 6276. It has the same power consumption but slightly lower performance. Performance/watt is thus worse.
"PostgreSQL/SQLite? Facebook's HipHop? Node.js? Java? Something relevant to servers and not something obscure enough to sound professional? "
We use Zimbra, phpBB, Apache, MySQL. What is your point? That we don't include every server software package on the planet? If you look around, how many publications are running good repeatable server benchmarks? If it were as easy as running Cinebench or TrueCrypt, I think everybody would be doing it.
"Even the chart on Page 1 is designed to make Intel look superior all-around. For what reason would you exclude the Opteron 4274 HE (65W TDP) or the Opteron 4256 EE (35W TDP) from the 'Power Optimized' section?"
To be honest, those CPUs were not even in AMD's presentation that we got. We were only briefed about Interlagos.
UberApfel - Thursday, November 17, 2011 - link
Did they send you the Xeon X5670 also? I suppose whoever is handling media relations at AMD is either careless or disgruntled, e.g. sending a slightly overclocked processor with a 30% staple that happens to scale unusually badly in terms of power efficiency. Please just answer this honestly: if you had compared an Opteron 6272 with an E5645... would your article present a different story?
Fair as you may have tried to be, you don't have to look far to find a comment here that came to the "BD is a joke" conclusion.
---
Using a phpBB stress test is hardly useful or relevant as a server benchmark, never mind under a VM. Unless configured extensively, it's I/O bound. "Average Response Time" is also irrelevant: how is the reader to know if your 'response time' does not favor processors that are better at single-threaded applications?
Additionally, VMs on a better single-threaded processor will score higher in benchmarks due to the overhead, as parallelism isn't optimized. Yet these results make zero sense in real-world usage. It contradicts the value of VMs: flexible scalability for low-usage applications.
Finally, I'd estimate that less than 5% of servers are virtual (if that). VMs are most popular with web servers, and even there they have a small market share as they only appeal to small clients. Large clients use clusters of dedicated servers; tiny clients use shared dedicated.
Did you even use GCC 4.7 or Open64? In some applications, the new versions yield up to 300% higher performance for Bulldozer.
JohanAnandtech - Thursday, November 17, 2011 - link
"if you had compared a Opteron 6272 w/ a E5645 ... would your article present a different story?"You want us to compare a $551 80W TDP Intel cpu with a $774 115 AMD CPU?
"Unless configured extensively; it's I/O bound."
We know how to monitor with ESX top. There is a reason why we have a disk system of 2 SSDs and 6 x 15k SAS disks.
"Average Response Time" is also irrelevant
Huh? That is like saying that 0-60 mph acceleration times are irrelevant to sports cars.
"Finally; I'd estimate that less than 5% of servers are virtual (if that)"
....Your estimate unfortunately was true in 2006. We are in 2011 now. Your estimate is 10x off, maybe more.
UberApfel - Thursday, November 17, 2011 - link
"You want us to compare a $551 80W TDP Intel cpu with a $774 115 AMD CPU?"$539
"The 6272 is nothing more than a lower speedbin of the 6276. It has the same power consumption but slightly lower performance. Performance/wat is thus worse."
By your logic; the FX-8120 and FX-8150 have equal power consumption. They don't.
"We know how to monitor with ESX top. There is a reason why we have a disk system of 2 SDDs and 6 x 15k SAS disks."
It's still I/O bound unless configured extensively.
"Huh? That is like saying that 0-60 mph acceleration times are irrelevant to sports cars."
Yeah, it is if you're measuring the distance traveled by a number of cars. The Opteron is obviously slower in handling single requests, but it can handle maybe twice as many at the same time. Unless your stress test made every request at T=0 and your server successfully queued them all, dropped none, and included the queue time in the response time... it would favor the Xeon immensely. Perhaps it does do all this, which is why I said "how is the reader to know" when you could have just as easily reported 'Average Requests Completed Per Second'.
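(A toy sketch of that distinction, with made-up numbers and a made-up log format rather than the vApus methodology: the same request log yields both a throughput figure and a mean response time, and the two can rank two machines in opposite order.)

```python
# Toy sketch: throughput vs. mean response time from the same request log.
# The (start_time, duration) log format and the numbers are invented for
# illustration; this is not the vApus output format.
def summarize(requests):
    """Return (requests per second, mean response time) for one run."""
    durations = [d for _, d in requests]
    wall_clock = max(s + d for s, d in requests) - min(s for s, _ in requests)
    return len(requests) / wall_clock, sum(durations) / len(durations)

# Machine A: fast single requests, little concurrency.
machine_a = [(i * 0.1, 0.1) for i in range(8)]
# Machine B: slower single requests, 8 handled at a time.
machine_b = [(0.2 * (i // 8), 0.2) for i in range(16)]

for name, log in (("A", machine_a), ("B", machine_b)):
    rps, mean_rt = summarize(log)
    print(f"{name}: {rps:5.1f} req/s, {mean_rt * 1000:4.0f} ms mean response")
```

In this toy case machine B completes four times as many requests per second while showing twice the mean response time, which is exactly why reporting only one of the two numbers can mislead.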
"....Your estimate unfortunately was true in 2006. We are 2011 now. Your estimate is 10x off, maybe more."
Very funny. Did the salesman who told you that also recommend these benchmarks? Folklore has it that Google alone has over a million servers, 20x that of Rackspace or ThePlanet, and they aren't running queries on VMs.
boomshine - Thursday, November 17, 2011 - link
I hope you included MS SQL 2008 performance just like in the Opteron 6174 review:
http://www.anandtech.com/show/2978/amd-s-12-core-m...
JohanAnandtech - Thursday, November 17, 2011 - link
Yes, that test failed to be repeatable for some weird reason. We will publish it as soon as we get some reliable numbers out of it.
JohanAnandtech - Thursday, November 17, 2011 - link
"SMT can only execute a single thread at once. "The whole point of SMT is to have one thread in one execution and another thread in the other execution slot.
In fact, the very definition of SMT is that two or more threads can execute in parallel on a superscalar execution engine.
TC2 - Thursday, November 17, 2011 - link
Another joke from AMD with their BD "server-centric" architecture - bla-bla! AMD 8/16 against Intel 6/12 and again can't win!
pcfxer - Thursday, November 17, 2011 - link
" make of lots of DLLs--or in Linux terms, they have more dependencies"Libraries is the word you're looking for.
I also see the mistake of mixing programming APIs/OS design/Hardware design...
Good software has TLBs, asynchronous locking where possible, etc., as does hardware, but they are INDEPENDENT. The glue, as you know, is how compiled code is treated at the uCode level. IMO, AMD hardware is fully capable of outperforming Intel hardware, but AMD uCode is incredibly good.
duploxxx - Thursday, November 17, 2011 - link
Very interesting review as usual, Johan, thx. It is good to see that there are still people who want to make reviews thoroughly. While the message is clear on the MS OS for both power and performance, I think it isn't on VMware. First of all, it is quite confusing as to what settings exactly have been used in the BIOS, and to me it doesn't reflect the real final conclusion. If it ain't right then don't post it, in my opinion, and keep it for further review...
I have had a beta version of Interlagos for about a month now, and performance testing depending on BIOS settings has been very challenging.
When I see your results I have the following thoughts.
Performance: I don't think that the current vAPU2 was able to stress the 2x 16 cores enough; what was the average CPU usage in ESXTOP during these runs? On top of that, looking at the result score and both response times, it is clear that the current BIOS settings aren't optimal in balanced mode. As you already mentioned, the system is behaving strangely.
VMware themselves have posted a document for v5 regarding power best practices which clearly mentions that these settings need to be adapted: http://www.vmware.com/files/pdf/hpm-perf-vsphere5....
To be more precise, balanced has never been the right setting on VMware; the preferred mode has always been high performance, and this is how we run, for example, a 400+ VMware server farm. We would rather use DPM to reduce power than reduce clock speed, since that affects total performance and response times much more, mainly on the virtualization platform and OEM BIOS creations (let's say a lack of in-depth fine-tuning and options).
Would like to see new performance and power results when running in high performance mode and according to the new vSphere settings...
JohanAnandtech - Thursday, November 17, 2011 - link
"l it is quite confusing to what settings exactly have been used in BIOS and to me it doesn't reflect the real final conclusion"http://www.anandtech.com/show/5058/amds-opteron-in...
You can see them here with your own eyes.
+ We configured the C-state mode to C6 as this is required to get the highest Turbo Core frequencies
"performance: I don't think that the current vAPU2 was able to stress the 2x16core enough, what was the avarage cpu usage in ESXTOP during these runs?"
93-99%.
"On top of that looking at the result score and both response times it is clear that the current BIOS settings aren't optimal in the balanced mode."
Balanced and high performance gave more or less the same performance. It seems that the ESX power manager is much better at managing p-states than the Windows one.
We are currently testing Balanced + c-states. Stay tuned.
duploxxx - Thursday, November 17, 2011 - link
Thx for the answers. I read the whole thread, just wasn't sure that you took the same settings for both Windows and virtual. According to VMware you shouldn't take balanced but rather OS controlled; I know my BIOS has that option, not sure about the Supermicro one.
Quite a strange result with ESXTOP above 90% with the same performance results; there just seems to be a further core scaling issue on vAPU2 with the performance results, or it's just not using turbo... We know that the module doesn't have the same performance, but the 10-15% turbo is more than enough to level that difference, which would still leave you with 8 more cores.
When you put the power mode on high performance it should turbo all cores for the full length at 2.6GHz for the 6276. While you mention it results in the same performance, are you sure that the turbo was kicking in? ESXTOP CPU higher than 100%? It should provide more performance...
Calin - Friday, November 18, 2011 - link
You're encrypting AES-256, and Anand seems to encrypt AES-128 in the article you linked to on the Other Tests: TrueCrypt and 7-zip page.
taltamir - Friday, November 18, 2011 - link
Conclusion: "Intel gives much better performance/watt and performance in general; BD gives better performance/dollar"Problem: Watts cost dollars, lots of them in the server space because you need to some some pretty extreme cooling. Also absolute performance per physical space matters a lot because that ALSO costs tons of money.
UberApfel - Sunday, November 20, 2011 - link
A watt-year is about $2. The difference in cost between an X5670 and a 6276: $654.
On Page 7...
X5670: 74.5 perf / 338 W
6276: 71.2 perf / 363 W
adjusted watt-per-performance for 6276: 363 * (74.5 / 71.2) = 380
difference in power consumption: 42W
If a server manages an average of 50% load over all time, the Xeon's supposed superior power efficiency would pay for itself after only 31 years.
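(Spelling that payback arithmetic out, as a back-of-the-envelope sketch using the figures above; the cost per watt-year is the assumption that moves the answer:)

$$T_{\text{payback}} = \frac{\Delta\text{price}}{\Delta P \times \text{load} \times c} = \frac{\$654}{42\,\text{W} \times 0.5 \times \$2/(\text{W·yr})} \approx 15.6\ \text{years}$$

At $1 per watt-year the same numbers give roughly the 31 years quoted above; either way, the payback period is far longer than any server's service life.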
Of course you're not taking into consideration that this test is pretty much irrelevant to the server market. Additionally, as the author failed to clarify when asked, Anandtech likely didn't use newer compilers which show up to a 100% performance increase in some applications ~ looky; http://www.phoronix.com/scan.php?page=article&...
Thermalzeal - Monday, November 21, 2011 - link
Good job AMD, you had one thing to do: test your product and make sure it beat competitors at the same price, or gave comparable performance for a lower price. Seriously, wtf are you people doing?
UberApfel - Tuesday, November 22, 2011 - link
Idiots like this are exactly why I say the review is biased. How can anyone with the ability to type scan over this review and come to such a conclusion, let alone with the confidence to comment?
zappb - Tuesday, November 29, 2011 - link
Completely agree - some very strange comments along these lines over the last 11 pages.
zappb - Tuesday, November 29, 2011 - link
Posted by Ars Technica - incredibly tainted in Intel's favour. The title is enough:
"AMD's Bulldozer server benchmarks are here, and they're a catastrophe"
zappb - Tuesday, November 29, 2011 - link
Thanks Johan for the ungodly amount of time you and your team spent on this review, and also thanks to all contributors to the comments, which were very useful for getting more context for someone like myself who is not very up to speed with server tech. The 45 watt and 65 watt Opterons are not mentioned on the front page of the article (but are mentioned in the comments) - are these based on Interlagos?
To me it looks like a big win for AMD - and these benchmarks are not even optimised for the architecture (Linux kernel 3 was not used). Can't wait to see updated benchmarks, something like FreeBSD, or when we get an updated scheduler for the Windows server OSes... should make a big difference.
Really low idle power consumption is nice, and I'm planning to pick one of these up (for home use) to play around with FreeBSD, VMs, etc... just for training purposes.
The other point about Intel's Sandy Bridge Xeons: these are just going to be 8-core 3960Xs, right? Which may not change the current server landscape very much, depending on their prices.
JWesterby - Friday, February 10, 2012 - link
Respect is due, Johan! You did a very useful review under significant limitations. The very best part is to shine an unbiased light on a damned interesting CPU. There is an important "next step," which I will address shortly. As always, just the mention of AMD brings out hysterical attacks. One would think we were talking about stem cell research!! There is no real discussion -- it's pitchforks, lit torches, and a stake ready for poor Johan and anyone else willing to consider the mere possibility that AMD has produced worthy technology!!
Computer technology - doing it, anyway - has changed. It's become ALL about the bloody money, and the "culture" of the people doing technology has also changed -- it has become much more cut-throat, there is far less collegiality, and people willing to take risks on projects have become really uncommon. Qualified people doing serious technology just because they can are uncommon.
There is no end of posers (including some on this board), Machiavellian Fortune 500 IT managers, and "Project Managers" who are clueless (there ARE some great IT managers and wonderful PMs, but their numbers are shrinking). My hat is off to those in Open Source - they are the last bastion of decency for the sake of decency, and of technology merely for the joy of doing it!!
"Back in the day" people seemed really into the technology, solving difficult problems, and making good things happen. There was truly a culture. For example, not taking a moment to help someone on the team or otherwise made you a jerk. Development was a craft or an art, and we were all in it together. We are losing that, and it's become more dog-eat-dog with a kind of mean spirit. What a shame. Many of the comments here are perfect examples -- people who would rather burn down the temple than give a new and challenging technology a good think.
Personally I can't wait to get my hands on a couple of AMD's new CPU's, build a decent server, and carefully work out the issues with patience. These new Opterons are like a whole new tech that may be the beginning of all new territory.
My passion and some professional work is coding at the back end in C/C++ and I'm just beginning to understand CUDA and using GPU's to beef up parallel code. My work is all around (big) data warehousing, cutting edge columnar databases, almost everything running virtual, all the way through to analytics on BI platforms. I do all of that both on MS Server 2008, Solaris and FreeBSD. All that is a perfect environment to test AMD's "new territory."
Probably worth a blog at some point because these processors are new territory and using them well will take some work just keeping track of all the small things that shake out. That's the "next step" that this and other reviews require to really understand AMD's Bulldozers. Doing that well, if AMD is right with these chips, means being able to build some great back-end servers at a much more approachable price; more importantly without paying an "Intel" tax, and in the end having two strong vendors and thereby more freedom to make the best choice for the requirement.
PhotoPrint - Sunday, December 25, 2011 - link
You should make a fair comparison in the same price range! It's like comparing a GTX 580 vs. an AMD Radeon 6950!
g101 - Wednesday, January 11, 2012 - link
Wow, Anand let the truth about Bulldozer leak out.
ppennisi - Wednesday, March 7, 2012 - link
To obtain maximum performance from my Dell R715 server equipped with dual Interlagos processors, I had to DISABLE C1E in the BIOS. Under VMware the machine's performance changed completely; it almost doubled.
Maybe you should try it.
anti_shill - Monday, April 2, 2012 - link
Here's a more accurate reflection of Bulldozer/Interlagos performance, untainted by Intel ad bucks...
http://www.phoronix.com/scan.php?page=article&...
But if you really want to see what the true story is, have a look at AMD's stock price lately, and their server wins. They absolutely smoke Intel on virtualization, and anything that requires a lot of threads. It's not even close. That would be the reason this review pits Interlagos against an Intel processor that costs twice as much.