Agreed. There are plenty of places you can go to find out how pretty your games will look, but this sort of stuff is much more interesting to me!
Looking forward to the application numbers. POWER8 may shape up to be a nice server alternative. I would also like to see how it handles virtualization; with its threading capabilities, it might just be a good platform for that.
Regarding virtualization, SPARC M7 is more than 4x faster than POWER8 on SPECvirt_sc2013, and more than 2x faster than x86: https://blogs.oracle.com/BestPerf/entry/20151025_s...
Regarding SPECcpu2006, SPARC M7 is 1.9x and 1.8x faster than POWER8, and is faster than x86 as well: https://blogs.oracle.com/BestPerf/entry/201510_spe...
Regarding memory bandwidth, SPARC M7 is 2.2x and 1.7x faster than POWER8 and 2.4x faster than x86 on STREAM benchmarks: https://blogs.oracle.com/BestPerf/entry/20151025_s...
If you dig a bit on that web site, you will find 30-ish world records where SPARC M7 is 2-3x faster than POWER8 and x86, all the way up to 11x faster.
It is interesting to delve into the technology behind POWER8 and x86, but in the end, what really matters is how fast the CPU performs in real-life workloads and benchmarks. SPARC has lower IPC than x86, but as real-life server workloads have an IPC of around 0.8, SPARC, which is a server CPU, is much faster than x86 in practice. In theory, x86 and POWER8 are fast, but in practice they are much slower than SPARC. So you can theorize all you want, but in the end - which CPU is fastest in real workloads and real benchmarks? SPARC. Just look at all the benchmarks above, where SPARC M7 is faster in number crunching, Big Data, neural networks, Hadoop, virtualization, memory bandwidth, etc. And if you also factor in the business benchmarks, such as SAP, PeopleSoft, databases, etc. - there is no contest. You get twice the performance, or more, with a SPARC M7 server than with the competitors.
SPARC M7 can also turn on encryption for everything and lose only 2-3% performance, whereas encryption on POWER8 and x86 typically cuts performance to 33% of the original or lower. So if you benchmark encrypted workloads, SPARC M7 is not just 2-3x faster, but roughly another 3x faster on top of that - i.e. typically 6-9x faster.
The virtualization score is good vs. POWER8 mainly because of the radical difference in core count: 32 vs. 6. Even with lower IPC, I'd expect the higher-core-count system to fare better. Also note that IBM offers higher core counts and higher clock speeds, which would close that gap.
Same for the claims of being twice as fast in raw benchmarks: Oracle isn't comparing their best against IBM's best POWER8. Their choice of comparison point was simply picked to make SPARC look good, as is the job of their marketing department. Real performance comparisons come from independent reports.
To get the memory bandwidth advantage Oracle proclaims, they have to use twice as many sockets.
These supposed Oracle "wins" are all based on worst-case scenarios for POWER8 - i.e., testing a DCM-based system and counting each DCM as two processors. This isn't very useful for comparison to POWER8 overall, as the entry-level machines like the one in this article, and the S822LC positioned above it, all use SCMs (with as many as twelve cores).
M7 is a first-rate CPU, but it's also in a totally different cost class; the cheapest M7 config listed on Oracle's website costs over US$40k for a one-processor machine. Considering you can get a pair of 10-core POWER8s with 256GB of RAM in an S822LC for US$14,300 list, this is an exceptionally tough sell for anyone not wedded to Solaris (and by the way, there's no RHEL, SLES, or Ubuntu for SPARC - Solaris is pretty much the only game in town).
My company is currently deploying an S812LC and intends to deploy an S822LC in the future; we briefly considered SPARC but found the style of marketing that Oracle and its proxies seem to favor deeply off-putting, as is the relatively poor perf/$ compared to both POWER and Intel. Our loads (mainly a large PostgreSQL application) scale well with memory bandwidth and cache sizes, and we've found S812LC perf/$ to be first-rate. The main downsides have just been related to the relative immaturity of the ppc64le platform (occasional lack of available packages, etc.).
These Oracle SPARC M7 benchmarks vs IBM POWER8 are not worst case. The DCM POWER8 module actually consists of two POWER8 CPUs in one socket, so there is nothing wrong with these benchmarks. It is up to IBM to release benchmarks with two POWER8 CPUs in one socket; that is not Oracle's choice. IBM has for decades promoted few strong cores instead of many weaker cores. For instance, IBM claimed "dual-core POWER6 @ 5 GHz was superior to 8-core SPARC Niagara 2 @ 1.6 GHz because databases run best on few but strong cores", and IBM talked about future super-strong single/dual-core 6-7 GHz POWER CPUs and mocked SPARC's many but weaker cores, claiming databases are worthless on SPARC. Back then SPARC was first with 8 cores, and it was very controversial to have that many cores. Later IBM realized the laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed SPARC with many lower-clocked cores - just like Intel abandoned Prescott with its high clocks. Today everybody has many lower-clocked cores, just like SPARC a decade ago.
Of course, if IBM released benchmarks with other configurations of POWER8, Oracle would be happy to use them, but IBM has not. Oracle has no choice but to use the benchmarks that IBM has released. It is not Oracle's choice which benchmarks IBM releases.
We also know that POWER8 is slower than the latest Intel Xeons, and we know that SPARC M7 is typically 2-3x faster than Intel Xeon, so these IBM POWER8 benchmarks vs the SPARC M7 benchmarks are probably representative. If you find other IBM POWER8 benchmarks, I am sure Oracle will compare to them instead. But you can only bench against IBM's own results, right?
Regarding my credibility, yes, I am a SPARC supporter. What is the problem with being a supporter? I know there are IBM supporters here, and there are Nvidia, AMD, Intel etc. supporters. What is wrong with that? Does the fact that I consider SPARC to be superior invalidate the official Oracle vs IBM vs Intel benchmarks? I have not created those benchmarks; IBM has. And Oracle. And Intel. Instead of you, the IBM supporters, linking to official, superior IBM POWER8 benchmarks, you claim that because I am a SPARC supporter, those official vendor benchmarks cannot be trusted. Instead of proving that POWER8 is faster with benchmarks, you resort to attacking me. That does not win you any discussions. Show us facts and benchmarks if you want to invalidate my linked benchmarks, instead of attacking me. The fact is, you have not proven anything regarding POWER8 inferiority.
And why do I keep talking about SPARC M7? Well, it seems people believe that Intel and POWER8 are so fast, but in fact there is another CPU out there that is 2-3x faster, up to 11x faster. People just don't know that SPARC is the world's fastest CPU. I would like AnandTech to talk about the best CPU in the world instead of slow IBM POWER or Intel Xeon CPUs. But AnandTech doesn't.
Regarding myself, yes, I have been interviewed in Swedish media, and it is evident that I have always worked in finance. I have never worked at Sun nor Oracle - just read the interview. For the last few years I have been a quantitative analyst concocting trading strategies. I have never worked in IT. I just happen to be a nerd and a geek, and I only support the best tech, and that is SPARC and Solaris. IBM and Intel suck. Just compare their lousy performance to SPARC M7.
"The DCM Power8 module, actually consists of two power8 CPUs, in one socket."
Dude, nobody outside of Oracle marketing cares, just like they didn't care when Xeon and Opteron used MCMs. IBM has SCMs going all the way up to 12 cores and 8 Centaur links; they just use DCMs for cost reasons on some (but not all) smaller machines. These have the same number of Centaur links per socket as the big SCMs, and they're priced as one would expect of one- or two-socket enterprise systems. Realistically, the 8-Centaur SCM has roughly equivalent memory bandwidth to the 8-Centaur DCM.
"Later IBM realized laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed sparc with many knower clocked cores. Just like Intel abanoned Prescott with high clocks. Today everybody have many lower clocked cores, just like spare decades ago."
You mean like when Oracle replaced 16-core 1.65GHz T3 with 8-core 3GHz T4? Which, by the way, had very similar throughput performance (which you say is all that matters) to the T3, but had far higher single-thread and single-core performance? If only throughput matters, why would Oracle do such a thing? It's quite a thing for you to imply Oracle doesn't know what they're doing!
They also have been publishing benchmarks for their shiny new S7 chip where they lose per-chip to the Xeon - but they win per-core, which you've said on many occasions doesn't matter. Here are some examples:
https://blogs.oracle.com/BestPerf/entry/20160629_n...
https://blogs.oracle.com/BestPerf/entry/20160629_r...
Comparisons to IBM are conspicuously absent, I suspect because Power perf/core is rather impressive.
"For instance, IBM claimed "dual core power6 @ 5 ghz was superior to 8core sparc niagara2 @ 1.6 ghz because databases runs best on few but strong cores" and IBM talked about future super strong single/dual core 6-7 ghz power CPUs and mocked sparc many but weaker cores because databases are worthless on sparc."
IBM has never reduced per-core or single-thread performance generation to generation. P7 and P8 were both massive improvements in both categories. IBM has not historically shown interest in "weaker" cores for Power.
"Well, it seems people believe that Intel and power8 is so fast, but in fact there are another cpu out there"
Yes. For the low, low price of over forty thousand dollars for the lowest-end, one-processor M7 system with public prices on Oracle's website.
"2-3x faster"
Consulting officially published results on an industry-standard benchmark:
Xeon E7-8890 v4, 2.2GHz: SPECint rate result of 927/chip, 24 cores (38/core)
POWER8 SCM, 4GHz: SPECint rate result of 900/chip, 12 cores (75/core)
SPARC M7, 4.13GHz: SPECint rate result of 1200/chip, 32 cores (37/core)
Not that impressive - especially given M7's price. And certainly not 2-3x of anything (or even 1.9x). It's 1.3x... while having 2.5x as many cores. Additionally, for a large range of applications, single-thread performance matters.
"up to 11x faster."
When running in-memory queries inside Oracle DB using accelerator instructions added to SPARC M7 specifically for Oracle DB, yes.
By the way, since you mentioned memory bandwidth... how does it feel to have two-processor SPARC S7 losing on STREAM Triad to entry-level, one-processor Power8 machines that cost significantly less? Compare https://blogs.oracle.com/BestPerf/entry/20160629_s... to the entry-level Power8 results in the article we're commenting on!
Oracle proponents need to do better than this. At least Phil Dunn resorts less to copypasta...
"What is the problem with being an supporter? What is wrong with that?" Lying, deceiving, etc.
This is what Oracle does, because simply put, ever since they acquired Sun those products went to sh*t. Oracle are reverse-alchemists: whether it's software (like Java) or hardware (like SPARC), Oracle managed to turn those gold nuggets into lead weights. Java was buried by Google, SPARC was buried by Intel and IBM.
Oracle always resorts to this kind of piss-poor advertising, and it's not for the customers themselves. They try to save face with numbers on that site for one reason only: to have something to show during their conferences. Companies don't rely on benchmark numbers when committing to multi-year contracts and getting tied into a specific ecosystem. Right now only a handful of government institutions and some companies in regulated industries still rely on SPARC, and only in corner cases. Most of the time it's just until they manage to migrate off it.
The EXA products might be the only ones with some solid popularity, because there you get the full package, but they come with plenty of caveats. Having worked in the defense and financial sectors for a long time, I've seen plenty of consolidation being done onto newer Oracle/SPARC systems, but not many new deployments (a handful). And the proof is in the numbers: Oracle can't seem to make any headway here. This isn't the kind of runaway success you'd expect from such an "overpowering" system.
P.S. Google for "For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer" and see the army of posters Oracle is employing and the kind of tactics they resort to. And that's just the official posters.
Their engineered systems for integrated infrastructure and platforms (the latter being their driver) are great, but not because of the hardware or the CPU in particular. It's because of the value of the whole package, which includes the software layer. Nobody actually cares about the CPU in those particular products, and if the CPU were sold on its own it would have a tough time. Not least, they almost always HAVE to heavily discount the price in order to make the sale. From personal and recent experience, Oracle was eager enough to undercut competitors like Cisco, VCE or HP (HP has had 3-digit YoY growth in this segment for 2-3 years now) and discounted so aggressively that we ended up with 50% savings...
"These oracle sparc m7 benchmarks vs IBM power8 are not worst case." >Eh? Did Oracle release the complete system configuration of the POWER8 for their testing? From your stream link you can find this PDF ( https://blogs.oracle.com/BestPerf/resource/stream/... ) where Oracle only test with 24 threads out of 96 possible in the environment and out of 192 possible supported with the hardware. This document does not detail how many cDIMMs were installed in a system which has a direct impact on available bandwidth. Case in point, the 512 GB of memory on the POWER8 system can be configured with the bare minimum number of cDIMMs in a system. That is a worst case scenario for POWER8 and we don't know if Oracle used it.
Oracle also made a source code change to STREAM for reverse allocation. The thing that is missing here is a comparison to the original code. The change could affect how well the prefetchers work and favor one particular architecture, and thus impact performance. So we don't know whether this change is a best- or worst-case scenario for comparison purposes.
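For readers who haven't seen it, the kernel at the heart of STREAM Triad is only a few lines. Below is a minimal sketch of the standard, unmodified form (array size and scalar are illustrative, and Oracle's "reverse allocation" change is deliberately not reproduced, since its exact form isn't shown in the excerpt). It illustrates why allocation order and prefetcher behavior matter: the loop is nothing but streaming loads and stores.

/* Minimal sketch of the standard STREAM Triad kernel (illustrative array size;
   not Oracle's modified "reverse allocation" variant discussed above). */
#include <stdio.h>
#include <stdlib.h>

#define N 80000000L  /* must be much larger than the last-level cache */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double scalar = 3.0;
    if (!a || !b || !c) return 1;

    /* First touch decides NUMA page placement, which is one reason the
       allocation/initialization order can shift the measured bandwidth. */
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    /* Triad: two streaming loads and one store per element. */
    for (long i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];

    printf("a[0] = %f\n", a[0]);
    free(a); free(b); free(c);
    return 0;
}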
"If you find other IBM power8 benchmarks I am sure oracle will compare to them instead. But you can only bench against ibm's own results, right?" >I find it perfectly fair to use submitted benchmarks from IBM to compare against similarly configured systems submitted by Oracle. POWER8 systems are available with higher clocks and more cores than what is generally used in the open benchmarks IBM has submitted. Thus it is deceptive to claim that SPARC is decisively faster when there is beefier IBM hardware available.
"I am an sparc supporter. What is the problem with being an supporter?" >Nothing inherently wrong with that but you are incredibly closed minded to any other alternative. You are blind to the idea that anything could be better or competitive in any metric. The reality of IT is that there no one tool that best fits every job. Anyone claiming otherwise is trying to sell you something.
"I would like anandtech to talk about the best CPU in the world instead of slow IBM power or Intel Xeon CPUs. But anandtech don't." >How about you use your contacts at Oracle to get Anandtech a test system for some real independent analysis?
"I have never worked in IT." >This explains a lot.
So, no one in the entire comment section had mentioned SPARC at all. You come along, start ragging on POWER8 and how SPARC is so much better, and then link to benchmarks on Oracle's website, with results provided by Oracle, with the conclusion that Oracle is so much better. Not only that, but the benchmarks you link require Oracle to use much higher-end and incredibly higher-cost hardware to beat low- and mid-range POWER8.
On top of all that, you make dubious and unsubstantiated claims about server workloads and about the performance of POWER8 and x86.
And finally, to top it off, your comment is barely even related to the comment you replied to. It seems you picked the most visible comment in the thread to reply to.
So, to everyone else I think it's quite clear this is just an Oracle shill, please just ignore him.
So, adding on to this, I was curious. I did a simple Google search: "site:anandtech.com brutalizer". What did I find? Comments on anything x86 and POWER8, every single one talking about how Oracle and SPARC are so much better than whatever the review is about. Consistently linking to Oracle-run benchmarks on Oracle's own site with the conclusion that Oracle is better. Consistently making dubious claims about the non-Oracle hardware. Every single comment I found was shilling for SPARC, and every single one was as close to the top of the comments list as possible. You seem to want to be as visible as possible.
It's hard to draw a conclusion from those two links, but I'll point out a few things. All of the non-shilling comments you made were in 2013. Every single pro-Oracle comment you made was from 2014 at the earliest. Sounds to me like you were either bought out at that time, or you bought someone else's account, or perhaps this was when you were put on Oracle's pay-cheque. It's quite possible there are more non-shilling comments that I've missed.
Ohhh yes I am well aware, I encounter him on El Reg and other places all the time. But hey, I hate shills, so I'm quite happy to destroy any sense of credibility he may have for those not in the know.
Look at SWaP and TCO. Re-run your "analysis", it's obvious that SPARC sucks.
Can you even RUN Ubuntu on SPARC anymore?
Their FP performance sucks and it always has. That's why the Niagara T2 had to have FPUs ADDED to all of the cores - because sharing a single FPU among 8 cores was a really bad/dumb idea.
I've looked at SPARC before. Had a couple of them and had a SunFire server before as well, and POWER/Intels can easily beat SPARC, especially once you consider TCO.
The company that I work for now (a Fortune 10 company) dumped all of the SPARC workstations for Intel.
I would support "Brutalizer". Every processor has its strength and weakness. If memory architecture is considered, for the same capacity, Intel is conjested memory, IBM is very distributed and Oracle-Sun is something in between. So Intel will always have memory B/W problem every way. IBM has memory efficiency problem. Oracle in theory doesn't have problem, but with 2 dimm per ch, that look like have problem. Oracle-Sun is for highly branched workload in the real world. Intel is for 1T/Core more of single threaded workloads and IBM is for mixed workloads with 2T-4T/Core priority. So supercomputing workloads will work fast on IBM now, compared to intel and sparc, while analytics and graph and other distributed will work faster on SPARC M7 and S7 (although S7 is resource limited). While for intel, a soft mix of applications and highly customized os is better. Leave the business decisions and the sales price. List prices are twice as much as sales price in the real world. These three processors (xeon e5v4, power8-9, sparc m7-s7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. Its like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. Thats in the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of intel xeon e5v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and every thing on any of these, we get our hands on. And this is the real world conclusion. So don't fight. Its a context centric supply of processors!
Johan de Gelas: blowing minds and educating "the rest of us" since...I dunno, a really long time ago (especially in internet years). Great job on the data, but the real good stuff is in your thoughts and analysis. Thank you!
It seems to me, Intel's focus on bringing their CPU architecture design all the way down to 5W is the reason IBM is able to stand out against them. Intel is focused on creating a scalable architecture while IBM can throw the whole kitchen sink at the server market.
Up until this point. Consumer SkyLake and server SkyLake are going to be two different designs. They're certainly related but server SkyLake will have 512 KB of L2 cache per core and support AVX-512 instructions.
Server SkyLake is also going to support 3D Xpoint DIMMs, though that difference is more with the platform/chipset than the actual CPU core.
Very interesting. It seems odd to me that they chose to configure it in a 2U - except for big data clusters, most of the market space I see this playing is dominated by FC to a SAN. Is this a play in the big data cluster space, or the more traditional AIX/DB2/big iron that IBM has owned for so long? Some questions I'd have: what virtualization is possible with this architecture? presumably just the standard PowerVM? How well does that work? What is the impact of IO latency? Could you throw a P3700 or two in here?
2U: Besides big data storage needs, I suspect 2U is necessary for adequate cooling for the POWER8 chip.
Virtualization: Linux KVM works well as far as I know.
We actually tried out a P3700 in there (see: http://www.anandtech.com/show/9567/the-power-8-rev... ) and it worked very well. I asked IBM what a customer should expect when using third-party storage (probably no support, but what about the warranty?) but no answer yet.
Hi Johan, 2U is not necessary for cooling a POWER8 chip. We do that better with our Barreleye (1.25 OU design). Even storage-wise, Barreleye has a 15-disk storage bay, as can be seen in the links below.
Let me know if you ever want to benchmark a Barreleye. What specific POWER8 proc are you benchmarking with? (Turismo?) I believe it does slightly better than the S812LC on many benchmarks, depending on the variant of the POWER8 proc the S812LC runs.
Hi floobit, for virtualization: PowerVM and out-of-the-box KVM (tested on Fedora 23, Ubuntu 15.04 / 15.10 / 16.04) work quite well. Xen doesn't work well or hasn't been officially tested / released.
Interesting that the L3 eDRAM not only allows them to pack in much more L3 (what was it, the area of 3 SRAM transistors per eDRAM cell or something?), but it's also low latency, which was a concern some people cited with eDRAM. Appears to be an unfounded fear.
And then on top of that they put another large L4 eDRAM cache on.
There was a change in how the L4 cache works from Broadwell to Skylake on the mobile parts. The implication is that Intel was exploring the idea of a large L4 eDRAM for Skylake-EP/EX parts. We'll see how that turns out, as Intel has also explored using HMC as a cache for high-bandwidth applications in Knights Landing. Either way, Intel has this idea on their radar and we'll see how it pans out next year.
That said, if enough partners ask for it and/or if the numbers make sense for Azure, MS will at the very least have a damn good look at porting Windows over.
It's probably just a case of doing QA and releasing it. They've sold a PPC build in the past; and maintain internal builds for a number of other CPU architectures to avoid accidentally baking x86isms into the core code.
They made PowerPC Windows? Source? I remember the Powermac G5s were the early dev kits for the xbox 360 due to the architecture similarity, but I assumed those stories meant they were just working in OSX or Linux on them.
There were early builds of Windows 2000 for the RISC's as well, during the times when it was still called NT5. I had one of those from WinHEC, but alas I lost it when moving at some point. :(
AFAIK, the little endian PowerPC mode that NT4 used was killed when they went to 64-bit and is different from today's POWER8 little endian mode that was only recently introduced.
The Xbox 360 is a PPC machine, and runs a (heavily modified) version of Windows. My understanding is that most x86 assumptions had to be ferreted out to run on Itanium (early) and then on ARM (later).
MS has builds that will run on anything. The real question is why would you want to? These chips are designed from the ground up to run massive workloads. It's a completely different style of computing than a Windows machine. Even MS server OSes aren't designed for this type of work. We are talking banking, ERP and other big data applications. MS is still dreaming about scaling on that level. Right now their answer is clustering, but that comes with its own obstacles too.
And IBM has a much better binary translator from when they bought QuickTransit. That one originally translated Power to x86 for the Mac, then Sparc to x86 for Quicktransit and eventually x86 to Power for IBM so they could run Linux workloads on AIX.
Then what exactly do you mean with Windows (assuming this is actually a reasonable question)?
Server applications or desktop?
.NET has been ported to Linux and I guess could be made to run on Power. A Power runtime could certainly be done by Microsoft, if they wanted to.
I don't see why anyone would want to run Windows desktop workloads on this hardware, other than to show that it can be done: QEMU to that!
I was intrigued to see how little effect hyper-threading had with your Xeon. My own experience is that it gives a 50% boost, although I appreciate there are many variables.
Afaik, Anandtech has always used the chart when presenting things like SPEC. I'd guess it'd be for clutter reasons, but the exact reason is up to the editors to mention.
Just to be clear, the Xeon CPU used today is 3 times more expensive than the Power8 CPU benchmarked? That's really impressive, isn't it? The Power8 has a pretty significant power increase, but if it's 43% faster, that cuts into the perf/w gap.
I know we've only looked at SPEC so far in round 2, but this looks like a good showing for IBM. How big is the efficiency gap between 22nm SOI and 14nm FinFet? Any estimates?
They'd have to have a healthy margin to offset all the R&D, plus IBM as a whole is not in a good financial position. Consider that they sold their fab capability not so long ago.
Please correct this error: you say you are comparing with the BEST Intel can provide, but you did not address Xeon Phi Knights Landing, which is a standalone CPU too, for the workloads that need it. If you choose correctly, the benchmark results will be so different. IBM POWER8 has 90 GB/s of memory bandwidth, while Intel's Xeon Phi Knights Landing (as the 7290F) has a bandwidth of 400 GB/s. IBM POWER8 does above 600 GFLOPS single precision and above 300 GFLOPS double precision; the Xeon Phi 7290F is roughly 10x that. SPECint: Xeon Phi is 1500 vs 1700 for POWER8. Power and price aside...
If we start comparing different product categories, why not bring the GP100 into this as well? It will deliver 10 TFLOPS of single precision and can be had for much less than any of these. But then again, there is the same caveat as the Xeon Phi: you can't actually run an OS on it, you need a host CPU and then you dispatch kernels onto the accelerator. Even with a socketed version.
Then you can add another Xeon Phi to the above statistics... Xeon Phi KNL is a CPU like other CPUs; it does everything as mentioned, and even its SPECint is comparable, which is not so bad...
Xeon Phi is x86, but it's GPU-like in nature: massively parallel for throughput, with low per-core performance. The IBM POWER8 and the other Xeons compete in highly parallel spaces like banking, but where single-thread performance also still matters. Can't compare them.
Xeon Phi Knights Landing has 3 times the single-thread performance of Silvermont (and Knights Corner); I don't think that is so bad... The comparison is valid - see the benchmarks, they cover SPECint, for example, and anything measuring parallel performance. Additionally, you can pair a high-performance Xeon with a Xeon Phi; there is nothing preventing you. The benchmark here is not about database performance or parsing or anything similar, it is about this article. I don't say Xeon Phi is currently better positioned than Xeon for those uses... But IBM's POWER isn't either: it has lots of cores and lots of threads, which is only usable in massively parallel workloads...
On the IBM server, numactl was used to physically bind the 2, 4, or 8 copies of SPEC CPU to the first 2, 4, or 8 threads of the first core. On the Intel server, the 2-copy benchmark was bound to the first core. It is not single-thread; it is a trick IBM uses to cheat in benchmarks - it is 425% slower than Xeon in single thread.
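For context, that kind of binding is ordinary processor-affinity control rather than anything exotic. Here is a minimal Linux sketch of pinning a process to a set of hardware threads; the assumption that logical CPUs 0-7 are the eight SMT threads of the first POWER8 core is purely illustrative, since the real mapping depends on the machine's SMT mode and topology.

/* Minimal sketch: pin the current process to logical CPUs 0-7, similar in
   effect to what numactl/taskset do. The "0-7 = first core's SMT threads"
   mapping is an assumption for illustration only. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int cpu = 0; cpu < 8; cpu++)
        CPU_SET(cpu, &mask);

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %d now runs only on hardware threads 0-7\n", (int)getpid());
    return 0;  /* anything exec'd from here inherits the affinity mask */
}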
The benchmarks here pit one core against one core. The IBM core can run 1, 2, 4 or 8 threads; the Intel core does 1 or 2. The 425% - not sure where that number comes from, but it isn't what shows up in these benchmarks.
The benchmarks show, as described by Johan: in single-thread mode, the IBM core does about 13% less work than the Intel core. In 2-thread mode, the IBM does about 20% more than the Intel across the two threads. The Intel doesn't do more than 2 threads; the IBM can, and then does, on average, 43% more work across its eight threads than the Intel does with its two.
So Intel is the single-thread master here, and IBM is the throughput king. Now if you have a HEAVILY threaded workload, with hundreds of threads and little latency requirement for each, Knights Landing or a GPU is a better choice, with their hundreds of cores. If latency is important and you can afford to use two to four threads per core, the IBM performs best. If latency is everything, you keep it at 1 thread per core and the Intel Xeon is the best performer.
That is entirely ignoring cost, of course, both Intel and IBM have high and low cost solutions with their downsides and benefits. This set of benchmarks simply pitted one core against another, entirely ignoring the differences in core count (IBM 10, Intel 22) and price (Intel orders of magnitude more expensive). You'll always have to look at a bigger picture: how many cores do you get for your dollar and what are your requirements.
Performance/watt, the Intel probably wins in all areas, at least if the system is idle frequently. Without idle time the IBM might not be that bad, perf/power wise.
The big takeaway from this article, though, is that IBM has built a system which can be quite price-competitive with Intel in the lower high-end market. To really be able to make a choice, we'd probably need a benchmark of two price-equivalent systems. I bet the workload would make a huge difference in who wins the price/performance fight.
I believe "heavily threaded" is somewhat imprecise here: Knights Landing (KNL) is really more about vectorized workloads, or one very loopy and computationally expensive problem, which has been partitioned into lots of chunks, but has high locality. Same code, related data, far more computational throughput than data flowthrough.
Power8 will do better on such workloads than perhaps Intel, but never as good as a GPU or KNL.
However it does evidently better per core on highly threaded workloads, where lots of execution threads share the same code but distinct or less related datasets, less scientific and more commercial workloads, more data flowing through.
Funnily enough, KNL might even do well there, beating its Xeon-D sibling in every benchmark, even in terms of energy efficiency.
But I'm afraid that's because most of the KNL die area would remain dark on such a workload, while the investment would burn through any budget.
KNL is an odd beast designed for a rather specific job, and it only earns its money there, even if you can run Minecraft or Office on it.
I do think the comparison with Xeon Phi is fair, since it can boot and run by itself now with Knights Landing. Software parity with the normal x86 ecosystem is there, so it can run off-the-shelf binaries.
I am very curious how well such a dense number of cores perform for workloads that don't need high single threaded performance.
Another interesting factor would be memory bandwidth, as Xeon Phi has plenty. The HMC only further enhances that metric, and it is worth exploring both as a cache and as a main memory region in benchmarks.
Will you be addressing virtualization in a future article? I ask because you are saying the lower-cost POWER8 systems are intended to compete with the Dell, HP, Lenovo, etc. x86 servers. But these days, a very high percentage of x86 workloads are virtualized, either on VMware or competing products. In 2009 Gartner had it at about 50% and by 2014 it was at 70%. I didn't find a number for '15 or '16, but I expect the percentage has continued to rise. So if they want to take the place of x86 boxes, they have to be able to do the tasks those boxes do... which largely means running the virtual machines that do the actual workloads.
And what about all the x86 boxes running Windows Server, or more commonly Windows Server virtual machines? Windows Server shops aren't likely to ditch Windows in favor of Linux solely for the privilege of running on POWER8.
One last thing to consider regarding price: these days we can buy a quite robust Intel-based server for around $10K. So suppose I can buy a POWER8 system for about the same price. Essentially the hardware has gotten so cheap compared to the licensing and support costs of the software we are running that it's a drop in the bucket. If we needed 10 Intel servers or 6 POWER8s to do the same job (assuming the POWER8s could run all our VMs), the POWER8s could come out lower priced hardware-wise, but the difference is, as I said, a drop in the bucket in the overall scheme of things. Performance-wise, with the x86 boxes, you just throw more cores at it.
As near as I can tell, there is a PowerKVM that runs on POWER8, but it doesn't allow you to run Windows Server VMs - it seems to support only Linux guests.
AMD should have used IBM's 22nm SOI to make CPUs so that they would not have been totally dead in the performance and server CPU markets for years. GF now owns this process, as they "bought" IBM's fabs and tech. I think 22nm SOI might be better for high-speed CPUs than the 14nm LPP FinFET that AMD is using for Zen, at the cost of die size.
So a single-socket POWER8 is somewhat faster than the Intel chip, but it is being compared in a single-socket configuration while the Intel part is designed for two sockets. Unless the POWER8 is cheaper than a dual-socket Intel setup, it seems most fair to compare both CPUs as they are designed to be used.
OK, this is literally why Anandtech is the best in the tech journalism industry.
There is nowhere else on the net that you can find a head to head comparison between POWER and Xeon, and unless you work in the tech department of a Fortune 500 company, this information has just not been available, until now.
Johan, thank you for your work on this article. I did give you grief in your previous article about using LE Ubuntu, but I concede your point. Very happy that you are writing more for AnandTech these days.
Xeons really need some competition. Whether that competition comes from POWER or ARM or Zen, I am happy to see some competition. IBM has big plans for POWER9. Hopefully this is just the start of things to come.
Thanks! it is very exciting to perform benchmarks that nobody has published yet :-).
In hindsight, I have to admit that the first article contained too few benchmarks that really mattered for POWER8. Most of our usual testing and scripting did not work, and so after a lot of tinkering, swearing and sweat I got some benchmarks working on this "exotic to me" platform. The contrast between what one would expect to see on POWER8 and me being proud of being able to somewhat "tame the beast" could not have been greater :-). In other words, there was a learning curve.
Not quite sure what the endianness of a system adds to the competitive factor. Maybe someone could elaborate on why it is so important to run a system in LE?
"Numerous clients, software partners, and IBM’s own software developers have told us that porting their software to Power becomes simpler if the Linux environment on Power supports little endian mode, more closely matching the environment provided by Linux on x86. This new level of support will *** lower the barrier to entry for porting Linux on x86 software to Linux on Power **."
"A system accelerator programmer (GPU or FPGA) who needs to share memory with applications running in the system processor must share data in an pre-determined endianness for correct application functionality."
While correct in theory, this hasn't been a problem for the last 20 years. People are used to using BE on PPC/POWER; the software, the drivers and the infrastructure are very mature (as a matter of fact, it was my job 15 years ago to make sure they are). PPC/POWER actually has configurable endianness, so if someone had wanted to go LE earlier it would easily have been possible, but only a few ever attempted that stunt; so why have the big disruption now?
I assume that this is about selling POWER boxes to companies that currently run all x86 servers, and have a bunch of custom software that they might be willing to recompile. If the customer has to spend a bunch of time fixing endian dependencies in his software in order to get it to work on POWER, it will probably be less expensive for them to simply stick with x86.
It depends what kind of software you are running. If you are running giant backend workloads on x86, you can seamlessly migrate that data to PPC while keeping custom front ends running on x86.
Johan, maybe the little endian-ness makes a difference in porting proprietary software, but pretty much all open source software on Linux has supported BE POWER for a long time.
If you get the time and the inclination Johan, it would be great if you could say do some benchmarks on BE RHEL 7 vs LE RHEL 7 on the same POWER 8 system. I think it would make for fascinating reading in itself, and would show if there are any differences when POWER operates in BE mode vs LE mode.
Actually, scrap that; it seems IBM is fully focusing on LE for Linux on POWER going forward. I'm not sure there will be many BE Linux distributions officially supporting POWER9 anyway. So your choice of focusing on LE Linux on POWER is fully justified.
Side note: once you are running KVM, you can run any mix of BE and LE Linux varieties side by side. I'm running Fedora BE, SuSE BE, Ubuntu LE, CentOS LE, and (yes, a very slow copy of) Windows on one of these chips.
If a machine is completely isolated, it doesn't matter much to the machine. I personally find BE easier to read in hex dumps because it follows the left-to-right nature of English numbers, but there are reasons to use LE for human understanding as well.
The problem shows up the instant one tries to interchange binary data. If the endian order does not match, the data is going to get scrambled. Careful programming can work around this issue, but not everyone is a careful programmer - there's a lot of 'get something out the door' from inexperienced or lazy people. If everything is using the same conventions (not only endianness, but also the sizes of the binary data types - less of a problem now that most everything has converged to 64-bit), it's not an issue. Thus having LE on Power makes the interchange of binary data with the x86 world easier.
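To make the scrambling concrete, here is a minimal sketch (hypothetical value) of what the same 32-bit integer looks like when dumped in the host's native byte order versus an explicitly chosen order - the latter being the "careful programming" workaround mentioned above.

/* Minimal sketch: dump a 32-bit value in the host's native byte order and in
   an explicitly chosen (big-endian) order. On x86 or LE POWER the first line
   prints 44 33 22 11; on a BE machine it prints 11 22 33 44. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t value = 0x11223344;
    unsigned char native[4];
    memcpy(native, &value, sizeof(native));     /* whatever this host uses */
    printf("native  : %02x %02x %02x %02x\n",
           native[0], native[1], native[2], native[3]);

    /* Explicit serialization gives the same bytes on either architecture. */
    unsigned char be[4] = {
        (unsigned char)(value >> 24), (unsigned char)(value >> 16),
        (unsigned char)(value >> 8),  (unsigned char)(value)
    };
    printf("explicit: %02x %02x %02x %02x\n", be[0], be[1], be[2], be[3]);
    return 0;
}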
Great article! Just an FYI, the term "just" as in "just out" on the first page has different meanings on opposite sides of the Atlantic and is usually avoided in writing for international audiences. I'm not quite sure which one is meant here. The North American English reading would mean it had come out right before, while the British English reading would mean it came out right after the time period referenced in the sentence.
Skylake does not have 5 decoders; it still has 4. I know that segment of the optimization manual is written in a cryptic way, but this is what actually happened: up until Broadwell there are 4 decoders and a maximum bandwidth from the decode segment of 4 uops. If the first decoder (the complex one) produces 4 uops from one x86 op, the other decoders can't work that cycle. If the first produces 3, then the second can produce 1, etc. This means the decoders can produce one of these combinations of uops, depending on how complex a task the first decoder has: 1/1/1/1, 2/1/1, 3/1, or 4. Skylake changes this so the maximum bandwidth from that segment is now 5, and the legal combinations become 1/1/1/1/1, 2/1/1/1, 3/1/1, and 4/1. You still can't do more than 4 per cycle from 4 simple ops, so there are still only 4 decoders. Make sense?
Why do the tests with GCC? Why not give each platform its full advantage and go with ICC on Intel and xlC on POWER? The compiler can make a HUGE difference in benchmarks.
The point Johan makes is that his goal is not to get the best benchmark scores but the most relevant real-life data. One can argue whether he succeeded; certainly the results are interesting, but there is much more to these CPUs, as usual. And I do think his choice is justified: while much scientific code would be recompiled with a faster compiler (though the cost of ICC might be a problem in an educational setting), many businesses wouldn't go through that effort.
I personally would love to see a newer Ubuntu & GCC being used, just to see what the difference is, if any. The POWER ecosystem seems to evolve fast, so a newer platform and compiler could make a tangible difference. But of course, if you in your use case would use ICC or xlC, these benches are not perfect.
Well since Johan really only tested one core on each CPU, it would have been nice to have him verify the actual clock speed of those cores. You'd assume that they'd be able to maintain top speed for any single core workload independent of the number of threads, but checking is better than guessing.
In addition to what Michael Bay (lel) said, remember that only Intel really has 14nm; when TSMC and GloFo say 14/16nm they really mean 20nm with FinFETs.
Using a less capable compiler (GCC) to test a chip, and not using everything the chip has to offer, seems incredibly flawed to me. What are you testing exactly?
He's testing what actual software people actually run on these things.
On your typical Linux host, pretty much everything is compiled with GCC. You want to get into exotic compilers? Sure, both IBM and Intel have their exotic, proprietary, costly compilers - so what. Very few people outside of niche industries use them.
You want to compare a CPU with a CPU? You keep the compiler the same. That's just common sense. It's also how the scientific method works!
Right, but then you're comparing, say, 10% of the silicon on that chip and saying the remaining 90% of the transistors making up the chip doesn't matter. They do; if the software is not using them, that's fine, but then it's not an accurate comparison of the hardware, it's a comparison of the software.
I'd argue it is the other way around: GCC might leave 5-10% performance on the table in some niche cases but does just fine most of the time. There's a reason Intel and IBM contribute to GCC - to make sure it doesn't fall too far behind, as they know very well most of their customers use this compiler and not their proprietary ones.
Of course, for scientific computing and other niches it makes all the difference, and one can argue these heavy systems ARE for niche markets, but I still think it was a sane choice to go with GCC.
Actually, exercising 90% of all transistors on a CPU die these days is both very hard to do (next to impossible) and would only slow the clock to avoid overstepping TDP.
And I seriously doubt that GCC will leave a CPU at 10% of its computational capacity.
Actually, from what I saw, GCC itself (compiling) was best at exploiting the full 8-thread potential of the POWER8. And since GCC is compiled by itself, that speaks for the quality of the machine code it can produce, if the source allows it. And that speaks for the quality of the GCC source code - ergo, prove you can do better before you rant.
Well, this is part 1 and it describes one scenario. What you want is another scenario, and of course it's a valid one, if a very distinct one.
Actually distinct is the word here: You'd be using a vendor's compiler if your main job is a distinct workload, because you'd want to squeeze every bit of performance out of that.
The problem with that is of course, that any distinct workload makes it rather boring for the general public because they cannot translate the benchmark to their environment.
AT aims to satisfy the broadest meaningful audience, and Johan has done a great, great job at that.
I'm sure he'll also write a part 4711 for you specifically, if you make it economically attractive.
Hell, even I'd do that given the proper incentive!
Using GCC as the compiler is also why (in my opinion) the Intel chips aren't using their full TDP. Large areas of Intel chips are dedicated to vector operations in SSE and AVX. If you don't issue those instructions then half the chip isn't even being used.
Some gamers who love their overclocked Intel chips have actually complained to game engine developers who add AVX to the game engine. Because it ruins their overclock even if the game runs much faster. Then they're in the situation of being forced to clock down from 4.5 GHz to 3.7 in order to avoid lockups or thermal throttling.
The Xeon E5 v3s had different clock speeds for AVX code: it consumed too much power and got too hot under full load.
This holds true on the E5 v4's but the AVX penalty is done on a core-by-core basis, not across the entire chip. The result is improved performance in mixed workloads. This is a good thing as AVX hasn't broken out much beyond the HPC markets.
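To make the earlier point about unused vector silicon concrete, here is a minimal sketch (illustrative function names; unaligned loads chosen for simplicity) of the same loop written in scalar form and with AVX intrinsics. Whether the 256-bit units ever light up depends on the code and the compiler flags, not on the chip.

/* Minimal sketch: the same element-wise add, scalar vs. AVX. GCC will only
   emit the 256-bit form on its own with suitable -O/-march flags; otherwise
   the vector units (and their share of the TDP) sit idle. */
#include <immintrin.h>

void add_scalar(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++)         /* one float per operation */
        out[i] = a[i] + b[i];
}

void add_avx(const float *a, const float *b, float *out, int n)
{
    int i = 0;
    for (; i + 8 <= n; i += 8) {        /* eight floats per operation */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                  /* scalar tail */
        out[i] = a[i] + b[i];
}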
I made an account to say that this article (along with the subsequent stock-cooler comparison article) is why I really love Anandtech. A lot of the code I run/write for my research is CPU-bottlenecked. Still, until the last year or so, I didn't know very much about hardware. Now, reading Anandtech, I have learned so much more about the hardware I depend on from this website than from any other website. Most just repeat announcements or run meaningless cursory synthetic benchmarks. The fact that Johan De Gelas has written such a deep dive into the inner workings of something as complex as a server CPU architecture, and done it in a way that I can understand, is remarkable. Great job Anandtech, keep it up and I'll always come back.
Excellent work and review as always, Johan. I would have been interested to see how the two processors perform in floating-point-intensive benchmarks though...
I would support "Brutalizer". Every processor has its strength and weakness. If memory architecture is considered, for the same capacity, Intel is conjested memory, IBM is very distributed and Oracle-Sun is something in between. So Intel will always have memory B/W problem every way. IBM has memory efficiency problem. Oracle in theory doesn't have problem, but with 2 dimm per ch, that look like have problem. Oracle-Sun is for highly branched workload in the real world. Intel is for 1T/Core more of single threaded workloads and IBM is for mixed workloads with 2T-4T/Core priority. So supercomputing workloads will work fast on IBM now, compared to intel and sparc, while analytics and graph and other distributed will work faster on SPARC M7 and S7 (although S7 is resource limited). While for intel, a soft mix of applications and highly customized os is better. Leave the business decisions and the sales price. List prices are twice as much as sales price in the real world. These three processors (xeon e5v4, power8-9, sparc m7-s7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. Its like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. Thats in the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of intel xeon e5v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and every thing on any of these, we get our hands on. And this is the real world conclusion. So don't fight. Its a context centric supply.
Johan - interesting article, I enjoyed it - especially after I discovered how to get to the next page.
As far as the comments go: 1) a good article will get a diverse response (from those with an open, that is, querying, mind). 2) I agree with those who, in other words, are saying: "there is no 'one size fits all'." And my gut reaction is that you are providing a level of detail that assists in determining which platform/processor "fits my need".
We’ve updated our terms. By continuing to use the site and/or by logging into your account, you agree to the Site’s updated Terms of Use and Privacy Policy.
124 Comments
Back to Article
close - Thursday, July 21, 2016 - link
This right here is why I keep coming back to Anandtech. Thumbsup!jardows2 - Thursday, July 21, 2016 - link
Agreed. There are plenty of places you can go to find out how pretty your games will look, but this sort of stuff is much more interesting to me!Looking forward to the application numbers. Power8 may shape up to be a nice server alternative. I would like to see about virtualization. With the threaded capabilities, it might just be a good platform for that.
Brutalizer - Friday, July 22, 2016 - link
Regarding virtualization, SPARC M7 is more than 4x faster than POWER8 on SPECvirt_sc2013, and more than 2x faster than x86https://blogs.oracle.com/BestPerf/entry/20151025_s...
Regarding SPECcpu2006, SPARC M7 is 1.9x and 1.8x faster than POWER8, and is faster than x86 as well:
https://blogs.oracle.com/BestPerf/entry/201510_spe...
Regarding memory bandwidth, SPARC M7 is 2.2x and 1.7x faster than POWER8 and 2.4x faster than x86 on STREAM benchmarks:
https://blogs.oracle.com/BestPerf/entry/20151025_s...
If you dig a bit on that web site, you will find 30ish world records, where SPARC M7 is 2-3x faster than POWER8 and x86, all the way up to 11x faster.
It is interesting to delve in to the technology behind POWER8 and x86, but in the end, what really matters, is how fast the cpu performs in real life workloads and benchmarks. SPARC has lower IPC than x86, but as real life server workloads have an IPC of 0.8, SPARC which is a server cpu, is much faster than x86 in practice. In theory, x86 and POWER8 are fast, but in practice, they are much slower than SPARC. So, you can theoretize all you want, but in the end - which cpu is fastest in real workloads and in real benchmarks? SPARC. Just look at all the benchmarks above, where SPARC M7 is faster in number crunching, Big data, neural networks, Hadoop, virtualization, memory bandwidth, etc etc. And if you also factor in the business benchmarks, such as SAP, Peoplesoft, databases etc - there is no contest. You get twice the performance, or more, with a SPARC M7 server than the competitors.
SPARC M7 can also turn on encryption on everything, and loose 2-3% performance. Whereas encryption on POWER8 and x86 typically reduces performance down to 33% or lower. So, if you benchmark encrypted workloads, then SPARC M7 is not typically 2-3x faster, but another 3 times faster again - i.e. typically faster 6-9x.
Kevin G - Friday, July 22, 2016 - link
Oracle marketing at its finest.The virtualization score is good vs. POWER8 mainly based on the radical different in core count: 32 vs. 6. Yeah, even with lower IPC, I'd expect the higher core count system to fair better. Also note that IBM offers such higher core count systems and at higher clock speeds which would close that gap.
Same for the claims of being twice as fast in raw benchmarks: Oracle isn't comparing there best against IBM's best POWER8. There choice of comparison point was simply arbitrary to make SPARC look good, as is the job of their marketing department. Real performance comparisons come from independent reports.
To get the memory bandwidth advantage Oracle proclaims, they have to use twice as many sockets.
SarahKerrigan - Friday, July 22, 2016 - link
These supposed Oracle "wins" are all based on worst-case scenarios for Power8 - ie, testing a DCM based system and counting each DCM as two processors. This isn't very useful for comparison to Power8 overall, as the entry-level machines like the one in this article, and the S822LC positioned above it, all use SCM's (with as many as twelve cores.)M7 is a first-rate CPU, but it's also in a totally different cost class; the cheapest M7 config listed on Oracle's website costs over US$40k, for a one-processor machine. Considering you can get a pair of 10-core Power8's with 256GB of RAM in an S822LC for US$14,300 list, this is an exceptionally tough sell for those not wedded to Solaris (and by the way, there's no RHEL, SLES, or Ubuntu for SPARC - Solaris is pretty much the only game in town.)
My company is currently deploying an S812LC and intends to deploy an S822LC in the future; we briefly considered SPARC but found the style of marketing that Oracle and its proxies seem to favor to be deeply offputting, as is the relatively poor perf/$ compared to both Power and Intel. Our loads (mainly a large PostgreSQL application) scale well with memory bandwidth and cache sizes, and we've found S812LC perf/$ to be first-rate. The main downsides have just been related to the relative immaturity of the ppc64le platform (occasional lack of available packages, etc.)
Brutalizer - Sunday, July 31, 2016 - link
These oracle sparc m7 benchmarks vs IBM power8 are not worst case. The DCM Power8 module, actually consists of two power8 CPUs, in one socket. So there is nothing wrong with these benchmarks. It is up to IBM to release benchmarks with two power8 CPUs in one socket, not oracle choice. IBM has for decades promoted few strong cores instead of many weaker cores. For instance, IBM claimed "dual core power6 @ 5 ghz was superior to 8core sparc niagara2 @ 1.6 ghz because databases runs best on few but strong cores" and IBM talked about future super strong single/dual core 6-7 ghz power CPUs and mocked sparc many but weaker cores because databases are worthless on sparc. Back then sparc were first with 8 cores, and it was very controversial having that many cores. Later IBM realized laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed sparc with many knower clocked cores. Just like Intel abanoned Prescott with high clocks. Today everybody have many lower clocked cores, just like spare decades ago.Of course, if IBM released benchmarks with other configurations of power8, oracle would be happy to use them, but IBM has not. Oracle has no choice than to use those benchmarks that IBM has released. It is not oracles choice what benchmarks IBM release.
We also know that power8 is slower than the latest Intel xeons, and we know that sparc m7 is typically 2-3x faster than Intel Xeon, so probably these benchmarks from IBM vs sparc m7 benchmarks are true. If you find other IBM power8 benchmarks I am sure oracle will compare to them instead. But you can only bench against ibm's own results, right?
Regarding my credibility, yes, I am an sparc supporter. What is the problem with being an supporter? I know there are IBM supporters here, and there are nvidia, Amd, Intel etc supporters. What is wrong with that? Does the fact that I consider sparc to be superior, invalidate the official oracle vs IBM vs Intel benchmarks? I have not created those benchmarks, IBM has. And oracle. And Intel. Instead of you, IBM supporters, linking to official superior IBM power8 benchmarks you claim that because I am an sparc supporter, those official vendor benchmarks can not be trusted. Instead of proving that power8 is faster with benchmarks, you resort to attacking me. That does not win you any discussions. Show us facts and benchmarks if you want invalidate my linked benchmarks, instead of attacking me. Fact is, you have not proven anything regarding power8 inferiority.
And why do I keep talking about sparc m7? Well, it seems people believe that Intel and power8 is so fast, but in fact there are another cpu out there, 2-3x faster, up to 11x faster. People just don't know that sparc is the worlds fastest CPU. I would like anandtech to talk about the best CPU in the world instead of slow IBM power or Intel Xeon CPUs. But anandtech don't.
Regarding myself, yes I have been interviewed in Swedish media, and it is evident that I have always worked finance. I have never worked at Sun nor Oracle. Just read the interview. The last years I am an quantitative analyst concocting trading strategies. I have never worked in IT. i just happen to be a nerd and geek, and i only support the best tech, and it is sparc and Solaris. IBM and Intel sucks. Just compare their lousy performance to sparc m7
SarahKerrigan - Sunday, July 31, 2016 - link
"The DCM Power8 module, actually consists of two power8 CPUs, in one socket."Dude, nobody outside of Oracle marketing cares, just like they didn't care when Xeon and Opteron used MCM's. IBM has SCM's going all the way up to 12 cores and 8 Centaur links, they just use DCM's for cost reasons on some (but not all) smaller machines. These have the same number of Centaur links per socket as the big SCM's, and they're priced as one would expect of one or two socket enterprise systems. Realistically, the 8-Centaur SCM has roughly equivalent memory bandwidth to the 8-Centaur DCM.
"Later IBM realized laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed sparc with many knower clocked cores. Just like Intel abanoned Prescott with high clocks. Today everybody have many lower clocked cores, just like spare decades ago."
You mean like when Oracle replaced 16-core 1.65GHz T3 with 8-core 3GHz T4? Which, by the way, had very similar throughput performance (which you say is all that matters) to the T3, but had far higher single-thread and single-core performance? If only throughput matters, why would Oracle do such a thing? It's quite a thing for you to imply Oracle doesn't know what they're doing!
They also have been publishing benchmarks for their shiny new S7 chip where they lose per-chip to the Xeon - but they win per-core, which you've said on many occasions doesn't matter. Here are some examples:
https://blogs.oracle.com/BestPerf/entry/20160629_n...
https://blogs.oracle.com/BestPerf/entry/20160629_r...
Comparisons to IBM are conspicuously absent, I suspect because Power perf/core is rather impressive.
"For instance, IBM claimed "dual core power6 @ 5 ghz was superior to 8core sparc niagara2 @ 1.6 ghz because databases runs best on few but strong cores" and IBM talked about future super strong single/dual core 6-7 ghz power CPUs and mocked sparc many but weaker cores because databases are worthless on sparc."
IBM has never reduced per-core or single-thread performance generation to generation. P7 and P8 were both massive improvements in both categories. IBM has not historically shown interest in "weaker" cores for Power.
"Well, it seems people believe that Intel and power8 is so fast, but in fact there are another cpu out there"
Yes. For the low, low price of over forty thousand dollars for the lowest-end, one-processor M7 system with public prices on Oracle's website.
"2-3x faster"
Consulting officially published results on an industry-standard benchmark:
Xeon E7-8890v4, 2.2GHz: SPECint rate result of 927/chip, 24 cores (38/core)
Power8 SCM, 4GHz: SPECint rate result of 900/chip, 12 cores (75/core)
SPARC M7, 4.13GHz: SPECint rate result of 1200/chip, 32 cores (37/core)
Not that impressive - especially given M7's price. And certainly not 2-3x of anything (or even 1.9x). It's 1.3x... while having 2.5x as many cores. Additionally, for a large range of applications, single-thread performance matters.
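(If anyone wants to redo the per-core arithmetic from the numbers above, it is nothing more than the published per-chip rate divided by the core count - a trivial illustrative sketch in C, using only the figures quoted here:)

    #include <stdio.h>

    /* Quick sanity check of the per-core numbers quoted above:
       the published SPECint_rate per chip divided by the core count. */
    int main(void) {
        struct { const char *chip; double rate; int cores; } r[] = {
            { "Xeon E7-8890 v4", 927.0, 24 },
            { "POWER8 SCM",      900.0, 12 },
            { "SPARC M7",       1200.0, 32 },
        };
        for (int i = 0; i < 3; i++)
            printf("%-16s %6.0f per chip / %2d cores = %4.1f per core\n",
                   r[i].chip, r[i].rate, r[i].cores, r[i].rate / r[i].cores);
        return 0;
    }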
"up to 11x faster."
When running in-memory queries inside Oracle DB using accelerator instructions added to SPARC M7 specifically for Oracle DB, yes.
By the way, since you mentioned memory bandwidth... how does it feel to have two-processor SPARC S7 losing on STREAM Triad to entry-level, one-processor Power8 machines that cost significantly less? Compare https://blogs.oracle.com/BestPerf/entry/20160629_s... to the entry-level Power8 results in the article we're commenting on!
Oracle proponents need to do better than this. At least Phil Dunn resorts less to copypasta...
SPEC references:
https://spec.org/cpu2006/results/res2016q2/cpu2006...
https://spec.org/cpu2006/results/res2015q4/cpu2006...
https://spec.org/cpu2006/results/res2015q2/cpu2006...
close - Tuesday, August 2, 2016 - link
"What is the problem with being an supporter? What is wrong with that?"Lying, deceiving, etc.
This is what Oracle does because, simply put, ever since they acquired Sun those products went to sh*t. Oracle are reverse-alchemists. Whether it's software (like Java) or hardware (like SPARC), Oracle managed to turn those gold nuggets into lead weights. Java was buried by Google, SPARC was buried by Intel and IBM.
Oracle always resorts to this kind of piss-poor advertising, and it's not for the customers themselves. They try to save face with numbers on that site for one reason only: to have something to show during their conferences. Because companies don't rely on numbers in a benchmark when committing to multi-year contracts and getting tied into a specific ecosystem.
Right now only a handful of government institutions and some in regulated industries still rely on Sparc and only in corner cases. Most times it's just until they manage to migrate off them.
close - Tuesday, August 2, 2016 - link
The EXA products might be the only ones with some solid popularity because it's the full package, but they do come with plenty of caveats. Having worked in the defense and financial sectors for a long time I've seen plenty of consolidation being done on newer Oracle/Sparc systems but not so many new deployments (a handful). And the proof is in the numbers: Oracle can't seem to make any headway into this.
This isn't the kind of runaway success you'd expect for such an "overpowering" system.
P.S. Google for "For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer" and see the army of posters Oracle is employing and the kind of tactics they resort to. And that's just the official posters.
close - Tuesday, August 2, 2016 - link
Their engineered systems for integrated infrastructure and platforms (the latter being their driver) are great, but not because of the hardware or the CPU in particular. It's because of the value of the whole package that includes the software layer. Nobody actually cares about the CPU in those particular products, and if the CPU were being sold on its own it would have a tough time.
And not least, they almost always HAVE to heavily discount the price in order to make the sale. From personal and recent experience, Oracle was eager enough to undercut competitors like Cisco, VCE or HP (HP has had triple-digit YoY growth in this segment for 2-3 years now) and discounted so aggressively that we ended up with 50% savings...
Kevin G - Tuesday, August 2, 2016 - link
"These oracle sparc m7 benchmarks vs IBM power8 are not worst case.">Eh? Did Oracle release the complete system configuration of the POWER8 for their testing? From your stream link you can find this PDF ( https://blogs.oracle.com/BestPerf/resource/stream/... ) where Oracle only test with 24 threads out of 96 possible in the environment and out of 192 possible supported with the hardware. This document does not detail how many cDIMMs were installed in a system which has a direct impact on available bandwidth. Case in point, the 512 GB of memory on the POWER8 system can be configured with the bare minimum number of cDIMMs in a system. That is a worst case scenario for POWER8 and we don't know if Oracle used it.
Oracle also made a source code change to STREAM for reverse allocation. What is missing here is a comparison to the original code: the change could affect how well the prefetchers work, favor a particular architecture, and thus shift performance. So we don't know whether this change is a best- or worst-case scenario for comparison purposes.
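(For context, the kernel being argued about is tiny - STREAM Triad is essentially one loop - so both how the arrays are allocated/initialized and how well the prefetchers track the resulting access pattern can move the score. A minimal sketch of the standard Triad loop follows; this is not Oracle's modified version, whose exact change is not published here:)

    #include <stddef.h>

    /* Minimal sketch of the standard STREAM Triad kernel:
       a[i] = b[i] + scalar * c[i].
       The arrays must be far larger than the last-level cache so the loop
       measures memory bandwidth rather than cache bandwidth; how the arrays
       are laid out and initialized is exactly the kind of detail a source
       change can affect. */
    void triad(double *a, const double *b, const double *c,
               double scalar, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];
    }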
"If you find other IBM power8 benchmarks I am sure oracle will compare to them instead. But you can only bench against ibm's own results, right?"
>I find it perfectly fair to use submitted benchmarks from IBM to compare against similarly configured systems submitted by Oracle. POWER8 systems are available with higher clocks and more cores than what is generally used in the open benchmarks IBM has submitted. Thus it is deceptive to claim that SPARC is decisively faster when there is beefier IBM hardware available.
"I am an sparc supporter. What is the problem with being an supporter?"
>Nothing inherently wrong with that, but you are incredibly closed-minded toward any other alternative. You are blind to the idea that anything could be better or competitive in any metric. The reality of IT is that there is no one tool that best fits every job. Anyone claiming otherwise is trying to sell you something.
"I would like anandtech to talk about the best CPU in the world instead of slow IBM power or Intel Xeon CPUs. But anandtech don't."
>How about you use your contacts at Oracle to get Anandtech a test system for some real independent analysis?
"I have never worked in IT."
>This explains a lot.
wingar - Saturday, July 23, 2016 - link
So, no one in the entire comment section mentioned SPARC at all. You come along, start ragging on POWER8 and on how SPARC is so much better, and then link to benchmarks on Oracle's website, with results provided by Oracle, with the conclusion that Oracle is so much better. Not only that, but the benchmarks you link require Oracle to use much higher-end and far more expensive hardware to beat low- and mid-range POWER8 with.
On top of all that you make dubious and unsubstantiated claims about server workloads and about the performance of POWER8 and x86.
And finally to top it off, your comment is barely even related to the comment you replied to. It seems you picked the comment most visible in the thread to reply to.
So, to everyone else I think it's quite clear this is just an Oracle shill, please just ignore him.
wingar - Saturday, July 23, 2016 - link
So, adding on to this, I was curious. I decided to do a simple Google search: "site:anandtech.com brutalizer". What did I find? Comments on anything x86 and POWER8, every single one talking about how Oracle and SPARC are so much better than whatever the review is talking about. Consistently linking to Oracle-run benchmarks on Oracle's own site with the conclusion that Oracle is better. Consistently making dubious claims about the non-Oracle hardware. Every single comment I found was shilling for SPARC, and every single one was as close to the top of the comments list as possible. You seem to want to be as visible as possible.
Have some links.
http://www.anandtech.com/comments/10158/the-intel-...
http://www.anandtech.com/comments/9193/the-xeon-e7...
http://www.anandtech.com/comments/10230/ibm-nvidia...
http://www.anandtech.com/comments/9567/the-power-8...
http://www.anandtech.com/comments/7757/quad-ivy-br...
http://www.anandtech.com/comments/7852/intel-xeon-...
http://www.anandtech.com/comments/7285/intel-xeon-...
In fact, I found a couple of comments you left that *weren't* shilling. Have some links.
http://www.anandtech.com/comments/7334/a-look-at-a...
http://www.anandtech.com/comments/7371/understandi...
http://www.anandtech.com/comments/5831/amd-trinity...
It's hard to draw a firm conclusion from those links, but I'll point out a few things. All of the non-shilling comments you made were in 2013. Every single pro-Oracle comment you made was from 2014 at the earliest. Sounds to me like you were either bought out at that time, or you bought someone else's account, or perhaps this was the time you were put on Oracle's pay-cheque. It's quite possible that there are more comments that aren't shilling that I've missed here.
So, please. Try again.
Zetbo - Saturday, July 23, 2016 - link
He is a known Oracle troll/shill, Kebbabert, who is probably paid by Oracle to post crap all over the internet. If he is not paid, then that's just sad...
wingar - Saturday, July 23, 2016 - link
Ohhh yes I am well aware, I encounter him on El Reg and other places all the time. But hey, I hate shills, so I'm quite happy to destroy any sense of credibility he may have for those not in the know.
tipoo - Tuesday, July 26, 2016 - link
Ooh, good callout. It would almost be weirder if Oracle *didn't* pay him after all those links, lol.
Kevin G - Wednesday, July 27, 2016 - link
Here is an interview with him (in Swedish) about how he was invited by Sun, back in the day, to a party for his efforts: http://it24.idg.se/2.2275/1.202161/staende-ovation...
wingar - Thursday, July 28, 2016 - link
I'd call it sad, really. Very sad.
alpha754293 - Wednesday, July 27, 2016 - link
Dude, SPARC sucks.
Look at SWaP and TCO. Re-run your "analysis", it's obvious that SPARC sucks.
Can you even RUN Ubuntu on SPARC anymore?
Their FP performance sucks and it always has. That's why the Niagara T2 had to have FPUs ADDED to all of the cores, because sharing a single FPU among 8 cores was a really bad/dumb idea.
I've looked at SPARC before. Had a couple of them and had a SunFire server before as well, and POWER/Intels can easily beat SPARC, especially once you consider TCO.
The company that I work for now (a Fortune 10 company) dumped all of the SPARC workstations for Intel.
RISC is RISKY! - Tuesday, August 2, 2016 - link
I would support "Brutalizer". Every processor has its strength and weakness. If memory architecture is considered, for the same capacity, Intel is conjested memory, IBM is very distributed and Oracle-Sun is something in between. So Intel will always have memory B/W problem every way. IBM has memory efficiency problem. Oracle in theory doesn't have problem, but with 2 dimm per ch, that look like have problem. Oracle-Sun is for highly branched workload in the real world. Intel is for 1T/Core more of single threaded workloads and IBM is for mixed workloads with 2T-4T/Core priority. So supercomputing workloads will work fast on IBM now, compared to intel and sparc, while analytics and graph and other distributed will work faster on SPARC M7 and S7 (although S7 is resource limited). While for intel, a soft mix of applications and highly customized os is better. Leave the business decisions and the sales price. List prices are twice as much as sales price in the real world. These three processors (xeon e5v4, power8-9, sparc m7-s7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. Its like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. Thats in the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of intel xeon e5v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and every thing on any of these, we get our hands on. And this is the real world conclusion. So don't fight. Its a context centric supply of processors!DomOfSF - Thursday, July 21, 2016 - link
Johan de Gelas: blowing minds and educating "the rest of us" since...I dunno, a really long time ago (especially in internet years). Great job on the data, but the real good stuff is in your thoughts and analysis. Thank you!
close - Saturday, July 23, 2016 - link
Over a decade...
JohanAnandtech - Thursday, July 28, 2016 - link
13 years in the server business, 18 years now of reviewing hardware :-). Thx !!
jamyryals - Thursday, July 21, 2016 - link
It seems to me, Intel's focus on bringing their CPU architecture design all the way down to 5W is the reason IBM is able to stand out against them. Intel is focused on creating a scalable architecture while IBM can throw the whole kitchen sink at the server market.
Fascinating article, I really enjoyed it.
smilingcrow - Thursday, July 21, 2016 - link
Intel has plenty of unique features in their server platforms which aren't in the consumer platforms so I don't think that is the issue.
jospoortvliet - Tuesday, July 26, 2016 - link
The basic design of the core is still the same, so there is probably at least some truth in Jamy's statement.
Kevin G - Wednesday, July 27, 2016 - link
That holds up until this point: consumer Skylake and server Skylake are going to be two different designs. They're certainly related, but server Skylake will have 512 KB of L2 cache per core and support AVX-512 instructions.
Server Skylake is also going to support 3D XPoint DIMMs, though that difference is more with the platform/chipset than the actual CPU core.
floobit - Thursday, July 21, 2016 - link
Very interesting. It seems odd to me that they chose to configure it in a 2U - except for big data clusters, most of the market space I see this playing in is dominated by FC to a SAN. Is this a play in the big data cluster space, or the more traditional AIX/DB2/big iron that IBM has owned for so long?
Some questions I'd have:
What virtualization is possible with this architecture? Presumably just the standard PowerVM? How well does that work?
What is the impact of IO latency? Could you throw a P3700 or two in here?
JohanAnandtech - Thursday, July 21, 2016 - link
2U: Besides big data storage needs, I suspect 2U is necessary for adequate cooling for the POWER8 chip.
Virtualization: Linux KVM works well as far as I know.
We actually tried out a P3700 in there (see: http://www.anandtech.com/show/9567/the-power-8-rev... ) and it worked very well. I asked IBM what a customer should expect when using third-party storage (probably no support, but how about warranty?) but no answer yet.
mystic-pokemon - Friday, July 22, 2016 - link
Hi Johan,
2U is not necessary for cooling a POWER8 chip. We do that better with our Barreleye (1.25 OU design). Even storage-wise, Barreleye has a 15-disk storage bay, as can be seen in the links below.
http://www.v3.co.uk/v3-uk/news/2453992/google-and-...
Let me know if you ever want to benchmark a Barreleye. What specific POWER8 processor are you benchmarking with? (Turismo?) I believe it does slightly better than the S812LC on many benchmarks, depending on the variant of POWER8 processor the S812LC runs.
JohanAnandtech - Thursday, July 28, 2016 - link
Send me a mail at [email protected]
abufrejoval - Thursday, August 4, 2016 - link
Hmm, it gets a bit fuzzy after the first paragraph or so, and evidently because I dislike malvertisement: such links should be banned!
mystic-pokemon - Friday, July 22, 2016 - link
Hi floobit,
For virtualization: PowerVM and out-of-the-box KVM (tested on Fedora 23, Ubuntu 15.04 / 15.10 / 16.04) work quite well. Xen doesn't work well, or hasn't been officially tested / released.
tipoo - Thursday, July 21, 2016 - link
Fun! I was always curious about this processor.
tipoo - Thursday, July 21, 2016 - link
Interesting that the L3 eDRAM not only allows them to pack in much more L3 (what was it, 3 SRAM transistors per eDRAM cell or something?), but it's also low latency, which was a concern some people cited with eDRAM. Appears to be an unfounded fear.
And then on top of that they put another large L4 eDRAM cache on.
Maybe Intel needs to play with eDRAM more...
tipoo - Thursday, July 21, 2016 - link
Lol, eDRAM, not eDARM
Kevin G - Thursday, July 21, 2016 - link
There was a change in how the L4 cache works from Broadwell to Skylake on the mobile parts. The implication is that Intel was exploring the idea of a large L4 eDRAM for Skylake-EP/EX parts. We'll see how that turns out, as Intel has also explored using HMC as a cache for high-bandwidth applications in Knights Landing. So either way, Intel has this idea on their radar and we'll see how it pans out next year.
tsk2k - Thursday, July 21, 2016 - link
Is it possible to run Windows on one of these?
ZeDestructor - Thursday, July 21, 2016 - link
At the moment, a very solid no.
That said, if enough partners ask for it and/or if the numbers make sense for Azure, MS will at the very least have a damn good look at porting Windows over.
DanNeely - Thursday, July 21, 2016 - link
It's probably just a case of doing QA and releasing it. They've sold a PPC build in the past, and maintain internal builds for a number of other CPU architectures to avoid accidentally baking x86isms into the core code.
tipoo - Thursday, July 21, 2016 - link
They made PowerPC Windows? Source? I remember the Powermac G5s were the early dev kits for the xbox 360 due to the architecture similarity, but I assumed those stories meant they were just working in OSX or Linux on them.
thunderbird32 - Thursday, July 21, 2016 - link
AFAIK, the last build of Windows for PPC was NT 4. So, it's been a while.
Sunner - Thursday, July 21, 2016 - link
There were early builds of Windows 2000 for the RISCs as well, during the time when it was still called NT5. I had one of those from WinHEC, but alas I lost it when moving at some point. :(
yuhong - Thursday, July 21, 2016 - link
AFAIK, the little-endian PowerPC mode that NT4 used was killed when they went to 64-bit, and is different from today's POWER8 little-endian mode that was only recently introduced.
Kevin G - Thursday, July 21, 2016 - link
I used to have such a disc for Windows NT4. That disk also had binaries for DEC Alpha and MIPS.
BillyONeal - Thursday, July 21, 2016 - link
The Xbox 360 is a PPC machine, and runs a (heavily modified) version of Windows. My understanding is that most x86 assumptions had to be ferreted out to run on Itanium (early) and then on ARM (later).
Einy0 - Thursday, July 21, 2016 - link
MS has builds that will run on anything. The real question is why would you want to? These chips are designed from the ground up to run massive workloads. It's a completely different style of computing than a Windows machine. Even MS server OSes aren't designed for this type of work. We are talking banking, ERP, and other big data applications. MS is still dreaming about scaling on that level. Right now their answer is clustering, but that comes with its own obstacles too.
abufrejoval - Thursday, August 4, 2016 - link
Well, there is always QEMU.
And IBM has a much better binary translator from when they bought QuickTransit. That one originally translated Power to x86 for the Mac, then SPARC to x86 for QuickTransit, and eventually x86 to Power for IBM so they could run Linux workloads on AIX.
Then what exactly do you mean by Windows (assuming this is actually a reasonable question)?
Server applications or desktop?
.NET has been ported to Linux and I guess could be made to run on Power. A Power runtime could certainly be done by Microsoft, if they wanted to.
I don't see why anyone would want to run Windows desktop workloads on this hardware, other than to show that it can be done: QEMU to that!
BedfordTim - Thursday, July 21, 2016 - link
I was intrigued to see how little effect hyper-threading had with your Xeon. My own experience is that it gives a 50% boost, although I appreciate there are many variables.
Taracta - Thursday, July 21, 2016 - link
Something seems to be wrong with the Mem Hierarchy charts in the Intel L3 and 16MB section.
JohanAnandtech - Thursday, July 21, 2016 - link
I don't think so, we just expressed it in ns so you can compare with IBM's numbers more easily. Can you elaborate on why you think they are wrong?
Taracta - Thursday, July 21, 2016 - link
Sorry, I mixed up cycles with ns, especially after reading the part about the transition from L3 to memory on the Intel.
Sahrin - Thursday, July 21, 2016 - link
Yikes. Pictures without captions. Anandtech is terrible about this. ALWAYS caption your pictures, guys.
djayjp - Thursday, July 21, 2016 - link
Are bar graphs not a thing anymore...?
Drumsticks - Thursday, July 21, 2016 - link
Afaik, Anandtech has always used the chart when presenting things like SPEC. I'd guess it'd be for clutter reasons, but the exact reason is up to the editors to mention.
JohanAnandtech - Thursday, July 21, 2016 - link
The reason for me is simply to give you the exact numbers and allow people to do their own comparisons.
Drumsticks - Thursday, July 21, 2016 - link
Just to be clear, the Xeon CPU used today is 3 times more expensive than the Power8 CPU benchmarked? That's really impressive, isn't it? The Power8 has a pretty significant power increase, but if it's 43% faster, that cuts into the perf/W gap.
I know we've only looked at SPEC so far in round 2, but this looks like a good showing for IBM. How big is the efficiency gap between 22nm SOI and 14nm FinFET? Any estimates?
Michael Bay - Thursday, July 21, 2016 - link
Selling at a loss is hardly impressive, especially in IBM's case. This thing is literally their last chance.
tipoo - Friday, July 22, 2016 - link
Is it at a loss, or is it just not at crazy Intel margins?
Michael Bay - Saturday, July 23, 2016 - link
They'd have to have a healthy margin to offset all the R&D, plus IBM as a whole is not in a good financial position. Consider they sold their fab capability not so long ago.
nobodyblog - Thursday, July 21, 2016 - link
Please correct this error: you say you are comparing with the BEST Intel can provide, but you used a Xeon for workloads that need Xeon Phi Knights Landing, which is a standalone CPU too. If you choose correctly, the benchmark will be very different.
IBM POWER8 is 90 GB/s, while Intel's Xeon Phi Knights Landing (e.g. the 7290F) has a bandwidth of 400 GB/s.
IBM POWER8 does above 600 GFLOPS single precision and above 300 GFLOPS double precision; this is roughly 10x higher on the Xeon Phi 7290F.
SPECint: Xeon Phi is 1500 vs. 1700 for POWER8.
Power and price aside....
Thanks!
LukaP - Thursday, July 21, 2016 - link
If we start comparing different product categories, why not bring the GP100 into this as well? It will deliver 10 TFLOPS of single precision and can be had for much less than any of these. But then again, there is the same caveat as with the Xeon Phi: you can't actually run an OS on it, you need a host CPU and then you dispatch kernels onto the accelerator. Even if it's a socketed version.
smilingcrow - Thursday, July 21, 2016 - link
You can boot the newer Xeon Phi on its own; either the current generation or the next one, due maybe this year!
LukaP - Thursday, July 21, 2016 - link
Oh really? :o That is neat, though I'm not sure how useful it is, since even highly parallel tasks usually have some IPC-dependent components...
Anyway, have you got a source for that? Would love to read more.
Drumsticks - Thursday, July 21, 2016 - link
I'm a verification intern on the Phi team right now, and you can indeed boot Knight's Landing! Anandtech mentions it here: http://www.anandtech.com/show/9802/supercomputing-...
nobodyblog - Friday, July 22, 2016 - link
Then you can add another Xeon Phi to the statistics above... Xeon Phi KNL is a CPU like other CPUs; it does everything, as mentioned, and even its SPECint is comparable - not so bad...
Thanks!
tipoo - Friday, July 22, 2016 - link
Xeon Phi is x86, but it's GPU-like in nature, massively parallel for performance with low per-core performance. The IBM Power8 and other Xeons compete in highly parallel spaces like banking, but where single thread performance also still matters. Can't compare them.
nobodyblog - Friday, July 22, 2016 - link
Xeon Phi Knights Landing has 3 times the single-thread performance of Silvermont (and Knights Corner)... I don't think that is so bad.
The comparison really does hold; see the benchmarks - they cover SPECint, for example, and parallel performance in general. Additionally, you can pair a high-performance Xeon with a Xeon Phi; there is nothing that prevents you. The benchmark here is not about database performance or parsing or anything similar, it is about this article. I don't say Xeon Phi is currently better positioned than Xeon in those uses... But IBM's POWER is not either: it has lots of cores and lots of threads, which is only usable in massively parallel uses...
Thanks!
nobodyblog - Friday, July 22, 2016 - link
On the IBM server, numactl was used to physically bind the 2, 4, or 8 copies of SPEC CPU to the first 2, 4, or 8 threads of the first core. On the Intel server, the 2-copy benchmark was bound to the first core. So it is not single thread; it is a trick IBM uses to cheat in benchmarks - it is 425% slower than Xeon in single thread.
Thanks!
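(For context, the numactl binding described above is just ordinary CPU affinity, and the same pinning can be reproduced from inside a process. A rough sketch, assuming the kernel numbers the SMT siblings of the first core as logical CPUs 0-7, which is how ppc64le Linux typically enumerates them - check with lscpu or ppc64_cpu on a real machine:)

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* Rough sketch: pin the calling process to the first `nthreads` logical
       CPUs, i.e. the SMT siblings of core 0 if the kernel numbers them
       consecutively (assumed layout). This mirrors what a command-line
       binding such as `numactl --physcpubind=0-7 ./benchmark` does. */
    static int pin_to_first_threads(int nthreads)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < nthreads; cpu++)
            CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* Bind to the eight hardware threads of one SMT8 POWER8 core. */
        return pin_to_first_threads(8) == 0 ? 0 : 1;
    }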
jospoortvliet - Tuesday, July 26, 2016 - link
The benchmarks here pit one core against one core. The IBM cores can run 1, 2, 4 or 8 threads on a single core; the Intel does 1 or 2. The 425% - not sure where that number comes from, but it isn't what these benchmarks show.
The benchmarks show, as described by Johan:
In single thread, the IBM does about 13% less work than the Intel core. In 2-thread mode, the IBM does about 20% more than the Intel across the two threads. The Intel doesn't do more than 2 threads; the IBM can, and then does, on average, 43% more work across its eight threads than the Intel does with its two.
So Intel is single-thread master here, IBM is throughput king. Now if you have a HEAVILY threaded workload, with hundreds of threads and little latency requirements for each, Knights Landing or a GPU is a better choice, with their hundreds of cores. If latency is important and you can afford to use two to four threads per core the IBM performs best. If latency is everything, you keep it at 1 thread per core and the Intel Xeon is the best performer.
That is entirely ignoring cost, of course, both Intel and IBM have high and low cost solutions with their downsides and benefits. This set of benchmarks simply pitted one core against another, entirely ignoring the differences in core count (IBM 10, Intel 22) and price (Intel orders of magnitude more expensive). You'll always have to look at a bigger picture: how many cores do you get for your dollar and what are your requirements.
Performance/watt, the Intel probably wins in all areas, at least if the system is idle frequently. Without idle time the IBM might not be that bad, perf/power-wise.
The big take-away from this article is, though, that IBM has built a system which can be quite price-competitive with Intel in the lower-high end market. To really be able to make a choice, we'd probably need a benchmark of two price-equivalent systems. I bet the workload would make a huge difference in who wins the price/performance fight.
abufrejoval - Thursday, August 4, 2016 - link
I believe "heavily threaded" is somewhat imprecise here: Knights Landing (KNL) is really more about vectorized workloads, or one very loopy and computationally expensive problem, which has been partitioned into lots of chunks, but has high locality. Same code, related data, far more computational throughput than data flowthrough.Power8 will do better on such workloads than perhaps Intel, but never as good as a GPU or KNL.
However it does evidently better per core on highly threaded workloads, where lots of execution threads share the same code but distinct or less related datasets, less scientific and more commercial workloads, more data flowing through.
Funnily KNL might even do well there, beating its Xeon-D sibling in every benchmark, even in terms of energy efficience.
But I'm afraid that's because most of the KNL surface area would remain dark on such workload while the invests would burn through any budget.
KNL is an odd beast designed for a rather specific job and only earn its money there, even if you can run Minecraft or Office on it.
Kevin G - Friday, July 22, 2016 - link
I do think the comparison with Xeon Phi is fair since it can run/boot itself now with Knights Landing. Software parity with the normal x86 ecosystem is now there, so it can run off-the-shelf binaries.
I am very curious how well such a dense collection of cores performs for workloads that don't need high single-threaded performance.
Another interesting factor would be memory bandwidth performance, as Xeon Phi has plenty. The HMC only further enhances that metric, and it is worth exploring it as both a cache and a main memory region for benchmarks.
Ratman6161 - Thursday, July 21, 2016 - link
Will you be addressing virtualization in a future article? I ask this because you are saying the lower-cost Power8 systems are intended to compete with the Dell, HP, Lenovo, etc. x86 servers. But these days, a very high percentage of x86 workloads are virtualized, either on VMware or competing products. In 2009 Gartner had it at about 50%, and by 2014 it was at 70%. I didn't find a number for '15 or '16, but I expect the percentage would have continued to rise. So if they want to take the place of x86 boxes, they have to be able to do the tasks those boxes do... which tends largely to be running the virtual machines that do the actual workloads.
And what about all the x86 boxes running Windows Server, or more commonly Windows Server virtual machines? Windows Server shops aren't likely to ditch Windows in favor of Linux solely for the privilege of running on Power8.
One last thing to consider regarding price. These days we can buy a quite robust Intel-based server for around $10K. So, suppose I can buy a Power8 system for about the same price? Essentially, the hardware has gotten so cheap compared to the licensing and support costs for the software we are running that it's a drop in the bucket. If we needed 10 Intel servers or 6 Power8s to do the same job (assuming the Power8s could run all our VMs), the Power8s could come out lower priced hardware-wise, but the difference is, as I said, a drop in the bucket in the overall scheme of things. Performance-wise, with the x86 boxes, you just throw more cores at it.
aryonoco - Friday, July 22, 2016 - link
KVM works well on POWER.
No idea about proprietary things like VMware. But that would be up to them to port.
Ratman6161 - Friday, July 22, 2016 - link
Near as I can tell, there is a PowerKVM that runs on Power 8, but that doesn't allow you to run Windows Server VMs - it seems to support only Linux guests.
Zetbo - Saturday, July 23, 2016 - link
Windows does not support POWER, so there is no point in using POWER if you need Windows!
utroz - Thursday, July 21, 2016 - link
AMD should have used IBM's 22nm SOI to make CPUs, so that they would not have been totally dead in the performance and server CPU markets for years. GF now owns this process, as they "bought" IBM's fabs and tech. I think that 22nm SOI might be better for high-speed CPUs than the 14nm LPP FinFET that AMD is using for Zen, at the cost of die size.
amagriva - Thursday, July 21, 2016 - link
How much did you pay for your crystal ball?
spikebike - Thursday, July 21, 2016 - link
So a single-socket Power8 is somewhat faster than the Intel chip, but it is being compared in a single-socket configuration where the Intel part is designed for two sockets. Unless the Power8 is cheaper than an Intel dual-socket setup, it seems most fair to compare both CPUs as they are designed to be used.
SarahKerrigan - Friday, July 22, 2016 - link
Power is designed for systems up to 16 sockets (IBM E880). One socket is just the entry point.
zodiacfml - Thursday, July 21, 2016 - link
Like a good TV series, I can't wait for the next episode.
aryonoco - Friday, July 22, 2016 - link
OK, this is literally why Anandtech is the best in the tech journalism industry.
There is nowhere else on the net that you can find a head-to-head comparison between POWER and Xeon, and unless you work in the tech department of a Fortune 500 company, this information has just not been available, until now.
Johan, thank you for your work on this article. I did give you beef in your previous article about using LE Ubuntu, but I concede your point. Very happy you are writing more for Anandtech these days.
Xeons really need some competition. Whether that competition comes from POWER or ARM or Zen, I am happy to see some competition. IBM has big plans for POWER9. Hopefully this is just the start of things to come.
JohanAnandtech - Friday, July 22, 2016 - link
Thanks! It is very exciting to perform benchmarks that nobody has published yet :-).
In hindsight, I have to admit that the first article contained too few benchmarks that really mattered for POWER8. Most of our usual testing and scripting did not work, and so after a lot of tinkering, swearing and sweat I got some benchmarks working on this "exotic to me" platform. The contrast between what one would expect to see on POWER8 and me being proud of being able to somewhat "tame the beast" could not have been greater :-). In other words, there was a learning curve.
tipoo - Friday, July 22, 2016 - link
I found it very interesting as well and would certainly not mind seeing more from this space, like maybe Xeon Phi and SPARC M7.
jospoortvliet - Tuesday, July 26, 2016 - link
Amen. But, not to ask too much, just the prospect of part 2 of the Power benchmark is already super exciting. Yes, the Internetz need more of this!
Daniel Egger - Friday, July 22, 2016 - link
Not quite sure what the endianness of a system adds to the competitive factor. Maybe someone could elaborate on why it is so important to run a system in LE?
ZeDestructor - Friday, July 22, 2016 - link
Not much, really, with the compilers being good and all that.
Really, it's quite clearly there just for some excellent alliteration.
JohanAnandtech - Friday, July 22, 2016 - link
Basically, LE reduces the barrier to an IBM server being integrated into an x86-dominated datacenter.
See https://www.ibm.com/developerworks/community/blogs...
Just a few reasons:
"Numerous clients, software partners, and IBM’s own software developers have told us that porting their software to Power becomes simpler if the Linux environment on Power supports little endian mode, more closely matching the environment provided by Linux on x86. This new level of support will *** lower the barrier to entry for porting Linux on x86 software to Linux on Power **."
"A system accelerator programmer (GPU or FPGA) who needs to share memory with applications running in the system processor must share data in an pre-determined endianness for correct application functionality."
Daniel Egger - Friday, July 22, 2016 - link
While correct in theory, this hasn't been a problem for the last 20 years. People are used to using BE on PPC/POWER; the software, the drivers and the infrastructure are very mature (as a matter of fact it was my job 15 years ago to make sure they are). PPC/POWER actually has configurable endianness, so if someone had wanted to go LE earlier it would have easily been possible, but only a few ever attempted that stunt; so why have the big disruption now?
KAlmquist - Friday, July 22, 2016 - link
I assume that this is about selling POWER boxes to companies that currently run all x86 servers, and have a bunch of custom software that they might be willing to recompile. If the customer has to spend a bunch of time fixing endian dependencies in his software in order to get it to work on POWER, it will probably be less expensive for them to simply stick with x86.
HellStew - Wednesday, July 27, 2016 - link
It depends on what kind of software you are running. If you are running giant backend workloads on x86, you can seamlessly migrate that data to PPC while keeping custom front ends running on x86.
aryonoco - Saturday, July 23, 2016 - link
Johan, maybe the little-endianness makes a difference in porting proprietary software, but pretty much all open source software on Linux has supported BE POWER for a long time.
If you get the time and the inclination Johan, it would be great if you could say do some benchmarks on BE RHEL 7 vs LE RHEL 7 on the same POWER 8 system. I think it would make for fascinating reading in itself, and would show if there are any differences when POWER operates in BE mode vs LE mode.
aryonoco - Saturday, July 23, 2016 - link
Actually, scrap that; it seems IBM is fully focusing on LE for Linux on POWER in the future. I'm not sure there will be many BE Linux distributions officially supporting POWER9 anyway. So your choice of focusing on LE Linux on POWER is fully justified.
HellStew - Wednesday, July 27, 2016 - link
Side note: once you are running KVM, you can run any mix of BE and LE Linux varieties side by side. I'm running Fedora BE, SuSE BE, Ubuntu LE, CentOS LE, and (yes, a very slow copy of Windows) on one of these chips.
rootbeerrail - Saturday, July 23, 2016 - link
If a machine is completely isolated, it doesn't matter much to the machine. I personally find BE easier to read in hex dumps because it follows the left-to-right nature of English numbers, but there are reasons to use LE for human understanding as well.
The problem shows up the instant one tries to interchange binary data. If the endian order does not match, the data is going to get scrambled. Careful programming can work around this issue, but not everyone is a careful programmer - there's a lot of 'get something out the door' from inexperienced or lazy people. If everything is using the same conventions (not only endianness, but also the size of the binary data types - less of a problem now that most everything has converged to 64-bit), it's not an issue. Thus having LE on Power makes the interchange of binary data easier with the x86 world.
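(A tiny illustration of that interchange problem: the same 32-bit value produces different byte sequences on LE and BE machines, which is why careful interchange code converts to an agreed byte order, for example with htonl/ntohl. A minimal sketch:)

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* htonl */

    int main(void)
    {
        uint32_t value = 0x11223344;
        uint8_t bytes[4];

        /* Native layout: a little-endian machine stores 44 33 22 11,
           a big-endian machine stores 11 22 33 44. */
        memcpy(bytes, &value, sizeof value);
        printf("native order: %02x %02x %02x %02x\n",
               bytes[0], bytes[1], bytes[2], bytes[3]);

        /* Converting to an agreed-on (network/big-endian) order makes the
           byte stream identical no matter which machine wrote it. */
        uint32_t wire = htonl(value);
        memcpy(bytes, &wire, sizeof wire);
        printf("wire order:   %02x %02x %02x %02x\n",
               bytes[0], bytes[1], bytes[2], bytes[3]);
        return 0;
    }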
errorr - Friday, July 22, 2016 - link
Great article! Just an FYI, the term "just" as in "just out" on the first page has different meanings on opposite sides of the Atlantic and is usually avoided in writing for international audiences. I'm not quite sure which one is used here. The NAmE (North American English) reading would mean 'just out' in the sense that it had come out right before, while the BrE reading would mean it came out right after the time period referenced in the sentence.
xCalvinx - Friday, July 22, 2016 - link
Awesome!!.. Keep up the good work.. looking forward to Part 2!! ... actually can't wait.. hurry up lol.. :)
Double thumbs up
Mpat - Friday, July 22, 2016 - link
Skylake does not have 5 decoders, it is still 4. I know that that segment of the optimization manual is written in a cryptic way, but this is what actually happened: up until Broadwell there are 4 decoders and a max bandwidth from the decoder segment of 4 uops. If the first decoder (the complex one) produces 4 uops from one x86 op, the other decoders can't work. If the first produces 3, then the second can produce 1, etc. This means that the decoders can produce one of these combinations of uops from an x86 op, depending on how complex a task the first decoder has: 1/1/1/1, 2/1/1, 3/1, or 4. Skylake changes this so the max bandwidth from that segment is now 5, and the legal combinations become 1/1/1/1, 2/1/1/1, 3/1/1, and 4/1. You still can't do 1/1/1/1/1, so there are still only 4 decoders. Make sense?
ReaperUnreal - Friday, July 22, 2016 - link
Why do the tests with GCC? Why not give each platform their full advantage and go with ICC on Intel and xLC on Power? The compiler can make a HUGE difference with benchmarks.
Michael Bay - Saturday, July 23, 2016 - link
It's right in the text why.
jospoortvliet - Tuesday, July 26, 2016 - link
The point Johan makes is that his goal is not to get the best benchmark scores but the most relevant real-life data. One can argue whether he succeeded; certainly the results are interesting, but there is much more to the CPUs, as usual. And I do think his choice is justified: while much scientific code would be recompiled with a faster compiler (though the cost of ICC might be a problem in an educational setting), many businesses wouldn't go through that effort.
I personally would love to see a newer Ubuntu & GCC being used, just to see what the difference is, if any. The POWER ecosystem seems to evolve fast, so a newer platform and compiler could make a tangible difference.
But of course, if you would use ICC or xLC in your use case, these benches are not perfect.
Eris_Floralia - Friday, July 22, 2016 - link
Are these two processors both tested at the same frequency, or at their stock clocks?
tipoo - Friday, July 22, 2016 - link
The latter, page 5.
2.92-3.5 GHz vs 2.2-3.6 GHz
abufrejoval - Thursday, August 4, 2016 - link
Well, since Johan really only tested one core on each CPU, it would have been nice to have him verify the actual clock speed of those cores. You'd assume that they'd be able to maintain top speed for any single-core workload independent of the number of threads, but checking is better than guessing.
roadapathy - Friday, July 22, 2016 - link
22nm? *yawwwwwwwwwn* Come on IBM, you can do better than that, brah.
Michael Bay - Saturday, July 23, 2016 - link
Nope, 22 is the best SOI has right now. You have to remember it's nowhere near standard lithographies customer-wise.
tipoo - Monday, July 25, 2016 - link
In addition to what Michael Bay (lel) said, remember that only Intel really has 14nm; when TSMC and GloFo say 14/16nm, they really mean 20nm with FinFETs.
feasibletrash0 - Saturday, July 23, 2016 - link
Using a less capable compiler (GCC) to test a chip, and not using everything the chip has to offer, seems incredibly flawed to me. What are you testing, exactly?
aryonoco - Saturday, July 23, 2016 - link
He's testing what actual software people actually run on these things.
On your typical Linux host, pretty much everything is compiled with GCC. You want to get into exotic compilers? Sure, both IBM and Intel have their exotic, proprietary, costly compilers - so what. Very few people outside of niche industries use them.
You want to compare a CPU with CPU? You keep the compiler the same. That's just common sense. It's also how the scientific method works!
feasibletrash0 - Sunday, July 24, 2016 - link
Right, but you're comparing, say, 10% of the silicon on that chip, and saying that the remaining 90% of the transistors making up the chip don't matter. They do; if the software is not using them, that's fine, but it's not an accurate comparison of the hardware, it's a comparison of the software.
Michael Bay - Sunday, July 24, 2016 - link
Hardware does not exist for its own sake, it exists to run software. AT is entirely correct in their methodology.
jospoortvliet - Tuesday, July 26, 2016 - link
I'd argue it is the other way around: GCC might leave 5-10% performance on the table in some niche cases but does just fine most of the time. There's a reason Intel and IBM contribute to GCC - to make sure it doesn't get too far behind, as they know very well most of their customers use this compiler and not their proprietary ones.
Of course, for scientific computing and other niches it makes all the difference, and one can argue these heavy systems ARE for niche markets, but I still think it was a sane choice to go with GCC.
abufrejoval - Thursday, August 4, 2016 - link
Actually exercising 90% of all transistors on a CPU die these days is both very hard to do (next to impossible) and will only slow the clock to avoid overstepping the TDP.
And I seriously doubt that GCC will leave a CPU running at only 10% of its computational capacity.
Actually from what I saw the GCC by itself (compiling) was best at exploiting the full 8T potential of the Power8. And since the GCC is compiled by itself, that speaks for the quality of machine code that it can produce, if the source allows it. And that speaks for the quality of the GCC source code, ergo prove you can do better before you rant.
abufrejoval - Thursday, August 4, 2016 - link
Well, this is part 1 and describes one scenario. What you want is another scenario, and of course it's a valid, if very distinct, one.
Actually, distinct is the word here: you'd be using a vendor's compiler if your main job is a distinct workload, because you'd want to squeeze every bit of performance out of that.
The problem with that is, of course, that any distinct workload makes it rather boring for the general public, because they cannot translate the benchmark to their environment.
AT aims to satisfy the broadest meaningful audience, and Johan has done a great, great job at that.
I'm sure he'll also write a part 4711 for you specifically, if you make it economically attractive.
Hell, even I'd do that given the proper incentive!
Zan Lynx - Sunday, July 24, 2016 - link
Using GCC as the compiler is also why (in my opinion) the Intel chips aren't using their full TDP. Large areas of Intel chips are dedicated to vector operations in SSE and AVX. If you don't issue those instructions then half the chip isn't even being used.
Some gamers who love their overclocked Intel chips have actually complained to game engine developers who add AVX to the game engine. Because it ruins their overclock even if the game runs much faster. Then they're in the situation of being forced to clock down from 4.5 GHz to 3.7 in order to avoid lockups or thermal throttling.
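(To make that concrete: the wide vector units only light up if the binary actually contains AVX instructions, either because the compiler auto-vectorized a hot loop or because someone wrote them by hand. A minimal hand-written sketch, assuming an AVX-capable build such as gcc -O3 -mavx:)

    #include <immintrin.h>
    #include <stddef.h>

    /* Sketch of an explicitly AVX-vectorized add: eight floats per
       instruction. Unless code like this is emitted - by hand or by
       auto-vectorization - the 256-bit units sit idle. Assumes n is a
       multiple of 8 for brevity. */
    void add_avx(float *dst, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
        }
    }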
Kevin G - Sunday, July 24, 2016 - link
The Xeon E5 v3's had different clock speeds for AVX code: it consumed too much power and got too hot while under full load.
This holds true on the E5 v4's, but the AVX penalty is applied on a core-by-core basis, not across the entire chip. The result is improved performance in mixed workloads. This is a good thing, as AVX hasn't broken out much beyond the HPC markets.
talonted - Monday, July 25, 2016 - link
For those interested in getting a Power8 workstation. Check out Talos.https://www.raptorengineering.com/TALOS/prerelease...
137ben - Monday, July 25, 2016 - link
I made an account to say that this article (along with the subsequent stock-cooler comparison article) is why I really love Anandtech. A lot of the code I run/write for my research is CPU-bottlenecked. Still, until the last year or so, I didn't know very much about hardware. Now, reading Anandtech, I have learned so much more about the hardware I depend on from this website than from any other website. Most just repeat announcements or run meaningless cursory synthetic benchmarks. The fact that Johan De Gelas has written such a deep dive into the inner workings of something as complex as a server CPU architecture, and done it in a way that I can understand, is remarkable. Great job Anandtech, keep it up and I'll always come back.
JohanAnandtech - Thursday, July 28, 2016 - link
You made me a happy man, I achieved my goal :-)
alpha754293 - Wednesday, July 27, 2016 - link
Excellent work and review as always, Johan. I would have been interested to see how the two processors perform in floating-point-intensive benchmarks, though...
JohanAnandtech - Thursday, July 28, 2016 - link
Ah, you will have to wait for the improved P8, which is the first POWER going after HPC :-)
I would support "Brutalizer". Every processor has its strength and weakness. If memory architecture is considered, for the same capacity, Intel is conjested memory, IBM is very distributed and Oracle-Sun is something in between. So Intel will always have memory B/W problem every way. IBM has memory efficiency problem. Oracle in theory doesn't have problem, but with 2 dimm per ch, that look like have problem. Oracle-Sun is for highly branched workload in the real world. Intel is for 1T/Core more of single threaded workloads and IBM is for mixed workloads with 2T-4T/Core priority. So supercomputing workloads will work fast on IBM now, compared to intel and sparc, while analytics and graph and other distributed will work faster on SPARC M7 and S7 (although S7 is resource limited). While for intel, a soft mix of applications and highly customized os is better. Leave the business decisions and the sales price. List prices are twice as much as sales price in the real world. These three processors (xeon e5v4, power8-9, sparc m7-s7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than their specs. Its like comparing a falcon and a lion and a swordfish. Their environments are different even though all of them hunt. Thats in the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of intel xeon e5v4, some P8 (10-15 single and dual sockets), and a very few (1-2 two socket M7 and 2 two socket S7). We run anything and every thing on any of these, we get our hands on. And this is the real world conclusion. So don't fight. Its a context centric supply.RISC is RISKY! - Tuesday, August 2, 2016 - link
of processors!rootvgnet - Friday, August 12, 2016 - link
Johan - interesting article, I enjoyed it - especially after I discovered how to get to the next page.
As far as the comments go: 1) a good article will get a diverse response (from those with an open, read: querying, mind).
2) I agree with those who, in other words, are saying: "there is no 'one size fits all'." And my gut reaction is that you are providing a level of detail that assists in determining which platform/processor "fits my need".
Looking forward to part 2.