Spunjji - Wednesday, June 15, 2016 - link
Well, this is certainly promising. Absent AMD, Intel need some healthy competition in this market - even if it is in something of a niche area.
niva - Wednesday, June 15, 2016 - link
This is the area where profits are made, not "something of a niche area."
Shadow7037932 - Wednesday, June 15, 2016 - link
Yeah, I mean getting some big customers like Facebook or Google would be rather profitable I'd imagine.
JohanAnandtech - Thursday, June 16, 2016 - link
More than 30% of Intel's revenue, and the most profitable area for years, and for years to come...
prisonerX - Wednesday, June 15, 2016 - link
This is the future. Single thread performance has reached a dead end and parallelism is the only way forward. Intel's legacy architecture is a millstone around its neck. ARM's open model and efficient implementation will deliver more cores and more performance as software adapts.
The monopolists monopolise themselves into irrelevance yet again.
CajunArson - Wednesday, June 15, 2016 - link
" Intel's legacy architecture is a millstone around its neck."I wouldn't call those Xeon-D parts putting up excellent performance at lower prices and vastly lower power consumption levels to be any kind of "millstone".
"ARM's open model and efficient implementation "
What's "open" about these Cavium chips exactly? They can only run a few specialized Linux flavors that don't even have the full range of standard PC software available to them.
What is efficient about a brand-new ARM chip from 2016 losing at performance per watt to the 4.5 year old Sandy Bridge parts that you were insulting?
As for monopolies, ARM has monopolized the mobile market and brought us "open" ecosystems like the iPhone walled-garden and Android devices that literally never receive security updates. I'd take a plain x86 PC that I can slap Linux on any day of the week over the true monopoly that ARM has over locked-down smartphones.
shelbystripes - Wednesday, June 15, 2016 - link
You're right to criticize the "millstone" comment; Intel has done quite well achieving both high performance and high performance-per-watt in their server designs.
But your comment about a "true monopoly" in the "locked-down smartphone" market is ridiculous. The openness (or lack thereof) that you're complaining about has nothing to do with the CPU architecture at all. An x86 smartphone or tablet can just as easily be locked down, and they are. I own a Dell Venue 8 7000, which is an Android tablet with an Intel Atom SoC inside. It's a great tablet with great hardware. But it's got a bunch of crapware installed that can't be removed, Dell abandoned it after 5.1 (it's ridiculous that a tablet with a quad-core 2GHz SoC and 2GB RAM will never see Marshmallow), and the locked smartphone-esque bootloader means I can't repurpose it to a Linux distro even if one existed that supported all the hardware inside this thing.
On the flipside, the most popular open-source learning/development solution out there right now is the ARM-based Raspberry Pi. There are a number of Linux distros available for it, and everything is OSS, even the GPU driver.
TheLightbringer - Thursday, June 16, 2016 - link
You haven't done your homework.
Some mobile devices were shipping with Intel inside. But like Microsoft, Intel entered the market too late, without offering any real value. The phrase "too little, too late" fits them both.
ARM didn't build a monopoly. They simply saw an opportunity and embraced it. In the early IBM-clone days Intel licensed its architecture to allow competition and a broad range of products. After the market was won, they got greedy, stopped licensing the architecture and cut a lot of players out, leaving a need for a chip licensing scheme. And that's where ARM got in.
Google develops the Android OS, but it is up to phone vendors and carriers to deploy updates. And they don't want to, for economic reasons. They prefer to sell you a new phone for $$$.
Intel and MS got exactly what they deserve in the mobile/car market, nothing else.
junky77 - Friday, June 17, 2016 - link
They're all greedy. Some just play it smartly or have more luck in decision making.
But, yeah, when you read about the way IBM behaved when things were fresh - it's quite amazing. They had much of the market and could do a lot of stuff, but they simply had a very narrow mindset.
soaringrocks - Wednesday, June 15, 2016 - link
You make it sound like it's mostly a SW problem; I think it's more complex than that. Actual performance is very dependent on the type of workload: some tasks fit Intel CPUs nicely, and the performance per watt for ARM is lacking despite the hype of that architecture being uniquely qualified for low power. It will be fun to watch how the battle evolves though.
vivs26 - Wednesday, June 15, 2016 - link
Not necessarily - see Amdahl's law of diminishing returns. The performance actually depends on the workload. Having a million cores guarantees nothing in terms of performance unless the workload is parallelizable, which in the real world it often is not as much as we think it could be. I'm curious to see how a Xeon merged with Altera programmable fabric performs compared to ARM on a server.
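For reference, Amdahl's law (the bound being invoked above) puts a hard ceiling on what extra cores can buy: if a fraction p of the work can run in parallel on n cores, the best-case speedup is

    S(n) = 1 / ((1 - p) + p / n)

so even with a generous, purely illustrative p = 0.9, a 48-core chip tops out at about 1 / (0.1 + 0.9/48), roughly 8.4x the single-core throughput.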
maxxbot - Wednesday, June 22, 2016 - link
Technically true, but every generation that millstone gets a little smaller; the die area and power needed to translate x86 into uops isn't huge and shrinks every generation.
jardows2 - Wednesday, June 15, 2016 - link
Interesting. Faster in a few workloads where heavy use of multi-threading is important, but significantly slower in more single-threaded workloads. For server use, you don't always want parallelized tasks. The results are pretty much across the board for all the processors tested: if the ThunderX was slower, it was slower than all the Intel chips. If it were faster, it was faster than all but the highest end Intel chips. With the price only being slightly lower than the cheapest Intel chip being sold, I don't think this is going to be a Xeon competitor at all, but it will take a few niche applications where it can do better.
With no significant energy savings, we should be looking forward to the ThunderX2 to see if it will bring this into a better alternative.
ddriver - Wednesday, June 15, 2016 - link
There is hardly a server workload where you don't get better throughput by throwing more cores and servers at it. Servers are NOT about parallelized tasks, but about concurrent tasks. That's why, while desktops are still stuck at 8 cores, server chips come with 20 and more... Server workloads are usually very simple, it is just that there are a lot of them. They are so simple and take so little time that it literally makes no sense parallelizing them.
jardows2 - Wednesday, June 15, 2016 - link
In the scenario you described, the single-thread performance takes on even more importance, thus highlighting the advantage the Xeons currently have in most server configurations.
niva - Wednesday, June 15, 2016 - link
Not if the Xeon doesn't have enough cores to actually process 40+ single-threaded tasks concurrently.
hechacker1 - Wednesday, June 15, 2016 - link
But kernels and VMware know how to schedule multiple threads on 1 core if it's not being fully utilized. Single-threaded IPC can make up for not having as many cores. See the iPhone SoCs for another example.
ddriver - Wednesday, June 15, 2016 - link
Not if you have thousands of concurrent workloads and only like 8 cores. As fast as each core might be, the overhead from workload context switching will eat it up.
willis936 - Thursday, June 16, 2016 - link
Yeah, if each task is not significantly longer than a context switch. Context switches are very fast, especially with processors with many sets of SMT registers per core.
ddriver - Thursday, June 16, 2016 - link
If what you suggest is correct, then Intel would not be investing chip TDP in more cores but in higher clocks and better single-threaded performance. Clearly this is not the case, as they are pushing 20 cores at the fairly modest 2.4 GHz.
willis936 - Thursday, June 16, 2016 - link
Are you sure that there aren't more cores at lower clocks to keep voltage lower? Power consumption is proportional to V^2 * f.
ddriver - Friday, June 17, 2016 - link
Say what? Go back, read my previous post again, and if you are going to respond, make sure it is legible.
willis936 - Friday, June 17, 2016 - link
Alright, well, if you don't understand why many slower cores are more power efficient even if there were a 0-cycle penalty on context switching, then you aren't worth having this discussion with.
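As a rough back-of-the-envelope illustration of the dynamic-power argument in this exchange (the standard "P is proportional to C * V^2 * f" model, with purely made-up voltage and frequency numbers):

    one core  at 3.0 GHz and 1.2 V:  P ~ 1 * 3.0 * 1.2^2 = 4.32
    two cores at 1.5 GHz and 0.9 V:  P ~ 2 * 1.5 * 0.9^2 = 2.43

If the work spreads across both slower cores, aggregate throughput stays roughly the same for a little over half the dynamic power, which is the usual rationale for wide, modestly clocked server parts.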
blaktron - Wednesday, June 15, 2016 - link
48 cores of server processing on 16MB of L2 and 4 channels of RAM? What is this thing designed for? It will be like running single-channel Celerons as server processors, so decent hypervisor hosts are out, and so is any database work more complex than dynamic web pages.
Haravikk - Wednesday, June 15, 2016 - link
Facebook is specifically mentioned as being interested in this, so dynamic web-pages is definitely a valid use-case here. HHVM for example is pretty light on memory usage (so is PHP7 now), especially in high demand cases where you're really only running a single set of scripts, probably cached in a compiled form, plus both scale really well across as many cores as you can throw at them.
Things like nginx and MariaDB will be the same, so they're absolutely intended use-cases for this kind of chip, and I think it should be very good at it.
blaktron - Wednesday, June 15, 2016 - link
With no L3 and slow RAM access I'm not sure where you think the scripts will cache. Assuming you ran them on bare metal (a horrifying waste of compute) there would be enough, but if you had Docker instances or quick-spin VMs doing your work (as 99% of web servers do) then each instance will only get the tiniest slice of cache to work with. It would be like running your servers, as I said, on a bank of Celerons. Except Celerons have L3 and don't carry 12 cores per memory channel.
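To put the numbers being argued over in perspective (using the 48-core / 16 MB L2 figures quoted above, with Haswell-EP's well-known per-core cache as the comparison point):

    ThunderX:   16 MB shared L2 / 48 cores  =  roughly 341 KB of last-level cache per core
    Xeon E5 v3: 256 KB private L2 per core  +  roughly 2.5 MB of shared L3 per core

That gap is what the "bank of Celerons" comparison is getting at.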
spaceship9876 - Wednesday, June 15, 2016 - link
Hopefully someone will release a server chip using 64 Cortex-A73 CPU cores; I'm pretty sure the Cortex-A73 will be more power efficient than Xeon D. Xeon D beats the Cortex-A57 in power efficiency, but I'm pretty sure that the Cortex-A72 will be similar and the Cortex-A73 will beat it.
Flunk - Wednesday, June 15, 2016 - link
ARM with ambition? I've heard that before, nothing came of it.
CajunArson - Wednesday, June 15, 2016 - link
Interesting article. This does appear to be the first semi-credible part from an ARM server vendor.
Having said that, the energy efficiency table at the end should put to rest any misconceived notions that ARM is somehow magically energy efficient while X86 isn't.
Considering that Xeon E5-2690 v3 is a 4.5 year old Sandy Bridge part made on a 32 nm process and it still has better performance-per-watt than the best ARM server parts available in 2016, it's pretty obvious that Intel has done an excellent job with power efficiency.
kgardas - Wednesday, June 15, 2016 - link
2 CajunArson: (1) you can't compare energy efficiency of CPUs made on different nodes. 28nm versus 14nm? This is apples to oranges. (2) Xeon E5-2690 *v3* is Haswell and not Sandy Bridge, and it's definitely not 4.5 years old.
TheinsanegamerN - Thursday, June 16, 2016 - link
While you are right on the actual age of the chip, if you don't compare efficiency on different nodes, how on earth would you know if you made any progress?
Unless you are suggesting that one should never compare one generation of chips to another, which is simply ludicrous. Where is this "you can't compare two different nodes" mindset coming from? I've seen it in the GPU forums as well, and it makes no sense.
shelbystripes - Wednesday, June 15, 2016 - link
The E5-2600 v3 is a Haswell part, meaning it's Intel's second ("tock") core design on 22nm. So not only is this a smaller process, it's a second-gen optimization on a smaller process.
For a first-gen 28nm part that includes power-hungry features like multiple 10GbE, these are some very promising initial results. A 14nm die shrink should create some real improvements off the bat in terms of performance per watt, and further optimizations from there should make this thing really shine.
Given that Intel hasn't cracked 10nm at all yet, and it'll take a while for 10nm Xeons to show up once they do, Cavium has room to play catch-up. I mean, hell, they're keeping up/surpassing Xeon D in some use cases NOW, and that's a 14nm part. What Cavium needs most is power optimization at this point, and I'm sure they'll get there in time.
Michael Bay - Thursday, June 16, 2016 - link
Good to know Intel is keeping you up to date with what's happening in their UV labs.
rahvin - Thursday, June 16, 2016 - link
Last I saw, Intel is already running their test fabs at 10nm. Once they perfect it in the test fabs, it only takes them about 6 months to roll it into a full-scale fab. Maybe you can point to the source that indicates Intel has failed at 10nm.
kgardas - Wednesday, June 15, 2016 - link
Nice article, but really looking forward to seeing testing of the ThunderX2 and X-Gene 3. Will be interesting as Intel seems to be kind of struggling with single-threaded performance recently...
Drazick - Wednesday, June 15, 2016 - link
Just a question.
You emphasized that the performance is 3x instead of 5x, but I bet Intel used Intel ICC for those tests.
Intel works hard on their compilers, and anyone who wants to extract the best out of Intel CPUs uses them as well.
Since CPUs are only as good as their compilers, if Intel has an advantage in that department you should show that as well.
Namely, give us some results using Intel ICC.
Thank You.
UrQuan3 - Wednesday, June 15, 2016 - link
Of course, if Anandtech uses ICC, they should use better flags in gcc for ARM/ThunderX as well (core specific flags, NEON, etc). Both ICC and targeted flags give improvements. Often large ones. This was a generic test.
JohanAnandtech - Thursday, June 16, 2016 - link
For integer workloads, ICC is not that much faster than gcc (see Andreas Stiller's work). And there is the fact that ICC requires licensing and other time consuming stuff. From a linux developer/administrator perspective, it is much easier just to use gcc: you simply install it from the repositories, no licensing headaches and very decent performance (about 90% of icc). So the vast majority of the **NON HPC** software is compiled with gcc. Our added value is that we show how the processors compare with the most popular compiler on linux. That is the big difference between benchmarking to put a CPU in the best light and benchmarking to show what most people will probably experience.
Until Intel makes ICC part of the typical linux ecosystem, it is not an advantage at all in most non-HPC software.
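For what it's worth, the kind of targeted GCC flags suggested a couple of comments up would look roughly like this (a sketch only: the exact -mcpu value and the gains depend on the GCC version and the workload, and Advanced SIMD/NEON is already enabled by default for AArch64 targets):

    # generic build, roughly what a distro default looks like
    gcc -O2 -o bench bench.c

    # build tuned for the ThunderX pipeline
    gcc -O3 -mcpu=thunderx -o bench bench.c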
patrickjp93 - Friday, June 17, 2016 - link
His work is woefully incomplete, lacking any analysis of vectorized integer workloads, in which ICC destroys GCC to the tune of a 40% lead.
phoenix_rizzen - Wednesday, June 15, 2016 - link
"The one disadvantage of all Supermicro boards remains their Java-based remote management system. It is a hassle to get it working securely (Java security is a user unfriendly mess), and it lacks some features like booting into the BIOS configuration system, which saves time."It's IPMI, you can use any IPMI client to connect to it. Once you give it an IP and password in the BIOS, you can connect to it using your IPMI client of choice. There's also a web interface that provides most of the features of their Java client (I think that uses Java as well, but just for the console).
For our SuperMicro servers, I just use ipmitool from my Linux station and have full access to the console over the network, including booting it into the BIOS, managing the power states, and even connecting to the serial console over the network.
Not sure why you'd consider a full IPMI 2.0 implementation a downside just because the default client sucks.
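For anyone who hasn't used it, the workflow described above boils down to a handful of ipmitool invocations along these lines (host address and credentials are placeholders):

    ipmitool -I lanplus -H 10.0.0.42 -U ADMIN -P <password> chassis power status
    ipmitool -I lanplus -H 10.0.0.42 -U ADMIN -P <password> chassis bootdev bios   # enter BIOS setup on next boot
    ipmitool -I lanplus -H 10.0.0.42 -U ADMIN -P <password> sol activate           # serial-over-LAN console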
JohanAnandtech - Wednesday, June 15, 2016 - link
Good suggestion. I have been using an IPMI client to manage several other servers, like the IBM servers. However, such a GUI client is still a bit more user-friendly; ipmi commands can get complicated if you don't use them regularly. The thing is that HP's and Intel's BMC GUIs are a lot easier to use and more reliable.
fanofanand - Wednesday, June 15, 2016 - link
I think you may have an inaccurate figure of 141 at idle (in the graph) for the Thunder. "makes us suspect that the chip is consuming between 40 and 50W at idle, as measured at the wall"
JohanAnandtech - Wednesday, June 15, 2016 - link
If you look at the column "peak vs idle", you see 82W. At peak, we assume that a 120W TDP chip will probably need about 130W. 130W - 82W (both measured at the wall) = 50W for the SoC alone at idle, measured at the wall, so anywhere between 40-50W in reality. My calculation is a "guesstimate", but it is clear that the Cavium chip needs much more at idle than the Intel chips (10-15W).
djayjp - Wednesday, June 15, 2016 - link
Many spelling/grammar issues here. It impacts readability. Please read before posting.
djayjp - Wednesday, June 15, 2016 - link
That is to say, in the article.
mariush - Wednesday, June 15, 2016 - link
These guys are already working on the ThunderX2 (54 cores, 3 GHz, 14nm, ARMv8) and they already have functional chips: https://www.youtube.com/watch?v=ei9uVskwPNE
Meteor2 - Thursday, June 16, 2016 - link
It's always jam tomorrow, isn't it? Intel is working on new chips too, you know.
beginner99 - Wednesday, June 15, 2016 - link
It loses very clearly in performance/watt to Xeon-D. In this segment the lower price doesn't matter, and the fact that it has a process disadvantage doesn't matter either. What counts is the end result. And I doubt it would cost $800 if made on 14/16nm. I mean, why would anyone buying this take the risk? It's a safer bet to go with Intel, also due to more flexible use (single- and multi-threaded). The latency issue is mentioned but downplayed.
blaktron - Wednesday, June 15, 2016 - link
So downplayed. Anandtech desperately wants ARM servers, but it's a solution looking for a problem. Big web front ends running on bare metal are such a small percentage of the server market that developing for it seems stupid. Xeon-D was already in development for SANs, they just repurposed it for docker and nginx.
Senti - Wednesday, June 15, 2016 - link
Very nice article. I especially liked the emphasis on the relation of test numbers to real-world workloads, and on what was problematic during the testing.
It would be great to see the same style of desktop CPU review (Zen?) from you, instead of the mix of reprinted marketing hype and silly benchmark number dumps that has plagued this site for quite some time now.
Some annoying typos here and there, like "It is clear that the ThunderX is a match for high frequency trading", but nothing really bad.
Daniel Egger - Wednesday, June 15, 2016 - link
I could hardly disagree more about the remote management of SuperMicro vs. HP. Remote management of HP is *the horror*, I've never seen worse and I've seen a lot. It's clunky, it requires a license to be useful (others do too, but SuperMicro does not have such nonsense), the BMC tends to crash a lot (which is very annoying for a remote management solution), boot is even slower than all other systems I know due to the way they integrate the BIOS and remote management on the system, and it also uses Java unless you have Windows machines around to use the .NET version.
For the remote management alone I would choose SuperMicro over most other vendors any day.
JohanAnandtech - Thursday, June 16, 2016 - link
I found the .NET client of HP much less sluggish, and I have seen no crashing at all. I guess there is no optimal remote management client, but I really like the "boot into firmware" option that Intel implemented.
rahvin - Thursday, June 16, 2016 - link
Not only that, but Supermicro actually releases updates for their BMCs. I had the same shocked reaction to the HP claim. Started to wonder if I was the only one that thought Supermicro was light years ahead in usability.
I should note that Supermicro's awful Java tool works on Linux as well as Windows. Though it refuses to run if your Java isn't the newest version available.
pencea - Wednesday, June 15, 2016 - link
All these articles and yet still no review of the GTX 1080, while other major sites have already posted their reviews of both the 1070 & 1080. Guru3D already has 2 custom 1080 reviews and a custom 1070 review up.
Ryan Smith - Wednesday, June 15, 2016 - link
It'll be done when it's done.
pencea - Wednesday, June 15, 2016 - link
Unacceptably late for something that should've been posted weeks ago.
Meteor2 - Thursday, June 16, 2016 - link
Will anyone read it though? Your ad impressions are going to suffer.
Ryan Smith - Thursday, June 16, 2016 - link
Maybe. Maybe not. But it's my own fault regardless. All I can do is get it done as soon as I reasonably can, and hope it's something you guys find useful.
name99 - Thursday, June 16, 2016 - link
Give it a freaking rest. No-one is impressed by your constant whining about this.
pencea - Thursday, June 16, 2016 - link
Not looking to impress anyone. As a long-time viewer of this site, I'm simply disappointed that a reputable site like this is constantly late for GPU reviews.
silverblue - Thursday, June 16, 2016 - link
I'm not sure how this is relevant. Johan doesn't review graphics cards, other people at Anandtech do. I bet Guru3D has a much bigger team for that, and I imagine that they have a much narrower scope (i.e. no server stuff).
I don't think I've looked at a review recently that hasn't had the comments section polluted with "where is the review for x".
UrQuan3 - Wednesday, June 15, 2016 - link
Intel allows their Xeons to sometimes pull double their TDP? No wonder our new machines trip breakers long before I thought they would. I need to test instead of assuming accurate documentation.
I can see why you chose C-Ray, I'm just sorry a more general ray tracer was not chosen. Still, not its intended market, though I am suddenly very interested. Ray-tracing and video encoding are my top two tasks.
Meteor2 - Thursday, June 16, 2016 - link
The 'T' in 'TDP' is for thermal. It's a measure of the maximum waste heat which needs to be removed over a certain period of time.
UrQuan3 - Wednesday, June 22, 2016 - link
Yes, it stands for thermal, but power consumed doesn't just disappear. Convert it to light, convert it to motion, convert it to heat, etc. In this case there is a small amount of motion (electrons) and the rest has to be heat. I expect much higher instantaneous pulls, but this was sustained power. Anyway, I will track down the AVX documentation mentioned below.
I saw the h264ref. I'll be curious about x264 (HandBrake), as the authors seem interested in ARM in the last few years. Unsurprisingly, it is far less optimized than on x64. I benchmarked HandBrake on the Pi2, Pandaboard, and CI-20 last year, just to see what it would do.
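For anyone wanting to repeat that kind of comparison, a HandBrake benchmark run is usually just a timed CLI invocation along these lines (file names are placeholders, and exact flag spellings vary a little between HandBrakeCLI versions):

    time HandBrakeCLI -i input.mp4 -o output.mkv -e x264 -q 20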
JohanAnandtech - Thursday, June 16, 2016 - link
C-Ray was just a placeholder to measure FPU energy consumption. I'll look into bringing a more potent raytracer into our benchmark suite (povray).
Video encoding was in the review though, somewhat (h264ref).
patrickjp93 - Friday, June 17, 2016 - link
ARM chips with vector extensions allow it as well. Intel provides separate documentation for AVX-workload TDPs.
Antony Newman - Wednesday, June 15, 2016 - link
Fascinating article.
Why would Cavium not try and use 54 x A73s in their next chip?
If ARM are not in the business of making silicon, and ARM think the '1.2W Ares' will help them break into the server market ... then why do we think ARM isn't working with the likes of Cavium to get a server SoC that rocks the Intel boat?
Typos, from memory: send -> sent, through -> thought. There were a few others.
AJ
name99 - Thursday, June 16, 2016 - link
How do you know ARM aren't working with such a vendor?
ARM has always said that they expect ARM server CPUs to only be marginally competitive (for very limited situations) in 2017, and to only be really competitive in 2020.
That suggests, among other things, that if they are working with partners, they have a target launch between those two dates, and they regard all launches before 2017 as essentially nice for PR and for building up the ecosystem, but essentially irrelevant for commercial purposes.
rahvin - Thursday, June 16, 2016 - link
The problem, as pointed out early in this article, is that ARM keeps targeting Intel's current products, not the ones that will be out when they get their products out. We've had almost a dozen vendors get to the point of releasing a chip and drop it because it is simply not competitive with Intel. Most of these ARM products were undertaken when Intel was targeting performance without regard to performance/watt. Now that Intel targets the latter metric, ARM server chips haven't been competitive with them.
Fact is, Intel could decimate and totally take over all the markets ARM chips occupy, but to do it they'd have to cannibalize their existing high-profit sales. This is why they keep canceling Atom chips; the chips turned out so good they were worried they'd cannibalize much more expensive products. This is the reason Avoton is highly restricted in what products and price segments it's allowed into. If Intel opened the flood gates on Avoton they would risk cannibalizing their own server profits.
junky77 - Wednesday, June 15, 2016 - link
So, they did what AMD couldn't for years? I'm trying to figure it out... their offering seems to be a lot more interesting than AMD's stuff currently.
silverblue - Thursday, June 16, 2016 - link
I think AMD themselves admitted that the Opteron X1100 was for testing the waters, with K12 being the first proper solution, but that was delayed to get Zen out of the door. I imagine that both products will be on sale concurrently at some point, but even with AMD's desktop-first approach for Zen, it will probably still come to the server market before K12 (both are due 2017).
junky77 - Thursday, June 16, 2016 - link
Still, quite strange, no? AMD has been in the server business for years. I'm not talking about their ARM solution only; their other solutions seem to be less interesting too.
silverblue - Thursday, June 16, 2016 - link
I am looking forward to both Zen and K12; there's very little chance that AMD will fail with both.
name99 - Wednesday, June 15, 2016 - link
" It is the first time the Xeon D gets beaten by an ARM v8 SoC..."The Apple A9X in the 12" iPad Pro delivers 40GB/s on Stream...
(That's the Stream built into Geekbench. Conceivably it's slightly different from what's being measured here, but it delivers around 25GB/s for standard desktop/laptop Intel CPUs, and for the A9 and the 9" iPad's A9X, so it seems in the same sort of ballpark.)
aryonoco - Thursday, June 16, 2016 - link
Fantastic article as always, Johan. Thank you so much for your very informative articles. I can only imagine how much time and effort writing this article took. It is very much appreciated.
The first good showing by an ARMv8 server. Nearly 5 years later than expected, but they are getting there. This thing was still produced on 28nm HKMG. Give it one more year, a jump to 14nm, and a more mature software ecosystem, and I think the Xeons might finally have some competition on their hands.
JohanAnandtech - Thursday, June 16, 2016 - link
Thank you, and indeed it was probably the most time consuming review ... since Calxeda. :-)
Yes, there is potential.
iwod - Thursday, June 16, 2016 - link
Even if the ThunderX is half the price of an equivalent Xeon, I would still buy an Intel Xeon instead. This isn't the smartphone market. In servers, the cost of memory, storage, networking etc adds up. Not only does it use a lot more power at idle, the total TCO AND perf/watt still favour Intel.
There is also the switching cost of software involved.
And those who say single-core / single-thread performance doesn't matter have absolutely no idea what they are talking about.
As far as I can tell, Xeon-D offers a very decent value proposition even for the ARM-SoC-minded vendors. This will likely continue to be the case as we move to 10nm. I just don't see how ARM is going to get the 20% market share by 2020 that they described in their shareholder meetings.
rahvin - Thursday, June 16, 2016 - link
If you have to switch software on your servers because you switch architecture, you are doing something wrong and are far too dependent on proprietary products. I'm being a bit facetious here, but the only reason architecture should limit you is if you are using Microsoft products or are in a highly specialized computing field. Linux should dominate your general servers.
kgardas - Friday, June 17, 2016 - link
Even if you are on Linux, stack support is still best on i386/amd64. Look at how much money IBM throws at getting somewhere with POWER8. ARM can't do that, so it's more on vendors to do that, and they are doing it a little more slowly. Anyway, even AArch64 will mature in the LLVM/GCC toolchains, GNU libc, musl libc, the Linux kernel etc., but it'll take some time...
tuxRoller - Thursday, June 16, 2016 - link
AArch64 has very limited conditional execution support.
http://infocenter.arm.com/help/index.jsp?topic=/co...
BlueBlazer - Friday, June 17, 2016 - link
Cavium is quite aware of their ThunderX single-thread weakness; hear it directly from Cavium themselves at https://www.youtube.com/watch?v=ei9uVskwPNE thanks to ARMdevices.net.
TiffanyTown - Thursday, July 28, 2016 - link
Hi, the JDK version you used is OpenJDK 1.8.0_91. Did you build it yourself?