47 Comments
gdansk - Monday, March 9, 2015 - link
X-Gene is not looking so great. Even if it is 50% more efficient, as they promise, they'll still be behind Atom.
Samus - Monday, March 9, 2015 - link
HP Moonshot chassis are still *drool*
Krysto - Monday, March 9, 2015 - link
The main problem with the non-Intel systems is not just that they use older processes than Intel, but that they use older processes even compared to the rest of the non-Intel chip industry. AMD typically sits a full process node behind the other non-Intel chip makers. If they'd at least adopt the cutting-edge processes as they become available from non-Intel foundries, maybe they'd stand a chance, especially now that the gap in process technologies is shrinking.
Samus - Monday, March 9, 2015 - link
AMD simply isn't as bad as people continually make them out to be. Yes, they're "behind" Intel, but it's all in the approach. We are talking about two engineering houses that share nothing in common but a cross-licensing agreement. AMD has CPUs that are very competitive with Intel's i5s for nearly half the price, but yes, they use more power (at times a third more).
But facts are facts: AMD is the second high-tech CPU manufacturer in the world. Not Qualcomm, not Samsung. It's pretty obvious AMD's engineering talent spans more areas than anyone's other than Intel's, and is potentially superior to Intel's in GPU design (although this has obviously been shifting over the years as Intel hires more "GPU talent").
AMD in servers is a hard pill to swallow, though. If purchasing based on price alone, it can be a compelling alternative, but for rack space or low-energy computing?
Taneli - Tuesday, March 10, 2015 - link
AMD doesn't even make the top 10 semiconductor companies in sales. Qualcomm is third, Samsung Semiconductor is sixth, and Intel is almost ten times the size of AMD.
Outside of the gaming consoles they are being completely overrun by the competition.
owan - Tuesday, March 10, 2015 - link
I'm sorry, at one point I was an AMD fanboy, back when they actually deserved it based on their products, but you just sound like an apologist. Facts are facts: FX processors aren't competitive with i5s in performance, power, or performance/$, because they get smacked so hard they can't be cheap enough to make up for it. Their CPU designs are woefully out of date, their APUs are bandwidth-starved and use way too much power to be useful in the one place they'd be great (mobile), and their lagging process tech means there's not much better coming on the horizon. I don't want to see them go, but at the rate ARM is eating up general computing share, it won't be long before AMD becomes completely irrelevant. It will be Intel vs. ARM, and AMD will be an afterthought.
xenol - Wednesday, March 11, 2015 - link
Qualcomm is used in pretty much most cell phones in the US, to the point you'd think Qualcomm is the only SoC manufacturer. I'm pretty sure that's also how it looks in most of the other markets, such as Korea. Plus, even where their SoCs aren't being used, their modems are heavily used.
If anything, Qualcomm is bigger than AMD. Or rather, Qualcomm is the Intel of the SoC market.
xenol - Wednesday, March 11, 2015 - link
[Response to myself since I can't edit]
Qualcomm's next major competitor is Apple. But that's about it.
Also, I meant to say other markets except Korea.
CajunArson - Monday, March 9, 2015 - link
Bear in mind that the Atom parts have been commercially available since 2013, so they are by no means brand-new technology, and the 14nm Atom upgrades will definitely help power efficiency even if raw performance doesn't jump a whole lot.
Anandtech is also a bit behind the curve, because Intel is about to release Xeon-D (8 Broadwell cores and integrated I/O in a 45 watt TDP, or lower), which is designed for exactly this type of workload and is going to massively improve performance in the low-power envelope sphere:
http://techreport.com/review/27928/intel-xeon-d-br...
SarahKerrigan - Monday, March 9, 2015 - link
14nm server Atom isn't coming.
http://www.eetimes.com/document.asp?doc_id=1325955
"Atom will become a consumer only SoC."
IBleedOrange - Monday, March 9, 2015 - link
EETimes is wrong.
Google "Intel Denverton"
beginner99 - Monday, March 9, 2015 - link
Maybe it would be good to mention at the start of the article that the X-Gene is made on a 40nm process. I read the article thinking to myself that the X-Gene is crap, and only at the end do you get the explanation: it's on 40nm vs. the Atom's 22nm Intel process. That's a huge difference, and currently the article is a bit misleading, e.g. shining a bad light on X-Gene and ARM. (And I say this even though I have always been a proponent of Intel big cores in almost all server applications.)
Stephen Barrett - Monday, March 9, 2015 - link
If APM had a newer part to test then we would have tested it. XG2 is simply not out yet. So the fact that APM has their flagship SoC on an older process is not misleading... it's the facts. The currently available Intel parts have a process advantage.
warreo - Monday, March 9, 2015 - link
Mentioning it at the start would be good from a technical-disclosure standpoint, but I'm not sure it truly matters for the purposes of this article. The article is comparing what is currently available from APM and Intel. The reality is that Intel will likely have a significant process advantage for the foreseeable future, and if you wanted to see a like-for-like comparison on a process basis, you'd probably need to wait 2-3 years for X-Gene to get to 22nm, by which time Intel will have moved on to 10nm.
CajunArson - Monday, March 9, 2015 - link
The 40nm process is only really relevant when it comes to the power-consumption comparisons.
A 28nm, 20nm, or 16nm part with the same cores at the same clock speeds will register the exact same level of performance. The only difference is that the smaller lithographic processes should deliver that level of performance in a smaller power envelope.
JohanAnandtech - Monday, March 9, 2015 - link
Well, with so much time invested in an article, I always hope people will read the pages between page 1 and 18 too :-p. It is mentioned in the overview of the SoCs on page 5, and quite a few times on other pages too.
colinstu - Monday, March 9, 2015 - link
What server is on the bottom of the first page?
JohanAnandtech - Monday, March 9, 2015 - link
A very old MSI server :-). Just to show people what web farms used before the micro-server era.
Samus - Monday, March 9, 2015 - link
I use the Xeon E3-1230v3 in desktop applications all the time. It's basically an i7 for the price of an i5.
And a lot of IT departments dump them on eBay cheap when they upgrade their servers. They can be had well under $200 lightly used. The 80W TDP could theoretically have some drawbacks for boost time, but real-world performance in extended Passmark tests doesn't seem to show any difference between its boost potential and that of an 88W i7 K-series.
Great CPUs.
Alone-in-the-net - Monday, March 9, 2015 - link
Alone-in-the-net - Monday, March 9, 2015 - link
With both your compilers, you need to specify -march=native so that the compiler can optimize for the architecture you are running on; -O3 is not enough. This enables the compiler to use CPU-specific instructions.
JohanAnandtech - Tuesday, March 10, 2015 - link
Are you sure this is up to date? gcc tells me -march=native is not supported.
JohanAnandtech - Tuesday, March 10, 2015 - link
Update: -march=native does not work. I have tried -march=armv8-a, but it does not do much (it is probably the default). -O3 makes the biggest difference: omit it and you get 5.7 GB/s; with -O3, I am at 18 GB/s and more (STREAM on the m400).
Alone-in-the-net - Tuesday, March 10, 2015 - link
Apologies. For AArch64 the only option is "armv8-a"; for Intel, -march=native sets it to the one for your CPU.
https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/AArch...
https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-...
From version 4.9.x and above of GCC, you can really start to add tuning for the CPU.
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/AArch...
-mtune=name
Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: ‘generic’, ‘cortex-a53’, ‘cortex-a57’.
Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. The only permissible value is ‘cortex-a57.cortex-a53’.
Where none of -mtune=, -mcpu= or -march= are specified, the code will be tuned to perform well across a range of target processors.
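A minimal sketch of the flag combinations discussed above (compiler names, source file names, and the exact flag sets are illustrative; adjust them to your toolchain):

```shell
# Flag sets discussed above (illustrative; adjust to your toolchain).
X86_FLAGS="-O3 -march=native"                        # x86: optimize for the host CPU
A64_FLAGS_49="-O3 -march=armv8-a -mtune=cortex-a57"  # AArch64 with GCC 4.9.x
A64_FLAGS_5="-O3 -mcpu=xgene1"                       # AArch64 with GCC 5+

# The resulting compile commands would look like this (shown via echo):
echo "gcc $X86_FLAGS stream.c -o stream"
echo "gcc $A64_FLAGS_5 stream.c -o stream"
```

The point is that -O3 alone only sets the optimization level; the -march/-mtune/-mcpu flags are what let the compiler schedule for, and emit instructions specific to, the target core.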
Alone-in-the-net - Tuesday, March 10, 2015 - link
Also, support for the X-Gene 1 as a compilation target only arrived in GCC 5.
https://gcc.gnu.org/gcc-5/changes.html
Support has been added for the following processors (GCC identifiers in parentheses): ARM Cortex-A72 (cortex-a72) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (cortex-a72.cortex-a53), Cavium ThunderX (thunderx), Applied Micro X-Gene 1 (xgene1). The GCC identifiers can be used as arguments to the -mcpu or -mtune options, for example: -mcpu=xgene1
The_Assimilator - Monday, March 9, 2015 - link
So AMD, how's that bet on ARM you made looking now?
extide - Monday, March 9, 2015 - link
Don't count them out yet. I really wish Intel hadn't abandoned ARM for the Atom; I bet they could come out with a sweet ARMv8 core if they had to, and on their process it would be sweet.
BlueBlazer - Monday, March 9, 2015 - link
That AMD Opteron A1100 looks more and more like abandonware as time passes, and it was announced about 8 months ago. Until now there has not been a single real-world deployment, nor has it been used in any of AMD's own SeaMicro servers. It is currently available as a development kit with a rather steep price tag.
tuxRoller - Monday, March 9, 2015 - link
You REALLY should be using GCC 5, which includes many improvements for the ARMv8 ISA. I'd suggest grabbing a nightly of Fedora 22, but Ubuntu 15.04 may be using GCC 5 as well.
Wilco1 - Monday, March 9, 2015 - link
Agreed, nobody doing anything on AArch64 should contemplate using GCC 4.8. Even 4.9 is way out of date. GCC 5.0 with the latest GLIBC gives major speedups across the board.
JohanAnandtech - Tuesday, March 10, 2015 - link
"Way out of date"? We tried out 4.9.2, which was released on October 30th, 2014. That is about 4 months old. https://www.gnu.org/software/gcc/releases.html. The latest release is 4.8.4, and 5.0 has not even been released AFAIK.
Wilco1 - Tuesday, March 10, 2015 - link
GCC 4.9 doesn't contain all the work in GCC 5.0 (which is close to final release, but you can build trunk). As you hinted in the article, it is early days for AArch64 support, so there is a huge difference between a 4.9 and a 5.0 compiler; 5.0 is what you'd use for benchmarking.
JohanAnandtech - Tuesday, March 10, 2015 - link
You must realize that the situation in the ARM ecosystem is not as mature as on x86. The X-Gene runs on a specially patched kernel that has some decent support for ACPI, PCIe, etc. If you do not use this kernel, you'll get into all kinds of hardware trouble. And AFAIK, gcc needs a certain version of the kernel.
Wilco1 - Tuesday, March 10, 2015 - link
No, you can use any newer GCC and GLIBC with an older kernel - that's the whole point of compatibility.
Btw, your results look wrong - X-Gene 1 scores much lower than Cortex-A15 on the single-threaded LZMA tests (compare with the results on http://www.7-cpu.com/). I'm wondering whether this is just due to using the wrong compiler/options, or running well below 2.4GHz somehow.
JohanAnandtech - Tuesday, March 10, 2015 - link
Hmm. The A57 scores 1500 at 1.9 GHz on compression. The X-Gene scores 1580 with GCC 4.8 and 1670 with GCC 4.9. Our scores are on the low side, but it is not like they are impossibly low.
Ubuntu 14.04, the 3.13 kernel, and gcc 4.8.2 was and is the standard environment that people get on the m400. You can tweak a lot, but that is not what most professionals will do. Then we'd also have to start testing with icc on Intel. I am not convinced that the overall picture would change that much with lots of tweaking.
Wilco1 - Tuesday, March 10, 2015 - link
Wilco1 - Tuesday, March 10, 2015 - link
Yes, and I'd expect the 7420 will do a lot better than the 5433. But the real surprise to me is that X-Gene 1 doesn't even beat the A15 in Tegra K1 despite being wider, newer, and running at a higher frequency - that's why the results look too low.
I wouldn't call upgrading to the latest compiler tweaking - for AArch64 that is kind of essential, given it is early days and the rate of development is extremely high. If you tested 32-bit mode then I'd agree GCC 4.8 or 4.9 are fine.
CajunArson - Tuesday, March 10, 2015 - link
CajunArson - Tuesday, March 10, 2015 - link
This is all part of the problem: requiring people to use cutting-edge software with custom recompilation just to beat a freakin' Atom, much less a real CPU?
You do realize that we could play the same game with all the Intel parts. Believe me, the people who constantly whine that Haswell isn't any faster than Sandy Bridge have never properly recompiled computationally intensive code to take advantage of AVX2 and FMA.
The fact that all those Intel servers were running software that was only compiled for a generic X86-64 target without requiring any special tweaking or exotic hacking is just another major advantage for Intel, not some "cheat".
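As an illustration of the recompilation point being argued here (the flags are standard GCC options; the source file name is made up):

```shell
# A generic x86-64 build vs. one recompiled for Haswell, which enables
# AVX2 and FMA code generation. "kernel.c" is purely illustrative.
GENERIC_FLAGS="-O2 -march=x86-64"
HASWELL_FLAGS="-O2 -march=haswell"   # implies -mavx2 -mfma, among others

echo "gcc $GENERIC_FLAGS kernel.c -o kernel-generic"
echo "gcc $HASWELL_FLAGS kernel.c -o kernel-haswell"
```

The first binary runs on any x86-64 CPU; the second can auto-vectorize with AVX2/FMA but will only run on Haswell-class or newer parts, which is exactly the trade-off servers compiled for a generic target avoid.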
Klimax - Tuesday, March 10, 2015 - link
And if we are going for a cutting-edge compiler, then why not ICC with Intel's nice libraries... (pretty sure even the ancient Atom would suddenly look not that bad)
Wilco1 - Tuesday, March 10, 2015 - link
To make a fair comparison you'd either need to use the exact same compiler and options, or go all out and allow people to write hand-optimized assembler for the kernels.
68k - Saturday, March 14, 2015 - link
You can't seriously claim that recompiling an existing program with a different (well-known and mature) compiler is equal to hand-optimizing things in assembler. Hint: one of the options is ridiculously expensive, the other is trivial.
aryonoco - Monday, March 9, 2015 - link
Thank you, Johan. A very, very informative article. This is one of the least-reported areas of IT in general, and one that I think is poised for significant uptake in the next 5 years or so.
I very much appreciate your efforts in putting this together.
JohanAnandtech - Tuesday, March 10, 2015 - link
Thanks! It has been a long journey to get all the necessary tests done on the different pieces of hardware, and it is definitely not complete, but at least we were able to quantify a lot of paper specs (25W TDP of the Xeon E3, 20W Atom, X-Gene performance, etc.).
enzotiger - Tuesday, March 10, 2015 - link
"SeaMicro focused on density, capacity, and bandwidth."
How did you come to that statement? Have you ever benchmarked (or even played with) any SeaMicro server? What capacity or bandwidth are you referring to? Are you aware of their plans down the road? Did you read AMD's Q4 earnings report?
BTW, AMD doesn't call their servers micro-servers anymore. They use the term "dense server".
Peculiar - Tuesday, March 10, 2015 - link
Johan, I would also like to congratulate you on a well-written and thorough examination of subject matter that is not widely evaluated.
That being said, I do have some questions concerning the performance/watt calculations. Mainly, I'm concerned as to why you are adding the idle power of the CPUs in order to obtain the "Power SoC" value. The Power Delta should take into account the difference between the load power and the idle power, and therefore you should end up with the power consumed by the CPU in isolation. I can see why you would add in the chipset power, since some of the devices are SoCs that do not require a chipset and some are not. However, I do not understand the methodology of adding the idle power back into the Delta value. It seems that you are adding the load power of the CPU to the idle power of the CPU, and that is partially why you conclude that they are exceeding their TDPs (not to mention the fact that the chipset should have its own TDP separate from the CPU).
Also, if one were to get nitpicky on the power measurements, it is unclear whether the load power measurement is peak, average, or both. I would assume that the power consumed by the CPUs may not be constant, since you state that "the website load is a very bumpy curve with very short peaks of high CPU load and lots of lows." If possible, it may be more beneficial to measure the energy consumed over the duration of the test.
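For instance, with 1 Hz power samples, the energy consumed over a test reduces to a simple sum (a sketch with invented readings; real data would come from the power meter):

```shell
# Integrate 1-second power samples (watts) into energy (joules = watt-seconds).
# The readings below are invented purely for illustration.
printf '40\n85\n90\n60\n40\n' > samples.txt
energy_j=$(awk '{sum += $1} END {print sum}' samples.txt)
echo "Energy consumed over the test: ${energy_j} J"
```

Reporting energy per test run (or joules per request) sidesteps the peak-vs-average ambiguity entirely, since the bumpy load curve is integrated away.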
JohanAnandtech - Wednesday, March 11, 2015 - link
Thanks for the encouragement. About your concerns with the perf/watt calculations: Power Delta = average power (high web load, measured at the 95th percentile with 1 s samples, averaged over about 2 minutes) - idle power. Since the idle power is the total idle power of the node, it also contains the idle power of the SoC. So you must add it back to get the power of the SoC. If you still have doubts, feel free to mail me.
jdvorak - Friday, March 13, 2015 - link
The approach looks absolutely sound to me. The idle power will be drawn in any case, so it makes sense to include it in the calculation. Perhaps it would also be interesting to compare the power consumed by the different systems at the same load levels, such as 100 req/s, 200 req/s, ... (clearly, some of the higher loads will not be achievable by all of them).
Johan, thanks a lot for this excellent, very informative article! I can imagine how much work has gone into it.
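Numerically, the methodology Johan describes works out like this (all readings are invented for illustration; the real figures come from the node measurements):

```shell
# Sketch of the perf/watt bookkeeping discussed above, with invented numbers.
node_load=85    # W, whole node, averaged 95th-percentile web load
node_idle=40    # W, whole node at idle
soc_idle=8      # W, estimated SoC share of the node's idle power

delta=$((node_load - node_idle))    # load-induced power: the "Power Delta"
soc_power=$((delta + soc_idle))     # SoC power under load, idle share added back
echo "Power Delta: ${delta} W, SoC power: ${soc_power} W"
```

The key step is the last addition: subtracting whole-node idle removes the SoC's own idle draw along with everything else, so the SoC's idle share has to be added back to estimate total SoC power.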
nafhan - Wednesday, March 11, 2015 - link
If these had 10Gbit NICs - instead of Gbit - these things could do some interesting stuff with virtual SANs. I'd feel hesitant shuttling storage data over my primary network connection without some additional speed, though.
Looking at that Moonshot machine, for instance: 45 x 480 SSDs is a decent-sized little SAN in a box, if you could share most of that storage amongst the whole Moonshot cluster.
Anyway, with all the stuff happening in the virtual SAN space, I'm sure someone is working on that.
Casper42 - Wednesday, April 15, 2015 - link
Johan, do you have a full Moonshot 1500 chassis for your testing? Or are you using a PONK?