Original Link: https://www.anandtech.com/show/7265/intels-second-generation-microserver-soc
The Next Generation of Micro Server SoCs: ECX-2000 vs Atom C2000
by Johan De Gelas on October 29, 2013 5:59 AM EST
Our review of the Boston Viridis, one of the first Calxeda ECX-1000 based servers, was a pretty weird one. Instead of trying out different server workloads, we deliberately went for one of the few scenarios where the server might make sense: hosting light webservers. There were a few others, like a Content Delivery Network server or a storage server, but those were about it. The quad ARM Cortex A9 inside the ECX-1000 was faster than the contemporary Atom SoCs, but lacked the RAM capacity and raw performance of low power Xeons that it needed to be an alternative in most server workloads. The measured (!) 8 Watt per server node was however simply spectacular and the network fabric was one of the best in the industry. Calxeda was on the right track - they only needed more RAM and single thread performance in a server node.
Calxeda announced its second generation server SoC yesterday, the EnergyCore ECX-2000. Based upon the more powerful ARM Cortex A15, this new SoC should be able to deliver up to twice as much performance at 1.8 GHz as the ECX-1000 at 1.4 GHz and offer four times as much RAM (16 GB per node). Although we will not believe the performance claims until we have tested them ourselves, it is not impossible to speculate. Anand compared the Google Nexus 10 with the Samsung Galaxy Tab 3 8.0: the former has a Samsung Exynos 5 based upon a dual Cortex A15 at 1.7 GHz inside, the latter a very similar Samsung design, the 4212 based upon a dual Cortex A9 at 1.5 GHz.
Benchmark | A15 vs A9 |
Sun Spider 1.0 | 140% |
Mozilla Kraken | 176% |
Octane v1 | 168% |
It is impossible to estimate the performance of server SoCs by looking at browser benchmarks on tablet SoCs, but it gives us a rough idea of how much extra crunching power the A15 delivers. At 7-zip.com we can compare an A15 at 1.7 GHz (Samsung Exynos 5250) with an A9 at 1.4 GHz (Samsung Exynos 4412):
Benchmark | A9 | A15 | A15 vs A9 |
LZMA compression | 1200 | 2270 | 189% |
LZMA decompression | 2400 | 3560 | 148% |
As we posted before, the LZMA compression does have some similarities with typical server workloads. A Xeon "Sandy Bridge EP" 1.8 GHz scored 2793 with one thread, while an EnergyCore ECX-1000 at 1.4 GHz scored 833 according to our own benchmarking. So we can estimate that an ECX-2000 would probably score around 1600, or similar to a modern Xeon at 1 GHz. Not earth shattering, but when you start looking at power consumption these numbers start to make sense.
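The estimate above is plain proportional scaling, and it can be reproduced in a few lines. This is only a sketch under two assumptions that the text does not verify: that the A15-over-A9 ratio seen on the Exynos tablet SoCs carries over unchanged to the Calxeda parts, and that a Xeon's LZMA score scales linearly with its clock speed. The variable names are ours.

```python
# Figures quoted in the text (7-zip LZMA compression, single thread).
a9_exynos = 1200     # Exynos 4412, Cortex-A9 at 1.4 GHz
a15_exynos = 2270    # Exynos 5250, Cortex-A15 at 1.7 GHz
ecx1000 = 833        # ECX-1000 at 1.4 GHz, our own measurement
xeon = 2793          # Xeon "Sandy Bridge EP" at 1.8 GHz
xeon_ghz = 1.8

# Assumption: the tablet A15/A9 ratio applies unchanged to the server SoCs.
a15_vs_a9 = a15_exynos / a9_exynos
ecx2000_estimate = ecx1000 * a15_vs_a9

# Assumption: the Xeon score scales linearly with clock speed.
xeon_equivalent_ghz = ecx2000_estimate / xeon * xeon_ghz

print(f"A15 vs A9: {a15_vs_a9:.0%}")
print(f"Estimated ECX-2000 score: {ecx2000_estimate:.0f}")
print(f"Equivalent Xeon clock: {xeon_equivalent_ghz:.2f} GHz")
```

The result lands a little under 1600 and very close to a 1 GHz Xeon. Since the tablet comparison pits 1.7 GHz against 1.4 GHz while the ECX-2000 runs at 1.8 GHz, the real part could do slightly better, which is why we round up.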
Power
While the ECX-1000 needed 5W (1.1 GHz) to 6W (1.4 GHz) per SoC, according to Calxeda the ECX-2000 needs about 7 to 10W (1.8 GHz). This equates to about 2.5 W per 1.8 GHz core. The best low power Xeon, the Xeon E3-1230L V3, has 4 cores (with HT) at 1.8 GHz with a TDP of 25W, or around 6W per physical core. Even though we do not know exactly what kind of server performance the ECX-2000 at 1.8 GHz will deliver, the limited data that we have makes it very likely that the ECX-2000 is going to be very interesting from a performance/watt point of view.
Of course, the real challenge will be the newly released Intel Atom C2000. Let us compare the new Calxeda SoCs with Intel's second generation of Server SoCs.
Calxeda feels that the ECX-2000 at 1.8 GHz is a competitor of the C2530 1.7 GHz (2 GHz Turbo, 4 cores, 9W TDP). If we look at Intel's SKUs, we notice that the C2730 1.7 GHz (2 GHz Turbo, 8 cores, 12W TDP) might also be a close competitor. So we list the ECX-1000 (the previous Calxeda SoC), the ECX-2000 and the two closest Intel Atom competitors, with the older Atom S1260 as a reference. The "integrated" part is a bit short on details, but it is out of the scope of the article to discuss the different levels of I/O integration. We'll discuss that in a later article.
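The per-core figures are simple divisions, sketched below. Note the assumptions: we take the upper bound of Calxeda's 7 to 10W range, and we treat a vendor power figure and an Intel TDP as if they were comparable, which is only roughly true.

```python
# SoC power figures quoted in the text (vendor numbers, not our measurements).
ecx2000_watts, ecx2000_cores = 10.0, 4   # upper bound of the 7-10 W range
xeon_watts, xeon_cores = 25.0, 4         # Xeon E3-1230L V3 TDP

ecx2000_w_per_core = ecx2000_watts / ecx2000_cores
xeon_w_per_core = xeon_watts / xeon_cores

print(f"ECX-2000: {ecx2000_w_per_core:.2f} W per core")
print(f"Xeon E3-1230L V3: {xeon_w_per_core:.2f} W per core")
print(f"Ratio: {xeon_w_per_core / ecx2000_w_per_core:.1f}x")
```

At the lower end of Calxeda's range (7W) the gap per core would be even larger.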
CPU | Atom S1260 | ECX-1000 | Atom C2530 | ECX-2000 | Atom C2730 |
Launch Date | Q3 2012 | Q2 2012 | Q3 2013 | Q4 2013 | Q3 2013 |
Process Technology | 32 nm | 40 nm | 22 nm trigate | 28 nm | 22 nm trigate |
Cores / µ-Architecture | 2 + 2 logical (SMT), Saltwell | 4 physical, Cortex-A9 | 4 physical, Silvermont | 4 physical, Cortex-A15 | 8 physical, Silvermont |
Clockspeed | 2 GHz | 1.4 GHz | 1.7/2 GHz | 1.8 GHz | 1.7/2 GHz |
L1-Cache per core (D/I) | 24/32 KB | 32/32 KB | 24/32 KB | 32/32 KB | 24/32 KB |
L2-Cache | 2x 0.5 MB | 4 MB | 2x 1 MB | 4 MB | 4x 1 MB |
Memory controller | Single Channel 64-bit | 64-bit | Dual Channel 64-bit | 128-bit | Dual Channel 64-bit |
Fastest Supported RAM | DDR3 at 1.33 GT/s | DDR3 at 1.33 GT/s | DDR3 at 1.6 GT/s | DDR3 at 1.6 GT/s | DDR3 at 1.6 GT/s |
Addressing | 64 bit | 32 bit | 64 bit | 32 bit with LPAE | 64 bit |
Max RAM | 8 GB | 4 GB | 64 GB | 16 GB | 64 GB |
Integrated PCIe | Yes | Yes | Yes | Yes | Yes |
Integrated Network | No | Yes | Yes | Yes | Yes |
Integrated SATA | No | Yes | Yes | Yes | Yes |
Typical Server node Power usage | 20W (*) | +/- 8 W | 15-18W ? (**) | 12-16W ? (**) | +/- 20W (*) |
(*) Based upon Intel's "22 nm Intel Atom server SoCs Performance Overview"
(**) Rough estimates
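The "Addressing" and "Max RAM" rows are linked, and a short calculation shows how. The sketch below assumes the standard ARMv7 Large Physical Address Extension, which widens physical addresses to 40 bits. That figure comes from the ARM architecture, not from Calxeda's material, and the helper name is ours.

```python
GIB = 2 ** 30

def addressable_gib(address_bits: int) -> int:
    """Memory reachable with the given number of physical address bits, in GiB."""
    return 2 ** address_bits // GIB

print(addressable_gib(32))  # plain 32-bit addressing (ECX-1000)
print(addressable_gib(40))  # 40-bit physical addresses with LPAE (ECX-2000)
```

Plain 32-bit addressing tops out at exactly the 4 GB of the ECX-1000. With LPAE the architectural ceiling is far above 16 GB, so under this assumption the ECX-2000 limit comes from the memory configuration of the node rather than from the address width.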
Although the Atom S1260 had a TDP of only 8.5W, the power numbers were simply not comparable to the other SoCs as the S1260 needed more additional chips to perform the same tasks. In practice this means that a server node based upon the S1260 needs just as much power as one based upon the 12W TDP Atom C2730.
The performance/watt of the ECX-2000 SoC has probably not made a giant leap over its predecessor, but the overall server efficiency should improve significantly as Calxeda also implemented Energy Efficient Ethernet (EEE) and other tricks to reduce the energy consumption of the "Fleet Fabric". And the point is of course that the number of applications where the performance per node is "good enough" has increased significantly.
The Atom C2000 can support up to 64 GB, whereas the ECX-2000 is limited to 16 GB. The trade-off is that the C2000 uses up to 4 DIMM slots, whereas the ECX-2000 is limited to one. Obviously, more DIMM slots offer more flexibility but also make the server node larger and consume more energy.
The benchmarking team of Intel Portland did their best to produce some really interesting benchmarks at the last server workshop in San Francisco, but many of the benchmarks did not work well on the ECX-1000 due to the very limited 4 GB RAM capacity. The most interesting benchmark can be found below: a front end web performance benchmark with high network traffic.
In this benchmark, Intel finally admits that the S1260 is nothing to be excited about. The Intel findings are very similar to ours: the ECX-1000 beats the Atom S1260 by a wide margin in typical server workloads. So where will the ECX-2000 end up? We cannot be sure, but we can roughly estimate that it will be 3 to 4 times faster than the Atom S1260. That is not enough to beat the Atom C2750, but that is after all a 20W TDP chip and the top SKU. Digging deeper in the Intel docs, we find that the C2730 at 1.7 GHz (12 W TDP) consumes about 20W for the whole server node (16 GB and 250 GB HD) and the C2750 about 28W when running SPECint_rate_2006. The hard disk will have consumed very little, since the SPECint_rate_2006 benchmark runs entirely from memory.
The ECX-2000 at 1.8 GHz will probably need roughly 12-16W per server node. So our first rough estimates tell us that the C2730 is out of the (performance) reach of the ECX-2000, and that Calxeda's positioning of the ECX-2000 against the C2530 is right on the mark.
However, the story does not end there. The total power consumption of the ECX-1000 based Boston Viridis server we tested was remarkably low: the very efficient network fabric made sure there was little "power overhead" (PHYs, backplane, ...). This Fleet Fabric has been improved even further, so there is a good chance that the ECX-2000 based servers will offer a very competitive performance/watt, although the Atom C2730 has an edge when the application benefits from more threads. But when that is not the case, i.e. scaling is mediocre beyond 4 threads, the tables might turn. Anyway, there is a very good chance that the ECX-2000 is very competitive with the 4-core Atoms, to say the least.
There is indeed a reason why HP will use the Calxeda SoC in its new Moonshot server cluster in 2014. The picture above shows such a Moonshot module. We felt that the Atom S1260 SoC was a bad match for the HP Moonshot, but "HP's Moonshot 2.0" will be an entirely different story. And for those of us with less cash to burn, we are looking forward to what Penguin Computing and Boston will make of their ECX-2000 based servers.
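To see how much of each node's power budget sits outside the SoC, the quoted figures can be subtracted from one another. This is a rough sketch: it treats Intel's TDP as the SoC's actual draw, and for the ECX-2000 it pairs the low ends and the high ends of two ranges that are themselves only estimates.

```python
# (SoC power in W, whole-node power in W), Intel figures as quoted in the text.
intel_nodes = {
    "Atom C2730": (12, 20),   # TDP vs node power under SPECint_rate_2006
    "Atom C2750": (20, 28),
}
overhead = {name: node_w - soc_w for name, (soc_w, node_w) in intel_nodes.items()}
for name, watts in overhead.items():
    print(f"{name}: about {watts} W outside the SoC")

# ECX-2000: Calxeda's 7-10 W SoC figure against our 12-16 W node estimate.
ecx2000_soc = (7, 10)
ecx2000_node = (12, 16)
ecx2000_overhead = (ecx2000_node[0] - ecx2000_soc[0],
                    ecx2000_node[1] - ecx2000_soc[1])
print(f"ECX-2000 (estimate): {ecx2000_overhead[0]}-{ecx2000_overhead[1]} W outside the SoC")
```

Under these assumptions both Atom nodes spend about 8W on memory, disk and other components, against an estimated 5 to 6W for an ECX-2000 node.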
Next stop is the 64-bit SoC code-named “Sarita”. It is based upon the 50% faster Cortex-A57 core and is pin-compatible with the ECX-1000 and the new ECX-2000, which reduces development time and expense for the ODMs. But right now, we can look forward to some interesting microserver comparisons in Q1 2014...