Original Link: https://www.anandtech.com/show/9956/the-silver-lining-of-the-late-amd-opteron-a1100-arrival
The Silver Lining of the Late AMD Opteron A1100 Arrival
by Johan De Gelas on January 14, 2016 1:19 PM EST - Posted in
- AMD
- Arm
- Opteron
- Cloud Computing
- Opteron A1100
AMD announced their ARM server SoC plans at the end of 2012. At the beginning of 2014, AMD was ready "to accelerate the ARM Server Ecosystem with the first ARM-based server SoC" with a development kit. Around March 2014, the A1100 SoCs started sampling. But the quad-core dev kits were not only expensive ($3000!), they also had quite a few teething problems: performance did not meet expectations, some of the peripheral hardware did not work properly, and the software ecosystem was far from ready. We were expecting to see Opteron A1100 based servers at the end of 2014, but instead we got more than a year of almost complete silence. That was frustratingly long for anyone who was hoping that AMD would finally bring something competitive to the server world.
Today, AMD is finally announcing that their octal-core Opteron A1100 server SoC and platform is fully ready as "a right sized option for the edge of the datacenter". A few smaller partners are even shipping it, but there is no sign of tier 1 OEMs yet. Most people following this part of the semiconductor market are thinking "too little, too late". We are pretty sure that includes you, dear reader. But there is more than meets the eye, or we would not bother to write this article.
Cards On the Table
AMD is launching 3 SKUs and the spec table is available below.
From the early reports, performance is somewhere between 80 and 90% of the Atom C2750 (eight Silvermont cores at 2.4-2.6 GHz). Even if AMD has used the delay to tune the A1100 significantly, it is very unlikely that the chip will beat the Atom by any tangible margin.
The performance-per-watt ratio of the 28 nm ARM Opteron is not stunning either: a TDP of 32W at 2 GHz is worse than the 20W that the 22 nm C2750 needs at 2.4 GHz. Of course, TDP is only one thing; real power consumption at low and medium loads matters more for a server SoC. Nevertheless, we were promised an octal core at Atom C2750 performance levels with a 25W TDP.
And last but not least, the Atom has been available for more than two years now. Its successor, Denverton (the 14 nm Atom C3xxx), should arrive in the second half of 2016. But Intel is late too, as Denverton was also supposed to be on the market by early 2015. Unlike AMD, however, Intel is executing very well in all its other server markets. Given that the microserver market has grown much less than hyped and predicted, it is only natural that Intel prioritizes other products in its datacenter portfolio.
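To make the comparison above concrete, here is a back-of-the-envelope calculation using only the figures quoted in this article. It is a rough sketch under two stated assumptions: A1100 performance is taken as the early-report 80-90% of an Atom C2750, and TDP stands in for real power draw, which, as noted, says nothing about consumption at low and medium loads.

```python
# Relative performance per TDP watt, A1100 vs. Atom C2750.
# Assumptions: A1100 performance = 80-90% of the C2750 (early reports),
# and TDP is used as a proxy for power (real draw at partial load differs).

ATOM_C2750_TDP_W = 20.0  # 22 nm, eight cores at 2.4 GHz
A1100_TDP_W = 32.0       # 28 nm, eight cores at 2 GHz


def relative_perf_per_watt(rel_perf: float, tdp_w: float, ref_tdp_w: float) -> float:
    """Perf/W of a chip divided by perf/W of the reference chip (reference perf = 1.0)."""
    return (rel_perf / tdp_w) / (1.0 / ref_tdp_w)


for rel_perf in (0.80, 0.90):
    ratio = relative_perf_per_watt(rel_perf, A1100_TDP_W, ATOM_C2750_TDP_W)
    print(f"A1100 at {rel_perf:.0%} of C2750 performance -> {ratio:.2f}x its perf/W")
```

On TDP alone, this puts the A1100 at roughly 0.50x to 0.56x the performance per watt of the C2750.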
Overall, the A1100's 32W TDP looks high and performance expectations are low, so why bother? Pricing, perhaps? Unfortunately the answer is "no," as AMD told us that the initial price of the A1170 SoC will be $150. That is the same price as the Atom C2730 (eight cores at 1.7 GHz), which will probably perform at the same level, but the Intel chip has a 12W TDP. At $150, the A1100 even comes dangerously close to the quad-core, 8-thread Xeon D-1520 at 2.2 GHz ($199). Do not let the four cores fool you: we found that the octal-core (16-thread) Xeon D-1540 is no less than 5.5 times faster than an Atom C2750 in real server workloads. Take a look below.
The A1100 will probably score around 300, while the Xeon D-1520 will probably score beyond 600. So a Xeon D-1520 will still be at least twice as fast as an Atom C2750 or Opteron A1100. In short, the new AMD SoC has neither a performance/watt advantage nor a price/performance advantage over Intel's offerings.
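The price/performance argument can be checked with simple arithmetic. This sketch uses the article's own estimates (about 300 points for the A1100 at the A1170's $150, and "beyond 600" points for the $199 Xeon D-1520); both scores are projections rather than measurements, and 600 is treated as a lower bound.

```python
# Estimated benchmark points per dollar, from the article's projections.
# The scores are estimates, not measured results; 600 is a lower bound.

CHIPS = {
    "Opteron A1170": {"score": 300, "price_usd": 150},
    "Xeon D-1520": {"score": 600, "price_usd": 199},
}


def score_per_dollar(score: float, price_usd: float) -> float:
    """Benchmark points delivered per dollar of SoC list price."""
    return score / price_usd


for name, chip in CHIPS.items():
    value = score_per_dollar(chip["score"], chip["price_usd"])
    print(f"{name}: {value:.2f} points per dollar")
```

Even at its lower-bound score, the Xeon D-1520 works out to about 3.02 points per dollar versus 2.00 for the A1170, roughly 1.5 times the value per dollar.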
Feature Rich
But not all servers are compute limited. Quite a lot of server tasks are I/O limited. Think static web serving, reverse proxies (Varnish), in-memory key-value stores (Memcached), all kinds of network servers and "cold" storage servers.
Low End Server SoCs: Feature Comparison
| Feature | Opteron A1100 | Atom C2000 | Xeon-D |
|---|---|---|---|
| Max. RAM Capacity | 4x 32 GB RDIMM | 4x 16 GB RDIMM | 4x 32 GB RDIMM |
| PCIe | 8 Gen 3.0 lanes | 16 Gen 2.0 lanes | 24 Gen 3.0 lanes + 8 Gen 2.0 lanes |
| SATA | 14x SATA3 | 2x SATA3 + 4x SATA2 | 6x SATA3 |
| Ethernet | Dual 10 Gb | Dual 1 Gb | Dual 10 Gb |
| USB | Not integrated | 4x USB 2.0 | 4x USB 3.0 + 4x USB 2.0 |
With 14 SATA ports and two real 10 Gb Ethernet ports, AMD's A1100 is a great starting point for a storage device. Considering that quite a few storage devices now use a quad-core A15, which is limited to 4 GB of RAM (16 GB with PAE tricks), an octal-core A57 that can address 128 GB opens up new opportunities. The quad-core A1120 will do nicely, even though with its 25W TDP it might consume about two and a half times as much as Annapurna Labs' Alpine AL5140 SoC (quad A15 at 1.7 GHz), which needs around 10W. In a storage device with 16 disks, an extra 15W or so should not be a deal breaker, especially if you can offer more caching, faster encryption and higher overall performance.
The specs do not look bad for a caching server either, as 32 GB RDIMMs are less expensive per GB than 8 GB RDIMMs now.
Software Support, or Why it Took So Long
The other big question is of course why the A1100 took so long. The answer is actually pretty simple. Some of the building blocks, such as fine-tuned ACPI and PCI Express support for ARM CPUs, were not yet adapted to the server world, and AMD needed to wait for those to come along to give the A1100 a fighting chance.
Just look at the slide with software support and see the comment "supports ACPI and PCIe". That would look pretty odd on an announcement of an x86 server CPU, but it is relatively new for a 64-bit ARM server environment. You might ask yourself how our Applied Micro X-Gene server worked well with Ubuntu Server nine months ago. The X-Gene server ran a specially adapted version of Ubuntu. That is fine as a temporary solution, but unless the modifications make it into the mainline Linux kernel, each new version must be adapted again to work with your server. That is costly and time consuming, so AMD went the other way, making sure that the necessary improvements became part of the official Linux kernel.
For the Ubuntu fans: the A1100 runs Ubuntu 15.10. According to AMD, it is fully functional, but for the moment it is not supported by Canonical.
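Whether a stock kernel found the hardware through ACPI tables (the server-style route AMD pushed upstream) or through a device tree can be seen from sysfs. The sketch below is an illustration, not an AMD tool; it assumes the standard Linux sysfs locations `/sys/firmware/acpi/tables`, `/sys/firmware/devicetree/base` and `/sys/bus/pci/devices`, and the function names are hypothetical.

```python
import os


def firmware_interface(sysfs_root: str = "/sys") -> str:
    """Report how the running Linux kernel discovered the platform hardware.

    Assumption: /sys/firmware/acpi/tables exists when the kernel uses ACPI
    tables, and /sys/firmware/devicetree/base when it uses a device tree.
    """
    acpi = os.path.isdir(os.path.join(sysfs_root, "firmware", "acpi", "tables"))
    dt = os.path.isdir(os.path.join(sysfs_root, "firmware", "devicetree", "base"))
    if acpi and dt:
        return "acpi+devicetree"
    if acpi:
        return "acpi"
    if dt:
        return "devicetree"
    return "unknown"


def pci_device_count(sysfs_root: str = "/sys") -> int:
    """Count the PCI functions the kernel enumerated under /sys/bus/pci/devices."""
    path = os.path.join(sysfs_root, "bus", "pci", "devices")
    return len(os.listdir(path)) if os.path.isdir(path) else 0


if __name__ == "__main__":
    print("firmware interface:", firmware_interface())
    print("PCI devices:", pci_device_count())
```

The `sysfs_root` parameter exists only so the functions can be pointed at a test directory; on a real machine the defaults read the live `/sys`.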
... In the Cloud
In the early days of the ARM server hype, the term "micro server" was used a lot. Then the term became associated with "wimpy cores", and marketing people avoided it at almost any cost. But the word might make a comeback, as developers are writing more and more microservices: a way of breaking down complex software into small components that each perform a distinct task.
One of the cool things microservices make possible is scaling software horizontally very quickly when necessary, while running on instances/virtual machines/servers with modest resources when it is not. This helps a lot to keep costs down and performance high when you are running on top of a public cloud. In other words, public clouds encourage this kind of development.
Granted, at the moment microservices mostly run inside virtual machines on top of the brawnier Xeon E5s. But it is not too far-fetched to say that some of the I/O intensive microservices could find a home in a cheaper and "right sized" physical server. All these microservices need low latency network connections, because one of the disadvantages of microservices is that software components talk to each other over the network instead of exchanging data and messages in RAM.
And of course, webfarms moved to this kind of architecture well before the rise of microservices. Caching servers, static and dynamic web servers, and databases all run on separate machines. The distributed architecture of these webfarms craves fast, low latency networking.
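A common way to decide how far to scale out or back in is a proportional rule: resize the pool so that the average load per instance approaches a target. The sketch below is modeled on the rule documented for Kubernetes' Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); the article does not name any autoscaler, and the function name, limits and load figures are hypothetical.

```python
import math


def desired_instances(current: int, load_per_instance: float, target_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Proportional scaling rule: resize the pool so the average load per
    instance approaches the target, clamped to [min_instances, max_instances]."""
    if current < 1 or target_per_instance <= 0:
        raise ValueError("need at least one instance and a positive target")
    wanted = math.ceil(current * load_per_instance / target_per_instance)
    return max(min_instances, min(max_instances, wanted))


# Hypothetical numbers: two instances at 90% CPU with a 50% target scale out...
print(desired_instances(2, 90.0, 50.0))  # 4
# ...and four instances idling at 10% CPU shrink back to the minimum.
print(desired_instances(4, 10.0, 50.0))  # 1
```

The clamp matters in practice: the lower bound keeps a service available when idle, and the upper bound caps the cloud bill during a spike.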
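Why network latency matters so much here can be shown with a simple latency budget: every call between services adds a network round trip that an in-process call would not pay. This is an illustrative model with made-up numbers, not a measurement of any server in this article.

```python
def request_latency_us(service_time_us: list[float], hop_rtt_us: float) -> float:
    """End-to-end latency of a request that passes through each service in turn.

    Adds every service's own processing time plus one network round trip
    for each call between consecutive services (len(services) - 1 hops).
    """
    hops = max(len(service_time_us) - 1, 0)
    return sum(service_time_us) + hops * hop_rtt_us


# Hypothetical chain: web front end -> cache -> database, 50 us of work each.
chain = [50.0, 50.0, 50.0]
for rtt in (0.0, 20.0, 200.0):  # in-process call vs. two made-up network RTTs
    print(f"RTT {rtt:>5.0f} us -> {request_latency_us(chain, rtt):.0f} us total")
```

With these invented figures, the same 150 us of useful work becomes 190 us or 550 us depending on the network, and the gap grows with every extra hop in the chain.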
The Silver Lining
Remember our coverage of the first ARM based server, the Calxeda based server? The single-threaded performance was pretty bad, the pricing was too high and the 32-bit A9 cores limited each server node to a paltry 4 GB. But the low power, high performance network fabric and the server node technology delivered a pretty amazing performance/watt ratio and very low network latency. Calxeda's fabric came too early, as the ARM SoCs were simply not good enough at that time. An A15 based ECX-2000 was developed as a stopgap measure, but Calxeda ran out of money. But that was not the end of the story.
Yes, Silver Lining has bought up Calxeda's IP. The current offering is still based on the ECX-2000 (A15 cores). Once they adopt the Opteron A1100, the "Calxeda Fabric" will finally be freed from the old 32-bit ARM shackles.
We don't have to wait for a "fully clustered server", though. Silver Lining also has the FIA-2100 fabric switch available as a PCIe card. Basically, you can now have a Calxeda cluster, but at the rack level.
You buy one top of rack (ToR) switch (the light blue bar above) and 12 FIA-2100 fabric switches to make a cluster of 12 servers. You connect only one out of three servers to the ToR switch and interconnect the other servers via the FIA-2100 NICs. The (programmable) intelligence of the FIA-2100 fabric then takes over and gives you a compute cluster with very low latency, redundancy and failover at a much lower cost than traditional networking, just like the good old Calxeda based cluster. At least, that is what Silver Lining claims, and we give them the benefit of the doubt. It is definitely an elegant way to solve the networking problems.
The FIA-2100 NIC is supported on the new A1100 platform. However, it is not all good news for AMD and ARM. This technology used to be limited to ARM SoCs, but now that the Calxeda fabric comes as PCIe technology, it will also work with all Intel x86 servers. There is a good chance that the first "Calxeda fabric based cluster in a rack" will be powered by Xeon Ds.
We can assume, though, that Silver Lining's "non-rack" or "inside one server" product will most likely be A1100 based, as their current product is also ARM based.
So there is a chance that the AMD A1100 will find a home in its own "Moonshot-like chassis". A Silver Lining in the dark clouds of delays.
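The "one out of three servers on the ToR switch" layout can be reasoned about as a small graph problem: how many fabric hops does each server need to reach an uplinked server, and does everything stay reachable when a link fails? The article does not describe how the FIA-2100 cards are cabled to each other, so this sketch assumes, hypothetically, that each card links to its two neighbours in a ring, and uses a multi-source breadth-first search.

```python
from collections import deque


def ring_links(n: int) -> set[frozenset[int]]:
    """Hypothetical wiring (n >= 3): each FIA-2100 card links to its two ring neighbours."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def hops_to_uplink(n: int, links: set[frozenset[int]], uplinked: set[int]) -> dict[int, int]:
    """Multi-source BFS: fabric hops from each server to the nearest ToR-connected one."""
    neighbours: dict[int, list[int]] = {i: [] for i in range(n)}
    for link in links:
        a, b = tuple(link)
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {s: 0 for s in uplinked}
    queue = deque(uplinked)
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist  # servers missing from the result cannot reach the ToR switch


N = 12
uplinked = set(range(0, N, 3))  # one out of three servers: 0, 3, 6, 9
links = ring_links(N)
print("ToR ports used:", len(uplinked), "instead of", N)
print("worst-case hops:", max(hops_to_uplink(N, links, uplinked).values()))

# Failover check: cut the fabric link between servers 1 and 2.
degraded = links - {frozenset((1, 2))}
dist = hops_to_uplink(N, degraded, uplinked)
print("all servers reachable:", len(dist) == N, "worst-case hops:", max(dist.values()))
```

Under this ring assumption, the 12-server cluster needs 4 ToR ports instead of 12, every server is at most one fabric hop from an uplink, and cutting a single fabric link still leaves all 12 servers reachable. The real FIA-2100 topology and routing may differ.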
Conclusion
So how do we feel about the A1100? It is late, very late. The expected performance and power consumption are most likely not competitive with what Intel has available, let alone what Intel will launch in a few months. But at last, AMD has managed to launch a 64-bit ARM server SoC that has the support of all major Linux distributions and can benefit from all the progress the Linux community makes, instead of relying on a specially adapted distribution.
The most important pieces, such as ACPI and PCI Express support, seem to be working. AMD has paid a high "time to market" price for being a 64-bit ARM server pioneer: the A1100 schedule suffered from the teething problems of the ARM server ecosystem. Still, the A1100 might be a good way to finally kickstart the ARM server market. Thanks to the Linaro "96Boards Enterprise Edition", a $300-400 SoC + board should be available soon, making it much cheaper to build software for the 64-bit ARM ecosystem. Thanks to Silver Lining, complete clusters of A1100 servers might get the attention of the cloud providers.
This may pay off in the near future, on the condition that the K12 core is delivered in a timely manner (2017). Because at the end of the day, there are no excuses left for AMD or for ARM. If ARM servers are to succeed, they will finally have to deliver, instead of promising dreamt-up server market share percentages for an ever-receding future date.