41 Comments
Raqia - Tuesday, December 3, 2019 - link
I assume those are multi-core figures being quoted against the M5?
Andrei Frumusanu - Tuesday, December 3, 2019 - link
Correct.
Raqia - Tuesday, December 3, 2019 - link
So they're comparing a 24-core x86 Xeon to a 64-core Neoverse implementation.
PeachNCream - Tuesday, December 3, 2019 - link
But it can tow the rear-drive-only Xeon uphill with an only slightly obvious rolling start, so it's clearly better, at least until Intel requests Amazon send a system over for an "apples-to-apples" comparison.
andrewaggb - Tuesday, December 3, 2019 - link
Unless I'm misunderstanding something, it sounds like it'll have worse perf/$ than EPYC and not be x64.
shompa - Tuesday, December 3, 2019 - link
Not being x64 is great. Why use fake 64-bit extensions that need a 32-bit CPU core to work when you can use real 64-bit, remove the whole 32-bit CPU block, and save energy and die space?
scineram - Wednesday, December 4, 2019 - link
No.
kallinteris - Wednesday, December 4, 2019 - link
What do you mean by "use fake 64-bit extensions that need a 32-bit CPU core to work when you can use real 64-bit"? All modern x86 programs are compiled for 64-bit anyway.
vanilla_gorilla - Tuesday, December 3, 2019 - link
And probably at lower cost and power usage.
SarahKerrigan - Tuesday, December 3, 2019 - link
How do you figure? It says "per vCPU." A vCPU is a single thread.
Raqia - Wednesday, December 4, 2019 - link
It looks like they're running the multi-threaded benchmark across all vCPUs and then dividing by the number of vCPUs, so it's a renormalized multicore figure (a quick sketch of that arithmetic follows the links below):
https://zdnet1.cbsistatic.com/hub/i/2019/11/30/3d9...
from:
https://www.zdnet.com/article/aws-graviton2-what-i...
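For anyone puzzling over how a "per vCPU" number falls out of a multi-threaded run, here is a minimal sketch of the renormalization described above. All scores and vCPU counts below are hypothetical placeholders, not AWS's published data.

```python
# Hedged sketch of the "renormalized multicore" figure: run the benchmark
# across every vCPU, then divide the aggregate score by the vCPU count.
# The numbers below are made up for illustration only.

def per_vcpu_score(aggregate_score: float, vcpu_count: int) -> float:
    """Aggregate multi-threaded score divided by the number of vCPUs."""
    return aggregate_score / vcpu_count

# Hypothetical instances: a 64-vCPU Graviton2 box vs. a 96-vCPU m5 box.
graviton2 = per_vcpu_score(aggregate_score=1280.0, vcpu_count=64)
m5_xeon = per_vcpu_score(aggregate_score=1330.0, vcpu_count=96)

print(f"Graviton2 per-vCPU advantage: {graviton2 / m5_xeon:.2f}x")
```

The point is that the divisor is vCPUs, not cores, so an SMT-enabled Xeon (two vCPUs per core) and a Graviton2 (one vCPU per core) are normalized by different things.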
ksec - Tuesday, December 3, 2019 - link
2x performance per core over A1. I think that is finally reaching desktop/server-class CPU performance, possibly 70% of Skylake.
This is going to kick-start ARM server usage. Would love to see some benchmarks on those.
webdoctors - Tuesday, December 3, 2019 - link
There's no kickstarting ARM server usage. Amazon is MASSIVE. Basically, if they move their hosted services to their own internal ARM servers, ARM server usage could go from 0 to 50% overnight. It's like the Costco analogy: whatever Amazon uses internally for storage, networking, and CPUs would become one of the largest in the world overnight.
I heard their internal network chip is something like the 3rd biggest in the world by volume after the big guys, because they use so much networking hardware internally. Even without external customers they've got the volume to skew the numbers.
R0H1T - Tuesday, December 3, 2019 - link
>ARM server usage could go from 0 to 50% overnight
You're clearly exaggerating. Pretty sure Google+FB have a bigger share of the market driving server sales in the enterprise arena, and they aren't 50% combined, so AWS can't possibly be more than 50% on their own. Heck, YouTube alone might be consuming more storage+chips (server) than AWS.
ksec - Wednesday, December 4, 2019 - link
Well, it won't be 50%, but I know it is big. I missed the announcement; I thought this was going to be like their A1 instances, but it turns out they intend to have all of their SaaS offerings (DB, SMS, mail, DNS, whatever it is) running on ARM.
But the hyperscalers together (that is, Alibaba, Amazon, Google, Facebook, etc.) own 50%+ of market shipments, and AWS was estimated at 50% of hyperscaler volume, so that is likely 20-25% of Intel's DC revenue vanishing.
ksec - Wednesday, December 4, 2019 - link
Also, they have (I don't know where it was posted) given out the single-core performance as 30% faster than a Skylake 3.1GHz thread.
That is very impressive.
Operandi - Tuesday, December 3, 2019 - link
30B transistors? Isn't the Xeon chip they are comparing it to like half that?
If that's the case, that doesn't look all that impressive at all.
blu42 - Tuesday, December 3, 2019 - link
Being stuck in 14nm land is far more impressive, I agree.
Operandi - Tuesday, December 3, 2019 - link
Nice one. Nothing at all to do with my statement, but also completely irrelevant to the story as a whole.
Wilco1 - Tuesday, December 3, 2019 - link
Yeah, so you need not 1 but 2 expensive 24-core Xeon chips to get similar performance. Not impressive at all...
mode_13h - Tuesday, December 3, 2019 - link
Sure, Amazon could compare to what they *estimate* a 10 nm Ice Lake server chip could deliver, but that just adds more variables into the mix. There's value in comparing to a known quantity (i.e. a current instance, whether their own or a competitive one).
Anyway, such apples-to-apples comparisons will certainly be made once both types of instances are actually available.
name99 - Tuesday, December 3, 2019 - link
It's not meant to be "impressive", it's meant to be informative.
Some of us can put the number in context; for everyone else it's irrelevant.
Spunjji - Wednesday, December 4, 2019 - link
Who cares how many transistors it uses if a comparable Intel product *doesn't exist*?
phoenix_rizzen - Thursday, December 12, 2019 - link
Xeon: 1.5 MB of L1, 24 MB of L2, and 33 MB of L3 cache.
Graviton: ? MB of L1, 64 MB of L2, 32 MB of L3 cache.
24 cores vs 64 cores.
6 memory controllers vs 8.
48 PCIe lanes vs 64.
And so on. There are other blocks in the CPU die as well that aren't compared here (media, AVX, etc.). It's not that hard to figure out why one has more transistors than another.
bryanlarsen - Tuesday, December 3, 2019 - link
A vCPU is half a core on Intel but a full core on Neoverse. So 40% faster per vCPU is actually 30% slower per core.
Wilco1 - Tuesday, December 3, 2019 - link
44% faster per vCPU means 95% per core, since Hyper-Threading gives about 30% on average.
SarahKerrigan - Tuesday, December 3, 2019 - link
In terms of throughput, the correct comparison point for one Neoverse vCPU is two Purley vCPUs, because AFAIK a vCPU is added per hard context on EC2. Based on that, a Purley core is still considerably higher throughput than an N1 core.
I suspect single-thread is close, or at least would be at base clocks; their mature turbo implementation continues to be a strong point for Intel. I also expect the Graviton2 chip's perf/W to be far better than the Xeon it is compared to.
Wilco1 - Tuesday, December 3, 2019 - link
Sure, but the Neoverse cores are much smaller, so you get 2.5 times the cores. If you're interested in throughput you need to compare total throughput per chip rather than per core. According to the SPECINT score, a 24-core Skylake-SP gets only half the throughput of one Graviton2 chip, so you need 2 of them.
The Platinum 8175 AWS m5 instances have a 3.1GHz all-core turbo (https://en.wikichip.org/wiki/intel/xeon_platinum/8...), so getting ~95% of the single-threaded performance of the Skylake at its max turbo is pretty impressive!
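To make the thread-vs-core arithmetic in this sub-thread explicit, here is a rough sketch using the figures quoted above. The ~44% per-vCPU advantage and the ~30% SMT uplift are assumptions taken from these comments, not measured data.

```python
# Back-of-envelope: convert a per-vCPU ratio into a per-core ratio.
# Assumptions (from this thread): SMT adds ~30% throughput to a Xeon core,
# and a Graviton2 vCPU scores ~1.44x a Xeon vCPU.
smt_uplift = 0.30
per_vcpu_ratio = 1.44

# With SMT on, a core delivers ~1.3x its single-thread throughput,
# so each of its two vCPUs is worth about 0.65 of a full core.
xeon_vcpu_as_core_fraction = (1 + smt_uplift) / 2

graviton_core_vs_xeon_core = per_vcpu_ratio * xeon_vcpu_as_core_fraction
print(f"Graviton2 core ~ {graviton_core_vs_xeon_core:.0%} of a Skylake core")  # ~94%
```

That is roughly where the "~95% per core" figure comes from; drop the SMT assumption and you land back at the more pessimistic per-core comparison earlier in the thread.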
Antony Newman - Tuesday, December 3, 2019 - link
Ultimate multicore performance for single-SoC x86 is being limited by dark silicon on Intel 14nm.
For a 64-core Intel monster, they need their (Intel) 7nm process, or a multi-SoC solution.
When TSMC's 5nm ovens are ready, Amazon will be able to use ARM's next cores, which will close the per-core performance gap while allowing considerably more cores before bottlenecking occurs.
A 128-core Arm Poseidon SoC on TSMC 5nm could very well eclipse a 64-core Intel CPU baked on Intel 7nm, but cost Amazon a fraction of the price.
AJ
mdriftmeyer - Wednesday, December 4, 2019 - link
When TSMC's 5nm is ready, AMD's future Zen cores will curb-stomp anything ARM can offer, like they already do.
Language is a funny thing: "New Generation of ARM-based instances powered by AWS Graviton2 processors offer 40% better price/performance than current x86-based instances."
A. That's 40% over previous Graviton processor nodes. BFD.
B. Our upcoming x86-based instances drastically kneecap our current x86-based instances in price/performance, but we won't say that, as we're trying to sell our own schtick here.
Gondalf - Friday, December 6, 2019 - link
TSMC 5nm does not give enough of an area advantage over 7nm to allow Poseidon. 5nm is more like a half node.
mode_13h - Tuesday, December 3, 2019 - link
Just to nitpick, Purley is Intel's LGA 3647-based platform spec, not the core uArch or anything like that.
techbug - Friday, December 6, 2019 - link
How the per-vCPU number is calculated is totally over my head. Is it the total score on the Intel processor divided by the number of hardware threads (96, 2 * 48 threads/socket), compared against the ARM processor score divided by (128, 2 * 64 threads/socket)?
name99 - Tuesday, December 3, 2019 - link
How many people buying AWS services care about latency rather than throughput?
Sure, you need to hit a minimum per-core performance level, but once that's achieved, what matters is throughput/dollar (including e.g. rack volume and watts).
Judging a design like this by metrics appropriate to the desktop is just silly.
ksec - Wednesday, December 4, 2019 - link
It doesn't matter: you get 1 thread per Intel vCPU and 1 core per ARM vCPU. The units are the same. Not to mention a lot of clients and workloads like to have HT disabled.
As long as the ARM vCPU is cheaper (which it is) and provides comparable performance (which it does; according to AWS it is 30% faster than a single 3.1GHz Skylake thread), then that is all that matters.
Sychonut - Tuesday, December 3, 2019 - link
Sychonut - Tuesday, December 3, 2019 - link
Now imagine this, but on 14++++++.
name99 - Tuesday, December 3, 2019 - link
The numbers seem a bit strange, Andrei. I assume we all agree that, while this is a nice step forward in the ARM server space, the individual cores are no Lightnings.
So let's look at area; TSMC 7nm, so basically like with like:
IF one chip has 32 cores (per yesterday's article), then one core (+support, i.e. L3 etc.) is ~10mm^2.
Meanwhile Apple is about 16mm^2 (eyeballing it as about 1/6th of the die for 2 large + small cores, + L2s + large system cache).
So Apple seems to be getting a LOT more out of their die... Even putting aside the small cores, their area per big core (+LOTS of cache) is ~8mm^2.
Of course DRAM PHYs take some space, but mainly around the edges.
So possibilities?
- 64 cores on the die, not 32? AND/OR
- LOTS of IO? A few Ethernet PHYs, some flash controllers, some USB and PCIe?
- Lots of the die devoted to GPU/NPU?
The only way I can square it is that likely all three are true. Half the die is IO+GPU/NPU (which gets us to 5mm^2/core) AND there are actually 64 cores? WikiChip says an N1+L2 is supposed to be around 1.4mm^2 on 7nm, so throw in L3 and the numbers kinda work out.
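A quick numeric restatement of the area argument above. The ~320 mm^2 die size is only what the "~10 mm^2 per core at 32 cores" estimate implies, and the 50/50 split between cores and IO/GPU/NPU is an assumption, not a disclosed figure.

```python
# Rough area budget implied by the comment above; every input here is an
# eyeballed assumption, not a published spec.
implied_die_area_mm2 = 32 * 10.0   # 32 cores at ~10 mm^2 each => ~320 mm^2
io_and_accel_fraction = 0.5        # assume half the die is IO + GPU/NPU
core_count = 64                    # if the part is actually 64-core

cpu_cluster_area = implied_die_area_mm2 * (1 - io_and_accel_fraction)
area_per_core = cpu_cluster_area / core_count
print(f"~{area_per_core:.1f} mm^2 per core incl. L3/uncore")  # ~2.5 mm^2
# WikiChip's ~1.4 mm^2 for an N1 core + L2 leaves ~1.1 mm^2 for L3 and mesh,
# which is roughly how "the numbers kinda work out".
```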
ksec - Wednesday, December 4, 2019 - link
They are 32-core, not 64.
I/O takes up more space and does not scale well with node changes. Yes, there are a lot of I/O needs for a server, especially PCIe lanes.
Wilco1 - Wednesday, December 4, 2019 - link
The chip has 64 cores, 8 DDR interfaces, and 64 PCIe lanes.
I can't see the confusion about core count: a 48-core Centriq has 18 billion transistors, and this has 30 billion for 64 cores.
name99 - Wednesday, December 4, 2019 - link
name99 - Wednesday, December 4, 2019 - link
The "confusion" is that this article
https://www.anandtech.com/show/15181/aws-designs-3...
claimed 32 cores.
And it's not a confusion, it's an attempt to confirm various points that would appear to be obvious (the number of cores, the amount of IO, AND --- you left this out --- the amount of non-CPU logic [GPU or NPU]) but which were omitted by this article or, apparently, simply incorrect in an earlier article.
SanX - Thursday, December 5, 2019 - link
Would be nice if these chips had FP units and AVX to really compete with Intel and AMD at supercomputer level.