
  • SarahKerrigan - Monday, April 12, 2021 - link

    I think everyone saw this coming - Nvidia's been hiring a bunch of server-focused design engineers for a while, and the apparent breakup with IBM (Power10 has no NVlink) pointed pretty strongly to an Nvidia server processor.

    So what's the core going to be? N2, or a future Neoverse V-series? If it's going to be in systems in early 2023, it's gotta be pretty far in development already.
  • gescom - Monday, April 12, 2021 - link

    Amazing.
  • mdriftmeyer - Monday, April 12, 2021 - link

    It won't be in production until 2025.
  • SarahKerrigan - Monday, April 12, 2021 - link

    Where did you get that idea? They said systems would be installed by early 2023.
  • eva02langley - Monday, April 12, 2021 - link

    First gen is Graviton CPUs by AWS.
  • SarahKerrigan - Tuesday, April 13, 2021 - link

    No, it isn't.
  • eva02langley - Tuesday, April 13, 2021 - link

    Jensen said it live... first instance is Graviton CPU this year.

    https://youtu.be/eAn_oiZwUXA?t=3935
  • SarahKerrigan - Tuesday, April 13, 2021 - link

    That's a separate thing. That is not the Grace CPU, which is specifically an HPC CPU for on-prem use arriving in 2023.
  • MetaCube - Wednesday, April 14, 2021 - link

    Are you trolling ?
  • mode_13h - Thursday, April 15, 2021 - link

    I, for one, am confused. "First gen" of what? Graviton 1 or 2?

    Because Graviton 1 used A72 cores. So, it's not even first gen of Neoverse.

    If the comment was about first gen of Neoverse, then Graviton 2 did use N1 cores. But, I have no idea what that even has to do with anything, much less the post it's written as a reply to.

    So, I think a short reply to a totally ambiguous claim is warranted. It's sufficient to call into question said claim, without getting sucked into a potential troll like I appear to be doing.
  • mode_13h - Monday, April 12, 2021 - link

    If you're looking at the roadmap slide, that's "Grace Next" -- its successor.

    From the article:

    "Both systems will be built by HPE’s Cray group, and are set to come online in 2023."
  • mode_13h - Monday, April 12, 2021 - link

    N2 gets my vote. With so much compute power in the GPU, they'd probably prefer a larger number of narrower and more efficient cores.
  • SarahKerrigan - Monday, April 12, 2021 - link

    I think that's likely, depending on what the timeline for N2 IP availability looks like.
  • name99 - Monday, April 12, 2021 - link

    Ah Sarah, you really think "everyone saw this coming"?

    In the world I live in, last week "everyone" was chattering about how Intel was already back, ready to crush it with Ice Lake Scalable Xeon 3D Turbo Supermax (or whatever the lunatic official name of the product is).

    I don't think you can coherently hold in one brain both the idea that "nV is making a serious play for the data center/HPC" AND "Intel will be just fine. Daddy's home and the bad times are over."
    I expect that in a week this nV announcement will be forgotten and we'll be back to the fantasy that Intel's all poised to deliver multiple rabbits out of multiple hats over the next few years.

    (Don't get me wrong. I'm not saying Intel will never sell another chip. I am saying that we're very close to peak Intel. Will it be 2020 or 2022? No idea. But it won't be 2025, I just don't see that happening. It's gone beyond just process issues now; now we're hitting design issues, product-market fit issues, almost fifty years of accumulated and never paid off technical debt.)
  • SarahKerrigan - Monday, April 12, 2021 - link

    I basically agree. I think we're reaching the point where x86 is going to stop being the unchallenged default for servers, which it has been since the early-mid 2000s. There will be a niche for it indefinitely - folks who need a piece of highly optimized existing software, or people who need a 4+ socket server for running MSSQL, or whatever - but the era where x86 is dominant, rather than one option out of multiple good ones, looks to me like it's probably over, and is extra-super-over for hyperscalers. Amazon already has its own server CPU, Google has announced they will soon, and Microsoft has been speculated to have its own server CPU program too. And beyond that, there's the merchant CPU vendors. This is turning into a bona fide ecosystem, even if it took a few false starts to get there. That doesn't mean it will stick around indefinitely; there's a lot of noise around RV, but right now, ARM is where the high-end IP is. That could change.

    There are things that could divert this - ARM failing to deliver improvements on the IP side, or AMD starting to execute above the pace they already have been - but I have a hard time imagining a 2025 server market that doesn't have some serious ARM share. Graviton - really Grav2, since Grav was never very relevant or really even good - was the beginning, but there will be others.

    You can count on it.
  • eva02langley - Monday, April 12, 2021 - link

    The thing is, AWS chips are not matching specific x86 chips in the majority of APIs, and even there, compatibility is a huge issue. And let's not get into security... I don't trust Nvidia and their proprietary technology on that front.
  • eastcoast_pete - Monday, April 12, 2021 - link

    In that regard, and purely as wild speculation, Intel could in theory (unlikely) pull a massive rabbit out of their hat with Sapphire Rapids, if (!) they (a) can deliver it at all, and on time, (b) get that on-package HBM design right, and (c) then integrate their high-performance Xe GPUs with it. At least theoretically, the parts are there, if they could only execute on them.
  • SarahKerrigan - Tuesday, April 13, 2021 - link

    Getting substantial HPC share in GPUs is hard, though, and is as much a software/ecosystem problem as a hardware problem - just ask AMD. Nvidia is the undisputed incumbent in that space, and I think it'll take more than a couple of wins at national labs for Intel or AMD to change that. It's a good start, but time will tell if either of them are able to follow through.
  • silverblue - Tuesday, April 13, 2021 - link

    +1 for Top Gun quote.
  • SarahKerrigan - Tuesday, April 13, 2021 - link

    It seemed appropriate in context~
  • mode_13h - Wednesday, April 14, 2021 - link

    One word: China. I'm sure they don't like ARM being under USA ownership, not that they were thrilled about its former Japanese owner.

    China can single-handedly build the market for RISC-V (or resurrect MIPS, etc). And if you think that doesn't matter to outsiders, look at who's dominating the global electronics market and even things like Wi-Fi routers. They could go top-to-bottom RISC-V, establishing it as at least a presence in virtually all markets and tiers.
  • Santoval - Tuesday, April 13, 2021 - link

    If Intel do not manage to take back the performance and efficiency lead in 2023/2024 (as they said they plan) then "peak Intel" was actually in 2015, when Skylake was released; btw, I am not referring to sales but to performance, efficiency and by and large advantages over the competition. I doubt they can recover after that. Since 2015 they have released for the desktop 5 or 6 (I lost count) minor variations of the exact same design fabbed on minor variations of the exact same process node, simply because they crapped the bed of their 10nm node and were too proud, stubborn or secretive to outsource production to third party fabs.

    In the laptop space they arguably did a bit better but not with Ice Lake; that was a new design which traded ~18% higher IPC for ~18% lower clocks (lol), so it was roughly as fast as its predecessor. Its Gen11 iGPU was newer and better but still semi-crappy compared to the Vega iGPUs of AMD. Intel's Tiger Lake (released in late 2020) was the first real threat to AMD, since it sported both higher clocks and a spanking new Xe iGPU which -surprise- was *faster* than the equivalent ancient Vega iGPUs (because AMD turned complacent and pulled their own Gen9 UHD crap with Vega..).

    Still, Intel as always played games (perhaps more due to low yields than deceptive marketing this time) with pricing and configurations, with the actually fast parts with big fat Xe iGPUs being very rare to find, and very pricey for those who did find them. The CPU cores of Tiger Lake normally beat the equivalent AMD APUs (I only mean the Zen 2 based ones, obviously; Cezanne can decimate them; so Intel's single thread lead lasted about an entire quarter..) but only in single thread performance, since Tiger Lake was capped to 4 cores.

    Since Intel (according to their new CEO) plan to surpass AMD in 2023/2024 they obviously do not think Alder Lake will be competitive enough, which is crazy to think about. Maybe its successor Meteor Lake will be (against Zen 4 or 5?), maybe not. Either way I feel that Intel's peak is behind them, not in front of them; unless of course Meteor Lake or its successor deliver..
  • mode_13h - Wednesday, April 14, 2021 - link

    It's a fantasy to think that Intel could ever just pick up the phone and order enough wafer capacity from TSMC to replace their own 10 nm fabs. Intel basically had no choice but to make their 10 nm work, to a reasonable extent.

    Going beyond that, they are embracing new options, and they seem to have been quick to move some of their GPUs off their internal production path.
  • JayNor - Sunday, May 2, 2021 - link

    Why would SPR not be considered a performance leader with DDR5 and PCIE5/CXL?
  • yetanotherhuman - Tuesday, April 13, 2021 - link

    Nobody cared about the new Intel CPUs. They're trying to play catch-up with AMD. Not that exciting.
  • CiccioB - Tuesday, April 13, 2021 - link

    Nobody, right, apart from those few thousand customers that order tens of millions of their chips.
    The others don't care, and buy AMD. You are right.
  • Linustechtips12#6900xt - Tuesday, April 13, 2021 - link

    I think that outside the sub-$250 range basically anyone should get AMD, but in the sub-$250 range they really only have the 3300X/3100 and 3600, unless you want something like a 3400G. I think that a 5600/3600 and a Navi GPU similar in size to the Vega GPU on the 3400G WOULD BE AMAZING at $250.
  • Linustechtips12#6900xt - Tuesday, April 13, 2021 - link

    and there we have it boys, the 5000 series of AMD APUs, YESSIRRRR
  • CiccioB - Tuesday, April 13, 2021 - link

    I haven't seen the 4000 series yet, and I doubt the 5000 will be more available than the previous one... AMD is really production-limited on 7nm. It bet everything on that node, but that was too much even for TSMC's production capacity, given that AMD is not the only customer there.

    Others have traded "high performance at lower power consumption" for "higher availability". And given the period we are witnessing, there is no doubt about who made the better bet.
  • Linustechtips12#6900xt - Monday, May 3, 2021 - link

    I don't think the 4000 series stayed OEM-only just because OEMs are special
  • Qasar - Tuesday, April 13, 2021 - link

    Most of those I know that are looking for CPU upgrades, including one with a 2600K, are ALL looking to go with a Ryzen 5000.
  • ChrisGX - Tuesday, April 13, 2021 - link

    >> I am saying that we're very close to peak Intel. Will it be 2020 or 2022?

    Some time in that timeframe, I expect. ARM-based devices, robots and systems are dominant in the new domains/sectors getting the autonomous intelligent systems treatment - motor vehicles, factory floors, etc. - and ARM seems to be taking business from x86 everywhere else, so there isn't such a clear growth path for Intel these days. Admittedly, cloud edge computing has a lot of growing to do, so if enterprises drag their feet the transition to ARM will take longer.

    I can't wait to see Intel and AMD bludgeoning one another for exigent sales in a declining market. The smart play for both, it seems to me, would be to go down the custom ARM course. Within two years, I think that apparently insane proposition will probably have become an imperative one. (I suspect Pat Gelsinger's recent statements and others he will no doubt make in the future will be overtaken by events that aren't under PG's or Intel's control.)
  • Linustechtips12#6900xt - Tuesday, April 13, 2021 - link

    "chrisGX" I totally agree with you but I kinda see why they keep competing in the x86 market, mainly due to Microsoft and what seems like since the lack of press reporting on it, not caring about porting windows to arm or just making windows 10x fully arm and then x86 can emulate arm. the first arm stuff they release is either gonna be mobile or server I'm betting on AMD releasing mobile Athlon chips in the next couple years with arm cores probably a Big little design to I would imagine.
  • mode_13h - Wednesday, April 14, 2021 - link

    > The smart play for both, it seems to me, would be to go down the custom ARM course. Within two years, I think that apparently insane proposition will probably have become an imperative one.

    AMD K12. Look it up.
  • dotjaz - Monday, April 12, 2021 - link

    N2/V1 were announced last year; you should expect products to appear late this year, so there's a good chance this would be the next generation.
    Although N2 was supposed to be later and seems to be ARMv9 or something very close (ARMv8.5a+SVE/SVE2+I8MM+MEMTAG+BF16), so it's still possible.
  • Yojimbo - Monday, April 12, 2021 - link

    I think it will be based on Arm's Poseidon platform. Why hold back? In the supercomputer space, AMD and Intel are going to be pushing hard for Genoa-Instinct and Granite Rapids-Xe systems. NVIDIA's memory bandwidth advantage will be a good selling point, but they will want a strong GPU to compete as even with CXL it may become lot harder to get their GPUs into supercomputers otherwise.

    Now in the data center it'll be a different situation. There NVIDIA's GPU is likely to be the key selling point and will likely be chosen the majority of the time no matter which CPU it's paired with, though for large AI models the bandwidth the GPUs will enjoy to system memory using the Grace CPUs will be a strong selling point.
  • ChrisGX - Tuesday, April 13, 2021 - link

    >> I think it will be based on Arm's Poseidon platform.

    I think that is likely, too. Poseidon IP is due in 2022 and Grace will be released sometime in 2023. Not a lot of time but probably enough to get everything done right (as long as Poseidon IP can be delivered to licensees by early 2022). The scope of Poseidon IP (or that part of it due in 2022) isn't that clear, though. Will the IP be for Neoverse N3 or V2 or both N3 and V2? And nothing Nvidia has said about Grace gives a clue as to what pieces Nvidia will be using to build it. The SPECrate numbers quoted by Jensen Huang didn't really help in making any educated guesses about what cores might show up in Grace. I do think Grace will be an ARMv9 (with SVE2) CPU, however.

    I can see this stinging Intel badly. I do not foresee any slow down of Nvidia in the supercomputer domain.

    https://www.anandtech.com/show/16073/arm-announces...
  • Yojimbo - Tuesday, April 13, 2021 - link

    EDIT: I meant "...they will want a strong CPU to compete..."
  • WaltC - Wednesday, April 21, 2021 - link

    People really need to back up and get a grip, imo...announcing things, and shipping things, are two very different sides of the same "coin." The "shipping" facet being the whole ball of wax, so to speak. The present situation with nVidia and AMD should illustrate that fact convincingly--both companies have announced powerful, capable new GPUs, and both companies have actually made a few products with specific and well-known capabilities and hardware--but so far neither company has been able to get close to actually making/shipping enough of those products to meet global demand for them even fractionally!

    And Grace isn't anywhere near the state of AMD's and nVidia's GPU development. Its problems are somewhat different: with the GPUs we know exactly what the hardware is, yet neither company is making anywhere near enough to meet even a fraction of the global demand. But Grace? As nVidia doesn't know what Grace is specifically there is essentially a 0 demand globally for "Grace" for exactly that reason--what Grace is or will be is anything but clear. We know almost nothing about it, except that it does not exist at the moment, and that nVidia has announced it intends to sell Grace, when and if nVidia succeeds in making *something* it will call "Grace." But once they make it, what is it, then? Obviously, we don't know, because nVidia doesn't really know just yet--beyond the broadest set of theoretical generalities. nVidia hasn't even decided what its actual capabilities are, because nVidia doesn't yet know specifically what the hardware will consist of. Generally, *everybody knows* what Grace will be if it is ever manufactured and sold--an ARM server CPU of nVidia custom design. But that's not really saying much, is it?...;) I'm baffled that this aspect of Grace isn't crystal clear to everyone at this stage.

    (Interjection--I like new stuff as much as the next guy--we *all* like new stuff! There would be something *wrong* with all of us if we didn't...;) But I much prefer *shipping* new stuff to new stuff which is still in the announcement/theoretical/embryonic design stage. I think that also applies to most people..;))

    But here's some speculation and some theory of my own. I will be very surprised if the UK regulators approve the sale of ARM to nVidia--very surprised. I think nVidia's bid to purchase ARM stems from a desperation JHH has about nVidia's future beyond the next GPU--GPUs that hopefully nVidia (and AMD!) will be able to ship globally the next time they announce global availability for *anything*! It's an act of desperation by JHH, as I see it. He's trying, in one incredibly expensive fell swoop, to create a rough equivalency between nVidia and AMD--nVidia would go from owning no CPU IP with an established, global market to owning ARM's IP and its extensive global market--and Apple, among others, would then find itself dancing to nVidia's tune--so it's no wonder it's an appealing idea for JHH. "If you can't build it yourself, then buy what someone else has already accomplished, and try and appropriate it to yourself," etc. I'm no fan of Apple, but I can surely understand why both Apple and Microsoft, among others, vigorously oppose turning the ARM world over to JHH & nVidia.

    Last, I read that there's a clause in the current purchase agreement that penalizes nVidia $1.25B if the deal collapses for any reason, apparently! So basically, nVidia's position is that even if the regulators in the UK nix the deal--which I think is all but inevitable--nVidia pays ARM (or Softbank) one point two-five billion dollars. Also, why has nVidia signed on for an additional $750M paid to ARM for "ARM IP"--when, one would think, as these things go, the payment for the ARM IP would be included in the ~$37B+ transfer of money and stock from nVidia to ARM/Softbank! Right? So what's the extra $750M for--bribes--kickbacks, greased palms? Even so, even if it's just a dodge to cover the funny-money siphoned off to the regulators to cover their "hard work" in approving the deal--that's a lot of money right off the top to let go in the *likely* event the UK regulators kill the deal. This deal has so much of the UK's national security tied up in it in a variety of ways that I really wonder why JHH is taking such a risk, which at least outwardly doesn't have a snowball's chance in Hades. That is why I think there's more than a bit of desperation behind the bid for ARM. And possibly more here than meets the eye--I'd bet on it.

    Last, about date speculation fixed specifically on Grace deployment... it's not wise to count chickens before they hatch. There's no guarantee that when Grace is finally finalized and actually produced, anyone will actually want it, imo...;) People should remember the Larrabee debacle--which Intel finally had to kill before production ever began. Speaking of dates and times, IIRC Intel was supposed to have shipped 10nm a couple of years ago, and so on. It's going to be interesting to see how all of this shakes out... lots and lots of 'announcements' everywhere... still, only genuine shipping products impress me, I don't care who tried to make them.
  • mode_13h - Wednesday, April 21, 2021 - link

    > so far neither company has been able to get close to actually making/shipping enough of those products to meet global demand for them even fractionally!

    See: crypto mining.

    > As nVidia doesn't know what Grace is specifically there is essentially a 0 demand globally for "Grace"

    Lol. What?? I'm sure Nvidia knows EXACTLY what Grace is. It's a specialized coprocessor for some of their datacenter GPUs. It has in-built demand, because you probably won't be able to buy their next-gen DGX/AGX systems without it.

    > "If you can't build it yourself, then buy what someone else has already accomplished, and try and appropriate it to yourself"

    How does that not apply to practically all big acquisitions? Like Intel's purchases of Altera, Habana, & Mobileye, and AMD's purchase of Xilinx, for instance? That doesn't mean they're not good strategy. Do you expect these companies NOT to make good strategic moves, for some reason?

    > So what's the extra $750M for--bribes--kickbacks, greased palms? Even so, even if it's just a dodge to cover the funny-money siphoned off to the regulators to cover their "hard work" in approving the deal--that's a lot of money right off the top to let go in the *likely* event the UK regulators kill the deal.

    Wow, such a load of BS. I'll bet it's not hard to find out what that's for, if you actually cared to know. It should be further elaborated in shareholder documents.

    > People should remember the Larrabee debacle

    What's your deal, man? Are you trying to short Nvidia? Good luck.
  • mode_13h - Wednesday, April 21, 2021 - link

    > People really need to back up and get a grip

    Yes. I can definitely think of at least one...
  • Rudde - Monday, April 12, 2021 - link

    >500GB/s LPDDR5X probably means 8 channels (512bit) of memory with a transfer rate of 8000MT/s or more.
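    A quick back-of-the-envelope check of that guess (the 512-bit total width and 8000 MT/s rate are the commenter's assumptions, not disclosed specs):

    ```python
    # Peak LPDDR5X bandwidth, assuming a 512-bit total interface
    # (e.g. 8 x 64-bit packages) running at 8000 MT/s.
    bus_width_bits = 512
    transfer_rate_mts = 8000                        # mega-transfers per second
    bandwidth_gb_s = (bus_width_bits / 8) * transfer_rate_mts / 1000
    print(f"{bandwidth_gb_s:.0f} GB/s")             # 512 GB/s, just above the quoted >500 GB/s
    ```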
  • txguy - Monday, April 12, 2021 - link

    LPDDR5X doesn't have native ECC support. Curious to know why they went with LP5X and not D5.
  • Ryan Smith - Monday, April 12, 2021 - link

    Energy efficiency. Plus they don't need DIMMs since it's soldered-down.

    DDR is great, but it's designed first and foremost for servers where you need massive memory capacity and the ability to swap DIMMs to meet different needs. It's not very power efficient, which is why LPDDR was created.
  • eastcoast_pete - Monday, April 12, 2021 - link

    Ryan, any mentioning of what the maximum RAM configuration will be?
  • Ryan Smith - Monday, April 12, 2021 - link

    It has not been disclosed.
  • mode_13h - Monday, April 12, 2021 - link

    Doesn't seem like it could be that much, if they're going to use the same card form factor.
  • SarahKerrigan - Tuesday, April 13, 2021 - link

    LPDDR packages max out at 16GB today, IIRC, so 128GB today assuming the released mockup image is accurate and it has 8 packages - maybe 256GB by the time of release? Not great, not terrible.
  • Yojimbo - Wednesday, April 14, 2021 - link

    256 GB per module would be 1 TB per node. The Summit supercomputer has 512 GB per node, with 6 GPUs per node instead of 4 like NVIDIA's proposed DGX-next node. The upcoming Aurora supercomputer is supposed to have about 1.1 TB of memory per node, including HBM memory. NVIDIA's DGX-next will have at least 320 GB of HBM plus whatever LPDDR they have. Again, the Aurora machine will have 6 GPUs per node instead of 4. So the compute to memory capacity ratio for NVIDIA's plan does not seem to be limited. NVIDIA's announced plan is not to replace other CPUs in all applications, but rather to create a platform for large-scale AI processing.
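    For reference, the per-node arithmetic using the figures quoted above (the 80 GB of HBM per GPU is inferred from the "at least 320 GB" figure and is an assumption, not an official spec):

    ```python
    # Per-node memory capacity for a hypothetical 4-module DGX-next node,
    # using the numbers discussed in this thread.
    lpddr_per_module_gb = 256    # speculative max LPDDR5X per Grace module
    hbm_per_gpu_gb = 80          # assumed per-GPU HBM (4 x 80 = the 320 GB quoted above)
    modules = gpus = 4
    total_gb = modules * lpddr_per_module_gb + gpus * hbm_per_gpu_gb
    print(total_gb, "GB per node")   # 1344 GB, vs ~512 GB (Summit) and ~1.1 TB (Aurora)
    ```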
  • JayNor - Sunday, May 2, 2021 - link

    Aurora will also use Optane. A dual socket Ice Lake Server supports up to 6TB of Optane, so I presume SPR will do the same.
  • eva02langley - Monday, April 12, 2021 - link

    However, when you deal with servers, security is a must. AI will deal with a lot of sensitive information; it is a huge mistake to design such a system without security.
  • JoeDuarte - Monday, April 12, 2021 - link

    Both LPDDR5 and LPDDR5X support ECC, at minimum link ECC.
  • AdrianBc - Tuesday, April 13, 2021 - link

    ECC does not need any support from the memory.
    In the case of modules, i.e. DIMMs, yes, the specification must include additional data pins to be used for ECC.
    However, LPDDR is not used in modules, and nothing prevents the board designer from adding extra chips for ECC.

    The real reason why many people are worried about ECC and DDR5 or LPDDR5 is that it is not yet clear for what data bus widths the memory chips will be available, besides those that implement a 32-bit channel width.

    If only 32-bit-wide chips were available, then either ECC would become more expensive, because of the need to add a more expensive extra chip than necessary, or ECC would become slower, if in-band ECC were used, as in some embedded Intel CPUs and in some GPUs.
  • dotjaz - Tuesday, April 13, 2021 - link

    What ECC support? ECC is a memory controller feature, not a memory chip feature. DDR5 has transparent on-chip ECC, but that's not "native" ECC because, to the system, it's not even ECC.
  • KFreund42 - Monday, April 12, 2021 - link

    A huge move that many predicted. However, I am surprised that they will use off-the-shelf Neoverse cores. That says a lot about Arm's roadmap!
  • eastcoast_pete - Monday, April 12, 2021 - link

    Those N2 cores aren't slouches, plus I guess they'll use a sizeable number of cores. That scalability is also a significant upside.
  • JasonLD - Tuesday, April 13, 2021 - link

    I think it will probably use Poseidon cores, not N2.
  • mode_13h - Monday, April 12, 2021 - link

    ARM's "off-the-shelf" cores are a lot faster than their custom Carmel cores.
  • eastcoast_pete - Monday, April 12, 2021 - link

    Do I interpret this correctly; the whole setup will use LPDDR5X? If so, it would open up the possibility of both CPUs and GPUs using TB of RAM; might well be worthwhile, as the ability to use memory far bigger than what is possible/affordable with VRAM probably more than makes up for slower memory speeds. Plus, working memory tends to have much lower latency than VRAM.
  • Ryan Smith - Monday, April 12, 2021 - link

    LPDDR5X for Grace, HBM for the GPU.
  • eastcoast_pete - Monday, April 12, 2021 - link

    Thanks, that makes more sense. I was wondering where NVIDIA would get the LPDDR5X chips from to allow for a wide-enough bus to reach TB per second speeds.
  • mode_13h - Monday, April 12, 2021 - link

    Just look at the slide, where they talk about the GPU having 8 TB/s of memory bandwidth.
  • CiccioB - Monday, April 12, 2021 - link

    It's the aggregate bandwidth, that is, the sum across all 8 GPUs.
  • CiccioB - Monday, April 12, 2021 - link

    It's 4 GPUs, not 8, sorry
  • mode_13h - Monday, April 12, 2021 - link

    They should've called it Hopper. That sounds a lot more lively and engaging than "Grace". Plus, it relates nicely to the way data would traverse hops in the mesh.
  • SarahKerrigan - Monday, April 12, 2021 - link

    I wouldn't be surprised if that generation's GPU is Hopper. (Unless they really end up naming it "Ampere Next", which I do not imagine will be the case.)
  • DanNeely - Monday, April 12, 2021 - link

    Hopper has been rumored as a future NVidia GPU for several years.
  • mode_13h - Monday, April 12, 2021 - link

    Grace and Hopper -- the thought *did* cross my mind, but it's almost too cute!

    If we're going down that road, how about if they named the CPU "Pierre", the GPU "Marie", and the mezzanine card or platform "Curie"!
  • mode_13h - Monday, April 12, 2021 - link

    > not all workloads are purely GPU-bound, if only because a CPU is needed to keep the GPUs fed.

    In fact, some layer types are actually run on the CPU.
  • mode_13h - Monday, April 12, 2021 - link

    > By 2023 NVIDIA will be up to NVLink 4, which will offer at least 900GB/sec of bandwidth between the SoC and GPU

    Really? Today, A100 GPUs have an aggregate of just 300 GB/s (per direction) between themselves and all their peers. So, I'm not sure they're proposing to suddenly go to 450 GB/s to a single CPU. But, maybe... since the CPU's claimed memory bandwidth is 500 GB/s, it wouldn't be absurd to have such a fast connection.
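    A rough sketch of the numbers behind that comparison (assuming the article's "at least 900GB/sec" is, like the A100 figure, a total across both directions):

    ```python
    # NVLink 3 on A100: 600 GB/s aggregate across its 12 links, i.e. 300 GB/s per direction.
    nvlink3_total_gb_s = 600
    nvlink3_per_dir = nvlink3_total_gb_s / 2       # 300 GB/s, as stated above

    # If NVLink 4's quoted 900 GB/s is also a bidirectional total and the link is symmetric:
    nvlink4_total_gb_s = 900
    nvlink4_per_dir = nvlink4_total_gb_s / 2       # 450 GB/s CPU <-> GPU per direction
    print(nvlink3_per_dir, nvlink4_per_dir)
    ```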
  • CiccioB - Monday, April 12, 2021 - link

    500GB/s is the aggregate memory bandwidth across all CPUs (there are 4 of them in the slide)
  • mode_13h - Wednesday, April 14, 2021 - link

    While 500 GB/s does sound high for a single CPU, 125 GB/s sounds low for a server CPU in 2023. I guess this is a bit of a special case, being on a card with the GPU, but it's specifically one where they're trying to optimize bandwidth.
  • CiccioB - Wednesday, April 14, 2021 - link

    Bandwidth requirements depend on the compute capacity of the chip and the amount of cache it has (as well as the type of computations being performed).
    Here we are not speaking of 125GB/s for a 96- or 128-core + HT chip (like those AMD will create next year). These chips have a much smaller number of cores and are not the ones that will process all the data, as today's x86 architectures have to.
    You are just evaluating this new system architecture with the same criteria you use to evaluate today's servers, which are based on the absolute centrality of the single CPU that carries out all the tasks, with every resource connected to that single CPU: network, storage, external accelerators like GPUs.
    That's why you say "a server CPU in 2023". This is not "a server CPU", this is just a (small) slice of the computing CPUs in a 2023 server, where more of them are going to work in parallel to do the work.

    From what I have understood, the idea is to create a system based on "chiplets", but not constrained to a substrate: spread across the entire motherboard. More chips working in parallel, all with their own local resources, communicating over a very fast bus. You can see the analogy with AMD's chiplets-in-a-package architecture.
    And this would allow linear scaling of performance, exactly as chiplets on a substrate do, provided you give them enough bandwidth (and energy) for everything. But as their number increases, it becomes difficult to feed them. This is the next scaling solution.
  • mode_13h - Thursday, April 15, 2021 - link

    Your whole discussion of the CPU's bandwidth needs misses the point, completely. The problem they're attempting to solve with this architecture is to reduce bottlenecks in GPUs' access of main memory. So, the CPU's memory bandwidth better be large, not so much for the sake of the ARM cores, but more by way of acting as a bridge, to that memory pool, for the GPUs.

    Sometimes it helps to take a step back and think, before launching into these verbose posts.
  • JayNor - Sunday, May 2, 2021 - link

    PCIe 5 is 32 GT/s.
    SPR has 80 lanes, so 80 x 32 x 2 / 8 (bidirectional) = 640 GB/s.
    If PCIe 6 happens in the 2023 timeframe, that doubles, though it requires PAM4 transceivers.
    Run CXL on top of that for the biased cache coherency.
    So, why go to NVLink?
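    Reproducing that arithmetic (raw link rates, ignoring encoding and protocol overhead):

    ```python
    # Aggregate PCIe 5.0 bandwidth for a hypothetical 80-lane Sapphire Rapids socket,
    # counting both directions, as in the post above.
    lanes = 80
    rate_gt_s = 32          # PCIe 5.0 per-lane transfer rate
    directions = 2
    aggregate_gb_s = lanes * rate_gt_s * directions / 8   # bits -> bytes
    print(aggregate_gb_s, "GB/s")                          # 640.0
    ```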
  • Raqia - Monday, April 12, 2021 - link

    The uncore is underappreciated in general, and their ownership of Mellanox will pay dividends in the plumbing of this server design.
  • mode_13h - Wednesday, April 14, 2021 - link

    It's funny to me how Nvidia has been building GPUs with a couple hundred SMs and a couple TB/s of memory bandwidth, yet somehow they need outside expertise to work out the interconnect fabric for a CPU with a fraction of that bandwidth? I have trouble seeing that.
  • Zizy - Monday, April 12, 2021 - link

    AMD and Intel are switching to PCIe 5, so total bandwidth should be comparable. But Grace still sounds interesting because it should have much better bandwidth to a single GPU, whereas PCIe 5 x16 is still a mere ~60GB/s per direction (~120GB/s cumulative). Also, considering all the "cumulative bandwidth" numbers, I wonder if NV will keep upload/download symmetry or move to, say, 10 links to the card and just 2 links from the card.
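    For comparison, the raw x16 figures per direction (protocol/encoding overhead is why the post above quotes ~60GB/s rather than the full 64):

    ```python
    # Raw x16 link bandwidth per direction for recent PCIe generations.
    for gen, rate_gt_s in [(4, 16), (5, 32), (6, 64)]:
        per_dir_gb_s = 16 * rate_gt_s / 8          # 16 lanes, bits -> bytes
        print(f"PCIe {gen}.0 x16: {per_dir_gb_s:.0f} GB/s per direction")
    # PCIe 4.0 x16: 32 GB/s, 5.0 x16: 64 GB/s, 6.0 x16: 128 GB/s per direction
    ```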
  • mode_13h - Wednesday, April 14, 2021 - link

    Can PCIe cope with mesh networks?

    BTW, in 2023 you'll likely see PCIe 6 and CXL 2 products introduced.
  • CiccioB - Monday, April 12, 2021 - link

    I have been wondering about this for quite some time now... how long will Nvidia take to put some ARM cores directly ONTO the GPU board (first) and then directly into the GPU die?
    That could easily offload a lot of tasks from the main CPU (which could then be weaker and support more GPUs at once, limited only by total I/O) and better exploit the GPU those cores belong to.
  • SarahKerrigan - Monday, April 12, 2021 - link

    Once upon a time, there was a lot of talk about Denver cores being integrated into at least some Maxwell family products. I don't know if it was ever an actual program in Nvidia, and if so, what ended up happening (though Denver performance was not amazing, and that may have played into it.)
  • mode_13h - Wednesday, April 14, 2021 - link

    Tegra X2 has Denver cores and Pascal GPU. That's probably what you heard about. It launched in like 2016.
  • SarahKerrigan - Wednesday, April 14, 2021 - link

    It's not. :) I was at the X2's Hot Chips presentation; I remember it in considerable detail, and would not mix that up.

    For a while, the rumor mill pointed heavily to Denver cores being integrated into either first-gen or second-gen Maxwell dGPUs to avoid the roundtrip back to the host for compute loads. Looking back about the articles about it, it looks like it was a combination of false rumors and probable misinterpretation of an Nvidia exec's comment that "the Maxwell generation will be the first end-product using Project Denver" (which was not the case, but was indicated by roadmaps for a bit.)
  • mode_13h - Thursday, April 15, 2021 - link

    Some of the Tegra chips shipped in multiple flavors that varied in terms of whether they used ARM-designed cores or Nvidia's custom cores, such as the K1

    However, according to this, the X2 featured 2x Denver + 4x A57 cores: https://en.wikipedia.org/wiki/Tegra#Tegra_X2

    If you need further proof, here's a discussion of 3rd party developers actually comparing the performance of the Denver cores to the A57s: https://forums.developer.nvidia.com/t/denver-2-vs-...

    Anyway, Tegra X1 was Maxwell-based. So, maybe the plan was to ship it with Denver cores, but they weren't done in time. One way or another, I really think those comments must've been referring to Tegra.
  • michael2k - Tuesday, April 13, 2021 - link

    Tegra was their integrated CPU/GPU part. I imagine if they wanted to flip the script and embed 8 CPU and 1024 CUDA cores per Big.Little they could.
  • CiccioB - Tuesday, April 13, 2021 - link

    I was thinking of those CPU cores as specialized "serial cores" for the GPU. Something like a core for each GPC, as the PolyMorph Engines are.
    Each of those cores could be a "master" manager for some work (like resource handling, thread management, wave organization in place of fixed HW schedulers) or be a slave core, like an RT core, for example. Or have two, one for each role.
    That could improve GPU execution of serial code (which GPUs are weak at) and allow better management "on die" instead of in the CPU drivers (which sit on the other side of the PCIe bus, with limited bandwidth and high latency).
  • mode_13h - Wednesday, April 14, 2021 - link

    Grace is their solution. Upgrade the CPU-GPU interconnect not only for better cooperation, but also giving the GPU better access to CPU memory.

    Plus, it's not like there's die space to spare, on a GPU.
  • eva02langley - Monday, April 12, 2021 - link

    This was an hour and a half of pure garbage. Nvidia saying they will do things, and in the end they are just selling GPUs.

    -AVs have a long way to go and VIDEO is NOT the answer
    -The BMW video game was nothing more than a pet-project joke
    -Nvidia is creating a CPU... but uses AWS Graviton CPUs... which are nothing beside SageMaker for AI...
    -Maxine.... euuhhhhhh.... no comments....

    An hour and a half of throwing dirt at the wall expecting something to stick... and the dumb stakeholders bite on it. It is disgusting.
  • anonomouse - Monday, April 12, 2021 - link

    ...who hurt you?
  • CiccioB - Tuesday, April 13, 2021 - link

    simply... envy
  • gescom - Tuesday, April 13, 2021 - link

    This.
  • CiccioB - Tuesday, April 13, 2021 - link

    Ahahaha, it appears some fanboy here got hurt by a presentation of future *revolutionary* products that are going to make AMD's (and Intel's) future "20% better performance each generation" look pitiful.

    They sell GPUs, SoCs, Network devices and... this may surprise you... also SW!!!
    And they just announced they are going to sell CPUs. Yes. Pure CPUs.
    Is all this thought to support their core business GPUs? Of course it is.
    Is all this allowing them to create better products? Of course it is.
    Is all this going to further hurt the x86 market, which already lost the race for the HPC market many years ago, like Thor's hammer? Of course it is!

    And if they can really deliver Grace + Ampere Next in 2023, they will just roll over (like a bulldozer (cit.)) all of AMD's and Intel's proposals for their exascale supercomputers (which are still quite far from being achieved, SW support included).

    In 2023, if they deliver what is on that roadmap, they can finally demonstrate that ARM CPUs can take the place of x86 CPUs in any high-performance computing work, making 32, 64, 128, 256 or whatever number of x86 cores in a single package (with the related 200, 300, 400, 500W power consumption and still pitiful bandwidth per core) not really interesting anymore. The same work will be done with less power-hungry HW and much better balanced/parallel resources.

    They can definitely destroy the CPU-centric architectures we have seen so far, where everything has to be handled by the main CPU cores. Work distribution (with linearly scaling performance) can be achieved, making beefy, centralized, I-do-all-the-management-work-with-bandwidth-issues CPUs useless.

    They are developing what they have invested in: parallel computing devices connected by high-bandwidth networking. That's the point of the $7 billion purchase of Mellanox.

    If you have not understood this, then yes, Nvidia's presentation is just vaporware.
    Much better are AMD's and Intel's presentations about how they are going to get 20% more performance from their architectures while using the latest, most advanced, most expensive and most production-limited process node.

    See ya in the future.
  • RanFodar - Tuesday, April 13, 2021 - link

    Okay. You can shove it in my face that by 2023 ARM will take over the CPU market...

    But aside from speculation, do you really believe that at that point Intel and AMD will be bulldozed by Nvidia's first server CPU? I know ARM has a lot of advantages over x86, but don't tell me that their launch will take over the market with the first swing. Besides, we don't really know what the future holds for Intel and AMD anyway. They still have a long way to go, and ONLY time will tell.
  • CiccioB - Tuesday, April 13, 2021 - link

    Where have you read "at launch time ARM will take over the entire CPU market"?

    If (and it's an if) Nvidia keeps its promises on these CPUs, they will demonstrate that ARM has the real potential to become a main player in the high-performance computing market, and the potential to gather many more developers and development resources. It's the same thing Apple just did with the M1, where they broke developers' exclusive support for x86 applications. Now developers have to think about creating applications for both worlds, which is already a great achievement and goes well beyond the actual performance of the M1 itself. Even Adobe has converted its big tanks to ARM, something they never did for Windows on ARM, for example.
    Microsoft is not Apple, and I have always had doubts that they really wanted to go against Intel by promoting and developing their Windows for ARM, but Apple just did it, demonstrating that x86 is no longer the king, nor the only option for a powerful machine.

    If Nvidia can gather the same spirit in the server market (where ARM is used only in private and custom contexts), and with more and more developers writing optimized code (and frameworks, and libraries, and anything else needed) for the HPC market, x86 could soon lose the granite-solid position it has kept for the last 20 years. A position that has given it great inertia, since lots of SW is written and optimized just for x86.

    Nvidia is taking a different route from any previous CPU designer: the CPU is not the center of everything, but just a node in a bigger mesh, with little importance with respect to the whole. Communication and data distribution are now the important aspects, not just how many cores a die encloses and how fast it can process data by itself.
    For sure, this could be conceived and achieved only by a company that does not have a direct interest solely in the CPU market, as Intel and AMD, but also IBM, have (had).
    Parallelism at the system architecture level, not only inside a die (be it a CPU or a GPU).
    This is the revolutionary vision Nvidia has just disclosed. And they have been preparing for it with products covering all the needed parts: CPU, GPU, faster buses and, last but not least, networking. They are in a position to dwarf the importance of (x86) CPUs in distributed servers.
    Intel and AMD have to be quite worried about this new scenario, as it not only puts them in the second row in terms of importance for a truly scalable system, but it also opens the door to new actors that will be more than willing to take their fair share of the big cake that belongs to Intel (and, just lately, as a hope to improve its dire situation, AMD).
    More money diverted from x86 development = more money invested in creating alternatives.

    Nvidia will support them all. Their aim is to sell as much as they can, and whether that is on x86 or ARM solutions is the same to them. But they know this move is going to weaken their historic rivals, especially AMD, which takes a double hit (CPU and GPU, where the latter has not been in the professional market for years and they have only just started developing something for it).

    And as I said back when Intel lost to ARM in the mobile race (that it would bring much more investment to other fabs, making them more and more competitive; at that time Intel was on 22nm against 45nm for ARM, or 32nm for bold designers like Qualcomm, which would erode the process advantage compensating for the awful and obsolete x86 architecture), now that TSMC can bake better dies than Intel and more and more CPU designers use much more efficient architectures, it becomes quite difficult for Intel to keep its monopoly, despite the fact that they have enormous engineering capacity and can still deliver better products through advanced buses, in-die connections and packaging that others cannot benefit from until later years.
  • Silver5urfer - Tuesday, April 13, 2021 - link

    So much FUD and BS.

    How many comments here are singing this whole ARM nonsense? Especially for all you guys I have one question: does ARM improve "YOUR" computing abilities over x86??

    No, it doesn't. You cannot find a DIY ARM machine, nor a high-performance PC, and the M1 Mac got whipped by the Ryzen 4000U-series processors. Once the Zen 3 based parts in the low-TDP range launch, it's going to be shoveled hard. And Alder Lake is where Intel bet more money for laptops, since it's 10nm for one, and two, it's big-little trash on top of Intel Wi-Fi + 5G + thin-and-light with Win10 out-of-box compat. Intel targeted Tiger Lake 10nm over RKL for many reasons, volume and profits definitely among them.

    And now, can Graviton2 be owned by individuals? Nope. Fujitsu A64FX? Nope. Ampere Altra processors? Nope. Marvell? Nope. So what does ARM provide for you guys to shill so damn hard and spell doom about x86? On top of that, in Android land the OEM controls the entire stack, top down. You don't get blobs from the OEMs, the HW cannot be upgraded or modified in any part. Plus BL locks on top, so you don't even own the HW to any extent; with centralized app stores, control-freak Google and filesystem limitations, what do you actually own?

    Yeah, nothing. Remember, there's no full, proper market for ARM processors from the big OEM companies like Dell, HP, SuperMicro, Gigabyte and Lenovo that make the server racks, and on top of that, talking volume, ARM gets crushed into oblivion. Centriq was the last one purported to be revolutionary. Recently the Ampere processors, and now people are heralding the upcoming ARM-based Microsoft server CPU, that incapable Google Whitechapel ARM processor for smartphones, and their own ARM processor for servers. Every damn thing is centralized and they simply want to save money, so why do you guys love it so much?

    With x86 you can own a mini server beast, from all the old Xeon parts and used racks etc. on the market, with which people build complex homelabs and what not, to the latest Threadripper professional-grade workstation processors, which put insane PCIe lanes and power in your damn hands and let you install numerous OSes and VMs. Yeah, people build them with a Raspberry Pi too; it's superb for projects, but it's not going to replace an x86 machine. We could talk all day, and even more, on this aspect alone, and ARM would not come out leading anywhere.

    "Grace CPU OMG, ARM is going to take over the planet, and we are going to moon" right ?
  • mode_13h - Wednesday, April 14, 2021 - link

    > And now, can Graviton2 be owned by individuals? Nope. Fujitsu A64FX? Nope. Ampere Altra processors? Nope. Marvell? Nope.

    The general public can buy Ampere Altra servers from Gigabyte and Fujitsu A64FX from HPE. There's even a company selling Altra-based workstations.

    > in Android land the OEM controls the entire stack, top down.

    Why confuse ARM with Android? True fact: they even shipped Android for x86!

    Also, ARM runs on regular Linux, from little R.Pi to the big servers you mentioned.

    Your whole Android tangent appears to be a red herring.

    > "Grace CPU OMG, ARM is going to take over the planet, and we are going to moon" right ?

    Oddly, I agree with this point. IMO, what cores Grace uses are one of its less interesting aspects, and don't have any real bearing on ARM's broader trajectory in the server market.

    That said, the numbers don't lie: ARM's growth in the cloud is substantial and only looks to be accelerating. The decline of x86 will be the one of the big computing stories of this decade.
  • Silver5urfer - Wednesday, April 14, 2021 - link

    Okay, I didn't know the A64FX could be bought, but is a $50K A64FX rack from HPE considered obtainable for the general public? Look at what I mentioned. I asked people about "you" and the homelab. That's where the ARM question comes in. What do you feel like x86 HW is lacking. Looking at Homelab and NAS SOHO. EPYC Rome can really be purchased and run with a vast ecosystem on top, if they have deep pockets. Usually Homelab crowd uses used Xeon parts. Which speaks for itself.

    Android x86, yes they did. But my point was: can the ARM HW that is already in circulation and owned by many run anything else, or be used with a computing OS such as Linux or Windows Server? They cannot. Qcomm and Exynos decide what consumer gets. And why would x86 HW even run Goolag-locked Android? I know F-Droid exists but it's more of a hobby thing.

    The Pi runs Linux and so do all the ARM boards. The Pi is super customizable, which is not just great but stellar. However, the power it has cannot compete. So it's more of a fun project with educational options than a PC use case, homelab, or render rig. But the thing is, how do they improve your experience over x86, which everyone here shills for?

    As for the ARM-is-the-future race: the latest market trend is AI. ARM says they are building CPUs and HW around it for the next decade. AMD said x86 is their future and with the Xilinx M&A they are going to put that to use with FPGAs, which will change more. AMD is confident in that. As for Intel, will they kill their own x86 for ARM? They are targeting thin and light (ARM's turf) with a big-little approach.

    ARM is increasing more because of AWS than anyone else, since Amazon simply wants to save money and everyone wants vertical control. Control freaks. We will see how far it goes, of course.
  • mode_13h - Thursday, April 15, 2021 - link

    > is a $50K A64FX rack from HPE considered obtainable for the general public?

    Affordability and availability are two different things. A64FX is a specialized chip that will only really benefit a few with specific workloads. It's an important milestone in ARM's progression, but I see it as a digression from the broader point.

    The more relevant data point, for people with the deep pockets to afford 64-core workstations and servers, is really Altra. However, that's not me. I am upgrading my home server to a Ryzen 5000, in the coming months.

    Even if I could afford an Altra machine, I'd probably wait at least until the N2-based CPUs are out, before moving to ARM. Altra/N1 are basically A76 cores. But, the bigger issue for me is that I still care about single-thread performance and really don't need so many cores.

    > Usually Homelab crowd uses used Xeon parts

    BTW, the other server-ish machine in my homelab is an i3, because most of them enable ECC RAM. I also have a E5 Xeon-based workstation.

    > What do you feel like x86 HW is lacking.

    It's a question of competitiveness. ARM offers better performance-per-area (PPA) and therefore more performance per $. It's also more energy-efficient, due to having a simpler instruction encoding and more architectural registers. And this gives it a slight edge on performance, due to enabling a wider decoder.

    For mobile and datacenter, energy efficiency is key. Also relevant is cost. And by offering better PPA, you can afford to fab it on newer nodes, since ARM chips with the same core counts and IPC will be smaller than x86 counterparts. And newer nodes confer an additional performance and energy-efficiency advantages.

    So, it's really a case where a whole lot of benefits are derived from a few, key aspects of the ISA.

    > Qcomm and Exynos decide what consumer gets.

    Okay, but that's an issue with Qualcomm and Samsung, not ARM. Several ARM SBC's run generic desktop Linux distros built for AArch64.

    > But the thing is, how do they improve your experience over x86, which everyone here shills for?

    I don't think anyone is saying that there's yet an ARM-based answer to the mainstream PC. I'm certainly not.

    I guess the new Mac Mini could be a good option for the sort who are content to use a NUC-class machine, but I'm allergic to Apple for so many reasons that even if Linux is fully-supported on those things, I wouldn't even touch one.

    > AMD said x86 is their future

    They have to *say* that, even if they're already deep into ARM, RISC V, or whatever. To announce an ARM-based initiative would be an acknowledgement that they're not fully invested in x86. It would create doubt in the minds of both existing and prospective customers about how long AMD will continue to offer leading x86 server products. So, maybe they go with Intel, instead.

    Also, it's free advertising for ARM. So, maybe customers would just opt for an ARM CPU that's available today, like Altra, instead of waiting to see what AMD comes up with. If they feel like ARM is an inevitability for them, they might just want to get it over with and embrace that ecosystem.

    So, AMD needs to wait until either the market is at a tipping point, or until they're nearly ready to launch their own new ARM chips, before they'll announce. And they're probably not going to jump into the ARM race while they still have such growth momentum with EPYC. Again, they don't want to confuse the market or cannibalize that growth.

    I'm not saying it's 100% that AMD will go with ARM, but it's got way more traction than anything else. The main source of uncertainty, in my mind, is Nvidia's ownership. That's got to make a lot of would-be adopters very nervous.
  • CiccioB - Wednesday, April 14, 2021 - link

    I can't understand your point.
    I can't tell whether you are defending the x86 point of view because you can't see beyond today, and tomorrow is already too far away for you.
    You are just comparing today's x86 situation against ARM, just as ARM is getting out of the mobile market it has been confined to for years.
    You are comparing an architecture with decades of support and optimization against one that has only just been born.
    And you say that the new one has nothing good to offer because the old one is better.
    Better at what? Consuming energy while, yes, having more cores/frequency/bloated dies, and pointing to a 20% improvement at each new generation?

    Despite the fact that Windows for ARM exists, the only thing that really makes the difference is the SW support, not the HW performance.
    From the HW point of view, your considerations are quite biased and really useless: the M1 has 4 high-performance cores, the Ryzen 4000U has 8 cores + HT.
    The power consumption is also quite different between the two. And guess which is better?

    But actual benchmark results aside, what you can't understand is that a breach has opened.
    For sure, the water spilling from it today is not the river behind the dam.
    But this breach is not to be ignored, just as the one that opened when ARM won the race in the mobility market was not to be ignored.
    That one allowed more money to flow to process nodes other than Intel's (which was the only one enjoying a scaling business), and that produced the situation we are in now, where Intel has lost its process leadership.
    This one may lay the last brick needed for a platform to become popular: SW support.

    What have you not understood that before M1 Adobe never made a version of their suite that was not x86? And I've seen may other competitors are creating their M1 version of their graphical suits (see Affinity suite).
    What have you not understood that with M1 developers that want something published in the Apple universe have now to develop for ARM as well (and in few months when even the biggest Macs shifts to ARM based CPUs probably only for ARM)?
    Can't you understand what this means? It means that more and more basic libraries, algorhitms, optimizations and choices will be made for ARM architecture(s).
    Today you only have professional (or mobile and embedded) developing framework for ARM. In few time you could have a suite like Visual Studio and all relative libraries being able to create a fully optimized ARM executable that runs for Linux or Windows and iOS.

    May this not happen? Of course it may not. Though the Apple ARM market will remain nonetheless.
    But for this not to become a bleeding nightmare for x86 historic (and last pillar) SW support, Intel and AMD have really to work hard. Not surely come up with those power sucking, pizza's size dies just for 20% more performance (and only in some tests) that the year before.

    You think that everyone needs 8 cores, 16 thread and 32GB of RAM to do to their jobs (especially at home) to say that ARM is not a threat?
    That goes againt the success of Chromebook (which can be ARM too).
    It's just a question of SW development and support before future ARM PCs can run Windows with whatever office suite they want + Abobe or any other photo retoucher + Blender + CAD + development tools and finally... <b>games</b>.

    If the next gaming consoles switch to anything not x86-based, the downward track for x86 will be definitive, and I doubt Intel or AMD can supply SoCs at a cheaper price than AMD is doing now to keep console designers interested in their solutions.
    It would prefigure a dumping scenario.
    If ARM becomes widespread (independently of its actual performance against 16-core x86 chips, which are owned by 0.01% of PC users), the future can be less easy for the x86 players.

    I just think that in a couple of years (or at most three) we will hear more announcements from Intel (and I believe first from AMD) about opening up towards ARM designs or even embedded compatibility (that is, x86 cores that can run ARM code as well, just so they don't have to admit that x86 has been completely, and I may say finally, rendered obsolete).
  • GeoffreyA - Thursday, April 15, 2021 - link

    CiccioB, even if ARM takes over, many of us enthusiasts will keep alive the memory of x86 in our hearts and nobody can erase that, not Apple, not ARM, not the latest, slick, up-to-date thing.
  • mode_13h - Thursday, April 15, 2021 - link

    > x86 cores that can run ARM code as well

    I don't see it. Apple achieved something like 70% of native performance with x86 code in emulation. That's good enough for most.

    Meanwhile, if you look at what would be involved in building hardware that natively executes both AArch64 code and x86-64, it would come at a significant disadvantage to one or the other, as well as an increase in costs. Once you consider this, emulation starts to look *really* attractive!
  • Jimbo123 - Monday, April 12, 2021 - link

    Intel has been prepared for ARM for the last 20 years, FYI. They were ready a long time ago.
  • michael2k - Tuesday, April 13, 2021 - link

    Funny, they only just failed last year, when Apple switched from x86 to ARM, and previously in 2007 when they turned down Apple's offer to design the iPhone's chip.

    Maybe they forgot to start competing because they assumed they were going to win?
  • RanFodar - Tuesday, April 13, 2021 - link

    What point are you trying to make, then? That Intel will not compete today and beyond?
  • CiccioB - Tuesday, April 13, 2021 - link

    They will probably compete (on other fronts, for example by promoting their own GPU tech as pervasive inside each of their CPUs), but for sure they were not ready for this and it will take time for them to recover. Whatever they decide to do today will take 3 or 4 years to come to light. And meanwhile they may have lost the capacity to lead the market wherever they want, as they do now.
    And if this has caught them by surprise (and I think it has, seeing the quite different approach Nvidia is taking, which requires them to completely review all their plans if they want to counter it), it will take even more time to decide on a strategy, trying to carry on with whatever they have now and have already planned for 2023, the year they should finally be able to make something with their 7nm PP.
  • mode_13h - Wednesday, April 14, 2021 - link

    Intel didn't forget to compete. They tried to enter mobile, but made the mistake of trying to use x86 as the solution to all problems: from mobile to IoT and HPC. And it failed at each one.

    With their building of dGPUs and purchasing AI accelerators and FPGAs, Intel finally seems to have gotten over the idea of x86 everywhere.

    In the next few years, they're going to have to provide a post-x86 vision, and I think they know that. They just can't do it too soon, or else they risk spooking customers. I'm sure they're hard at work on it, though.
  • Bagheera - Tuesday, April 13, 2021 - link

    did Intel "prepare" to lose the CPU race to
    Ryzen because they stuck in 10nm hell?
  • JayNor - Sunday, May 2, 2021 - link

    Intel persevered and as a result has fabs with a working 10nm process, while AMD punted and is in line behind Apple at TSMC, waiting for wafer starts. So, who is ahead?
  • m00bee - Tuesday, April 13, 2021 - link

    I work at a giant telco company that the US banned. I remember attending a training, around 2018, where my trainer said that Intel would benefit the most as telco equipment gets virtualized in the cloud. Later the ban struck, and our company needed to create a substitute for the Intel processors. ARM was selected and the rest is history. I guess Intel just had bad luck, bad timing, some missteps. But I still hope it can come back and put up a fight against AMD and ARM; we need competition.

    Sorry for my broken English
  • mode_13h - Wednesday, April 14, 2021 - link

    Your English is quite understandable.

    Thanks for posting! Always good to hear other perspectives!
  • mitox0815 - Tuesday, April 13, 2021 - link

    AI this, AI that. Where's my RTX 4090 ti Super I can't buy, nVidia?
  • KimGitz - Tuesday, April 13, 2021 - link

    It is great seeing so much competition and innovation in the semiconductors business.

    I’m predicting this Grace core will be based on ARM’s Poseidon platform, which is ARMv9 (Neoverse V2).

    I think that even though Nuvia was bought by Qualcomm and shifted focus from servers to ultraportables, their Phoenix design would have outperformed what NVidia, Intel and AMD will have to offer.

    It will be interesting to see how CXL will compete with NVLink, InfinityFabric and PCIe 6.0.
  • mode_13h - Wednesday, April 14, 2021 - link

    CXL uses the same PHY as PCIe 5, and CXL 2 shares it with PCIe 6, if I understand correctly.

    So, I think CXL is designed to supersede PCIe more than compete with it. Intel is firmly behind both.
  • JayNor - Sunday, May 2, 2021 - link

    I don't believe CXL 2 is tied to PCIe 6. It could run on top of PCIe 5. It just needs the PCIe 5 and above feature that enables the negotiation to substitute its own protocol.
  • Oxford Guy - Tuesday, April 13, 2021 - link

    'Previously NVIDIA has worked with the OpenPOWER foundation to get NVLink into POWER9 for exactly this reason, however that relationship is seemingly on its way out, both as POWER’s popularity wanes and POWER10 is skipping NVLink.'

    Why?

    I would think that, with waning popularity, the design's owner would try to hold onto any leverage it has for relevance.

    Was Nvidia charging some sort of fee to implement it?
  • mode_13h - Wednesday, April 14, 2021 - link

    You never saw POWER CPUs in a DGX system, did you? Most of Nvidia's customers don't *want* POWER CPUs.

    I think it was mainly something they did for some HPC customers, but POWER never really seemed to gain traction from it. So, it's not surprising to me if there was a lack of interest on both sides.
  • cskuoh - Wednesday, April 14, 2021 - link

    Hi, can anyone tell me where to find the reference for the following?
    "64 module Grace+A100 system (with theoretical NVLink 4 support) would be to bring down training such a model from a month to three days."
  • mode_13h - Wednesday, April 14, 2021 - link

    What are you asking for, exactly? The only source on that you're going to find is Nvidia's presentation. As these products don't yet exist, that was merely a projection they made.
  • EthiaW - Thursday, April 22, 2021 - link

    They said something about the acquisition matter was going to be disclosed at the event. Nothing actually came out; we can confidently say the deal must have blown up by now, as expected.
