quorm - Tuesday, August 11, 2020 - link
Lol at that "graph". Hope they can make it real.

Spunjji - Wednesday, August 12, 2020 - link
I mirror this sentiment. It'll be interesting to see how the rest of the market looks when they have a product to show.

eoerl - Tuesday, August 11, 2020 - link
It looks to me like the focus on a single core is a little dishonest: Intel/AMD designs need to scale up to 64 cores, while Apple/Qualcomm ship 2-4 big cores, and the presentation from Nuvia is not super clear on that front. Remove some of the design constraints and doing better is almost a given, but then it needs to be presented as a sidestep, not a complete overtake. The curves read "look, it's possible to do better, we'll be there" while conveniently hiding a couple of parameters (core transistor count being another one, which will matter for competitiveness). Any competition is good and I hope that Nuvia delivers on that, but right now it looks like there are missing pieces in the argument.

Jaianiesh03 - Tuesday, August 11, 2020 - link
It's an architectural leap, you could say, but it's not that impressive. In 18 months the A15 will be out, and considering the A14 lands at about 1650 in GB5, the A15 will probably land at around 1950 (a 20% improvement). Commendable work, but Apple would be right on their heels.

Jaianiesh03 - Tuesday, August 11, 2020 - link
By the way, can someone clarify? The heading says a 40-50 percent lead over Zen 2, whereas the article says 50-100 percent. Because if it's just a 40-50 percent lead over Zen 2 in 18 months, Apple has them beat already.

Wilco1 - Wednesday, August 12, 2020 - link
How does that matter? We're talking server chips. Their main competition is next-generation server chips from Marvell, Ampere, AWS etc, and maybe AMD if Zen 3 turns out to be better than expected.

Kamen Rider Blade - Tuesday, August 11, 2020 - link
But even then, you have to compete against IBM's OpenPOWER, and POWER10 looks pretty damn good.
https://en.wikipedia.org/wiki/POWER10
Don't forget Cavium with its ThunderX# line.
Ampere's eMAG line won't sit on its laurels and will have a chip refresh at some point.
anonomouse - Tuesday, August 11, 2020 - link
Fairly sure that if you were to plop Power9 in this chart, you'd have to extend the graph a few miles to the right.

name99 - Tuesday, August 11, 2020 - link
POWER10 is great if you have lots of IO or need lots of throughput.

POWER so far has been sub-optimal on most single-threaded tasks (except for certain types of code with MASSIVE memory footprints), and I don't see POWER10 changing that.
The other ARM vendors are irrelevant at this performance level. They're tracking the ARM performance curve, so lagging Apple by about two years. Nuvia will track the Apple curve.
Maybe they'll compete by offering more, cheaper cores (the AMD vs Intel strategy)?
Spunjji - Wednesday, August 12, 2020 - link
Not sure about the claim re: ARM being 2 years behind. For a start, the S865 curve shows ARM's standard designs are lagging Apple by less than a year in efficiency terms. They're lagging further behind in absolute-maximum performance, but then that's something the X1 cores are set to resolve. They're an unknown quantity right now, though.

name99 - Tuesday, August 11, 2020 - link
Oh give me a break. Amazon, Ampere, Marvell have all managed to scale their cores to 64-, 80-, 96-core SoCs. And they run those cores at <~3W each.

What exactly do you imagine is so difficult about scaling up that Apple and Nuvia (both with extreme competence in the far harder task of designing a performant single-threaded CPU) will face difficulty?
The main thing your comment reveals is not Nuvia dishonesty but your ignorance.
Flunk - Monday, August 17, 2020 - link
That's not a good comparison: you're comparing chips designed for phones to ones designed for data centers. Ampere makes an 80-core ARM CPU that's a better comparison. There is no reason to think Nuvia will be launching 4-core parts. Because ARM is a RISC architecture, it doesn't need the more complex decoding hardware that an x86 CPU does, so there is no reason to assume these CPUs will have an uncompetitive number of cores.

MonkeyPaw - Tuesday, August 11, 2020 - link
I’m curious what AMD and Intel think of the graphs. Reading their blog, they admit that they can’t perfectly isolate the single-core power of competing products. Their graph of the 4700U shows one core reaching close to 12W max, which is 80% of the chip’s TDP. Maybe it’s possible, but the entire chip is a 15W part and has 8 cores, a memory controller, and an IGP.

As for whether they get there: they aren’t Apple or AMD or Intel. Apple alone has serious cash to throw major resources at this. I don’t know how big NUVIA is, but they do appear to have venture capitalists backing them. That can be some high-stakes survival. I hope they can pull it off for the sake of competition.
NextGen_Gamer - Tuesday, August 11, 2020 - link
@ MonkeyPaw - This is looking at single-core performance only, in which case, for AMD & Intel, the single core can go up to and beyond the TDP. As you stated, AMD is more honest about their TDP than Intel, so it looks like a single "Zen 2" core can hit about 12 watts of the 15-watt total (the rest being things like the IGP, memory controller, etc.). Once more cores are utilized, that wattage would go down along with clockspeeds, hence why single-core peak turbo is different from all-core turbo.

For Intel, it looks like they allow a single "Sunny Cove" core to ramp right up to & a tiny bit past the entire 15-watt package TDP, which means that the total platform package power is of course above that. It looks like that is the only way Intel is still beating AMD on single-core perf, though, because up to that 12-ish watts AMD is still slightly ahead.
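(For anyone who wants to sanity-check those package-power numbers at home: below is a minimal sketch, assuming a Linux box that exposes the standard intel-rapl powercap counters in sysfs, of sampling average package power. Note that RAPL reports per-package/domain energy, not true per-core power, which is exactly why isolating a single core involves estimation.)

// power_sample.cpp - rough package-power sampler via Linux RAPL sysfs.
// Assumes /sys/class/powercap/intel-rapl:0/energy_uj exists (Intel;
// recent AMD kernels expose a compatible interface). May need root.
#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

long long read_energy_uj(const char* path) {
    std::ifstream f(path);
    long long uj = -1;
    f >> uj;                      // cumulative energy in microjoules
    return uj;
}

int main() {
    const char* path = "/sys/class/powercap/intel-rapl:0/energy_uj";
    long long e0 = read_energy_uj(path);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    long long e1 = read_energy_uj(path);
    if (e0 < 0 || e1 < e0) {      // counter missing or wrapped
        std::cerr << "RAPL counter unavailable or wrapped\n";
        return 1;
    }
    // Energy delta over 1s = average package power in watts.
    std::cout << "Package power: " << (e1 - e0) / 1e6 << " W\n";
    return 0;
}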
Spunjji - Wednesday, August 12, 2020 - link
Just chiming in to note that I read this graph in the same way you did. Ice Lake is a 25W design scaled back to 15W, and most of the released products reflect that basic design.

anonomouse - Wednesday, August 12, 2020 - link
Worth noting that at the peak perf point, all 3 of Renoir, Ice Lake, and Skylake are at 4.1GHz, and within 100MHz of each other at the bottom. So the difference between the perf of Renoir and Ice Lake is primarily IPC, and this makes it even more interesting that they're riding essentially the same curve.

The other interesting aspect is that Skylake and Renoir have basically the same perf/frequency scaling, but the obvious differences in the power curve demonstrate both how much more power 14nm Skylake burns at the same frequency/perf/IPC and how much harder Intel is pushing 14nm to scale the frequency at the top (see how the power gap widens considerably).
name99 - Tuesday, August 11, 2020 - link
The Nuvia point is basically: when you buy your Intel part, Intel makes a big deal about 5.5GHz (or whatever it is), BUT
- you only get that top frequency for a few seconds
- on only one core.
Nuvia believes (and I agree) that this obsession with peak frequency is idiotic and ruins every aspect of the chip design. (I've explained why elsewhere.) What they will sell you is a chip that runs all cores at peak performance all the time, not a marketing gimmick.
So, yes, you won't be able to goose the core from 3W normal power to a 65W peak under weird, rare circumstances. In return, you'll get predictable performance, and a CPU upgrade every year or two rather than Intel's design machine that's ground to a halt because of the complexity of the circuits it's trying to design, coupled with a design flow that can't be heavily automated...
name99 - Tuesday, August 11, 2020 - link
If the US isn't willing to bankroll them, I suspect China will be more than happy to...

PeachNCream - Tuesday, August 11, 2020 - link
It's always good to have another competitor in a market where there are far too few participants, but until a finished or nearly finished product is available for third party testing, there is nothing much to say.

andrewaggb - Tuesday, August 11, 2020 - link
It would be great if they can release something that lives up to their claims in a timeframe that's relevant. The graphs are pretty meaningless until we can verify them ourselves. That said... if they can legitimately improve IPC by 50% for mixed/general workloads, that's pretty huge. Apple's been heading that way but seems unwilling to sell CPUs to the general market.

FreckledTrout - Tuesday, August 11, 2020 - link
Those are some rather bold claims, and frankly I am very skeptical, especially of comparing to low-power cores. I would gladly be surprised, but I think we are looking at marketing more than anything. Feels like someone needs additional funding.

Quantumz0d - Tuesday, August 11, 2020 - link
What is that graph, lol. Put proper HW silicon in the hands of Phoronix and STH, then we can talk; until then it's all BS. Also, GB5? Double lol.

Quantumz0d - Tuesday, August 11, 2020 - link
Headline is also misleading. It says "Zen 2" but lists only the 4700U in the article; it reads as if this magical unicorn is going to destroy the whole AMD Zen 2 uArch.

name99 - Tuesday, August 11, 2020 - link
There you are: https://browser.geekbench.com/processor-benchmarks
Doesn't change the overall point!
You do realize Nuvia considers this Intel/AMD rivalry to be silly nonsense, two dinosaurs fighting while they both ignore the asteroid heading towards them!?!
The only curve that matters as far as Nuvia is concerned is Apple's curve and Apple's business plans.

Apple pros:
- infrastructure (testing, engineers, money)
- existing customers

Apple cons:
- now locked into a design flow, and it's always hard to throw that all away and say "let's try something completely different"
- it's harder for Apple than Nuvia (though not impossibly hard) to move to a new instruction set.
If we assume Nuvia start at Apple performance levels, they can get a win of ~20% just by adding SVE2. Then another, what, 10%, by using ARMv9 rather than ARMv8. Then they get whatever you might expect Apple to get from a generational shift (20-25%), aided this year by the 5nm transition, so an easy 10% speed boost and a whole lot of density boost.
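(Multiplying those guesses out, just to see where the hand-waving lands; every factor below is the speculative estimate from the paragraph above, not a measurement.)

// compound_gains.cpp - compound the speculative per-step gains above.
#include <iostream>

int main() {
    double sve2  = 1.20;   // ~20% from adding SVE2 (guess)
    double armv9 = 1.10;   // ~10% from ARMv9 over ARMv8 (guess)
    double gen   = 1.225;  // 20-25% generational uplift, midpoint (guess)
    double node  = 1.10;   // ~10% speed from the 5nm transition (guess)
    std::cout << sve2 * armv9 * gen * node << "x\n";  // ~1.78x overall
    return 0;
}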
Meanwhile Nuvia have the flexibility that's harder for Apple of going to a completely new pipeline. In particular Nuvia may feel that the time has finally arrived for implementing some sort of KIP (kilo-instruction pipeline). A number of these have been proposed over the past 20 years, and the idea has been successively refined. Apple COULD retrofit various pieces of such a pipeline to what they have today (for all I know they've already started doing this), but Nuvia can get there right away without bothering to retrofit.
Ultimately I think it's good news for Apple fans in that it will probably persuade Apple to be a little more daring than their natural inclinations (eg maybe set up an alternative team working on "aggressive core ideas"). Which benefits ARM, whose goal seems to be "always two years behind Apple, no better but also no worse".
x86? Well, they made their choice of perpetual compatibility over performance. Now they live with the consequences.
anonomouse - Tuesday, August 11, 2020 - link
"NUVIA’s wording for this graph includes the phrase ‘we have left the upper part of the curve out to fully disclose at a later date’, indicating that they likely intend for Phoenix cores to go beyond 5W per core."I think this just means they obfuscated the crap out of the curve, not that they intend to go beyond 5W per core. Their blog post pretty directly states why it didn't make sense for them to really target pushing beyond the realistic use-case power budgets.
Colin1497 - Tuesday, August 11, 2020 - link
Comparing your future product against a competitor's past product will tend to work out in your favor if you're competent, but the target is moving.

This should be interesting.
webdoctors - Tuesday, August 11, 2020 - link
I'm curious whether they're leveraging any binary translation for their benchmark results. There are many processors floating around that execute ARM code but translate it into their own proprietary mess internally; they only see speedups once the system is warmed up, and do poorly outside of the main loops when system calls are involved.

Also, the belief that something that's perf/watt-efficient under 1W or under 5W scales to 10-20+W is ludicrous. The tradeoffs for scaling in such different domains are enormous when it comes to actual achieved perf.
I'm skeptical there's much room for innovation on the ARM server front, but it'll be great to be proven wrong.
anonomouse - Tuesday, August 11, 2020 - link
I think their point is that (per-core) they are not trying to scale to 10-20+ W.

Wilco1 - Tuesday, August 11, 2020 - link
Only Denver uses binary translation and that's not used in many products, certainly not any Arm servers.

Graviton 2 and Ampere Altra are certainly proof of innovation - a relatively small team can make a high-end Arm server chip which uses 1-2W per core and outperforms EPYC.
abufrejoval - Tuesday, August 11, 2020 - link
I keep wondering what their secret sauce is...

With something like Ivan Godard's Mill architecture, I understand how they achieve an order of magnitude more compute performance out of the same number of transistors and energy budget: it's quite simply a very clever way of doing things with a DSP-inspired ISA that manages to remain general purpose. It's still my personal favorite, though I'll concede that general purpose has diminishing returns and RISC-V may be better.
But with a given architecture like ARM, just how much can you do?
The last architectural doubling of IPC I could sort of understand was the VISC design presented here four years ago. That was just a factor of 2, and it came with very high effort in an area likely more prone than ever to side-channel issues.
But how can these new cores deliver the same general-purpose compute power at a fraction of the energy cost on an existing ISA?
There are really only two avenues that I can see:
1. use fewer transistors: To my taste that's too much magic and I don't see Apple chips being small
2. use more transistors but switch them much more slowly (and more aggressively off): At least that seems more likely than 1. (see the rough numbers sketched below)
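(A rough illustration of avenue 2, using the standard dynamic-power relation P ≈ C·V²·f; near the top of the curve voltage has to rise with frequency, so backing off frequency pays back quadratically through voltage. The factors below are made-up but plausible numbers, not measurements of any real core.)

// dynamic_power.cpp - why "wider but slower" can win on energy.
#include <iostream>

// Relative dynamic power: P = C * V^2 * f, everything vs a 1.0 baseline.
double rel_power(double cap, double volt, double freq) {
    return cap * volt * volt * freq;
}

int main() {
    // Baseline: a narrow core pushed hard (C=1, V=1, f=1).
    double narrow = rel_power(1.0, 1.0, 1.0);
    // Hypothetical wide core: 2x switching capacitance (more transistors),
    // run at 60% frequency, which permits roughly 75% voltage.
    double wide = rel_power(2.0, 0.75, 0.60);
    std::cout << "narrow: " << narrow << "  wide: " << wide << "\n";
    // wide ~= 0.675x the baseline power; if the extra width buys back the
    // lost frequency in IPC, perf/W improves substantially.
    return 0;
}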
In any case their approach can't be unique to ARM as an ISA, so I guess we won't know, because once that secret got out, everyone would copy their approach.
Probably with less success on x86, because the inherent overhead and complexity of the translation layer isn't going away, while its benefits become ever less important.
But RISC-V or Mill would profit, as would any other ARM if that technology became generalized.
And I can see how and why they got out of Apple: There is really very little sellable benefit for the additional power on the smartphone.
On the laptop or workstation, much more so; but on the server, energy consumption is king.
Easy to understand why Tim Cook doesn't like them doing a Jim Keller or going independent. But personally I'd be more interested in a 20GB leak from these guys than from Intel.
Veedrac - Thursday, August 13, 2020 - link
My reply ended up elsewhere: https://www.anandtech.com/comments/15967/nuvia-pho...

ksec - Wednesday, August 12, 2020 - link
Well, the A14 is expected to push to around ~1600 in GB5 (purely from a GB perspective), so those curves will be higher up in only a few weeks' time.

So by the time they have a product out, it looks like Apple will be shipping the A15.
To me, the far more interesting question is why Gerard Williams III left Apple despite knowing Apple was making the switch to ARM on the Mac.
vinayshivakumar - Wednesday, August 12, 2020 - link
A single Zen2 core consuming 12W peak and 2W min sounds very high? Am I missing something?

Spunjji - Wednesday, August 12, 2020 - link
That 12W figure is likely at peak turbo: 12W for the single core, 3W for the rest of the SoC.

stevekgoodwin - Wednesday, August 12, 2020 - link
"NUVIA Phoenix Targets +40-50% ST Performance Over Zen 2 for Only 33% the Power"How is this headline related to the article? Where is single threaded performance mentioned?
Veedrac - Thursday, August 13, 2020 - link
The argument for the Mill made a lot more sense before Apple started making their own chips. It's absolutely true that if you do things the Intel way, pushing frequency to the very limit, out-of-order CPUs are great, big, and power hungry. But as Apple have proven, if you focus on power from the very start, a modern out-of-order processor can just be great and big.

Consider this: Apple's ‘small’ Thunder cores are actually out-of-order CPUs, but “against a Cortex-A55 implementation such as on the Snapdragon 855, the new Thunder cores represent a 2.5-3x performance lead while at the same time using less than half the energy” per computation.
Again, that's a much faster, much larger out-of-order core, using less energy than an in-order processor.
And there's no reason to think you can't go bigger.
Veedrac - Thursday, August 13, 2020 - link
This was a reply to https://www.anandtech.com/comments/15967/nuvia-pho...

ZachSaw - Thursday, August 13, 2020 - link
There are a couple of reasons why ARM cores are all on the left of the graph. They aren't designed to scale up in frequency as much as the x86 cores (each lithography process has its own unique curve, but generally raising frequency beyond the sweet spot requires exponentially higher voltage). The other, more important one is that ARM cores are missing a critical hardware feature that software engineers rely on and take for granted: a strong memory model. Without this, you'd have to issue memory barrier instructions whenever you need your objects to sync up with other threads. ARM does not yet have the granularity of Itanium when it comes to memory barrier instructions. There is no concept of acquire/release semantics. Geekbench's multithreaded benchmarks run benchmarks in parallel and call it a multicore bench. In other words, they run embarrassingly parallel. That artificially puts ARM in a better light.

In real-life workloads running on the CPU, you'd be dealing with problems that aren't embarrassingly parallel (databases with upserts happening at the same time as reads, game state management, etc.). The GPU handles the embarrassingly parallel problems much more efficiently than ARM cores.
vvid - Friday, August 14, 2020 - link
So wrong on many levels.
1) x86 is the only popular architecture with a "strong model". This is not a critical feature.
2) A12Z has x86-like TSO mode.
3) Synchronization is better to do through OS primitives.
4) ARM has Load-Acquire (LDAR) / Store-Release (STLR)
5) Results shown on the graph are SINGLE threaded.
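(To make point 4 concrete: a minimal sketch of acquire/release message passing in portable C++. On AArch64 a compiler can map these operations directly onto LDAR/STLR with no full barrier, while x86's strong model gives the ordering mostly for free.)

// acq_rel.cpp - message passing with acquire/release, no full barriers.
#include <atomic>
#include <iostream>
#include <thread>

int payload = 0;                      // plain data, published via the flag
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                     // 1. write the data
    // 2. release store: may compile to STLR on AArch64, ordering the
    //    payload write before the flag becomes visible.
    ready.store(true, std::memory_order_release);
}

void consumer() {
    // 3. acquire load: may compile to LDAR on AArch64, ordering the
    //    flag read before the payload read.
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    std::cout << payload << "\n";     // guaranteed to print 42
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
    return 0;
}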
Wilco1 - Friday, August 14, 2020 - link
In addition to vvid's comments: the graph not only shows Arm outperforming x86 on single-threaded perf, but more importantly while using only one quarter of the power! This means Arm keeps its much better power efficiency even when scaling beyond x86.

There are many reasons for this, but a modern ISA without 42 years of baggage, not chasing 5GHz like a fool, avoiding SMT and the complex x86 memory model certainly help...
scineram - Monday, August 17, 2020 - link
So they optimized the microarch to Geekbench binary disassemblies?