
  • SarahKerrigan - Wednesday, May 29, 2024 - link

    "The core is built on Arm's latest 3 nm process technology, which enables it to achieve significant power savings compared to previous generations."

    ARM doesn't have lithography capabilities and this is a synthesizable core. This sentence doesn't mean anything.
  • meacupla - Wednesday, May 29, 2024 - link

    AFAIK, the core design needs to be adapted to the smaller process node, and it's not as simple as shrinking an existing design.
  • Ryan Smith - Wednesday, May 29, 2024 - link

    Thanks. Reworded.
  • dotjaz - Wednesday, May 29, 2024 - link

    "ARM doesn't have lithography capabilities and this is a synthesizable core"

    And? Apple also doesn't have litho. Are you telling me they can't implement anything with external foundries? Do you even know the basics of modern chip design? DTCO has been THE key to achieving better results for at least half a decade now.

    Also, this is clearly not just a synthesizable core. ARM explicitly announced that these are available as production-ready cores, which means the implementations are tied to TSMC N3E and Samsung SF3 via DTCO, and this is the first time ARM has launched with ready-for-production hardened core implementations.

    You clearly didn't understand, and that's why it didn't mean anything TO YOU, and probably had to be dumbed down for you.

    It actually makes perfect sense to me.
  • lmcd - Wednesday, May 29, 2024 - link

    There was a turnaround-time slide that made this clearer, but it didn't get any AnandTech text to go with it, so a skim would miss it.
  • zamroni - Monday, June 17, 2024 - link

    It means the logic circuit is designed around 3 nm's characteristics, e.g. signal latency, transistor density, etc.

    Older Cortex designs can be manufactured on 3 nm, but they won't reach the same performance, because they were designed to cater to the higher signal latencies of 4 nm and older generations.
  • Duncan Macdonald - Wednesday, May 29, 2024 - link

    Lots of buzzwords but low on technical content. Much of this reads like a presentation designed to bamboozle senior management.
  • Ryan Smith - Wednesday, May 29, 2024 - link

    Similar sentiments were shared at the briefing.
  • continuum - Thursday, May 30, 2024 - link

    The whole tone of this article feels like it was written by an AI, given how often certain phrases like "3nm process" and other buzzwords are used (compared to what I'm used to in previous articles on this from AnandTech!).
  • name99 - Wednesday, May 29, 2024 - link

    Not completely true...

    Interesting points (relative to Apple; I don't know enough about Nuvia internals to comment) include:
    - 4-wide load (vs Apple 3-wide load) is a nice tweak.

    - 6-wide NEON is a big jump. Of course they have to scramble to cover that they STILL don't have SVE or SME; even so there is definitely some code that will like this, and the responses will be interesting. I can see a trajectory for how Apple improves SME and SSVE as a response, probably (we shall see...) also boosting NEON to 256b-SVE2. (But for this first round, still 4xNEON=2xSVE2)
    Nuvia, less clear how they will counter.

    Regardless, I'm happy about both of these; they require a response from Apple, which, in turn, makes the M-series a better chip for math/science/engineering (which is what I care about).

    They're still relying on run-ahead for some fraction of their I-Prefetch. This SOUNDS good, but honestly, that's a superficial first response and you need to think deeper. Problem is that as far as prefetch goes, branches are of two forms – near branches (mostly if/else), which don't matter, a simple next line prefetcher covers them; and far branches (mostly call/return). You want to drive your prefetcher based on call/return patterns, not trying to run the if/else fetches enough cycles ahead of Decode. Apple gets this right with an I-prefetcher scheme that's based on call/return patterns (and has recently been boosted to use some TAGE-like ideas).

    Ultimately it looks to me like they are boxed in by the fact that they need to look good on phones that are too cheap for a real NPU or a decent GPU. Which means they're blowing most of their extra budget on throughput functionality to handle CPU-based AI.
    Probably not the optimal way to spend transistors as opposed to Apple or QC. BUT
    with the great side-effect that it makes their core a lot nicer for STEM code! Maybe not what marketing wanted to push, but as I said, I'll take it as steering Apple and QC in the right direction.
    I suspect this is part of why the announcement comes across as so light compared to the past few years – there simply isn't much new cool interesting stuff there, just a workmanlike (and probably appropriate) use of extra transistors to buy more throughput.
  • eastcoast_pete - Wednesday, May 29, 2024 - link

    Speaking of SVE and SME: are there any applications (for Android, Windows-on-ARM, or Apple devices) available to the general public that use either or both of them? SVE was originally co-developed by ARM and Fujitsu for the core that powers Fugaku, Riken's supercomputer. There are reports (rumors) that SVE is painful to implement, and someone wrote that Qualcomm elected not to enable SVE in their 8 Gen 3 SoC, even though it's in their big cores. Does anyone here know and can comment? Right now, outside of one or two benchmarks, which applications actually use SVE, never mind SME?
  • name99 - Wednesday, May 29, 2024 - link

    Presumably ARM’s Kleidi AI libraries (and various MS equivalents) use SVE and SME if present.
    And that’s really what matters. This functionality is envisaged (for now) as “built-in”.
    Obviously they want developer buy-in over time, but that's not what matters right now; what matters is what's in the OS and APIs. Same as with AMX: making it available to developers via Accelerate was great, but the primary user was Apple's ML APIs.
  • Marlin1975 - Thursday, May 30, 2024 - link

    What do you mean? It's all there. They went over the Optimized design that will take advantage of the synergies of the new NM tech from a leading-edge lithography manufacturer and lead them to greater performance. It's a win-win for everyone; are you not onboard?

    :)
  • syxbit - Wednesday, May 29, 2024 - link

    I suspect this will still be worse than the A17 and the Nuvia chips.
  • GC2:CS - Wednesday, May 29, 2024 - link

    The A17, M3, and M4 did not show much benefit from going to 3 nm. If ARM can do better, then good for them.
  • BGQ-qbf-tqf-n6n - Wednesday, May 29, 2024 - link

    The A17 was already 30% faster than the S8G3 in single-core scores. In the same GB tests ARM is referring to, the M4 is 27% faster still.

    Presuming the X925's "36 percent faster" is relative to the X4, they'll still be behind the M3, much less the M4.
  • OreoCookie - Saturday, June 1, 2024 - link

    The speed-ups in single and multi core were significant. To my knowledge, the 10-core M4 is the fastest stock CPU in single-core performance that has been tested (about 13% faster than Intel's Core i9-14900KS, which clocks up to 6.2 GHz stock). The M3 is about 6% behind the 14900KS. (I am unaware of, e.g., SPECmark results for the M4.)
  • mode_13h - Saturday, June 1, 2024 - link

    > I am unaware of, e.g., SPECmark results for the M4.

    I'm pretty sure nobody is testing that, since AnandTech stopped doing it (i.e. after Andrei left).
  • OreoCookie - Sunday, June 2, 2024 - link

    Yeah, and it seems nobody is doing it consistently across several generations. The best dissection of the M3 architecture I remember was by a Chinese YouTube channel, but nobody is carrying the baton. Maybe Ian and Andrei are doing this as part of their work for clients. (Andrei, I think, is working for Qualcomm now, isn't he?)
  • mode_13h - Monday, June 3, 2024 - link

    name99 would know what M3 analysis is out there. He wrote/compiled the Apple M1 explainer, a 300-page PDF with all the details about it:

    https://github.com/name99-org/AArch64-Explore/
  • StormyParis - Wednesday, May 29, 2024 - link

    Do these processors have anything to prevent exploits such as RowHammer? Those and their variants have been a big story, then disappeared, but we were never told about an actual solution.
  • GeoffreyA - Wednesday, May 29, 2024 - link

    Are these companies so lame with their AI desperation?
  • abufrejoval - Wednesday, May 29, 2024 - link

    Memory tagging extensions have been around since Armv8.5. When you say Armv9.2 MTE, does that mean they have been significantly upgraded, e.g. in the direction of what CHERI does for RISC-V?

    I've been trying to find out whether ARM has an "AVX-512" issue with their different big/middle/small designs, too; that is, whether these distinct cores might actually differ in the range of instruction set extensions they support. I can't get a clear picture either.

    So if, say, the big cores support some clever new vector formats for AI and the middle or small cores don't, how will apps and the OS deal with the issue?
  • GeoffreyA - Wednesday, May 29, 2024 - link

    I don't know enough about ARM to comment, but I should think there are compatibility issues with instructions spanning different models and generations. Perhaps there's a feature-level type of mechanism?
  • Findecanor - Wednesday, May 29, 2024 - link

    ARM MTE is much cruder than CHERI. It can be described as "memory colouring": every allocation in memory is tagged with one of 16 colours, and two adjacent allocations can't have the same colour. When you use a pointer, the colour bits in the otherwise unused top bits of the pointer have to match the colour of the allocation it points into.
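
    To make that concrete, here is a toy software simulation of the colour check in C (my own sketch of the concept; real MTE keeps the allocation tags in hardware and checks them on every load/store with dedicated instructions):

        #include <stdint.h>
        #include <stdio.h>

        /* Toy simulation of MTE-style memory colouring: a 4-bit tag lives
         * in the otherwise-unused top byte of the pointer (bits 59:56 on
         * real AArch64 MTE) and must match the colour of the allocation
         * it points into. Assumes a 64-bit target. */
        #define TAG_SHIFT 56

        static uintptr_t tag_pointer(uintptr_t p, unsigned tag)
        {
            return p | ((uintptr_t)(tag & 0xF) << TAG_SHIFT);
        }

        static int access_ok(uintptr_t tagged_ptr, unsigned allocation_tag)
        {
            unsigned ptr_tag = (unsigned)((tagged_ptr >> TAG_SHIFT) & 0xF);
            return ptr_tag == allocation_tag; /* mismatch => tag-check fault */
        }

        int main(void)
        {
            char buf[16];
            unsigned alloc_tag = 7;                          /* colour of the allocation */
            uintptr_t good = tag_pointer((uintptr_t)buf, 7); /* matching pointer */
            uintptr_t bad  = tag_pointer((uintptr_t)buf, 3); /* e.g. a stale pointer */

            printf("good: %d\n", access_ok(good, alloc_tag)); /* 1 */
            printf("bad:  %d\n", access_ok(bad,  alloc_tag)); /* 0: hardware would fault */
            return 0;
        }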

    With SVE, both E and P cores need to have the same vector length, yes. The vector length is usually no larger than a cache line, which has to be the same size anyway.

    I don't know specifically about SME, but many extensions first have to be enabled by the OS on a core to be available to user-mode programs. If not all cores have an extension, the OS may choose not to enable it on any.
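
    On Linux, user space can see which extensions the kernel has chosen to expose via the hwcaps. A quick sketch, assuming AArch64 Linux (HWCAP_SVE comes from asm/hwcap.h; the SVE2 bit needs reasonably recent kernel headers, hence the #ifdef):

        #include <stdio.h>
        #include <sys/auxv.h>
        #include <asm/hwcap.h>   /* AArch64 Linux: HWCAP_SVE, HWCAP2_SVE2, ... */

        int main(void)
        {
            unsigned long hw  = getauxval(AT_HWCAP);
            unsigned long hw2 = getauxval(AT_HWCAP2);

            /* The kernel only sets these bits if it has enabled the
             * extension for user space -- on every core, per the above. */
            printf("SVE:  %s\n", (hw & HWCAP_SVE) ? "yes" : "no");
        #ifdef HWCAP2_SVE2
            printf("SVE2: %s\n", (hw2 & HWCAP2_SVE2) ? "yes" : "no");
        #endif
            return 0;
        }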
  • mode_13h - Thursday, May 30, 2024 - link

    > The vector length is usually no larger than a cache line, which has to be the same size anyway.

    Cache lines are usually 64 bytes, which is 512 bits. Presumably, the A520 has only SVE2 @ 128 bits. So, don't let ARM off the hook *that* easily!
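
    For what it's worth, the implemented vector length is queryable at runtime with the SVE ACLE intrinsic svcntb(). A quick sketch, assuming a compiler targeting SVE:

        #include <stdio.h>
        #include <arm_sve.h>   /* SVE ACLE; build with e.g. -march=armv8-a+sve */

        int main(void)
        {
            /* svcntb() returns the runtime vector length in bytes:
             * 16 on a 128-bit implementation (the likely A520 case). */
            printf("SVE vector length: %llu bits\n",
                   (unsigned long long)svcntb() * 8);
            return 0;
        }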
  • eastcoast_pete - Wednesday, May 29, 2024 - link

    That is very much something I am wondering, too. Reports/rumors have it that, for example, Qualcomm chose not to enable SVE in the big cores of their SD 8 Gen3. Qualcomm isn't exactly forthcoming with information about that, not that I would expect them to comment.
  • name99 - Wednesday, May 29, 2024 - link

    That's a silly response. It's like being present at the birth of Mac or Windows and saying "why are these stupid hardware companies trying so hard to make their chips run graphics fast?"

    The hope of LLMs is that they will provide a substantial augmentation to existing UI. So instead of having to understand a complicated set of Photoshop commands, you'll be able to say something like "Highlight the subject of the photo. Now move it about an inch left. Now remove that power line in the background".
    This is not a trivial task; it requires substantial replumbing of existing apps, along with a fair degree of rethinking app architecture. Well, no-one said it would be easy to convert Visicalc to Excel...

    But that is where things are headed. And because ARM (and Apple, and QC, and MS) are not controlled by idiots who think only in terms of tweets and snark, each of these companies is moving heaven and earth to ensure that they will not be irrelevant during this shift.

    (Oh, you thought the entire world consisted of LLMs answering questions, did you? Strange how the QUESTION-ANSWERING COMPANY, i.e. Google, has created that impression...
    Try thinking independently for once. Everything in the world happens along a dozen dimensions at once. All it takes to be a genius is to be able to hold *more than one* dimension in your head simultaneously.)
  • FunBunny2 - Wednesday, May 29, 2024 - link

    "Well, no-one said it would be easy to convert Visicalc to Excel..."

    Well... Mitch did it, in assembler at first, and called it Lotus 1-2-3.
  • GeoffreyA - Thursday, May 30, 2024 - link

    You can go on with your ad hominem and air of superiority; it won't change the fact that these companies are tripping over themselves, in insecurity and desperation, to grab dollars from the AI pot or not be left behind.

    You make assumptions about my ideas based on one sentence. In fact, AI is quite interesting to me. Not how it's going to help someone in Photoshop or Visual Studio, but where LLMs eventually lead; whether they end up being the language faculty in strong AI, or more; what's missing from today's LLMs (state, being trained in real-time, connecting to the sense modalities, using little power in a small space, etc.); and whether consciousness, the last great riddle, will ever be solved, how, and the moral implications. That's of interest to me.

    But when one sees Microsoft and Intel making an "AI PC," or AMD calling their CPU "Ryzen AI," and so on, it is little about true AI and more about money, checklists, and the bandwagon. Independence of thought is about seeing past fashion and the in-thing. And no thank you: I have no desire, nor the insecurity, to want to be a genius.
  • ET - Thursday, May 30, 2024 - link

    I'm not sure why you're attributing this to insecurity and desperation when it's all about money. I can understand why end users would prefer companies to invest into things they feel are more relevant, but jumping on bandwagons (and driving them forward) is exactly the thing that companies wanting to keep their market healthy should do.
  • GeoffreyA - Thursday, May 30, 2024 - link

    Agreed; it is all about money. Generally, it is not to the benefit of the consumer or the world. An AI PC might be good for Jensen, Pat, Satya, Tim, Lisa, and co. but does not help most people.
  • mode_13h - Thursday, May 30, 2024 - link

    Ooh, you just got "named!"

    Seriously, your comment does indeed sound snarky and your reply sounds defensive and even a bit insecure. I don't think name99 was suggesting that you should want to be a genius, but rather pointing out that it pays to think beyond a single track.

    > when one sees Microsoft and Intel making an "AI PC," or AMD calling their
    > CPU "Ryzen AI," and so on, it is little about true AI and more about money,
    > checklists, and the bandwagon.

    I'm reminded of when 3D-capable GPUs went so mainstream that you could scarcely buy a PC without one. Yet the killer app for the average PC user had yet to be invented. To some extent, the hardware needs to lead the way before mainstream apps can fully exploit the technology, because software companies aren't going to invest the time and effort in making features and functionality that only a tiny number of users can take advantage of.

    Also, you say you want AI models to use little power, but progress happens incrementally, and hardware assist does improve the efficiency of inferencing on models, not all of which are as big or demanding as LLMs.
  • GeoffreyA - Thursday, May 30, 2024 - link

    Fair enough. I apologise to everyone for the negative connotations in my comment and replies, but the companies are fair game and we ought to poke fun at them. I'm fed up with the lies, marketing, double standards, doublespeak, and nonsense. These companies are only after money, and we are the fools at the end of the day. The last few years it was the cloud; now it's AI. What's next?
  • GeoffreyA - Thursday, May 30, 2024 - link

    As I've said, both here and in several comments elsewhere, AI and LLMs are of immense interest to me. I believe they're the Stone Age version of the stuff in our brains. What I'm trying to criticise is not LLMs or the technology, but the marketing ripoff that is bombarding us everywhere, this so-called AI PC, Copilot PC, or whatever Apple calls theirs. It's laughable the way they're plastering the term AI all over products.
  • SydneyBlue120d - Thursday, May 30, 2024 - link

    Can we expect Samsung's S25 3 nm Exynos 2500 SoC to be based on these cores?
  • eastcoast_pete - Sunday, June 2, 2024 - link

    After the rather poor showing of their custom Mongoose cores, I'd be very surprised if Samsung doesn't stick with ARM's designs for the CPU side of the Exynos 2500. What's (IMHO) really interesting right now is what Samsung will use for the GPU in the 2500. Rumors abound, many saying that they'll walk away from AMD's RDNA and use an in-house designed GPU, or come back to the ARM Mali mothership. The latter would put them in an awkward position, as MediaTek is likely to be first out of the gate with their new 9400, featuring both the newest ARM cores and whatever the new version of Immortalis will be called. And MediaTek's Dimensity 9400 is (will be?) fabbed on TSMC's newest 3 nm node, so Samsung will want maximum differentiation here.
  • James5mith - Thursday, May 30, 2024 - link

    "The enhanced AI capabilities ensure these applications run efficiently and effectively, delivering faster and more accurate results."

    ARM hardware will magically fix AI algorithms to be better than they otherwise would be? Really?!?
  • mode_13h - Thursday, May 30, 2024 - link

    They're probably referring to the fact that it can deliver good inferencing performance without having to resort to the sorts of extreme quantization behind some companies' TOPS claims. Quantization often comes at the expense of accuracy, especially if it's done after training, rather than the model being designed and trained to use quantized weights.
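
    To illustrate the accuracy point with made-up numbers: naive post-training quantization maps a whole tensor to int8 with a single scale, and the smallest weights take the largest relative hit. A toy sketch:

        #include <stdio.h>
        #include <math.h>

        /* Toy post-training quantization: map float weights to int8 with
         * one per-tensor scale, then measure the round-trip error. All
         * numbers are invented for illustration. Build with -lm. */
        int main(void)
        {
            const float w[4] = {0.013f, -0.472f, 1.250f, -0.003f};
            float max_abs = 0.0f;
            for (int i = 0; i < 4; i++)
                if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);

            float scale = max_abs / 127.0f;  /* one scale for the whole tensor */
            for (int i = 0; i < 4; i++) {
                int   q    = (int)lrintf(w[i] / scale); /* quantize to int8 range */
                float back = (float)q * scale;          /* dequantize */
                printf("w=%+.3f  q=%+4d  back=%+.3f  err=%+.4f\n",
                       w[i], q, back, w[i] - back);
            }
            /* Note how the near-zero weights lose almost all their precision,
             * which is where post-training quantization hurts accuracy. */
            return 0;
        }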
  • James5mith - Thursday, May 30, 2024 - link

    Also, amazing increases in performance per watt don't mean less power draw. If it draws 3x the power to do 4x the work, then it has increased efficiency 1.33x. But it's still drawing 3x the power. That means a battery will be drained 3x faster.

    Saying the 30 W SoC does work more efficiently than the 10 W SoC doesn't make it draw less power.
  • mode_13h - Thursday, May 30, 2024 - link

    > Also, amazing increases in performance per watt don't mean less power draw.

    ARM provided power/performance curves, the point of which is to show how much more efficient the new cores can be at ISO performance, or how much more performance you can get at the same power, or what tradeoffs you can make anywhere in between.

    I know their unitless graphs and lack of details about the workload used to produce them can stretch their credibility, but it's not as if they aren't aware that these cores often won't be clocked to the max.
  • vegemeister - Friday, May 31, 2024 - link

    In most client applications, you always do 1x the work; the only difference is how long it takes / what the CPU utilization percentage is.

    So the SoC will indeed use 1/1.33 as much energy.
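
    Worked through with hypothetical numbers (only the 3x power and 4x speed ratios come from the comment above; the 10 W / 4 s baseline is invented):

        #include <stdio.h>

        /* Race-to-idle arithmetic for a fixed amount of work:
         * energy = power * time. */
        int main(void)
        {
            double base_power = 10.0;              /* W (hypothetical) */
            double base_time  = 4.0;               /* s to finish the task */

            double fast_power = 3.0 * base_power;  /* 30 W: draws 3x the power */
            double fast_time  = base_time / 4.0;   /* 1 s: does the work 4x faster */

            printf("baseline: %.0f J\n", base_power * base_time); /* 40 J */
            printf("fast:     %.0f J\n", fast_power * fast_time); /* 30 J */
            /* 30/40 = 0.75 = 1/1.33: less energy per task despite 3x the draw. */
            return 0;
        }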
  • eastcoast_pete - Sunday, June 2, 2024 - link

    It does overall, if the OS and the SoC do "hurry up and get to idle" really well. This is something Apple's mobile SoCs have excelled at in recent times; it helps that their little (efficiency) cores are strong performers that use out-of-order execution and other features, allowing the SoC to stay on the efficiency cores far longer. Android smartphones based on stock ARM cores don't have that option as much and seem to end up running their larger cores more often and for longer. It would also be interesting to know how much of that efficiency penalty can be attributed to Android itself, but ARM has been very stubborn about sticking to in-order execution for its little cores. Which is puzzling, but good for Apple.
  • mode_13h - Thursday, May 30, 2024 - link

    I'm disappointed that ARM seems to have deviated from their practice of releasing ISO-power and ISO-performance figures.

    Also, I noticed they swapped the axes in their power/performance graph so that it curves upwards rather than leveling off. I guess some marketing goon decided graphs look more impressive if they curve upward. And, as usual, we get unitless graphs that don't start at zero.

    Hey, does anyone know if the A520 still potentially shares vector FP units between a pair of cores, or did that gem of an idea begin and end with the A510?
  • GeoffreyA - Thursday, May 30, 2024 - link

    "shares vector FP units between a pair of cores"

    That was a Bulldozer principle, if I remember rightly.
  • kkilobyte - Thursday, May 30, 2024 - link

    OK, sorry to remind you, and maybe this sounds a little 'pushy', but what about the i9-14900KS test redo with Intel Default settings? You told us 20 days ago that you'd redo them:

    Gavin Bonshor - Friday, May 10, 2024 - link
    Don't worry; I will be testing Intel Default settings, too. I'm testing over the weekend and adding them in.

    So, will this promise ever be fulfilled?
  • mode_13h - Thursday, May 30, 2024 - link

    +1

    Please deliver the promised update to the i9-14900KS review! The people deserve to know how much performance is being lost with Intel's new recommended defaults!
  • watersb - Friday, May 31, 2024 - link

    Pronunciation of their new software branding, 'Kleidi', is not completely clear to me.

    One way is to make it sound like the name of a girl, rhyming with 'Heidi', so the first part is a single syllable.

    The other way is to infer that the Arm marketing people wished to evoke a colorful collection of myriad bits that can combine to form interesting patterns: a toy, a kaleidoscope.

    Unfortunately, that sounds like "Collide-y" to me: a product that tends to bang into other pieces.

    Which would be an unfortunate name for automotive applications.
  • EthiaW - Sunday, June 2, 2024 - link

    ARM routinely claims the fruit of TSMC's node improvements as its own achievement; you'll get familiar with the cliché after following them for a few years. 🙄 As for competition, we are already seeing Apple's 9-core M4, the 1.5-times-bigger brother of the A18. Halve its memory score and lower its frequency to a more phone-friendly 3.7 GHz, and it would still score at least 3200 in Geekbench single-core, which the X925 is certainly not going to catch. By the time the new ARM laptops hit the market, Zen 5 mobile and Lunar Lake will be prevalent, and based on available data a single-core improvement of at least 20% is expected, so the X925 will not have an easy time. I'd say this generation of ARM cores is mostly incremental and nothing revolutionary.
  • mode_13h - Sunday, June 2, 2024 - link

    > ARM routinely claims the fruit of TSMC's node improvements as its own achievement;

    There's nothing automatic about an IPC improvement. You have to actually make design changes to take advantage of the larger transistor budget and timing margins in order to achieve it. Otherwise, the only way CPUs would get faster from shrinking nodes is by increasing clock speeds, which incurs a high cost in additional power.

    Plus, how is this any different from what Intel and AMD do when they announce new CPU microarchitectures? They don't usually separate out how much improvement is from the node, if ever.

    > Zen 5 mobile and Lunar Lake will be prevalent, and based on available data
    > a single-core improvement of at least 20% is expected

    At what power level? It's not 20% IPC, so there's some additional clockspeed in that figure, which might not be entirely applicable to laptops.
  • EthiaW - Sunday, June 2, 2024 - link

    I know it takes solid work (and money) to adapt a given architecture to the newest node; ARM can claim some credit, but not all.
    By the way, ARM has a long history of not-so-reliable projections. Remember the A57 and X1, which came after much hype only to flop badly? And the A72/A78, which were supposed to be minor upgrades but turned out to be classics? Always view their claims with a pinch of salt.
  • mode_13h - Monday, June 3, 2024 - link

    > ARM has a long history of not-so-reliable projections.
    > Remember the A57 and X1, which came after much hype only to flop badly?

    Did they fail to hit their power or performance projections? Source?
  • eastcoast_pete - Sunday, June 2, 2024 - link

    Question @Gavin and @Ryan: I might have completely missed it, but have Qualcomm and ARM settled their legal fight regarding Qualcomm's right to use the custom Nuvia designs in their SoCs? I almost assume so, as Qualcomm is otherwise proceeding at great risk regarding possible liabilities.
  • mode_13h - Monday, June 3, 2024 - link

    No, I didn't hear anything about it (projections by legal experts were that it wouldn't be wrapped up by now, either), and I'm not seeing any recent hits on it in Google News.
  • skavi - Monday, June 3, 2024 - link

    How much of this article was written by an LLM?
