" Looks like X series SoCs for the iPads are very much dead, then"
It's more than just that! The iPad Air traditionally used the latest phone chip. (iPad Pro used the X chip, and iPad used the phone chip from a year or two ago.)
This suggests one or both of - M1 chips are remarkably cheap to create. (Good news all round; not least means there's a lot of price margin there to make larger better versions for M2, M2 Max and so on!) - Apple has grand plans for iPadOS that demand a higher level of performance even at the mid-range.
Additionally, the fact that iPad Air is getting the full fat, 8 GPU version of M1 suggests that yields are actually very good for this chip - no need to repurpose the binned, 7 GPU versions that are in the entry level iMac and MBA.
Indeed, and the internal codenames confirm it. A12 is H11P, A12X/Z is H11G, A13 is H12P, A14 is H13P, M1 is H13G, M1 Pro/Max(/Ultra, presumably) is H13X, A15 is H14P.
"- Apple has grand plans for iPadOS that demand a higher level of performance even at the mid-range."
Don't think so on this one, unfortunately. The iPads have been ridiculously overpowered for anything but games for quite a while. They seem to not want to cannibalize Mac sales.
They don't really care about cannibalizing their other product lines, they actually *want* to cannibalize them before anyone else other than themselves does it.
That's astonishing. Suppose we have 2.5GHz baseline frequency for the connection. That means connection width is 1024B! Does anyone else have anything like that? It suggests that the units being transferred between cores are something large (pages or quarter pages) rather than just cache lines, which are the obvious transfer unit and what everyone else has used as far as I know.
It doesn't follow that you need the same level of MCM bandwidth. So far, the multi-die GPUs have been limited to HPC workloads, where work partitioning might reduce the needs for global data-access (compared to interactive rendering).
For interactive rendering, it could be that they've invested a lot in better work scheduling, or just cherry-picking GPU performance numbers from the apps that scale up best.
their benchmark results absolutely massacre AMD and intel in total performance but real world results are just on par except in perf/watt. there's definitely some bottlenecks in there, though some may be in software.
I meant with the M1/M1 max. their SPEC performance is through the roof but I've never seen it perform at that level in actual Mac software. I would expect the same for this chip since its basically two m1 max's connected together.
It's worth pointing out that what AMD did was simply use the same Infinity Link ports they use for off-package communication. And either because of that, or vice versa, the MI200 presents itself as 2 GPUs, from a programmability point of view. So, even though the links are cache-coherent, the aggregate inter-die bandwidth is far too low for tasks to be transparently migrated between the dies and not suffer, substantially.
It'd be interesting to know how Intel's Ponte Vecchio appears to software, but I *think* it probably shows up as a singe GPU per package.
Okay, but GPUs with HBM would often have an 4096-bit wide aggregate connection to their HBM. Is this any different? And AMD's MI200 (MCM) even doubled that to 8192-bit!
(b) the target of HBM is dumb. What's interesting about this size, as I described, is that it's between "smart" components. ie it's not the number of pins that's interesting, it's the protocol driving those pins, the decision as to what blocks of data are transferred when.
> it's the protocol driving those pins, the decision as to what blocks of data are transferred when.
IMHO, I don't really see it as materially different than what drives on-die interconnects or how it's not the same sort of cache hierarchy that's in front of HBM. I'll take your word for it, though. I'm hardly a CPU or ASIC designer.
They said there were 10,000 I/Os so even with many of them being ground and use of differential signaling the frequency they are running at would be a lot lower than 2.5 GHz.
Oops, as pointed out, I misread 1024-byte as 1024-bit.
Similar points could still apply, but at a much more massive scale. Would have to be 8 bi-dir links of a mesh interconnect, to still work in 64-byte quantities. Still plausible, I think.
As Dough_S mentioned, power-efficiency concerns would push them more towards 16 bi-dir links at half the clock speed.
Interesting that they also said this is the last member of the M1 family. I wonder what that means for the Mac Pro. One thing it would appear NOT to mean is four M1 Max chips bolted together. Perhaps the Mac Pro will be based on an M2-derived Ultra Ultra chip?
it basically just means that the mac pro refresh is further out than the m2 or it will feature custom silicon. i would bet on the latter, there is only so much perf you can extract by stitching soc-die's.
sure there is thermal room left for stitching up to 8 m1 max's, but i doubt that would be feasible, more likely they'll go some other route.
given they charge ~$3500 for 114bn transistors, expect a little less than double the transistors (of the mac studio) for the entry model. expandable with likely double or more transistors... this is certainly going to be one hell of a fast machine.
For the last year all reliable rumours have pointed that Apple’s strategy is multi-die processors: 2 dies - 20 CPU + 64 GPU cores and 4 dies - 40 CPU + 128 GPU cores. Those rumours have been consistently proven correct, so we should expect a 4-die processor next.
We should expect a 4-die processor *at some point*. *Next* is probably M2.
This is not just pettyfogging. Hector Martin (who presumably should know) is convinced that the interrupt controller architecture for M1 can not stretch beyond two chips, it just doesn't have the appropriate bits to specify more than two chips.
I was commenting about the theory that the Mac Pro would have some kind of custom silicon, that it would not be very feasible to have even more dies connected
Well what it means is that the Mac Pro will be even more powerful than what four M1 Max would provide. We'll have to see if the M2 Max has more than 8 big CPU cores and 32 GPU cores.
I expect M2 to use the cores from the upcoming A16, and be made on N4P. That's a 6% density gain and 11% performance gain (or 22% efficiency gain) versus the N5 process used for the M1 family. Not a lot, but the cores take up such a small percentage of the overall M1 Max die they could bump to say 12 big cores and 48 GPU cores at the same die size if everything else stayed the same.
name99 said somebody analyzed the interrupt controller and found it's maxed out at 2 dies.
Also, that 2.5 GB/sec wouldn't be possible between sockets. That means there'd be a massive bottleneck that would at least be enough to prevent the GPU dies acting as one.
The combination of display + mac studio gives you essentially the equivalent of today's low-end iMac Pro (either the low end or high version depending on Max or Ultimate) at 1500 or more dollars cheaper. It's not exactly a perfect match -- you get more ports, you have two separate boxes rather than just one which is a (very mild!) hassle, but the box can sit below the studio (out of the way) or, (my guess is will be very popular) someone like 12-South will design a shelf to add to the back of the monitor that can hold the Studio box.
My guess is that this combination is best viewed as the iMac Pro replacement, not the Mac Pro replacement.
My guess (only a guess!) is that the Mac Pro replacement will be built on using CXL for the large RAM capacities, and that requires CXL support in the chipset, coming with M2.
That's always been the case; you could buy a mac mini with a separate monitor. The real point of a 27" iMac or Pro was that you got a superb monitor, not just some random crap. Now that Apple is back in the standalone monitor game that becomes less essential.
I don't think it's fair to say that iMac (and even more so iMac Pro) were a failure; it's more that they were appropriate to their time. You want a big monitor, now you have all this space right behind the monitor, how about using it? When the computation of an iMac Pro (8 to 16 cores, large dGPU) could not fit in a mini-sized box, that was a more desirable alternative than a separate tower.
Now that we can fit that power in a "large" mini, things are somewhat different. In particular I can imagine (certainly this is what I plan for myself) a shelf attached to, but sitting behind the monitor that holds the Mac Studio on its *side*. Apple logo faces outwards, power and one set of ports on the left, other ports on the right. This gives you something very like the iMac Pro (fatter in the middle, thinner at the edges) perfectly feasible if your desk situation has the appropriate amount of depth (as mine does).
But it's feasible because (a) Apple is back in the monitor game (b) the compute guts fit in a small enough box, rather than requiring the area of a 27" screen to hold the dGPU and separate components enough.
> having the computer separated from the monitor allows us to upgrade > the computer faster, as these high quality monitors easily last 10 years.
The 3 PCs I own that I use most often are all >= 10 years old.
I was also using CRT monitors (I had a pair of > 15-year-old Sony GDM-FW900's) until summer of 2020.
And the Samsung 1440p monitor I use at work is 11 years old and still hanging in there. I have a secondary Dell monitor that's 1600x1200 20" from 2007. Its main downside is the amount of heat it puts out, in the summer (CFL backlight, FTL).
I take your point, though. The machines I use at work are a bit newer.
> Pro replacement will be built on using CXL for the large RAM capacities
Yeah, CXL memory would make sense, here.
So, how big are the page sizes? 64 kB? That's probably still in the sweet spot for migrating between in-package and CXL memory. I think you'd want the transfer time to be some low multiple of the latency, in order to limit the overall penalty of a page miss on the in-package RAM.
Okay, so that'd be 256k pages/sec @ PCIe 5.0 x1 speeds (~= CXL 1.x ?). So, transfer time of ~4 usec.
Once source I found indicates estimated latency of CXL 1.1 > 100-150 ns. So, it's about 25x to 40x of that, however I think that estimate doesn't account for bus or possibly device-side latency. Real world end-to-end latency for CXL 1.1 might be closer to 1 usec?
I've long been skeptical of people's fascination with CXL. What the hell's the point of using it versus using DIMMs?
Apple has scaled LPDDR5 all the way up to the Mac Studio. I see no reason why the Mac Pro won't use it as well. It will be upgraded to LPDDR5X, which will bump memory bandwidth by up to 33% (assuming they use the fastest currently available LPDDR5X) and larger LPDDR5/5X stacks could create Mac Pros with multiple TBs of DRAM. The only disadvantage is the lack of post-purchase upgradeability, but you will get 2 TB/sec of memory bandwidth, which isn't available on any x86 platform at any price (though it remains to be seen whether the M2 Max/Ultra will be able to fully exploit it)
> I've long been skeptical of people's fascination with CXL.
It's cache-coherent across multiple devices and CPUs. That lets you put large memory pools that GPU-like accelerators can access, directly. And it's lower-latency than PCIe.
Also, it gives you a way to scale out memory, so you can have another tier beneath what's directly connected to CPUs, but still much faster than NAND flash.
> larger LPDDR5/5X stacks could create Mac Pros with multiple TBs of DRAM.
Not sure about that. Are you aware that the signals & power for each die need to be tunneled up through vias? And the taller your stack, the more of those you need, hurting area-efficiency of the lower dies? Plus, if the stack is too thick, perhaps you might have trouble cooling it.
> you will get 2 TB/sec of memory bandwidth, which isn't > available on any x86 platform at any price
Sapphire Rapids has been announced with HBM. Ian estimated "between 1.432 TB/s to 1.640 TB/s":
> It's cache-coherent across multiple devices and CPUs. That lets you put large memory pools > that GPU-like accelerators can access, directly. And it's lower-latency than PCIe.
But that's irrelevant for Apple, as they already have cache coherency licked across multiple dies thanks to the M1 Ultra's 10,000 I/Os. They don't care about external accelerators since the GPU and NPU are built in - all indications are they have no plans to support third party GPUs at all. If they offer PCIe slots at all, it will probably be x4 slots only for internal SSD expansion or networking too fast for TB4 like 100GbE. No need for special memory solutions there.
So what's the advantage to APPLE to use CXL? Still none that I can see.
> Not sure about that. Are you aware that the signals & power for each die need to be tunneled > up through vias? And the taller your stack, the more of those you need, hurting area-efficiency > of the lower dies?
I'm talking about stuff that already exists. Samsung already offers LPDDR5/5X packages with up to 32 devices. At the current 16Gb generation that's 64GB per package, allowing for 1 TB in a four SoC Mac Pro using their current layout of four packages per SoC. 4 TB down the road as DRAM gets denser, maybe more if even larger packages become available. You think 16 packages of LPDDR5X is a cooling problem, why would CXL (using higher power standard DDR5) be immune from that?
It isn't like this has to go in a laptop where weight is a concern. You stick a wide base heatsink over the entire 4 SoC 16 package complex, with a fan sized to match - a huge but relatively low rpm fan which will be nice and quiet. If Intel and AMD can cool 300+ watts in a single chip, why would it be harder for Apple to do so with that heat spread across 20 different chips??
I mentioned CXL specifically for the Mac Pro as CXL.mem That I am fairly confident of.
But more generally, CXL solves another problem that may (or may not?) be real for Apple. Right now Apple have CPU, GPU and media tied tightly together. But for many (most?) use cases, people don't want this tying. I want more CPU, you want more GPU, she wants more media. Ultra is a good quick way to try to hit the central spot, but an alternative is to provide a good base with lots of CPU and "minimal" other (NPU, GPU, media), but provide one or more equivalents of an Afterburner card that provide extreme levels of GPU/Media/Other accelerators, all essentially transparent to the system, via CXL.
Apple is not doing all this just to make the current faithful happy; they are on track to conquer the world. But part of conquering the world is being a little more flexible in providing different configs for different users. I'm extremely impressed with how they've (with minimal "disruption" and "engineering overhead" scaled by 16x from A14 to M1 Ultra) but it has been at the cost of tying all the accelerators together; CXL may provide a route out of this. Even for internal use they surely need this? At some point their data centers are going to move to Apple Silicon, and those data centers are going to want SoCs with 128 cores on them, not SoCs with 8+2 cores and a massive amount of GPU+media capability.
> But that's irrelevant for Apple, ... They don't care about external accelerators
The Mac Pro & its users probably will. Even if not GPUs, there are still other PCIe cards people will want to use in there. However, I'm not necessarily saying they'll support CXL cards, just that it'll likely still be a tower-style machine.
> So what's the advantage to APPLE to use CXL?
That's a more specific question than you asked before. I'm with name99 in that I think CXL.mem could make sense for them.
> I'm talking about stuff that already exists. Samsung already offers > LPDDR5/5X packages with up to 32 devices.
Got a link to that?
> allowing for 1 TB in a four SoC Mac Pro using their current layout of four packages per SoC.
You'd sure hope none of it goes bad! That's another nice thing about DIMMs.
> cooling problem, why would CXL ... be immune from that?
Because the individual DIMMs are in open air, and the DRAM dies are only packaged with other DRAM dies, rather than also hot CPU/GPU dies.
> If Intel and AMD can cool 300+ watts in a single chip, why would it be harder > for Apple to do so with that heat spread across 20 different chips
If the stack is too thick, I'd be concerned about those dies towards the bottom. It's not only about heat dissipation, but also the thermal gradient you need to sustain.
That's because you think of DRAM as DRAM. But when you are using DRAM in 1.5TB quantities, you are not using it as DRAM, you are using it as fast storage, either as a full in-memory database, or as something to hold your indexes or whatever that point into your database. In both cases, the required performance is more like IO speeds than DRAM speeds, and the segregation from "working" DRAM is fairly trivial, you don't need a clairvoyant system to know how to move pages between fast and slow DRAM.
What are you talking about "good luck working on it". I have a Dell 43" U4320Q 4K monitor, and it is an absolute JOY to work on! It is the perfect size for 4K in my opinion! Fantastic screen real estate, and no need to scale anything at all! Perfect pixel pitch. For those knocking these larger size 4K monitors, they haven't actually tried them! I could never go back to smaller, now!
Oh,Dell 43" U4320Q 4K monitor is very expensive!Since my desk is not very big, I can't fit such a huge monitor, and my budget is limited. I use a 27inch 4K monitor with 150% scale, its my first time to use 4K monitor.Can you describe to me what it's like to use such a large 4K monitor? Especially when compared with 32-inch and 27-inch monitors. I will be very grateful!
I have a 32" 4k screen and it's already at about the limit of how much I want to be turning my head. Any bigger and I'd probably at least want a curved screen, so my eyes don't have to change focus as they sweep from corner-to-center. And I probably sit a bit farther back from my screen than most.
Isn't the point of a 'Retina' display such that you _can't_ see the pixels that it's drawing? Not with typical eyesight at typical viewing distances, at least.
I get that, which is why I emphasized that I *do* like to see all the pixels I'm paying for (and that I'm making my GPU push). I get that I'm old school. My username is based on a 320x200 VGA framebuffer mode, after all!
In a perfect world, you're right. We'd all have screens so high-res that we wouldn't really think about pixels. And interactive rendering would use techniques like VRS and DLSS to avoid wasting too much power. That's the way everything seems to be going, for better or worse.
It's based on me using the same fonts I've been using on lower-DPI displays. I know most people believe I'm wrong to do so, and that I don't "get" what Retina displays are all about.
I'm glad you like your monitor. I've seen a 4k 28" monitor that did look awfully nice. Nothing against them. It's just not for me and my usage model.
Regarding the SE. I really like it. A15 will be great if it can last. Especially considering it can possibly sustain higher power draw than Pro iPhones :D :D Last SE was lame as it did not have 3D touch witch takes a ton of space under the display yet still had the same capacity. Camera hardware is quite oudated. If it is the same as iPhone 8 i think an upgrade would be worth that dollar or two. But that old sensor did not support fast capture for DF and other stuff so we will see. The camera bump seems thinner. That might point to an upgrade same as with iPhone 7 to 8.
Regarding the Air… M1 is nice but I feel it is an overkill for most people. Iwould rather save 50 bucks and some battery too. Is this a bit of an balancing act between mini and Air battery life ? Mini having newer more efficient chip while air leveraging bigger thermals ? Also how do you differentiate the iPad Pro now ? Still there were Large chips in 599 and even 499 iPads in the past so why not.
Regarding the studio… Seems like an iMac Pro sliced in half. I do not think thermals is the reason as iMac Pro already could handle 200W easilly I think. This seems kind of an overkill in terms of thermals. I think regullar Mac mini could handle M1 Pro and M1 Max fine. For the ultra ofc but this seems like a different philosophy altogether. Maybe the iMac Pro is not coming back ?
I don't expect the names to be technical. I expect them to be easy to understand!
Right now, Apple has the dumb dumb naming scheme, and they need to ditch it. Look no further than the MacBook Pro M1 Pro. Surely it's professional to have Pro in its name, twice. What a joke
And then, a double Max chip is somehow not a Double Max, but an Ultra? Why would you even call your chip the Max, if it's not really your maximum design?
This is AMD RX XTX, and nvidia RTX Super Max-Q levels of dumb.
Would you prefer they call it an m5 8524HQ like Intel would? It may be dumb, but at least it isn't totally meaningless numbers level of dumb. There are only three chips, with Ultra having two of them. There are a couple versions of some depending on number of CPU or GPU cores.
There are no SKUs for different clock speeds, no SKUs for having various instructions cut out, no SKUs for different levels of TDP, no SKUs for different market segments (mobile, embedded, server, workstation, desktop) and on and on. Apple's CPU family is simplicity itself compared to Intel. If "Macbook Pro M1 Pro" is your biggest objection, I think Tim Cook can live with it.
I am truly amazed for the best type of article, thanks for sharing the beautiful post knowing all about the "peek performance" of the apple that makes me feel enjoy. <a href="http://www.newcastletreeservicepros.com.au/"&...
what an amazing site that I had been visited, I am truly glad for the most beautiful post sharing the best info updates of the blog with "Peek Performance of the Apple" thanks. http://www.newcastletreeservicepros.com.au/
name99 - Tuesday, March 8, 2022 - link
" Looks like X series SoCs for the iPads are very much dead, then"It's more than just that! The iPad Air traditionally used the latest phone chip. (iPad Pro used the X chip, and iPad used the phone chip from a year or two ago.)
This suggests one or both of
- M1 chips are remarkably cheap to create. (Good news all round; not least means there's a lot of price margin there to make larger better versions for M2, M2 Max and so on!)
- Apple has grand plans for iPadOS that demand a higher level of performance even at the mid-range.
ZipSpeed - Tuesday, March 8, 2022 - link
Indeed. If the Air is getting the M1, there's a pretty good chance this year's Pro will get an M2.
aliasfox - Tuesday, March 8, 2022 - link
Additionally, the fact that the iPad Air is getting the full-fat, 8-GPU-core version of the M1 suggests that yields are actually very good for this chip - no need to repurpose the binned, 7-GPU-core versions that are in the entry-level iMac and MBA.
Doug_S - Tuesday, March 8, 2022 - link
There's no difference between the M1 and the 'X' version, is there? They just renamed the A14X to M1.
pthariensflame - Tuesday, March 8, 2022 - link
Indeed, and the internal codenames confirm it. A12 is H11P, A12X/Z is H11G, A13 is H12P, A14 is H13P, M1 is H13G, M1 Pro/Max(/Ultra, presumably) is H13X, A15 is H14P.
melgross - Friday, March 11, 2022 - link
You guys haven’t been paying attention. There are serious differences between the older “X” chips and the M1.
ABR - Thursday, March 10, 2022 - link
"- Apple has grand plans for iPadOS that demand a higher level of performance even at the mid-range."Don't think so on this one, unfortunately. The iPads have been ridiculously overpowered for anything but games for quite a while. They seem to not want to cannibalize Mac sales.
caribbeanblue - Thursday, March 24, 2022 - link
They don't really care about cannibalizing their other product lines; they actually *want* to cannibalize them before anyone else does.
name99 - Tuesday, March 8, 2022 - link
"2.5TB/sec interposer bandwidth"That's astonishing. Suppose we have 2.5GHz baseline frequency for the connection. That means connection width is 1024B!
Does anyone else have anything like that? It suggests that the units being transferred between cores are something large (pages or quarter pages) rather than just cache lines, which are the obvious transfer unit and what everyone else has used as far as I know.
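As a quick sanity check of the implied width (a rough sketch; the 2.5 GHz clock is just the assumed baseline from the comment above, not a published spec):
```python
# Width implied by 2.5 TB/s of aggregate bandwidth at an assumed 2.5 GHz transfer rate.
bandwidth_bytes_per_s = 2.5e12   # Apple's quoted 2.5 TB/s interposer figure
transfers_per_s = 2.5e9          # assumed baseline clock, one transfer per cycle

width_bytes = bandwidth_bytes_per_s / transfers_per_s
print(f"Implied width: {width_bytes:.0f} bytes per transfer")  # -> 1000 bytes, i.e. ~1 KiB
```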
Jorgp2 - Tuesday, March 8, 2022 - link
Pretty much any MCM GPU would have to have that level of bandwidth to function; even 10 years ago, GPUs had 1TB/s+ internal bandwidth.
mode_13h - Tuesday, March 8, 2022 - link
It doesn't follow that you need the same level of MCM bandwidth. So far, the multi-die GPUs have been limited to HPC workloads, where work partitioning might reduce the need for global data access (compared to interactive rendering).
For interactive rendering, it could be that they've invested a lot in better work scheduling, or they're just cherry-picking GPU performance numbers from the apps that scale up best.
whatthe123 - Tuesday, March 8, 2022 - link
their benchmark results absolutely massacre AMD and intel in total performance but real world results are just on par except in perf/watt. there's definitely some bottlenecks in there, though some may be in software.
mode_13h - Tuesday, March 8, 2022 - link
> their benchmark results absolutely massacre AMD and intel in total performance
Which benchmarks? You mean their slides, which seem to reference some unspecified workload?
> real world results are just on par except in perf/watt. there's definitely some bottlenecks
Which real world results, and on what hardware? Is someone already posting independent benchmarks of M1 Ultra?
whatthe123 - Tuesday, March 8, 2022 - link
I meant with the M1/M1 Max. Their SPEC performance is through the roof, but I've never seen it perform at that level in actual Mac software. I would expect the same for this chip, since it's basically two M1 Maxes connected together.
mode_13h - Wednesday, March 9, 2022 - link
> I meant with the M1/M1 Max.
Okay, I was confused because we were talking about the inter-die interconnect bandwidth.
melgross - Friday, March 11, 2022 - link
Internal bandwidth is very different from inter-chip bandwidth. As said here and elsewhere, no GPU manufacturer has been able to do this.
mode_13h - Sunday, March 13, 2022 - link
> no GPU manufacturer has been able to do this.
Okay, then what *is* the inter-die bandwidth between the dies of AMD's MI200? I haven't seen a spec on that.
Similarly, what about Intel's PVC? Surely, you must know.
mode_13h - Monday, March 14, 2022 - link
> what *is* the inter-die bandwidth between the dies of AMD's MI200?
To answer my own question: 400 GB/s (bidir).
https://www.anandtech.com/show/17054/amd-announces...
> what about Intel's PVC?
I didn't find clear specs on Ponte Vecchio, but I did find claims of "over 2 TB/sec connectivity bandwidth".
mode_13h - Tuesday, March 15, 2022 - link
It's worth pointing out that what AMD did was simply use the same Infinity Link ports they use for off-package communication. And either because of that, or vice versa, the MI200 presents itself as 2 GPUs from a programmability point of view. So, even though the links are cache-coherent, the aggregate inter-die bandwidth is far too low for tasks to be transparently migrated between the dies without suffering substantially.
It'd be interesting to know how Intel's Ponte Vecchio appears to software, but I *think* it probably shows up as a single GPU per package.
mode_13h - Tuesday, March 8, 2022 - link
> That means connection width is 1024B!
Okay, but GPUs with HBM would often have a 4096-bit-wide aggregate connection to their HBM. Is this any different? And AMD's MI200 (MCM) even doubled that to 8192-bit!
name99 - Tuesday, March 8, 2022 - link
It's different because:
(a) B=BYTE, not bit
(b) the target of HBM is dumb. What's interesting about this size, as I described, is that it's between "smart" components. That is, it's not the number of pins that's interesting, it's the protocol driving those pins, the decision as to what blocks of data are transferred when.
mode_13h - Tuesday, March 8, 2022 - link
> (a) B=BYTE, not bit
Sorry, I missed that.
> it's the protocol driving those pins, the decision as to what blocks of data are transferred when.
IMHO, I don't really see how it's materially different from what drives on-die interconnects, or why it isn't fronted by the same sort of cache hierarchy that sits in front of HBM. I'll take your word for it, though. I'm hardly a CPU or ASIC designer.
Doug_S - Tuesday, March 8, 2022 - link
They said there were 10,000 I/Os, so even with many of them being ground, and with the use of differential signaling, the frequency they are running at would be a lot lower than 2.5 GHz.
mode_13h - Tuesday, March 8, 2022 - link
> It suggests that the units being transferred between cores are something large
> (pages or quarter pages) rather than just cache lines
Well, a 64-byte cacheline is 512 bits. So, maybe the interconnect is a ring bus. That gives you 2 ports @ 512-bits.
Or, maybe each chip has its own ring and you've got 2 or 4 interchange points between them. Or, it could be a full-on mesh.
mode_13h - Tuesday, March 8, 2022 - link
Oops, as pointed out, I misread 1024-byte as 1024-bit.
Similar points could still apply, but at a much more massive scale. It would have to be 8 bi-dir links of a mesh interconnect to still work in 64-byte quantities. Still plausible, I think.
As Doug_S mentioned, power-efficiency concerns would push them more towards 16 bi-dir links at half the clock speed.
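To make that arithmetic concrete, here is a rough sketch. The 64-byte flit size, the two clock speeds, and the guess that roughly 8,000 of the ~10,000 I/Os carry data are illustrative assumptions, not anything Apple has stated:
```python
import math

# How many 64-byte-per-cycle links does ~2.5 TB/s imply, and what per-pin
# signalling rate would ~10,000 I/Os suggest? Illustrative assumptions only.
TARGET_BW = 2.5e12        # bytes/s, Apple's quoted aggregate
FLIT_BYTES = 64           # one cache line per link per cycle (assumed)

for clock_hz in (2.5e9, 1.25e9):
    links = math.ceil(TARGET_BW / (FLIT_BYTES * clock_hz))
    print(f"{clock_hz / 1e9:.2f} GHz: ~{links} one-way links ({links // 2} bi-directional)")
# 2.50 GHz: ~16 one-way links (8 bi-directional)
# 1.25 GHz: ~32 one-way links (16 bi-directional)

DATA_PINS = 8000          # assumed share of the ~10,000 I/Os that carry data
gbit_per_pin = TARGET_BW * 8 / DATA_PINS / 1e9
print(f"~{gbit_per_pin:.1f} Gbit/s per data pin (~{gbit_per_pin / 2:.2f} GHz with DDR signalling)")
```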
gagegfg - Tuesday, March 8, 2022 - link
Mac Studio: up to 128 GB of RAM, trying to replace a Tower that goes up to 1.5 TB of RAM (12 times more).
firewrath9 - Tuesday, March 8, 2022 - link
It's not a full replacement; you can still buy the Mac Pro.
mode_13h - Tuesday, March 8, 2022 - link
And, as a matter of fact, they even *said* it's not a replacement. The new Pro is still yet to come.
biigD - Tuesday, March 8, 2022 - link
They said at the end that all that's left to transition to Apple Silicon is the Mac Pro, so the Mac Studio seems to be an intermediate step, not a replacement.
Blastdoor - Tuesday, March 8, 2022 - link
Interesting that they also said this is the last member of the M1 family. I wonder what that means for the Mac Pro. One thing it would appear NOT to mean is four M1 Max chips bolted together. Perhaps the Mac Pro will be based on an M2-derived Ultra Ultra chip?
flyingpants265 - Tuesday, March 8, 2022 - link
It'll be two Max.
flyingpants265 - Wednesday, March 9, 2022 - link
Wow, that was a very easy call.
bernstein - Tuesday, March 8, 2022 - link
It basically just means that the Mac Pro refresh is further out than the M2, or that it will feature custom silicon. I would bet on the latter; there is only so much perf you can extract by stitching SoC dies together.
Sure, there is thermal room left for stitching up to 8 M1 Maxes, but I doubt that would be feasible; more likely they'll go some other route.
Given they charge ~$3500 for 114bn transistors, expect a little less than double the transistors (of the Mac Studio) for the entry model, expandable to likely double or more transistors... this is certainly going to be one hell of a fast machine.
Ppietra - Wednesday, March 9, 2022 - link
For the last year, all reliable rumours have pointed to Apple’s strategy being multi-die processors: 2 dies with 20 CPU + 64 GPU cores, and 4 dies with 40 CPU + 128 GPU cores. Those rumours have been consistently proven correct, so we should expect a 4-die processor next.
name99 - Wednesday, March 9, 2022 - link
We should expect a 4-die processor *at some point*. *Next* is probably M2.
This is not just pettifogging. Hector Martin (who presumably should know) is convinced that the interrupt controller architecture for M1 cannot stretch beyond two chips; it just doesn't have the appropriate bits to specify more than two chips.
Ppietra - Friday, March 11, 2022 - link
I was commenting on the theory that the Mac Pro would have some kind of custom silicon, since it would not be very feasible to have even more dies connected.
Doug_S - Wednesday, March 9, 2022 - link
Well, what it means is that the Mac Pro will be even more powerful than what four M1 Maxes would provide. We'll have to see if the M2 Max has more than 8 big CPU cores and 32 GPU cores.
I expect M2 to use the cores from the upcoming A16 and be made on N4P. That's a 6% density gain and an 11% performance gain (or a 22% efficiency gain) versus the N5 process used for the M1 family. Not a lot, but the cores take up such a small percentage of the overall M1 Max die that they could bump to, say, 12 big cores and 48 GPU cores at the same die size if everything else stayed the same.
ABR - Thursday, March 10, 2022 - link
Is dual socket not possible with M1 (Ultra or a modified version thereof)?
mode_13h - Friday, March 11, 2022 - link
> Is dual socket not possible with M1
name99 said somebody analyzed the interrupt controller and found it's maxed out at 2 dies.
Also, that 2.5 TB/sec wouldn't be possible between sockets. That means there'd be a massive bottleneck, which would at least be enough to prevent the GPU dies from acting as one.
techconc - Friday, March 11, 2022 - link
Possible? Yes, very likely. Optimal? No, absolutely not.
name99 - Tuesday, March 8, 2022 - link
The combination of the display + Mac Studio gives you essentially the equivalent of today's low-end iMac Pro (either the low-end or high-end version, depending on Max or Ultra) for $1500 or more less. It's not exactly a perfect match -- you get more ports, and you have two separate boxes rather than just one, which is a (very mild!) hassle, but the box can sit below the display (out of the way) or (my guess is this will be very popular) someone like Twelve South will design a shelf to add to the back of the monitor that can hold the Studio box.
My guess is that this combination is best viewed as the iMac Pro replacement, not the Mac Pro replacement.
My guess (only a guess!) is that the Mac Pro replacement will be built around using CXL for the large RAM capacities, and that requires CXL support in the chipset, coming with M2.
Bluetooth - Tuesday, March 8, 2022 - link
Having the computer separated from the monitor allows us to upgrade the computer faster, as these high-quality monitors easily last 10 years.
name99 - Wednesday, March 9, 2022 - link
That's always been the case; you could buy a Mac mini with a separate monitor.
The real point of a 27" iMac or iMac Pro was that you got a superb monitor, not just some random crap. Now that Apple is back in the standalone monitor game, that becomes less essential.
I don't think it's fair to say that the iMac (and even more so the iMac Pro) was a failure; it's more that they were appropriate to their time. You want a big monitor, and now you have all this space right behind the monitor, so how about using it? When the computation of an iMac Pro (8 to 16 cores, large dGPU) could not fit in a mini-sized box, that was a more desirable alternative than a separate tower.
Now that we can fit that power in a "large" mini, things are somewhat different. In particular, I can imagine (certainly this is what I plan for myself) a shelf attached to, but sitting behind, the monitor that holds the Mac Studio on its *side*. The Apple logo faces outwards, power and one set of ports on the left, other ports on the right. This gives you something very like the iMac Pro (fatter in the middle, thinner at the edges), perfectly feasible if your desk situation has the appropriate amount of depth (as mine does).
But it's feasible because
(a) Apple is back in the monitor game
(b) the compute guts fit in a small enough box, rather than requiring the area of a 27" screen to hold the dGPU and separate components enough.
mode_13h - Thursday, March 10, 2022 - link
> having the computer separated from the monitor allows us to upgrade
> the computer faster, as these high-quality monitors easily last 10 years.
The 3 PCs I own that I use most often are all >= 10 years old.
I was also using CRT monitors (I had a pair of > 15-year-old Sony GDM-FW900's) until summer of 2020.
And the Samsung 1440p monitor I use at work is 11 years old and still hanging in there. I have a secondary Dell monitor that's 1600x1200 20" from 2007. Its main downside is the amount of heat it puts out, in the summer (CFL backlight, FTL).
I take your point, though. The machines I use at work are a bit newer.
mode_13h - Tuesday, March 8, 2022 - link
> Pro replacement will be built around using CXL for the large RAM capacities
Yeah, CXL memory would make sense, here.
So, how big are the page sizes? 64 kB? That's probably still in the sweet spot for migrating between in-package and CXL memory. I think you'd want the transfer time to be some low multiple of the latency, in order to limit the overall penalty of a page miss on the in-package RAM.
pthariensflame - Tuesday, March 8, 2022 - link
Page size on Apple Silicon is 16k, except in Rosetta userspace, where it's 4k in the CPU but still 16k in the rest of the chip.
mode_13h - Tuesday, March 8, 2022 - link
Okay, so that'd be 256k pages/sec @ PCIe 5.0 x1 speeds (~= CXL 1.x?). So, a transfer time of ~4 usec.
One source I found indicates an estimated latency for CXL 1.1 of 100-150 ns. So, the transfer time is about 25x to 40x of that; however, I think that estimate doesn't account for bus or possibly device-side latency. Real-world end-to-end latency for CXL 1.1 might be closer to 1 usec?
https://semiengineering.com/latency-considerations...
Should be noted they estimate CXL 2.0 latency to be much lower, at which point 4kB pages start to look like a good sweet spot.
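Putting those numbers together (a sketch; the ~4 GB/s lane speed and the 150 ns latency figure are just the estimates quoted above, not measurements):
```python
# Rough page-migration cost over a single CXL.mem lane, using the figures discussed above.
LINK_BW = 3.94e9          # bytes/s, roughly PCIe 5.0 x1 (assumed comparable to a CXL 1.x lane)
LATENCY_NS = 150          # assumed CXL 1.1 access latency, from the estimate above

def migrate_time_us(page_bytes: int) -> float:
    """Time to move one page across the link, ignoring protocol overhead."""
    return page_bytes / LINK_BW * 1e6

for page_bytes in (16 * 1024, 4 * 1024):
    t = migrate_time_us(page_bytes)
    print(f"{page_bytes // 1024} kB page: ~{t:.1f} us transfer, ~{t * 1000 / LATENCY_NS:.0f}x the {LATENCY_NS} ns latency")
# 16 kB page: ~4.2 us transfer, ~28x the 150 ns latency
#  4 kB page: ~1.0 us transfer, ~7x the 150 ns latency
```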
Doug_S - Wednesday, March 9, 2022 - link
I've long been skeptical of people's fascination with CXL. What the hell's the point of using it versus using DIMMs?
Apple has scaled LPDDR5 all the way up to the Mac Studio. I see no reason why the Mac Pro won't use it as well. It will be upgraded to LPDDR5X, which will bump memory bandwidth by up to 33% (assuming they use the fastest currently available LPDDR5X), and larger LPDDR5/5X stacks could create Mac Pros with multiple TBs of DRAM. The only disadvantage is the lack of post-purchase upgradeability, but you will get 2 TB/sec of memory bandwidth, which isn't available on any x86 platform at any price (though it remains to be seen whether the M2 Max/Ultra will be able to fully exploit it).
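For what it's worth, the 2 TB/sec figure pencils out if you assume a four-die part keeps the M1 Max's 400 GB/s per die and moves from LPDDR5-6400 to LPDDR5X-8533 (the "up to 33%" above). A rough sketch:
```python
# Back-of-the-envelope memory bandwidth for a hypothetical 4-die, LPDDR5X Mac Pro.
M1_MAX_BW = 400e9             # bytes/s per die today (512-bit LPDDR5-6400)
LPDDR5X_UPLIFT = 8533 / 6400  # ~1.33x data rate, the "up to 33%" mentioned above
DIES = 4                      # rumoured top-end configuration, per the thread above

total_bw = M1_MAX_BW * LPDDR5X_UPLIFT * DIES
print(f"~{total_bw / 1e12:.1f} TB/s aggregate")  # -> ~2.1 TB/s
```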
mode_13h - Wednesday, March 9, 2022 - link
> I've long been skeptical of people's fascination with CXL.
It's cache-coherent across multiple devices and CPUs. That lets you put large memory pools that GPU-like accelerators can access, directly. And it's lower-latency than PCIe.
Also, it gives you a way to scale out memory, so you can have another tier beneath what's directly connected to CPUs, but still much faster than NAND flash.
> larger LPDDR5/5X stacks could create Mac Pros with multiple TBs of DRAM.
Not sure about that. Are you aware that the signals & power for each die need to be tunneled up through vias? And the taller your stack, the more of those you need, hurting area-efficiency of the lower dies? Plus, if the stack is too thick, perhaps you might have trouble cooling it.
> you will get 2 TB/sec of memory bandwidth, which isn't
> available on any x86 platform at any price
Sapphire Rapids has been announced with HBM. Ian estimated "between 1.432 TB/s to 1.640 TB/s":
https://www.anandtech.com/show/17067/intel-sapphir...
Doug_S - Wednesday, March 9, 2022 - link
> It's cache-coherent across multiple devices and CPUs. That lets you put large memory pools
> that GPU-like accelerators can access, directly. And it's lower-latency than PCIe.
But that's irrelevant for Apple, as they already have cache coherency licked across multiple dies, thanks to the M1 Ultra's 10,000 I/Os. They don't care about external accelerators, since the GPU and NPU are built in - all indications are they have no plans to support third-party GPUs at all. If they offer PCIe slots at all, it will probably be x4 slots only, for internal SSD expansion or networking too fast for TB4, like 100GbE. No need for special memory solutions there.
So what's the advantage to APPLE to use CXL? Still none that I can see.
> Not sure about that. Are you aware that the signals & power for each die need to be tunneled
> up through vias? And the taller your stack, the more of those you need, hurting area-efficiency
> of the lower dies?
I'm talking about stuff that already exists. Samsung already offers LPDDR5/5X packages with up to 32 devices. At the current 16Gb generation, that's 64GB per package, allowing for 1 TB in a four SoC Mac Pro using their current layout of four packages per SoC - 4 TB down the road as DRAM gets denser, and maybe more if even larger packages become available. If you think 16 packages of LPDDR5X is a cooling problem, why would CXL (using higher-power standard DDR5) be immune from that?
It isn't like this has to go in a laptop, where weight is a concern. You stick a wide-base heatsink over the entire 4-SoC, 16-package complex, with a fan sized to match - a huge but relatively low-RPM fan, which will be nice and quiet. If Intel and AMD can cool 300+ watts in a single chip, why would it be harder for Apple to do so with that heat spread across 20 different chips?
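Spelling out that capacity math (a sketch using only the figures claimed above - 32-die packages, 16 Gb dies, and the current four-packages-per-SoC layout):
```python
# Capacity of a hypothetical 4-SoC Mac Pro built from 32-die LPDDR5/5X packages.
DIES_PER_PACKAGE = 32     # claimed Samsung package configuration
GBIT_PER_DIE = 16         # current-generation LPDDR5/5X die density
PACKAGES_PER_SOC = 4      # matches the M1 Max/Ultra layout
SOCS = 4                  # rumoured top-end Mac Pro configuration

gb_per_package = DIES_PER_PACKAGE * GBIT_PER_DIE / 8
total_gb = gb_per_package * PACKAGES_PER_SOC * SOCS
print(f"{gb_per_package:.0f} GB per package, {total_gb / 1024:.0f} TB total")  # -> 64 GB, 1 TB
```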
name99 - Wednesday, March 9, 2022 - link
I mentioned CXL specifically for the Mac Pro as CXL.mem. That I am fairly confident of.
But more generally, CXL solves another problem that may (or may not?) be real for Apple.
Right now Apple have CPU, GPU and media tied tightly together. But for many (most?) use cases, people don't want this tying. I want more CPU, you want more GPU, she wants more media. Ultra is a good quick way to try to hit the central spot, but an alternative is to provide a good base with lots of CPU and "minimal" other (NPU, GPU, media), and then provide one or more equivalents of an Afterburner card that provide extreme levels of GPU/media/other accelerators, all essentially transparent to the system, via CXL.
Apple is not doing all this just to make the current faithful happy; they are on track to conquer the world. But part of conquering the world is being a little more flexible in providing different configs for different users. I'm extremely impressed with how they've scaled by 16x from A14 to M1 Ultra (with minimal "disruption" and "engineering overhead"), but it has been at the cost of tying all the accelerators together; CXL may provide a route out of this.
Even for internal use they surely need this? At some point their data centers are going to move to Apple Silicon, and those data centers are going to want SoCs with 128 cores on them, not SoCs with 8+2 cores and a massive amount of GPU+media capability.
mode_13h - Thursday, March 10, 2022 - link
> But that's irrelevant for Apple, ... They don't care about external accelerators
The Mac Pro & its users probably will. Even if not GPUs, there are still other PCIe cards people will want to use in there. However, I'm not necessarily saying they'll support CXL cards, just that it'll likely still be a tower-style machine.
> So what's the advantage to APPLE to use CXL?
That's a more specific question than you asked before. I'm with name99 in that I think CXL.mem could make sense for them.
> I'm talking about stuff that already exists. Samsung already offers
> LPDDR5/5X packages with up to 32 devices.
Got a link to that?
> allowing for 1 TB in a four SoC Mac Pro using their current layout of four packages per SoC.
You'd sure hope none of it goes bad! That's another nice thing about DIMMs.
> cooling problem, why would CXL ... be immune from that?
Because the individual DIMMs are in open air, and the DRAM dies are only packaged with other DRAM dies, rather than also hot CPU/GPU dies.
> If Intel and AMD can cool 300+ watts in a single chip, why would it be harder
> for Apple to do so with that heat spread across 20 different chips
If the stack is too thick, I'd be concerned about those dies towards the bottom. It's not only about heat dissipation, but also the thermal gradient you need to sustain.
name99 - Wednesday, March 9, 2022 - link
That's because you think of DRAM as DRAM. But when you are using DRAM in 1.5TB quantities, you are not using it as DRAM; you are using it as fast storage, either as a full in-memory database, or as something to hold your indexes or whatever else points into your database. In both cases, the required performance is more like IO speeds than DRAM speeds, and the segregation from "working" DRAM is fairly trivial - you don't need a clairvoyant system to know how to move pages between fast and slow DRAM.
aliasfox - Tuesday, March 8, 2022 - link
They haven't replaced the Mac Pro yet - it's still in the lineup, and they clearly said that updating the Mac Pro is "for another day."
The Mac Pro probably had to stay in the lineup because they couldn't figure out how to charge $700 for Mac Studio wheels.
Hifihedgehog - Tuesday, March 8, 2022 - link
" Looks like X series SoCs for the iPads are very much dead, then"The vanilla/entry-level M series was a rebrand of the AxX series chips.
mode_13h - Tuesday, March 8, 2022 - link
I can imagine Apple looking at Intel and saying "Now *that* is how you scale up a NUC!"
This makes an utter joke of Intel's "NUC Extreme" products.
mode_13h - Tuesday, March 8, 2022 - link
I'm a little surprised the display is so small. In my experience, even 32" is a little small for 4k.
Of course, I'm no Mac user. I want to actually *see* all the pixels I'm paying for (and forcing my GPU to draw).
Bluetooth - Tuesday, March 8, 2022 - link
If you want to see all the pixels, then go for a 40" 4K, but good luck working on it.
But I agree with you; I was hoping for a 32" 5K screen.
shank2001 - Tuesday, March 8, 2022 - link
What are you talking about, "good luck working on it"? I have a Dell 43" U4320Q 4K monitor, and it is an absolute JOY to work on! It is the perfect size for 4K, in my opinion! Fantastic screen real estate, and no need to scale anything at all! Perfect pixel pitch. Those knocking these larger 4K monitors haven't actually tried them! I could never go back to smaller now!
Frederick Arthur - Tuesday, March 8, 2022 - link
Oh,Dell 43" U4320Q 4K monitor is very expensive!Since my desk is not very big, I can't fit such a huge monitor, and my budget is limited.I use a 27inch 4K monitor with 150% scale, its my first time to use 4K monitor.Can you describe to me what it's like to use such a large 4K monitor? Especially when compared with 32-inch and 27-inch monitors. I will be very grateful!
mode_13h - Tuesday, March 8, 2022 - link
I have a 32" 4k screen and it's already at about the limit of how much I want to be turning my head. Any bigger and I'd probably at least want a curved screen, so my eyes don't have to change focus as they sweep from corner-to-center. And I probably sit a bit farther back from my screen than most.aliasfox - Tuesday, March 8, 2022 - link
Isn't the point of a 'Retina' display that you _can't_ see the pixels that it's drawing? Not with typical eyesight at typical viewing distances, at least.
mode_13h - Tuesday, March 8, 2022 - link
I get that, which is why I emphasized that I *do* like to see all the pixels I'm paying for (and that I'm making my GPU push). I get that I'm old school. My username is based on a 320x200 VGA framebuffer mode, after all!
In a perfect world, you're right. We'd all have screens so high-res that we wouldn't really think about pixels. And interactive rendering would use techniques like VRS and DLSS to avoid wasting too much power. That's the way everything seems to be going, for better or worse.
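Since "Retina" is really a claim about angular resolution, here's a quick sketch using the common ~60 pixels-per-degree rule of thumb (the 24-inch viewing distance is just an assumption):
```python
import math

# Pixels per degree of visual angle for a few monitor configurations.
def pixels_per_degree(h_res: int, v_res: int, diag_in: float, distance_in: float) -> float:
    ppi = math.hypot(h_res, v_res) / diag_in               # pixels per inch
    return ppi * distance_in * math.tan(math.radians(1))   # pixels subtended by one degree

for name, h, v, diag in (('27" 4K', 3840, 2160, 27),
                         ('32" 4K', 3840, 2160, 32),
                         ('27" 5K', 5120, 2880, 27)):
    ppd = pixels_per_degree(h, v, diag, distance_in=24)    # assumed ~24" viewing distance
    print(f"{name}: {ppd:.0f} px/deg")
# 27" 4K: ~68 px/deg, 32" 4K: ~58 px/deg, 27" 5K: ~91 px/deg
# Roughly 60 px/deg is where individual pixels stop being resolvable for typical eyesight.
```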
name99 - Tuesday, March 8, 2022 - link
Isn't it nice, then, that you can use a 3rd-party monitor if you prefer...
mode_13h - Tuesday, March 8, 2022 - link
Of course. But Apple has that reputation for being top-shelf, so I'm just a bit surprised. It's not like they haven't made 30" or 32" screens before.
Frederick Arthur - Tuesday, March 8, 2022 - link
Hi mode! I use a 27-inch 4K monitor. Frankly speaking, I think it is good. I would appreciate it if you could tell me why 32 inches is too small for 4K?
mode_13h - Tuesday, March 8, 2022 - link
It's based on me using the same fonts I've been using on lower-DPI displays. I know most people believe I'm wrong to do so, and that I don't "get" what Retina displays are all about.
I'm glad you like your monitor. I've seen a 4k 28" monitor that did look awfully nice. Nothing against them. It's just not for me and my usage model.
Frederick Arthur - Wednesday, March 9, 2022 - link
Thank you very much! Have a nice day!
GC2:CS - Tuesday, March 8, 2022 - link
Regarding the SE: I really like it. The A15 will be great if it can last, especially considering it can possibly sustain higher power draw than the Pro iPhones :D :D The last SE was lame, as it did not have 3D Touch (which takes a ton of space under the display), yet still had the same capacity. The camera hardware is quite outdated. If it is the same as the iPhone 8's, I think an upgrade would be worth that dollar or two. But that old sensor did not support fast capture for DF and other stuff, so we will see. The camera bump seems thinner. That might point to an upgrade, same as with the iPhone 7 to 8.
Regarding the Air… The M1 is nice, but I feel it is overkill for most people. I would rather save 50 bucks and some battery too. Is this a bit of a balancing act between the mini and Air battery life? The mini having a newer, more efficient chip while the Air leverages bigger thermals? Also, how do you differentiate the iPad Pro now? Still, there were large chips in $599 and even $499 iPads in the past, so why not.
Regarding the Studio… It seems like an iMac Pro sliced in half. I do not think thermals are the reason, as the iMac Pro could already handle 200W easily, I think.
This seems kind of overkill in terms of thermals. I think the regular Mac mini could handle the M1 Pro and M1 Max fine. For the Ultra it makes sense, of course, but this seems like a different philosophy altogether. Maybe the iMac Pro is not coming back?
meacupla - Tuesday, March 8, 2022 - link
Seriously? M1 Ultra? Now Apple is copying Intel/AMD with garbage, confusing naming schemes.
mode_13h - Tuesday, March 8, 2022 - link
You really expect Apple, of all companies, to be more technical in their names? Seriously?
I mean, I'm almost surprised they release as many tech specs as they do. Not too surprised, these days, as I guess it'd be hard not to brag.
meacupla - Tuesday, March 8, 2022 - link
I don't expect the names to be technical. I expect them to be easy to understand!
Right now, Apple has a dumb-dumb naming scheme, and they need to ditch it.
Look no further than the MacBook Pro M1 Pro. Surely it's professional to have Pro in its name, twice. What a joke
And then, a double Max chip is somehow not a Double Max, but an Ultra?
Why would you even call your chip the Max, if it's not really your maximum design?
This is AMD RX XTX, and nvidia RTX Super Max-Q levels of dumb.
Doug_S - Wednesday, March 9, 2022 - link
Would you prefer they call it an m5 8524HQ like Intel would? It may be dumb, but at least it isn't totally-meaningless-numbers levels of dumb. There are only three chips, with the Ultra comprising two of them. There are a couple of versions of some, depending on the number of CPU or GPU cores.
There are no SKUs for different clock speeds, no SKUs with various instructions cut out, no SKUs for different levels of TDP, no SKUs for different market segments (mobile, embedded, server, workstation, desktop), and on and on. Apple's CPU family is simplicity itself compared to Intel's. If "MacBook Pro M1 Pro" is your biggest objection, I think Tim Cook can live with it.
Frederick Arthur - Tuesday, March 8, 2022 - link
Well, I think the SE is too expensive.
Doug_S - Wednesday, March 9, 2022 - link
It's $30 more than the previous model; that's hardly a big price increase.
kath1mack - Thursday, April 14, 2022 - link
Interesting