It’s very possible this is built using N3E, especially at the relatively low volumes the MBP lineup has. The cost of N3B is also pretty high versus N3E, and I don’t see Apple rushing to adopt a high-cost manufacturing process for the M3 series when that process won’t even see 2025; TSMC has already confirmed they are moving past it.
Don’t forget that the entire 3nm lineup from TSMC has been delayed, so Apple would’ve already been designing for a more mature 3nm process. I don’t have specifics on N3E mass production, but if it was July/August it could line up.
Well, they also made some cuts, especially on the M3 Pro: fewer cores and a step down to a 192-bit memory bus, maybe to make up for the cost. They usually mention a "second-generation 3nm process" or something similarly telling but Apple-vague when that's the case.
It's worth remembering that the top-of-the-line Ryzen chips don't come anywhere near the M3 Pro's "reduced" memory bandwidth.
At 192-bit, the M3 Pro should be around 150 GB/s. With new 8533 MT/s memory, AMD is pushing 137 GB/s. Apple's lead is disappearing.
With the Zen 4 Ryzen chips shipping now? No. AMD has about half the memory bandwidth of the M3 Pro.
The Max is still 400 GB/s, though.
TSMC's first public statement about N3 timelines was during their Q1'20 earnings call, where they indicated that it would reach volume production in H2'22. TSMC eventually announced that milestone on December 29, 2022. N3 was not late.
TSMC has always said that N3E would follow one year after N3, reaching volume production in H2'23. During their Q3'22 earnings call, C.C. Wei indicated that N3E might be pulled in 2 or 3 months, which would place it in the Sep-Oct 2023 timeframe. In their most recent earnings call on Oct 19, 2023, TSMC confirmed that N3E would reach volume production in Q4 of this year.
Ignore the media and analysts and pay attention to what TSMC is actually saying. The volume production milestone is a trigger for revenue recognition in subsequent quarters, so TSMC is very transparent about it; they have to be.
I really like that they further diverged the M3 Pro and M3 Max, because they went all-in on the M3 Max, which in turn will do wonders for the eventual M3 Ultra. Makes sense.
I think people thought the M3 Pro would be the new vanilla M3. It will be very interesting to see the performance results. If the M3 Pro can beat the M2 Pro with fewer performance cores, then it bodes well for the M3 architecture as a whole.
I would still love to see a deep dive into the core architecture details: instruction decoders, cores, and reorder buffers. Will AnandTech be able to figure any of this out, or will it require documentation from Apple?
Yes, the lack of a mobile editor prevented a deep dive on the A17, but I'd love to see one on the M3 family.
Me: I hope Apple moves to 192-bit buses in the M3.
Apple: We hear you, here's 192-bit in the M3... Pro...
Me: Nooooo!
The M3 Pro got a few cuts. I wonder if the binned chip is even faster than the full-core-count bin of the M2 Pro; they only claim the full-bin M3 Pro is 10% faster than the M2 Pro. There is also only one (!) more P-core than the M3 in the weird 11-core model. But once you upgrade the base M3 model to 16 GB, the M3 Pro model is sitting right there anyway, in their typical price laddering.
Some nonprofessional observations:
1. Apple made a major change in how it produces these chips. Previously, they designed the Max chips so that the bottom half of the GPU section, the system-level cache, and the memory controllers could be chopped off to make the Pro, getting two products from one mask.
The M3 Pro is its own design with a completely different layout, thus requiring Apple to spend money to lay out and make a mask set for a separate chip. This is very expensive; however, slicing off a chunk of silicon for each Pro chip was also expensive, especially since they probably had to cut away defect-free silicon to meet demand for previous Pros.
2. Previous high-end M chips thus had identical dual-cluster performance blocks with four cores each, plus an efficiency cluster with two and then four cores. From the die shots, you can see they have increased to up to six cores per cluster. The M3 Max still has two performance clusters and one efficiency cluster.
The M3 Pro has its six performance cores in only one cluster. In the AnandTech article on M1 Max performance, it was shown that each cluster could draw somewhat above 100 GB/s. That speaks to why the M3 Pro has a reduced-width memory interface: with only one performance cluster, the CPU can't make use of 200 GB/s.
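As a rough sketch of that bandwidth arithmetic (assuming LPDDR5-6400 for every tier; the exact speed grades and the bus widths other than the M3 Pro's 192-bit are my assumptions):

```python
# Peak DRAM bandwidth = bus width (bytes) x transfer rate; assumes LPDDR5-6400 (6400 MT/s per pin).
def peak_gb_s(bus_width_bits: int, mtps: int = 6400) -> float:
    return bus_width_bits / 8 * mtps * 1e6 / 1e9

for name, width in [("M3, 128-bit", 128), ("M3 Pro, 192-bit", 192),
                    ("M2 Pro, 256-bit", 256), ("M3 Max, 512-bit", 512)]:
    print(f"{name}: ~{peak_gb_s(width):.0f} GB/s")   # ~102, ~154, ~205, ~410 GB/s
```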
I plan on scrounging the money together for an M3 Max for running local high-parameter-count large language models. My current circumstances require my machine to be unplugged a large portion of the time, so maintaining full performance unplugged is important to me.
128 GB of unified memory will allow running the biggest models with GPU acceleration. I am very disappointed that they didn't go with LPDDR5X-9600, though. Neural network inference is memory-bandwidth limited, and faster memory would have made a huge difference in inference speed.
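To make the bandwidth-bound point concrete, here's a back-of-the-envelope sketch; the model size, quantization, and the hypothetical LPDDR5X figure are illustrative assumptions, not measured numbers:

```python
# Batch-1 LLM decoding streams essentially all weights once per generated token, so
# tokens/sec is roughly bounded by memory bandwidth divided by the model's size in bytes.
def max_tokens_per_s(params_b: float, bits_per_weight: int, bandwidth_gb_s: float) -> float:
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes

for bw, label in [(409.6, "512-bit LPDDR5-6400 (M3 Max)"),
                  (614.4, "512-bit LPDDR5X-9600, hypothetical")]:
    print(f"70B model @ 4-bit, {label}: ~{max_tokens_per_s(70, 4, bw):.0f} tokens/s ceiling")
# -> roughly 12 vs 18 tokens/s upper bounds; real throughput lands below these ceilings
```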
You're exactly right that the M3 Pro is a new design, rather than a chop of the M3 Max. However, with the M1 Pro and M2 Pro, while the design was a chop of the Max layout, Apple produced separate tape-outs and masks for the Pro and Max dies. They weren't cutting down Max silicon to make Pros.
Because design costs are so insane and the engineering team only has so much bandwidth, Apple went the chop route for the first two generations, even though that left them with a mid-range chip that wasn't terribly well optimized. Now they've gone all-in with three separate layouts. And while many folks don't seem terribly pleased with what Apple is optimizing for here, in the end, Apple is in the business of selling Macs, not M-series chips.
Also, we can clearly see the effects of analog not scaling as well as logic on N3. With a chop, you end up with a ton of wasted shoreline. Even after cutting the memory interface down by 25%, the real estate at the edge of the M3 Pro die is completely utilized.
Well, almost completely utilized. Looking at the die shot again, there are several areas where they pushed logic out to the edge.
Interesting. What else are they using the edges for, I/O-wise, with the cut to 192-bit?
The Max (and previous Pro) layouts use double-stacked DRAM controllers. The M3 Pro shifted to a single / linear controller layout like the regular M chips. Probably makes fan-out a heck of a lot easier.
I also just noticed that Apple didn't bother to transpose the die shot before superimposing it on the package render for the M3 Pro. So there's 2 memory interfaces on the side with 1 LPDDR package and 1 interface on the side with 2 packages. They got it right for the M3 though. Huh.
Furthermore, the A and M series are diverging architecturally: the M3's Dynamic Caching GPU is not on the A17, and the A17's twice-as-fast 35 TOPS neural engine is not on the M3. So it seems the direction is more bespoke architectures for everything, which can be good for us (except for all the weird cuts to the M3 Pro right now).
Is the neural engine twice as fast though? Or is it just Apple citing INT8 vs INT16 TOPS? Qualcomm was recently touting INT4, to make sure they had the biggest number.
This is only a difference if the M3 doesn't support INT8 for some reason.
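A quick sketch of the accounting question being raised here; the MAC count and clock are entirely hypothetical, it just shows how a headline TOPS figure can double without the silicon changing:

```python
# Quote throughput at half the precision and the TOPS number doubles, assuming each
# INT16 MAC unit can be split into two INT8 MACs (a common NPU design choice).
def tops(int16_macs_per_cycle: int, clock_ghz: float, precision_bits: int) -> float:
    macs = int16_macs_per_cycle * (16 // precision_bits)
    return macs * 2 * clock_ghz * 1e9 / 1e12        # 2 ops (multiply + add) per MAC

print(tops(8192, 1.0, 16))   # ~16.4 "INT16 TOPS"
print(tops(8192, 1.0, 8))    # ~32.8 "INT8 TOPS" from the same hardware
```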
After getting humiliated by their performance comparisons against the RTX 3090, and then getting butchered when ADL and Zen 4 launched, Apple learned the hard way and is now reluctant to set those expectations.
For what it's worth, they should first get rid of the notch on the MacBook, and they should offer a PCIe solution for the NVMe storage instead of the awful soldered design.
Second, the absolutely bonkers size of these processors is insane. For reference, the 13900K has a 25B transistor count on Intel 7, and Apple still cannot beat it even with unified memory. Then we have the 7950X, built from two 71 mm² CCDs and a 122 mm² I/O die, with 13.5B transistors total, and there is no way this M3 Max will beat a 7950X in multithreaded performance. The RTX 4090's AD102 is roughly 76 billion transistors; don't even dream about that performance. Then there is the in-socket AM5 upgrade path. The M3 Pro / Max is outdated when we compare it against an x86 desktop, and before anyone jumps in: Apple sells the iMac and Mac mini with these chips, so they should be compared to x86 desktop processors and not to the BGA jokes.
As for RT, it's a gimmick. Even Nvidia knows pure RT won't be practical until around 2035; until then it's all upscaling nonsense. Apple is chasing trends it is never going to catch Nvidia on, and Nvidia leads that space with TAA-style upscaling in DLSS and fake frame insertion. Shame Apple also felt the need to jump into that, especially with their zero Vulkan support.
Just to clarify: You seem overly concerned about transistor count on the M3s in comparison with a 13900K. Remember that these are not comparable chips in any way: Apple Silicon is a proper SOC, with many units and controllers (including a comparatively powerful GPU, media encode/decode, and ML accelerator) that are either absent or much less powerful on the 13900K.
Hardware RT is hardly a gimmick in many industries, including mine (high-end 3D animation). In my render engine of choice (Redshift), the M3 Max is ~2.5x faster than the M1 generation, according to early reports (of course, we'll have to wait for more testing). That promises to be huge for Mac-based 3D artists, and it threatens to close the gap (in follow-on variants like the M3 Ultra) between Apple Silicon GPUs and more recent Nvidia GPUs.
The price of M-series machines far exceeds that of a PC counterpart, and a total transistor count of 92B on the M3 Max versus 14B for the 7950X and 76B for the 4090 will absolutely result in a bloodbath. And once socket AM5 Zen 5 arrives, say goodbye to the Mac desktop.
Second, the transistor count on this complex N3B node is reflected in the price premium; look at M3 pricing, it's off the charts. The M3 Max starts at $4,000, and for that money I can buy a top-end X670 motherboard plus a 4090 ROG Strix OC and a 7950X, which will blow this thing out of the water. As I mentioned, once Zen 5 launches, the R9 8950X will wreak havoc.
You are talking about a rendering workload, not gaming. I was talking about gaming performance, and even if you want to include rendering, there's no way a 4090 can be beaten by M3 parts. Second, the Hollywood render farms run on HPC-like infrastructure; for example, the CGI for Transformers: Revenge of the Fallen took days to render. Here's some information:
"Rendering the Devastator took over 85% of ILM's render farm capacity, and the complexity of the scene and having to render it at IMAX resolution caused one computer to "explode". Digital Domain handled work on secondary characters, including the transformation of Alice from her human disguise to her robot self. The beginning showing a close-up of her face as the skin broke apart took five animators three months to finish."
Do you seriously think people are going to blow their money on these ARM soldered machines? The M2 Ultra is not going to beat an ILM render farm, and there is no way it is going into a cluster.
ARM is only for portables; it is never going to scale to this level of desktop performance, nor to real-world HEDT-class work. I did not even mention the Threadripper Pro 7000 series.
The M3 Max starts at $3,200 in the 14" laptop and will likely again be $2,000 in the Studio desktop. So your comparable budget for just a PC motherboard, CPU, and GPU is more like $1,500, so definitely no 4090 ROG Strix OC.
And can that $2,000 M3 Max beat a similarly priced PC? By the time the Mac Studio launches, new CPUs will have launched, and within a year new GPUs will launch and the 4090 will be half the price, like the 3090 Ti. Meanwhile the BGA Studio will be DOA.
I don't know mate, but most of us are disappointed with the Apple A17, M3 and extremely disappointed with the M3 Pro and M3 Max.
We assumed that Apple was working on a new microarchitecture, that means faster performance and lower energy use. Their first gen product was from 2019 microarchitecture, with the follow-up being mostly a refresh without major differences.
On top of this, is the full-node jump to the TSMC-3nm lithography. We were promised upgrades around the +30% improvement, which is great but not abnormal for full-node jumps.
All up we should've seen a slight energy decrease and a massive performance uplift. Think of the likes of 2016 14nm Cortex-A73 versus the 2019 7nm Cortex-A76.
We didn't see that with the A17 which is a shame. But let's assume and chalk it up to TSMC having bad yields. For instance, their immature lithography may be leaking energy, so for stability reasons, the system may have been tuned to increase the Voltages coming at the expense of performance and efficiency. Even still, we should've seen something more than a 6%-9% uplift (or claimed 10%). Because the microarchitectural uplift should have enabled that, so maybe a more modest +15% wouldn't have been wrong. And remember this is for iPhones and iPads, products that are refreshed yearly and are easily disposed/upgraded by consumers. So a one year "meh" cycle would have been understandable due to the TSMC troubles.
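For a sense of why a stability-driven voltage bump is so costly, here's a tiny sketch of the usual dynamic-power relationship; the percentages are purely illustrative, not measured A17 figures:

```python
# Dynamic power scales roughly with C * V^2 * f (leakage adds a further voltage-dependent term),
# so even a small voltage increase for yield/stability eats efficiency quickly.
def relative_dynamic_power(v: float, f: float, v0: float = 1.0, f0: float = 1.0) -> float:
    return (v / v0) ** 2 * (f / f0)

print(relative_dynamic_power(v=1.05, f=1.00))   # +5% voltage at the same clock -> ~10% more power
print(relative_dynamic_power(v=1.05, f=1.03))   # +5% V for +3% clock -> ~14% more power for 3% perf
```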
But then you get to the base M3 and things look less impressive. These go in the iPad Pro and MacBook Air, which aren't replaced as quickly. Perhaps Apple could've done something, maybe lowered prices slightly, like passing on the savings from TSMC not fulfilling their part of the contract. Something. Still, we have to understand these in the context of being the "low cost" options.
There is ZERO excuse for the M3 Pro, M3 Max, and M3 Ultra. Apple SHOULD have delayed their launch. They should've been built on proper, working 3nm silicon that doesn't leak or need raised voltages for stability, or at least one that fulfills the promise of a +30% uplift. And then mixed that with a microarchitectural upgrade that is long overdue, for another +10%-30% uplift, totalling anywhere from +40% to +55%. And remember, these are not exorbitant numbers; they are the industry standard, since we haven't had a proper upgrade for many years.
The M3 Pro in particular looks bad. At least with the last refresh we saw the M2 Pro give some competition to the M1 Max; now the M3 Pro looks BARELY equivalent. All of this points to one thing: GREED. Apple was very innovative with the A13 (iPhone 11), the base M1 (great-value MBA), and the late-2021 M1 Max (performance and efficiency). The follow-up refreshes were good, not great, but not meh: the A15 (iPhone 13), M2 (the value MBA 15), M2 Pro (efficiency), and M2 Max (performance). This slowdown shows that Apple is resting on its laurels. It means any architectural breakthrough they have in their labs will be parcelled out into smaller upgrades and trickle-fed into future products. Quick example: their laptops have a NOTCH for a Face ID module, but they haven't actually put the hardware inside it. They've done this trickle-feed in the past, until competition, or leadership, forced their hand. But with Steve Jobs gone, it seems corporate greed is the main driving force, not innovation.
I don’t know. Given the limited uplift with N3B for the A17, I’m not disappointed with the improvements to the M3 line at all. Further, I think the rebalancing of the M3 Pro makes a lot more sense; there was nothing but a GPU difference before. The new M3 Pro will be more efficient and likely a more suitable competitor to something like Qualcomm's Snapdragon X Elite chip.
The M3 Pro is likely a well-balanced chip for a lot of users: it has a lot more grunt than the regular M3, but is still very efficient. Judging from the benchmarks, Apple really seems to have invested in energy-efficiency, which results in sizable uplifts in multi-core performance.
I don't know: while the A17 Pro's CPU improvements have been behind expectations, the M3 series delivers, IMHO. 15% gen-over-gen improvements are in line with the competition (and with what TSMC told us to expect), single-core performance is on par with Intel's and AMD's *desktop* CPUs (without OC, obviously), and in terms of power efficiency they are still in a class of their own. In fact, power efficiency seems to be the biggest improvement of them all, as the M3 Max has multi-core performance of the same order as the M2 Ultra. I wouldn't call that disappointing.
The M3 Pro is an opinionated design: Apple could have continued designing the Pro the way it did before, by cutting down the Max, but they haven't. I don't think that's a failure at all. Perhaps it doesn't suit your needs, but the fact that the M3 Pro has *fewer* transistors than the M2 Pro (e.g. by giving the SoC no excess memory bandwidth) might translate into energy savings.
How can you be disappointed with the M3 Max? Sure, it's expensive, but it's delivering M2 Ultra performance in a laptop! In my opinion, the 16" MacBook Pro with the M3 Max chip is the best laptop on Earth right now. It is a real PRO machine for PRO workflows. Well done, Apple!
ARM is already in the server room battling x86; look up Graviton from Amazon, or the Ampere chips that run in Azure. ARM can indeed battle your precious x86 CPUs. The future is about performance per watt and the environment, and here ARM has a very strong case compared to Intel (and so do RISC-V, and Apple, actually).
Who denied that? I already mentioned Graviton in the Qualcomm article, and that's the only relevant one. There's no competition other than that, plus AMD's Bergamo is already high-density Zen 4c without SMT and it beats the ARM offerings. The point is about the personal computer: ARM cannot do what x86 does, which is broad OS and software support plus freedom. With the ARM junk in Android you need binary blobs, or the board is headed for the dumpster; x86 isn't subject to such restrictions. Same for the Apple M series or whatever series: they are tailored only for a specific OS and specific software.
Apple is not even remotely related to HPC and never will be; with TSMC wafer costs for their precious high-transistor-count processors, they cannot scale to the level of x86.
So do you specify "HEDT-class work" because the M series has already caught up to regular desktops, and that doesn't fit with your world view?
Because last time I checked, M-series performance is actually quite good, and the M2 Ultra (in ARM-optimized apps) gives the likes of AMD, Intel, and Nvidia a run for their money.
Yes, people are blowing their money on these ARM soldered machines: https://www.idc.com/getdoc.jsp?containerId=prUS510... Apple is the only major vendor whose market share and shipments are growing in an era of industry contraction.
I'm also not talking about gaming. Macs suck for gaming, not because they are not capable (M-series chips are pretty great for gaming workloads), but because there are very few triple A games that have been ported to Mac. Apple also seems determined not to do the things that game developers would need to change that situation (buying game studios, paying for ports), so I don't see that changing.
Many big productions still render on CPU, since GPU render engines are considered less mature, and GPU memory limitations make large scenes and complex sims impractical. You know what machines don't have those limits? Apple Silicon Macs, where rendering and simulation can take advantage of all the system's unified memory.
I'm not saying that people will build render farms from Macs (those will still be Linux workstations), just that they have some significant advantages, which you seem determined to avoid acknowledging.
I find this conversation really odd. Is the original comment author a child? It stinks of "PCs are better" without much forethought into how Macs are used. The benchmarks for the M3 Max are genuinely impressive, showing it as being right up there with the 14900K DESKTOP processor, all coming from a laptop chip chewing 60 watts, unplugged and on battery.
Anyway, it's easy to get caught up in these conversations with comments like "once XYZ comes out, say goodbye to the Mac Desktop". Really? We will say goodbye? What will happen? Will Apple cease production of desktops because AMD has a new chip? Do you think people will mass-migrate over to Windows because AMD has a new chip? Such odd comments. Mac will continue to be Mac, regardless of whether AMD, Intel or NVIDIA has faster hardware. It's a mindset that PC/Windows/Android folk never grasp. On the outside it may seem illogical, but for Mac users there's more to the user experience than specs and numbers. Some people seek more from their experience using a computer than seeing high FPS. Some of us just need to get lots of work done. I run a digital agency and build apps for companies with hundreds of millions of dollars, and my M2 Max MacBook Pro is the sole piece of hardware that makes it possible.
As an aside, it's worth noting that NVIDIA is working on an Arm SoC for PCs. I wouldn't be so condescending towards ARM just yet.
How was Apple ever humiliated on GPU performance, when Apple only presented slides based mostly on NON-gaming graphics workloads for the M2 and earlier M-series SoCs? Apple's M2 and earlier presentations never focused primarily on gaming in their GPU performance slides; they were based on professional graphics workloads, where FPS is not the issue. It's only the tech press that focused on gaming, and Apple's M2 and earlier drivers are more tuned/optimized for professional graphics workloads than for FPS/gaming workloads.
And Apple has only just recently announced optimizations and SDKs more geared towards gaming workloads on Apple Silicon. But I can see where Apple's marketing stated, in an unqualified manner, that Apple's GPU was comparable to Nvidia's offerings, and I never take marketing seriously to begin with, as marketing is rarely fact-based by design!
a) You can't compare the transistor count of an M3 SoC to that of a 13900K. It's apples and oranges; not a meaningful comparison. You'd instead need to compare it to a 13900K + discrete GPU + SSD controller + etc., etc. b) Even with that, the M3 will be larger, for a key reason you ignore: efficiency. Consider the GPU. Apple uses more GPU cores running at lower clocks to obtain higher efficiency. And for the CPU, they use larger cores to obtain higher IPC so they can run those cores at lower clocks. An M2 running at 3.7 GHz has the same single-core Geekbench 6 score as an AMD Ryzen 9 7900 running at 5.4 GHz.
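As a crude illustration of that last comparison (Geekbench isn't a pure IPC measure, so treat this only as a rough proxy):

```python
# Equal single-core scores at different clocks imply a per-clock advantage equal to the
# inverse of the clock ratio. The clock figures are the ones cited in the comment above.
m2_clock_ghz, ryzen_7900_clock_ghz = 3.7, 5.4
print(f"Implied per-clock advantage for the M2 core: ~{ryzen_7900_clock_ghz / m2_clock_ghz:.2f}x")
# -> ~1.46x
```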
Apple has been "humiliated" by matching the per-core performance of Intel's 14900K on its entry level part. They must also be "humiliated" that it achieves half the multicore performance of the 24-core 14900K, despite having only 8 cores.
Wow. What a barely coherent rant comparing wildly different hardware and making blatantly bonkers statements.
I've got news for you: every pixel in every frame of every video game is fake (for some definition of the word fake). Rasterization is a cheap way to fake "real" light transport. When you see a brown pixel, it's a fake brown. There are only red, green, and blue subpixels. No brown. It's fake.
The entire obsession with graphics is a gimmick. Everyone has been convinced that better graphics = higher realism = more immersion = more entertaining. It all falls apart if you think about it for more than 3 seconds.
If immersion were the only determining factor in entertainment, then why is VR still struggling to take off? Why do people love Minecraft or Among Us so much?
If realism equated to immersion, then how do people get lost in abstract games like chess? Do you think they'd enjoy chess twice as much if their chess pieces were twice as detailed?
You've been duped into believing that graphics, with its drastically diminishing returns on enjoyment, is far more important to gaming than it actually is.
Apple did get rid of the notch. I bought a M2 Max 14 inch MBP this summer and don't remember seeing the notch after my first day and I use it daily. Same with my iPhone 13.
Apple has stated in slides that the instruction decode resources are wider and the execution engine has more resources as well for the A17 P-core. But Apple's refusal to provide even basic CPU and GPU resource information should maybe result in the tech press refusing to give Apple any coverage unless Apple provides CPU/SoC information more in line with industry norms.
Qualcomm's Oryon core needs some CPU core block diagrams too, and even though Arm Holdings provides limited block diagrams for its in-house reference CPU core designs, even Arm's materials fail to explicitly state instruction decoder width/count and micro-op issue/retire rates, at least in the slides used for announcement presentations. And iGPU information in the form of render configurations, such as shaders:TMUs:ROPs plus RT cores and matrix math cores, is difficult to come by as well for most ARM-based SoCs with iGPUs.
Does Apple even publish, outside of an NDA, any CPU core performance counter information or guides? Benchmark software has to have access to internal performance counters to do latency and cache performance measurements and other such testing.
Well, these other companies have to sell their chips so they need to publish these specs. Apple doesn’t sell theirs so they don’t have to publish anything. I get that some people are unhappy with the limited amount Apple does publish, but we do get to see actual performance shortly after the devices ship, so it’s not really a big deal. It’s a shame Anandtech is no longer capable of doing deep dives. That told us a lot.
"However much Apple is paying TSMC for these chips, it can’t be cheap "
Oh, it isn't. They are charging $500 more for the fully enabled M3 Max compared to the previous M2 Max configuration. And they aren't offering the base M3 in the regular, cheaper MacBook Air.
"This is a bit surprising to see, as faster LPDDR5X memory is readily available, and Apple’s GPU-heavy designs tend to benefit greatly from additional memory bandwidth. The big question at this point is whether this is because of technical limitations (e.g. Apple’s memory controllers don’t support LPDDR5X) or if Apple has made an intentional decision to stick with regular LPDDR5."
It is certainly possible that Apple has implemented full memory compression. QC offered this with their ill-fated server chip, so it's likely that Nuvia/Oryon is using it (which in turn explains why they are comfortable using just 128-bit-wide LPDDR5X). Ideally you run this all the way to the LLC and compress/decompress at entry/exit to the LLC, so that your LLC is effectively twice as large (at the cost of a few additional cycles of latency). You can't compress every line, but you can compress a surprisingly large fraction by 2x.
"The design includes a proprietary algorithm for memory bandwidth enhancement via in-line and transparent memory compression. Memory compression is performed on a cache line granularity and delivers up to 50% compression and up to 2x memory bandwidth on highly compressible data."
I don't think so. I think it's a function of revenue smoothing. Move out those large sales (as opposed to the smaller sales of these more expensive laptops) to a different quarter from the iPhone quarter.
"More curious, however, is Apple’s claims that this will also improve GPU performance. Specifically, that dynamic caching will “dramatically” improve the average utilization of the GPU. It’s not immediately clear how memory allocation and GPU utilization are related, unless Apple is targeting a corner-case where workloads were having to constantly swap to storage due to a lack of RAM. "
There are three possible meanings of this "Dynamic Caching", none of which match what Ryan said.
(a) Apple currently uses 8K of L1D and 64K of Scratchpad storage per core. Scratchpad is used as Threadblock memory and/or as Tile memory. Apple could allocate something like a unified pool of 128K for both Scratchpad and L1D, allowing more Threadblocks/Tile shaders to occupy a core if they need a lot of Scratchpad, and otherwise using the excess SRAM for L1D. nV have done this for years.
(b) Apple could allow one core to use the Scratchpad of another core. This allows more flexibility in Threadblock distribution and allows Threadblocks to use a larger pool of Threadblock storage. nV has been doing this for the past two or three generations.
(c) GPUs do not traditionally allow dynamic allocation of Scratchpad (or other memory resources, like ray tracing address space), meaning code is forced to allocate the maximum size it might need, even if it probably will not need it. This in turn means, e.g., that you can only pack two Threadblocks on a core if each claims it wants 32K of Scratchpad, even if each will only use 16K. Apple has multiple patents for introducing address indirection for GPU address spaces (think of VM for these GPU-specific address spaces). This allows you to oversubscribe these address spaces, the same way you can oversubscribe a process' virtual address space but only have physical memory allocated when you touch it. So if I only actually touch 16K of Scratchpad, that's all the physical Scratchpad that will be allocated. This is probably handled by the GPU companion core and by faults from the GPU TLB that go to that companion core. Not *exactly* hardware, but transparent to all SW (even the OS) apart from Apple's GPU companion core firmware.
In the event of oversubscription, the options are like the CPU OS options; e.g., you can swap out some Scratchpad or RT storage used by one Threadblock (to standard DRAM) and reallocate it to the other Threadblock, basically like paging. And like paging you can learn and avoid thrashing: if one Threadblock really does use most of its subscription, then be careful about how much oversubscription you allow for subsequent Threadblocks from that kernel...
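Here's a minimal sketch of option (c), oversubscribed scratchpad with on-touch physical backing; the class, page size, and capacities are invented for illustration and have nothing to do with Apple's actual implementation:

```python
# Toy model: threadblocks *declare* a maximum scratchpad size, but physical scratchpad pages
# are only assigned on first touch, so the declared total can exceed what's physically present.
PAGE = 4096

class ScratchpadAllocator:
    def __init__(self, physical_pages: int):
        self.free_pages = physical_pages
        self.declared: dict[int, int] = {}            # threadblock -> declared max bytes
        self.backed: set[tuple[int, int]] = set()     # (threadblock, virtual page) pairs

    def declare(self, tb: int, max_bytes: int) -> None:
        # Reserving address space costs nothing physical yet, so oversubscription is allowed.
        self.declared[tb] = max_bytes

    def touch(self, tb: int, offset: int) -> bool:
        # Back a page with physical scratchpad only on first touch.
        vpage = offset // PAGE
        if (tb, vpage) in self.backed:
            return True
        if self.free_pages == 0:
            return False    # in the real thing: fault to the GPU companion core, maybe evict
        self.free_pages -= 1
        self.backed.add((tb, vpage))
        return True

alloc = ScratchpadAllocator(physical_pages=8)    # 32 KB of physical scratchpad
alloc.declare(tb=0, max_bytes=32 * 1024)         # two threadblocks each *claim* 32 KB...
alloc.declare(tb=1, max_bytes=32 * 1024)         # ...64 KB declared against 32 KB physical
alloc.touch(0, 0)
alloc.touch(1, 0)                                # but only pages actually touched get backed
print(alloc.free_pages)                          # -> 6 pages still free
```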
In (c), do you mean a ballooning technique like the one used by VMware, where you overprovision memory? That was my first thought. Since the memory is a shared complex, it would make sense to just treat it all as virtual memory for the GPU.
No. Remember that nVidia offers UNIFIED POINTERS. So even though conceptually Scratchpad and Device address spaces are different, they use the same load/store instructions and you disambiguate the target by (I assume) high bits in the address.
Once you have this scheme in place, it's trivial to extend it so that further high bits indicate something like:
- device address space (i.e. go through L1D), or
- local Scratchpad, or
- remote Scratchpad1 vs remote Scratchpad2 vs ...
Then it's just a question of routing the data to the appropriate core.
(c) does not REALLY handle overprovisioning, and is not meant to. It does however, if you are executing a variety of kernels, allow core 1 (which is maxed out in terms of Scratchpad use but not other resources) to use any extra, unused Scratchpad attached to core 2 (which maybe is executing a very different kernel that uses very little Scratchpad).
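A toy sketch of that pointer-routing idea; the bit layout and the list of spaces are invented purely for illustration:

```python
# A couple of high bits of a "unified pointer" select the target space; the rest is the offset.
SPACES = ["device (via L1D)", "local scratchpad", "remote scratchpad 1", "remote scratchpad 2"]
ADDR_BITS, SPACE_BITS = 48, 2

def route(ptr: int) -> tuple[str, int]:
    space = ptr >> (ADDR_BITS - SPACE_BITS)
    offset = ptr & ((1 << (ADDR_BITS - SPACE_BITS)) - 1)
    return SPACES[space], offset

print(route((1 << 46) | 0x200))   # -> ('local scratchpad', 512)
```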
Oops, my bad. I confused (b) and (c) in the reply above.
Yeah, I guess that is basically like how virtual machines overprovision. But I don't think you need to go to the (somewhat exotic) world of virtual machines; it's no different from how virtual memory allows an app to overcommit. In an app I can happily allocate a 10 GB block of address space and just start using the beginning of it, and if I never need to use anything past the first 1 GB, I never pay any price (physical memory allocation) for the remaining 9 GB.
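A quick, concrete version of that experiment (POSIX-style lazy allocation; exact behaviour and the ru_maxrss units are platform-dependent):

```python
# Reserve a big block of anonymous address space, touch only part of it, and only the
# touched pages become resident.
import mmap, resource

GiB = 1 << 30
buf = mmap.mmap(-1, 10 * GiB)                    # "allocate" 10 GiB of address space
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)   # resident size barely moves

for off in range(0, 1 * GiB, 4096):              # touch only the first 1 GiB, one byte per page
    buf[off] = 1
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)   # now roughly 1 GiB is resident
```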
What's the deal with having 11- and 12-core versions? I wonder if the 11-core parts are chips that have a faulty CPU core and/or GPU core? It seems ridiculous to have two versions so similar.
Wow. What a barely coherent rant comparing wildly different hardware and making blatantly bonkers statements.I've got news for you: every pixel in every frame of every videogames is fake (for some definition of the word fake). Rasterization is a cheap way to fake "real" light transport. When you see a brown pixel, it's a fake brown. There are only red, green, and blue pixels. No brown. It's fake.
The entire obsession with graphics is a gimmick. Everyone has been convinced that better graphics = higher realism = more immersion = more entertainment. It all falls apart if you think about it for more than 3 seconds.
If immersion were the only determining factor in entertainment, then why is VR still struggling to take off? Why do people love Minecraft or Among Us so much?
If realism equated to immersion, then how do people get lost in abstract games like chess? Do you think they'd enjoy chess twice as much if their chess pieces were twice as detailed?
You've been duped into believing that the rapidly diminishing returns graphics add to the enjoyment of gaming matter far more than they actually do.
valuearb - Monday, December 11, 2023 - link
Apple did get rid of the notch. I bought an M2 Max 14-inch MBP this summer and don't remember seeing the notch after my first day, and I use it daily. Same with my iPhone 13.
AceMcLoud - Friday, December 15, 2023 - link
Amazing that people like you are still around.
FWhitTrampoline - Tuesday, October 31, 2023 - link
Apple has stated in slides that the instruction decoder is wider and the execution engine has more resources in the A17's P core. But Apple's refusal to provide even basic CPU and GPU resource information should perhaps lead the tech press to withhold coverage until Apple provides CPU/SoC information more in line with industry norms.
Qualcomm's Oryon core needs some CPU-core block diagrams as well, and even though Arm Holdings provides limited block diagrams for its in-house reference core designs, even those materials fail to explicitly state instruction decoder width/count and micro-op issue/retire rates, at least in the slides Arm uses for its announcement presentations. iGPU information in the form of render configurations, such as shaders:TMUs:ROPs plus RT cores and matrix-math cores, is difficult to come by as well for most Arm-based SoCs with iGPUs.
Does Apple even publish, outside of an NDA, any CPU-core performance counter information or guides? Benchmark software needs access to the internal performance counters to do latency and cache performance measurements and other such testing.
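For what it's worth, a fair amount of latency and cache characterization can be done with no vendor counters at all, just from the timing of dependent loads. A minimal pointer-chasing sketch in C; the array size, iteration count, and shuffle are arbitrary choices for illustration, and nothing here depends on any Apple API:

    /* Pointer-chasing latency sketch: estimates average load-to-use latency
     * for a given working-set size from wall-clock timing alone, with no
     * performance counters. Sizes and iteration counts are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (1 << 20)   /* pointer-sized slots: ~8 MB working set */
    #define ITERS (1 << 24)   /* dependent loads to time                */

    int main(void) {
        size_t *ring  = malloc(N * sizeof *ring);
        size_t *order = malloc(N * sizeof *order);
        if (!ring || !order) return 1;

        /* Build a random cyclic permutation so the prefetcher cannot
         * predict the next address (Fisher-Yates shuffle). */
        for (size_t i = 0; i < N; i++) order[i] = i;
        srand(1);
        for (size_t i = N - 1; i > 0; i--) {
            /* combine two rand() calls so large indices are reachable */
            size_t j = (((size_t)rand() << 15) ^ (size_t)rand()) % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            ring[order[i]] = order[(i + 1) % N];   /* each slot points to the next */
        free(order);

        /* Each load depends on the previous one, so elapsed time per step
         * approximates the latency of wherever the working set lives. */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        volatile size_t idx = 0;
        for (long i = 0; i < ITERS; i++) idx = ring[idx];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.1f ns per dependent load (working set %zu MB)\n",
               ns / ITERS, (size_t)N * sizeof *ring >> 20);
        free(ring);
        return 0;
    }

Sweeping the working-set size up through the cache levels is how tools infer cache sizes and latencies even when the vendor documents nothing.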
melgross - Tuesday, October 31, 2023 - link
Well, these other companies have to sell their chips so they need to publish these specs. Apple doesn’t sell theirs so they don’t have to publish anything. I get that some people are unhappy with the limited amount Apple does publish, but we do get to see actual performance shortly after the devices ship, so it’s not really a big deal. It’s a shame Anandtech is no longer capable of doing deep dives. That told us a lot.
dudedud - Tuesday, October 31, 2023 - link
"However much Apple is paying TSMC for these chips, it can’t be cheap "Oh, it isn't. They are charging +500 USD to get the fully binned M3max compared to the previous M2Max configuration. And they aren't offering the base M3 on the regular and cheaper MacBook air
name99 - Tuesday, October 31, 2023 - link
"This is a bit surprising to see, as faster LPDDR5X memory is readily available, and Apple’s GPU-heavy designs tend to benefit greatly from additional memory bandwidth. The big question at this point is whether this is because of technical limitations (e.g. Apple’s memory controllers don’t support LPDDR5X) or if Apple has made an intentional decision to stick with regular LPDDR5."It is certainly possible that Apple has implemented full memory compression.
QC offered this with their ill-fated server chip, so it's likely that Nuvia/Oryon is using it (which in turn explains why they are comfortable using just 128-bit-wide LPDDR5X). Ideally you run this all the way to the LLC and compress/decompress at entry/exit into the LLC, so that your LLC is effectively twice as large (at the cost of a few additional cycles of latency). You can't compress every line, but you can compress a surprisingly large fraction by 2x.
Here's what QC said at the time:
https://www.qualcomm.com/news/onq/2017/10/qualcomm...
"The design includes a proprietary algorithm for memory bandwidth enhancement via in-line and transparent memory compression. Memory compression is performed on a cache line granularity and delivers up to 50% compression and up to 2x memory bandwidth on highly compressible data."
Abe Dillon - Tuesday, November 28, 2023 - link
I wonder if the decision to forego LPDDR5X is at all related to the mystery 128 Gb chips that aren't available on the consumer market yet.
valuearb - Thursday, November 2, 2023 - link
They aren't offering the M3 on the MacBook Air "yet". That's likely a function of production yields, not cost.
name99 - Friday, November 3, 2023 - link
I don't think so. I think it's a function of revenue smoothing: move those high-volume sales (as opposed to the smaller volumes of these more expensive laptops) out to a different quarter from the iPhone quarter.
name99 - Tuesday, October 31, 2023 - link
"More curious, however, is Apple’s claims that this will also improve GPU performance. Specifically, that dynamic caching will “dramatically” improve the average utilization of the GPU. It’s not immediately clear how memory allocation and GPU utilization are related, unless Apple is targeting a corner-case where workloads were having to constantly swap to storage due to a lack of RAM. "There are three possible meanings of this "Dynamic Caching" none of which match what Ryan said.
(a) Apple currently uses 8K of L1D and 64K of Scratchpad storage per core. Scratchpad is used either as Threadblock memory and/or as Tile memory. Apple could allocate something like a unified pool of 128K for both Scratchpad and L1D, allowing more Threadblocks/Tile shaders to occupy a core if they need a lot of Scratchpad, and otherwise using the excess SRAM for L1D. nV have done this for years.
(b) Apple could allow one core to use the Scratchpad of another core. This allows more flexibility in threadblock distribution and allows threadblocks to use a larger pool of threadblock storage. nV has been doing this for the past two or three generations.
(c) GPUs do not allow dynamic allocation of Scratchpad (or other memory resources, like Ray Tracing address space), meaning code is forced to allocate the maximum size it might need, even if it probably will not need it. This in turn means, e.g., that you can only pack two threadblocks on a core if each claims it wants 32K of Scratchpad, even if each will only use 16K.
Apple has multiple patents for introducing address indirection for GPU address spaces (think of VM for these GPU-specific address spaces). What this does is allow you to oversubscribe these address spaces, the same way you can oversubscribe a process' virtual address space, but only have physical memory allocated when you touch the address space. So if I only actually touch 16K of Scratchpad, that's all the physical Scratchpad that will be allocated. This is probably handled by the GPU companion core, with faults from the GPU TLB going to that companion core. Not *exactly* hardware, but transparent to all SW (even the OS) apart from Apple's GPU companion core firmware.
In the event of oversubscription, the options are like the CPU OS options; e.g., you can swap out some Scratchpad or RT storage used by one threadblock (to standard DRAM) and reallocate it to the other threadblock – basically like paging. And like paging you can learn, and avoid thrashing; if one threadblock really does use most of its subscription, then be careful in how much oversubscription you allow to subsequent threadblocks from that kernel...
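A tiny simulation in C of the indirection idea in (c), with entirely made-up page sizes and data structures, just to make the first-touch allocation concrete; this is a sketch of the concept as described above, not Apple's implementation:

    /* Demand-allocated scratchpad, simulated: a kernel reserves scratchpad
     * address space up front, but physical scratchpad pages are bound only
     * on first touch, the way an OS demand-pages virtual memory. All sizes
     * and structures here are hypothetical. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE       1024   /* hypothetical 1 KB scratchpad pages */
    #define PHYS_PAGES 32     /* 32 KB of physical scratchpad       */
    #define VIRT_PAGES 64     /* 64 KB of reservable address space  */

    static uint8_t phys[PHYS_PAGES][PAGE];   /* the real on-chip SRAM                   */
    static int     next_free;                /* trivial bump allocator                  */
    static int     page_table[VIRT_PAGES];   /* virtual -> physical page, -1 = unmapped */

    /* Translate a virtual scratchpad address, allocating a physical page on
     * the first touch ("fault"), exactly like demand paging. */
    static uint8_t *touch(uint32_t vaddr) {
        int vpage = (int)(vaddr / PAGE);
        if (page_table[vpage] < 0) {
            if (next_free == PHYS_PAGES) return NULL;   /* would have to spill to DRAM */
            page_table[vpage] = next_free++;
            printf("fault: vpage %2d -> ppage %2d\n", vpage, page_table[vpage]);
        }
        return &phys[page_table[vpage]][vaddr % PAGE];
    }

    int main(void) {
        for (int i = 0; i < VIRT_PAGES; i++) page_table[i] = -1;

        /* A kernel that "reserves" 48 KB but only ever touches the first
         * 8 KB consumes just 8 physical pages; the other 40 KB costs nothing. */
        for (uint32_t a = 0; a < 8 * PAGE; a += 256) *touch(a) = 0xAB;

        printf("physical pages used: %d of %d\n", next_free, PHYS_PAGES);
        return 0;
    }

Oversubscription then falls out naturally: more virtual scratchpad can be promised than physically exists, and spilling to DRAM is only needed when the pages actually touched exceed the physical pool.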
nikolajbrinch - Tuesday, October 31, 2023 - link
In (c), do you mean a ballooning technique like the one used by VMware, where you overprovision memory? This was my first thought. Since the memory is a shared complex, it would make sense to just treat it all as virtual memory for the GPU.
name99 - Tuesday, October 31, 2023 - link
No. Remember that nVidia offers UNIFIED POINTERS. So even though conceptually Scratchpad and Device address spaces are different, they use the same load/store instructions and you disambiguate the target by (I assume) high bits in the address.
Once you have this scheme in place, it's trivial to extend it so that further high bits indicate something like
- device address space (i.e. go through L1D), or
- local Scratchpad OR
- remote Scratchpad1 vs remote Scratchpad2 vs ...
then it's just a question of routing the data to the appropriate core.
(c) does not REALLY handle overprovisioning, and is not meant to. It does however, if you are executing a variety of kernels, allow core 1 (which is maxed out in terms of Scratchpad use but not other resources) to use any extra, unused Scratchpad attached to core 2 (which maybe is executing a very different kernel that uses very little Scratchpad).
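A small C sketch of that high-bit routing; the tag position and encodings are invented for illustration and are not taken from any vendor documentation:

    /* Unified-pointer routing sketch: high bits of a 64-bit address tag
     * which storage an access targets, and the "hardware" routes on the
     * tag. Tag position and encodings are invented for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    enum { DEVICE = 0, LOCAL_SCRATCH = 1, REMOTE_SCRATCH_BASE = 2 };

    #define TAG_SHIFT 56   /* hypothetical: top 8 bits carry the space */
    #define OFF_MASK  ((1ULL << TAG_SHIFT) - 1)

    static uint64_t make_ptr(unsigned space, uint64_t offset) {
        return ((uint64_t)space << TAG_SHIFT) | (offset & OFF_MASK);
    }

    static void route(uint64_t ptr) {
        unsigned space  = (unsigned)(ptr >> TAG_SHIFT);
        uint64_t offset = ptr & OFF_MASK;
        if (space == DEVICE)
            printf("0x%016llx -> device address space (via L1D), offset 0x%llx\n",
                   (unsigned long long)ptr, (unsigned long long)offset);
        else if (space == LOCAL_SCRATCH)
            printf("0x%016llx -> this core's Scratchpad, offset 0x%llx\n",
                   (unsigned long long)ptr, (unsigned long long)offset);
        else
            printf("0x%016llx -> Scratchpad of core %u, offset 0x%llx\n",
                   (unsigned long long)ptr, space - REMOTE_SCRATCH_BASE,
                   (unsigned long long)offset);
    }

    int main(void) {
        route(make_ptr(DEVICE, 0x1000));
        route(make_ptr(LOCAL_SCRATCH, 0x40));
        route(make_ptr(REMOTE_SCRATCH_BASE + 3, 0x80));   /* core 3's Scratchpad */
        return 0;
    }

The same load/store instruction works for every target; only the decoded tag decides where the request is sent.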
name99 - Tuesday, October 31, 2023 - link
Oops, my bad. I confused (b) and (c) in the reply above.
Yeah, I guess that is basically like how Virtual Machines overprovision. But I don't think you need to go to the (somewhat exotic) world of Virtual Machines; it's no different from how VM allows an app to overprovision its memory. In an app I can happily allocate a 10GB block of address space and then just start using the beginning of it, and if I never need to use past the first 1GB, I never pay any sort of price (physical memory allocation) for the remaining 9GB.
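The CPU-side version of that is easy to demonstrate. A minimal sketch in C, assuming a 64-bit Linux system (MAP_NORESERVE is Linux-specific; other OSes overcommit with different flags):

    /* Reserve a huge block of address space, touch only part of it: only
     * the touched pages ever get physical memory. Assumes 64-bit Linux;
     * MAP_NORESERVE is Linux-specific. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        size_t reserve = 10ULL << 30;   /* "allocate" 10 GB of address space */
        size_t use     = 1ULL  << 30;   /* ...but touch only the first 1 GB  */

        char *p = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        memset(p, 1, use);   /* physical pages are allocated only for this 1 GB */

        printf("reserved %zu GB, touched %zu GB; the untouched 9 GB cost nothing\n",
               reserve >> 30, use >> 30);
        munmap(p, reserve);
        return 0;
    }

Resident memory for this process should end up around 1 GB even though 10 GB of address space is mapped; the untouched pages never get physical backing.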
block2 - Sunday, November 5, 2023 - link
What's the deal with having 11- and 12-core versions? I wonder if the 11-core parts will be chips that have a faulty CPU and/or GPU core? It seems ridiculous to have two versions so similar.