21 Comments

  • Findecanor - Monday, October 30, 2023 - link

    Curiously enough, the M3 Pro has 6P+6E cores, compared to 8P+4E in the M2 Pro.
    Same total number of cores, but a smaller share of performance cores relative to efficiency cores.
  • name99 - Monday, October 30, 2023 - link

    Looks like they are still trying out different options.
    Remember the M1 Pro only had two E-cores, but they could "turbo boost" to substantially higher frequencies than the M1 E-cores.

    It's possible that Apple's plan for the M3 Pro looks something like
    - on battery, run them at ~A17 frequency so ~2GHz
    - on wall power run them at close to the P-core frequency, so close to 4GHz

    If we do very rough hand-waving scaling from A17 GB6 results, then an E-core (at iPhone clocks) is worth about 0.4 P-cores; so nominally an M3 Pro comes with "6+2.4" P-cores, and when boosted with "6+4.8". That's surely too optimistic, but it suggests that on battery power, for throughput code (GB6 MT or Cinebench), we might see the equivalent of about 8 P-cores, and on wall power the equivalent of maybe 10 P-cores (rough arithmetic sketched below).
    ~9 P-cores is about where we were with Cinebench and the 8+4-core M2 Pro, so the scaling is not as bad as you might think.
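
    In Python-ish back-of-envelope form (the 0.4/0.8 per-core weights are just the hand-waved assumptions above, not measurements):

        # Hypothetical P-core-equivalent throughput of an M3 Pro (6P + 6E),
        # using the guessed per-E-core weights from this comment.
        P_CORES = 6
        E_CORES = 6
        E_WORTH_AT_IPHONE_CLOCKS = 0.4   # assumed: one E-core at ~2 GHz
        E_WORTH_BOOSTED = 0.8            # assumed: one E-core at ~4 GHz

        on_battery = P_CORES + E_CORES * E_WORTH_AT_IPHONE_CLOCKS   # 6 + 2.4 = 8.4
        on_wall    = P_CORES + E_CORES * E_WORTH_BOOSTED            # 6 + 4.8 = 10.8
        print(f"battery ~{on_battery:.1f} P-core equivalents, wall ~{on_wall:.1f}")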
  • techconc - Monday, October 30, 2023 - link

    They seemed to make a specific point of bragging about how Apple Silicon runs at maximum performance whether it's plugged in or on battery.
  • Kevin G - Tuesday, October 31, 2023 - link

    In mobile they are thermally limited, so that is mostly true. The iMac would be the exception today, since they can afford to run it hotter and give it more power thanks to larger heat sinks and not having to worry about cooking a person's lap.

    I would expect things to be different when we get the Mac Mini and Studio updates next year.
  • yankeeDDL - Tuesday, October 31, 2023 - link

    The ARM architecture is inherently much cleaner and more efficient than x86, which carries a ton of legacy baggage.
    In my opinion it is an inevitable transition in the coming years, and the fact that Nvidia and AMD are planning to release ARM-based CPUs is a clue that it will happen sooner rather than later. The M-series chips trash Intel's x86. And while it's true that Intel has been far behind Ryzen in power efficiency for quite a few years already, the gap to Apple is staggering. The change to Apple silicon gave the Macs a massive leap forward from every perspective. It is inevitable that PCs will have to follow suit.
  • ABR - Tuesday, October 31, 2023 - link

    Meanwhile the Max has _fewer_ efficiency cores, with only 4 to go with its 12 performance cores. I feel like the Max is the one to get here, though I have no need for 128GB of (doubtless power-hungry) RAM.
  • meacupla - Monday, October 30, 2023 - link

    I wonder how well these will sell... seeing as M2 was a flop
  • solipsism - Monday, October 30, 2023 - link

    In what way was the M2 "a flop"?
  • meacupla - Monday, October 30, 2023 - link

    uhhh... sales?
    You know Apple halted production of M2 chips because sales were plummeting, right?
  • Unashamed_unoriginal_username_x86 - Monday, October 30, 2023 - link

    The M1 MBA was the same price as the ICL MBA and had 55% longer battery life; the M2 was $200 more and had 5% less battery life, based on Notebookcheck.
    Comparing sales is obfuscated by COVID, stimulus, recession, etc., but Mac shipments peaked at the M2 release in Q4 2022. That's from Statista, which doesn't break down market share though, so maybe all those Q4 sales were mostly M1 Macs. So no, it probably wasn't a flop financially.
  • Jansen - Monday, October 30, 2023 - link

    I think this is using the 2nd-generation N3E node, which recently entered mass production.
  • varase - Monday, October 30, 2023 - link

    Apple probably wanted to steal the thunder from the Snapdragon X Elite (which was stealing some of the distant thunder of the M2 lineup).

    Of course, the snappiness of the computer still comes mostly down to the speed of the high-performance cores, with the remaining cores ready to come into play in highly multithreaded applications.

    Subtle message to Qualcomm, Nuvia, and Microsoft: yeah, you got three of the lead managing engineers responsible for the development of the M1, but you left behind the thousands of engineers who did the grunt work.

    Obviously, Apple had the M3 designs taped out, prototyped, and validated before even releasing the A17 Pro, and having TSMC's entire 3nm manufacturing capacity means they could start production of these lower-quantity SoCs at a moment's notice.
  • Jansen - Monday, October 30, 2023 - link

    The launch was predicated on N3E yields. Nothing to do with Snapdragon hype.
  • repoman27 - Monday, October 30, 2023 - link

    The M3 chips were made using TSMC’s N3 node, not N3E. Volume production of N3E is only beginning this quarter, and cycle times are >90 days. We won’t see N3E silicon in shipping products until May~June 2024.
  • mode_13h - Monday, October 30, 2023 - link

    I see no mention of ARMv9-A or SVE. Didn't see them mentioned in the previous iPhone liveblog, either. Is Apple possibly stuck on ARMv8-A? Maybe their architecture license doesn't cover v9 and they don't want to accept the terms ARM is offering?
  • Kevin G - Tuesday, October 31, 2023 - link

    Apple seems to be pushing SIMD compute to the GPU or other accelerators. The other improvements in ARM v9 would be nice but SVE2 just doesn’t mesh well with Apple’s current strategy for acceleration.
  • name99 - Tuesday, October 31, 2023 - link

    Or maybe they think SVE is just a bad idea? The predication part is valuable, but the complications generated by the variable length seem to outweigh what the variable-length option buys you.

    It's not clear to an outsider, but AMX is capable of handling AVX-512-style *compute* operations (not permute), and Apple seems to grow its performance and capability every generation.
    Their endgame may be that, once they are comfortable that the AMX instruction set has stabilized (which it isn't yet; they keep changing it every year, and new ideas make it clear that the older ISA was limiting them), they will make the AMX ISA compiler-visible, and that will be their AVX-512/SVE competitor?
  • joelypolly - Tuesday, October 31, 2023 - link

    Most interesting for me is that there is now better alignment of the different SKU releases. Only the Ultra wasn't updated today.

    Compare that to the M1 release, where there was an 11-month delay between the M1 and the M1 Pro/Max, and almost 6 more months before we saw the M1 Ultra. I expect that by the time the M4 rolls around the release cadence will be pretty much locked in.
  • Bambel - Tuesday, October 31, 2023 - link

    A few observations:

    - the layout of all three chips is rather unique this time, so less re-use of physical layout between the family members.
    - the P-cores seem to be in six-core clusters now, but I guess it's the same 16MB L2; it will be interesting to see how it is partitioned (fixed vs. dynamic slices). Also, there is only one AMX unit per cluster, so it's now shared by six cores instead of four as in the previous gen.
    - not sure what the deal is with six E-cores in the Pro; even two were enough to drive the whole OS.
    - the Pro seems to have regressed to a 192-bit memory interface, but at least the SLC looks like 16MB; the standard M3 seems to have only 8MB. And the Max..? No idea, but I'd guess 48MB?
    - lots of "unused"(?) space on the Max die. It seems to have a very conservative floor plan. And of course the area of the die-to-die interconnect is hidden again. Maybe on two edges now, to enable a four-chip "Extreme" or whatnot?
  • repoman27 - Tuesday, October 31, 2023 - link

    Still only a single edge for die-to-die interconnect.

    The Max design does seem a bit sparse though. Dark silicon to deal with heat dissipation? Not sure what's going on there.
  • name99 - Tuesday, October 31, 2023 - link

    The new Max design raises some interesting questions.

    Suppose the following statements are both true:
    - building and testing an EUV mask set is extremely expensive (we know this!)
    - it is fairly easy, in a modern fab, to set a machine to only use PART of a mask set, and when stepping, to move the wafer based on the subarea of the mask that is used, not the whole mask.

    Then imagine we do the following. The full Max mask set includes
    - a Fusion area at the very bottom of the die
    - two GPU+memory areas at the bottom
    - an IO (and similar "one time" stuff. Display controller, ISP, Secure Enclave, etc) area at the top.

    Now we can use this single mask set to make multiple different Max's.
    - Max Ultra1 has Fusion and IO section
    - Max Ultra2 has Fusion but no IO section
    - Max Normal has no Fusion
    - Max Minus has no Fusion AND is missing a stripe of GPU from the bottom

    The details are unclear (and maybe the Max Minus does not exist as I describe it; it's always a fused or yield-salvaged Max Normal), but the geometry seems to lend itself to this idea (rough enumeration below). And it avoids some (not all, but some) of the "waste" that otherwise occurs: a machine with all these extra Display Controller and IO ports that don't really make sense. Maybe a future design will figure out a way to pack more "unusable" stuff in the IO area (perhaps two or even three of the Display Engines)?
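
    To make the combinatorics explicit, here is a purely speculative Python-style enumeration of which mask regions each hypothetical variant would use (the block names and variants are my guesses, not anything Apple has disclosed):

        # Hypothetical "one mask set, several chops" idea; block names and
        # variants are pure speculation, not anything Apple has disclosed.
        MASK_BLOCKS = {"IO", "GPU+MEM_1", "GPU+MEM_2", "FUSION"}

        variants = {
            "Max Ultra1": {"IO", "GPU+MEM_1", "GPU+MEM_2", "FUSION"},
            "Max Ultra2": {"GPU+MEM_1", "GPU+MEM_2", "FUSION"},   # no IO section
            "Max Normal": {"IO", "GPU+MEM_1", "GPU+MEM_2"},       # no Fusion
            "Max Minus":  {"IO", "GPU+MEM_1"},                    # no Fusion, one GPU stripe missing
        }

        for name, blocks in variants.items():
            print(f"{name}: uses {sorted(blocks)}, skips {sorted(MASK_BLOCKS - blocks)}")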

    The wildest version of this idea says: why not just cut off the entire top half of the design (more or less, as far as the memory controllers go). What THAT gives you is a GPU-only chip...
    Giving Apple some degree of mix-and-matching for building Ultra's and Extreme's.
    For example, perhaps the first version of the Extreme could be a Max Ultra1 and a Max Ultra2 (so looking like a current Ultra pair) along with two Max GPUs. Kinda like a dGPU, but without the dGPU downsides.

    Of course this is wild speculation, but this generation has shown us that it's silly to get locked into certain ideas ("a cluster is four cores", "the Pro is a chop of the Max") when alternatives arise that have interesting potential in terms of opening up new possibilities.

    In terms of the Fusion connector, I've been thinking about that geometry.
    Look at how different (MUCH wider) the "memory" edges are for the Max compared to the Pro/M3. One possibility is that each edge can have twice as much memory plugged in (double the external pins, same number of internal pins, so same bandwidth but double capacity). This memory could, eg, be on both sides of the package board.
    But another reason for that width might be that that's where Fusion now lives, so some of that edge real estate goes to TSVs or something that routes to a package RDL, which then connects to another device.
    You could even do both of these, so something like two Max's form an Ultra (or Ultra GPU) unit with the highest speed Fusion between them, and these two Ultra's communicate over a longer distance via the RDL as a sort of Fusion-Lite that's half or a quarter the bandwidth of full Fusion. Obviously this is a non-uniform system, but so what, that's just an issue of having the OS and GPU schedulers aware of the fact.
