39 Comments

  • SarahKerrigan - Monday, August 23, 2021 - link

    The early disclosures for IBM's new z processor, Telum, indicate it may have no on-die L3 (but has absolutely immense L2's.) I'm excited to see how that plays out!
  • Ian Cutress - Monday, August 23, 2021 - link

    Can confirm. L3 is virtual on a single chip, and L4 is virtual across chips. It's the future of multi-level caches.
  • SarahKerrigan - Monday, August 23, 2021 - link

    Thanks! Any other deets you can provide on Telum before the presentation? IIRC z15 put higher-level BTBs in eDRAM - are those just SRAM structures now? What's the L1 config look like? Early disclosures implied the SC is gone - are memory controllers integrated into the Telum CP now?
  • Ian Cutress - Monday, August 23, 2021 - link

    It's all one chip :) Presentation is soon, too much to write in a box right now. But I think all your Qs will be answered.
  • SarahKerrigan - Monday, August 23, 2021 - link

    My questions weren't answered, sadly... and normally z HC presentations are so good!
  • SarahKerrigan - Monday, August 23, 2021 - link

    Eh, I take it back. Looking closer, it appears that we're looking at, mostly, a z15 core variant with a refactored branch predictor (hi, lack of eDRAM!), otherwise a similar or identical core, and a new uncore.

    I'm here for it, I guess.
  • TeXWiller - Monday, August 23, 2021 - link

    My memory is a little hazy on this, but I think it was either Oracle or Fujitsu that had such a cache organization in one of their recent models, and they got decent performance with it.
  • Shaunathan - Monday, August 23, 2021 - link

    aww man, i got burrito juice all on my hand
  • Unashamed_unoriginal_username_x86 - Monday, August 23, 2021 - link

    Yo dog thats awful. hope u get better
  • The Hardcard - Monday, August 23, 2021 - link

    Jason Lowe-Power? What a name for someone in the semiconductor industry! He doesn’t need to do anything extra to get his resume read.
  • JayNor - Monday, August 23, 2021 - link

    What pcie 5 chips does Intel have to hook to Alder Lake?
  • Drumsticks - Monday, August 23, 2021 - link

    I'm guessing the PCIe5 support was built in for GLC for Sapphire Rapids' sake, and it was easier to include it in ADL rather than leave it out in favor of PCIe.

    For what it's worth, though, Samsung has announced PCIe5.0 SSDs coming next year.
  • Slash3 - Monday, August 23, 2021 - link

    Alder Lake only supports PCIe 5.0 on the GPU slot (x16 or x8/x8), the dedicated NVMe M.2 port is still Gen4 x4. Obviously a slot adapter could still provide Gen5 disk access, but the default target for desktop storage on this upcoming chipset still seems to be Gen4.
  • SarahKerrigan - Monday, August 23, 2021 - link

    "8 cores + 4 MB L2"

    No. The L2 is 32MB *per core*.

    Yes, I know the number is immense. No, that doesn't mean it's wrong.
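For a sense of scale, the per-core figure above can be multiplied out. This is a minimal sketch, assuming the 8-core count from the quoted article text and the 32 MB per-core L2 stated in this comment; the function name is hypothetical and nothing else about Telum is implied.

```python
def total_l2_mb(cores: int, l2_per_core_mb: int) -> int:
    """Aggregate private L2 capacity on one chip, in MB."""
    if cores < 1 or l2_per_core_mb < 0:
        raise ValueError("cores must be >= 1 and l2_per_core_mb must be >= 0")
    return cores * l2_per_core_mb


if __name__ == "__main__":
    # 8 cores at 32 MB each, the figures quoted in the comment above.
    print(total_l2_mb(8, 32))  # 256
```

On those two figures alone, the chip would carry 256 MB of L2 in aggregate.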
  • Oxford Guy - Monday, August 23, 2021 - link

    530 sq mm at ‘7nm’ and no space wasted on iGPU.

    Although the 32 MB is exciting, reading ‘Samsung’ is a bit saddening — not due to the node itself but due to IBM’s name not being in the mix.
  • Silver5urfer - Monday, August 23, 2021 - link

    Look at Intel's PR. On the architecture day and Hot Chips, same thing - explaining how great those small cores are and how their solution for the problem which they created is a good one and bulletproof.

    AMD, on the other hand, showed their IPC boost with growth charts, comparisons, and calculation methods. Just superb. I hope their 3D V-Cache Zen3 refresh crushes this pathetic big-small ARM copycat design from Intel. Not only that, Sapphire Rapids shares a lot with Threadripper / Milan CPUs, especially the memory controllers and the whole layout. I bet Intel is eyeing the platform benefits: DDR5, Gen 5.0 and CXL, HBM on die, etc. Especially their package deals with their new hyper-marketed GPU, Xe.
  • MetaCube - Sunday, August 29, 2021 - link

    Wat
  • Yojimbo - Monday, August 23, 2021 - link

    "12:00PM EDT - Most apps are Single or lightly MT"

    "lightly MT"; I think this means they only need a few threads, but it seems like it could mean that they need mostly light threads (but perhaps many of them). Which is meant?
  • mode_13h - Monday, August 23, 2021 - link

    > "lightly MT"; I think this means they only need a few threads

    Definitely this.
  • abufrejoval - Wednesday, August 25, 2021 - link

    I think the real meaning is more like "most applications fail to use any additional cores for a significant amount of their work". In other words, Gene Amdahl's law holds true for them.
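Amdahl's law, mentioned above, can be sketched in a few lines. This is a minimal illustration, assuming a workload that splits cleanly into a serial part and a perfectly parallel part; the function name and the example fractions are hypothetical, not measurements from any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A hypothetical app in which only half the work can run in parallel.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.5, n), 3))
```

With half the work serial, the speedup can never reach 2x however many cores are added, which is the sense in which such applications "fail to use any additional cores".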
  • schujj07 - Monday, August 23, 2021 - link

    Anyone else look at Sapphire Rapids and think the layout is eerily similar to Epyc Gen 1? Almost looks like Intel wasn't ready to go full chiplet like Zen2 or later and could only do MCM.
  • WaltC - Monday, August 23, 2021 - link

    First thing I thought of...;)
  • Oxford Guy - Monday, August 23, 2021 - link

    I was very excited about Sapphire Rapids until I saw that it’s ‘glued together’.
  • Yojimbo - Monday, August 23, 2021 - link

    I think Intel's marketing objection to gen 1 Epyc was that they were "glued together" desktop parts, not that they were using MCMs. Intel has been planning for MCMs for a long time. And the way Intel is gluing together their MCMs in Sapphire Rapids is more sophisticated than how AMD did it for gen 1 Epyc.
  • arashi - Monday, August 23, 2021 - link

    After just about half a decade, I sure hope so.
  • Yojimbo - Monday, August 23, 2021 - link

    Epyc is connected through the PCB, Sapphire Rapids uses EMIB. Intel has been developing EMIB for probably a decade, and they've been talking about using packaging technologies to maintain a Moore's Law-like pace of improvement in application performance for years.

    Chiplet is just what AMD calls their CPU MCM.
  • sgeocla - Tuesday, August 24, 2021 - link

    EMIB and Foveros are at least 2 years behind what TSMC & AMD are doing. See Intel's current 50 micron bump pitch, and 10 micron pitch in 2023-2024 for TSV connections, vs the 9 micron pitch used for AMD's 3D V-cache, with Cu-Cu bonding using lower power.

    The difference between AMD's approach and Intel's approach is that one of them is efficient and cost effective and has been shipping in millions of devices, while the other has only been demonstrated in expensive, low volume and low performance products like Lakefield.

    AMD is using chiplets on multiple nodes with the best characteristics of each. Intel is using MCM for Sapphire Rapids because they only have 1 base tile on 10nm, mirrored and rotated. This means low yields and that the tiles can't be salvaged for different products like AMD does with Ryzen and Threadripper. This is basically EPYC1 only with smarter glue. That's why it's limited to 56 cores while AMD's Genoa will be 96 cores with Genoa X at 128 cores.
    AMD resolved the issue of effective latency using larger caches, but Intel can't go higher in cores because their architecture means their latency increases with the number of cores.
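The pitch figures quoted above can be turned into rough connection densities. This is a minimal sketch, assuming the connections sit on a uniform square grid (an assumption made for illustration, not something stated in the thread), so that density scales with the inverse square of the pitch; the function name is hypothetical.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre for a square grid at the given pitch."""
    if pitch_um <= 0:
        raise ValueError("pitch_um must be positive")
    per_mm = 1000.0 / pitch_um  # connections along one millimetre
    return per_mm ** 2


if __name__ == "__main__":
    # Pitches quoted in the comment above, in microns.
    for pitch in (50, 10, 9):
        print(pitch, round(connections_per_mm2(pitch)))
```

Under that assumption, going from a 50 micron to a 9 micron pitch gives roughly 30 times as many connections in the same area.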
  • schujj07 - Tuesday, August 24, 2021 - link

    While EMIB might be the connection, I wouldn't be surprised if Intel runs into the same problems with Sapphire Rapids that AMD had with Epyc 1, i.e. latency across different chips. Each tile has its own 2-channel memory controller. Therefore local memory requests will be quick; however, when you need to go to something in a different tile, the data will have to traverse the mesh/UPI links. AMD got around this in Gen 2 & 3 by having the IO die. That centralized all IO & memory communication. I just think that Intel is 4-5 years behind AMD in terms of layout on their chips. Granted, we won't know until it ships, supposedly in H1 2022.
  • whatthe123 - Tuesday, August 24, 2021 - link

    Epyc 1 was worse performing because it was on a worse node and had slow single hop i/o paths between CCXs instead of an IOD. Cache access was also limited to each 4 core ccx. Intel is claiming EMIB wiring is similar to monolithic performance and that all cores have access to all resources, which would be very different from Naples. Claiming something isn't the same as actually delivering so who knows if the real world performance is any good but based on what they're saying it wouldn't have the same downsides as Naples.
  • abufrejoval - Wednesday, August 25, 2021 - link

    "the PCB" is too easy to misconstrue as the mainboard. The die carrier used here is a rather special PCB, which doesn't duplicate a typical PCB's latencies, capacities and voltage levels.
  • arashi - Monday, August 23, 2021 - link

    They can't even power it on/run workloads on it, every single vague chart/graph is simulated.
  • eastcoast_pete - Monday, August 23, 2021 - link

    Thanks Ian! Question about IBM's Telum CPU for mainframes being fabbed at Samsung: Is Samsung considered a "Trusted foundry"? If not, quite a number of US government agencies cannot use (buy or lease) a mainframe with a Telum inside.

    On a different subject: How many people from Apple attend this conference? The reason I ask is that Apple, at least in the past, basically behaved like a parasite, as they never present anything at this and similar meetings. They typically take a lot of notes and ask questions, but it's all take, and no giving of information. If I am mistaken about Apple presenting, please correct me; it would be nice to know they actually show signs of good corporate citizenship.
  • name99 - Tuesday, August 24, 2021 - link

    (a) The Samsung question is very interesting! I'd be curious as to how that plays out.

    (b) At least when I was at Apple (before Apple got into the CPU design business), plenty of Apple people attended. Your outrage is more based on ignorance than reality.

    - Apple explain plenty of how their designs work if you make the effort to spelunk through the patents and run some experiments.

    - BUT their design is what you would get if you started with a clean slate in say 2005, with strong opinions (that have been validated) as to how frequency vs power vs density will play out over the next few generations of process. Their design will not help anyone who's unwilling to burn their existing design and start from scratch.

    - There is very little in their design that I had not previously encountered somewhere in the academic literature. They benefited massively from ZERO NIH concerns. You may think examining the literature is obvious. It's not. So many good ideas were published 20 years ago (plenty of them sponsored by Intel) but Intel's management, in their wisdom, have not been interested in restructuring their designs to the extent necessary to exploit those ideas.

    - Which gets us to the final point. You'd be stunned, when you look at the details, at how much Apple changes (ie is willing to change) every design. There have been three big generations, the first one being internal PA Semi stuff ending at the A6; then A7..A10; then A11..A14. My guess is A15 begins a new generation.
    Each generation is a huge visible change (eg A7 added 64b; A11 added clustering and everything that flows from that, and removed 32b). But it's also a massive design change. Apart from that, the annual changes are frequently, and silently, much larger than the sorts of things we see in these HotChips talks.
    You have to have a team [and management!] that are willing to make these massive annual changes, plus a set of tools to validate the changes are worth doing, plus a set of tools to help implement the changes.

    I used to think somewhat like you. Not any more. Apple didn't get to their position by some sort of nefarious tricks whereby they "stole" ideas in some way that prevented their use by others, and they aren't keeping their tricks secret. They got to where they are by
    - very deep knowledge of the literature
    - an imagination to combine ideas from many many places
    - a willingness to take risks in the sense of constant redesign.

    One way in which the best parts of Apple work well is that design and UI is separate from implementation (as individuals, not as collaborators). This has a VERY important (and under-appreciated) effect: the designers design for what would be great UI, and what the HW is capable of, but they don't have to do the work!
    This is SO important. When the engineer is the designer, you always consider an idea in terms of "oh god, that sounds like so much hard work, so many changes". The Apple split means you rarely suffer from that failure mode: rather than engineers dismissing their own ideas bcs a few minutes thought suggests it's a lot of work, they are constantly being forced to implement good ideas -- and often discovering those good ideas can be implemented without nearly as much work as they imagined, or as part of a grand redesign that's worth doing because of so much more it opens up.

    My GUESS is that Apple's CPU design works in much the same way, that there are a few lofty theorists, extremely familiar with the academic literature, who are constantly revisiting previous ideas and simulations and asking "why don't we change the register allocator in this way? the current scheme for sharing registers is OK, but look at this new scheme I thought up; etc etc"
    The next level down of engineers probably groan and push back against every one of these ideas, but the important point is that in Apple all the weight is on the side of the grand designers, none on the side of the poor engineers who have to do the implementation.

    In a way this is just the latest version of a computing argument as old as time. When do you stick with the existing, tried and true code base/design; and when do you engage in huge changes? Since the mid-70s Apple has been defined by being willing to engage in the huge changes, and pay the price of constant low-level irritation every year (every year many things are fixed but a few other things break). Since the same time both halves of Wintel have been defined by not engaging in large changes, by engaging mainly in minimal changes. For a few years Intel engaged in aggressive internal design changes even as the ISA was not changed much (think of 386 to 486 to Pentium to PPro) but not much since about Nehalem.
    Meanwhile the classic MS mentality has been expressed by Joel Spolsky in many ways, not least here: https://www.joelonsoftware.com/2001/10/14/in-defen... and here https://www.joelonsoftware.com/2000/04/06/things-y...

    I'm not interested in arguing about the extent to which Spolsky (or MS or Intel) are justified in their behavior. My point is the poster's original claim, that Apple is not sharing; and my claim that the issue is not that Apple is keeping secrets, it's that all the other companies find it (every year) easier to just evolve the existing design a little more in a few directions than to tear it all down and start again. Apple publishing a hundred papers would not change that...

    It's interesting to compare this with semiconductor processes. Aren't I a hypocrite for complaining that Intel are too timid in redesigning their micro-architecture while also complaining that they should follow TSMC in how they design their process?
    I think the difference is that Intel's process failures (IMHO) result not so much from big leaps as from marketing/finance driven decisions.
    The difference between INTC and TSMC that matters is that TSMC STFU until it has something validated along every scary dimension. If something CANNOT be validated yet, it is postponed (cf, eg, GAA on N3). Intel, on the other hand (for reasons that make zero sense to me) insists on claiming, well before the scheme is validated at a manufacturing level, that it will deliver technology X on date Y. Then they find themselves locked into that promise even when it makes no sense.
    Would TSMC's cautious half-nodes help Intel? Well, not if Intel insisted on still describing every half node step they plan for the next ten years. (Wait, isn't that what they already DID with Intel 7, Intel 5, Intel 4, ...?) The issue is not that TSMC is making cautious half steps while Intel is rebuilding the process from scratch each generation; it's that TSMC is using, for each process, a suite of technologies that have all been validated, separately and together in the lab; while Intel is using, for each process, a suite of technologies that have all been validated, separately and together, on a marketing slide five years ago.
  • eastcoast_pete - Tuesday, August 24, 2021 - link

    While I actually agree with you on a number of your points, my criticism of Apple was not that they don't disclose at least some of their hardware designs somewhere (patents actually require that, after they have been granted). Rather, it was and is about them never (AFAIK) presenting at Hot Chips; they absolutely attended, at least in the past. One of the attractions of such meetings is that attendees can ask presenters questions; and, often enough, "I can't talk about that" is also an important answer.
  • name99 - Tuesday, August 24, 2021 - link

    This is not a technical point, so take it as you wish, but I would urge you to look inward as to why you ACTUALLY care about how Apple behaves in this respect.
    We've agreed that there is nothing going on that deserves the term parasite, no "unfair" withholding of information by Apple, no insights that couldn't be acted upon by others.

    And if you don't have the energy to work through patents and run your own tests to know how these CPUs work, well in the past people like Agner or Henry Wong provided the real, serious info at a much deeper level than HotChips talks, and for M1 people like Andrei, Dougall, and I have been doing the same thing, with deep dives published in various places.

    So look into yourself, look past the tribalism and mindlessness, and ask what you're REALLY upset about.
    My guess is that you want to bring the future forward; you want to experience that thrill of knowing what's new in the A15 now, not when Apple has their iPhone event in three weeks. You want to know today, not some time next year, how Apple will solve the issue of scaling up M1-sized concepts to the requirements of a Mac Pro.
    And that's perfectly human, we all want to know the future. But you have to realize that, in this particular sort of case, it's something like an addiction. You'll get a one-time thrill of knowing 2022's design in 2021. But then what? Now, in 2022 you will want 2023's design. After that one hit, you're still limited to only learning one year's worth of new design every year. Neither your epistemic situation, nor your level of joy, have actually improved. And if your chosen company, like Intel, submits to this addiction, things go south really fast, with Intel, every year, trying to provide more than one year's worth of future prophecy beyond what they did last year, till they're demoing utterly meaningless ten year projections. This all ends like any addiction ends.

    If you want to get a constant thrill of what might be coming, don't demand it from a company that has to produce real products; that simply cannot end well either for you (with a drastically telescoped future) or the company (locked into a roadmap that may make ever less sense). Instead read the stuff that *might* happen, but doesn't lock down the future -- read the academic literature, read what IMEC is doing.
  • Oxford Guy - Tuesday, August 24, 2021 - link

    Apple’s business model has been about speeding up planned obsolescence since the Apple III.

    (Demoing the Mac using a superior not-for-sale prototype surreptitiously is just one symptom of that.)
  • TristanSDX - Monday, August 23, 2021 - link

    Great disappointment. For ADL, at such a conference I expected great detail on the core design; instead there was a replay of pretty shallow marketing info, and an explanation of ThreadDirector. Crap and a waste of time.
  • abufrejoval - Wednesday, August 25, 2021 - link

    Some things don't seem to change, ever, like the z/Arch chips: Tons of really good ideas, but useless, because they stay hell bent on selling to a very affluent niche.

    They've stuck with their mainframe snake oil since water cooled ECL, even when they went CMOS under the cover, and yet for most credit card companies I know, their addiction was never really about the hardware, but the software stack. That software stack could run on the very same Power chips that run the i-Series (or ARM for that matter) quite quickly and reliably enough for pretty much everyone.

    You won't get these chips manufactured cheaply, but there is no technical hurdle to doing a lesser "E-Core" variant of z-Arch. And had they done so years ago, AMD64 might have never happened.

    And on that front: I'd have never thought I'd see an AMD HotChips presentation *that* boring. I think there wasn't a single bit of news in all that, and they got caught in a very awkward moment of their product roadmap. (And I don't forgive them for making all those VM encryption options "server only": That is a move so stupid, I want to fire someone)

    It made all the trumpeting from Intel almost look impressive: Somebody sure thinks that there are major doubts about Intel's attractiveness in corporate/cloud decision makers' minds. They sure fire from all cannons, but it still sounds like stage thunder.
  • SystemsBuilder - Thursday, August 26, 2021 - link

    My take on it, as someone who attended Hot Chips and this session live:
    I was hugely disappointed by Intel's presentations. They were completely controlled by the marketing department, pretty much a rerun of Intel's Architecture Day. I would even say that, given the way the Intel presenters were speaking (monotone, unengaging tone and very controlled sentences), it was 100% scripted, and they never even once went off script or expanded beyond what had already been released at Architecture Day. Not even in the Hot Chips Slack chat channel: 100% marketing messaging controlled. I feel bad for the Intel engineers being on a super tight leash from their masters at the marketing department.

    AMD did a much better job; it was quite an exciting presentation that actually released new information.

    My conclusion from this session, together with the packaging session, was that TSMC is at least 2-4 years ahead of Intel in packaging technology, and that means AMD will continue to be 2-4 years ahead for the foreseeable future in terms of core scaling and performance... I remember the Intel presenter said something defensive, since he was presenting directly after the TSMC presenter, like: We are focusing on packaging technology "at scale", clearly feeling the need to differentiate himself from TSMC, since his presentation was 2-3 years behind TSMC in pure tech terms, in my view.
