42 Comments

  • Kevin G - Friday, December 17, 2021 - link

    “Apple's M1 family SoCs made at N5 run at 3.20 GHz, but if these SoCs were produced using N4X, then using TSMC's math they could theoretically be pushed to around 3.70 GHz or at an even higher frequency at voltages beyond 1.2V.”

    The big thing missing for context on this is simply: what voltage does the M1 currently operate at? Say it is 1.0V; then moving to the N4X process would result in a ~66% increase in power consumption for a meager ~16% increase in performance. For a chip targeted at mobile, that’d be a bad move given the emphasis on performance per watt.
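The ~66% and ~16% figures above appear to follow from the usual first-order rule that dynamic power scales with voltage squared times frequency. A minimal sketch of that arithmetic, assuming the hypothetical 1.0V baseline from the comment and ignoring leakage and any process-level efficiency gains (the function name is illustrative only):

```python
def relative_dynamic_power(v_old, v_new, f_old, f_new):
    """Ratio of new to old dynamic power under the P ~ C * V^2 * f approximation."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Frequencies come from the quoted article; 1.0 V is the comment's assumed
# baseline, not a measured M1 voltage.
f_old, f_new = 3.20, 3.70   # GHz
v_old, v_new = 1.0, 1.2     # volts

perf_gain = f_new / f_old - 1
power_gain = relative_dynamic_power(v_old, v_new, f_old, f_new) - 1

print(f"frequency: +{perf_gain:.1%}, dynamic power: +{power_gain:.1%}")
```

Under these assumptions the frequency gain comes out near 16% and the power increase near 66%, which matches the numbers in the comment. A different baseline voltage would change the power figure considerably.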

    For the targeted markets, and for those who need maximum performance regardless of power consumption, it’ll be attractive.

    We’re also entering an era where chiplets are the norm, so targeting (relatively) high-voltage, clock-focused manufacturing nodes at specific aspects of a design is feasible. Things like stacked cache can continue to be made on more commodity nodes, or IO can come from more analog-friendly nodes, etc. I have a feeling that if such advanced packaging techniques were not going to be common in the HPC sector, we likely wouldn’t have seen this node, as it’d be too crazy for a large monolithic design ( >500 mm^2 ).
  • Alistair - Friday, December 17, 2021 - link

    not true, you'd expect better frequencies all along the voltage curve...
  • brucethemoose - Friday, December 17, 2021 - link

    Not necessarily along the whole voltage curve, but probably at the tail end, yeah.
  • Hul8 - Saturday, December 18, 2021 - link

    > For a chip targeted at mobile, that’d be a bad move give the emphasis on performance per watt.

    I think the implication in the article was "if Apple decides they want workstation or server CPUs using the ARM architecture".
  • Kevin G - Saturday, December 18, 2021 - link

    I get that, but when we’re in the era of chiplets, what is the better path to performance: going to higher voltages and clocks to increase single-threaded performance, or increasing core counts within the same power envelope at the current voltage/clocks? 66% more cores or 16% higher clocks? For the server market it’d be more cores. The workstation market is probably the sole niche where the higher clock trade-off makes sense given its workloads. There is a desire for more cores in workstations up to a point. However, the workstation market isn’t large enough by itself to justify a specialized N4X processor chiplet vs. a more mobile or density-focused node. We’ll continue to see high-end desktop and low-end server gear be repurposed for the traditional workstation.
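One rough way to frame the "66% more cores or 16% higher clocks" question is Amdahl's law. The sketch below is only an illustration under loud assumptions: performance scales linearly with clock, extra cores cost power linearly, and the 8-core baseline and the parallel fractions are hypothetical values, not anything from the article.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's law speedup over a single core."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def relative_throughput(parallel_fraction, base_cores, core_scale=1.0, clock_scale=1.0):
    """Throughput of a scaled design relative to the baseline design.

    Assumes performance is proportional to clock speed (an idealization).
    """
    base = amdahl_speedup(parallel_fraction, base_cores)
    scaled = amdahl_speedup(parallel_fraction, base_cores * core_scale) * clock_scale
    return scaled / base

BASE_CORES = 8  # hypothetical baseline core count

for p in (0.50, 0.90, 0.99, 1.00):  # hypothetical parallel fractions
    more_cores = relative_throughput(p, BASE_CORES, core_scale=1.66)
    higher_clock = relative_throughput(p, BASE_CORES, clock_scale=1.16)
    print(f"parallel={p:.2f}  +66% cores: {more_cores:.2f}x  +16% clock: {higher_clock:.2f}x")
```

In this toy model the clock bump gives the same 1.16x regardless of workload, while extra cores only win once the workload is highly parallel. That is consistent with the split suggested above between server and workstation workloads, though real designs are also bounded by memory, interconnect, and cost.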
  • whatthe123 - Sunday, December 19, 2021 - link

    you already have many-core designs floating in the wild with 100+ cores. it doesn't necessarily scale better even with the core advantage, probably because data access times become a problem. cache isn't shrinking nearly at the same pace as logic on modern nodes so it makes sense to fatten cores to a certain extent for fast single thread processing. cloud is the main market for shoving in a billion cores, but it's just one of many enterprise markets.
  • dotjaz - Sunday, December 19, 2021 - link

    The most ridiculous part is that's a mere 4% increase over N4P, entirely pointless if the figures are true.
  • mode_13h - Saturday, January 1, 2022 - link

    That would also seem to depend on pricing, production capacity, and yield.
  • Alistair - Friday, December 17, 2021 - link

    5.30Ghz all core Ryzen CPUs? Sign me up! :)
  • evanh - Friday, December 17, 2021 - link

    I was under the impression that High-Performance-Computing is completely different to data centre servers. As far as I was aware, your typical web/file server is all about power efficiency: the amount of compute per watt. Under-clocking is the name of the game there.
  • brucethemoose - Friday, December 17, 2021 - link

    Server CPUs are still clocked relatively high, just not at nutty frequencies like today's desktop chips. I bet they're in the part of the voltage/frequency curve where they would benefit from this process.
  • evanh - Saturday, December 18, 2021 - link

    Relative being a comparative term, doesn't that imply that servers have to be on the low side when compared to other general computing?

    I'm still under the impression that HPC is different to data-centre servers. Is that not so?
  • Kevin G - Saturday, December 18, 2021 - link

    HPC can use commodity server hardware and networking if cost is a dominant factor. HPC is about tuning the server hardware to maximize performance and doing the scaling by the cluster size. Server clusters are generally in the single digits to cover for redundancy, but each node has flexibility for the variety of server workloads and expansion.

    HPC can go a bit higher in terms of raw power per node and they get the funding to use exotic networking and cooling. They are still power limited like servers, but where they allocate power in the design, and how they handle it, is just different.

    Probably the biggest differentiator is that servers favor two (or more when possible) DIMMs per channel. This eats up a bit of power and lowers performance a tad, but the user gets more memory in a system. The HPC side will focus on one DIMM per channel to focus on higher memory clocks and lower latency rather than capacity. Networking is also different, as HPC is both latency and bandwidth focused. Traditional servers are more focused on commodity Ethernet, which has increasingly focused on virtualization and legacy scaling (i.e. a new server has to be on the same network as a decade-old app server that can’t go down).
  • eldakka - Saturday, December 18, 2021 - link

    > As far as I was aware, your typical web/file server is all about power efficiency.

    Data Centre does not mean web/file servers. Sure, those are components of it, but, for example at my work, we have database servers running on PPC hardware using scores of cores (for single instances; we have many instances using several hundred PPC cores all up). A transaction may require a lot of database work, and you want that to complete in 1/2 second no matter whether it's a simple select from a single indexed table, or involves multiple joins across several multi-billion-row tables, and SQL statements that are 20 lines long.

    There's a lot of biometrics in there, matching images, etc., all which again you want to complete in 1/2 second or less.

    No, there is a lot of high performance operations and hardware in data centres that might not be scientific HPC (weather modelling, universe simulations, mass brute-force decryption, etc.), but there is still plenty of high performance required.
  • KurtL - Saturday, December 18, 2021 - link

    High-performance computing is different and not different from running typical web/file servers. True supercomputing is all about cost efficiency and hence also power efficiency. It's really hard these days to argue that you need an infrastructure that uses millions of dollars of power per year. Reaching exascale is all about doing it at as reasonable power levels as possible. HPC is of course different from running web servers as, depending on the application, there is often a need for fast high-precision floating point computations, or for AI inference, fast vector and matrix math on short datatypes. And it is also different because the thousands of processor chips have to be able to communicate with each other with as low latency as possible.

    Truly the only market in datacenters where high frequencies at any power level are acceptable, is for those applications that don't scale to a distributed memory model and need a single OS image, or even don't scale well to multiple cores and need as high single thread performance as possible.

    The N4X process really seems more useful to me for gaming PCs and workstation replacements (which would then be operated remotely as the cooling of such high power beasts is way too noisy for an office) than many datacenter or supercomputer applications.
  • ballsystemlord - Friday, December 17, 2021 - link

    How do they cool the chip with all that extra power running through it? Even AMD's current 7nm generation are thermally limited because of the density.
  • ballsystemlord - Friday, December 17, 2021 - link

    PS: For those who've not been following discussions of this sort: the silicon is currently thermally limiting the amount of heat that can be dissipated at 7nm. At N4X (4nm) it will be worse.
  • Alistair - Friday, December 17, 2021 - link

    they are not thermally limited at all, clearly you don't own one...
  • Wrs - Saturday, December 18, 2021 - link

    5800x here, was thermally limited at just under 125w on the CCD, using a D15 air cooler. Samples do vary but there are plenty of 5900x/5950x that suffer the same; it's just hidden when a synthetic benchmark spreads the load among two CCDs. Mind you we're talking 1.5 watts/mm², and when the die area expands that limit drops to converge on the one dimensional thermal design limit. Compare to existing N5 chips typically running 0.25 watts/mm².
  • Alistair - Saturday, December 18, 2021 - link

    that's totally not true, I have a 5800x with a noctua U12, there is something seriously wrong with your mounting or your cooler

    150W on the U12 even, is fine
  • Wrs - Sunday, December 19, 2021 - link

    @Alistair just because you have a cooler running chip doesn't mean the majority of those chips are cool running. Anandtech noted how much more power per core their 5800x sample used. Also note I said CCD, not package. So excluding the IOD and interposer/substrate, how much wattage is actually going into the CCD? You'll probably find the 125W limit there. It's a physical reality of a small CCD and added thermal resistance from solder/IHS. Surely any U12 heatsink can handle more wattage, as long as the heat is more evenly spread out.

    I'm not the only one with a "mounting issue" - countless others on Reddit and elsewhere have asked whether nearing 90C at load is healthy or normal. And at least the chip seems to be surviving at that temp.
  • back2future - Sunday, December 19, 2021 - link

    ... numbers for 5950x are about 161mm² (/2) + 125 (IO) mm² from silicon chip surface from 3 chiplets on AM4 socket towards cooling devices. Given these numbers (105-)130W/161mm² TDP would get towards 250W for 1.5W/mm²?
    (GE patented, air flow efficiency: https://www.murata.com/-/media/webrenewal/products... )
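The power-density figures traded in this sub-thread are simple divisions, sketched below. The inputs are only the numbers quoted by the commenters (125W on one CCD, roughly 161mm² for two CCDs, 1.5W/mm²); they are not measured values, and the helper names are illustrative.

```python
def power_density(power_w, area_mm2):
    """Average power density in W/mm^2 (ignores hotspots within the die)."""
    return power_w / area_mm2

def power_at_density(density_w_per_mm2, area_mm2):
    """Total power implied by a given average density over a given area."""
    return density_w_per_mm2 * area_mm2

# Figures quoted in the thread, treated here as assumptions.
two_ccd_area = 161.0            # mm^2, both CCDs of a 5950x
one_ccd_area = two_ccd_area / 2 # mm^2, a single CCD as on the 5800x

single_ccd_density = power_density(125.0, one_ccd_area)
dual_ccd_power = power_at_density(1.5, two_ccd_area)

print(f"125 W on one CCD: {single_ccd_density:.2f} W/mm^2")
print(f"1.5 W/mm^2 over two CCDs: {dual_ccd_power:.1f} W")
```

With those inputs, 125W on a single CCD works out to roughly 1.5W/mm², and the same density across both CCDs would imply a bit under 250W, in line with the estimate in the comment above.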
  • ChronoReverse - Saturday, December 18, 2021 - link

    I'm curious what you think they're limited by then. Ryzen 5000 CPUs tend to perform significantly faster when undervolted so that they don't heat up as much and thus can clock up more and longer.
  • Alistair - Saturday, December 18, 2021 - link

    boost algorithms have nothing to do with being thermally limited

    i've seen this internet meme about Ryzen 5800x being too hot everywhere and yet I've built 10+ computers with zero problems, and I can now state for certain that these people have no idea what they are talking about

    you should be at 75 degrees at load with that cooler, at most, not a problem, and indeed not a single reviewer had a problem

    at the end of the day the temperature doesn't matter as long as it is low enough to avoid throttling, and in the end the lower power draw of the 5800x compared to the 11900k etc. allows for cheaper coolers
  • Alistair - Saturday, December 18, 2021 - link

    ie u12s works perfectly with 5800x, not so with the 11900k etc.
  • Wrs - Sunday, December 19, 2021 - link

    The boost algo is certainly thermally aware. Here's another simple experiment: uninstall the heatsink while the processor is down clocked in Windows, spray R410a onto IHS then push an all-core load. You can easily see 300-400MHz higher all-core because of dropping the ambient 30-50C. An average 5800x suddenly looks like a stellar sample.

    Then again, most users won't complain of modern thermal throttling because it doesn't stutter like the early days.
  • ikjadoon - Friday, December 17, 2021 - link

    I think Cerebras' Wafer Scale Engine has some answers: massive liquid cooling systems, massive power supplies, and ginormous "MIMO" water blocks (multiple in, multiple out).

    HPC always will find the extra millions under someone's mattress. The best overview I've seen yet, but still rather limited.

    https://fuse.wikichip.org/news/3010/a-look-at-cere...
  • mode_13h - Saturday, January 1, 2022 - link

    Cerebras isn't a general-purpose HPC system, though. It's really focused on deep learning and other systolic processing problems. Lack of high-precision arithmetic and memory that's closely-coupled to the processing elements make it inappropriate for many algorithms.
  • Ryan Smith - Friday, December 17, 2021 - link

    "How do they cool the chip with all that extra power running through it?"

    Liquid cooling. It's already required for the AMD MI250X.
  • ballsystemlord - Saturday, December 18, 2021 - link

    Thanks Ryan and Ikjadoon.
  • mode_13h - Saturday, January 1, 2022 - link

    At Super Computing 2019, wasn't phase-change liquid cooling all the rage?
  • TheinsanegamerN - Saturday, December 18, 2021 - link

    They could always use pockets of dead silicon, just dead space, to provide a cooling area that helps suck heat out of cores or clusters. Said dead silicon from disabling E-cores makes Alder Lake somewhat easier to cool.
  • brucethemoose - Friday, December 17, 2021 - link

    This is exactly what desktop CPUs need, where clockspeeds at the voltage wall are king. And I guess AMD has no reason to skip it, seeing how the laptop lineup (where you don't necessarily want those crazy clocks) will use entirely different chips, assuming their current design pattern holds.

    I wonder if Intel's fab division will come out with something similar? Rocket Lake and Tiger Lake were already kinda bifurcated like that.

    Also, I wonder what the area cost is? Some accelerators (like GPUs) would scale well with extra area vs extra frequency, but the cost surely isn't 15%... right?
  • nandnandnand - Saturday, December 18, 2021 - link

    I wouldn't be surprised if we see AMD release products made at TSMC, GlobalFoundries (12LP+), and Samsung in the same year.
  • mode_13h - Saturday, January 1, 2022 - link

    > the laptop lineup will use entirely different chips, assuming their current design pattern holds.

    Haven't they already said that desktop CPUs will all have an iGPU, from Zen 4 onward? That basically unifies them with laptop & desktop APUs, potentially limiting the compute-only chiplets for workstations & servers.
  • twtech - Saturday, December 18, 2021 - link

    I wouldn't mind seeing a Threadripper made on this node.
  • Oxford Guy - Saturday, December 18, 2021 - link

    High-frequency trading, i.e. bots controlled by the rich controlling the stock market.
  • name99 - Saturday, December 18, 2021 - link

    Something I don't get. When TSMC calls this an HPC node, do they actually mean an AMD desktop node?

    Real HPC does not appear to chase GHz for the obvious reason that doing so generates crazy amounts of heat.
    Top500 #1, Fugaku, runs at 2.2GHz on TSMC N7.
    #2,#3 are POWER at 3.1GHz
    #4 is 1.45GHz (China)
    #5 is Epyc at 2.45 GHz

    etc etc

    I suspect that this is another example of marketing gone mad -- TSMC have what is actually a desktop-targeted node, which will be used by AMD and nV for devices that can ramp up to "excess" GHz on the desktop, but which has no relevance to "real" HPC (except perhaps, if this is the only node AMD target, so they will ship chips that could run at 5.2GHz to actually run at 3GHz in some supercomputer, or data warehouse).

    Am I wrong in this analysis?
  • BushLin - Saturday, December 18, 2021 - link

    Broadly speaking, Supercomputers are put together to solve a particular type of problem and things like configuration and interconnectivity will be specifically designed with that in mind, clock speeds will be tuned to find efficiency for the code which is likely to be the same across a huge cluster...
    HPC workloads are varied to the point that no assumptions can be made about what clock speed gives the best total cost of ownership while delivering the compute power required.
    There will be plenty of customers for high frequency cores (not desktop level boost clocks as these are run 24/7 with high levels of utilisation) with access to >128GB ECC RAM.
  • mode_13h - Saturday, January 1, 2022 - link

    You can only meaningfully compare clock speeds between two CPUs with the same microarchitecture. Chip designers target a particular critical path, which then dictates clock speed, based on the properties of a particular manufacturing process. A chip designed to use a longer critical path won't clock as high as one designed to a shorter target, but would still potentially benefit just as much from a process like N4X.

    The EPYC 7H12 and the F-tier processors provide some evidence that some HPC & server customers indeed care about clock speed:

    https://www.anandtech.com/show/16778/amd-epyc-mila...

    Plus, have you seen how high Nvidia, AMD, and Intel are pushing the power consumption of their A100, MI200, and PVC, respectively?
  • Dr_b_ - Sunday, December 19, 2021 - link

    Looking forward to new GPUs that use this, that you can't buy, but also use 1000W
  • FLORIDAMAN85 - Monday, December 20, 2021 - link

    Dr._b_ speaks for The Legion!
