"[...] with a power of 0.50 petajoules per bit transferred.", must be a really power hungry chip. Joking aside, pJ means pico Joule (PJ would be Peta Joule).
Stacking is such a dead end because it concentrates all the heat into a smaller area and spreads that heat to additional layers with all the negative consequences.
Chiplet designs like AMD is doing is the only way multichip systems are going to work efficiently IMO.
Doesn't seem like its meant to scale outwards. More like they're able to stack low power draw segments below the high power draw segment, so rather than all the heat working upwards most of it is already at the top. Seems like it might work but I don't think it's comparable to chiplets in yield and full cores per package.
Exactly, the main advantage of vertical stacking is that it can offer much higher bandwidth between dies than horizontally connected dies. Which is probably why they've started by stacking stuff on top of cache chips. It lets them stack another level of cache with each block of CPU cores, and they've still got Foveros to do horizontal connections similar to AMD chiplets to ramp up the total number of cores.
But because each stacked chiplet is made of two dies not one, they can cram twice as many CPU cores into each; reducing the number of NUMA domains (or have double the absolute number of cores) and making it easier for software to scale up.
No, just assuming that it's going to be expensive initially which means it'll be used in high end server chips at first (where the socket is huge for IO anyway, and can be made larger as needed without major issues) and not trickle down to size sensitive laptop parts for several more years.
Every single diagram I've seen stacks the CPU on the bottom with memory and other layers on top. All that CPU heat then has to move through each additional layer to cool the CPU including heating the cache layers up to the point where it will affect memory performance requiring lower memory clocks etc.
I get that the intent is to boost bandwidth to the CPU but the heat issue IMO negates that much of that benefit. It will be interesting if they can get the tech working and can produce at volume without a huge financial cost but frankly I think it's dead end tech for general CPU's. Maybe it has value in other applications but concentrating heat in smaller form factors IMO is never going to be a winning strategy.
I mean it says right in the article "As far as Intel sees it, the most power hungry layer is required to go on the top of the stack for the time being."
That is the only logical solution, yes, however current solutions do exactly the opposite. Standard PoP packaging is even worse, with a significant air gap in there as well.
That's because in standard pop, you've just got the two chips talking to each other; Intel's R&D has been about having the top chips connections go through the bottom chip to the PCB and delivering enough power to the top chip for high performance uses.
Doing that lets them put high performance chips on top for optimal cooling; but it also means that the top and bottom chips are tightly coupled as a pair. That in turn means that POP isn't going anywhere in the mobile market; with the possible exception of Apple, no one is going to design a custom ram chip that can only work with their SoC (because it has all the connections to route the SoC to the mobo going through it).
The Lakefield layers total 1mm thick. Is there a huge difference in temperature between top and bottom chiplets after the initial warm-up? Perhaps the new insulation material helps ...
from a Wired article, "An Intel Breakthrough Rethinks How Chips Are Made":
"Koduri keeps the specifics of exactly how Intel cracked those problems closely held. But he says that a combination of rigorous testing, a new power delivery process, and a wholly invented insulation material to dissipate the heat has helped the company avoid the typical pitfalls."
Not every design has to be focused on peak performance with infinite space/power. In something like a phone, thermals are limited by the chassis itself, so whether you have 150 square millimeters, or 50 square millimeters contacting the chassis, that makes little difference if the chassis can only dissipate 5 watts of power passively anyways. In that sort of installation, reducing power draw, and heat generation via interconnects is more important. Stacking has the potential to dramatically reduce this inefficiency/thermal waste, meaning of your 5 watts of cooling capacity goes even further. Not to mention packaging efficiency, allowing for larger batteries, etc.
Intel lost the phone (and other low power devices) market to ARM years ago, Intel's CPUs are all in the >5W range (most >20W) for which cooling is a major design consideration. At the Ultrabook level (<15W) stacking one layer above the CPU might be tolerable, going beyond this into high performance CPUs and/or multiple layers above the CPU is very questionable.
Intel's Atom designs idle in the same range as ARM and ARM peak consumption keeps creeping closer to Intel. Also, you're delusional if you think Intel is counting itself out of phones forever. The MediaTek partnership, just for connected laptops? No.
I'd be interested in your evidence for that first claim. They never *did* idle particularly close to ARM, and I'm not aware of a current direct comparison that could be made because we've still not seen the numbers for Lakefield.
They don't have to use many chiplet layers to get the benefits of layering logic. For example, Lakefield uses an ultra low leakage 22FL IO bottom layer, with the compute layer on top of it and with the two layers of DDR sitting on top.
I could easily see a Lakefield stack expanded outward to more cores and IO. Perhaps add in a couple of internal layers for L3 cache to reduce the area even more. Perhaps put the Gracemont cores on a layer of their own in the coming Alder Lake.
It's only a dead end if you prioritize GHz over every other aspect of your design... So yeah, Intel's in a crazy spot with the left hand not knowing what the right hand is doing. But for everyone else, the sorts of people using eg https://www.tsmc.com/english/dedicatedFoundry/tech... it's rather less of an issue.
Of course those are DRAM and logic, not stacked SRAM (ala Samsung) or logic on logic. But mainly that reflects the fact that Apple's current SoCs are small (so don't need to be made of multiple pieces) and TSMC's process can yield a 150mm^2 die without breaking a sweat. It will be interesting to see what geometries Apple uses for this as they want to go to larger logic.
Given that cube is the most practical solid shape to manufacture with a high SA/V ratio (for heat dissipation), cube shaped chips via stacking seems the inevitable destination for CPU manufacturing.
3D stacking will never be achieved with packages above 15W, maybe 20W tops (with a robust -and non thin- cooling solution, which however would take space and thus largely negate the area savings of stacking everything on a single package), without some form of active cooling between the dies. Lakefield has a mere 7W TDP and is already thermally constrained.
Microfluidic active cooling solutions have been tested in many labs for quite a few years now, but I doubt they are anywhere near ready to enter the market - unless the industry surprises us that is. Without them the TDP upper cap of 3D stacked SoCs should be in the ~15W range, with momentary TDP spikes of ~5W above that for the turbo clocks. Even then, a robust solution of combined active and passive cooling would be required, so these laptops would clearly not be silent. They would also get *hot*, quite hotter* than the current setup of distinct SoCs and RAM, since all the heat that was previously coming from tens of square cm would be coming from just ~1 cm², in a much "denser" form.
* Technically they would be just as hot, but they would have a far higher thermal density, so it would feel like they were hotter at a particular spot, unless that heat was dissipated effectively.
randomTechSpec - Friday, August 14, 2020 - link
"[...] with a power of 0.50 petajoules per bit transferred.", must be a really power hungry chip.Joking aside, pJ means pico Joule (PJ would be Peta Joule).
LiKenun - Friday, August 14, 2020 - link
Still not enough energy to create a kugelblitz black hole with a Schwarzschild radius bigger than the radius of a proton…
azfacea - Saturday, August 15, 2020 - link
of course not. what were you thinking?? my farts often measure more than several petajoules.
azfacea - Saturday, August 15, 2020 - link
BTW it's not my fault there is no delete/edit button.
Santoval - Saturday, August 15, 2020 - link
Enough, however, to power 9-10 thousand homes for a full year.
YB1064 - Saturday, August 15, 2020 - link
It comes with Avengers packaging, so it must be good!
rahvin - Friday, August 14, 2020 - link
Stacking is such a dead end because it concentrates all the heat into a smaller area and spreads that heat to additional layers, with all the negative consequences.
Chiplet designs like AMD is doing are the only way multichip systems are going to work efficiently, IMO.
whatthe123 - Friday, August 14, 2020 - link
Doesn't seem like it's meant to scale outwards. More like they're able to stack low-power-draw segments below the high-power-draw segment, so rather than all the heat working upwards, most of it is already at the top. Seems like it might work, but I don't think it's comparable to chiplets in yield and full cores per package.
DanNeely - Friday, August 14, 2020 - link
Exactly. The main advantage of vertical stacking is that it can offer much higher bandwidth between dies than horizontally connected dies, which is probably why they've started by stacking stuff on top of cache chips. It lets them stack another level of cache with each block of CPU cores, and they've still got Foveros to do horizontal connections similar to AMD chiplets to ramp up the total number of cores.
But because each stacked chiplet is made of two dies, not one, they can cram twice as many CPU cores into each, reducing the number of NUMA domains (or doubling the absolute number of cores) and making it easier for software to scale up.
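As a rough illustration of why the energy per bit matters for that bandwidth argument (taking the article's 0.50 pJ/bit for the stacked link, and an assumed 1 W budget for die-to-die traffic):

interconnect_budget_w = 1.0        # assumed power budget for die-to-die traffic
energy_per_bit_j = 0.50e-12        # ~0.5 pJ/bit for the stacked link, per the article
bits_per_second = interconnect_budget_w / energy_per_bit_j
print(bits_per_second / 8e9, "GB/s")   # ~250 GB/s of die-to-die bandwidth per watt

A conventional off-package link that costs several times more energy per bit (as is usually claimed) would give correspondingly less bandwidth for the same watt, which is the appeal of putting the cache directly under the cores.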
lmcd - Friday, August 14, 2020 - link
Ignoring that it's a smaller package size, which is a huge advantage for the design.
DanNeely - Monday, August 17, 2020 - link
No, just assuming that it's going to be expensive initially, which means it'll be used in high-end server chips at first (where the socket is huge for IO anyway, and can be made larger as needed without major issues) and not trickle down to size-sensitive laptop parts for several more years.
Santoval - Saturday, August 15, 2020 - link
"and they've still got Foveros to do horizontal connections similar to AMD chiplets"You rather meant "they've still got *EMIB*" right?
Spunjji - Monday, August 17, 2020 - link
That could be a pretty sweet way to do things. It does sound expensive, though...
rahvin - Friday, August 14, 2020 - link
Every single diagram I've seen stacks the CPU on the bottom with memory and other layers on top. All that CPU heat then has to move up through each additional layer before it can be removed, heating the cache and memory layers to the point where it affects memory performance and requires lower memory clocks, etc.
I get that the intent is to boost bandwidth to the CPU, but the heat issue IMO negates much of that benefit. It will be interesting if they can get the tech working and can produce it at volume without a huge financial cost, but frankly I think it's dead-end tech for general CPUs. Maybe it has value in other applications, but concentrating heat in smaller form factors IMO is never going to be a winning strategy.
coburn_c - Friday, August 14, 2020 - link
Flip. Chip.
whatthe123 - Friday, August 14, 2020 - link
I mean, it says right in the article: "As far as Intel sees it, the most power hungry layer is required to go on the top of the stack for the time being."
Valantar - Saturday, August 15, 2020 - link
That is the only logical solution, yes; however, current solutions do exactly the opposite. Standard PoP packaging is even worse, with a significant air gap in there as well.
DanNeely - Saturday, August 15, 2020 - link
That's because in standard PoP, you've just got the two chips talking to each other; Intel's R&D has been about having the top chip's connections go through the bottom chip to the PCB and delivering enough power to the top chip for high-performance uses.
Doing that lets them put high-performance chips on top for optimal cooling; but it also means that the top and bottom chips are tightly coupled as a pair. That in turn means that PoP isn't going anywhere in the mobile market; with the possible exception of Apple, no one is going to design a custom RAM chip that can only work with their SoC (because it has all the connections to route the SoC to the mobo going through it).
JayNor - Saturday, August 15, 2020 - link
The Lakefield layers total 1 mm thick. Is there a huge difference in temperature between top and bottom chiplets after the initial warm-up? Perhaps the new insulation material helps...
From a Wired article, "An Intel Breakthrough Rethinks How Chips Are Made":
"Koduri keeps the specifics of exactly how Intel cracked those problems closely held. But he says that a combination of rigorous testing, a new power delivery process, and a wholly invented insulation material to dissipate the heat has helped the company avoid the typical pitfalls."
c4v3man - Friday, August 14, 2020 - link
Not every design has to be focused on peak performance with infinite space/power. In something like a phone, thermals are limited by the chassis itself, so whether you have 150 square millimeters or 50 square millimeters contacting the chassis makes little difference if the chassis can only dissipate 5 watts of power passively anyways. In that sort of installation, reducing power draw and heat generation via interconnects is more important. Stacking has the potential to dramatically reduce this inefficiency/thermal waste, meaning your 5 watts of cooling capacity goes even further. Not to mention packaging efficiency, allowing for larger batteries, etc.
Duncan Macdonald - Friday, August 14, 2020 - link
Intel lost the phone (and other low-power device) market to ARM years ago; Intel's CPUs are all in the >5W range (most >20W), for which cooling is a major design consideration. At the Ultrabook level (<15W), stacking one layer above the CPU might be tolerable; going beyond this into high-performance CPUs and/or multiple layers above the CPU is very questionable.
lmcd - Friday, August 14, 2020 - link
Intel's Atom designs idle in the same range as ARM, and ARM peak consumption keeps creeping closer to Intel. Also, you're delusional if you think Intel is counting itself out of phones forever. The MediaTek partnership, just for connected laptops? No.
Spunjji - Monday, August 17, 2020 - link
I'd be interested in your evidence for that first claim. They never *did* idle particularly close to ARM, and I'm not aware of a current direct comparison that could be made because we've still not seen the numbers for Lakefield.
nandnandnand - Friday, August 14, 2020 - link
Stacking is the future, and AMD will be doing it too: https://www.anandtech.com/show/15590/amd-discusses...
Getting memory closer to logic can lower power consumption and heat.
FaaR - Friday, August 14, 2020 - link
It lowers total heat, yes, but as mentioned, it can also create local hotspots, which is problematic.
JayNor - Saturday, August 15, 2020 - link
Intel is building 144-layer NAND chips.
They don't have to use many chiplet layers to get the benefits of layering logic. For example, Lakefield uses an ultra-low-leakage 22FFL IO bottom layer, with the compute layer on top of it and the two layers of DDR sitting on top.
I could easily see a Lakefield stack expanded outward to more cores and IO. Perhaps add in a couple of internal layers for L3 cache to reduce the area even more. Perhaps put the Gracemont cores on a layer of their own in the coming Alder Lake.
name99 - Saturday, August 15, 2020 - link
It's only a dead end if you prioritize GHz over every other aspect of your design...
So yeah, Intel's in a crazy spot with the left hand not knowing what the right hand is doing. But for everyone else, the sorts of people using e.g. https://www.tsmc.com/english/dedicatedFoundry/tech... it's rather less of an issue.
We already have various versions of stacking in the mobile world -- the obvious PoP, the newer InFO used by iPhones for a while now,
https://www.systemplus.fr/wp-content/uploads/2018/...
and the strange A12X hybrid
https://sst.semiconductor-digest.com/chipworks_rea...
Of course those are DRAM and logic, not stacked SRAM (ala Samsung) or logic on logic. But mainly that reflects the fact that Apple's current SoCs are small (so don't need to be made of multiple pieces) and TSMC's process can yield a 150mm^2 die without breaking a sweat. It will be interesting to see what geometries Apple uses for this as they want to go to larger logic.
surt - Saturday, August 15, 2020 - link
Given that a cube is the most practical solid shape to manufacture with a high SA/V ratio (for heat dissipation), cube-shaped chips via stacking seem the inevitable destination for CPU manufacturing.
Santoval - Saturday, August 15, 2020 - link
3D stacking will never be achieved with packages above 15W, maybe 20W tops (with a robust, and not thin, cooling solution, which however would take space and thus largely negate the area savings of stacking everything on a single package), without some form of active cooling between the dies. Lakefield has a mere 7W TDP and is already thermally constrained.
Microfluidic active cooling solutions have been tested in many labs for quite a few years now, but I doubt they are anywhere near ready to enter the market - unless the industry surprises us, that is. Without them the TDP upper cap of 3D stacked SoCs should be in the ~15W range, with momentary TDP spikes of ~5W above that for turbo clocks. Even then, a robust solution of combined active and passive cooling would be required, so these laptops would clearly not be silent.
They would also get *hot*, quite a bit hotter* than the current setup of distinct SoCs and RAM, since all the heat that was previously coming from tens of square cm would be coming from just ~1 cm², in a much "denser" form.
* Technically they would be just as hot, but they would have a far higher thermal density, so it would feel like they were hotter at a particular spot, unless that heat was dissipated effectively.
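To put that density point in numbers (illustrative figures only, assuming the same 15 W total either way):

power_w = 15.0
area_spread_cm2 = 20.0     # assumed SoC + separate DRAM footprint spread over the board
area_stacked_cm2 = 1.0     # everything in one ~1 cm^2 stack
print(power_w / area_spread_cm2, "W/cm^2")   # 0.75 W/cm^2
print(power_w / area_stacked_cm2, "W/cm^2")  # 15 W/cm^2, ~20x the heat flux from one spot

Same total heat, but the cooling solution now has to pull it out of an area twenty times smaller.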
FunBunny2 - Saturday, August 15, 2020 - link
"Microfluidic active cooling solutions"IBM did that 40 years, albeit at a macro level: https://www.ibm.com/ibm/history/exhibits/vintage/v...
Frenetic Pony - Friday, August 14, 2020 - link
As far as I understand it, Hybrid Bonding looks to be fairly expensive, with not a lot of ways around that. Semi-engineering has a good overview of it: https://semiengineering.com/the-race-to-much-more-...