38 Comments
Tuna-Fish - Wednesday, April 26, 2023 - link
A fringe benefit of backside power delivery is that it will allow customers to stack less power-hungry chiplets below the more power-hungry ones without major disadvantages.
Right now, AMD X3D CPUs clock lower than their normal ones, partly because power dissipation is compromised by the cache chiplet sitting on top of the CPU one. They cannot stack the CPU on top of the cache because they'd have to turn half of the cache chiplet into TSVs for power. If they could power the CPU chiplet from the backside, they could do that and not have to compromise.
linuxgeex - Wednesday, April 26, 2023 - link
While I get what you're thinking, it's still wrong... if you have back side power delivery and you put the cache chiplet between the power and the high-power core chiplet, that would be even more TSVs than having the power within the power chiplet and sinking data connections to the cache... and that's also missing the even bigger problem that all the IO connections would also have to go through the cache if you stuck it under the core chiplet. Putting the cache on top is still the best case. In the long term (another couple generations / years from now) the power savings will allow higher performance, since thermals are what actually limits performance. That's the lesson Intel learned with the P4, then seemed to forget until the M1 outperformed their last-gen unlocked desktop parts... now they woke up for the second time lol.
Tuna-Fish - Sunday, April 30, 2023 - link
> if you have back side power delivery and you put the cache chiplet between the power and the high-power core chiplet, that would be even more TSVs than having the power within the power chiplet and sinking data connections to the cache
How so? The only TSVs would be on the cache chiplet connecting data to the CPU. (The cache chiplet would be made on a process without backside power, obviously.)
> the even bigger problem that all the IO connections would also have to go through the cache
This is desirable. The vast majority of the IO operations the CPU does are cached.
dotjaz - Thursday, May 11, 2023 - link
So you are basically admitting you understood nothing.
The whole point is to put power lines on the opposite side of signal lines. Power-hungry chips will always be below memory chips; the power rail needs the shortest path to the substrate to reduce loss (that's the whole point of flip-chip packaging). All BS-PDN does is eliminate the necessity of flip-chip and signal vias.
>How so? The only TSVs would be on the cache chiplet connecting data to the CPU
You literally know nothing.
Right now it's (bottom to top) substrate -> CCD power + signal -> CCD transistor -> 3D$ power + signal -> 3D$ transistor.
Obviously vias are required for both signal and power because CCD transistors are in the middle.
With BS-PDN (CCD only), it's (bottom to top) substrate -> CCD power -> CCD transistor -> CCD signal -> 3D$ power + signal -> 3D$ transistor.
Now only power vias are required, signal can direct bond.
YOUR PROPOSAL with BS-PDN (CCD only), it's (bottom to top) substrate -> 3D$ transistor -> 3D$ power + signal -> CCD power -> CCD transistor -> CCD signal
Obviously CCD power needs to face down, otherwise you'll need power vias in the CCD itself, completely defeating the purpose of BS-PDN, or you use wire bonding - reverting to the pre-FCBGA era, which is EVEN WORSE.
Either way, since the 3D$ is entirely in the way, you need CCD power vias through the 3D$ chip; you also need 3D$ power vias because the 3D$ transistors are in the way; on top of that, you still need signal vias since the CCD power and transistors are in the way.
Let's compare with CCD BS-PDN,
**sane person** approach (3D$ on top): 3D$ power via (no signal via to substrate or 3D$, direct bonding in place)
**your proposal** (CCD on top): CCD power via + 3D$ power via + 3D$/CCD signal via + signal via to substrate
>This is desirable. The vast majority of the IO operations the CPU does are cached.
That only works if the data interface is on the 3D$, like Navi31's. Infinity Fabric is on the CCD, not the 3D$: all data must go through the CCD but not the 3D$. Dumbass.
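The via bookkeeping in the comparison above can be tallied in a short sketch. This is purely illustrative bookkeeping of the thread's argument, not a packaging simulation, and the via names are shorthand invented here, not vendor terminology:

```python
# Toy tally of the via types each stacking order needs with CCD-only
# backside power, following the layer orders described in this thread.
# Illustrative bookkeeping only, not a packaging simulation.

def required_vias(cache_on_top: bool) -> set[str]:
    if cache_on_top:
        # substrate -> CCD power -> CCD transistors -> CCD signal
        #           -> 3D$ power + signal -> 3D$ transistors
        # Signals direct-bond at the CCD/3D$ interface; only 3D$ power
        # must tunnel past other layers.
        return {"3D$ power via"}
    # substrate -> 3D$ transistors -> 3D$ power + signal
    #           -> CCD power -> CCD transistors -> CCD signal
    # The 3D$ die now sits between the substrate and everything else.
    return {
        "CCD power via (through 3D$)",
        "3D$ power via",
        "3D$/CCD signal via",
        "signal via to substrate",
    }

sane = required_vias(cache_on_top=True)       # one via type
proposal = required_vias(cache_on_top=False)  # four via types
```

Under this bookkeeping, cache-on-top needs a single via category while CCD-on-top needs all four, which is the comparison's point.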
dotjaz - Thursday, May 11, 2023 - link
YOUR PROPOSAL II with BS-PDN (CCD only), it's (bottom to top) substrate -> 3D$ power + signal -> 3D$ transistor -> CCD power -> CCD transistor -> CCD signal
**your proposal II** (CCD on top): CCD power via + 3D$/CCD signal via (through both dies instead of just the CCD) + signal via to substrate
CMOG - Wednesday, June 28, 2023 - link
Hello. Just wanted to ask your opinion on MI300 stacking GCD and CCD on top of IO. Do you think RDNA4 can use something like that? I mean, put the MCD below the GCD, so if they go with 2 GCDs of, let's say, 80-96 CUs each, the size of the package will not be massive? I have like zero knowledge on this matter, that's why I'm asking you. Do you think it's possible? And if yes, can it cut any cost by making the package smaller, or is the tech so expensive that it exceeds the smaller-package savings? I would really appreciate your answer.
mattbe - Friday, May 26, 2023 - link
Wish they allowed upvotes and downvotes here. It's hilarious that the person you are responding to has no idea what he is talking about and yet is still confident about it.
jjjag - Monday, May 1, 2023 - link
Please don't believe for a second that anything you said was even remotely correct. Backside power just puts metal layers on both sides of the transistor. One side (front) is for local signal routing, the other side (back) is for power. This helps increase logic and routing density because you are not mixing signal wires with power, so you get more signal wires. You can do the stacking either direction -- cache on bottom or cache on top. Either way the one on the bottom has to have TSVs. It's WAY less expensive to put TSVs on a cache die than it is on a CPU die. CPUs have to be produced in your best, most expensive technology, while caches do not. Putting the CPU on top would be way better for thermal conduction. AMD's V-Cache is very neat, but it's not "cool" :)
Threska - Wednesday, April 26, 2023 - link
I imagine new layout tools will be needed to get the most out of this technology.
TeslaDomination - Wednesday, April 26, 2023 - link
I predict they’ll have even worse yield issues than their 3nm nodes.
name99 - Thursday, April 27, 2023 - link
And what yield would that be?
Korean media claimed 63% yield in mid-2022. EE Times claimed 55% last week.
One of them seems to be mistaken...
We do know that N5 yield rose from 50% to 80% after just a month of volume manufacturing. Chances are the analyst making the 55% claim acquired data that's more than a year old and did not update it appropriately.
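One reason quoted yield figures diverge: yield depends on die area as well as defect density, so two reports can both be "right" about the same process. A first-order textbook approximation is the Poisson yield model; this is not TSMC's actual model, and the defect density below is a made-up example:

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_cm2: float) -> float:
    """First-order Poisson yield model: Y = exp(-D * A)."""
    return math.exp(-defect_density_per_cm2 * die_area_cm2)

# With a hypothetical 0.7 defects/cm^2, a large 1 cm^2 die yields ~50%,
# while a small 0.3 cm^2 mobile die on the same process yields ~81%.
# Differing "yield" claims may simply assume different reference dies.
big_die = poisson_yield(0.7, 1.0)
small_die = poisson_yield(0.7, 0.3)
```

The same model also shows why a modest drop in defect density moves headline yield so quickly during the ramp.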
Morawka - Thursday, April 27, 2023 - link
So Intel will have backside power delivery before TSMC? Supposedly coming in 2024.
name99 - Thursday, April 27, 2023 - link
Define "Have"... As in "can demo a single chip for gullible media outlets" vs "can ship 50 million units over the next year"...
cellarnoise - Thursday, April 27, 2023 - link
Anyone find it troublesome that the timeline graphic stops at 2026? Shows more of the past than the future? Looks like things continue to slow down?
name99 - Thursday, April 27, 2023 - link
TSMC graphs are always like this. They pre-announce what they are pretty sure they can deliver, not whatever wild fantasies they have about what might maybe work out in ten years. Unlike certain other companies in this space.
Kamen Rider Blade - Saturday, May 6, 2023 - link
It's best to only show off what you can deliver, don't put up anything like "Wild Promises" & hope for the best. You're opening yourself up to unnecessary scrutiny, criticism, and potential lawsuits.
lemurbutton - Thursday, April 27, 2023 - link
I know the table says ">1.15X" better density for N2 vs N3E. Isn't that a very small jump in density?
Yes, the > sign suggests it can increase more, but knowing marketing, companies usually want to communicate improvement as high as possible.
lemurbutton - Thursday, April 27, 2023 - link
N3 vs N5 is 1.7x better density for reference.dotjaz - Thursday, April 27, 2023 - link
And it's failing miserably, to the point that it's already effectively discontinued before existing chip designs hit the shelves. Apple will be the only user, because of timeframe and prior commitment.
N3E and all following real 3nm-class nodes from TSMC are much less dense.
name99 - Thursday, April 27, 2023 - link
Where by much less dense is meant "a few percent less dense".
N3B is supposed to be ~1.7x as dense, N3E to be ~1.6x as dense as N5.
BUT
That's for random logic that can make good use of FinFlex.
SRAM is about 5% smaller on N3B, unchanged on N3E.
Also claims of "failing miserably" are dumb and misunderstand TSMC. We heard the same sorts of silly claims about eg N20 and then N10.
TSMC operates by small cautious steps. The whole point of such a methodology is that when things don't work as well as they are supposed to, it's easy, practical, and non-disruptive to tweak the problematic steps and specs to move to a better process. That's exactly what we're seeing with the N3B/N3E redesign. It's no different conceptually from tweaking N7 to get the well regarded N6, or N5 to get N4.
That's just how life WORKS if you're not committed to crazy leaps with no backup plan if the leap fails...
dotjaz - Thursday, April 27, 2023 - link
And if TSMC's official figure is to be believed, it's also less than 3.6% better than N4P. Further proof that density no longer brings performance.
N2 is in the same boat. A moderate density increase gives a notable performance boost, and yes, 10~15% IS notable for one generation. I don't know what you expect. N3E is only 3~7% better than N3, and N3 is only -1~3.6% better than N4P.
Ryan Smith - Thursday, April 27, 2023 - link
N2 is not intended to bring a big density increase. Its big feature is the switch to much better performing nanosheet (GAAFET) transistors.lemurbutton - Thursday, April 27, 2023 - link
If it's better performing, why isn't it reflected in the performance increase numbers? It's still "+10-15%" vs N3E.
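For scale, a perf gain at fixed power and a power cut at fixed performance are two readings of the same underlying node gain. A crude back-of-envelope, assuming dynamic power scales as P ~ C·V²·f with frequency roughly proportional to voltage near the operating point (so P ~ f³) — a textbook approximation, not foundry data:

```python
# Back-of-envelope check that "+10-15% perf" and "-25-30% power" describe
# roughly the same node gain. Assumes dynamic power P ~ C*V^2*f and that
# frequency scales ~linearly with voltage near the operating point, so
# P ~ f^3. A crude approximation, not foundry data.

def iso_perf_power_saving(perf_gain: float) -> float:
    """Power saving at unchanged performance implied by a perf gain
    at unchanged power, under the P ~ f^3 approximation."""
    return 1.0 - 1.0 / (1.0 + perf_gain) ** 3

low = iso_perf_power_saving(0.10)   # ~25% power saving
high = iso_perf_power_saving(0.15)  # ~34% power saving
```

So the quoted "+10-15% perf" and "-25-30% power" are roughly consistent with each other; the headline is one gain expressed two ways.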
techjunkie123 - Thursday, April 27, 2023 - link
+10-15% perf or -25-30% power is basically a full node improvement's worth of benefit. See the numbers for N3 vs N5, or previous numbers for N5 vs N7. But in the case of N2 the performance improvements are not accompanied by density improvements.
caribbeanblue - Friday, May 5, 2023 - link
Yes you’re right, on paper the improvement of 3nm looks similar to previous-gen process nodes, but you’re ignoring another variable that should be considered in the equation: the price. On previous-generation nodes you reaped similar benefits, but the node was also a similar price or cheaper compared to the previous node, so you could keep the die size the same. Instead of sticking with the same chip design as last year and just shrinking it to make it cheaper, chip designers could amplify the benefits of the new process node by adding in wider designs. But now with 3nm, the price increase is almost as much as, if not higher than, the density increase, meaning new chips are going to have to stick with pretty much the existing transistor layout, besides tweaking it a little bit to port it over to the new node, and only get the raw process node benefits, such as 15% higher frequency or 35% less power draw. Whereas on 7nm or 10nm, you would get that 15% higher frequency, but a wider design, afforded by the luxury of keeping the die size similar, gave you that much more performance on top of that.
dotjaz - Thursday, April 27, 2023 - link
What are you on about? +10-15% isn't 0%. How is it not better performing?
FinFET is at the end of the road. It's no longer economically viable to increase performance on FinFET.
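The pricing argument above — that 3nm's wafer-price increase roughly cancels its density gain — can be checked with round numbers. The scaling factors and wafer-price ratios here are hypothetical, not actual foundry pricing:

```python
# Illustrative cost-per-transistor arithmetic. Transistors per wafer scale
# with density; wafer cost scales with wafer price. All numbers are
# made-up round figures, not real foundry pricing.

def relative_cost_per_transistor(density_gain: float,
                                 wafer_price_ratio: float) -> float:
    """Cost per transistor on the new node, relative to the old node."""
    return wafer_price_ratio / density_gain

# Older shrink: 1.6x density at 1.2x the wafer price
# -> transistors get ~25% cheaper, leaving budget for wider designs.
old_gen = relative_cost_per_transistor(1.6, 1.2)

# 3nm-style shrink: 1.6x density at ~1.6x the wafer price
# -> no cost savings at all; only the raw perf/power benefits remain.
new_gen = relative_cost_per_transistor(1.6, 1.6)
```

When `new_gen` is ~1.0, shrinking no longer funds a bigger design at the same price, which is the crux of the argument.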
lemurbutton - Friday, April 28, 2023 - link
Previous node improvements included +10-15% increase in performance and 1.6-1.7x increase in density.
This node only adds 1.15x increase in density and the same performance increase.
That's my point.
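One caveat when comparing these figures: a headline "chip density" number blends logic with SRAM and analog, which barely scale, so it can sit well below the logic-only shrink. A sketch with hypothetical area fractions and per-block scaling factors:

```python
# Illustrative mixed "chip density" arithmetic. Area fractions and
# per-block scaling factors are hypothetical round numbers.

def mixed_density_gain(fractions: dict[str, float],
                       gains: dict[str, float]) -> float:
    """Overall density gain when each block type shrinks independently.
    `fractions` are area shares on the old node and must sum to 1."""
    new_area = sum(frac / gains[block] for block, frac in fractions.items())
    return 1.0 / new_area

# A 1.4x logic shrink with stagnant SRAM/analog yields only ~1.17x
# overall chip density for this hypothetical area mix.
gain = mixed_density_gain(
    {"logic": 0.50, "sram": 0.35, "analog": 0.15},
    {"logic": 1.40, "sram": 1.00, "analog": 1.00},
)
```

So a ">1.15x" mixed figure and a substantially larger logic-only shrink are not necessarily in conflict.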
AzureNeptune - Saturday, April 29, 2023 - link
You're comparing apples to oranges. The 1.15x number is for "chip density", which as stated in the article is a mixed density of logic, SRAM, and analog. Huge density increases like 1.7x for N3 were only for logic. SRAM and analog are hardly scaling (or not at all). Obviously N2 won't have 1.7x logic, but it'll probably be something like 1.4-1.5x, which, for a node also introducing GAA, is not bad.
lemurbutton - Tuesday, May 2, 2023 - link
N5 to N3E is double the density gain of N3E to N2. No matter how you slice it, N2 seems like a weak full-node upgrade based on what we know so far.
name99 - Thursday, April 27, 2023 - link
The primary constraint on density right now (even for N3) is wiring. Smaller transistors are nice for power/performance reasons, but you can't pack them closer together than wiring allows.
BSPD will improve this situation substantially. BUT there are multiple different ways to implement BSPD. Some are focused primarily on reducing the voltage drop of power delivery; some also care a lot about alleviating wiring congestion.
The three basic BSPD schemes are described here: https://www.fabricatedknowledge.com/p/backside-pow...
TSMC, in their usual way, seems to be approaching the problem by implementing the simplest solution first, presumably with a schedule to update the scheme one year at a time (so an N2+ or NA18 or whatever that switches to a Power Via type scheme, then maybe an NA16 that uses Backside Contacts). Intel believe they can bypass learning by doing and just jump straight to Power Via. These sorts of long distance leaps have worked out spectacularly badly for Intel (and then Samsung) over the past decade so I can't understand what makes them think this time will be different, but whatever, that's the way this lines up:
- TSMC pretty much guaranteed to work in 26, to work better in 27, to work best of all in 28
- INTC claiming to be at level "better" in 2024Q3 with the all important caveat "Ramp/Retail may vary"...
my_wing - Friday, April 28, 2023 - link
So "on time" that in Apr 2023 you still did not see any N3 product, and that is your "on time".
TSMC N3 is CANCELLED, and N3E, which is less dense than N3, is still on track.
That Intel accountant CEO was rubbish, but under Pat it can be late, though not this time: Intel 4 is ramping, and it takes about 6 months from ramp to real product on the market. I think it entered risk production back in 2H 2022, so it did not slip.
Zoolook - Sunday, April 30, 2023 - link
In that case we'll see Meteor Lake in July!
I doubt it though; lately "second half of the year" means "we'll scramble to launch before Christmas".
Kamen Rider Blade - Saturday, May 6, 2023 - link
Proof is in the mass-produced product, so until Intel delivers on it, it's just promises.
That's chip/core density with non-logic components included, on a Cortex-A72 afaik. The N3E chip density improvement is 1.3x while the logic density improvement is 1.6x. N3E is no denser for cache than N5.
Tuna-Fish - Friday, April 28, 2023 - link
> that would be even more TSVs than having the power within the power chiplet and sinking data connections to the cache
What makes you think that? You make the bottom chiplet on an older process and power it from below, and the CPU from above. The only TSVs needed are the data connections from the CPU to the cache, on the chiplet that holds the cache.
> and that's also missing the even bigger problem that all the IO connections would also have to go through the cache
This is desirable. The vast majority of IO operations that the CPU does check the cache first. The proportion of uncached (mostly PCIe) accesses is tiny compared to normal memory lookups.
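The proportion argument above can be made concrete with made-up hit rates; the point is that only cache misses and uncacheable (e.g. PCIe/MMIO) accesses ever need to travel past a cache die sitting in the stack:

```python
# Rough traffic bookkeeping for the claim above, with hypothetical rates.
# Only misses and uncacheable accesses must leave the die stack, so
# routing everything through a cache die is not the bottleneck it sounds.

def offstack_fraction(l3_hit_rate: float, uncached_share: float) -> float:
    """Share of memory operations that must travel past the cache die."""
    cached_share = 1.0 - uncached_share
    return cached_share * (1.0 - l3_hit_rate) + uncached_share

# With a hypothetical 90% last-level hit rate and 1% uncacheable traffic,
# only ~11% of operations ever need to cross to the substrate side.
frac = offstack_fraction(l3_hit_rate=0.90, uncached_share=0.01)
```

Under these assumed numbers roughly nine in ten operations terminate at the cache die, which is why "all IO goes through the cache" reads as a feature rather than a problem in this view.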
quaz0r - Wednesday, May 10, 2023 - link
"This timeline would put TSMC roughly two years behind rival Intel when it comes to backside power, assuming they're able to ship their own 20A process on time in 2024."
🤡
Santoval - Saturday, May 27, 2023 - link
All three PPA metrics are a bit disappointing for a brand new FET tech.
I expected better values...
Oxford Guy - Wednesday, June 7, 2023 - link
How many nm is 2nm these days?