58 Comments
ddhelmet - Monday, March 21, 2022 - link
"EPYC 74F3 24 78" - typo, I think
Ryan Smith - Monday, March 21, 2022 - link
Thanks!

cchi - Monday, March 21, 2022 - link
Does anyone know how they managed to add 64MB instead of 32MB? The size of the cache die seems the same as the cache portion of the CCD. Is AMD using some kind of dual-layer process, similar to NAND flash layers?
Or perhaps using some kind of denser but slower cache implementation?
nandnandnand - Monday, March 21, 2022 - link
It's two layers, check the analysis here: https://www.anandtech.com/show/16725/amd-demonstra...
TSMC also demonstrated 12 inactive layers. It's possible that 2 layers is nowhere near the limit, but it's what yields allow for now. You would hope that 12 layers would be less than 6x the cost of 2 layers in the future.
Samsung also announced "X-Cube" 3D SRAM back in 2020, with an unknown amount of layers:
https://www.anandtech.com/show/15976/samsung-annou...
cchi - Monday, March 21, 2022 - link
Thank you for the links, they were very helpful, although perhaps in the reverse way. It is actually confirmed in there that it is a single but denser layer. Right at the end of the article in the first link:
"The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process, AMD knows that TSMC can do multiple stacked dies, however AMD is only talking about a 1-High stack at this time which it will bring to market."
It would be interesting to investigate what effect this has on the latency. I am sure Anandtech is already on top of it =).
nandnandnand - Monday, March 21, 2022 - link
Oops. I don't even want to count how many times I've made that mistake.

Thunder 57 - Monday, March 21, 2022 - link
From Tom's Hardware: "As a result, the L3 chiplet provides the same 2 TB/s of peak throughput as the on-die L3 cache, but it only comes with a four-cycle latency penalty."
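If accurate, the four-cycle figure can be put in wall-clock terms. A minimal sketch, assuming the cache runs at the core clock and using an illustrative 3.4 GHz (an assumption, not a figure from the article):

```python
# Convert an L3 cycle penalty into nanoseconds at an assumed core clock.
# Both inputs are illustrative assumptions, not figures from the article.
def penalty_ns(cycles: int, clock_ghz: float) -> float:
    """Extra latency in nanoseconds for a given cycle penalty."""
    return cycles / clock_ghz

extra = penalty_ns(4, 3.4)
print(f"~{extra:.2f} ns on top of the base L3 latency")
```

At any realistic clock the penalty stays near a nanosecond, which fits the claim that the stacked SRAM behaves like ordinary L3.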
back2future - Monday, March 21, 2022 - link
Taking the above numbers: 128 threads are about $60 per thread (for comparison with cores running only 1 thread, 64 threads are ~$140 per core).

512MB of 3D SRAM is about $1.7/MB on the 64-core EPYC 7773X (vs. the 7763), and a lower $1.4/MB for the 32-core EPYC 7573X (vs. the 75F3); around $100/core for the 24-core 74x3(X) (~$1.9/MB), and some $195/core (but each class's highest base/1T frequency at a lower 240W TDP) for the 16-core 73x3(X) EPYC (with ~$1.3/MB),

which would sum to ~$1.57/MB for EPYC 3D SRAM on average?
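The arithmetic above can be reproduced explicitly. A small sketch, treating the V-Cache premium as the list-price delta between each Milan-X part and its nearest Milan sibling, spread over the 512 MB stack; the prices below are assumptions recalled from launch coverage, not figures stated in this thread:

```python
# Estimate $/MB of V-Cache as (X-part price - nearest non-X sibling price) / 512 MB.
# All prices are assumed approximate 1KU list prices, used here for illustration.
pairs = {
    "7773X vs 7763 (64c)": (8800, 7890),
    "7573X vs 75F3 (32c)": (5590, 4860),
    "7473X vs 74F3 (24c)": (3900, 2900),
    "7373X vs 73F3 (16c)": (4185, 3521),
}
deltas = []
for name, (x_price, base_price) in pairs.items():
    per_mb = (x_price - base_price) / 512  # 512 MB of stacked SRAM per package
    deltas.append(per_mb)
    print(f"{name}: ~${per_mb:.2f}/MB")
print(f"average: ~${sum(deltas) / len(deltas):.2f}/MB")
```

Under these assumed prices the average lands around $1.6/MB, in the same ballpark as the ~$1.57/MB estimate above.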
back2future - Monday, March 21, 2022 - link
If 16 threads are worth a 40W TDP budget, and 512MB of 3D V-Cache shows a base-frequency difference of ~250-300MHz, and 16 threads are worth a TDP budget comparable to ~250MHz of base frequency (32 cores: ~500-600MHz), with conventional L3 cache (256MB configurations) dominating over the additional 64MB V-Cache, then 512MB of 3D V-Cache is worth roughly 40 (to <80)W of TDP at highest SRAM load as well?

Difficult for 12 layers on that production node?
back2future - Monday, March 21, 2022 - link
(if mostly reasonable) translates to 1.5-3W/(1 billion transistors of AMD 3D V-Cache) at full-load performance for TSMC's N7 process node, including interconnect resistances (?)

nandnandnand - Monday, March 21, 2022 - link
BTW, there's talk of RDNA3 GPUs having a 3D implementation of Infinity Cache, so we could see a similar technology with a different amount of SRAM (up to 512 MiB total) on a new node.

magtknp101 - Wednesday, March 23, 2022 - link
Who is the <a href="https://www.linkedin.com/in/syed-bilal-ahmad/"... founder</a> in India is Mr Syed Bilal Ahmad who take this brand to another level.

mode_13h - Thursday, March 24, 2022 - link
@magtknp101, don't be a spammer.

back2future - Wednesday, March 23, 2022 - link
SRAM write endurance (for sustained 4GHz CPU cache clock speeds, 24/7) is about 10-30 years, and somewhat verified 7nm power requirements for 8T SRAM cells (4kB cell arrangement), transferred to this 6T SRAM V-Cache, might top out at a comparable 120-130W/512MB for absolute maximum power dissipation. This, around 1-4W(?)/mm² of V-Cache chiplet, seems like pretty high numbers?

(From the 0.25um-65nm high-power CMOS era, cache sizes like this would require 1-2(-3) digit kW power transfer and heat dissipation, and/but 3nm for SRAM (gate length) will be getting a tough challenge)
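For scale, the bit-cell transistor count behind those power figures is easy to sketch. This counts only the 6T cells themselves and ignores periphery (decoders, sense amplifiers, TSV plumbing), so it is a lower bound on the real die:

```python
# Transistor count for the SRAM bit cells alone in a 64 MB 6T array.
# Periphery is excluded, so this is a lower bound on the die's transistor count.
MB = 1024 * 1024  # bytes
bits = 64 * MB * 8
cell_transistors = bits * 6  # 6 transistors per SRAM bit cell
print(f"{cell_transistors / 1e9:.2f} billion transistors per 64 MB die")  # 3.22
print(f"{8 * cell_transistors / 1e9:.1f} billion across a 512 MB stack")  # 25.8
```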
back2future - Wednesday, March 23, 2022 - link
Correction: "This, around 1-4W(?)/mm² of V-cache chiplet, seems pretty high numbers?"

This, around 0.15-0.45W/mm² for the V-Cache chiplet (since it's 36mm² for each 64MB, so divided by 8), sounds reasonable and within public numbers.
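The correction amounts to dividing by the number of stacked dies. A quick sketch of both ways of getting there (the 120-130 W figure is the commenter's own worst-case estimate, not an AMD number):

```python
# The earlier W/mm^2 range divided by 8, since 512 MB is eight 64 MB dies
# of ~36 mm^2 each, not a single die.
low, high = 1.0, 4.0  # earlier (incorrect) range, W/mm^2
n_dies = 8
print(f"{low / n_dies:.2f}-{high / n_dies:.2f} W/mm^2")

# Cross-check against the absolute estimate: 120-130 W over 8 x 36 mm^2.
for watts in (120, 130):
    print(f"{watts} W -> {watts / (n_dies * 36):.2f} W/mm^2")
```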
back2future - Wednesday, March 23, 2022 - link
For comparison: NVIDIA H100 (GPU GH100, area 814mm²) is ~0.86W/mm²
https://www.anandtech.com/show/17327/nvidia-hopper...
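That H100 number checks out if one assumes the 700 W SXM-form-factor TDP over the quoted die area:

```python
# Power density of GH100: assumed 700 W SXM TDP over the quoted 814 mm^2 die.
print(f"{700 / 814:.2f} W/mm^2")  # prints 0.86
```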
myself248 - Monday, March 21, 2022 - link
Is there any hint that the thicker die may also impact thermals, because heat from the CCX has to pass through the additional silicon on its way to the heatspreader? I know it's very thin, but the impact can't be truly zero...

nandnandnand - Monday, March 21, 2022 - link
A hint like the lower clock speeds and disabled overclocking on the 5800X3D?

WaltC - Monday, March 21, 2022 - link
I think the lower clocks and clock speed locks are to maintain the same TDP between the v-cache and standard versions.

TheinsanegamerN - Monday, March 21, 2022 - link
Except nothing else in the lineup is locked.

Spunjji - Monday, March 21, 2022 - link
There's probably some non-zero thermal penalty, but the disabled overclocking on the 5800X3D may have more to do with not wanting any possible issues where differing thermal expansion damages the connections between the CPU die and the cache die. It's probably also a good idea to keep voltages within tighter tolerances.

Dolda2000 - Monday, March 21, 2022 - link
I no longer remember where I heard it, but I heard through the grapevine that the reason for disabling overclocking is that the V-Cache die has significantly tighter voltage tolerances, basically requiring a single specified voltage.

Slash3 - Monday, March 21, 2022 - link
It also shares a common voltage plane with the underlying compute die.

The L3 stack seems to be hard-limited to 1.35V, rather than the 1.5V of the standard CCD, so the whole package is reined in and locked down to the lower ceiling as a consequence.
Ryan Smith - Monday, March 21, 2022 - link
The extra SRAM dies only cover the existing SRAM and associated plumbing on the CCD. The Zen 3 cores themselves are not covered by any silicon (though AMD does use a thermal spacer to keep the chip height consistent).

Iketh - Monday, March 21, 2022 - link
I think the correct perspective is 2x transistors in the same surface area, rather than the problem of passing heat through additional silicon. That's the real issue.

nandnandnand - Monday, March 21, 2022 - link
The SRAM should use less power than cores. But there was "structural silicon" added to the package.

Dolda2000 - Monday, March 21, 2022 - link
Given that the base die has been shaved down to preserve the total Z-height, I'm not sure I'd expect any significant difference from that. I'm sure there's *something* due to the additional interface between the dies, but that has to be extremely minimal. I think Iketh's perspective that there's simply more transistors in the same volume is far more relevant, then.

ballsystemlord - Monday, March 21, 2022 - link
They cut down on the base and boost frequencies quite a bit. I wonder if the chips are harder to cool, or there's some latency hiding involved in accessing the additional cache.

Wereweeb - Monday, March 21, 2022 - link
SRAM is a notoriously power-hungry and transistor-inefficient type of memory, hence why it's restricted to CPU caches. I doubt there's much of a latency issue, since it's 3D stacked.

Jimbo123 - Monday, March 21, 2022 - link
AMD chip’s performance is still behind Intel's Alder Lake or Sapphire Rapids, as Intel CEO explained, AMD is behind the rear mirror of Intel's. It will be so from here on, especially when Intel gets ahead of TSMC in 3nm or lower in 2023 or 2024, it will have the most advanced technology to make chips. I cannot see AMD could have any bright future, it will always be the second fiddler from here on as it always has been, it may sink Xilink along with it too.

mjz_5 - Monday, March 21, 2022 - link
Why do people love Intel so much! They have been screwing people forever!

Iketh - Monday, March 21, 2022 - link
why are you replying to such an obvious troll

msroadkill612 - Tuesday, March 22, 2022 - link
A dubiously literate one at that. I spot ~4 errors. Last I heard, that was a requisite to being informed & communicating.mode_13h - Tuesday, March 22, 2022 - link
> Why do people love Intel so much!

Could be an attempt at stock price manipulation... or just a bitter Intel shareholder.
mdriftmeyer - Tuesday, March 22, 2022 - link
Keep dreaming on that. No one is interested in Xeons at the datacenter level.

lmcd - Tuesday, March 22, 2022 - link
There's definitely interest in Xeons at the datacenter level. Gracemont looks like a great fit for traditional datacenters, and the socket compatibility with Golden Cove-based designs adds an impressive level of flexibility for the platform.

mode_13h - Wednesday, March 23, 2022 - link
> No one is interested in Xeons at the datacenter level.

AMD/TSMC simply can't supply the volume to shut out Intel. They certainly still have < 20% of this market. The bigger threat to Intel's datacenter ambitions is probably ARM, if you include various in-house efforts like Amazon's Graviton.
phoenix_rizzen - Thursday, March 31, 2022 - link
Unfortunately, whether you're interested in Xeon systems or not, it's virtually impossible to find anything else right now. Demand for EPYC servers is so high, there's no supply available for anyone ordering in small batches.

We've had to resort to Xeon systems the past 12 months as those can be received within weeks. We're still waiting for AMD servers we ordered almost a year ago!
We've been using mostly AMD since the Athlon64 and Athlon, for laptops, desktops, tower servers, and rack-mount servers. We just can't find them anymore. :(
mode_13h - Thursday, March 31, 2022 - link
Sorry for your troubles, but thanks for the data point!

Khanan - Tuesday, March 22, 2022 - link
Of course the Intel CEO would say that. But given the regular lies that have come from Intel in recent years, it's hardly the truth. Then again, the same CEO is delusional enough to claim he can win Apple back, despite Apple being way better in execution, and simply too powerful on its own to be remotely interested in going back into the shackles of depending on Intel's subpar CPUs again.

Oh, and AMD's CEO said they're the best CPUs for anything, including gaming. Given AMD's recent track record, i.e. telling the truth and not lying, I'd rather believe them. They also regularly post benchmarks where they lose, to give a fair perspective, something Intel would never do.
josmat - Tuesday, March 22, 2022 - link
> I’d rather believe them.

I'd rather believe independent benchmarks, never what the company's CEO says.
jospoortvliet - Tuesday, March 22, 2022 - link
True, but of course you can take benchmarks from the vendor at least as some basic indicator. And some vendors over-promise consistently, others tend to over-deliver. No guarantees, they all cherry-pick their benchmarks, but AMD has been pretty consistently fair and accurate with their performance promises in the last 5-6 years or so (since Zen, basically).

mode_13h - Wednesday, March 23, 2022 - link
> the same CEO is as delusional to claim he can win Apple back

Huh? Where & when did that happen?
The only way I can see Apple *remotely* considering it is if Intel somehow makes considerably better ARM CPUs than Apple, which seems unlikely. There's definitely no way Intel can make an x86 CPU good enough to lure Apple back, that's for sure! Apple is definitely *done* with x86.
Oxford Guy - Friday, March 25, 2022 - link
‘Huh? Where & when did that happen?’

It’s common knowledge. Google it.
He did not state that he will get Apple back but said he is going to do everything he can. The latter is still the claim that it’s not only possible but likely enough to merit the mention.
mode_13h - Saturday, March 26, 2022 - link
> said he is going to do everything he can.

That's a little different. It sounds like a response to one of those dumb press conference questions with only one correct answer. If he gave a realistic answer like: "oh well; win some, lose some", that would be judged negatively by investors. He's smart enough to know that.
He must know that there's 0% chance Apple is returning to x86. So, either it was a BS answer to a BS question, or he just tipped his hand that Intel is seriously designing its own ARM cores.
Oxford Guy - Monday, April 11, 2022 - link
'It sounds like a response to one of those dumb press conference questions with only one correct answer.'

It wasn't. It was something he volunteered all by himself.
mode_13h - Tuesday, March 22, 2022 - link
What's known about the topology of their L3 cache? Is it effectively a unified block that sits in front of the link to the I/O die?

Also, this figure of 2 TB/s: does it apply to L3 cache accesses for non-3D variants, or is that only enabled by the additional concurrency provided by the stacked layers?
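On the bandwidth question, a rough way to frame it: at a given clock, 2 TB/s pins down the aggregate bytes per cycle the L3 interface must move. A sketch with purely illustrative clocks (the actual interface width and clock aren't given here):

```python
# Bytes per cycle implied by a bandwidth figure at an assumed clock.
# Clocks are illustrative assumptions; the real interface width is not public here.
def bytes_per_cycle(bw_tb_per_s: float, clock_ghz: float) -> float:
    return (bw_tb_per_s * 1e12) / (clock_ghz * 1e9)

for clk in (2.0, 4.0):
    print(f"at {clk:.0f} GHz: {bytes_per_cycle(2.0, clk):.0f} B/cycle")
```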
Khanan - Tuesday, March 22, 2022 - link
As stated in the article, it will behave like regular L3 cache with the same bandwidth and latency, the only special thing about it is that it’s more dense than regular L3 cache and 3D stacked.

529th - Tuesday, March 22, 2022 - link
Will the EPYC 7373X be available outside of OEM prebuilds? Could you imagine that for a gaming rig, lol. I want one! lol

nandnandnand - Wednesday, March 23, 2022 - link
Zen 4 desktop with or without 3D cache would game better.

Oxford Guy - Wednesday, March 23, 2022 - link
Games aren’t complex enough yet. Developers have to enable them to run on cheaper simpler CPUs.bob27 - Wednesday, March 23, 2022 - link
"tripling the total about of L3 cache available"

about -> amount
kennedyjones102 - Sunday, March 27, 2022 - link
If you are battling to structure your paper for assignment, you will find the best solution at our Assignment Help platform. Great Assignment Helper provides you the 360degree support required to develop premium quality assignments worthy of fetching excellent grades.

mode_13h - Monday, March 28, 2022 - link
spammer.

widyahong - Monday, March 28, 2022 - link
Now we can run windows xp without memory :D (kidding)

mode_13h - Wednesday, March 30, 2022 - link
Heh, good point. I think my first Win XP machine had 512 MB, which was a decent amount for those days.

mode_13h - Wednesday, March 30, 2022 - link
Funny enough, I remember wondering why we would possibly need to keep increasing RAM capacities at the same rate? Why could we possibly need a whole lot more?