48 Comments
nandnandnand - Wednesday, January 4, 2023 - link
"and another 64 MB sliced on top with TSVs (presumably placing 32MB on each CCD)"Presumably it's 64 MB on one chiplet and not the other.
Nacimota - Wednesday, January 4, 2023 - link
Yeah, I was about to comment on this myself. 32 MB split across both CCDs doesn't match the image at the top of the article, and the more I think about it, the less sense it would make anyway.

nandnandnand - Wednesday, January 4, 2023 - link
Putting it another way, one core on the 7800X3D is capable of accessing up to 96 MB of L3 cache. Reducing that to 64 MB for the 7900X3D and 7950X3D would be a bizarre choice and would sometimes perform worse.

Instead, for some reason, you will get one CCD with V-Cache and one without. There's speculation that the one without V-Cache will be able to clock higher because of that.
Applications and games could have a preference for the CCD with higher clock speeds or the one with more L3 cache, or prefer to put different threads on each one. This is AMD's first pseudo-hybrid CPU.
shing3232 - Thursday, January 5, 2023 - link
Adding two 3D caches probably wouldn't make sense cost-per-performance-wise due to cross-CCD latency, unlike EPYC.

jospoortvliet - Thursday, January 5, 2023 - link
So one die has triple the cache but lower clocks (I presume up to 5 GHz, like the 7800X3D); the other has higher clocks but lacks the extra cache. Sounds like an absolute scheduling nightmare: as each thread/application will perform differently, the kernel would have to somehow determine on which CCD the app or thread performs best and move it there. Don't hold your breath for optimal support for this :/

I guess the result will be that you can manually say "load this CCD first" (not helpful, of course) and generally expect hugely variable performance between runs. Essentially, while it might bench well in games, for everything else all bets are off: it COULD do well depending on luck (i.e., a match between the performance needs of a thread and the CCD it is scheduled on).
I’m hoping I am wrong ;-)
Cooe - Thursday, January 5, 2023 - link
AMD has already said that they've done significant work with Microsoft to update the Windows thread scheduler to properly take full advantage of the Zen 4 X3D layout, pushing cache-heavy workloads to the V-Cache CCD and compute/clock-speed-heavy workloads to the other.

And it won't be NEARLY as hard as you seem to think to do! Cache- and memory-access-heavy processes are EASILY identifiable by the OS, making it trivial to move them to the V-Cache CCD if the OS's thread scheduler is actually coded to do so. Which Windows' will be.
brucethemoose - Thursday, January 5, 2023 - link
> Cache & memory access heavy processes are EASILY identifiable by the OS

But how can it? Unless the OS is intensely profiling them somehow, or just looking up executable names, I don't see how it can tell the difference between ffmpeg and a Minecraft server maxing out a core.
Maybe they can make some assumptions, like "a process using the GPU probably should run on the cache die," as they already do for big/little cores and HT, but that will only get them so far.
Windows (and Linux DEs) will need a ProcessLasso-like utility soon, with all the asymmetry making its way into desktop CPUs.
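Until then, the pinning can be done by hand. Below is a minimal Linux sketch of the ProcessLasso idea; it assumes, purely for illustration, that cores 0-7 are the V-Cache CCD (they usually enumerate first, but that is not guaranteed), so treat the core list as a placeholder.

    /* Minimal sketch: pin the calling process to a hypothetical V-Cache CCD.
     * Cores 0-7 are assumed to be the cache CCD purely for illustration;
     * check the real topology before relying on this. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int cpu = 0; cpu < 8; cpu++)      /* cores 0-7: assumed V-Cache CCD */
            CPU_SET(cpu, &set);

        /* pid 0 = the calling thread; children started afterwards inherit it */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }
        if (argc > 1) {                        /* e.g. ./pin_cache_ccd ./mygame */
            execvp(argv[1], &argv[1]);
            perror("execvp");
            return EXIT_FAILURE;
        }
        printf("Pinned to CPUs 0-7.\n");
        return EXIT_SUCCESS;
    }

Process Lasso and similar Windows tools do effectively the same thing through SetProcessAffinityMask; the interesting question is whether the updated scheduler makes the manual step unnecessary.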
TomWomack - Friday, January 6, 2023 - link
CPUs nowadays have fairly non-intrusive performance counters; the kernel can ask the CPU to count how many L2 and L3 cache misses a process has, and use that information for scheduling.

Espinosidro - Saturday, January 7, 2023 - link
All memory accesses go through the OS? Plus the OS can also profile page faults, cache misses, etc. Storing and checking that data once a second or two is an easy task for the OS.
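To make the counter idea concrete, here is a minimal Linux sketch (my own illustration, not anything AMD or Microsoft has described) that counts last-level-cache read misses around a block of work via perf_event_open; a cache-aware scheduler would sample a similar signal per thread inside the kernel.

    /* Minimal sketch: count last-level-cache (LLC) read misses for a block of
     * work using the Linux perf_event_open syscall. Illustration only. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    static long perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
    {
        return syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HW_CACHE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_LL |                 /* last-level cache */
                      (PERF_COUNT_HW_CACHE_OP_READ << 8) |     /* read accesses    */
                      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16); /* that missed      */
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = perf_open(&attr, 0 /* this thread */, -1 /* any CPU */);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... run the workload you want to characterize here ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t misses = 0;
        read(fd, &misses, sizeof(misses));
        printf("LLC read misses: %llu\n", (unsigned long long)misses);
        close(fd);
        return 0;
    }

These are the same counters that `perf stat -e LLC-load-misses` reads from user space; the open question is only whether a scheduler bothers to act on them.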
Cooe - Thursday, January 5, 2023 - link
It's not speculation; AMD has officially confirmed that's the reason they did it (for single- and lightly-threaded boost clock speeds). They are working with Microsoft to update the Windows thread scheduler so cache-heavy workloads like gaming are pushed to the V-Cache CCD first, while compute-heavy workloads, like 3D rendering across multiple cores or Photoshop on mostly a single one, get pushed to the "regular" CCD for higher clock speeds.

It's actually absolutely freaking genius when you think about it, as additional L3 cache beyond what a single V-Cache CCD already provides is likely to have significantly diminishing returns for gaming performance, whereas using one of each not only makes the product cheaper to make but MUCH more well-rounded in terms of performance! It's a win-win scenario if I've ever seen one.
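Whatever the Windows-side mechanism ends up looking like, the raw information a scheduler (or a curious user) needs is simply which cores share the stacked L3. On Linux that is already visible in sysfs; a rough sketch, assuming the standard cacheinfo layout where index3 is the L3 entry:

    /* Rough sketch: print each core's reported L3 size from sysfs so the cores
     * on the V-Cache CCD (the ones reporting the larger L3) can be identified.
     * Assumes the standard Linux cacheinfo layout (index3 = L3). */
    #include <stdio.h>

    int main(void)
    {
        for (int cpu = 0; cpu < 256; cpu++) {
            char path[128], size[32];
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/cache/index3/size", cpu);
            FILE *f = fopen(path, "r");
            if (!f)
                break;                          /* no more CPUs (or no cacheinfo) */
            if (fgets(size, sizeof(size), f))   /* cache-CCD cores report the bigger value */
                printf("cpu%-3d L3 = %s", cpu, size);
            fclose(f);
        }
        return 0;
    }

Windows exposes the same topology through GetLogicalProcessorInformationEx; presumably that, plus AMD's chipset driver, is what the updated scheduler leans on.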
AndrewJacksonZA - Friday, January 6, 2023 - link
>> "It's not speculation, AMD has officially confirmed that's the reason they did it... They are working with Microsoft..."Got proof? Got a link please?
Oxford Guy - Friday, January 6, 2023 - link
Microsoft Windows, the utterly brilliant operating system that restarts when someone is in the middle of their work — to install an "update" that overwrites the software the end-user (the ostensible owner of the machine) prefers to have.

That Windows is still the dominant OS says a lot about humanity and it's not favorable.
Dribble - Thursday, January 5, 2023 - link
So basically the only one worth getting is the 7800X3D, as that's the only one that's fully cached. The others are misleading, as only half the cores have the cache, and the cached CCD (which will be faster nearly all the time) has lower-than-advertised clocks.

Hul8 - Wednesday, January 4, 2023 - link
The top image of the CPU would also seem to indicate this. (Unless showing only one CCD's V-cache die exposed is just for illustrative purposes.)

It would also make sense for the use case. Gains from V-cache are mostly in gaming, and games usually don't use more than one CCD's worth of cores.
As long as that sort of asymmetric setup works (and the Windows scheduler stays on top of things, prioritizing the die with the V-cache), omitting an extra 32MB V-cache die configuration would help streamline production. It's doubtful such parts would be useful for EPYCs, so why create them just for gaming?
nandnandnand - Wednesday, January 4, 2023 - link
That top image is identical to how 5800X3D was portrayed, and it's labeled 7800X3D. The two gray parts are just unusable structural silicon, not cache:

https://cdn.arstechnica.net/wp-content/uploads/202...
nandnandnand - Wednesday, January 4, 2023 - link
I got heavily confused but you get the point.

shing3232 - Thursday, January 5, 2023 - link
The Windows scheduler already prefers to put work on the first CCD when handling any 2-CCD/CCX CPU, so it would just work as usual on Win10/11.

3D cache is meant to reduce latency and cut accesses across Infinity Fabric. Dual X3D CCDs are probably not very useful in a client setting, as you would need at least two latency-sensitive applications running, one on each CCD.
The Von Matrices - Thursday, January 5, 2023 - link
The two 7950X CPUs I've used already have two unequal dies. Only cores 0-7 can hit the max boost; cores 8-15 are on a weaker die and boost about 200MHz lower even when only one thread is pinned to them. When loading 32 threads, cores 8-15 consistently run 100MHz slower than cores 0-7. The Windows 11 scheduler seems to know this and prioritizes loading cores 0-7 before turning on cores 8-15. In fact, I rarely see Windows assign anything to cores 8-15 unless running some single piece of software that can fully load more than 8 cores.

jospoortvliet - Thursday, January 5, 2023 - link
Sadly, that won't help here, as it isn't a case of a fast and a slow die. One die runs at a max of 5 GHz but has 96 MB of L3; the other boosts to 5.7 GHz but has only 32 MB of cache. So some apps run better on the CCD with the cache, others on the one with raw speed. And your OS has no clue, so half your threads likely end up running on the wrong CCD. I am very curious to learn how AMD expects to optimize this.

The Von Matrices - Friday, January 6, 2023 - link
I agree. This would be the first CPU I know of where core assignment is based on the type of software being run, not just how many CPU cycles are needed. All the other hybrid core designs on desktops, and even in the ARM world, have a hierarchy of slow and fast cores where the fast cores complete any type of workload quicker than the slow cores.

Ryan Smith - Wednesday, January 4, 2023 - link
You are correct. Thanks!

meacupla - Wednesday, January 4, 2023 - link
From a gaming perspective, I think it makes sense to have one chiplet with the 3D V-cache, as the games that demand this tend to only use 1 core. Other programs may prefer the higher clock speeds.

nandnandnand - Wednesday, January 4, 2023 - link
That seems to be the idea. Now we have to see how well Windows and Linux handle it.

haukionkannel - Thursday, January 5, 2023 - link
Another reason is the price. One 3D cache die increases the price by about $150 to $200… Putting on two would increase the price by $300 to $400 and not be useful in gaming!

The MSRP of the 7900X is $550… So a version with one 3D cache chiplet will cost $700 to $750, while two 3D cache chiplets would make it $850 to $900. The former could sell; the latter, not so much…
futurepastnow - Thursday, January 5, 2023 - link
Probably the V-Cache can't boost above 5 GHz, which is why the 7800X3D is limited to that. It's asymmetric.

Kevin G - Thursday, January 5, 2023 - link
This is a very nuanced thing to schedule for, as it's not obvious from the workload which core would be the better performer. Some will like the raw higher turbo clocks while others will enjoy the larger cache. On the flip side, I don't think we'll have any growing pains like when Alder Lake arrived, as the raw core capabilities are the same, with the extra L3 cache being the big difference.

What would be nice is a 16 core model with 192 MB of L3, though. Even at reduced clocks vs. the non-V-cache models, it'd be nice to have a symmetrical option. 7970X3D perhaps?
Khanan - Thursday, January 5, 2023 - link
Highly interesting. I suspect the better clocks vs. the 5800X3D come via 5nm improvements to the cache, vs. the 7nm of the predecessor's X3D cache. This is almost perfection; it will shred the competition. Man, I wish I could buy it right now. The 12 and 16 cores are dreams for content creators.

Silver5urfer - Thursday, January 5, 2023 - link
Finally AMD revealed the Zen 4 X3D lineup.

I think AMD put in only half the effort here. First, Zen 4 did not need such a quick refresh; it's barely old, so why would you need this, especially when Zen 4 literally screams through all the workloads across the entire x86 lineup, especially RPCS3 with AVX-512. Intel's Raptor Lake i7 and i9 are faster in clocks but also higher power and harder to cool. Moreover, only if you get an i9-13900K and then unlock it will it beat the 7950X3D, and that also shoots up the power. The mid-range 7600X performs perfectly for what it is but cannot compete on MT, due to the Cinebench-accelerator E-cores in the i5-13600K. Still, the chip is potent.
So AMD should have launched the 7800X3D now, waited for the Raptor Lake Refresh, and put out full dual-CCD X3D cache chiplets instead of just one on the R9 SKUs (7900X3D, 7950X3D); that would have made the Zen 4 X3D refresh 100% proper. I bet they are saving the top-end bins for Genoa-X and not giving much to the mainstream platform. It's just a single stack that is missing on the two R9 X3D SKUs. Damn, AMD, it's like you almost did it but took a step back.
Also, looking at the 300MHz base clock cut on the R9 parts (12C and 16C), along with the TDP cut from 170W to 120W, that is again conservative. Why is AMD always this conservative? The chips are fast, they clock at 5.7-5.8GHz, so why do they need to be like this, especially when the upper boost range is maintained? I suspect these might not have much tuning headroom either, since the TDP is limited, so it's just a small advantage AMD is giving those 16C/32T users: X3D for a gaming advantage, but not full-bore unlocked power. They could have done a 200W TDP and given the dual stack without clock regression if they had tried.
Same for their Radeon RDNA 3: they put a 3GHz-capable design in the slide deck, then ham-fisted it with just 2x 8-pin connectors and blocked the OC tables.
All in all, it's just an OK refresh. The 5800X3D also competes extremely well; it was able to fight Alder Lake, Raptor Lake, and Zen 4, all of them, in cache-intensive gaming workloads. Zen 4 X3D looks like a modest jump beyond that. The RPL Refresh will restore some parity, I think, as Intel will raise base clocks by 200-300MHz while AMD reduced theirs; that's my guess. That also lets Intel fight Zen 5 with 14th gen in 2024 and offer a second CPU upgrade on that socket in 2025, while AMD's Zen 5 X3D might be the last; that's into 2025, however.
Silver5urfer - Thursday, January 5, 2023 - link
Okay, I think maybe it's because of the AM5 PPT of 230W. AMD wants to balance them out within the TDP by giving the R9 top end higher boost clocks on one of the CCDs while reducing the base clock. A trade-off, basically. So the X3D stack definitely has some limitations, especially when we are dealing with 5GHz+ boost and 4.7GHz base clocks.

Look at Intel's i9 parts: they have run at lower base clocks for a long time now, not just with big-little but before that with Comet Lake and so on.
A small refresh: the 7800X3D for those who care about games, and the 7950X3D for those who want both (not sacrificing MT performance).
Threska - Thursday, January 5, 2023 - link
Considering Zen 4 sales, maybe they did.

https://www.digitaltrends.com/computing/amd-zen-4-...
Bruzzone - Saturday, January 7, 2023 - link
Intel offers retail up to 100% over CPU volume buy cost and AMD is at +30%. No wonder. mb

nandnandnand - Thursday, January 5, 2023 - link
At some point, 3D cache should become a standard feature on all but the cheapest CPUs from AMD and Intel, since more L3 or a larger L4 is almost always useful and 2D SRAM can't scale much more.

I would wait for reviews before counting out the 7900X3D and 7950X3D asymmetric approach. This has been a long time coming, since 3D V-Cache was first demonstrated on a 5900X.
haukionkannel - Sunday, January 8, 2023 - link
Too expensive for that. Better just increase normal cache little by little.

nandnandnand - Wednesday, January 11, 2023 - link
It's not all that expensive, and these CPUs have high MSRPs anyway. Like the nothing-special 7700X debuting at an MSRP of $400.

msroadkill612 - Thursday, February 2, 2023 - link
I hear you, but we don't see the big picture Lisa does.

After two dreadful quarters and a grim-looking roadmap, an Intel on the ropes is no longer unthinkable.
The one token bright spot is that, though Alder Lake is selling badly, it's selling better than AM5, and it is a halo product.
To unleash not only the crushing performance (?) of 3D, but at bargain prices, would be a disproportionate blow to morale and mindshare.
brucethemoose - Thursday, January 5, 2023 - link
Those 2 CCDs are awfully close to each other.... would there be a reason for this? Maybe some kind of inter-CCD link? If anything they should be right next to the IO die and farther away from each other.
quaz0r - Thursday, January 5, 2023 - link
Hopefully starting with Zen 5 we see V-Cache as a standard feature, not a separate lineup, and V-Cache on all chiplets. Maybe if some people really still want the weird hybrid setup with V-Cache on one chiplet, that could be a separate model.

Wereweeb - Thursday, January 5, 2023 - link
Yeah, just make a stupendously wide CPU with 1GB of cache; fuck performance/$ and performance/W metrics.

nandnandnand - Friday, January 6, 2023 - link
Put 8 GB of L4 on the chip.

Makaveli - Friday, January 6, 2023 - link
lol, a lot of CPU engineers in the comment section.

quaz0r - Friday, January 6, 2023 - link
My good sir, your myopia is unparalleled.

evilspoons - Friday, January 6, 2023 - link
"Champing" at the bit, not "chomping". Heh.Oxford Guy - Saturday, January 14, 2023 - link
Quite.

Bruzzone - Saturday, January 7, 2023 - link
I have completed my audit across the web that began here on January 4, when I found shing3232's "CCD cross latency" observation of interest, as well as the "thread-directed" CCD application optimization that Cooe, brucethemoose, TomWomack (is this the SF sailor TW, the 1980s ingot broker?) and Espinosidro conferred on.

haukionkannel stated a cost estimate, and I will end with data supporting my R7K 3D cost : price determination.
At 5800X3D launch, my observations on limiting 3D to the 5800X were cost and the heat associated with two CCDs in the same package area, and, in relation to Ryzen, whether a Ryzen 3D part uses the same special package material (under Fujitsu license, KamW told me in 2017) as TR/EPYC, subject to heat-transfer tolerance. I queried Hallock on a live stream about this specifically and was informed that power management would address 5800X3D heat in the package when adding the 64 MB SRAM cache, and that seems to have been the case, as Hallock pointed out at the time. I will rule out heat as a concern for adding two SRAM stacks.
Like many, I did not see the 64 MB SRAM add coming to only one CCD on the 79x0X3D, but I understood the cost and validation impact of adding two SRAM stacks, including the cost of failing qualification.
I buy AMD's reasoning for adding one 64 MB cache slice tightly coupled to one CCD for loading simulations while enabling the alternate CCD for high frequency. Which presents a question: there is no hexa-core 7600X3D, maybe because, on cache per core relative to price, it might cannibalize 7800X3D sales by having 33% more L3 per core, as the 3600/5600 demonstrate. An alternative thought is that the SRAM add, on interconnect placement, can only go on an octa-core CCD? That would mean the 7900X3D is 8C+4C, avoiding the cost of wasting one good hexa-core CCD. The next question moves into the performance aspects of an 8C CCD with the L3 cache add, plus a 4C CCD for frequency.
The next find relates to the CCDs themselves and the cross-latency and bus-contention question. In Kevin Krewell, "RISC-V Summit 2022: All Your CPUs Belong to Us", EE Times, 1.4.22, the TIRIAS analyst (former AMD) notes: "the V1 (RISC-V) chiplet architecture is similar to AMD's EPYC processors, but Ventana differs in some significant ways. The chiplet connection to the memory and I/O hub uses a very low latency interface called "Bunch of Wires" (BoW) developed by the Open Compute Project in the Open Domain-Specific Architecture (ODSA) sub-project. BoW is a parallel interconnect and does not use higher-latency SerDes connections like AMD's Infinity Fabric to convert parallel interfaces to serial, introducing latency. Although the company is using BoW today, it does plan to use UCIe in the future."
The key point, that SerDes is needed to get on and off Infinity Fabric between each 79x0X3D CCD, is what caught my attention with respect to latency and bus contention. On Win 11 optimization, I also bumped into a comment string noting the Intel Management Engine performing some RL/AL hypervisor functionality?
Finally, on my cost : price estimate, first consider the full-line supply data as a proxy for production:

7950X 16C = 33.4% of WW channel supply
7900X 12C = 34.2% (note: on a total revenue basis, 67.7% are 2-CCD components)
7700X 8C = 14.6%
7600X 6C = 17.7%
Now I can normalize silicon area across the full line. I will not get into the precision of whether the 7900X is 2x hexa-core or 8C + 4C for cost purposes; a full CCD is a full CCD. Nor the cost difference between the 5 nm CCD and the 6 nm I/O die; my presentation here is 'all up'.
79x0X = 261 mm2 (70 x 2 CCD + 121 I/O)
7700X/7600X = 191 mm2 (70 x 1 + 121)
The normalized area across full-line production is 237.74 mm2 of die area, which, by the way, is a historically relevant Intel desktop die-area sweet spot.
Now to determine cost. Again I will make this simple: rather than estimating bottom-up TSMC cost per mm2, or area-based 'design production' cost incorporating OpEx, I will rely on AMD cost : price / margin.
TSMC's price to AMD is determined by splitting across three stakeholders: TSMC, AMD, OEM. In this split-three-ways total-value deal, everyone splits the value so there is no arguing. On production by SKU grade (that is, the supply split as a proxy for production), the R7K $1K average weighted price on original MSRP is $533.06; divided by 3 stakeholders, that is $177.69 as the TSMC price to AMD, before the AMD-to-OEM mark-up.
Now, how much cost does the 64 MB SRAM L3 cache add? Looking at the photo, the 64 MB slice is approximately 2/3 of one 70 mm2 CCD, or 46 mm2. On R7K normalized area, design-production cost is 237 mm2 / 177.69 = $0.75 per mm2 of component area, all up. The 64 MB SRAM slice adds $34.53 to the cost of producing one R7K 3D = $212.19. Note the 64 MB 5 nm SRAM slice is basically $5 more than the 5800X3D's 64 MB, 32 mm2 slice. On that reference, TSMC's 5 nm price to AMD over 7 nm is +16.6%. The next question is whether that 64 MB slice is 5 nm, and my answer is maybe not, on captive charge endurance and interconnect pitch, which I presume is silicon facing silicon . . . was it IBM that invented that? What's it called?
So what will the R7K 3D price be to the OEM, on a full-product-line procurement mirroring the percentage by core grade coming out of production? To get to this price, traditionally total production volume / first-tier customer split equals the minimum unit commitment. Here I will calculate at AMD's Q3 2022 +50% margin take, though under gross-margin pressure it might now be +60%. AMD's price to the OEM is ($177.69 + $34.50 = $212.19, plus AMD mark-up), a range of $318.28 to $339.50, and with AMD as a profit maximizer the low-volume price is 2x cost, or $385.38 per unit.
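Purely as a reader's aid, here is that chain of arithmetic laid out end to end; it is my recomputation of the commenter's own figures, not AMD data, and the small differences against the numbers above are just rounding.

    /* Recompute the cost-estimate chain above from its stated inputs. */
    #include <stdio.h>

    int main(void)
    {
        double asp_1k  = 533.06;          /* weighted $1K ASP across the R7K line  */
        double tsmc    = asp_1k / 3.0;    /* 3-way stakeholder split -> ~$177.69   */
        double per_mm2 = tsmc / 237.74;   /* ~$0.75 per mm2 of normalized die area */
        double vcache  = 46.0 * per_mm2;  /* ~46 mm2 V-Cache slice -> ~$34.5 add   */
        double unit    = tsmc + vcache;   /* ~$212 production cost per R7K 3D      */

        printf("TSMC->AMD %.2f, V-Cache add %.2f, unit %.2f, +50%% %.2f, +60%% %.2f\n",
               tsmc, vcache, unit, unit * 1.5, unit * 1.6);
        return 0;
    }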
Now, what will the retail price be? I provide that estimate for each SKU:

7950X3D = $802.50 or $799 (($699 / 3 + $34.50) x 3 = $802.50)
7900X3D = $652 or $649
7800X3D = $502 or $499
If a 7600X3D is possible = $402 or $399
Note: price estimates do not incorporate any end-seller price premium. Also, haukionkannel's and my price estimates are roughly in the same ballpark.
I have speculated that R7K and RL are all Extremes and that we'd see the top-bin EE $999 price point, though not officially. One reason is that Intel continues to low-ball Raptor Lake pricing on its in-house manufacturing cost advantage, goosing retail with up to 100% margin potential through the holidays, whereas AMD at retail is at approximately +30%.
Noteworthy on channel data: the current desktop and mobile sales trend back to Coffee Lake finds Intel desktop and mobile selling in relation to AMD at a 1.24-to-1 ratio, and, between Raptor + Alder and Raphael + Vermeer, desktop only, at a 1-to-1 ratio.
Mike Bruzzone, Camp Marketing
Bruzzone - Saturday, January 7, 2023 - link
Whoops, transposition correction (if Q comes by): $177.69 / 237 mm2 = $0.75. mb

nandnandnand - Saturday, January 7, 2023 - link
Interesting idea about an 8+4 7900X3D. We'll find out in February.

I just realized the 7000X3D CPUs have the iGPU. It would be funny to see a boost in integrated graphics performance from the big cache.
Bruzzone - Monday, January 9, 2023 - link
nandnandnand, R7K iGPU boost from big cache? Not sure for what . . .

"The iGPU on the AMD Ryzen 7000 CPUs will feature 2 Compute Units for a total of 128 stream processors. These cores will run at a base clock speed of 400 MHz and a graphics frequency of 2200 MHz which could be the peak frequency. Offering up to 0.563 TFLOPs or 563 GFLOPs of compute power, this will deliver slightly better performance than the Nintendo Switch which is rated at 500 GFLOPs," reports Hassan on 9.7.22.
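For what it's worth, the 563 GFLOPS figure is just the shader arithmetic, assuming the usual two FP32 ops (one FMA) per stream processor per clock: 128 SPs x 2 ops x 2.2 GHz ≈ 563 GFLOPS, so the peak is set by compute, not by cache.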
I'm still waiting for Intel to substantiate whether E-cores are relied on for application compilation as a SIMD (parallel processing) array? Seems like maybe . . .

Hard to say without an applications development guide. Not sure why those are not public anymore?
mb
lopri - Tuesday, January 17, 2023 - link
iGPU is in the I/O die and V-cache is with a chiplet, so iGPU boost is not likely.