14 Comments
Slash3 - Tuesday, June 13, 2023 - link
These also revert to a dual-CCX (per CCD) design, which is an odd choice. Two eight-core CCXes per chiplet.
kpb321 - Tuesday, June 13, 2023 - link
That seems consistent with favoring density over performance. One IF link per CCX would carry more fabric overhead than one IF link shared by two CCXs, but the shared link could cost some performance if it becomes a bottleneck for your workload with that many cores sharing it.
trevdawg94 - Tuesday, June 13, 2023 - link
The existing EPYC IO die only supports up to 12 IF links. The only way to add more would've been to make a new IO die which would've increased fab cost and added extra design complexity. I wouldn't be surprised if AMD would make an IO die with more IF links for Zen 5 EPYC CPUs if demand for these CPUs is high enough. Even if they don't use all of the IF links for the standard EPYC CPUs, using the same IO die for both would help keep fab and design costs in check.
SanX - Tuesday, June 13, 2023 - link
What are the fab and design costs, specifically?
TekCheck - Tuesday, June 13, 2023 - link
They have to advance the I/O die for several reasons. Next-gen Bergamo cannot add another two 16-core chiplets on the current package. If they plan Turin Dense with 192 "c" cores, a new package and I/O die are needed on the same 6096 socket.
TekCheck - Tuesday, June 13, 2023 - link
Nothing unusual. 2x4 CCX was already used in early Ryzen chiplets. This is an iterative step-up. Zen 5c will have a unified 16-core CCX. They move by gradually perfecting the logic.
nandnandnand - Tuesday, June 13, 2023 - link
Is there any information out there about a unified 16-core CCX or is that speculation? There might be scaling issues that make 2x8 the way to go for multiple generations.
whatthe123 - Wednesday, June 14, 2023 - link
I feel like a unified 16 will have serious routing problems or latency issues similar to the groups of E cores on Intel chips. Intel and AMD have stuck with 8 on P cores or meshes to deal with the mess of routing, so if you want 16 cores AND fast core-to-core communication you have to get creative in 2.5D or 3D space, which I doubt will happen by next generation considering the struggles even with 2.5D designs.
cbm80 - Tuesday, June 13, 2023 - link
The 8B lower transistor count is accounted for by the 128MB reduction in L3 cache.
Silver5urfer - Wednesday, June 14, 2023 - link
Unlike Intel's x86 E cores, these also have SMT enabled. They say it's the same IPC at the same clock rate, but perhaps these are clocked lower, hence the higher density than Zen 4.
I really hope this does not make it to the AM5 socket as some sort of hybrid abomination like the Intel and ARM BS. Plus, if AMD wants to create a hybrid solution on a chiplet-based design, they have a lot of hoops to jump through: 16 cores / 32 threads per CCD and an IO die able to handle both Zen 4c and Zen 4, on top of the whole MCM design, will be a massive PITA. Plus AMD never said AM5 / desktop Ryzen will be getting the c variant, and moreover the "c" in Zen 4c signifies Cloud. Just imagine the complexity: since the 7900X3D and 7950X3D already rely on the OS scheduler, this complication would make matters worse. So that's the best news I can think of: no hybrid nonsensical BS on the AM5 socket. They'll maybe dump this onto the next console revision as a subset of cores for the OS and base subsystems alongside the main performance cores, since consoles are BGA and monolithic. Same for BGA laptops and other disposable parts. Plus, unlike Intel's, these will also have AVX512 execution blocks.
All in all, very impressive to be able to cut the core down and get the same IPC. Super high density on top of that, and with SMT, unlike ARM.
I hope to see the day AMD releases 4-way SMT on their Zen x86 processors; it would be mind-blowing for sure.
michael2k - Wednesday, June 14, 2023 - link
I don't understand your point. AMD has (theoretically) the same CPU tailored for high clock speed or tailored for high density.
Meaning in theory if you underclocked a Genoa Zen 4 (to save power in a laptop, for example) you would see the same (absent cache differences) performance as a Zen 4c part at the same clock.
The difference is the size: you can fit 3 Zen 4c cores in the same area as 2 Zen 4 cores.
There's a tipping point where the heat generated by Zen 4 cores forces the complex to underclock itself (throttling due to excess heat) and you can see it very clearly:
https://www.anandtech.com/show/18763/amd-announces...
The more cores in a CPU, the lower the base clock. At the 360W TDP they offer both a 96c and a 64c part, and there's a 700MHz drop in base clock from adding 32 more cores. There's another 350MHz drop between 64 and 48, which I believe is due to the binning process.
At 290/280W they offer 48c at 2.75GHz or 32c at 3.25GHz.
But the Bergamo parts can fit 128 cores at 2.25GHz in the same 360W envelope as the 9654 with only 96 cores. A hybrid design might see 48 high-performance cores running at 3GHz and 64 high-efficiency cores running at 2.25GHz.
You can imagine different mixes and matches are possible (assuming AMD did the work to engineer the possibility). In the worst case, when throttled by heat, all the CPU cores will be more or less the same due to clock speed limits. At best you get more cores than is possible using straight Zen 4 and more performance than is possible using straight Zen 4c, due to clock speed and cache differences.
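The clock and core-count figures above can be put side by side with a rough sketch. It multiplies cores by base clock to get an aggregate "core-MHz" number, which is only a crude proxy: it ignores boost clocks, cache, the different TDPs of the smaller parts, and workload behavior. The 64c and 96c base clocks are not quoted directly; they are assumed here by applying the 350MHz and 700MHz drops to the 48c figure, and `core_mhz` is a hypothetical helper name.

```python
# Rough sketch: aggregate "core-MHz" (cores x base clock) for the parts
# mentioned in the comment. A crude throughput proxy, not a benchmark.

def core_mhz(groups):
    """Sum cores * base clock (MHz) over a list of (cores, mhz) groups."""
    return sum(cores * mhz for cores, mhz in groups)

# Base clocks in MHz. The 48c and 32c values are quoted in the comment;
# the 64c and 96c values are ASSUMED by applying the quoted 350MHz and
# 700MHz drops, since the comment gives only the deltas.
base_48c = 2750
base_64c = base_48c + 350   # 3100
base_96c = base_64c - 700   # 2400

configs = {
    "Zen 4, 32c": [(32, 3250)],
    "Zen 4, 48c": [(48, base_48c)],
    "Zen 4, 64c": [(64, base_64c)],
    "Zen 4, 96c": [(96, base_96c)],
    "Zen 4c, 128c (Bergamo)": [(128, 2250)],
    # Hypothetical hybrid from the comment: 48 fast cores + 64 dense cores.
    "Hybrid, 48c + 64c": [(48, 3000), (64, 2250)],
}

totals = {name: core_mhz(groups) for name, groups in configs.items()}

for name, total in totals.items():
    print(f"{name:24s} {total / 1000:7.1f} core-GHz")
```

By this measure the hypothetical 48 + 64 hybrid lands on the same aggregate as the 128-core Bergamo part (288 core-GHz) with 16 fewer cores, and both sit above the 96-core Zen 4 part (230.4 core-GHz).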
Silver5urfer - Friday, June 16, 2023 - link
On the HPC side you ain't seeing hybrid designs; that would cause a gigantic mess for VMware and other similar workloads. It will all be homogeneous; not even ARM has that type of processor. Ofc heat is an issue, but you can check the AMD website for the Genoa and Bergamo base clocks: there's a clock rate cut for Bergamo.
[email protected] - Friday, June 16, 2023 - link
Nice increase in cores/efficiency. But what is the effect on L3 caching when the cores/threads increase proportionally more than the cache?
supdawgwtfd - Tuesday, June 27, 2023 - link
Depends on workload as said in the article.
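For a rough sense of scale on the cache question, L3 per core can be worked out directly. The totals below are assumptions taken from AMD's public specifications rather than from this thread (384MB for the 96-core Genoa part, and 128MB less for the 128-core Bergamo part, matching the difference mentioned above); `l3_per_core_mb` is a hypothetical helper name.

```python
# Rough sketch: average L3 cache per core and per SMT thread.

def l3_per_core_mb(total_l3_mb, cores):
    """Average L3 capacity per core, in MB."""
    return total_l3_mb / cores

# ASSUMED totals from public specs, not stated in the thread.
genoa_l3_mb, genoa_cores = 384, 96
# Bergamo has 128MB less L3 (per the thread) and 128 cores.
bergamo_l3_mb, bergamo_cores = genoa_l3_mb - 128, 128

genoa_per_core = l3_per_core_mb(genoa_l3_mb, genoa_cores)
bergamo_per_core = l3_per_core_mb(bergamo_l3_mb, bergamo_cores)

# Both designs run 2 threads per core with SMT enabled.
threads_per_core = 2

print(f"Genoa:   {genoa_per_core:.1f} MB/core, "
      f"{genoa_per_core / threads_per_core:.1f} MB/thread")
print(f"Bergamo: {bergamo_per_core:.1f} MB/core, "
      f"{bergamo_per_core / threads_per_core:.1f} MB/thread")
```

Under these assumptions each Zen 4c core has, on average, half the L3 of a Zen 4 core (2MB versus 4MB), so how much it matters comes down to how cache-sensitive the workload is.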