29 Comments
Kevin G - Monday, June 3, 2024 - link
AMD is holding back here a bit. If the IO die is shared across Turin models (as it was between Genoa and Bergamo), there are 16 Infinity Fabric links on Turin's IO die. In the picture there are only 12 processor chiplets with 16 cores each to get to 192 cores. However, AMD's 128 core part is built around 16 chiplets with 8 cores each. There is physical room to fit four additional dies and the Infinity Fabric links are there for a potential 256 core part.
Speaking of the IO die, this one is huge. Offhand it appears to be even larger than the one used in Genoa/Bergamo. The only surprise here is that AMD hasn't moved to 136 PCIe 5.0 lanes instead of the previous 128 PCIe 5.0 + 8 lanes of PCIe 3.0. My presumption was that the lanes meant for IPMI and service tasks would be getting a bandwidth boost. It's likely a bit too early for disclosure, but the CXL levels and versions supported are also expected to go beyond Genoa/Bergamo.
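The core counts in this comment are just chiplet count times cores per chiplet. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the 16-chiplet, 16-core configuration is the commenter's speculation rather than an announced product:

```python
def total_cores(chiplets: int, cores_per_chiplet: int) -> int:
    """Total cores for a package populated with identical chiplets."""
    return chiplets * cores_per_chiplet

# Configurations mentioned in the comment.
pictured_part = total_cores(12, 16)     # 12 chiplets with 16 cores each
classic_part = total_cores(16, 8)       # 16 chiplets with 8 cores each
speculative_part = total_cores(16, 16)  # assumption: all 16 links feed 16-core chiplets

print(pictured_part, classic_part, speculative_part)
```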
The Von Matrices - Monday, June 3, 2024 - link
I doubt that is the main reason. Each Zen 4 chiplet can use 2 IF links, so the 12 chiplet models actually lose half their interface bandwidth compared to the 8 chiplet models. How much performance is lost is debatable, but the main reason for 16 links was always to have dual links for each chiplet when possible, not to have 16 chiplets.
Kevin G - Tuesday, June 4, 2024 - link
True, but with Zen 4 Genoa, running dual Infinity Fabric links was only possible on the Epyc IO die with 4 chiplets or fewer, due to how the links are partitioned internally on the IO die. There are 12 links available, arranged in sets of three, each tied to a six by 32 bit DDR5 memory controller in one of the chip's quadrants. Oddball rule, but you can't have a CCD split across two of the internal quadrants. However, to my knowledge, AMD never leveraged dual links to Zen 4 dies in any shipping product: it was more cost effective to design/test/validate/mass manufacture only a handful of interposers and selectively populate dies on them. For example, a 48 core part would have four pads empty where two more dies could be added to make it a 64 core part. Essentially, 8 chiplets and below + IO die got one interposer design, while the 96 core part and Bergamo got their own specialized interposers.
AMD has done something similar with AM5 Zen 4 parts, as there is a spot on the interposer where a second die could go on the models with 8 cores or fewer. It has been my impression that AMD's consumer IO die for AM5 also supports running dual links to a single CCD, but they haven't leveraged it in shipping products for the same testing/economies of scale reasons.
This new IO die increases link count so running up to 8 chiplets with dual Infinity Fabric links is on the table. It'll be interesting to see if AMD flexes their options more with Zen 5 as the focus has been on latency and bandwidth.
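The quadrant rule described in this comment can be turned into a quick count. This is a rough sketch that assumes the links are split evenly across four quadrants and that a dual-linked CCD must take both of its links from the same quadrant; the function name and the even split for a 16-link IO die are assumptions for illustration, not AMD documentation:

```python
def max_dual_link_ccds(total_links: int, quadrants: int = 4) -> int:
    """CCDs that can each get two IF links when no CCD spans quadrants."""
    links_per_quadrant = total_links // quadrants
    # Each dual-linked CCD consumes two links from a single quadrant.
    return quadrants * (links_per_quadrant // 2)

print(max_dual_link_ccds(12))  # 12-link IO die: 3 links per quadrant
print(max_dual_link_ccds(16))  # 16-link IO die, assuming 4 links per quadrant
```

Under these assumptions, the odd third link in each quadrant of a 12-link die cannot form another pair, which is why the count stops at four CCDs there, while an even four links per quadrant would allow eight.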
dotjaz - Friday, July 26, 2024 - link
Direct quote from AMD: "In processor models with four CPU dies, two connections can be used to optimize bandwidth to each CPU die. This is the case for some EPYC 9004 Series CPUs and all EPYC 8004 Series CPUs"
dotjaz - Friday, July 26, 2024 - link
https://www.amd.com/system/files/documents/4th-gen...
Kaique Gerais - Friday, June 7, 2024 - link
Where are you getting 16 chiplets from? By all available information, 12 CCDs is the maximum.
dotjaz - Friday, July 26, 2024 - link
There are 16 IF links, therefore up to 16 CCXs are possible, maybe not on SP5 due to routing, but it is possible.
NextGen_Gamer - Monday, June 3, 2024 - link
I wouldn't presume that the Zen 5 (non-C) version is built using TSMC N4. It is probable, yes, but on the mobile side "Strix Point" is a combination of Zen 5/Zen 5C and uses TSMC N4. That means AMD already took the time to design Zen 5C around both TSMC N4 and N3 nodes, even releasing in the same year. It wouldn't be a stretch to say they did the same for Zen 5, especially given that EPYC is much more profitable than Ryzen.
Kaique Gerais - Friday, June 7, 2024 - link
As far as I understand, Zen C cores are developed from mobile, so if the next generation APU is being designed on 3nm, it would make sense for Zen 5c to exist on both 4nm and 3nm, whereas big Zen 5 is only on 4nm.
ballsystemlord - Monday, June 3, 2024 - link
A real pity we only got a NAMD benchmark for regular computing applications.
SanX - Tuesday, June 4, 2024 - link
How much did the previous gen Bergamo chips lose to regular Genoa across a broad range of real life tests, like the ones in the Phoronix Test Suite developed by Michael Larabel et al.?
SanX - Wednesday, June 5, 2024 - link
Passmark CPU tests claim that, before Turin, Bergamo with 128 efficient cores was equivalent to ~54 performance cores of Genoa. The current 128c Turin is probably faster and could be like 64c Genoa, I guess, and the 192c one will on average be like 96c Genoa. Surprise me, AMD, if I am wrong.
schujj07 - Wednesday, June 5, 2024 - link
128c Bergamo has a faster Spec INT than 96c Genoa but lower Spec FP. Depending on workload there will be minimal performance difference between Bergamo and Genoa outside of clock speed.
SanX - Thursday, June 6, 2024 - link
That "lower in one, higher in another" is already well summarized in the final scores of Bergamo vs Genoa in the PassMark High End numbers: 94 vs 122. It is expected that 192c Turin will hopefully add some performance and that both will be approximately on par. What do we have in the end? The 3nm 360W TDP 192c Turin vs the 3 years old 5nm 360W TDP 96c Genoa, with no difference. Where is the progress here? It looks more like a fake nanometers flop.
shabby - Tuesday, June 4, 2024 - link
Intel: please stop, we can't take the beating anymore in servers...
Threska - Tuesday, June 4, 2024 - link
Still ahead, which is why everyone wants in, from AMD to ARM.
SanX - Wednesday, June 5, 2024 - link
Intel deserves it. Intel was probably the first to start funding VUV lithography, around 2000-2001 (I participated in those talks and that research), but was the last to finally begin using it, and only after kicks from others. To me, what we observe with Intel looks totally unbelievable, impossible, almost a surr.
OreoCookie - Friday, June 7, 2024 - link
If memory serves, the industry started its EUV efforts in the late 1990s (1997/1998ish). That included companies like Zeiss. E.g. they identified two potential wavelengths and key technologies for EUV litho. It is amazing to think that EUV has been over three decades in the making.
SanX - Friday, June 7, 2024 - link
EUV is part of a broader spectral area called X-ray lithography. By that, people meant the spectrum starting from vacuum ultraviolet and shorter, up until gamma radiation, approximately from 10 eV to 10 keV. The first publications mentioning it date back at least to the beginning of the 1980s.
OreoCookie - Friday, June 7, 2024 - link
I realize the individual pieces and ideas are even older. What I meant is that the industry started to align their efforts in the late 1990s. E.g. focussing on two wavelengths allowed the industry to develop mirrors, resists and so forth independently from one another.
There are other ideas that popped up, too, but weren’t picked up (plasmon-polariton-mediated near-field exposure comes to mind).
Blastdoor - Wednesday, June 5, 2024 - link
AMD should enjoy the moment. Intel's new E-cores appear to be pretty impressive and they are going to be packing a ton of them into upcoming Xeons.
Dante Verizon - Tuesday, June 4, 2024 - link
So there will be a 256-core 3nm version?
SanX - Wednesday, June 5, 2024 - link
I would like to be wrong, but for a performance core 3nm Turin this sounds like a pipe dream, even with the 192c variant. Though in the good old days of not-so-fake nanometers and planar transistors, the transition from 5nm to 3nm (actually the largest jump, with an area ratio on the order of (5/3)^2 ≈ 2.8) would probably have gotten us even 300 cores.
Silver5urfer - Tuesday, June 4, 2024 - link
The biggest data point was AMD's 33% datacenter market share, mentioned at Computex by Lisa Su. That means ARM and AMD will pillage every stronghold that Intel held for decades; it's eroding very fast. Add the Aurora supercomputer failures due to Ponte Vecchio and the SPR Xeons, and Intel is really in a sad state. Serves them right for selling their soul to pathetic investors, plus the leadership and the CA state political landscape.
That said, this is Turin Zen 5C. I was shocked to see that AMD can put 16C CCDs in it; I remember Ian mentioning AMD's limit of 8C CCDs. Still, it's impressive over Bergamo. I hope Zen continues to innovate.
I believe AMD's hard part is getting TSMC allocation and keeping costs in check. Being the sole leader, TSMC will charge every client a ton, and AMD has to manage all of that, so they balance it out by giving Client N4 and Data Center N3.
Kevin G - Tuesday, June 4, 2024 - link
Intel's shortcomings in server haven't been about missing their target performance expectations but rather about missing their release dates. Intel's target for Sapphire Rapids was a release in around the same time frame as Milan-X, where it would've been far more competitive. Ice Lake-SP was also more than a year late and would have been more of a Zen 2 competitor. Intel had to get its 10 nm production in line for Ice Lake. Even prior to this, Intel had been fighting their own design errors, with Skylake needing a bug fix in Cascade Lake for Optane support and the infamous TSX issues in Haswell-EP/EX. Intel simply hasn't been executing on their road maps like they should have in server for a very, very long time.
Ponte Vecchio was stupidly ambitious with as many chiplets as it used in its packaging. Had it launched on time as they initially said it would, it'd have been a major competitor to nVidia at the time. While ultimately it didn't live up to expectations upon release, Ponte Vecchio did try to do something different and that is what Intel has been lacking for a very, very long time.
If I recall, the 8 CCD limit was for Zen 3 lineage. The IO die for Zen 4 Epyc does support 12 CCD which you can witness on the many images for it without the heat spreader attached.
Blastdoor - Thursday, June 6, 2024 - link
Prescient post, if it was 2015.
The CEOs that f'd Intel are long gone. If you only look in the rearview mirror you'll miss the next turn. This industry has a long history of today's losers becoming tomorrow's winners. Intel has been on both sides of that more than once.
SanX - Tuesday, June 4, 2024 - link
Why show tests for the 128c Turin and not the 192c one? With Milan I saw funny situations where the 48c part runs faster than the 64c one.
SanX - Tuesday, June 4, 2024 - link
The worst part of the multicore race is that motherboard manufacturers do not offer easy integration of multiple motherboards into a parallel system with the number of cores you want. Only dual socket motherboards exist, and that is your hard limit, like a concrete wall.
Rudde - Thursday, June 6, 2024 - link
Intel Sapphire Rapids still offers 8-socket processors. To my understanding only hyperscalers use more than 2 sockets.