It would be nice if the article was updated as not everyone reads the comments section and AnandTech articles do often get cited in Wikipedia articles.
I feel safe in saying that Wiki-Dom will be right on it . . . ;-)
So, those little white lines are the Infinity Scalable Data Fabric (SDF) and the Infinity Scalable Control Fabric (SCF), connecting "Core" chiplets to the I/O core.
"The SDF might have dozens of connecting points hooking together things such as PCIe PHYs, memory controllers, USB hub, and the various computing and execution units."
Of course, I counted them (rolling eyes at myself), and determined there were 32 connecting a single core chiplet to the I/O core. I'm smelling a rational relationship between those 32 and other such stuff. Is the number of IF links a proprietary secret to AMD?
Yah know? It would be a nice 'get' if a tech writer interviewed someone in that former Sea Micro bunch, and spilled a few beans . . .
I was (am) trolling Ian/AT for a **Deep(er) Dive** on the Infinity Fabric -- its past, and its future. The EPYC Rome processors have 8 "Core" chiplets connecting to the I/O core. Right? Those 'little white lines' (32- to 46?) from each chiplet, presumably, scale to ... infinity?
AMD purchased SeaMicro 7 years ago as the "Freedom Fabric" platform was developed. Initially the SM15000 'stitched' together 512 compute cores, 160 gigabits of I/O networking and 5+ petabytes of storage to form a 'very-high-density server.'
The Epyc and Ryzen CCX units, the true CPU cores, are made at TSMC. The I/O die is the only part that comes from GlobalFoundries, and it is probably kept there just to satisfy the contracts currently in place.
"Users focused on performance will love the new 16-core Ryzen 9 3950X, while the processor seems nice an efficient at 65W, so it will be interesting so see what happens at lower power."
The big problem with this platform is that ST perf per dollar gains are from zero to minimal, depending on SKU. They give us around 20% ST gains (IPC+clocks) but at a cost. Would rather have 10-15% gains for free than pay for 20%. Pretty much all SKUs need a price drop to become exciting, some by about $50, some by a bit less, and the 16-core by a lot more.
Got to wonder about memory BW with the 16 cores. 2 channels with 8 cores is one thing but at 16 cores, it might become a limiting factor here and there.
That could be said of any processor. "Yeah, drop the price of whatever it is and we'll love you for it." Improvements cost, just like DVDs cost more than VHS.
In the semi business the entire point is to offer significantly more perf per dollar every year. That's what Moore's Law was, 2x the perf at same price every 2 years. Now progress is slower but consumers aren't getting anything anymore.
And in pretty much all tech driven areas, products become better every year, even cars. When there is no innovation, it means that the market is dysfunctional. AMD certainly does not innovate here, except on the balance sheet. Innovation means that you get better value and that is missing here. TSMC gives them more perf per dollar, they have additional gains from packaging but those gains do not trickle down to us. At the end of the day even Intel tries to offer 10-15% perf per dollar gains every cycle.
The number of transistors in an IC, not the number of transistors per CPU core. This is an important distinction since a CPU core in Moore's day had very little in it besides registers and an ALU. They didn't integrate FPUs until relatively recently.
It's about overall transistor density, nothing more. You absolutely can compare an 8c to a 4c chip, because they are both a single IC.
An 8 core coffee lake chip is 20% smaller than a quad core sandy bridge chip. That's double the CPU cores, double the GPU cores, with probably a massive increase in the transistors/core also.
Moore's Law had a minor slowdown with Intel stuck at 14nm, but it's not dead.
Moore's Law is actually accelerating. Just not at Intel. See https://en.wikipedia.org/wiki/Transistor_count - the largest chips now have ~20 Billion transistors, and with 7nm and 5nm it looks like we're getting some more doublings soon.
Shouldn't we be looking at highest transistors per square millimeter plotted over time? The Wikipedia article helpfully includes die area for most of the processors, but the graph near the top just plots number of transistors without regard to die size. If Intel's Xe hype is accurate, they will be putting out massive GPUs (1600 mm^2?) made of multiple connected dies, and AMD already does something similar with CPU chiplets.
I know that the original Moore's law did not take into account die size, multi chip modules, etc. but to ignore that seems cheaty now. Regardless, performance is what really matters. Hopefully we see tight integration of CPU and L4 DRAM cache boosting performance within the next 2-3 years.
Moore's law is about transistors on a single integrated chip. But yes density matters too, especially actual density achieved in real chips (rather than marketing slides). TSMC 7nm does 80-90 million transistors/mm^2 for A12X, Kirin 980, Snapdragon 8cx. Intel is still stuck at ~16 million transistors/mm^2.
enough about Moore, unless you can get it right. Moore said nothing about transistors. He said that compute capability was doubling about every second year. This is what he actually wrote:
"The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. "
[the wiki]
the main reason the Law has slowed is just physics: Xnm is little more (teehee) than propaganda for some years, at least since the end of agreed dimensions of what a 'transistor' was. couple that with the coalescing of the maths around 'the best' compute algorithms; complexity has run into the limiting factor of the maths. you can see it in these comments: gimme more ST, I don't care about cores. and so on. Mother Nature's Laws are fixed and immutable; we just don't know all of them at any given moment, but we're getting closer. in the old days, we had the saying 'doing the easy 80%'. we're well into the tough 20%.
Actually the number of cores doesn't matter, AFAIK, as Moore's Law originally was only about transistor density, so all you need to compare is transistors per square millimeter. Looked at like this, it actually doesn't even look that bad.
Moore's law specifically talks about density doubling. If they can fit 6 cores into the same footprint, you can absolutely consider 6 cores for a density comparison. That being said, we have been off this pace for a while.
Why it ever became known as a "law" is totally beyond me. More like Moore's Theory (and that's pushing it, as he made a LOT of suppositions about things he couldn't possibly predict, not being an expert in those areas. ie material sciences, quantum mechanics, etc)
This. He wasn't describing something fundamental about the way nature works - he was looking at technological advancements in one field over a short time frame. I guess 'Moore's Observation" just didn't sound as good.
And the reason why no one seems to get it right is that Moore wrote and said several different things about it over the years - he'd OBSERVED that the number of transistors you could get on an IC was increasing at a certain rate, and from there, that this lead to performance increases, so both the density AND performance arguments have some amount of accuracy behind them.
And almost no one points out that it's ultimately just a function of geometry: As process decreases linearly (say, 10 units to 7 units) , you get a geometric increase in the # of transistors because you get to multiply that by two dimensions. Other benefits - like decreased power use per transistor, etc. - ultimately flow largely from that as well (or they did, before we had to start using more and more exotic materials to get shrinks to work.)
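A trivial C sketch of that geometry argument, purely for illustration (the "node" numbers here are arbitrary units, not real feature sizes, since modern node names no longer map cleanly to physical dimensions):

```c
/* Illustration of the geometry argument: shrink the linear dimension
 * from 10 "units" to 7 and the same area holds (10/7)^2 ~= 2x the
 * transistors. The numbers are arbitrary, not real feature sizes. */
#include <stdio.h>

int main(void) {
    double old_node = 10.0;
    double new_node = 7.0;
    double area_scale = (old_node / new_node) * (old_node / new_node);
    printf("Linear shrink %.0f -> %.0f gives ~%.2fx transistors per area\n",
           old_node, new_node, area_scale);
    return 0;
}
```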
"Jesus Christ, why is Moore's Law so fucking hard for people to understand?"
because, in this era of truthiness, simplistic is more fun than reality. Moore made his observation in 1965, at which time IC fabrication had not even reached LSI levels. IOW, the era when node size was dropping like a stone and frequency was rising like a Saturn rocket; performance increases with each new iteration of a device were obvious to even the most casual observer. just like prices in the housing market before the Great Recession, the simpleminded still think that both vectors will continue forevvvvaaahhh.
Better yet, why even bother talking about it? I read these architecture articles and find them interesting, but I'll spend my money based on real world performance.
The immediate value at these price points is the multithreading. Even ignoring the CPU cost, the motherboard costs of Zen 2 on AM4 can be substantially cheaper than the threadripper platform. Also, keep in mind what AMD did soon after the Zen 1000 series launch, and, I think, Zen 2 launch to a degree. They knocked down the prices pretty substantially. The initial pricing is for early adopters with less price sensitivity and who have been holding off upgrading as long as possible and are ready to spring for something. 3 months or so from launch these prices may be reduced officially, if not unofficially by 3rd parties.
To be fair, those price drops were also partially instigated by CPU launches from Intel - companies typically don't lower prices automatically, usually it is from competitive pressure or low sales.
I don't believe that's true at all, S. Pricing was already lower than the 8th-gen Intels, and the 9th, while adding cores, wasn't competing against the Ryzens any more than the older series.
That's true, but by most indications, if you want the "full" AM4 experience, you'll be paying more than you did previously because the 500-series motherboards will cost significantly more - I'm sure that TR boards will see an increase, too, but I think, proportionately, it might be smaller (because the cost increase for say, PCIe 4.0 is probably a fixed dollar amount, give or take).
Huh? There've been lots of Intel generations that did not generate those kinds of performance gains, and Intel has not introduced a newer product at a lower price point, since at least the Core i-series. So, I have no idea where you get this 10-15% perf per dollar figure.
So who does innovate, in your humble opinion? Looking at your posts, you seem to confuse/jumble quite a lot of things. Example, TSMC: so yes, they are giving AMD a better manufacturing process that allows them to offer more transistors per area, or lower power use at the same clock speed. But better perf/$? Not sure; that all depends on the price per good die, i.e. yields, pricing, etc. all play a role, and I assume you do not know any of this data.
Moores law - Alx already covered that...
As for the 16-core: what would the ideal price be for you? $199? What do the alternatives cost (CPU + HSF and total platform cost)?
If you want to look at price: yes, it did go up compared to the 2xxx series, but compared to the first Ryzen (2017), you get quite a lot more than you did with the original Ryzen.
1800X: 8C/16T, 3.6 GHz base / 4.0 GHz boost, for $499
3900X: 12C/24T, 3.8 GHz base / 4.6 GHz boost, for $499
Now the 2700x was only $329, but its counterpart the 3700x has the same price, roughly the same frequency but a lower power consumption and supposedly better performance in just the range you mention.
The Ryzen 1800x got dropped $150 in MSRP nine months after launch, I think AMD thought octo-core might be a niche market they needed strong margins on but realized they'd make more money making it a mainstream part. I bought one at launch price and knew it probably wouldn't be great value but it was like good enough, let's support AMD by splurging on the top model. Very happy it's now paying off, even though I'm not in the market for a replacement yet.
@jjj: Rriigght... Moore's law applies to transistors. You are getting more transistors per sq. mm, and that translates to more cores. ST performance is an arbitrary metric you came up with. It's like expecting car power output (HP/W) to increase linearly with every new model, and it does not work that way. Physics. So, they innovate on other things like fuel economy, better drive quality, handling, safety features... it's life. We aren't in the 1980s anymore where you got 2x ST perf from a process shrink. Frequency scaling is long dead.
The other thing you miss is that the economies of scale you talk about are dying. 7nm is *more* expensive per transistor than 28nm. Finfet, quad patterning, etc etc. So "TSMC gives them more perf per dollar" compared to what? 28nm? No way. 14nm? Nope.
Multi-threaded performance does have an effect on single threaded performance in a lot of use cases. If you can afford a 12 core cpu instead of an 8, you would end up with better performance in the following situation: You have one or two multithreaded workloads that will have the most throughput when maxing out 7 strong threads, you want to play a game or run one task that is single-threaded. That single-threaded task is now hindered by the OS and any random updates or processes running in the background. Point being, if you ever do something that maxes out (t - 1) cores, even if there's only one thread running on each, then suddenly your single threaded performance can suffer at the expense of a random OS or background process. So being able to afford more cores will improve single-thread performance in a multitasking environment, and yes multitasking like that is a reality today in the target market AMD is after. So get over it, because that's what a lot of people need. Nobody cares about you, it's the target market AMD and Intel care about.
"You have one or two multithreaded workloads that will have the most throughput when maxing out 7 strong threads, you want to play a game or run one task that is single-threaded."
The problem, as stated for years: 'there ain't all that many embarrassingly parallel user-space problems.' IOW, unless you've got an app *expressly* compiled for parallel execution, you get nothing from multi-core/multi-thread. What you do get is the ability to run multiple, independent programs. Since we only have one set of hands and one set of eyes, humans don't integrate very well with OS/MFT. That's from the late 360 machines.
I am doing office work, and, according to Task Manager, there are ~3500 threads running on my laptop. Obviously, most threads are dormant; however, as soon as I start downloading something, while listening to music and editing an image, while the email client checks the email and the antivirus scans "everything", I will certainly have more than "10" threads active and running. Having more cores is nearly always beneficial, even for office use. I often swap between a Core i7 5500 (2 cores, 4 threads) and a desktop with a Ryzen 5 1600 (6C, 12T). It is an apples to oranges comparison, granted, but the smoothness of the desktop does not compare. The laptop chokes easily when it is downloading updates, the antivirus is scanning, and I'm swapping between multiple applications (and it has 16GB of RAM; it was high end, 4 years back). 2C4T is just not enough in a productivity environment today. 4C8T is the baseline, but anyone looking to future-proof their purchase should aim for 6C12T or more. Intel's 9th gen is quite limited in that respect.
Personally I would ignore anything to do with pricing at this point. The MSRP can be anything they want, but the street prices on AMD processors have traditionally been much lower. At our local Microcenter, for example, the 2700X can be had for $250 while the 2700 is only $180. On the new processors, if history is any indicator, prices will fall rapidly after release.
I think the launch prices won't hold, if that's your main gripe.
I would like to be able to buy an 8-core that turbos as well as the 16-core, however. I hope they offer such a model, as their yields improve. I don't need 16 cores, but I do care about single-thread perf.
I have a financial drain right now, but once that gets resolved in (hopefully) a couple of months I think I am finally going to upgrade my desktop with Zen 2. Probably look at one of the 8-core variants. I am running an i5-3570 driven at 4Ghz right now.
So the performance improvement should be pretty darned substantial. And I DO a lot of multithread heavy applications like Handbrake, Photoshop, lightroom and a couple of others. The last time I upgraded was from a Core 2 Duo E6750 to my current Ivy Bridge. That was around a 4 year upgrade (I got a Conroe after they were deprecated, but before official EOL and manufacturing ceasing) IIRC. Now we are talking something like 6-7 years from when I got my Ivy Bridge to a Zen 2 if I finally jump on one.
E6750 to i5-3570 represented about a 4x increase in performance multithreaded in 4 years (or ballpark). i5-3570 to 3800x would likely represent about a 3x improvement in multithreaded in 6-7 years.
I wonder if I can swing a 3900x when the time comes. That would be probably somewhat over 4x performance improvement (and knowing how cheap I am, probably get a 3700x).
I agree. I mean, it's an incredible CPU line, but they're just throwing all of that extra die space and power budget at more cores and cache. Look at the cache-to-core ratio. When CPU makers just throw more cache or cores at a node shrink, it can be because that makes the most sense at the time (the market, workloads, or supporting tech like DRAM have to catch up to enable more per-core performance), but it can also mean they just didn't know what else to do with the space.
Maybe it's just not possible to go much beyond the widths modern CPUs are approaching. Even Intel is using up a lot of space for AVX-512, which is similar to just adding more cores (more FPU crunchers), so it's possible that neither company knows how to balance all those instructions in flight or make use of more execution resources. Maybe cores can't get much more performance.
But if so, that means a pretty boring future, especially if programmers can't find ways to use those cores in more common tasks.
A lot of progress has been made. Browsers are far more multithreaded than they once were - and as web pages become more complex, that benefit can scale. Similarly, databases and rendering can scale very well over certain operations.
Said scaling tends to work best for the longest operations, because they can be split up into chunks without too much overhead. The overall impact should be that there are fewer long, noticeable delays. There isn't so much progress for things that are already pretty fast - or long sequences of operations that rely on one another. (However, precomputing and prefetching can help.)
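As a rough illustration of that chunking idea, here is a minimal pthreads sketch that splits one long operation (summing a big array) across a handful of threads; the thread count and array size are made up for the example:

```c
/* Minimal sketch of chunking one long operation across threads.
 * Short operations aren't worth the create/join overhead; long ones
 * amortize it, which is the point above.
 * Build with: cc -O2 -pthread chunks.c */
#include <pthread.h>
#include <stdio.h>

#define N (1 << 22)        /* ~4M elements, made up for the example */
#define THREADS 4          /* hypothetical core count */

static double data[N];

struct chunk { size_t lo, hi; double sum; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->sum = 0.0;
    for (size_t i = c->lo; i < c->hi; i++)
        c->sum += data[i];
    return NULL;
}

int main(void) {
    for (size_t i = 0; i < N; i++)
        data[i] = 1.0;

    pthread_t tid[THREADS];
    struct chunk ch[THREADS];
    for (int t = 0; t < THREADS; t++) {
        ch[t].lo = (size_t)t * N / THREADS;        /* contiguous slice per thread */
        ch[t].hi = (size_t)(t + 1) * N / THREADS;
        pthread_create(&tid[t], NULL, sum_chunk, &ch[t]);
    }

    double total = 0.0;
    for (int t = 0; t < THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += ch[t].sum;
    }
    printf("total = %.0f\n", total);
    return 0;
}
```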
I find it surprising how they add these smallish increases onto execution width, out-of-order buffers, register files, etc. The IPC hasn't stopped increasing, it's just slow-ish. Maybe they're fighting power and latency in those parts of the core, so the 2x density from a node doesn't translate fully.
Prices should drop when the competition with Intel becomes fiercer. I don't expect that anytime soon though.. It doesn't look like Intel will manage to release Ice Lake CPUs (except apparently the -U and -Y ones they announced) this year or at all.
Their 10nm+ node is still having serious issues with clocks and thermals, and the yields are much lower than TSMC's 7nm (high performance) node. So "word on the street" is that they won't release Ice Lake CPUs for desktop at all. That is, they'll can them and instead release Tiger Lake desktop CPUs fabbed on their fixed (??) 10nm++ node variant late next year (as in Q4 2020).
You're wrong. You get more performance than Intel at a lower price. In the case of the 3950X, it's significant. To sell them cheaper would devalue an incredible product, for no reason.
Ryzen 7 2700X vs. Ryzen 7 3700X. Same price, better performance. Looking at the 3800X which is $399, look at the IPC+clock speed improvements. The 3900X will obviously come at a cost, because you are getting 50% more cores for that increased price. Single threaded though....at what point do you really focus on how fast or slow a single threaded program is running in this day and age where you run dozens of processes at the same time? If you are running dozens of single threaded programs, then performance will change based on how the OS scheduler assigns them to different CPU cores.
jjj " They give us around 20% ST gains (IPC+clocks) but at a cost. " that same thing could be said about intels cpus over the last few years... how much performance increase did they give us year over year ?? all while only giving is 4 cores for the mainstream... amd's prices are just fine.. intel is the one that should be dropping their prices, some as low as the $50 you say, but most, $500 or more
I bet now Intel is just going to completely flood ads with the title "Intel beats AMD in pure FPS tests!", because they'll get 210fps where AMD gets 200. And some people will eat it up.
I'm so excited for this upgrade. Replacing a 2700K with a 3800X, where I'll not only get a doubling of cores, but clock for clock I reckon it's a 40, 50% improvement there too.
Intel will always beat AMD ... ... ... (at a price point you don't give a f*ck about) (4 digits or more)
Are you a micro-trader hardwired into the BS Stock Exchange? You think $1000+ is too much for the fully enabled processor arch you want to overclock?
Sorry, Intel hasn't had the time of day for you since 2011, when Sandy Bridge restricted overclocking to the blessed "K" SKUs...
oh sure, there are others. IDT and Cyrix are dead but... let me introduce you to... AMD
A quick note. AVX2 is actually primarily Integer. AVX1 (or just AVX) is 256-bit floating point. The article often refers to "full AVX2 support", which isn't necessarily wrong, but Zen2 also adds full AVX support equally.
Thanks Ian! Two questions: what is the official memory bandwidth for the consumer chips? (Sounds like they remain dual channel.) And: any word on the relative performance of AMD's AVX2 implementation vs. Intel's AVX-512 with software that can use either?
AVX-512 is a really misleading name; the interesting... bits... aren't the 512-bit width, but the dramatically increased flexibility. All kinds of operations are now maskable and better reshufflable, and where specific sub-segments of the vector were used, they're now sometimes usable at 1-bit granularity (whereas previously the granularity was coarser).
Assuming x86 sticks around for high-perf computing long enough for compilers to be able to automatically leverage it and then for most software to use it, AVX-512 is likely to be quite the game changer - but given intel's super-slow rollout so far, and AFAIK no AMD support... that's going to take a while.
Which is all a long-winded way to say that you might well expect AMD's AVX2 implementation to be not all that much slower than Intel's 512 when executing code that's essentially AVX2-esque (because Intel drops the frequency, so it won't get the full factor-2 speedup), but AVX-512 has the potential to be *much* faster than that, because the win isn't actually in vector width.
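For anyone who wants to see the distinction in code, here is a hedged little intrinsics sketch: a plain AVX 256-bit float add, an AVX2 256-bit integer add, and an AVX-512 masked add showing the per-lane masking mentioned above. It assumes a CPU with AVX-512F and compiler flags along the lines of -mavx2 -mavx512f; the values are throwaway.

```c
/* Sketch only: a plain AVX 256-bit float add, an AVX2 256-bit integer
 * add, and an AVX-512 masked add. Needs a CPU with AVX-512F and
 * something like: cc -O2 -mavx2 -mavx512f avx.c */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* AVX: 256-bit packed single-precision add */
    __m256 a = _mm256_set1_ps(1.5f);
    __m256 b = _mm256_set1_ps(2.5f);
    __m256 fsum = _mm256_add_ps(a, b);

    /* AVX2: 256-bit packed 32-bit integer add */
    __m256i x = _mm256_set1_epi32(3);
    __m256i y = _mm256_set1_epi32(4);
    __m256i isum = _mm256_add_epi32(x, y);

    /* AVX-512: per-lane masking, the flexibility mentioned above.
     * Lanes whose mask bit is 1 get p+q; the rest keep src. */
    __m512 src = _mm512_set1_ps(0.0f);
    __m512 p   = _mm512_set1_ps(1.0f);
    __m512 q   = _mm512_set1_ps(2.0f);
    __mmask16 k = 0x00FF;            /* only the low 8 of 16 lanes enabled */
    __m512 masked = _mm512_mask_add_ps(src, k, p, q);

    float fo[8], mo[16];
    int io[8];
    _mm256_storeu_ps(fo, fsum);
    _mm256_storeu_si256((__m256i *)io, isum);
    _mm512_storeu_ps(mo, masked);
    printf("avx=%.1f avx2=%d masked lane0=%.1f lane15=%.1f\n",
           fo[0], io[0], mo[0], mo[15]);   /* 4.0, 7, 3.0, 0.0 */
    return 0;
}
```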
Intel's own product segmentation has caused it to lose its first-mover advantage here. System software aside, there's little point in most developers seeking to use instructions that most of their users will not have (and which they themselves may not have). By the time software does support it, AMD is likely to have it. And of course an increasing number of developers will be pouncing on Zen 2 thanks to fast, cheap cores that they can use to compile on...
Intel only had AVX-512 in Xeon and Xeon-derived chips, but with Ice Lake (I don't really count the Cannon Lake test run) AVX-512 will hit the mainstream starting within a month, and 2020 should see the full rollout.
As for whether AMD's AVX2 is true 256-bit: the last I heard, it was actually more like dual 128-bit, unless they changed it in Zen 2. I seriously doubt AMD's AVX2 implementation is going to be much different from Intel's AVX2, and AVX-512 is a totally different beast.
It's funny: years ago we heard the same thing about 64-bit x86 instructions, and now here we are with 512-bit AVX.
As far as AMD support for AVX-512 goes, that does not matter much, since Intel is rolling AVX-512 out across its full line over the next year or so.
But keep in mind that unlike normal x86 instructions, AVX is specialized for vector processing. I know that with video processing software like PowerDirector this was a deciding factor earlier on.
To be honest, I don't mind how it's implemented as long as the real-world performance is there at a reasonable price and power budget. It'll be interesting to see the difference in benchmarks.
Don't expect too much cognitive ability regarding AMD from HStewart; his pay from big blue depends on his misinformation disguised as misunderstanding.
I don't get paid for any of this. I'm just not extremely heavily AMD-biased like a lot of people here. It's just really interesting to me that when Intel releases information about the new Ice Lake processor, with 2 loads / 2 stores per cycle, within a couple of days here it's bla bla about Zen+++. Just because it's 7nm does not mean they changed much.
Maybe AMD did change it to 256-bit wide and not dual 128-bit; they should, since AVX2 has been that way for a long time, and Ice Lake is now 512. Maybe by the time of Zen 4 or Zen+++++ it will have AVX-512 support.
No... but it is known... you are heavily Intel-biased...
What's Zen+++++++++? x86-512? But you are usually the one spreading misinformation about AMD... "and support for single-operation AVX-256 (or AVX2). AMD has stated that there is no frequency penalty for AVX2." "AMD has increased the execution unit width from 128-bit to 256-bit, allowing for single-cycle AVX2 calculations, rather than cracking the calculation into two instructions and two cycles. This is enhanced by giving 256-bit loads and stores, so the FMA units can be continuously fed."
Zen+++++ was my joke, just as every AMD fan jokes about Intel's 10+++. Just get over it.
x86-512 is likely not going to happen; that's just to make sure people are not confusing vector-processing bits with CPU bits. 64-bit is what most OSes have used for the last decade or so.
Intel has been doing 256-bit AVX2 since day one; the earlier AMD chips only had two combined 128-bit units. Did they fix this with Zen 2? This is of course different from AVX-512, which is standard in all Ice Lake and later CPUs and in older Xeons.
Sorry HStewart... but even some Intel fans are making fun of the 14++++++, and it would be funny... if you were making fun of the process node, not the architecture... "x86-512 is likely not going to happen; that's just to make sure people are not confusing vector-processing bits with CPU bits. 64-bit is what most OSes have used for the last decade or so." That makes NO sense...
One more thing: I stay away from AMD threads unless there is someone biased against Intel, like spreading the misinformation that AVX-512 is misleading and is not really 512-bit; surely they do not have proof of that.
AVX-512 is not the same as x86-512. I seriously doubt we will ever need that, but then at the time people didn't think we needed x86-64 either. I remember the original days of the 8088; nobody thought we needed more than 640K. AVX-512 is for vectors, which is totally different.
I always have a higher-end Intel setup and normally an AMD setup as well, plus I build a fair amount of setups on both. No bias here, except maybe... wanting AMD to be competitive. The news that dropped over the past month was the biggest for AMD in over a decade, HS. If you can't even acknowledge that (even grudgingly) then geez... I dunno.
This has been awesome news for the industry and will put intel on their toes to do better. Be happy about it.
HStewart, please. You don't stay away from AMD at all. You take ANY opportunity to try and make Intel look better than AMD.
There was an article, it was Windows on ARM. You somehow managed to make a post about Intel winning over AMD. Don't spew that BS. People don't hate Intel as much as you make them out to be, they don't like you glorifying Intel.
Xyler94, I don't hate Intel, but I am sick of what they have done so far to the CPU industry. Sticking the mainstream with quad cores for how many years? I would have loved to get a 6- or 8-core Intel chip, but the cost of the platform made it out of my reach. The little performance gains year over year... come on, that's the best Intel can do with all the money they have? And the constant lies about 10nm... Then Zen was released, and what was it, less than 2 months later Intel all of a sudden has more than 4 cores for the mainstream, and even more cores for HEDT? My next upgrade at this point looks to be Zen 2, but I am waiting till the 7th to read the reviews. HStewart does glorify Intel any chance he gets, and it just looks so stupid, because when someone calls him out on it he seems to pretty much vanish from that convo.
Socket AM4 is limited to a dual-channel memory controller, because you need more pins to add more memory channels. The same applies to the number of PCI Express lanes as well. The only way around this would be to use one of the abilities of Gen-Z where the CPU would just talk to the Gen-Z bus, at which point, dedicated pins for memory and PCI Express could be replaced by a very wide and fast connection to the system bus/fabric. Since that would require a new motherboard and for the CPU to be designed around it, why bother with socket AM4 at that point?
Why bother? Um, upgradability? Maybe not quite needed? The things you suggest sound like they would be a little expensive to implement. If you need more memory bandwidth and PCIe lanes, grab a TR board and a lower-end CPU.
Why does the 3600X have power consumption of 95W, and the 3700X, with two more cores, four more threads, and the same frequency max, consume only 65W? I'm guessing those two got switched around?
200 MHz of extra base clock increases power consumption by 46%? I would have thought max power consumption would be all cores operating at maximum frequency, so the base would have nothing to do with it.
I think technically it would be drawing more than its TDP. The heat generated by electronics is waste due to the inefficiency of semiconductors. If you had a perfect conductor with zero resistance, in a perfect world it shouldn't make any heat. However, the TDP cannot exceed power draw, as that's where the heat comes from. How much TDP differs from power draw would depend on a lot of things, such as what material the semiconductor is made of (silicon, germanium, etc.). And I'm sure design also factors in a great deal.
If you read Gamers Nexus, they occasionally measure real power draw on systems, https://www.gamersnexus.net/hwreviews/3066-intel-i... And you can see that draw massively exceeds TDP in some cases, especially at the high end. This makes sense, if semiconductors were only 10% efficient then they wouldn't perform nearly as well as they do.
"I think technically it would be drawing a more than its TDP"
Yeah, but if a chip is drawing more power than its TDP it is also producing more heat than its TDP. Making the TDP basically a lie.
"The heat generated by electronics is waste due to the inefficiency of semi conductors. If you had a perfect conductor with zero resistance in a perfect world then it shouldn't make any heat"
Essentially yes, there is a lower limit on power consumption but its many orders of magnitude below where we are today.
"How much TDP differs from power draw would depend on a lot of things such as what material the semiconductor is made or, silicon, germanium etc. And I'm sure design also factors in a great deal."
No. TDP = the "intended" thermal output of the device. The themal output is directly equal to the power input. There's nothing that will ever change that. If your chip is drawing 200W, its outputting 200W of heat, end of story.
Intel defines TDP at base clocks, but nobody expects a CPU to sit at base clocks even in extended workloads. So when you have a 9900k for example its TDP is 95W, but only when its at 3.6GHz. If you get up to its all core boost of 4.7 its suddenly draining 200W sustained assuming you have enough cooling.
Speaking of cooling. If you buy a 9900k with a 95W TDP you'd be forgiven for thinking that a hyper 212 with a max capacity of 180W would be more than capable of handling this chip. NOPE. Say goodbye to that 4.7GHz all core boost.
"If you read Gamers Nexus, they occasionally measure real power draw on systems, https://www.gamersnexus.net/hwreviews/3066-intel-i... And you can see that draw massively exceeds TDP in some cases, especially at the high end. This makes sense, if semiconductors were only 10% efficient then they wouldn't perform nearly as well as they do."
None of that makes any difference. TDP is supposed to represent the cooling capacity needed for the chip. If a "95W" chip can't be sufficiently cooled by a 150W cooler there's a problem.
Both Intel and AMD need to start quoting TDPs that match the boost frequencies they use to market the chips.
... AMD DOES include boost in their TDP calculations (unlike Intel), and always have. They make their methodology for this calculation freely available & explicit.
=> You are only hitting 'TDP' figures at close to full loading, so "frequency max" is not limited by TDP but by the silicon.
=> Slightly lowering frequency *and voltage* really adds up the power savings over many cores. The load table of the 3700X will look, on the whole, different than for the 3600X. The 3700X will probably lose out in some medium-threaded scenarios (not lightly and not heavily threaded).
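To put a rough number on that second point, here is a back-of-envelope sketch using the classic dynamic-power relation P ≈ C·V²·f; the frequency and voltage figures are purely hypothetical, not measured values for any of these chips:

```c
/* Back-of-envelope: dynamic power roughly follows P ~ C * V^2 * f,
 * so a ~10% frequency drop that also allows a ~10% voltage drop cuts
 * dynamic power by roughly a quarter. All figures are hypothetical. */
#include <stdio.h>

int main(void) {
    double f1 = 4.4, v1 = 1.30;   /* hypothetical boost point (GHz, volts) */
    double f2 = 4.0, v2 = 1.17;   /* hypothetical lower-power point */
    double rel = (f2 / f1) * (v2 / v1) * (v2 / v1);
    printf("Relative dynamic power: %.2f (about %.0f%% saved)\n",
           rel, (1.0 - rel) * 100.0);
    return 0;
}
```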
That's not actually the real power consumption. Most likely you will get a 3700X with 70-75 W (according to the software app indications) but a bit more if tested with a multimeter. Add to that the inefficiency of the PSU, say 85-90%, and you have about 85 W of real power consumption. Somewhat better than my current 110W i7-860 or the 150+W Intel 9000 series ones I would say :)
In desktops they are simply starting points for the cooling solution needed. They do a lot better in the laptop/tablet space where TDP's make or break designs.
Yes they do. A 2700X pulls almost exactly 105W under the kind of conditions you describe. Just because Intel's values are completely nonsense doesn't mean they all are.
The TDP figures are always a bit vague, because it is about the heat generation, not about power draw. A higher TDP on a chip with the same number of cores on the same design could indicate that it will overclock higher. Intel always sets the TDP to the base clock speed, while AMD has been more about what can be expected in normal usage. The higher the clock speed, the more power will be required, and the higher the amount of heat will be that needs to be handled by the cooler.
So, if a chip has a TDP of 105W, then in theory, you should be able to get away with a cooler that can handle 105W of heat output, but if that TDP is based only on the base clock speed, you will want a better cooler to allow for turbo/boost for sustained periods.
We want faster memory for Zen/Zen+ because we want higher IF clock, so cutting the IF clock by half to enable higher memory freq. does not make sense. However the improved IF could move the bottleneck somewhere else.
It seems like IF2 cannot hit frequencies higher than about 3733MHz DDR (so ~1.8GHz real frequency) for some reason, so they added the ability to scale it down to allow higher memory clocks. But it is probably only worth it if you can overclock the memory a lot higher than 3733, so that the IF clock gets a bit higher again.
"One of the features of IF2 is that the clock has been decoupled from the main DRAM clock. In Zen and Zen+, the IF frequency was coupled to the DRAM frequency, which led to some interesting scenarios where the memory could go a lot faster but the limitations in the IF meant that they were both limited by the lock-step nature of the clock. For Zen 2, AMD has introduced ratios to the IF2, enabling a 1:1 normal ratio or a 2:1 ratio that reduces the IF2 clock in half."
It seems it has been decoupled, but it may still benefit from faster RAM.
It is still completely connected; you can just pick a 1:1 or 2:1 divider now, but they are absolutely still tightly coupled. You can't set them independently.
You're missing the point for >3733MHz memory overclocked where the IF switches to a 2:1 divider. It's for workloads that highly prioritize memory bandwidth over latency, NOT to try and run your sticks 24/7 at like 5GHz+ for the absolute lowest latency possible (bc even then, 3733MHz will prolly still be lower).
From what I remember, up to DDR4-3733, Infinity Fabric on Ryzen 3rd generation is now at a 1:1(where previously, Infinity Fabric would run at half the DDR4 speed. You can go above that, but then the improvements are not going to be as significant. For latency, your best bet is to get 3733 or 3600 with as low a CAS rating as you can get.
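A tiny sketch of the ratio behaviour described above, using the 3733 crossover from the article; real boards also let you set FCLK manually, so treat this as illustration only:

```c
/* Sketch of the described behaviour: up to DDR4-3733 the fabric clock
 * (FCLK) runs 1:1 with the memory clock (half the DDR transfer rate);
 * above that it drops to a 2:1 divider. Simplified; real boards also
 * expose manual FCLK control. */
#include <stdio.h>

static double fclk_mhz(double ddr_rate) {
    double mclk = ddr_rate / 2.0;              /* DDR: two transfers per clock */
    return (ddr_rate <= 3733.0) ? mclk : mclk / 2.0;
}

int main(void) {
    double rates[] = { 3200, 3600, 3733, 4000, 4400 };
    for (int i = 0; i < 5; i++)
        printf("DDR4-%.0f -> MCLK %.0f MHz, FCLK %.0f MHz\n",
               rates[i], rates[i] / 2.0, fclk_mhz(rates[i]));
    return 0;
}
```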
that 105W TDP is a sign that the 8 core is efficient at 50W or a base clock of 3.5 GHz. The AMD 7nm 8-Core Zen 2 chip has a TDP equal or less than my i3-8100.😅
All good and fine, but I want Zen 2 and 7nm on my laptop. If they aren't announcing it today, products aren't gonna ship by holiday 2019, and most consumers will end up buying 10nm Intel devices. Missed chance.
Nope, those 12nm APUs have worse battery life (than current 8th Gen) and no TB3/USB4 support. I can't think of a reason where I would choose Ryzen 3xxxU over Ice Lake
Misleading remarks. Huawei was able to make a Ryzen APU have better battery life than an 8th gen processor. TB3 and USB4 aren't readily used mainstream yet. Heck USB-C hasn't even caught on yet.
Currently laptop makers aren't optimizing for AMD's CPUs; that's just the fact.
This is mostly nonsense. Performance AND battery life for Ryzen Mobile 2nd Gen is extremely close to Intel's current 8th & 9th gen 4-core parts. And until Ice Lake is a real thing that you can actually buy, Ryzen still has a major value advantage + far better iGPU performance. Ice Lake also isn't really any faster CPU wise than Whiskey Lake, because despite increasing IPC by +18%, clock-speeds were dropped from 4.8 to 4.1GHz, or about -16%, erasing nearly all those gains.
Yeah, I get that they still need time to get the GPU down to 7nm, so they pushed it back to focus on the CPU for desktop (where performance per watt matters much less than server or mobile). But the silence is not reassuring, and mobile-wise, Zen is still inferior to Intel, maybe not performance-wise as Huawei demonstrates with its Matebook, but definitely battery-wise because of the more powerful GPU.
Please edit the table on page 1 to combine the rows with identical values into a single row (e.g. the RAM speed). Also edit the 3950X price to have a ? after it as it's not yet confirmed.
The original version of this article noted the 3950X price wasn't confirmed at the time of publication, but it seems they edited that bit out after Su's presentation.
Still need the table to be updated - PCIe and DDR4 columns at least.
Eventually these Multi-chip packages should incorporate system DRAM (via HBM) as well as SSD NVRAM and GPUs, and sold as full packages that you'd typically see in common configurations. 64GB memory + 1TB SSD + 16 CPU cores + whatever GPU.
GPUs are often upgraded more often than CPUs. And GPUs dissipate up to about 300 W, while desktop CPUs often around 100 W (except for Intel's Coffee Lake).
So, it wouldn't really seem like CPUs and GPUs belong together, either from an upgrade or a cooling perspective. Consoles can make it work by virtue of being custom form factor and obviously you don't upgrade a console's GPU or CPU - you just buy a new console.
Therefore, I don't see this grand unification happening for performance-oriented desktops. That said, APUs will probably continue to get more powerful and perhaps occupy ever more of the laptop market.
According to AMD, 3200MHz is officially supported, but they (AMD) have had memory clocked to over 5000MHz. Infinity Fabric will run 1:1 with up to 3733MHz RAM, but any higher and it splits to 2:1. AMD also said that they have found DDR4-3600 16-21-21 to be the best bang for buck on performance returns.
I really don't understand how one can compare these AMD CPUs with Intel's HEDT; they lack PCIe lanes and don't support quad-channel memory. And that's a huge deal-breaker for anyone who wants and needs some serious I/O and multitasking.
What is the advantage in halving the L1 instruction cache? Was the change forced by the doubling of its associativity? According to the (I suspect somewhat oversimplified) Wikipedia article on CPU Cache, doubling the associativity increases the probability of a hit by about the same amount as doubling the cache size, but with more complexity. So how is this Zen2 configuration better than that in Zen and Zen+?
Ah! It's sort of explained at the bottom of page 7. I had glossed over that because the first two paragraphs were too technical for my understanding. I see that it was halved to make room for something else to be made bigger, which on balance seems to be a successful trade off.
More importantly, 32K 8-way is a sweet spot for an L1 cache. This is what AMD is using for the D$ already and what all modern Intel L1 caches (both I and D) are. With eight ways, this is the largest size you can have for a non-aliasing virtually indexed cache using the 4KB page size of the x86 architecture. Having more than eight ways has diminishing returns, so going beyond 32KB requires extra complexity for dealing with aliasing or physically indexed caches like the L2.
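The arithmetic behind that "32K 8-way sweet spot" is just page size times associativity; a quick sketch (64-byte cache lines assumed):

```c
/* For a virtually indexed, physically tagged cache to avoid aliasing,
 * the set-index bits must fit inside the 4 KB page offset, so the
 * largest non-aliasing size is page_size * associativity. 8-way at a
 * 4 KB page gives exactly the 32 KB L1 mentioned above. */
#include <stdio.h>

int main(void) {
    const int page_size = 4096;    /* x86 base page size */
    const int line_size = 64;      /* typical cache line */
    int ways[] = { 4, 8, 16 };
    for (int i = 0; i < 3; i++) {
        int max_bytes = page_size * ways[i];
        int sets = max_bytes / (line_size * ways[i]);   /* = page_size / line_size */
        printf("%2d-way: max %2d KB non-aliasing, %d sets\n",
               ways[i], max_bytes / 1024, sets);
    }
    return 0;
}
```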
It appears they traded half the L1 instruction cache to double the uop cache. They doubled the associativity to keep the same hit rate but it will hold fewer instructions. However, the micro-op cache holds already decoded instructions and if there is a hit there it saves a few stages in the pipeline for decoding, which saves power and increases performance.
So in other words, AnandTech no longer receives engineering samples but tells us what everyone else is saying. Still love coming here as the reviews are good, but boy oh boy, you guys sure slipped down the ladder.
"Raw Memory Latency" graph shows 69ns for for 3200 and 3600 Mhz RAM. This "69ns" is irrelevant, right? Isn't the "high latency" associated with Ryzen and IF due to "Cross-CCX-Memory-Latency? This is suppose to be ~110ns at 3200 Mhz RAM as tested by PCPER/etc.... This in my experiences causes "micro-stuttering" in games like BO3/BF4/etc.... And, a "Ryzen-micro-stutter/pause" is different than a micro-stutter/pause associated with Intel. With Intel the micro-stutter/pause happens in BFV, for example, but they happen once or twice per match. With Ryzen, not only is the quality/feeling of the "micro-stutter/pause" different (seems worst), but it is constant throughout the match. One gets a feeling that it is not server-side, GPU side, nor WIndows 10 side. But, CPU-side issue... Infinity Fabric side. So, now Inifinity Fabric 2 is out. Is it 2.0 as in better? No more high latency? Is that 69ns Cross-CCX-memory latency? Why is AMD and Tech sites like Anand so... like... not talking about this?
You are misattributing things here. Your stutter is most def. not caused by memory access latency variations. For it to be visible on even a 144Hz monitor with the game running at the native rate, the differences would have to be obscenely high. That's just unrealistic.
Not that it helps to determine what is causing your issues, but that's not it.
Maybe you guys don't know what cross-CCX memory latency is... My main goal in commenting was to ask what that SLIDE showing "Raw Memory Latency" refers to. Is it inter-core memory or intra-core memory (intra-core being the same as cross-CCX memory)?
Inter-core memory is data being shuffled within the cores in a CCX module. Ryzen and Ryzen+ had two CCX modules with 4 cores each, totaling 8 cores for the 2700X, as an example. If the memory/data is traveling within the same CCX, the latency is fine and is even faster than Intel. This was true with Ryzen and Ryzen+.
The issue is when data and memory are being shuffled between the CCX modules, traversing the so-called "Infinity Fabric." Intel uses a ring bus and doesn't have an equivalent measurement or data. Intel does have Mesh with X299, which is similar-ish to AMD's CCX and IF. But Intel's Mesh latency is lower (I think, but I haven't dug around, since I don't care about it since I can't afford it)...
So... that is what Cross-CCX-memory-latency is... and that SLIDE shown on this article... WTF does that refer to? 69ns is similar to Intel Ring Bus memory latency, which have shown to be fast enough and is the standard in regards to latency that won't cause issues...
So... as PCPer tested, Ryzen Infinity Fabric 1.0 has a cross-CCX latency of around 110ns... and I stand my ground (it's not BIOS / reinstall-Windows / Windows scheduler / user error / imperceptible / a misunderstanding / a misattribution (I think)) that it was the reason why I suffered micro-pauses/stutters in some games. I had two systems at the time (3700K and R7 1700X), and so I was able to diagnose/observe/interpret what was happening...
Also.. I would like to add that the "Ryzen Micro-stutter-Pause" FEELS/LOOKS/BEHAVES different... weird, right?
You might "stand your ground" but that doesn't make it true. First of all, it's pretty clear you don't understand what you're talking about. Intel's Mesh is NOTHING like AMD's CCX. Intel Mesh is an alternative interconnect to ring bus; mesh scales better to many cores relative to ring. In theory mesh should be faster but for whatever reason intel's memory latency on skylake X parts are quite bad relative to skylake client (i.e. no bueno for gaming). I recall 70ns-ish for Skylake X vs 60ns for the Skylake client.
Cross CCX memory latency should not matter unless you have shared memory across threads that span CCXs. Games don't need that many threads: 8 is overkill in many cases and each CCX can comfortably handle 8. Unless you pinned threads to cores and ran an experiment that conclusively showed that the issue was inter-ccx latency (I doubt it), your standing ground doesn't mean much. One could just as well argue that the microstutter was due to driver issues or other software/bios issues. Zen has been around for quite some time and if this was a widespread problem, we'd know.
Well, I did mention "similar-ish" of Mesh to Infinity Fabric. It's meshy. And, i guess, you get "comraderie" points for calling me out as "pretty clear you don't understand what you're talking about." That hurts, man! :(
"In theory... Mesh should be faster..." nice way to switch subjects, bruh. yeh, i can throw some at ya, bruh! what?
Cross-CCX-High-Memory-Latency DOES MATTER!
You know why? Because a game shuffles data randomly. It doesn't know that traversing said Data from Core 0 (residing in CCX 1) to Core 3 (in CCX 2) via Infinity Fabric means that there is a latency penalty.
Actually, no, you're wrong about the mesh. Intel has a logically unified L3 cache; i.e. any core can access any slice of the L3, or even one core can use the entire L3 for itself. AMD has a logically distributed L3 cache which means only the cores from the CCX can access its cache. You simply cannot have core 3 (CCX 0) fetch a line into CCX1's cache. The tradeoff is that the distributed L3 is much faster than the logically unified one but the logically unified one obviously offers better hit rates and does not suffer from sharing issues.
"Cross-CCX-High-Memory-Latency DOES MATTER!" Yes it does, no question about that. It matters when you have lock contention or shared memory that spans CCXs. In order to span CCXs, you should be using more than 8 threads (4 cores to a CCX, 2 threads per core). I don't think games are _that_ multithreaded. This article mentions a Windows 10 patch to ensure that threads get assigned to the same CCX before going to the adjacent one. It can be a problem for compute-intensive applications (y'know, real work), but games? I doubt it, and you should be able to fix it easily by pinning threads to cores in the same CCX.
"shared memory that spans CCXs." -> shared DIRTY memory. i.e. core 8 writes data, core 0 wants to read. All other kinds of sharing are a non-issue. Each CCX gets a local copy of the data.
I. Am. Talking. About. Infinity. Fabric. High. Memory. Latency!
Now that I got that off my chest, let's proceed shall we...
OMFG!
L3 Cache? WTF!
Do you think you're so clever to talk about L3 cache to show off your knowledge as if to convince ppl here you know something? Nah, man!
WTF are you talking about L3 cache, dude? Come on, dude, get with the program.
The program is "Cross-CCX-High-Memory-Latency" with Infinity Fabric 1.0
And games (BO3, BF1, BF4 from my testing) are what is affected by this high latency penalty in real time. Imagine playing a game of BO3 while, throughout the game, the game is "micro-pausing" and "micro-slow-motioning" repeatedly throughout the match. Yep, you got it, it makes it unplayable.
In productive work like video editing, I would not see the high latency as an issue unless it affects "timeline editing" causing it to lag, as well.
I have heard some complain issues with it in audio editing with audio work. But I don't do that so I can't say.
As for "compute-intensive applications (y'know, real work)" --delatFx2
....
.....
......
You duh man, bruh! a real compute-intensive, man!
"This article mentions a Windows 10 patch to ensure that threads get assigned to the same CCX before going to the adjacent one." --deltaFx2
Uhhh... that won't fix it. Only AMD can fix it in Infinity Fabric 2.0 (Ryzen 2), if, indeed, AMD has fixed it. By making it faster! And/or, reducing that ~110ns latency to around 69ns.
Now, my question is (and you, deltaFx2, haven't mentioned it in your wise response to my comments) about that SLIDE of "Raw Memory Performance" showing 69ns latency at 3200 MHz RAM. Is that raw memory performance intra-CCX memory performance or inter-core memory performance? Bada-boom, bish!
Those kinds of micro-stutters are usually caused by the motherboard or, most likely, your Windows installation. Reinstall Windows, then maybe try a different motherboard.
I just wanna know (cough, cough Anand) what the Cross-CCX-Latency is for Ryzen 2 and Infinity Fabric 2.0.
If, it is still ~110ns like before.... well, guess what? 110 nano-effin-seconds is not fast enough. It's too HIGH a latency!
You can't update BIOS/motherboard, or re-install Windows, or get 6000 MHz RAM (the price for that, tho?) to fix it. (As shown in the graph, whatever "Raw Memory Latency" is stays at 69 ns from 3200 MHz to 3600 MHz RAM, and only at 3733 MHz RAM does it drop to 67ns.) This is the same result PCPer got with Ryzen IF 1.0, showing that getting faster RAM at 3200 MHz did not improve the cross-CCX memory latency...
How do you know for sure that the microstutter, or whatever it is you think you are facing, is due to the inter-CCX latency? Did you actually pin threads to CCXs to confirm this theory? Do you know when inter-CCX latency even comes into play? Inter-CCX latency ONLY matters for shared memory being modified by different threads; this should be a tiny fraction of your execution time, otherwise you are not much better off going multithreaded. Moreover, each CCX runs 8 threads, so are you saying your game uses more than 8? That would be an interesting game indeed, given that Intel's mainstream gaming CPUs don't have a problem at 4C/8T.
To me, you've just jumped the gun and gone from "I have some microstutter issues" to "I know PCPer ran some microbenchmark to find out the latency" to "that must be the problem." It does not follow.
Another thing that was weird was GPU usage drop from 98% to like 0% in-game, midst-action, while I was playing... constantly, in a repeated pattern throughout the game... this is not a server or games hitching. we understand as gamers that a game will "hitch" once in a while. this is like "slow-motion" "micro-pause" thing happening through out the game. happens in single player (BF1) so I ruled out server-side. It's like the game goes in "slow-motion" for a second... not once or twice in a match, per se. But, throughout and in a repeated constant fashion... along with seeing GPU usage to accompany the effect dropping from 98% or so (normal) to 0% for split seconds (again, not once or twice in a match; but a constant, repeated pattern throughout the match)
And, there are people having head-scratching issues similar to me with Ryzen CPU.
No one (cough, cough Anand; nor youtube tech tubers will address it) seems to address it tho.
But I think that Ryzen 2 is coming out, and if the cross-CCX high-latency issue is the same, then we're bound to hear more. I'm sure.
I am thinking tech sites are giving AMD a chance... but not sure... doesn't matter tho. I got a 7700k (I wanted the 8-core thing when 1700x Ryzen came out) but its fine. Im not a fanboy. Just a techboy.... if anything...
The "micro-stutter" or "micro-pausing" is not once or twice (I get those with Intel, as well) but, a repeated, constant pattern throughout the match and round of game. The "micro-stutter" and "micro-pause" also "FEELS" different than what I felt with my prior 3700K CPU and current 7700K CPU. It's like a "micro-slow-motion." I am not making this up. I am not crazy!
I'm 95% convinced that your micro-stuttering is caused by the GPU/drivers. Disable SLI or Crossfire if that's what you have (you never said what video card you use). And please stop trolling.
Really? After all that I said about this... you think that you're 95% sure it's caused by GPU drivers and you want me to disable SLI or Crossfire? Really?
It very well could be. A little while ago there was a whole issue with micro-stuttering, and the fix was in new drivers after a certain revision...
This is gonna be my last comment regarding the Infinity Fabric high memory latency issue. An objective response would be "It could," or "it's quite possible," or "110 nanoseconds of cross-CCX memory latency is nothing to sneeze at, disregard, or treat as a non-issue."
Instead, I get the replies above, which don't need to be repeated since one can just read them; but, just in case, the replies basically say I am trolling (such as the most recent from user Gastec), or that I jumped to my conclusion by pointing my scrawny little finger at Infinity Fabric high memory latency, or plainly that I didn't know what I was talking about, etc.!
So, I just wanna say that as my one last piece. It's odd no one has erred on the side of objectivity and just plainly responded with "It's possible..."
Instead, we get the usual techligious/fanboyish responses.
It doesn't help that you also haven't cited any links or other proof of this other than your own posts... and I quote: "And, there are people having head-scratching issues similar to me with Ryzen CPU." Oh, and where are these other people? Where are the links and URLs that show this? Lastly: IF you have a spare HDD (SSD or mechanical) that isn't in use that you could install Windows onto, so you won't have to touch the current one you are using, try installing Windows onto that, update Windows as much as you can via Windows Update, update all drivers, and do the same things you are doing to get this issue... and see if you still get it. If you do, then it isn't your current install of Windows, and it is something else.
Qasar, Gastec et al, I appreciate that you're trying to educate wurizen but when you get responses like "bruh!" and "Really?", I think it's time to call it quits. Like HStewart, feeding wurizen will just encourage him and that makes it difficult to go through the comments and see the important ones. Trust that the majority of Anandtech's readership is indeed savvy enough to know pseudo-technical BS when we encounter it!
Well, the fact that he didn't cite anyone else with this problem, or links to forums/web pages, kind of showed he was just trolling... but I figured it was worth a shot to give him some sort of help.
You seem to just be trying to spread FUD. Also, you don't seem to know how long a nanosecond is. The CCX-to-CCX latency can cause slower performance for some badly written or badly optimized multithreaded code, but it is on such a fine scale that it would just affect the average frame rate. It isn't going to cause stuttering as you describe.
The stuttering you describe could be caused by a huge number of things. It could be the gpu or cpu thermally throttling due to inadequate cooling. If the gpu utilization goes down low, that could be due to the game using more memory than the gpu has available. That will slow to a crawl while assets are loaded across the pci express bus. So, if anyone is actually having this problem, check your temperatures, check your memory usage (both cpu and gpu), then maybe look for driver / OS issues.
Knock-out blow though? I don't think so for the consumer and gaming space, as I can buy a 9900 today for a fairly small premium over the price of a 3800x and get basically the same performance.
The 12 and 16 core chips look more difficult for Intel to respond to though, given how expensive its HEDT line is (and I say that as an owner of a 7860x).
I guess I'm neither consumer nor gamer with my i7-860 and GTX 670, G502, G110 and G13. I bought the Logitech G13 just to type better comments on Tweeter :P
I'd say it's a substantial blow to Intel. One of the reasons I picked up a 2700x was the cooler, which is pretty damn good overall.. and the buy in was substantially lower. The 3700x-3800x will only add to that incentive with increased performance (most will likely not even notice..)
Dropping in the 12-16 core processors (provided there are no tradeoffs for those additional cores) makes the 9900K unappealing on all fronts. The 9700K was already a totally unappealing product with its 8c/8t package, and after this launch it won't make sense at all.
Core i9-9900 I presume. Nowhere to be found for sale in Mordor. Only found one on Amazon.com for $439.99 reduced from $524.95, sold by "Intel" whomever that scammer is.
Reading over the Zen2 microarchitecture article I'm left wondering if the Windows scheduler improvements are making use of a new unmentioned RDPID feature in Zen2 to determine where threads are placed?
I too am curious about the latencies, particularly between the chiplets. With the clock selection down to 2 ns and Windows 10's hopefully improved thread allocation (filling a CCX, then the next one, before jumping to the 2nd chiplet), latencies should be lower. We'll just have to wait for honest extensive testing and reviews to be done. You were not planning on buying these CPUs on release day or, even worse, pre-ordering them, were you? :)
I expect the CCX to CCX latencies to be very good. There is no memory clock on the cpu chiplet, so the two on die CCX almost certainly communicate at cpu clock rather than memory clock as in Zen 1. It isn’t the same as Intel’s mesh network, but AMD’s solution will have better L3 latency within the CCX compared to Intel. Intel’s mesh network seems to be terrible for power consumption. Intel’s ring bus didn’t scale to enough cores. For their 18 core chip (if I am remembering right), they actually had 3 separate ring buses. The mesh network is obviously not workable across multiple chiplets, so it will be interesting to see what Intel does.
For the chiplet to chiplet latency, they have more than doubled the infinity fabric serdes clock with the higher than PCIe 4.0 speeds. It seems that the internal IF clock is also around doubled. It was operating at actual memory clock in Zen 1, which was half the DDR rate. They seem to be running the internal IF clock the same as the DDR rate with the option to drop back to half the DDR rate. So if you are running DDR 3200, the IF clock may actually be 3200 instead of 1600 as it would be in Zen 1. If you are overclocking to DDR 4000 or something, then it may need to drop down to 2000 for the internal IF clock. If this is the way it is set up, then they may have an option to explicitly set the divider, but it is probably going to not be stable past 3.7 GHz or so. The IO die is 14 nm Global Foundries, so that seems like a reasonable limitation.
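To make the divider arithmetic concrete, here's a rough Python sketch under my own assumptions: the 1:1 ratio tracks the memory clock (half the DDR4 transfer rate), and the automatic fallback to 2:1 kicks in somewhere past DDR4-3733. The exact crossover point and mapping are assumptions, not AMD's published spec.

    # Rough sketch, not AMD's spec: assume the IF clock tracks the memory clock
    # (half the DDR4 transfer rate) at 1:1 and falls back to 2:1 past ~DDR4-3733.
    def if_clock_mhz(ddr_rate):
        mem_clock = ddr_rate / 2                  # DDR4-3200 -> 1600 MHz memory clock
        divider = 1 if ddr_rate <= 3733 else 2    # assumed auto-switch point
        return mem_clock / divider, divider

    for rate in (3200, 3600, 3733, 4000, 4400):
        fclk, div = if_clock_mhz(rate)
        print(f"DDR4-{rate}: IF ~{fclk:.0f} MHz ({div}:1)")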
The CCX to CCX latency should be less important as the OS and software are better optimized for the architecture. There were quite a few cases on Zen 1 of applications performing significantly better on Linux compared to Windows due to the scheduler. Most applications can be optimized a bit for this architecture also. The problem is fine grained shared memory between threads on different CCXs. It is generally a good idea to reduce that anyway, since locking can be detrimental to performance. With Zen 2, I think application level optimizations are probably going to be a lot less necessary anyway, but a lot of the early issues were probably caused by bad multi-threaded programming. This type of architecture isn't going away. Intel can't compete with Epyc 2 with a monolithic die. Epyc 2 will be around 1000 square mm of silicon total. Intel can't scale core count without moving to something similar.
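One practical way to dodge the cross-CCX sharing problem on Linux is simply to pin a job to one CCX. A minimal sketch, assuming logical CPUs 0-3 happen to be one 4-core CCX on the machine (the numbering is not guaranteed; check lscpu first):

    # Minimal sketch: keep this process (and its threads) on one assumed 4-core
    # CCX so shared data never crosses the CCX boundary. The CPU ids are an
    # assumption; verify the real topology with lscpu or /proc/cpuinfo.
    import os

    ccx0 = {0, 1, 2, 3}             # assumed logical CPUs of the first CCX
    os.sched_setaffinity(0, ccx0)   # 0 = the calling process (Linux only)
    print(os.sched_getaffinity(0))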
IF reviewers have samples at this time they are under an NDA until July 7th. Only unconfirmed leaks can provide that kind of info and its super early. A lot of these types of issues won't be known until they go retail.
I would like these AMD chips to be used on laptops. It would be a breakthrough in terms of computing power and lower consumption. I think if HBM2 or higher memory is integrated into the processor, it will double the computing power. It would also be worth studying an implementation of 2 ports, technically superior to the old ExpressCard 54, into which we could insert 2 video cards in laptops.
Everyone keeps bringing up HBM for cpus as if it is magical in some manner. HBM can provide high bandwidth, but it is still DRAM. He latency isn’t that great, so it isn’t really that useful as a cpu cache. If you are trying to run AVX512 code across a bunch of CPU cores, then maybe you could use the bandwidth. If you have code that can use that level of parallelism, then it will almost certainly run much more efficiently on an actual gpu. I didn’t think that expanding AVX to 512-bits was a good idea. There isn’t too much difference from a cpu perspective between 1 512-bit instruction and 2 256-bit instructions. The registers are wider, but they can have many more smaller registers that are specified in the ISA by using existing register renaming techniques. At 14 nm, the 512-bit units seem to take too much space and consume too much power. They may be more easily doable in 7 nm or below eventually, but they may still have issues running at cpu core clocks. If you have to run it at half clock (which is about where gpus are vs. cpus) then you have lost the advantage of going double the width anyway. IMO, the AVX 512 instructions were Intel’s failed attempt (Xeon Phi seems to have been a disappointment) at making a cpu act like a gpu. They have basically given that up and are now designing an actual gpu.
I went off in a bit of a tangent there, but HBM really isn’t that useful for a cpu cache. It isn’t going to be that low of latency; so it would not increase single thread performance much compared to stuff actually designed to be a low latency cache. The next generations form AMD May start using active silicon interposers, but I would doubt that they would use HBM. The interposer is most likely to be used in place of the IO die. They could place all of the large transistors needed for driving off die interfaces (reason why IO doesn’t scale well) in the active interposer. They could then stack 7 nm chips on top of the active interposer for the actual logic. Cache scales very well which is why AMD can do a $200 chip with 32 MB of L3 cache and a $500 chip with 64 MB of L3. Intel 14 nm chips top out at 38.5 MB, mostly for high priced Xeon chips. With an active interposer, they could, for example) make something like 4 or 8 memory controller chips with large SRAM caches on 7 nm while using the active interposer for the IO drivers. Many different configurations are possible with an active interposer, so it is hard to speculate. Placing HBM on the IO interposer, as the AdoredTV guy has speculated, doesn’t sound like a great idea. Two stacks of HBM deliver 512 GB/s, which would take around 10 IF links to transfer to the CPU chiplets. That would be a massive waste of power. If they do use HBM for cpu chiplets, you would want to connect it directly to the cpu chiplet; you would place the a cpu chiplet and HBM stack on the same interposer. That would have some latency advantage, but mostly for large systems like Epyc.
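Working backwards from the figures in that comment, the "around 10 IF links" estimate is easy to sanity-check. A back-of-envelope sketch; the per-link bandwidth used here is my assumption (roughly 32 bytes per fabric clock at ~1.6 GHz), not an official AMD figure:

    # Back-of-envelope: how many IF links would two HBM2 stacks of bandwidth need?
    hbm_bw = 2 * 256          # GB/s, ~256 GB/s per HBM2 stack
    link_bw = 32 * 1.6        # GB/s, assuming ~32 B/cycle at ~1.6 GHz fabric clock
    print(hbm_bw / link_bw)   # ~10 links, matching the estimate above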
I think what people are getting at is having an L4 Cache. Such a cache would be slower than L3, but would be much faster than DRAM (for now, DDR 5133 was recently demonstrated, that is 2566 MHz double data rate). HBM2 is a prime candidate for that because you can stick 8 Gb on a CPU for $60 and with some engineering work, it would help performance massively. 8 gb could hold practically everything needed in cache. That being said, there are engineering challenges to overcome and I doubt this will ever be a thing.
Once JEDEC approves RAM running at DDR 5600 at reasonable timings it won’t matter anyway. AMD can simply bump up the IF speed to 1:1 and with shortened RAM traces, performance penalties can be minimized.
For an interposer based Epyc package for the next generation, I would expect perhaps they do an active interposer with all of the external interface transistors in the interposer. They could do similar things with a passive interposer also though. The passive interposer could be an intermediate between Zen 3 and Zen 4. Then they could place a large number of 7 nm+ chiplets on the interposer. As I said, it is hard to speculate, but an option that I thought of based on AdoredTV 15 chiplet rumor would be to have 4 memory controller chips, each one running 2 channels (128-bit) DDR5. Those chips would just be the memory controller logic if on an active interposer and the interfaces to the interposer connections. That isn’t much so at 7 nm and below, they could place massive L4 SRAM caches on the memory controller chips. Current ~75 square mm Zen 2 chiplets have 16 MB plus 8 cpu cores, so it could be a large amount of cache; perhaps something like 64 or 128 MB per chip. It wouldn’t be a cheap device, but AMD’s goal is to get into the high end market eventually.
The other chiplets could be 1 or two die to manage connections out to the cpu chiplets. This would just be the logic with an active interposer. With a regular interposer, it would need to have the IO transistors also, but the interfaces are quite small. A single infinity fabric switch chip handling all cpu chiplets could provide very low latency. They may have another chip with a switch to tie everything together or they could actually place a couple cpu chiplets on the interposer. Two extra cpu chiplets or one 16 core chiplet could be where the 80 core rumor came from. A possible reason to do that is to allow an HBM based gpu to be mounted on either side. That would make an exceptional HPC product with 16 cores (possible 64 threads if they go to 4 way SMT) and 2 HBM gpus. Another way to get 80 core would be to just make a 3 CCX chiplet with 12 cores. It looks like the Epyc package will not fit all 12 core die though. A mixture of 4 12-core and 4 8-core looks like it would fit, but it wouldn’t be symmetric though. That would allow a quick Zen 2+ style upgrade. Desktop might be able to go to 24 cores and Epyc to 80. The confusion could be mixing up a Zen 2+ rumor and a Zen 3 rumor or something like that. The interposer makes a lot of sense for the giant IO die that cannot be easily implemented at 7 nm. The yields probably don’t support that large of die, so you use an interposer and make a bunch of 100 square mm sized die instead.
I can’t rule out placing HBM on an IO interposer, but due to the latency not really being that much better than off package DRAM, especially at DDR5 speeds, it just doesn’t seem like they would do it.
I would like these AMD chips to be used on laptops. It would be a breakthrough in computing power and low consumption. I think that if HBM2 memory or a larger memory is integrated into the processor, it will double the computing power. It would also be worth studying an implementation of 2 ports, superior to the old ExpressCard 54, into which we could insert 2 video cards in laptops.
Does it mean that AVX2 performance doubles compared to Zen+? At least on workloads where data for the inner loop fits into L1D$ (hierarchical dense matrix multiplication etc)?
"AMD manages its L3 by sharing a 16MB block per CCX, rather than enabling access to any L3 from any core."
Does it mean that for code and shared data caches, 64MB L3 on Ryzen 9 behaves essentially like 16MB cache (say, all 12/16 cores run the same code as it usually is in performance-critical client code and not 4+ different processes/VMs in parallel)? What a waste it is/would be...
The caches on different CCXs can communicate with each other. In Zen 2, those one the same die probably communicate at core clock rather than at memory clock; there is no memory clock on the cpu chiplet. The speeds between chiplets have essentially more than doubled the clocks vs. Zen 1 and there is a possibility that they doubled the widths also. There just about isn’t any way to scale to such core counts otherwise.
An intel monolithic high core count device will have trouble competing. The latency of their mesh network will go up with more cores and it will burn a lot of power. The latency of the L3 with a mesh network will be higher than the latency within a 4-core CCX. Problems with the CCX architecture are mostly due to OS scheduler issues and badly written multithreaded code. Many applications performed significantly better on Linux compared to windows due to this.
The mesh network is also not workable across multiple chiplets. A 16-core (or even a 10 core) monolithic device would be quite large for 10 nm. They would be wasting a bunch of expensive 10 nm capacity on IO. With the large die size and questionable yields, it will be a much more expensive chip than AMD’s MCM. Also, current Intel chips top out at 38.5 MB of L3 cache on 14 nm. Those are mostly expensive Xeon processors. AMD will have a 32 MB part for $200 and a 64 MB part for $500. Even when Intel actually gets a 10 nm part on the desktop, it will likely be much more expensive. They are also going to have serious problems getting their 10 nm parts up to competitive clock speeds with the 14 nm parts. They have been tweaking 14 nm for something like 5+ years now. Pushing the clock on their problematic 10 nm process doesn’t sound promising.
The only real world test that matters - for the UHD2/8K Rec. 2020/BT.2020 live NHK/BBC broadcast of the 2020 Summer Olympics (which begins on Friday, 24 July) and related video streams - is whether AMD Zen 2, or any PC core, can do realtime x264/x265/ffmpeg software encoding and x264/x265 compliant decoding (notice how many hardware-assisted encoders today don't decode to spec, as seen when you re-encode their output with the latest ffmpeg), how many 8K encodes it can sustain, and what overhead remains, if any chip can even do one...
Does IF use PCI E? I thought it used the wiring in 2p epyc systems, and IIRC PCI E doesn't double the bus width every gen, but I would love to be proven wrong.
I've been using Intel for a few years now, but I must say I can't describe how much I love what AMD is doing these days. I generally go where the performance per dollar is, so the best compliment I can pay them is to say my next upgrade will be based on an AMD chip.
Yeah, I was also trying to find that information with no success. Do the reviewers know already or are they waiting for a release instruction from AMD?
JohnLook - Monday, June 10, 2019 - link
@Ian Cutress Are you sure the IO dies are on TSMC's 14 & 12 nm processes? All info so far was that they were on GloFo's 14 nm ...
Ian Cutress - Monday, June 10, 2019 - link
Sorry, glofo 14 and 12. Matisse IO die is Glofo 12nm. We triple confirmed.
JohnLook - Monday, June 10, 2019 - link
Thanks :-)
scineram - Tuesday, June 11, 2019 - link
It still says Epyc is TSMC.
John_M - Tuesday, June 11, 2019 - link
It would be nice if the article was updated as not everyone reads the comments section and AnandTech articles do often get cited in Wikipedia articles.Smell This - Wednesday, June 12, 2019 - link
I feel safe in saying that Wiki-Dom will be right on it . . .;-)
So __ those little white lines are the Infinity Scalable Data Fabric (SDF) and the Infinity Scalable Control Fabric (SCF), connecting "Core" chiplets to the I/O core.
"The SDF might have dozens of connecting points hooking together things such as PCIe PHYs, memory controllers, USB hub, and the various computing and execution units."
"The SDF is a superset of what was previously HyperTransport. The SCF is a complementary plane that handles the transmission ..."
https://en.wikichip.org/wiki/amd/infinity_fabric
Of course, I counted them (rolling eyes at myself), and determined there were 32 connecting a single core chiplet to the I/O core. I'm smelling a rational relationship between those 32, and other such stuff. Are the number of IF links a proprietary secret to AMD?
Yah know? It would be a nice 'get' if a tech writer interviewed someone in that former Sea Micro bunch, and spilled a few beans . . .
Smell This - Wednesday, June 12, 2019 - link
Might be 36 ... LOL
Smell This - Wednesday, June 12, 2019 - link
Could be 42- or 46 IF links on the right(I'll stop obsessing)
sweetca - Thursday, June 13, 2019 - link
I don't understand anything you said 🙂Smell This - Sunday, June 16, 2019 - link
I was (am) trolling Ian/AT for a **Deep(er) Dive** on the Infinity Fabric -- its past, and its future. The EPYC Rome processors have 8 "Core" chiplets connecting to the I/O core. Right? Those 'little white lines' (32- to 46?) from each chiplet, presumably, scale to ... infinity?AMD purchased SeaMicro 7 years ago as the "Freedom Fabric" platform was developed. Initially the SM15000 'stitched' together 512 compute cores, 160 gigabits of I/O networking and 5+ petabytes of storage to form a 'very-high-density server.'
And then . . . they went dark.
https://www.anandtech.com/show/9170/amd-exits-dens...
(see the last comment on that link)
Smell This - Sunday, June 16, 2019 - link
AND ...
it might be 12- to 16 IF links or, another substrate ?
Targon - Thursday, June 13, 2019 - link
Epyc and Ryzen CCX units are TSMC, the true CPU cores. The I/O unit is the only part that comes from Global Foundries, and is probably at TSMC just to satisfy the contracts currently in place.YukaKun - Monday, June 10, 2019 - link
"Users focused on performance will love the new 16-core Ryzen 9 3950X, while the processor seems nice an efficient at 65W, so it will be interesting so see what happens at lower power."Shouldn't that be 105W?
And great read as usual.
Cheers!
jjj - Monday, June 10, 2019 - link
The big problem with this platform is that ST perf per dollar gains are from zero to minimal, depending on SKU.They give us around 20% ST gains (IPC+clocks) but at a cost. Would rather have 10-15% gains for free than to pay for 20%. Pretty much all SKUs need a price drop to become exciting, some about 50$, some a bit less and the 16 cores a lot more.
Got to wonder about memory BW with the 16 cores. 2 channels with 8 cores is one thing but at 16 cores, it might become a limiting factor here and there.
Threska - Tuesday, June 11, 2019 - link
That could be said of any processor. "Yeah, drop the price of whatever it is and we'll love you for it." Improvements cost, just like DVD's costed more than VHS.jjj - Tuesday, June 11, 2019 - link
In the semi business the entire point is to offer significantly more perf per dollar every year. That's what Moore's Law was, 2x the perf at same price every 2 years. Now progress is slower but consumers aren't getting anything anymore.And in pretty much all tech driven areas, products become better every year, even cars. When there is no innovation, it means that the market is dysfunctional. AMD certainly does not innovate here, except on the balance sheet. Innovation means that you get better value and that is missing here. TSMC gives them more perf per dollar, they have additional gains from packaging but those gains do not trickle down to us. At the end of the day even Intel tries to offer 10-15% perf per dollar gains every cycle.
AlyxSharkBite - Tuesday, June 11, 2019 - link
That's not Moore's Law at all. It stated that the number of transistors would double. Also it's been dead a while.
Sandy Bridge 4c: 1.16b
Coffee lake 4c is 2.1b (can’t compare the 6c or 8c)
And that’s a lot more than 2 years.
mode_13h - Tuesday, June 11, 2019 - link
Yeah, but those two chips occupy different market segments. So, you should compare Sandybridge i7 vs. Coffelake i7.Teutorix - Tuesday, June 11, 2019 - link
The number of transistors in an IC, not the number of transistors per CPU core. This is an important distinction since a CPU core in Moore's day had very little in it besides registers and an ALU. They didn't integrate FPUs until relatively recently.It's about overall transistor density, nothing more. You absolutely can compare an 8c to a 4c chip, because they are both a single IC.
An 8 core coffee lake chip is 20% smaller than a quad core sandy bridge chip. That's double the CPU cores, double the GPU cores, with probably a massive increase in the transistors/core also.
Moore's law had a minor slowdown with Intel stuck at 14nm, but it's not dead.
Wilco1 - Tuesday, June 11, 2019 - link
Moore's Law is actually accelerating. Just not at Intel. See https://en.wikipedia.org/wiki/Transistor_count - the largest chips now have ~20 Billion transistors, and with 7nm and 5nm it looks like we're getting some more doublings soon.
nandnandnand - Tuesday, June 11, 2019 - link
Shouldn't we be looking at highest transistors per square millimeter plotted over time? The Wikipedia article helpfully includes die area for most of the processors, but the graph near the top just plots number of transistors without regard to die size. If Intel's Xe hype is accurate, they will be putting out massive GPUs (1600 mm^2?) made of multiple connected dies, and AMD already does something similar with CPU chiplets.I know that the original Moore's law did not take into account die size, multi chip modules, etc. but to ignore that seems cheaty now. Regardless, performance is what really matters. Hopefully we see tight integration of CPU and L4 DRAM cache boosting performance within the next 2-3 years.
Wilco1 - Wednesday, June 12, 2019 - link
Moore's law is about transistors on a single integrated chip. But yes density matters too, especially actual density achieved in real chips (rather than marketing slides). TSMC 7nm does 80-90 million transistors/mm^2 for A12X, Kirin 980, Snapdragon 8cx. Intel is still stuck at ~16 million transistors/mm^2.
FunBunny2 - Wednesday, June 12, 2019 - link
enough about Moore, unless you can get it right. Moore said nothing about transistors. He said that compute capability was doubling about every second year. This is what he actually wrote:"The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. "
[the wiki]
the main reason the Law has slowed is just physics: Xnm is little more (teehee) than propaganda for some years, at least since the end of agreed dimensions of what a 'transistor' was. couple that with the coalescing of the maths around 'the best' compute algorithms; complexity has run into the limiting factor of the maths. you can see it in these comments: gimme more ST, I don't care about cores. and so on. Mother Nature's Laws are fixed and immutable; we just don't know all of them at any given moment, but we're getting closer. in the old days, we had the saying 'doing the easy 80%'. we're well into the tough 20%.
extide - Monday, June 17, 2019 - link
"The complexity for minimum component costs..."He was directly referring to transistor count with the word "complexity" in your quote -- so yes he was literally talking about transistor count.
crazy_crank - Tuesday, June 11, 2019 - link
Actually the number of cores doesn't matter AFAIK, as Moores Law originally only was about transistor density, so all you need to compare is transistors per square millimeter. Looked at it like this, it actually doesn't even look that badchada - Wednesday, June 12, 2019 - link
Moore's law specifically talks about density doubling. If they can fit 6 cores into the same footprint, you can absolutely consider 6 cores for a density comparison. That being said, we have been off this pace for a while.
III-V - Wednesday, June 12, 2019 - link
>Moore's law specifically talks about density doubling.
No it doesn't.
Jesus Christ, why is Moore's Law so fucking hard for people to understand?
LordSojar - Thursday, June 13, 2019 - link
Why it ever became known as a "law" is totally beyond me. More like Moore's Theory (and that's pushing it, as he made a LOT of suppositions about things he couldn't possibly predict, not being an expert in those areas, i.e. material sciences, quantum mechanics, etc.)
sing_electric - Friday, June 14, 2019 - link
This. He wasn't describing something fundamental about the way nature works - he was looking at technological advancements in one field over a short time frame. I guess 'Moore's Observation" just didn't sound as good.And the reason why no one seems to get it right is that Moore wrote and said several different things about it over the years - he'd OBSERVED that the number of transistors you could get on an IC was increasing at a certain rate, and from there, that this lead to performance increases, so both the density AND performance arguments have some amount of accuracy behind them.
And almost no one points out that it's ultimately just a function of geometry: As process decreases linearly (say, 10 units to 7 units) , you get a geometric increase in the # of transistors because you get to multiply that by two dimensions. Other benefits - like decreased power use per transistor, etc. - ultimately flow largely from that as well (or they did, before we had to start using more and more exotic materials to get shrinks to work.)
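The geometric point can be put in one line of arithmetic; a trivial sketch, with the 10-to-7 "units" above standing in for any idealized linear shrink:

    # Trivial sketch of the geometry argument: a 10 -> 7 unit linear shrink
    # cuts each transistor's footprint by (7/10)^2, so density roughly doubles.
    old, new = 10.0, 7.0
    print((old / new) ** 2)   # ~2.04x transistors in the same area, all else equal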
FunBunny2 - Thursday, June 13, 2019 - link
"Jesus Christ, why is Moore's Law so fucking hard for people to understand?"because, in this era of truthiness, simplistic is more fun than reality. Moore made his observation in 1965, at which time IC fabrication had not even reached LSI levels. IOW, the era when node size was dropping like a stone and frequency was rising like a Saturn rocket; performance increases with each new iteration of a device were obvious to even the most casual observer. just like prices in the housing market before the Great Recession, the simpleminded still think that both vectors will continue forevvvvaaahhh.
Ratman6161 - Friday, June 14, 2019 - link
Better yet, why even bother talking about it? I read these architecture articles and find them interesting, but I'll spend my money based on real world performance.
Notmyusualid - Sunday, July 7, 2019 - link
@ Ratman - aye, I give this all passing attention too. Hoping one day another 'Conroe' moment lands at our feet.
RedGreenBlue - Tuesday, June 11, 2019 - link
The immediate value at these price points is the multithreading. Even ignoring the CPU cost, the motherboard costs of Zen 2 on AM4 can be substantially cheaper than the threadripper platform. Also, keep in mind what AMD did soon after the Zen 1000 series launch, and, I think, Zen 2 launch to a degree. They knocked down the prices pretty substantially. The initial pricing is for early adopters with less price sensitivity and who have been holding off upgrading as long as possible and are ready to spring for something. 3 months or so from launch these prices may be reduced officially, if not unofficially by 3rd parties.RedGreenBlue - Tuesday, June 11, 2019 - link
*Meant to say Z+ launch, not Zen 2.
Spoelie - Wednesday, June 12, 2019 - link
To be fair, those price drops were also partially instigated by CPU launches from Intel - companies typically don't lower prices automatically, usually it is from competitive pressure or low sales.
just4U - Thursday, June 13, 2019 - link
I don't believe that's true at all S. Pricing was already lower than the 8th gen Intels and the 9th while adding cores wasn't competing against the Ryzens any more than the older series..sing_electric - Friday, June 14, 2019 - link
That's true, but by most indications, if you want the "full" AM4 experience, you'll be paying more than you did previously because the 500-series motherboards will cost significantly more - I'm sure that TR boards will see an increase, too, but I think, proportionately, it might be smaller (because the cost increase for say, PCIe 4.0 is probably a fixed dollar amount, give or take).mode_13h - Tuesday, June 11, 2019 - link
Huh? There've been lots of Intel generations that did not generate those kinds of performance gains, and Intel has not introduced a newer product at a lower price point, since at least the Core i-series. So, I have no idea where you get this 10-15% perf per dollar figure.Irata - Tuesday, June 11, 2019 - link
So who does innovate, in your humble opinion?
Looking at your posts, you seem to confuse / jumble quite a lot of things.
Example TSMC: So yes, they are giving AMD a better manufacturing that allows them to offer more transistors per area or lower power use at the same clock speed.
But better perf/ $ ? Not sure - that all depends on the price per good die, i.e. yields, price etc. all play a role and I assume you do not know any of this data.
Moores law - Alx already covered that...
As for the 16 core - what would the ideal price be for you ? $199 ? What do the alternatives cost (CPU + HSF and total platform cost).
If you want to look a price - yes, it did go up compared to the 2xxx series, but compared to the first Ryzen (2017), you do get quite a lot more than you did with the original Ryzen.
1800x 8C/16T 3,6 Ghz base / 4 Ghz boost for $499
3900x 12C/24T 3.8 Ghz base / 4,6 Ghz boost for $499
Now the 2700x was only $329, but its counterpart the 3700x has the same price, roughly the same frequency but a lower power consumption and supposedly better performance in just the range you mention.
Spunjji - Tuesday, June 11, 2019 - link
Nice comprehensive summary there!
Kjella - Thursday, June 13, 2019 - link
The Ryzen 1800x got dropped $150 in MSRP nine months after launch, I think AMD thought octo-core might be a niche market they needed strong margins on but realized they'd make more money making it a mainstream part. I bought one at launch price and knew it probably wouldn't be great value but it was like good enough, let's support AMD by splurging on the top model. Very happy it's now paying off, even though I'm not in the market for a replacement yet.deltaFx2 - Tuesday, June 11, 2019 - link
@jjj: Rriigght... Moore's law applies to transistors. You are getting more transistors per sq. mm, and that translates to more cores. ST performance is an arbitrary metric you came up with. It's like expecting car power output (HP/W) go increase linearly every new model and it does not work that way. Physics. So, they innovate on other things like fuel economy, better drive quality, handling, safety features... it's life. We aren't in the 1980s anymore where you got 2x ST perf from a process shrink. Frequency scaling is long dead.The other thing you miss is that the economies of scale you talk about are dying. 7nm is *more* expensive per transistor than 28nm. Finfet, quad patterning, etc etc. So "TSMC gives them more perf per dollar" compared to what? 28nm? No way. 14nm? Nope.
RedGreenBlue - Tuesday, June 11, 2019 - link
Multi-threaded performance does have an effect on single threaded performance in a lot of use cases. If you can afford a 12 core cpu instead of an 8, you would end up with better performance in the following situation: You have one or two multithreaded workloads that will have the most throughput when maxing out 7 strong threads, you want to play a game or run one task that is single-threaded. That single-threaded task is now hindered by the OS and any random updates or processes running in the background.Point being, if you ever do something that maxes out (t - 1) cores, even if there's only one thread running on each, then suddenly your single threaded performance can suffer at the expense of a random OS or background process. So being able to afford more cores will improve single-thread performance in a multitasking environment, and yes multitasking like that is a reality today in the target market AMD is after. So get over it, because that's what a lot of people need. Nobody cares about you, it's the target market AMD and Intel care about.
FunBunny2 - Wednesday, June 12, 2019 - link
"You have one or two multithreaded workloads that will have the most throughput when maxing out 7 strong threads, you want to play a game or run one task that is single-threaded."the problem, as stated for years: 'they ain't all that many embarrassingly parallel user space problems'. IOW, unless you've got an app *expressly* compiled for parallel execution, you get nothing from multi-core/thread. what you do get is the ability to run multiple, independent programs. since we only have one set of hands and one set of eyes, humans don't integrate very well with OS/MFT. that's from the late 360 machines.
yankeeDDL - Thursday, June 13, 2019 - link
I am doing office work, and, according to Task Manager, there are ~3500 threads running on my laptop. Obviously, most threads are dormant; however, as soon as I start downloading something, while listening to music and editing an image, while the email client checks the email and the antivirus scans "everything", I will certainly have more than "10" threads active and running. Having more cores is nearly always beneficial, even for office use. I do swap often between a Core i7 5500 (2 cores, 4 threads) and a desktop with a Ryzen 5 1600 (6C, 12T). It is an apples to oranges comparison, granted, but the smoothness of the desktop does not compare. The laptop chokes easily when it is downloading updates, the antivirus is scanning and I'm swapping between multiple applications (and it has 16GB of RAM - it was high end, 4 years back).
2C4T is just not enough in a productivity environment, today. 4C8T is the baseline, but anyone looking to future-proof their purchase should aim for 6C12T or more. Intel's 9th gen is quite limited in that respect.
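If anyone wants to reproduce that thread count outside Task Manager, here's a quick sketch with the third-party psutil package (the exact total will obviously differ per machine):

    # Quick cross-check of the "~3500 threads" observation; needs `pip install psutil`.
    import psutil

    total = 0
    for p in psutil.process_iter():
        try:
            total += p.num_threads()
        except psutil.Error:      # some processes deny access or exit mid-scan
            pass
    print(total, "threads across", len(psutil.pids()), "processes")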
Ratman6161 - Friday, June 14, 2019 - link
Personally I would ignore anything to do with pricing at this point. The MSRP can be anything they want, but the street prices on AMD processors have traditionally been much lower. At our local Microcenter, for example, the 2700X can be had for $250 while the 2700 is only $180. On the new processors, if history is any indicator, prices will fall rapidly after release.
mode_13h - Tuesday, June 11, 2019 - link
I think the launch prices won't hold, if that's your main gripe.
I would like to be able to buy an 8-core that turbos as well as the 16-core, however. I hope they offer such a model, as their yields improve. I don't need 16 cores, but I do care about single-thread perf.
azazel1024 - Tuesday, June 11, 2019 - link
I have a financial drain right now, but once that gets resolved in (hopefully) a couple of months I think I am finally going to upgrade my desktop with Zen 2. Probably look at one of the 8-core variants. I am running an i5-3570 driven at 4GHz right now.
So the performance improvement should be pretty darned substantial. And I DO a lot of multithread heavy applications like Handbrake, Photoshop, Lightroom and a couple of others. The last time I upgraded was from a Core 2 Duo E6750 to my current Ivy Bridge. That was around a 4 year upgrade (I got a Conroe after they were deprecated, but before official EOL and manufacturing ceasing) IIRC. Now we are talking something like 6-7 years from when I got my Ivy Bridge to a Zen 2 if I finally jump on one.
E6750 to i5-3570 represented about a 4x increase in performance multithreaded in 4 years (or ballpark). i5-3570 to 3800x would likely represent about a 3x improvement in multithreaded in 6-7 years.
I wonder if I can swing a 3900x when the time comes. That would be probably somewhat over 4x performance improvement (and knowing how cheap I am, probably get a 3700x).
Peter2k - Tuesday, June 11, 2019 - link
Yeah, but you have to say something negative about everything
bobhumplick - Tuesday, June 11, 2019 - link
I agree. I mean, it's an incredible CPU line, but they're just throwing all of that extra die space and power budget at more cores and cache. Look at the cache-to-core ratio. When CPU makers just throw more cache or cores at a node shrink, it can be because that makes the most sense at the time (the market, workloads, or supporting tech like DRAM have to catch up to enable more per-core performance), but it can also mean they just didn't know what else to do with the space.
Maybe it's just not possible to go much beyond the widths modern CPUs are approaching. Even Intel is using a lot of space up for AVX-512, which is similar to just adding more cores (more FPU crunchers), so it's possible that neither company knows how to balance all those instructions in flight or make use of more execution resources. Maybe cores can't get much more performance.
But if so, that means a pretty boring future, especially if programmers can't find ways to use those cores in more common tasks.
GreenReaper - Tuesday, June 11, 2019 - link
A lot of progress has been made. Browsers are far more multithreaded than they once were - and as web pages become more complex, that benefit can scale. Similarly, databases and rendering can scale very well over certain operations.
Said scaling tends to work best for the longest operations, because they can be split up into chunks without too much overhead. The overall impact should be that there are fewer long, noticeable delays. There isn't so much progress for things that are already pretty fast - or long sequences of operations that rely on one another. (However, precomputing and prefetching can help.)
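The "split a long operation into chunks" idea in a minimal, standard-library-only sketch; the chunk size and the squaring workload are placeholders, and per-chunk work has to be big enough to hide the pool overhead:

    # Minimal sketch of chunked parallelism with the standard library only.
    from concurrent.futures import ProcessPoolExecutor

    def work(chunk):
        return sum(x * x for x in chunk)   # stand-in for one long operation

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
        with ProcessPoolExecutor() as pool:
            print(sum(pool.map(work, chunks)))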
stephenbrooks - Thursday, June 13, 2019 - link
I find it surprising how they add these smallish increases onto execution width, out of order buffers, register files etc. The IPC hasn't stopped increasing, it's just slow-ish. Maybe they're fighting power and latency in those parts of the core, so the 2x density from a node doesn't translate fully.
Santoval - Tuesday, June 11, 2019 - link
Prices should drop when the competition with Intel becomes fiercer. I don't expect that anytime soon though. It doesn't look like Intel will manage to release Ice Lake CPUs (except apparently the -U and -Y ones they announced) this year or at all.
Their 10nm+ node is still having serious issues with clocks and thermals, and the yields are much lower than TSMC's 7nm (high performance) node. So "word on the street" is that they won't release Ice Lake CPUs for desktop at all. Id est that they'll can them and release instead Tiger Lake desktop CPUs fabbed with their fixed (??) 10nm++ node variant late next year (as in Q4 2020).
piroroadkill - Wednesday, June 12, 2019 - link
You're wrong. You get more performance than Intel at a lower price. In the case of the 3950X, it's significant. To sell them cheaper would devalue an incredible product, for no reason.
Targon - Thursday, June 13, 2019 - link
Ryzen 7 2700X vs. Ryzen 7 3700X. Same price, better performance. Looking at the 3800X which is $399, look at the IPC+clock speed improvements. The 3900X will obviously come at a cost, because you are getting 50% more cores for that increased price. Single threaded though....at what point do you really focus on how fast or slow a single threaded program is running in this day and age where you run dozens of processes at the same time? If you are running dozens of single threaded programs, then performance will change based on how the OS scheduler assigns them to different CPU cores.Qasar - Thursday, June 13, 2019 - link
jjj" They give us around 20% ST gains (IPC+clocks) but at a cost. " that same thing could be said about intels cpus over the last few years... how much performance increase did they give us year over year ?? all while only giving is 4 cores for the mainstream... amd's prices are just fine.. intel is the one that should be dropping their prices, some as low as the $50 you say, but most, $500 or more
Tunnah - Monday, June 10, 2019 - link
I bet now Intel is just going to completely flood ads with the title "Intel beats AMD in pure FPS tests!", because they'll get 210fps where AMD gets 200. And some people will eat it up.
I'm so excited for this upgrade. Replacing a 2700K with a 3800X, where I'll not only get a doubling of cores, but clock for clock I reckon it's a 40, 50% improvement there too.
My Civ games are gonna be so zoomy now..
xrror - Monday, June 10, 2019 - link
Intel will always beat AMD ......
...
(at a price point you don't give a f*ck about) (4 digits or more)
Are you a micro-trader hardwired into the BS Stock Exchange? You think $1000+ is too much for the fully enabled processor arch you want to overclock should cost you?
Sorry, Intel doesn't have the time of day for you after 2011, after Sandy Bridge took away the ability to overclock blessed "K" skus...
oh sure, there are others. IDT and Cyrix are dead but... let me introduce you to...
AMD
xrror - Monday, June 10, 2019 - link
This isn't aimed at you Tunnah. I meant it as humor.
Read my comment like some exciting infomercial, with ... (insert commanding infomercial voice here) hehe
Makaveli - Tuesday, June 11, 2019 - link
The 2700k and the 3800X are both 8C 16T designs.scineram - Wednesday, June 12, 2019 - link
No.Xyler94 - Thursday, June 13, 2019 - link
YesXyler94 - Thursday, June 13, 2019 - link
If he meant 2700x, of course. Darn misreading :Pnevcairiel - Monday, June 10, 2019 - link
A quick note. AVX2 is actually primarily Integer. AVX1 (or just AVX) is 256-bit floating point. The article often refers to "full AVX2 support", which isn't necessarily wrong, but Zen2 also adds full AVX support equally.NikosD - Saturday, June 15, 2019 - link
AVX256 is both integer and floating point because it includes AVX2 FMA which doubles floating point capability compared to AVX1NikosD - Saturday, June 15, 2019 - link
AVX256 was a typo, I meant AVX2 obviously.eastcoast_pete - Monday, June 10, 2019 - link
Thanks Ian? Two questions: what is the official memory bandwidth for the consumer chips? (Sounds like they remain dual channel) and: Any words on relative performance of AMD's AVX 2 implementation vs. Intel's AVX 512 with software that can use either?emn13 - Tuesday, June 11, 2019 - link
AVX-512 is a really misleading name; the interesting... bits... aren't the 512-bit width, but the dramatically increased flexibility. All kinds of operations are now maskable and better reshufflable, and where specific sub-segments of the vector were used, they're now sometimes usable at 1-bit granularity (whereas previously that was coarser).
Assuming x86 sticks around for high-perf computing long enough for compilers to be able to automatically leverage it and then for most software to use it, AVX-512 is likely to be quite the game changer - but given Intel's super-slow rollout so far, and AFAIK no AMD support... that's going to take a while.
Which is all a long-winded way to say that you might well expect AMDs AVX2 implementation to be not all that much slower than intel's 512 when executing code that's essentially AVX2-esque (because intel drops the frequency, so won't get the full factor 2 speedup), but AVX-512 has the potential to be *much* faster than that, because the win isn't actually in vector-width.
GreenReaper - Tuesday, June 11, 2019 - link
Intel's own product segmentation has caused it to lose its first-mover advantage here. System software aside, there's little point in most developers seeking to use instructions that most of their users will not have (and which they themselves may not have). By the time software does support it, AMD is likely to have it. And of course an increasing number of developers will be pouncing on Zen 2 thanks to fast, cheap cores that they can use to compile on...HStewart - Tuesday, June 11, 2019 - link
Intel only had AVX-512 in Xeon and Xeon-derived chips, but with Ice Lake (I don't really count the Cannon Lake test run) AVX-512 will hit the mainstream starting within a month, and 2020 should be the full rollout.
As for AMD, is AVX2 true 256-bit? The last I heard it was actually like dual 128-bit, unless they changed it in Zen 2. I seriously doubt AMD's AVX2 implementation is going to be much different from Intel's AVX2, and AVX-512 is a totally different beast.
It's funny: years ago we heard the same thing about 64-bit x86 instructions, and now here we are with 512-bit AVX.
As far as AMD support for AVX-512 goes, that does not matter much, since Intel is coming out with AVX-512 across its full line over the next year or so.
But keep in mind that unlike normal x86 instructions, AVX is kind of specialized for vectorized processing; I know that with video processing software like PowerDirector this was a deciding factor earlier.
GreenReaper - Wednesday, June 12, 2019 - link
The last you heard? It says clearly on page 6 that there is "single-op" AVX 256, and on page 9 explicitly that the width has been increased to 256 bits:
https://www.anandtech.com/show/14525/amd-zen-2-mic...
https://www.anandtech.com/show/14525/amd-zen-2-mic...
To be honest, I don't mind how it's implemented as long as the real-world performance is there at a reasonable price and power budget. It'll be interesting to see the difference in benchmarks.
arashi - Wednesday, June 12, 2019 - link
Don't expect too much cognitive ability regarding AMD from HStewart; his pay from Big Blue depends on his misinformation disguised as misunderstanding.
Qasar - Thursday, June 13, 2019 - link
HA ! so that explains it..... the more misinformation and misunderstanding he spreads.. the more he gets paid.......
HStewart - Thursday, June 13, 2019 - link
I don't get paid for any of this - I'm just not extremely heavily AMD-biased like a lot of people here. It's just really interesting to me that when Intel releases information about the new Ice Lake processor, with its 2 load / 2 store units, within a couple of days here it's bla bla about Zen+++. Just because it's 7nm does not mean they changed much.
Maybe AMD did change it to 256-bit width and not dual 128-bit; they should, since AVX2 has been that way for a long time and Ice Lake is now 512-bit. Maybe by the time of Zen 4 or Zen+++++ it will have AVX-512 support.
Korguz - Thursday, June 13, 2019 - link
no.. but it is known.. you are heavily intel bias..whats zen +++++++++ ????
x 86-512 ??????
but you are usually the one spreading misinformation about amd...
" and support for single-operation AVX-256 (or AVX2). AMD has stated that there is no frequency penalty for AVX2 " " AMD has increased the execution unit width from 128-bit to 256-bit, allowing for single-cycle AVX2 calculations, rather than cracking the calculation into two instructions and two cycles. This is enhanced by giving 256-bit loads and stores, so the FMA units can be continuously fed. "
HStewart - Thursday, June 13, 2019 - link
Zen+++++ was my joke, as every AMD fan jokes about Intel 10nm+++. Just get over it.
x86-512 is likely not going to happen; I just want to make sure people are not confusing vector-processing bits with CPU bits. 64-bit is what most OSes use now, and have for the last decade or so.
Intel has been using 256-bit AVX2 since day one; the earlier AMD chips only had two combined 128-bit units - did they fix this with Zen 2? This is of course different from AVX-512, which is standard in all Ice Lake and later CPUs and in older Xeons.
Qasar - Thursday, June 13, 2019 - link
sorry HStewart... but even sone intel fans are making fun of the 14++++++ and it would be funny.. if you were making fun of the process node.. not the architeCture..."
x-86 512 - is likely not going to happen, it just to make sure people are not confusing vector processing bits with cpu bits 64 bit is what most os uses now. for last decade or so " that makes NO sense...
HStewart - Thursday, June 13, 2019 - link
One more thing: I stay away from AMD articles unless there is someone biased against Intel - like spreading misinformation that AVX-512 is misleading and not really 512-bit; surely they do not have proof of that.
AVX-512 is not the same as x86-512. I seriously doubt we will ever need that, but then at the time people didn't think we needed x86-64 - I remember the original days of the 8088, nobody thought we needed more than 64 meg. AVX-512 is for vectors, which is totally different.
just4U - Thursday, June 13, 2019 - link
I always have a higher end Intel setup and normally a AMD setup as well.. plus I build a fair amount of setups on both. No bias here except maybe.. wanting AMD to be competitive. The news that dropped over the past month was the biggest for AMD in over a decade HS.. If you can't even acknowledge that (even grudgingly..) then geez.. I dunno.This has been awesome news for the industry and will put intel on their toes to do better. Be happy about it.
Xyler94 - Monday, June 17, 2019 - link
HStewart, please. You don't stay away from AMD at all. You take ANY opportunity to try and make Intel look better than AMD.
There was an article, it was Windows on ARM. You somehow managed to make a post about Intel winning over AMD. Don't spew that BS. People don't hate Intel as much as you make out; they just don't like you glorifying Intel.
Korguz - Monday, June 17, 2019 - link
I'm glad I'm not the only one that sees this...
Qasar - Monday, June 17, 2019 - link
korguz, you aren't the only one that sees it.Xyler94, i dont hate intel.. but i am sick of what they have done so far to the cpu industry, sticking the mainstream with quad cores for how many years ? i would of loved to get a 6 or 8 core intel chip, but the cost of the platform, made it out of my reach. the little performance gains year over year, come on, thats the best intel can do with all the money they have ?? and the constant lies about 10nm.... then Zen is released and what was it, less then 2 months later, intel all of a sudden has more then 4 cores for the mainstream, and even more cores for the HEDT ? my next upgrade at this point, looks to be zen 2.. but i am waiting till the 7th, to read the reviews. hstewart does glorify intel any chance he can, and it just looks so stupid, cause some one calls him out on it.. and he seems to pretty much vanish from that convo
HStewart - Thursday, June 13, 2019 - link
Notice that I mentioned "unless they change it from dual 128-bit."
Targon - Thursday, June 13, 2019 - link
Socket AM4 is limited to a dual-channel memory controller, because you need more pins to add more memory channels. The same applies to the number of PCI Express lanes as well. The only way around this would be to use one of the abilities of Gen-Z where the CPU would just talk to the Gen-Z bus, at which point, dedicated pins for memory and PCI Express could be replaced by a very wide and fast connection to the system bus/fabric. Since that would require a new motherboard and for the CPU to be designed around it, why bother with socket AM4 at that point?Korguz - Thursday, June 13, 2019 - link
why bother?? um upgrade ability ? maybe not quite needed ? the things you suggest, sound like they would be a little expensive to implement. if you need more memory bandwidth and pcie lanes.. grab a TR board and a lower end cpu....austinsguitar - Monday, June 10, 2019 - link
Thank you Ian for this write up. :)
megapleb - Monday, June 10, 2019 - link
Why does the 3600X have power consumption of 95W, and the 3700X, with two more cores, four more threads, and the same frequency max, consume only 65W? I'm guessing those two got switched around?
anonomouse - Monday, June 10, 2019 - link
Higher sustained base clock drives up the TDP.
megapleb - Monday, June 10, 2019 - link
200MHz extra base clock increases power consumption by 46%? I would have thought max power consumption would be all cores operating at maximum frequency, so the base would have nothing to do with it?
scineram - Tuesday, June 11, 2019 - link
Nobody said anything about power consumption.
Teutorix - Tuesday, June 11, 2019 - link
If TDPs are accurate they should reflect power consumption.
If a chip needs 95W cooling it's using 95W of power. The heat doesn't come out of nowhere.
zmatt - Tuesday, June 11, 2019 - link
I think technically it would be drawing more than its TDP. The heat generated by electronics is waste due to the inefficiency of semiconductors. If you had a perfect conductor with zero resistance in a perfect world, then it shouldn't make any heat. However, the TDP cannot exceed power draw, as that's where the heat comes from. How much TDP differs from power draw would depend on a lot of things, such as what material the semiconductor is made of (silicon, germanium, etc.). And I'm sure design also factors in a great deal.
If you read Gamers Nexus, they occasionally measure real power draw on systems: https://www.gamersnexus.net/hwreviews/3066-intel-i...
And you can see that draw massively exceeds TDP in some cases, especially at the high end. This makes sense, if semiconductors were only 10% efficient then they wouldn't perform nearly as well as they do.
Teutorix - Tuesday, June 11, 2019 - link
"I think technically it would be drawing a more than its TDP"Yeah, but if a chip is drawing more power than its TDP it is also producing more heat than its TDP. Making the TDP basically a lie.
"The heat generated by electronics is waste due to the inefficiency of semi conductors. If you had a perfect conductor with zero resistance in a perfect world then it shouldn't make any heat"
Essentially yes, there is a lower limit on power consumption but its many orders of magnitude below where we are today.
"How much TDP differs from power draw would depend on a lot of things such as what material the semiconductor is made or, silicon, germanium etc. And I'm sure design also factors in a great deal."
No. TDP = the "intended" thermal output of the device. The thermal output is directly equal to the power input. There's nothing that will ever change that. If your chip is drawing 200W, it's outputting 200W of heat, end of story.
Intel defines TDP at base clocks, but nobody expects a CPU to sit at base clocks even in extended workloads. So when you have a 9900K, for example, its TDP is 95W, but only when it's at 3.6GHz. If you get up to its all-core boost of 4.7GHz, it's suddenly drawing 200W sustained, assuming you have enough cooling.
Speaking of cooling. If you buy a 9900k with a 95W TDP you'd be forgiven for thinking that a hyper 212 with a max capacity of 180W would be more than capable of handling this chip. NOPE. Say goodbye to that 4.7GHz all core boost.
"If you read Gamers Nexus, they occasionally measure real power draw on systems, https://www.gamersnexus.net/hwreviews/3066-intel-i...
And you can see that draw massively exceeds TDP in some cases, especially at the high end. This makes sense, if semiconductors were only 10% efficient then they wouldn't perform nearly as well as they do."
None of that makes any difference. TDP is supposed to represent the cooling capacity needed for the chip. If a "95W" chip can't be sufficiently cooled by a 150W cooler there's a problem.
Both Intel and AMD need to start quoting TDPs that match the boost frequencies they use to market the chips.
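The conservation-of-energy point above boils down to one comparison; a tiny sketch using the 9900K figures quoted in this thread (200 W sustained draw, a ~180 W cooler), which are illustrative here rather than measured:

    # Steady state: electrical power in == heat the cooler must remove.
    sustained_power_w = 200     # figure quoted above for all-core boost
    cooler_capacity_w = 180     # e.g. the Hyper 212 rating mentioned above
    if sustained_power_w > cooler_capacity_w:
        print(f"Cooler short by {sustained_power_w - cooler_capacity_w} W; clocks must drop.")
    else:
        print("Cooler can hold the boost clocks.")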
Cooe - Tuesday, June 11, 2019 - link
... AMD DOES include boost in their TDP calculations (unlike Intel), and always have. They make their methodology for this calculation freely available & explicit.
Spoelie - Wednesday, June 12, 2019 - link
Look at these power tables for the 2700X:
https://www.anandtech.com/show/12625/amd-second-ge...
=>You are only hitting 'TDP' figures at close to full loading, so "frequency max" is not limited by TDP but by the silicon.
=>Slightly lowering frequency *and voltage* really adds up the power savings over many cores. The load table of the 3700 will look on the whole different than for the 3600X. The 3700 will probably lose out in some medium threaded scenarios (not lightly and not heavily threaded)
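Why a small drop in frequency *and* voltage adds up so quickly: dynamic power scales roughly with C*V^2*f. A toy sketch; the ratios below are illustrative, not measured values for any SKU:

    # Dynamic power ~ C * V^2 * f; relative power after a small V and f drop.
    def rel_power(v_ratio, f_ratio):
        return v_ratio ** 2 * f_ratio

    print(rel_power(0.95, 0.95))  # ~0.86 -> roughly 14% less power per core
    print(rel_power(0.90, 0.90))  # ~0.73 -> roughly 27% less power per core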
Gastec - Wednesday, June 12, 2019 - link
That's not actually the real power consumption. Most likely you will get a 3700X with 70-75 W (according to the software app indications) but a bit more if tested with a multimeter. Add to that the inefficiency of the PSU, say 85-90%, and you have about 85 W of real power consumption. Somewhat better than my current 110W i7-860 or the 150+W Intel 9000 series ones I would say :)xrror - Monday, June 10, 2019 - link
Funny you say that. AMD TDP and Intel TDP differ. I think.
HEY IAN, does AMD still measure TDP as "real" (total) dissipation power or Intel's weaksauce "Typical" dissipation power?
Teutorix - Tuesday, June 11, 2019 - link
Intel rate TDP at base clocks. AMD do something a little more complex.
Neither of them reflect real world power consumption for sustained workloads.
FreckledTrout - Tuesday, June 11, 2019 - link
In desktops they are simply starting points for the cooling solution needed. They do a lot better in the laptop/tablet space, where TDPs make or break designs.
Cooe - Tuesday, June 11, 2019 - link
Yes they do. A 2700X pulls almost exactly 105W under the kind of conditions you describe. Just because Intel's values are complete nonsense doesn't mean they all are.
Targon - Thursday, June 13, 2019 - link
The TDP figures are always a bit vague, because it is about the heat generation, not about power draw. A higher TDP on a chip with the same number of cores on the same design could indicate that it will overclock higher. Intel always sets the TDP at the base clock speed, while AMD has been more about what can be expected in normal usage. The higher the clock speed, the more power will be required, and the more heat will need to be handled by the cooler.
So, if a chip has a TDP of 105W, then in theory you should be able to get away with a cooler that can handle 105W of heat output, but if that TDP is based only on the base clock speed, you will want a better cooler to allow for turbo/boost for sustained periods.
wilsonkf - Monday, June 10, 2019 - link
We want faster memory for Zen/Zen+ because we want a higher IF clock, so cutting the IF clock by half to enable higher memory frequencies does not make sense. However, the improved IF could move the bottleneck somewhere else.
AlexDaum - Tuesday, June 11, 2019 - link
It seems like IF2 cannot hit frequencies higher than about 3733MHz DDR (so roughly 1.8GHz real frequency) for some reason, so they added the ability to scale it down to allow higher memory clocks. But it is probably only worth it if you can overclock memory a lot higher than 3733, so that the IF clock gets a bit higher again.
Xyler94 - Tuesday, June 11, 2019 - link
If I recall, IF2's clock speed is decoupled from RAM speed.
Cooe - Tuesday, June 11, 2019 - link
This is wrong, Xyler. Still completely connected.
Xyler94 - Thursday, June 13, 2019 - link
Per this exact Article:"One of the features of IF2 is that the clock has been decoupled from the main DRAM clock. In Zen and Zen+, the IF frequency was coupled to the DRAM frequency, which led to some interesting scenarios where the memory could go a lot faster but the limitations in the IF meant that they were both limited by the lock-step nature of the clock. For Zen 2, AMD has introduced ratios to the IF2, enabling a 1:1 normal ratio or a 2:1 ratio that reduces the IF2 clock in half."
It seems it has been, though it may still benefit from faster RAM.
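For anyone trying to picture the ratio behaviour, here is a rough sketch based on the article quote above and the DDR4-3733 switch point mentioned further down. It assumes the ratio is applied against the memory clock (half the DDR data rate) and that the switch is automatic at that point; both are inferences, not AMD statements:

def fclk_mhz(ddr_rate):
    memclk = ddr_rate / 2                               # DDR: two transfers per memory clock
    return memclk if ddr_rate <= 3733 else memclk / 2   # 1:1 below, 2:1 above

for rate in (3200, 3600, 3733, 4000, 4400):
    print(f"DDR4-{rate}: MEMCLK ~{rate / 2:.0f} MHz, IF ~{fclk_mhz(rate):.0f} MHz")

It also shows why running just past 3733 tanks the fabric clock, which is the trade-off described in the replies below.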
extide - Monday, June 17, 2019 - link
It is completely connected -- you can just pick a 1:1 or 2:1 divider now, but they are absolutely still tightly coupled. You can't just set them independently.
Cooe - Tuesday, June 11, 2019 - link
You're missing the point of >3733MHz memory overclocks, where the IF switches to a 2:1 divider. It's for workloads that highly prioritize memory bandwidth over latency, NOT to try and run your sticks 24/7 at like 5GHz+ for the absolute lowest latency possible (bc even then, 3733MHz will prolly still be lower).
Targon - Thursday, June 13, 2019 - link
From what I remember, up to DDR4-3733, Infinity Fabric on Ryzen 3rd generation now runs at 1:1 (where previously, Infinity Fabric would run at half the DDR4 speed). You can go above that, but then the improvements are not going to be as significant. For latency, your best bet is to get 3733 or 3600 with as low a CAS rating as you can get.
zodiacfml - Tuesday, June 11, 2019 - link
That 105W TDP is a sign that the 8-core is efficient at 50W, or a base clock of 3.5 GHz. The AMD 7nm 8-core Zen 2 chip has a TDP equal to or less than my i3-8100. 😅
fmcjw - Tuesday, June 11, 2019 - link
All good and fine, but I want Zen 2 and 7nm on my laptop. If they aren't announcing it today, products aren't gonna ship by holiday 2019, and most consumers will end up buying 10nm Intel devices. Missed chance.
mode_13h - Tuesday, June 11, 2019 - link
Eh, they have perfectly good 12 nm laptop SoCs. 7 nm would've been nice, but it's hard to do everything at once.
levizx - Tuesday, June 11, 2019 - link
Nope, those 12nm APUs have worse battery life (than current 8th Gen) and no TB3/USB4 support. I can't think of a reason why I would choose Ryzen 3xxxU over Ice Lake.
mode_13h - Tuesday, June 11, 2019 - link
Do price & availability count?
Xyler94 - Tuesday, June 11, 2019 - link
Misleading remarks. Huawei was able to make a Ryzen APU have better battery life than an 8th gen processor. TB3 and USB4 aren't readily used mainstream yet. Heck, USB-C hasn't even caught on yet.
Currently laptop makers aren't optimizing for AMD's CPUs, that's just the fact.
Cooe - Wednesday, June 12, 2019 - link
This is mostly nonsense. Performance AND battery life for Ryzen Mobile 2nd Gen is extremely close to Intel's current 8th & 9th gen 4-core parts. And until Ice Lake is a real thing that you can actually buy, Ryzen still has a major value advantage + far better iGPU performance. Ice Lake also isn't really any faster CPU-wise than Whiskey Lake, because despite increasing IPC by +18%, clock speeds were dropped from 4.8 to 4.1GHz, or about -15%, erasing nearly all those gains.
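Treating those numbers as rough inputs rather than measurements, the arithmetic works out like this:

ipc_gain   = 1.18        # claimed Ice Lake IPC uplift
clock_gain = 4.1 / 4.8   # claimed boost clock drop, ~0.85x
net = ipc_gain * clock_gain
print(f"net single-thread change: {net:.3f}x ({(net - 1) * 100:+.1f}%)")
# ~1.01x, i.e. roughly a wash, which is the point being made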
fmcjw - Tuesday, June 11, 2019 - link
Yeah, I get that they still need time to get the GPU down to 7nm, so they pushed it back to focus on the CPU for desktop (where performance per watt matters much less than in servers or mobile). But the silence is not reassuring, and mobile-wise, Zen is still inferior to Intel, maybe not performance-wise, as Huawei demonstrates with its Matebook, but definitely battery-wise because of the more powerful GPU.
scineram - Tuesday, June 11, 2019 - link
Nobody is going to buy Shintel vaporware. Or only very few.
The_Assimilator - Tuesday, June 11, 2019 - link
Please edit the table on page 1 to combine the rows with identical values into a single row (e.g. the RAM speed). Also edit the 3950X price to have a ? after it as it's not yet confirmed.
jfmonty2 - Tuesday, June 11, 2019 - link
The 3950X price is most definitely confirmed; Lisa Su said it loud and clear (and showed it on the slide) in AMD's E3 presentation yesterday: https://www.youtube.com/watch?v=yxPBXNuX6Xs&t=...
The_Assimilator - Wednesday, June 12, 2019 - link
The original version of this article noted the 3950X price wasn't confirmed at the time of publication, but it seems they edited that bit out after Su's presentation.
Still need the table to be updated - PCIe and DDR4 columns at least.
vFunct - Tuesday, June 11, 2019 - link
Eventually these multi-chip packages should incorporate system DRAM (via HBM) as well as SSD NVRAM and GPUs, and be sold as full packages in the configurations you'd typically see: 64GB memory + 1TB SSD + 16 CPU cores + whatever GPU.
mode_13h - Tuesday, June 11, 2019 - link
GPUs are typically upgraded more often than CPUs. And GPUs dissipate up to about 300 W, while desktop CPUs are often around 100 W (except for Intel's Coffee Lake).
So, it wouldn't really seem like CPUs and GPUs belong together, either from an upgrade or a cooling perspective. Consoles can make it work by virtue of being custom form factor, and obviously you don't upgrade a console's GPU or CPU - you just buy a new console.
Therefore, I don't see this grand unification happening for performance-oriented desktops. That said, APUs will probably continue to get more powerful and perhaps occupy ever more of the laptop market.
Threska - Tuesday, June 11, 2019 - link
I imagine that's why there's PCIe 4.0 and now 5.0.
R3MF - Tuesday, June 11, 2019 - link
Memory support? 3200 official, or higher...
SquarePeg - Tuesday, June 11, 2019 - link
According to AMD, 3200MHz is officially supported, but they (AMD) have had memory clocked to over 5000MHz. Infinity Fabric will run 1:1 with up to 3733MHz RAM, but any higher and it splits to 2:1. AMD also said that they have found DDR4-3600 16-21-21 to be the best bang for the buck on performance returns.
R3MF - Wednesday, June 12, 2019 - link
Cheers
Gastec - Wednesday, June 12, 2019 - link
But will those be 3200 MHz overclocked (XMP) or 3200 SPD?
Cooe - Wednesday, June 12, 2019 - link
The latter. Only >3200MHz is now overclocked.
Lord of the Bored - Tuesday, June 11, 2019 - link
That security slide, though... Most of a page of "N/A".
I love it.
mikato - Tuesday, June 11, 2019 - link
Hehe, yeah I saw that. That was a good one for the marketing team or whoever makes the slides.
Atari2600 - Wednesday, June 12, 2019 - link
No, for each of those line items they should have said "Intel only".
zalves - Tuesday, June 11, 2019 - link
I really don't understand how one can compare these AMD CPUs with Intel's HEDT; they lack PCIe lanes and don't support quad-channel memory. And that's a huge deal breaker for anyone who wants and needs some serious IO and multitasking.
TheUnhandledException - Tuesday, June 11, 2019 - link
Well, that is what Threadripper is for. Can't wait to see the 3000 series Threadrippers.
John_M - Tuesday, June 11, 2019 - link
So, the 5th generation EPYC codename is going to be either Turin, Bolognia or Florence, as Palermo has already been used for Sempron.
John_M - Tuesday, June 11, 2019 - link
*that's Bologna, of course. It would be nice to be able to edit posts for typos.
WaltC - Tuesday, June 11, 2019 - link
Great read!
John_M - Tuesday, June 11, 2019 - link
What is the advantage in halving the L1 instruction cache? Was the change forced by the doubling of its associativity? According to the (I suspect somewhat oversimplified) Wikipedia article on CPU cache, doubling the associativity increases the probability of a hit by about the same amount as doubling the cache size, but with more complexity. So how is this Zen 2 configuration better than that in Zen and Zen+?
John_M - Tuesday, June 11, 2019 - link
Ah! It's sort of explained at the bottom of page 7. I had glossed over that because the first two paragraphs were too technical for my understanding. I see that it was halved to make room for something else to be made bigger, which on balance seems to be a successful trade-off.
arnd - Wednesday, June 12, 2019 - link
More importantly, 32K 8-way is a sweet spot for an L1 cache. This is what AMD is using for the D$ already and what all modern Intel L1 caches (both I and D) are. With eight ways, this is the largest size you can have for a non-aliasing virtually indexed cache using the 4KB page size of the x86 architecture. Having more than eight ways has diminishing returns, so going beyond 32KB requires extra complexity for dealing with aliasing, or physically indexed caches like the L2.
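The page-size-times-associativity limit described there can be checked in a couple of lines (4 KiB x86 pages assumed):

page_size = 4 * 1024   # bytes
for ways in (4, 8, 16):
    max_kib = page_size * ways // 1024
    print(f"{ways}-way: {max_kib} KiB max without aliasing")
# 8-way -> 32 KiB, exactly Zen 2's new L1I (and existing L1D) size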
Thunder 57 - Sunday, June 16, 2019 - link
It appears they traded half the L1 instruction cache to double the uop cache. They doubled the associativity to keep the same hit rate, but it will hold fewer instructions. However, the micro-op cache holds already-decoded instructions, and if there is a hit there it saves a few stages in the pipeline for decoding, which saves power and increases performance.
phoenix_rizzen - Tuesday, June 11, 2019 - link
From the article:"Zen 2 will offer greater than a >1.25x performance gain at the same power,"
I don't think that means what you meant. :) 1.25x gain would be 225% or over 2x the performance. I think you meant either:
"Zen 2 will offer greater than a 25% performance gain at the same power,"
or maybe:
"Zen 2 will offer greater than 125% performance at the same power,"
or possibly:
"Zen 2 will offer greater than 1.25x performance at the same power,"
phoenix_rizzen - Tuesday, June 11, 2019 - link
From the article:"With Matisse staying in the AM4 socket, and Rome in the EPYC socket,"
The server socket name is SP3, not EPYC, so this should read:
"With Matisse staying in the AM4 socket, and Rome in the SP3 socket,"
phoenix_rizzen - Tuesday, June 11, 2019 - link
From the article:"This also becomes somewhat complicated for single core chiplet and dual core chiplet processors,"
core is superfluous here. The chiplets are up to 8-core. You probably mean "single chiplet and dual chiplet processors".
scineram - Wednesday, June 12, 2019 - link
No, because there is no single chiplet. It is the core chiplet that is either 1 or 2 in number.
phoenix_rizzen - Tuesday, June 11, 2019 - link
From the article:"all of this also needs to be taken into consideration as provide the optimal path for signaling"
"as" should be "to"
thesavvymage - Wednesday, June 12, 2019 - link
A 1.25x gain is exactly the same as a 25% performance gain; it doesn't mean 225% as you stated.
dsplover - Tuesday, June 11, 2019 - link
So in other words AnandTech no longer receives engineering samples but tells us what everyone else is saying.
Still love coming here as the reviews are good, but boy oh boy, you guys sure slipped down the ladder.
Bring back Anand Shimpi.
Korguz - Wednesday, June 12, 2019 - link
They do still get engineering samples... but usually CPUs.
Not likely.. he's working for Apple now....
coburn_c - Wednesday, June 12, 2019 - link
What the heck is UEFI CPPC2?
nonoverclock - Wednesday, June 12, 2019 - link
It's related to platform power management.
wurizen - Wednesday, June 12, 2019 - link
"Raw Memory Latency" graph shows 69ns for for 3200 and 3600 Mhz RAM. This "69ns" is irrelevant, right? Isn't the "high latency" associated with Ryzen and IF due to "Cross-CCX-Memory-Latency? This is suppose to be ~110ns at 3200 Mhz RAM as tested by PCPER/etc.... This in my experiences causes "micro-stuttering" in games like BO3/BF4/etc.... And, a "Ryzen-micro-stutter/pause" is different than a micro-stutter/pause associated with Intel. With Intel the micro-stutter/pause happens in BFV, for example, but they happen once or twice per match. With Ryzen, not only is the quality/feeling of the "micro-stutter/pause" different (seems worst), but it is constant throughout the match. One gets a feeling that it is not server-side, GPU side, nor WIndows 10 side. But, CPU-side issue... Infinity Fabric side. So, now Inifinity Fabric 2 is out. Is it 2.0 as in better? No more high latency? Is that 69ns Cross-CCX-memory latency? Why is AMD and Tech sites like Anand so... like... not talking about this?igavus - Wednesday, June 12, 2019 - link
You are misattributing things here. Your stutter is most def. not caused by memory access latency variations. For it to be visible on even an 144Hz monitor with the game running at the native rate, the differences would have to be obscenely high. That's just unrealistic.Not that it helps to determine what is causing your issues, but that's not it.
wurizen - Wednesday, June 12, 2019 - link
What?
wurizen - Wednesday, June 12, 2019 - link
Maybe you guys don't know what cross-CCX memory latency is... my main goal of commenting was to ask what that SLIDE showing "Raw Memory Latency" refers to. Is it inter-core memory or intra-core memory (intra-core is the same as cross-CCX memory)? Inter-core memory is data being shuffled within the cores in a CCX module. Ryzen and Ryzen+ had two CCX modules with 4 cores each, totaling 8 cores for the 2700X, as an example. If the memory/data is traveling in the same CCX, the latency is fine and is even faster than Intel. This was true with Ryzen and Ryzen+.
The issue is when data and memory are being shuffled between the CCX modules, traversing the so-called "Infinity Fabric." Intel uses a ring bus and doesn't have an equivalent measurement and data. Intel does have Mesh with the X299, which is similar-ish to AMD's CCX and IF. But Intel Mesh latency is lower (I think, but I haven't dug around since I don't care about it since I can't afford it)....
So... that is what cross-CCX memory latency is... and that SLIDE shown in this article... WTF does that refer to? 69ns is similar to Intel ring bus memory latency, which has been shown to be fast enough and is the standard in regards to latency that won't cause issues...
So... as PCPER tested, Ryzen Infinity Fabric 1.0 has a cross-CCX latency of around 110ns... and I stand my ground (it's not BIOS/reinstall Windows/Windows scheduler/user error/imperceptible/a misunderstanding/a misattribution (I think)) that it was the reason why I suffered "micro-pauses/stutters" in some games. I had two systems at the time (3700K and R7 1700X) and so I was able to diagnose/observe/interpret what was happening....
Also.. I would like to add that the "Ryzen Micro-stutter-Pause" FEELS/LOOKS/BEHAVES different... weird, right?
deltaFx2 - Thursday, June 13, 2019 - link
You might "stand your ground" but that doesn't make it true. First of all, it's pretty clear you don't understand what you're talking about. Intel's Mesh is NOTHING like AMD's CCX. Intel Mesh is an alternative interconnect to ring bus; mesh scales better to many cores relative to ring. In theory mesh should be faster but for whatever reason intel's memory latency on skylake X parts are quite bad relative to skylake client (i.e. no bueno for gaming). I recall 70ns-ish for Skylake X vs 60ns for the Skylake client.Cross CCX memory latency should not matter unless you have shared memory across threads that span CCXs. Games don't need that many threads: 8 is overkill in many cases and each CCX can comfortably handle 8. Unless you pinned threads to cores and ran an experiment that conclusively showed that the issue was inter-ccx latency (I doubt it), your standing ground doesn't mean much. One could just as well argue that the microstutter was due to driver issues or other software/bios issues. Zen has been around for quite some time and if this was a widespread problem, we'd know.
wurizen - Friday, June 14, 2019 - link
Well, I did mention "similar-ish" of Mesh to Infinity Fabric. It's meshy. And, I guess, you get "camaraderie" points for calling me out as "pretty clear you don't understand what you're talking about." That hurts, man! :(
"In theory... Mesh should be faster..." nice way to switch subjects, bruh. Yeh, I can throw some at ya, bruh! What?
Cross-CCX-High-Memory-Latency DOES MATTER!
You know why? Because a game shuffles data randomly. It doesn't know that traversing said Data from Core 0 (residing in CCX 1) to Core 3 (in CCX 2) via Infinity Fabric means that there is a latency penalty.
Bruh
deltaFx2 - Friday, June 14, 2019 - link
Actually, no, you're wrong about the mesh. Intel has a logically unified L3 cache; i.e. any core can access any slice of the L3, or even one core can use the entire L3 for itself. AMD has a logically distributed L3 cache which means only the cores from the CCX can access its cache. You simply cannot have core 3 (CCX 0) fetch a line into CCX1's cache. The tradeoff is that the distributed L3 is much faster than the logically unified one but the logically unified one obviously offers better hit rates and does not suffer from sharing issues."Cross-CCX-High-Memory-Latency DOES MATTER!" Yes it does, no question about that. It matters when you have lock contention or shared memory that spans CCXs. In order to span CCXs, you should be using more than 8 threads (4 cores to a CCX, 2 threads per core). I don't think games are _that_ multithreaded. This article mentions a Windows 10 patch to ensure that threads get assigned to the same CCX before going to the adjacent one. It can be a problem for compute-intensive applications (y'know, real work), but games? I doubt it, and you should be able to fix it easily by pinning threads to cores in the same CCX.
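If anyone wants to actually run that pinning experiment, a minimal Linux-only sketch. It assumes the first CCX shows up as logical CPUs 0-7 (4 cores with 2 SMT threads each); check lscpu -e or /proc/cpuinfo for the real mapping on your machine, and the benchmark binary name is just a placeholder:

import os, subprocess

first_ccx = set(range(8))           # assumption: logical CPUs 0-7 = one CCX
os.sched_setaffinity(0, first_ccx)  # pin this process; children inherit it

# launch the game/benchmark so every thread it spawns stays on that CCX
subprocess.run(["./my_benchmark"])  # hypothetical binary

If the stutter disappears when everything is confined to one CCX and returns when you widen the mask, that would actually support the cross-CCX theory; otherwise it points elsewhere.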
deltaFx2 - Friday, June 14, 2019 - link
"shared memory that spans CCXs." -> shared DIRTY memory. i.e. core 8 writes data, core 0 wants to read. All other kinds of sharing are a non-issue. Each CCX gets a local copy of the data.wurizen - Friday, June 14, 2019 - link
Why do you keep on blabbing on about this? Are you trying to fix some sort of muscle?wurizen - Friday, June 14, 2019 - link
flex^^^wurizen - Friday, June 14, 2019 - link
OMFG! I. Am. Not. Talking. About. Intel. Mesh.I. Am. Talking. About. Infinity. Fabric. High. Memory. Latency!
Now that I got that off my chest, let's proceed shall we...
OMFG!
L3 Cache? WTF!
Do you think you're so clever to talk about L3 cache to show off your knowledge as if to convince ppl here you know something? Nah, man!
WTF are you talking about L3 cache, dude? Come on, dude, get with the program.
The program is "Cross-CCX-High-Memory-Latency" with Infinity Fabric 1.0
And, games (BO3, BF1, BF4 from my testing) are what's affected by this high latency penalty in real time. Imagine playing a game of BO3 while, throughout the match, the game is "micro-pausing" and "micro-slow-motioning" repeatedly? Yep, you got it, it makes it unplayable.
In productive work like video editing, I would not see the high latency as an issue unless it affects "timeline editing," causing it to lag, as well.
I have heard some complain about issues with it in audio editing work. But I don't do that so I can't say.
As for "compute-intensive applications (y'know, real work)" --delatFx2
....
.....
......
You duh man, bruh! a real compute-intensive, man!
"This article mentions a Windows 10 patch to ensure that threads get assigned to the same CCX before going to the adjacent one." --deltaFx2
Uhhh... that won't fix it. Only AMD can fix it in Infinity Fabric 2.0 (Ryzen 2), if, indeed, AMD has fixed it. By making it faster! And/or, reducing that ~110ns latency to around 69ns.
Now, my question is (and you, deltaFx2, haven't addressed it in your wise response to my comments): that SLIDE of "Raw Memory Performance" showing 69ns latency at 3200 MHz RAM. Is that raw memory performance intra-CCX memory performance or inter-core memory performance? Bada-boom, bish!
wurizen - Friday, June 14, 2019 - link
It's a problem ppl are having, if you search enough....
Alistair - Wednesday, June 12, 2019 - link
Those kinds of micro stutters are usually caused by the motherboard or, most likely, your Windows installation. Reinstall Windows, then maybe try a different motherboard.
wurizen - Wednesday, June 12, 2019 - link
Wow, really? Re-install Windows?
I just wanna know (cough, cough Anand) what the cross-CCX latency is for Ryzen 2 and Infinity Fabric 2.0.
If it is still ~110ns like before.... well, guess what? 110 nano-effin-seconds is not fast enough. It's too HIGH a latency!
You can't update the BIOS/motherboard or re-install Windows, or get 6000 MHz RAM (the price for that, tho?) to fix it. (As shown in the graph, whatever "Raw Memory Latency" is stays at 69 ns from 3200 MHz to 3600 MHz RAM, and only at 3733 MHz RAM does it drop to 67ns?).... This is the same result PCPER got with Ryzen IF 1.0, showing that getting faster RAM than 3200 MHz did not improve the cross-CCX memory latency....
supdawgwtfd - Thursday, June 13, 2019 - link
I don't get any stutters with my 1600.
As above, it's nothing to do with the CPU directly.
Something else is causing the problem.
deltaFx2 - Thursday, June 13, 2019 - link
How do you know for sure that the microstutter or whatever it is you think you are facing is due to the inter-CCX latency? Did you actually pin threads to CCXs to confirm this theory? Do you know when inter-CCX latency even comes into play? Inter-CCX latency ONLY matters for shared memory being modified by different threads; this should be a tiny fraction of your execution time, otherwise you are not much better off going multithreaded. Moreover, each CCX runs 8 threads, so are you saying your game uses more than 8? That would be an interesting game indeed, given that Intel's mainstream gaming CPUs don't have a problem on 4c8t.
To me, you've just jumped the gun and gone from "I have got some microstutter issues" to "I know PCPer ran some microbenchmark to find out the latency" to "that must be the problem". It does not follow.
FreckledTrout - Thursday, June 13, 2019 - link
I agree. If micro stutter from CCX latency was really occurring, this would be a huge issue. These issues really have to be something unrelated.
wurizen - Friday, June 14, 2019 - link
Another thing that was weird was GPU usage dropping from 98% to like 0% in-game, mid-action, while I was playing... constantly, in a repeated pattern throughout the game... this is not a server or game hitching. We understand as gamers that a game will "hitch" once in a while. This is a "slow-motion" "micro-pause" thing happening throughout the game. It happens in single player (BF1) so I ruled out server-side. It's like the game goes into "slow motion" for a second... not once or twice in a match, per se, but throughout, in a repeated, constant fashion... along with seeing GPU usage accompany the effect, dropping from 98% or so (normal) to 0% for split seconds (again, not once or twice in a match, but a constant, repeated pattern throughout the match).
And, there are people having head-scratching issues similar to mine with Ryzen CPUs.
No one (cough, cough, Anand; nor YouTube tech tubers) seems to address it tho.
But, I think that Ryzen 2 is coming out and if the cross-CCX high latency issue is the same, then we're bound to hear more. I'm sure.
I am thinking tech sites are giving AMD a chance... but not sure... doesn't matter tho. I got a 7700K (I wanted the 8-core thing when the 1700X Ryzen came out) but it's fine. I'm not a fanboy. Just a techboy.... if anything...
wurizen - Friday, June 14, 2019 - link
The "micro-stutter" or "micro-pausing" is not once or twice (I get those with Intel, as well) but, a repeated, constant pattern throughout the match and round of game. The "micro-stutter" and "micro-pause" also "FEELS" different than what I felt with my prior 3700K CPU and current 7700K CPU. It's like a "micro-slow-motion." I am not making this up. I am not crazy!Gastec - Wednesday, June 19, 2019 - link
I'm 95% convinced that your micro-stuttering is caused by the GPU/drivers. Disable SLI or Crossfire if that's what you have (you never said what video card you use). And please stop trolling.wurizen - Thursday, June 20, 2019 - link
Really? After all that I said about this... you think that you're 95% sure it's caused by GPU drivers and you want me to disable SLI or Crossfire? Really?Qasar - Thursday, June 20, 2019 - link
have you even mentioned which vid card you are using, or what version the drivers are, or if they are up to date ??Gastec - Wednesday, June 19, 2019 - link
It could also be related to G-sync/FreeSync and your monitor. When debugging the best way is to reduce everything to a minimum.wurizen - Thursday, June 20, 2019 - link
Really, dude? You think it's related to Gsyng and Freesync?Qasar - Thursday, June 20, 2019 - link
it very well could be.. a little while ago.. there was a whole issue with micro stuttering and the fix.. was in new drivers after a certain revision...wurizen - Thursday, June 20, 2019 - link
This is gonna be my last comment regarding my comment about the Infinity Fabric high memory latency issue... an objective response would be "It could;" or "it's quite possible;" or "110 nanoseconds of latency via cross-CCX memory performance is nothing to sneeze at or disregard, or a non-issue."
Instead, I get the replies above, which don't need to be repeated since one can just read them. But, just in case, the replies basically say I am trolling, such as the most recent from user Gastec; someone prior said I jumped to conclusions, pointing my scrawny little finger at Infinity Fabric high memory latency; someone plainly said I didn't know what I was talking about; etc!
So, I just wanna say that as my one last piece. It's odd no one has erred on the side of objectivity and just plain responded with "It's possible..."
Instead, we get the usual techligious/fanboyish responses.
Qasar - Thursday, June 20, 2019 - link
It doesn't help that you also haven't cited any links or other proof of this other than your own posts... and I quote: "And, there are people having head-scratching issues similar to me with Ryzen CPU." Oh.. and where are these other people?? Where are the links and URLs that show this??? Lastly.. IF you have a spare HDD (SSD or mechanical) that isn't in use that you could install Windows onto, so you won't have to touch the current one you are using, try installing Windows onto that, update Windows as much as you can via Windows Update, update all drivers, and do the same things you are doing to get this issue.. and see if you still get it.. if you do.. then it isn't your current install of Windows, and it is something else.
Carmen00 - Friday, June 21, 2019 - link
Qasar, Gastec et al, I appreciate that you're trying to educate wurizen, but when you get responses like "bruh!" and "Really?", I think it's time to call it quits. Like HStewart, feeding wurizen will just encourage him, and that makes it difficult to go through the comments and see the important ones. Trust that the majority of AnandTech's readership is indeed savvy enough to know pseudo-technical BS when we encounter it!
Qasar - Friday, June 21, 2019 - link
Well.. the fact that he didn't cite anyone else with this problem, or links to forums/web pages.. kind of showed he was just trolling.. but I figured it was worth a shot to give him some sort of help....
jamescox - Saturday, June 22, 2019 - link
You seem to just be trying to spread FUD. Also, you don't seem to know how long a nanosecond is. The CCX-to-CCX latency can cause slower performance for some badly written or badly optimized multithreaded code, but it is on such a fine scale that it would just affect the average frame rate. It isn't going to cause stuttering as you describe.
The stuttering you describe could be caused by a huge number of things. It could be the GPU or CPU thermally throttling due to inadequate cooling. If the GPU utilization drops very low, that could be due to the game using more memory than the GPU has available. That will slow to a crawl while assets are loaded across the PCI Express bus. So, if anyone is actually having this problem, check your temperatures, check your memory usage (both CPU and GPU), then maybe look for driver / OS issues.
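A minimal version of that "check your numbers first" advice, assuming an NVIDIA card with nvidia-smi on the PATH (substitute your vendor's tool otherwise). It just logs temperature, utilization and memory once a second so dips can be lined up against the stutters afterwards:

import subprocess, time

query = "timestamp,temperature.gpu,utilization.gpu,memory.used"
with open("gpu_log.csv", "w") as log:
    for _ in range(600):  # roughly ten minutes of samples
        row = subprocess.run(
            ["nvidia-smi", f"--query-gpu={query}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        log.write(row + "\n")
        time.sleep(1)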
playtech1 - Wednesday, June 12, 2019 - link
Good products and good prices.
Knock-out blow though? I don't think so for the consumer and gaming space, as I can buy a 9900 today for a fairly small premium over the price of a 3800X and get basically the same performance.
The 12 and 16 core chips look more difficult for Intel to respond to though, given how expensive its HEDT line is (and I say that as an owner of a 7860x).
Atari2600 - Wednesday, June 12, 2019 - link
Yeah, power and thermals are not so important in the consumer/game space.
In server/HPC, Intel is in deep crap.
Phynaz - Wednesday, June 12, 2019 - link
Bahahaha. No.
eva02langley - Thursday, June 13, 2019 - link
Phhh... are you banned from WCCFtech?
Gastec - Wednesday, June 19, 2019 - link
I guess I'm neither consumer nor gamer with my i7-860 and GTX 670, G502, G110 and G13. I bought the Logitech G13 just to type better comments on Tweeter :P
Gastec - Wednesday, June 19, 2019 - link
I also turn OFF RGB whenever I can; anti-consumerism and anti-social is written on my forehead and everyone is pointing at me on the woke streets.
just4U - Thursday, June 13, 2019 - link
I'd say it's a substantial blow to Intel. One of the reasons I picked up a 2700X was the cooler, which is pretty damn good overall.. and the buy-in was substantially lower. The 3700X/3800X will only add to that incentive with increased performance (most will likely not even notice..)
Drop in the 12-16 core processors (provided there are no tradeoffs for those additional cores..) and the 9900K becomes unappealing on all fronts. The 9700K was a totally unappealing product with its 8c/8t package already, and after this launch it won't make sense at all.
Gastec - Thursday, June 20, 2019 - link
Core i9-9900 I presume. Nowhere to be found for sale in Mordor. Only found one on Amazon.com for $439.99, reduced from $524.95, sold by "Intel", whoever that scammer is.
Walkeer - Thursday, June 13, 2019 - link
Superb analysis, thanks a lot @Ian! Very excited to have the 3900X at home already.
FreckledTrout - Thursday, June 13, 2019 - link
Reading over the Zen 2 microarchitecture article, I'm left wondering if the Windows scheduler improvements are making use of a new, unmentioned RDPID feature in Zen 2 to determine where threads are placed?
cooker358 - Thursday, June 13, 2019 - link
Thanks for sharing!
Gastec - Thursday, June 13, 2019 - link
I too am curious about the latencies, particularly between the chiplets. With the clock selection down to 2 ns and Windows 10's hopefully improved thread allocation (filling a CCX, then the next one, before jumping to the 2nd chiplet), latencies should be lower. We'll just have to wait for honest, extensive testing and reviews to be done. You were not planning on buying these CPUs on release day or, even worse, pre-ordering them, were you? :)
jamescox - Sunday, June 16, 2019 - link
I expect the CCX-to-CCX latencies to be very good. There is no memory clock on the CPU chiplet, so the two on-die CCXs almost certainly communicate at CPU clock rather than memory clock as in Zen 1. It isn't the same as Intel's mesh network, but AMD's solution will have better L3 latency within the CCX compared to Intel. Intel's mesh network seems to be terrible for power consumption. Intel's ring bus didn't scale to enough cores. For their 18-core chip (if I am remembering right), they actually had 3 separate ring buses. The mesh network is obviously not workable across multiple chiplets, so it will be interesting to see what Intel does.
For the chiplet-to-chiplet latency, they have more than doubled the Infinity Fabric SerDes clock with the higher-than-PCIe-4.0 speeds. It seems that the internal IF clock is also around doubled. It was operating at the actual memory clock in Zen 1, which was half the DDR rate. They seem to be running the internal IF clock at the same rate as the DDR rate, with the option to drop back to half the DDR rate. So if you are running DDR-3200, the IF clock may actually be 3200 instead of 1600 as it would be in Zen 1. If you are overclocking to DDR-4000 or something, then it may need to drop down to 2000 for the internal IF clock. If this is the way it is set up, then they may have an option to explicitly set the divider, but it is probably not going to be stable past 3.7 GHz or so. The IO die is 14 nm Global Foundries, so that seems like a reasonable limitation.
The CCX-to-CCX latency should be less important as the OS and software are better optimized for the architecture. There were quite a few cases on Zen 1 of applications performing significantly better on Linux compared to Windows due to the scheduler. Most applications can be optimized a bit for this architecture also. The problem is fine-grained shared memory between threads on different CCXs. It is generally a good idea to reduce that anyway, since locking can be detrimental to performance. With Zen 2, I think application-level optimizations are probably going to be a lot less necessary anyway, but a lot of the early issues were probably caused by bad multithreaded programming. This type of architecture isn't going away. Intel can't compete with Epyc 2 with a monolithic die. Epyc 2 will be around 1000 square mm of silicon total. Intel can't scale core count without moving to something similar.
frshi - Friday, June 14, 2019 - link
@Ian Cutress What about 2x16GB sticks compared to 4x8GB? I remember Zen and Zen+ were kinda picky when using 4 sticks. Any change to that on Zen 2?
RAINFIRE - Saturday, June 15, 2019 - link
Yeah - I'm curious. Can anyone speak to the (4 x 32GB) memory that Ryzen 3000 and X570 boards are supposed to support?
Holliday75 - Wednesday, June 19, 2019 - link
If reviewers have samples at this time, they are under an NDA until July 7th. Only unconfirmed leaks can provide that kind of info and it's super early. A lot of these types of issues won't be known until they go retail.
AdrianMel - Sunday, June 16, 2019 - link
I would like these AMD chips to be used on laptops. Would be a breakthrough in terms of computing power and lower consumption. I think if HBM2 or higher memory is integrated into the processor, it will double the computing power. It would also be worth studying an implementation of 2 ports technically superior to the old ExpressCard 54, into which we could insert 2 video cards in laptops.
jamescox - Sunday, June 16, 2019 - link
Everyone keeps bringing up HBM for CPUs as if it is magical in some manner. HBM can provide high bandwidth, but it is still DRAM. The latency isn't that great, so it isn't really that useful as a CPU cache. If you are trying to run AVX-512 code across a bunch of CPU cores, then maybe you could use the bandwidth. If you have code that can use that level of parallelism, then it will almost certainly run much more efficiently on an actual GPU. I didn't think that expanding AVX to 512 bits was a good idea. There isn't too much difference from a CPU perspective between 1 512-bit instruction and 2 256-bit instructions. The registers are wider, but they can have many more, smaller registers that are specified in the ISA by using existing register renaming techniques. At 14 nm, the 512-bit units seem to take too much space and consume too much power. They may be more easily doable at 7 nm or below eventually, but they may still have issues running at CPU core clocks. If you have to run it at half clock (which is about where GPUs are vs. CPUs), then you have lost the advantage of going double the width anyway. IMO, the AVX-512 instructions were Intel's failed attempt (Xeon Phi seems to have been a disappointment) at making a CPU act like a GPU. They have basically given that up and are now designing an actual GPU.
I went off on a bit of a tangent there, but HBM really isn't that useful as a CPU cache. It isn't going to be that low latency, so it would not increase single-thread performance much compared to stuff actually designed to be a low-latency cache. The next generations from AMD may start using active silicon interposers, but I would doubt that they would use HBM. The interposer is most likely to be used in place of the IO die. They could place all of the large transistors needed for driving off-die interfaces (the reason why IO doesn't scale well) in the active interposer. They could then stack 7 nm chips on top of the active interposer for the actual logic. Cache scales very well, which is why AMD can do a $200 chip with 32 MB of L3 cache and a $500 chip with 64 MB of L3. Intel 14 nm chips top out at 38.5 MB, mostly for high-priced Xeon chips. With an active interposer, they could (for example) make something like 4 or 8 memory controller chips with large SRAM caches on 7 nm while using the active interposer for the IO drivers. Many different configurations are possible with an active interposer, so it is hard to speculate. Placing HBM on the IO interposer, as the AdoredTV guy has speculated, doesn't sound like a great idea. Two stacks of HBM deliver 512 GB/s, which would take around 10 IF links to transfer to the CPU chiplets. That would be a massive waste of power. If they do use HBM for CPU chiplets, you would want to connect it directly to the CPU chiplet; you would place the CPU chiplet and HBM stack on the same interposer. That would have some latency advantage, but mostly for large systems like Epyc.
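The bandwidth arithmetic behind that last point, with the per-link figure an assumed round number for illustration rather than an AMD spec:

hbm_bandwidth_gbs   = 512   # two HBM2 stacks, per the comment above
if_link_gbs_assumed = 50    # assumed per-IF-link bandwidth, illustration only

links = hbm_bandwidth_gbs / if_link_gbs_assumed
print(f"~{links:.0f} IF links to carry the full HBM bandwidth off an IO die")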
eek2121 - Wednesday, June 19, 2019 - link
I think what people are getting at is having an L4 cache. Such a cache would be slower than L3, but would be much faster than DRAM (for now; DDR-5133 was recently demonstrated, that is 2566 MHz double data rate). HBM2 is a prime candidate for that because you can stick 8GB on a CPU for $60, and with some engineering work it would help performance massively. 8GB could hold practically everything needed in cache. That being said, there are engineering challenges to overcome and I doubt this will ever be a thing.
Once JEDEC approves RAM running at DDR-5600 at reasonable timings it won't matter anyway. AMD can simply bump up the IF speed to 1:1 and, with shortened RAM traces, performance penalties can be minimized.
jamescox - Saturday, June 22, 2019 - link
For an interposer based Epyc package for the next generation, I would expect perhaps they do an active interposer with all of the external interface transistors in the interposer. They could do similar things with a passive interposer also though. The passive interposer could be an intermediate between Zen 3 and Zen 4. Then they could place a large number of 7 nm+ chiplets on the interposer. As I said, it is hard to speculate, but an option that I thought of based on AdoredTV 15 chiplet rumor would be to have 4 memory controller chips, each one running 2 channels (128-bit) DDR5. Those chips would just be the memory controller logic if on an active interposer and the interfaces to the interposer connections. That isn’t much so at 7 nm and below, they could place massive L4 SRAM caches on the memory controller chips. Current ~75 square mm Zen 2 chiplets have 16 MB plus 8 cpu cores, so it could be a large amount of cache; perhaps something like 64 or 128 MB per chip. It wouldn’t be a cheap device, but AMD’s goal is to get into the high end market eventually.The other chiplets could be 1 or two die to manage connections out to the cpu chiplets. This would just be the logic with an active interposer. With a regular interposer, it would need to have the IO transistors also, but the interfaces are quite small. A single infinity fabric switch chip handling all cpu chiplets could provide very low latency. They may have another chip with a switch to tie everything together or they could actually place a couple cpu chiplets on the interposer. Two extra cpu chiplets or one 16 core chiplet could be where the 80 core rumor came from. A possible reason to do that is to allow an HBM based gpu to be mounted on either side. That would make an exceptional HPC product with 16 cores (possible 64 threads if they go to 4 way SMT) and 2 HBM gpus. Another way to get 80 core would be to just make a 3 CCX chiplet with 12 cores. It looks like the Epyc package will not fit all 12 core die though. A mixture of 4 12-core and 4 8-core looks like it would fit, but it wouldn’t be symmetric though. That would allow a quick Zen 2+ style upgrade. Desktop might be able to go to 24 cores and Epyc to 80. The confusion could be mixing up a Zen 2+ rumor and a Zen 3 rumor or something like that. The interposer makes a lot of sense for the giant IO die that cannot be easily implemented at 7 nm. The yields probably don’t support that large of die, so you use an interposer and make a bunch of 100 square mm sized die instead.
I can’t rule out placing HBM on an IO interposer, but due to the latency not really being that much better than off package DRAM, especially at DDR5 speeds, it just doesn’t seem like they would do it.
nandnandnand - Sunday, July 7, 2019 - link
"That being said, there are engineering challenges to overcome and I doubt this will ever be a thing."Putting large amounts of DRAM ever closer to the CPU will definitely be a thing:
https://www.darpa.mil/attachments/3DSoCProposersDa...
Intel is already moving in this direction with Foveros, and AMD is also working on it:
https://www.tomshardware.com/news/amd-3d-memory-st...
It doesn't matter how fast DDR5 is. The industry must move in this direction to grab performance and power efficiency gains.
AdrianMel - Sunday, June 16, 2019 - link
I would like these AMD chips to be used on laptops. It would be a breakthrough in computing power and low consumption. I think that if HBM2 memory or a larger memory is integrated into the processor, it will double the computing power. It would be worth studying and implementing 2 ports superior to the old ExpressCard 54, into which we could insert 2 video cards in laptops.
nandnandnand - Sunday, July 7, 2019 - link
AMD needs to put out some 6-8 core Zen 2 laptop chips.
peevee - Monday, June 17, 2019 - link
Does it mean that AVX2 performance doubles compared to Zen+? At least on workloads where data for the inner loop fits into L1D$ (hierarchical dense matrix multiplication etc)?
peevee - Monday, June 17, 2019 - link
"AMD manages its L3 by sharing a 16MB block per CCX, rather than enabling access to any L3 from any core."Does it mean that for code and shared data caches, 64MB L3 on Ryzen 9 behaves essentially like 16MB cache (say, all 12/16 cores run the same code as it usually is in performance-critical client code and not 4+ different processes/VMs in parallel)? What a waste it is/would be...
jamescox - Saturday, June 22, 2019 - link
The caches on different CCXs can communicate with each other. In Zen 2, those on the same die probably communicate at core clock rather than at memory clock; there is no memory clock on the CPU chiplet. The speeds between chiplets have essentially more than doubled vs. Zen 1, and there is a possibility that they doubled the widths also. There just about isn't any way to scale to such core counts otherwise.
An Intel monolithic high-core-count device will have trouble competing. The latency of their mesh network will go up with more cores and it will burn a lot of power. The latency of the L3 with a mesh network will be higher than the latency within a 4-core CCX. Problems with the CCX architecture are mostly due to OS scheduler issues and badly written multithreaded code. Many applications performed significantly better on Linux compared to Windows due to this.
The mesh network is also not workable across multiple chiplets. A 16-core (or even a 10 core) monolithic device would be quite large for 10 nm. They would be wasting a bunch of expensive 10 nm capacity on IO. With the large die size and questionable yields, it will be a much more expensive chip than AMD’s MCM. Also, current Intel chips top out at 38.5 MB of L3 cache on 14 nm. Those are mostly expensive Xeon processors. AMD will have a 32 MB part for $200 and a 64 MB part for $500. Even when Intel actually gets a 10 nm part on the desktop, it will likely be much more expensive. They are also going to have serious problems getting their 10 nm parts up to competitive clock speeds with the 14 nm parts. They have been tweaking 14 nm for something like 5+ years now. Pushing the clock on their problematic 10 nm process doesn’t sound promising.
peevee - Monday, June 17, 2019 - link
"One of the features of IF2 is that the clock has been decoupled from the main DRAM clock....For Zen 2, AMD has introduced ratios to the IF2, enabling a 1:1 normal ratio or a 2:1 ratio that reduces the IF2 clock in half."
I have news for you - 2:1 is still COUPLED. False advertisement in the slides.
And besides, who in their right mind would want to halve IF clock to go from DDR3200 to even DDR4000 (with requisite higher timings)?
BMNify - Saturday, June 22, 2019 - link
The only real-world test that matters: the UHD2/8K Rec. 2020/BT.2020 live NHK/BBC broadcast of the 2020 Summer Olympics begins on Friday, 24 July, along with related video streams. Can AMD Zen 2 do it? Can any PC core do real-time x264/x265/ffmpeg software encoding and x264/x265-compliant decoding (notice how many hardware-assisted encoders today don't decode to spec, as seen when you re-encode their output with the latest ffmpeg)? How many 8K encodes can it sustain, and what overhead remains, if any core can even do one...
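For anyone who wants to try the kind of test being described, a rough sketch: time a pure software x265 encode of a short clip and compare it against real time. It assumes an ffmpeg build with libx265 and a hypothetical local test clip; the preset and CRF are arbitrary choices:

import subprocess, time

clip = "clip_8k.mp4"     # hypothetical test clip
clip_seconds = 10.0      # its duration

start = time.time()
subprocess.run([
    "ffmpeg", "-y", "-i", clip,
    "-c:v", "libx265", "-preset", "medium", "-crf", "28",
    "-an", "out.mp4",
], check=True)
elapsed = time.time() - start

print(f"encode took {elapsed:.1f}s for {clip_seconds:.1f}s of video")
print("realtime capable" if elapsed <= clip_seconds else "not realtime")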
Does IF use PCIe? I thought it used the PCIe wiring in 2P Epyc systems, and IIRC PCIe doesn't double the bus width every gen, but I would love to be proven wrong.
SlitheryDee - Friday, June 28, 2019 - link
I've been using Intel for a few years now, but I must say I can't describe how much I love what AMD is doing these days. I go where the performance per dollar is generally, so the best compliment I can pay them is to say my next upgrade will be based on an AMD chip.
SlyNine - Sunday, July 7, 2019 - link
So, what time exactly do these new CPUs launch? I mean, the hour.
Dodozoid - Sunday, July 7, 2019 - link
Yeah, I was also trying to find that information, with no success.
Do the reviewers know already, or are they waiting for a release instruction from AMD?
ilux.merks - Sunday, July 7, 2019 - link
What nobody is talking about is: how are the fixes for Meltdown and Spectre on these new AMD processors?
Korguz - Sunday, July 7, 2019 - link
Simple.. they don't exist, from what I have seen. Those issues are Intel's only...