100% agreed with that, Ian. Datacenters do not like this hybrid nonsense, especially if you look at VMware and their licensing systems; add how VMs and containers work with hypervisors on top of the reduced instruction set, and it is a mess. Plus, a scheduler would have to operate at such hyperscale that it would be a gigantic waste of money.
Intel had to do this on mainstream because of their IPC target and SMT performance target vs AMD, who are very much ahead in SMT specifically. Also, the whole line of LGA1700 CPUs runs hot, with far higher heat density than RKL, which was already too hot. So they had to cut P cores to make them clock at 5GHz and still get the maximum SMT performance. Now that they know the E cores get them the performance they need, they are segmenting with Raptor Lake, adding more cores on the E side to stay competitive in SMT vs the upcoming monster Zen 4.
As for Xeon they do not have to clock at such high frequencies plus the SMT performance is already there due to that Multisocket system and other bells and whistles.
Finally on the AMD vs Intel side, looks like Intel will be more competitive with AMD when their E cores Xeon comes out, rough guess. Also this move is done by both AMD and Intel because they want to stop ARM Server processors which do not have SMT technology but high density in Cores.
Good to know some of the roadmap. All I want to see is a real successor to X299. AMD axed the Threadrippers horribly: Gen 1 got 2 CPU refreshes, but the 3rd was purposefully cut off to get more cash from the sTRX socket. And now there is no Zen 3 based Threadripper, nor a damn 3DV enhanced Threadripper (on top of how they didn't care about 3DV on AM4, because Zen 4 needs to be strong and bring more sales from the new chipset / socket AM5).
Threadripper Pro is competing with itself. I know - I've been breaking one in for 9 months. There are no limits for the TR Pro in content creation for performance and efficiency in HEDT. Castle Peak sWRX8 is 'Zen2' - future variants will be 'slobber-knockers' to the competition.
No offense, but Chipzillah continues to shoot itself in both feet. Everyone knows 'Intel 7' is 10nm +++ 'Enhanced SuperFin' regardless of marketing and 'branding'
Most people don't know that "10nm +++" is also marketing and branding. They're just names. If Intel 7 is comparable to TSMC 7nm, then they're competitors and the rename makes sense. Pointing out that the original plan was to call it 10nm ESF is rather pointless and borderline misleads people into thinking the node is a generation behind, technologically (it's not).
Indeed, it's not. After 6 years Intel finally massaged its 10nm process into something that's roughly as good as TSMC N7, so the rename makes sense as a marketing break from the absolute failboat that was the 10nm / 10nm(+) / 10nm+(+[SF]) / 10nm++(+[ESF]) nomenclature.
It IS in fact on par with TSMC's 7nm, and in some areas it is better than TSMC's 7nm node. In two areas specifically, Intel 7 outperforms TSMC. One is transistor density (the Intel 7 node is denser than TSMC's), and the other is transistor leakage and drive current. That is exactly why Intel's Alder Lake chips are easily boosting to 5 GHz on mobile with little power consumption and their desktop chips are boosting to sustained clocks above 5 GHz. And Intel's new release of their KS SKUs can achieve clocks on multiple cores up to 5.5 GHz sustained. TSMC N7 could never reach that sort of drive current and those clocks. It was only recently that AMD's Ryzen chips were able to briefly boost to 4.9 or 5 GHz. TSMC's N7 is a good node, especially for power efficiency, but it is just a fact that Intel 7 outperforms it.
I assume most know this, but clock speed isn't just a function of the process node. It also has to do with circuit design. A chip designed with longer critical paths will have less ability to reach high clock speeds, but the tradeoff is that it's probably doing more real work per cycle.
And this might reveal a downside of AMD's strategy of sharing chiplets between desktop and servers. Since servers place a greater premium on power-efficiency, that keeps downward pressure on clock speed and therefore greater incentive to utilize longer critical paths. Meanwhile, Intel tweaks their core designs and fabs completely separate dies for server CPUs vs. other markets.
Intel 7 is definitely comparable to TSMC N7. Node names don't have any correlation to the actual features, performance or efficiency of the transistors themselves. We know for a fact that Intel 7 outperforms TSMC N7 in a few areas like transistor density, leakage, and drive current. So regardless of whether it's called Intel 7 or 10 nm++++, its performance is on par with TSMC's N7, so who cares what it's called at the end of the day. The performance is what matters, not the marketing name.
The only reason there is no new or enhanced Threadripper is the OEMs: they did not want to invest in it for their workstation lines and prefer to stay with the "financially funded Intel" Xeon WS lines... Easy money for the OEMs, and users are the ones that have no choice... It's called abuse of power, the usual Intel stuff. So the Threadripper market remained a niche, and this is difficult for AMD as they have so many EPYC delivery requests.
duploxxx, Threadripper over the last four years actually sold nearly as many units as Epyc: 13,676,597 and 14,269,999 respectively. Niche, well, E5 16xx product generations range from 2.5 M to 5 M, which is what TR displaced. E5/Scalable 2P workstation components alone (my data only covers the components, regardless of integration into a workstation) are 17x TR volumes. TR across all generations has been a profitable niche for AMD. mb
Right, but Epyc is even more profitable than TR, and if AMD is struggling to meet demand on the more important Epyc chips, it doesn't make sense to divert capacity to make Zen 3 based TRs instead.
kwohit, I acknowledge and agree that AMD is going for commercial margin: #1 Epyc, #2 accelerators, refocusing on GPU acceleration (Xilinx there too) and deemphasizing AMD consumer GPU gaming when commercial [direct] customers write their own code. But I would not be surprised to see HP and Lenovo TR5K workstations showing up in the secondary market 24 months from now, and then there are 1P and 2P Epyc, certainly workstation worthy at the top of the frequency stack. Think about some Milan entering Genoa provided to HP and Lenovo as an AMD Milan contract completion award; sustaining their own TR niche strongholds on direct customer sales is among the reasons I believe TR5K does exist but is not talked about. TR5K may also be needed as March/April '22 AMD gross margin support, parallel to Rembrandt, main deck dependent on 3D volumes and acceptance. I have 15 M octa 3D and 1 M TR5K. mb
"Ian: Which is a good thing." The datacenter customers do want their E-cores and P-cores. As such, they will buy (depending on their wants) 100,000 processors with E-cores only and 100,000 processors with P-cores. The "let's have some E-cores and some P-cores in the same chip" is not on their radar.
He didn't say that datacenter customers don't want E-cores and P-cores. All he said was that it isn't wanted on the same CPU socket like the consumer space.
Largely an artifact of everyone going all-cloud all the time, I'd guess. A server owned and operated by a not very global business could use an E-core block to keep the lights on when it needs to be on but isn't very busy, but a cloud service provider is going to want to sell those unused cycles to someone else.
I am fairly certain on production economic assessment TR 5K exists up to 1 M units, that HP and Lenovo sold workstation customer direct and will not be seen until they show up in the secondary market. This put the margin in AMD and OEM's pocket rather than the channel who likely would have inflated their own TR5K margin take. In both instances AMD avoids product inventoried and/or sitting on the shelf at Zen 3 run end production 'overpriced' while also reducing channel financial ability to purchase something else that moves in higher volume including from AMD.
Ampere 3090ti is in the same position, sold direct. The Kingpin card is a commercial product in consumer disguise and requires the cooling infrastructure in place to integrate this subsystem into a cluster. Think white space limited high-density municipality located high frequency financial trading 24kW racks where the customer understands and has the cooling infrastructure in place to implement the 500W dGPU. Again in this instance, Nvidia with contract IDM take the margin and by keeping the card out of the channel and free from channel inflated pricing prevents whatever volume of 3090ti from sitting on the shelf, overpriced, and when they are sold the potential that in this example channel's inflated margin take will go to procurement of product other than Nvidia.
AMD and Nvidia are both controlling how margin from the sale of their products comes back to them. Keeping TR5K and the 3090 Ti out of the channel ensures that the margin earned on those products goes back to them and not to Intel, who has been the primary beneficiary of the channel's inflated Ampere margin take, which generally funds new Intel procurements rather than new Nvidia procurements.
But it is dead space if you don't use it, and might be replaced by cache, improved logic units and so on. It could make sense to put AVX-512 only in P-cores Xeons, and leave the "efficiency" market to some other technology (accelerators, GPUs, FPGAs, ...).
AVX-512 support does need additional and wider registers to function. However, the instructions themselves were designed to be cracked into 256 bit chunks for the execution units to handle. The benefit is that no additional die space has to be used in execution units, but the catch is that peak throughput does not change between AVX2 and AVX-512, as the same amount of work is being done per cycle using 256 bit wide SIMD units. There can still be some performance increases due to the additional registers and some efficiency gains with the new instructions, but nowhere near what doubling the execution width would do for SIMD heavy code.
Intel has made a mess of their ISA and it is time for them to clean things up with some standardization.
Intel provides 512 bit fma units for avx512... so they aren't cracking these into 256 bit operations for the FMAs. Their high end server chips have dual avx512 FMA units per core. I've seen reports that the SPR chips, even down to 8 core versions, will all have dual avx512 FMAs per core.
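For what it's worth, the two positions above (512 bit ops cracked into 256 bit halves vs. native 512 bit FMA units) can be put into a back-of-the-envelope peak throughput model. This is only a sketch in Python: the function name, the FP32 lane width and the two-FMA-units-per-core count are illustrative assumptions, not a description of any specific core.

```python
def peak_fp32_flops_per_cycle(vector_bits, unit_bits, fma_units):
    """Toy model of peak FP32 FLOPs per cycle for one core.

    vector_bits: width of the SIMD instruction (256 for AVX2, 512 for AVX-512)
    unit_bits:   width of each FMA execution unit (assumed value)
    fma_units:   number of FMA units per core (assumed value)
    """
    # A 512 bit instruction on 256 bit units is cracked into two micro-ops.
    uops_per_instr = max(1, vector_bits // unit_bits)
    instrs_per_cycle = fma_units / uops_per_instr
    lanes = vector_bits // 32          # FP32 lanes per instruction
    return instrs_per_cycle * lanes * 2  # one FMA counts as 2 FLOPs


avx2 = peak_fp32_flops_per_cycle(256, 256, fma_units=2)
avx512_cracked = peak_fp32_flops_per_cycle(512, 256, fma_units=2)
avx512_native = peak_fp32_flops_per_cycle(512, 512, fma_units=2)

print(avx2, avx512_cracked, avx512_native)  # prints: 32.0 32.0 64.0
```

Under these assumptions the cracked case matches AVX2 exactly, and only native 512 bit units double the peak, which is the whole disagreement in a nutshell.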
I'm kinda surprised the E core only Xeon is so far out. After seeing the performance of 4 E cores in roughly the same die space as one P core for the consumer chips on multithreaded stuff, it seemed like such an obvious move. I'd expect the E cores to do even better in servers, as they can't run the P cores as high up the power/performance curve in the server chips, so they'll lose some of their clock speed advantage.
On the other hand, more cores usually put more pressure on the memory subsystem. Maybe using 4x the E-cores instead of the P-cores is too much. Remember that when tasks take twice as long to complete, you have twice as many tasks "in flight" and you need twice the memory. So, a "machine gun" approach of nibbling on many tasks is less efficient (in average time for execution and average memory use) than a few big cores. As always, your mileage may vary.
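The "twice as long means twice in flight" point above is essentially Little's law (items in flight = arrival rate x time in system). A minimal Python sketch, where the arrival rate, latencies and per-task memory are made-up illustrative numbers and the function names are my own:

```python
def tasks_in_flight(arrival_rate_per_s, latency_s):
    """Little's law: average number of tasks in the system."""
    return arrival_rate_per_s * latency_s


def memory_in_use_gib(arrival_rate_per_s, latency_s, gib_per_task):
    """Average memory held by in-flight tasks, assuming a fixed footprint each."""
    return tasks_in_flight(arrival_rate_per_s, latency_s) * gib_per_task


RATE = 100.0        # tasks arriving per second (assumed)
GIB_PER_TASK = 0.5  # memory footprint per task (assumed)

big_cores = memory_in_use_gib(RATE, latency_s=1.0, gib_per_task=GIB_PER_TASK)
small_cores = memory_in_use_gib(RATE, latency_s=2.0, gib_per_task=GIB_PER_TASK)

print(big_cores, small_cores)  # prints: 50.0 100.0
```

Same throughput in both cases; the slower cores just hold twice as many tasks (and twice the memory) at any moment.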
Hifhedgehog - Hi Hedgehog, when I read someone's observation and find it important, first to me as an analyst, and second to my Federal Trade Commission audit enlisted by federal attorneys retained by Congress of the United States, by my initials, my intent is to notify the author I found value in their observation and that I read it.
You observe I also respond, raise inquires, collaboratively contribute, constructively confront as we are here, more or less, I don't support boycott, and give credit where I see credit due. I admit I initialed one comment twice was a typo. I can't respond to all observations I find valuable, but I can acknowledge when I find value I recognize the contribution.
This is interesting. If there's any public information about this audit, feel free to give us a link. Entirely up to you, but it would be educational for some here.
I think it's valuable for the public to gain a greater understanding of the role played by the federal government in industry and the economy. Most people don't understand how instrumental it is to the markets and industries that we all take for granted.
> I also respond, raise inquires, collaboratively contribute, constructively confront
I find your posts informative and informed, even if they're often so deep and outside my domain that I often just skim them. Thanks for contributing.
My analysis is here also at Seeking Alpha who has changed blog spot availability and I'll address that in the future on how I post primarily, slide sets;
I comment from time to time and simultaneously post data on The Next Platform simply search Mike Bruzzone and Next Platform.
I also comment on Semi Wiki.
Exhibits validating my FTC and USDOJ credential can be found on Pacer associated with this Federal Court of Claims case matter: 1:21-cv-01261-RTH. Otherwise, I'm engaged in numerous litigations with Intel pursuant Intel Inside price fix recovery and other issues associated with tech gang land are searchable my counsel is patience and persistence in the face of Intel associate network falsities.
Pursuant Intel I can vouch for monopoly remedial advances beginning at Krzanich and continuing under Gelsinger but far from complete pursuant Intel legal department and legal network, on Intel generally recovering from monopolization, sabotage and robbery, both consumer associated with Intel Inside price fix and from the Entity itself on employee's engaged in cartel product laundering thefts primarily from DCG.
P core only here as some of the i5s. E core only (I want this for the low power) on the way soon branded as Pentium/Celerons. 8 cores apparently. Which is what i'd like as long as they don't bork the GPU bits too much. 8x E plus a reasonable GPU that can drive 3 decent screens is all I want for my desktop, especially if it can run fanless in a heatsink case.
Hybrid Processing doesn't make much/any sense for Servers, Desktops, and even Office PCs; all devices that are hooked to the wall.
It does make sense for portable computing. Where you're trying to find a balance between performance and energy usage.
Where we have good benefit for large and thick laptops, we instead see massive benefits on small and thin phones.
So hindsight 20/20, but Intel should have started working on it back in 2013 or so when it saw it was viable for ARM architecture.
So around 2015-2016, we should've had Intel 15W APUs built on 7nm nodes, with a 2+4 design. Where they would have (i7-6600u) Skylake and Cherry Trail (x7-8750) meshed together. This would've made them more competitive against the upcoming Ryzen architectures. Even Apple wouldn't have made the leap as soon as they did. Basically it would've bought Intel more wiggle room and time to implement their new architecture (P/Very Large cores), which should have been a Server-First approach. And it would ensure Microsoft puts the work in, to have these hybrid computing supported properly in software. Especially when trying to implement it from laptops to desktops.
Now? It's a mixture, and I actually think the Ryzen 6000 approach is better. And it pales in comparison to macOS and the M1, M1P, and M1X chips. Whilst, the server market is sliding towards AMD, it looks like it might be overtaken by ARMv9 solutions in the next 5+ years.
It's nonsense to think that hybrid cores are just for perf/watt. They're overall a more efficient architecture.
Thanks to Amdahl's Law it's very good to have two-four performance-focused cores to drive the main thread of the main process(es). But the rest should continue to be PPA-balanced cores - and today the Atom-derived E-cores appear to have better PPA.
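To put rough numbers on the Amdahl's Law argument above, here is a small Python sketch comparing three chips with the same area budget. Everything in it is an assumption for illustration: the 10% serial fraction, an E-core at half the performance of a P-core, and four E-cores fitting in the area of one P-core.

```python
def runtime(serial_frac, serial_core_perf, total_throughput):
    """Amdahl-style runtime for a normalized workload of size 1.

    The serial part runs on the single fastest core; the parallel part
    is spread over the whole chip.
    """
    return serial_frac / serial_core_perf + (1 - serial_frac) / total_throughput


P_PERF, E_PERF = 1.0, 0.5   # assumed relative per-core performance
SERIAL = 0.10               # assumed serial fraction of the workload

# All three configurations occupy the area of 10 P-cores (4 E-cores per P-core area).
all_p = runtime(SERIAL, P_PERF, 10 * P_PERF)                # 10 P-cores
hybrid = runtime(SERIAL, P_PERF, 8 * P_PERF + 8 * E_PERF)   # 8 P + 8 E
all_e = runtime(SERIAL, E_PERF, 40 * E_PERF)                # 40 E-cores

print(f"all-P {all_p:.3f}  hybrid {hybrid:.3f}  all-E {all_e:.3f}")
```

With these particular numbers the hybrid layout finishes first and the all-E layout last, because the serial part dominates once it is stuck on a slow core. Change the serial fraction and the ranking can flip, which is the usual caveat with Amdahl.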
Only the ARM equivalent of "Little cores", like the Cortex-A55, belong only on mobile devices. And that's because they're optimized mostly for power. Intel only has X1-like and A78-like cores (I.e. performance-focused and PPA-balanced) so they're already doing their job correctly.
And yes, AMD should indeed split their core designs into a performance-focused core and a PPA-focused core. That would allow them to boost the single-thread performance to better handle the main threads of processes, while not losing sight of the need to balance the PPA of the rest of the processor.
And it would allow AMD to take advantage of the optimizations being done for Intel. x86 games and applications will just "know" what to do with heterogeneous microarchitectures after a while.
First of all, you misunderstood what I wrote. I didn't insinuate that Intel's E-cores are good/bad. I wrote that the combinations of P+E is bad for server duties (ie Hybrid Processing). Having a setup that is Homogeneous Processing makes much more sense for servers, and even ARM figured this out early. It fixes a lot of bugs, issues, and security flaws you may have on the software side... especially knowing that you're catering to multiple tasks and multiple users. And to add to this, where Hybrid Processing is great for computing where energy is a limited quantity, you don't really have this issue when it is connected to the grid. I'm not even talking about thermals, but just the access to electricity.
...now a little bit of background: Intel has been big about recycling their cores. From the primitive Pentiums, to more advanced Pentiums, to rebranded Celeron cores, and further miniaturised as the Atom cores. These are analogous to "very small" Cortex A53/A55/A510 cores. I think Intel has finally put that architecture out to retirement.
I think their early Core2 evolved into Core-i, then into Core-i Sandybridge, and then morphed further into Core-i Skylake. The subsequent iterations have been refreshes of the Core-i Skylake architecture. These are analogous to "medium" Cortex A78/A710 cores. I read that it was this microarchitecture which Intel adopted and then further miniaturised, which resulted in the new Intel E-cores. These E-cores are more analogous to "small" Cortex A73 cores. Based on that analysis/rumour, I don't see too many improvements coming to them in the future.
Intel's latest "very large" cores are huge. The new P-cores are based on an entirely new microarchitecture. So it's understandable that they won't be too optimised, and will be leaving both performance and efficiency on the table. In subsequent evolutions it should catch up. That's been the historical precedent.
...that was a mouthful, but needed to be said first...
So with all that in context, we are in the transition phase at the moment. There are the current Intel server products based on their old cores (Skylake variants), upcoming servers based on a large array of E-cores, and premium servers using a smaller array of large P-cores. The market will still be dominated by AMD, whose "large cores" are more analogous to the Cortex-X1/X2, and they will offer a better balance between the options. In time, you will find Intel throws more money, time, and effort at evolving their P-cores, their bread and butter. And these advances will catch up to or surpass their solutions using E-cores; that much is obvious.
It is likely that the server market will get busy, and most or at least the lower-level stuff, will be lost to solutions built on ARM v9. So the Intel E-core servers will become obscure, and likely phased out by Intel themselves. AMD will be fighting for the top crown with their next-gen processors (Zen 4/5) using newer microarchitecture and techniques like 3D-Cache. Intel may still be able to grasp the top-end premium server market using new-generations of their P-cores. So that's what the future is shaping up to be. But forget about a combined Hybrid Processing server either from ARM, Intel, or AMD.... those will be designed for portable devices like outlined above.
"I think their early Core2, evolved into Core-i, and then to Core-i Sandybridge, and then morphed further for Core-i Skylake."
Depending on how one looks at it, the current P cores (or Golden Cove) are in an unbroken descent from the P6 microarchitecture, structures being widened and bits and pieces added over the years. Sandy Bridge, while still under this line, had some notable alterations, such as the physical register file and micro-op cache. Indeed, SB seems to have laid down how a core ought to be designed; and since then, Skylake, Sunny, and Golden Cove haven't done anything new except making everything wider.
The E-cores descend from Atom, which was an in-order design reminiscent of the P5 Pentium, with some modernisations. SMT, SSE, higher clocks, etc. Along the way, they've implemented out-of-order execution and gradually built up the strength of these cores, till, as of Gracemont, they're on par or faster than Skylake while using less power. People laugh at this idea but I believe that this lineage will someday replace the P branch. (Or perhaps an ARM or RISC-V design will supersede both.)
" The new P-cores are based on an entirely new micro architecture " i doubt that, IF they were an entirely new micro architecture, would they not be Gen 1, and not Gen 12 ?
I don't disagree with you, but I do disagree that Intel's motive for hybrid was efficiency. They had to do it to compete. I have a 5950x and a 12900k, both set to unlimited power, both on the same central water loop. In one of my workloads the 5950x uses 123 watts and hovers around 55C; the 12900k in the same workload uses 317 watts and is constantly riding the thermal throttle at 100C.
AMD is already waaaaay more efficient without moving to hybrid. Why should they bother with the complexity?
> I do disagree that Intel's motive for hybrid was efficiency.
In the case of desktops, the benefit of the E-cores isn't power-efficiency, but rather area-efficiency. In the same area as 2 P-cores, Intel added 8 E-cores. Given a roughly 2:1 ratio in P-to-E performance, this should yield the performance of a 12 P-core chip at the area (i.e. price) of only 10 P-cores.
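The area arithmetic above works out as follows. A tiny Python sketch using only the ratios stated in the comment (8 E-cores in the area of 2 P-cores, roughly 2:1 P-to-E performance) plus the 8 P + 8 E layout; the variable names are mine:

```python
P_CORES = 8
E_CORES = 8
E_CORES_PER_P_AREA = 4   # 8 E-cores fit in the area of 2 P-cores
P_TO_E_PERF = 2          # one P-core is roughly worth two E-cores

# Express both performance and area in "P-core equivalents".
perf_in_p_cores = P_CORES + E_CORES / P_TO_E_PERF
area_in_p_cores = P_CORES + E_CORES / E_CORES_PER_P_AREA

print(perf_in_p_cores, area_in_p_cores)  # prints: 12.0 10.0
```

So, under those rough ratios, you get about 20% more multithreaded performance than an all-P chip of the same area.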
Also, if you look at the marginal power added by those E-cores, I do think there's a good case to be made that they burn less power than 4 P-cores would.
AMD has 128-core Bergamo coming next year and 256-core Turin in 2024, at the same time as Sierra Forest, presuming Intel can execute and not delay it like all the other launches. TSMC's execution has been best in class, and so has AMD's in the DC in recent years. The Zen 3 core is double the size of Intel's E cores, but Zen4D is going to cut down on cache size and probably AVX-512 and other things cloud hyperscalers don't need, so Zen4D is going to be AMD's E cores, only with much higher IPC and better energy efficiency. Sierra Forest is a response to what AMD is doing, not the other way around, and it will be released after AMD's chip as well. And if Intel has further delays, can't make enough volume, or has a hiccup in its process, it will be a complete disaster. They are betting the farm, ramping capex so much that they have negative cash flow on top of large amounts of debt.
From what I have read Bergamo is expected to have performance about that of Zen3. That would put the IPC a good 30% higher than the current Gracemont E-cores.
"AMD Bergamo is going to be the cloud variant. On a call before the event, Mark Papermaster and Forrest Norrod said that these are different chips that leverage the same ISA, but that there are cache tweaks here to get to 128 cores. The idea behind Bergamo is that cloud computing workloads are different than the traditional workloads and so AMD can optimize for more cores per socket instead of optimizing for HPC performance. AMD also is looking at the Zen 4c to provide better power efficiency per core. If we look to Arm vendors, the Zen 4c is seemingly aligning AMD’s offerings more towards a customized cloud CPU product like the Ampere Altra (Max) instead of a traditional large core CPU." https://www.servethehome.com/amd-bergamo-to-hit-12...
While that isn't exactly an Intel E-core, Zen was already vastly more power efficient than Core. Therefore it isn't out of the question that they could have E-core power levels but Zen3 performance. That said, we won't really know until it is released later this year.
Impact of Intel Plant/Equipment/Construction investment impact here, look in comment string for the sufficiently detailed financial production assessment posted on January 30;
Best outcome: Intel succeeds in gaining process leadership and IDM + foundry reconfiguration. Worst outcome: Intel reconfigures under Chapter 11 bankruptcy in the middle of construction.
Basically, Intel won't be competitive in the server market until 2024 when it will have node parity with AMD and probably core-count parity as well with its E-core product.
Before 12th Gen was released, people were expecting it would take Intel until 2025 to get parity on desktop and server. While 12th Gen is impressive, AMD has been dragging their feet on Zen3D, probably because they have no reason to release the CPU. They are selling every CPU right now and Zen3 is still competitive with 12th Gen, albeit a bit slower, so no rush on things.
Yes, AMD sold all AMD could sell from 676,000 wafers in 2021; 119,108,089 components is a company record averaging 30 M per quarter up 55.8% from 2020.
Zen 3D is hot for the package area to pull off the heat, potentially down to the material composite vis-a-vis the composite relied on for TR / Epyc, licensed from Fujitsu for heat dissipation. 3D definitely requires a cooling solution. A 32 MiB SRAM slice adds +12W (you can figure it out from Epyc cache TDP variance), and 5800X OCs hit 147W and slightly higher. 3D at 105W is a phantom. 3D is also expensive: +$45 from TSMC, which makes it $210 to AMD before the OEM markup; x1.55 is fairly traditional for AMD to the OEM, but it can go up to x2. I estimated AMD made 15 M 5800X 3D (no 5900, primarily on heat) and am starting to wonder where all the AMD 3D hype has gone, because there hasn't been a word in weeks. mb
3D cache is laid over the region of existing die cache, which doesn't get nearly as hot as compute logic does. So no, it will not result in hotspots, and Si is a fairly good conductor of heat. The previous hotspots retain roughly the same thermal contact and conductivity capacity. There is probably no more than a 10% overall increase in heat, which may well be offset and then some by AMD's new power efficiency features in Zen 3+. Who knows, it may even run cooler. I assume it will be 6nm indeed, but even at 7nm, it is doable.
If you want zen 3d, you probably want games. And if you want games, there's no point overclocking the cpu manually, you may even lose boost clocks. So no, it matters not what 5800x OC hits, even more so if the 3d version is zen 3 +.
45$ to bind chips in the millions scale sounds way too much. Where are you getting those numbers?
What OEM markups? And if your vendors have it, why are you bringing that against amd? Shill much?
abit, thanks for your thermal observations. I trust 3D can be an inexpensive database and modeling space, not just a game toy. On the thermals I follow you, and we shall see; I acknowledge your thoughts, and we can pick this conversation up when there are knowns. The package area, which is not the TR package area, is my concern for heat transfer out; the package solution was not entirely thought out. Lid-off 5900 engineering sample held up by Dr. Su for all to see, yeah, because in the lab it's being sprayed with freon or whatever is relied on today from a spray can.
On power efficiency features good thought. 6 nm, V5x shrink? Maybe.
Where's the $45 come from? TSMC cost : price of adding a 32 MiB SRAM slice to 7 nm 5800X.
SRAM slice is 36 mm2; $5.40 for fabrication, x2 operating cost to dice and test; markup on marginal cost = marginal revenue = price = $10.80 NOW + $10.80 package and final test = $21.60 finished good then x2 markup = $43.20. It's the mark ups that add to price and AMD is not an Apple with reoccurring revenue products.
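For anyone who wants to check the markup chain above, here it is as a short Python sketch. The dollar figures and x2 multipliers are the ones quoted in the comment; the function and parameter names are just labels I made up for the steps.

```python
def sram_slice_price(fab_cost=5.40, opex_multiplier=2.0,
                     package_and_test=10.80, finished_goods_markup=2.0):
    """Walk the quoted chain: fabrication -> dice/test -> package -> markup."""
    diced_and_tested = fab_cost * opex_multiplier       # $10.80
    finished_good = diced_and_tested + package_and_test  # $21.60
    return finished_good * finished_goods_markup         # $43.20


price = round(sram_slice_price(), 2)
cost_per_mm2 = round(5.40 / 36, 2)  # implied fabrication cost for a 36 mm2 slice

print(price, cost_per_mm2)  # prints: 43.2 0.15
```

The chain does land on $43.20, i.e. the "$45 thereabouts" figure, with the two x2 markups accounting for most of it.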
Note Vermeer across full run grade SKUs priced to AMD: the 2 CCX dodeca and hexadeca are expensive to produce, and across all core grade SKUs the average TSMC price to AMD is $163 for V5x, essentially x2 Intel design production cost on TSMC 7 nm fabrication in parity with iSF10/7, but there is still packaging and finished goods markup; marginal cost $81.71 + marginal revenue $81.71 = $163.43 now, + $45 or thereabouts for 3D = $208 to AMD. Otherwise TSMC can produce a one CCX V5x for around $39 hard cost, but then add OpEx and markup at x2 to AMD. At V5x end of run the average octa + I/O might have fallen to $140 to $154, but TSMC increases pricing. Matisse 3x full run averaged $131.32, but now Vermeer 5x is $163.43, +24%, and maybe AMD took a couple of points more too.
If I'm within 10% that's good enough for government work. However, I will point out that a materials engineer who costs on the molecular weight of inputs will laugh at some of my tried and true Intel micro production economic methods.
Mike Bruzzone FTC Docket 9341 Intel consent order monitor. Docket 9288 and 9341 federal attorneys enlisted discovery aid and micro production economics. Former Cyrix, ARM, NexGen, AMD, IDT Centaur employee or consultant.
abit, AMD's large cache application performance strategy is for the price premium AMD always seeks, to sustain AMD gross margin while by necessity incorporating the TSMC foundry markup; AMD needs to stay 1 to 1.5 nodes ahead of Intel to charge a high (say a premium) price for component area performance advantage, while continuing to make up TSMC margin within AMD's price. That's why AMD has to be selling Genoa; as of q3 2021, with iSF10/7 reaching cost parity with TSMC 7, AMD had to move to the next node.
On a design production basis that includes OpEx, Intel's cost is 54% less than TSMC's price to AMD, and on a manufacturing cost basis 77% less than TSMC's price to AMD for desktop components, albeit Intel's cost range is similar to TSMC's at the same 7 nm process node.
abit, AMD has not caught up with Intel's margins yet: q4 at 50% and 53.6% respectively, but getting closer. Intel got caught in q3 and q4 having to bundle a lot of minimally at-cost (maybe some freebie too) product into sales deals to clean out Xeon and Core inventory that was rotting on Intel shelves. Intel offed it to OEM dealers and channels. The Core line only earned Intel variable cost in q3 and q4. And Intel burping slack surplus product caught up to AMD in q4, where AMD consumer product was also 'forced' down to earning only its variable cost coverage. Only Xeon and Epyc margin paid ahead in q4. mb
You are pulling stuff out of your rear end. Amd has the advantage of not having to bother about process development. TSMC has a lot more customers, clients and output, so its lead in process RD is understandable.
All intel has to do to be in ruins is fail another node. Their targets for 10nm were unrealistic, and it took intel years to get over that. Well, their targets for 7nm are just as unrealistic, so intel has a big chance to stumble once more by trying to get ahead of itself. Renaming things doesn't change anything.
Furthermore, intel is heavily leaning on advanced packaging technologies, where amd is more conservative. Even if intel does its own manufacturing at 100%, that's still a lot more additional cost and yield issues. It is yet another huge risk of messing things up.
Last and certainly not least, in desperation that it struggles to make a good cpu uarch, intel is aimlessly spraying new product plans in all directions. Rather than focusing on its weakness, intel is desperately trying to shove the market a bunch of stuff nobody asked about, as if that's a substitute to what intel's missing. Diverging efforts will only dilute and diminish intel's ability to produce quality where it actually matters. Intel is not listening to what clients are calling for, instead is pushing to shove them with stuff that makes no sense.
Sorry to break it to you, but intel is not out of the woods yet. It may be another couple of years before it produces a strong enterprise cpu. Amd's sole limitation was production capacity, and having surpassed intel by market cap puts amd in a position to book a lot more capacity, in the absence of an adequate enterprise solution from intel, and with higher output, amd may well double its server market share annually and capture 50% of it.
"Amd has the advantage of not having to bother about process development"; yes
"TSMC has a lot more customers, clients and output, so its lead in process RD is understandable"; TSMC has mass, leverage and a lot of legacy equipment cash cow, much more than Intel.
"Intel targets for 10nm were unrealistic", agreed, but Intel was also ripped off for $350 billion in hard cost losses over the last decade that could have funded R&D, PE&C and Intel early retired capable employees. Ultimately Otelinni regime sabotaged Intel and the Board participated in that sabotage and theft from the Entity.
"Their targets for 7nm are just as unrealistic" SF10/7x is in parity with TSMC 7 cost wise and TSMC has shown 5 nm is an incremental node on design process and tools. Could Intel screw it up? We'll know soon but 5 is incremental. Gelsinger so said 5 nodes in 4 years on tick tock is 2.5 nodes. But is 5 nodes on Rocks doubling of cost every node presents hurdles and potential deterrents.
"Furthermore, intel is heavily leaning on advanced packaging technologies" Intel is ahead in SIP and wants to sell package as a service between TSMC Chandler and Intel Rio Rancho. Ultimately this is about defining next generation of automated chip pack and materials handling and who best to do that SO Intel gets its way, TSMC. Front end has to mesh with backend production and materials handling wise.
"Intel is aimlessly spraying new product plans in all directions" when hasn't Intel but after Rocket, Cooper and Ice I suspect Intel has rethought its customer advance audits and I know Intel is cost optimizing which means getting the product right and eliminating waste; unnecessary product and surplus. On Intel division investments I have no comment on no or to little data.
I think Intel is out of the woods but has not won the race.
"Amd's sole limitation was production capacity", agree. 676,817 wafers in 2021 produced 211,207,983 good die both octa ccx and APU plus 54,786,409 i/o for 118,917,697 that in the end I tallied upwards of 119 M which is a corporate record.
"Amd may well double its server market share annually and capture 50% of it"
On Epyc high volume determination AMD saw a doubling in production volume from 2020 to 2021 minimally plus 82.6% and corporations plan on round numbers so AMD's aim was to double Epyc production in 2021 and likely did.
From a production standpoint where Epyc Milan and Xeon Ice lake are run end entering Genoa in risk production since q3 and Sapphire Rapids in risk production because Intel never waits at xx% whole product on production volume here stated last two quarters on financial reconciliation.
Commercial production share
AMD = 17.09% Intel = 82.91%
Adding AMD Rome to Milan v Ice Lake only back to market share on channel
AMD = 80.13% Intel = 18.86%
Adding Intel Cascade Lakes establishing Intel still maintains monopoly share of the commercial server and workstation market.
AMD = 3% Intel = 97%
Adding AMD Naples and Intel Skylake
AMD = 1.81% Intel = 98.19%
Adding Intel Broadwell v4
AMD = 1.00% Intel = 99.00%
Adding Haswell v3
AMD = 0.55% Intel = 99.45%
Pursuant Mercury Research AMD server share at 10.7% the closest I can get is AMD Milan + Rome + Naples v Intel Ice and Cascade Lake Gold Silver refresh;
Raja says Intel is shipping SPR in q1 and sampling SPR-HBM, and add DSA, tiled matrix operations, bfloat16, pcie5, cxl, ddr5 ... so when will AMD match all those features?
jaynor, and you actually believe what intel says ? all things considered, that track record for the last few years, sucks. IF you weren't an intel shill, then you would believe this, when it is actually out.
Aside from DSA and AMX, I think the answer is their next Epyc.
How much DSA and AMX are really worth is yet to be determined. I'd imagine Epyc's additional cores can more than make up for the lack of DSA. As for AMX, I expect it'll offer unmatched inferencing performance *for a CPU*, but the world seems to be moving beyond CPUs for that sort of thing. Time will tell.
DSA, for any who don't know, refers to the Data Streaming Accelerator engine, in Sapphire Rapids. It's a "high-performance data copy and transformation accelerator". There's some info on it, here:
The thing to keep in mind is that anything one of these engines does could also be performed by a CPU thread. Obviously, DSA is much smaller (and more limited) than a CPU core, but if you've got more cores (i.e. Epyc's 64 cores vs. SPR's 56 cores), then that's more threads (16, in this case) you could spend on async data movement, if necessary.
So, while DSA might be a win in terms of performance per mm^2 of silicon, I don't see it as a huge differentiator or net advantage for SPR.
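The "16 threads" figure above is simple arithmetic, sketched here for anyone who wants to plug in other core counts. It assumes 2 hardware threads per core (SMT2) on both parts, which is what makes the 16 work out; the function name is just illustrative.

```python
# Assumption: both CPUs run 2 hardware threads per core (SMT2).
THREADS_PER_CORE = 2


def spare_threads(cores_a: int, cores_b: int,
                  threads_per_core: int = THREADS_PER_CORE) -> int:
    """Extra hardware threads CPU A has over CPU B."""
    return (cores_a - cores_b) * threads_per_core


# Core counts quoted in the comment: 64-core Epyc vs. 56-core SPR.
extra = spare_threads(64, 56)
print(f"Extra threads available for async data movement: {extra}")
```

Whether those extra threads actually match a dedicated copy engine depends on the workload; this only shows where the thread count comes from.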
As always with the Intel pushers, they're not interested in the utility of Intel's unique features so much as them not being things that AMD have. They never talk about the unique features that AMD have / have had that have similar niche appeal, for example the VM security features AMD built into Epyc.
FWIW, I try to stay non-partisan. My goal is to try and present a realistic interpretation of the facts, as I understand them. Regardless of whether it's a key differentiator, DSA is undeniably new and interesting.
Same for AMX, I might add (i.e. new and interesting) - can't wait to see its real-world performance!
There are rumors spreading that even SR will provide some of its features like avx512, amx, hbme etc. as a software defined (purchased) feature. Would be glad if this would be mentioned here -- if it's already settled or not.
Finally, Intel Atom's aspirations coming to fruition. Back then, if i remember correctly, it was meant to provide a staggering amount of cores and as a low power CPU for IoT/cheap devices. But no, Intel left Atom 1-2 Generations behind Core for years until recently
Adding some analytical data and notations to Dr. Ian’s commentary, think of it as a side bar;
Dr. Ian; “Intel is quoting more shipments of its latest Xeon products in December than AMD shipped in all of 2021, and the company is launching the next generation Sapphire Rapids Xeon Scalable platform later in 2022”
Camp Marketing; on INTC 10 Q/K on channel product category and price data Intel sold 10,216,112 Xeon in q3 and 10,105,561 in q4. Intel quarterly shipments are down from 40 M units per quarter at Skylake/Cascade Lake peak supply.
Dr Ian, always tactful, “response to Ice Lake Xeon has been mixed”
Camp Marketing: On CEO Gelsinger and DCG GM Rivera 1 M unit Ice Lake sold in q4, this analyst has calculated approximately 1.3 M full run to date and if 2 M that’s no more than Pentium Pro P6 large cache server volume 1997-97. Xeon Ice is a late market run end dud ala Dempsey between Netburst and Core. Ice suffers the same offered between Cascade Lakes and Sapphire Rapids. Customers want Sapphire not Ice.
This analyst has Sapphire Rapids shipping since q3 parallel Genoa 5 nm, both in customer direct risk production volume and the reason is AMD lost its 7 nm area for performance cost advantage on TSMC markup in q3. Subsequently, AMD had to start shipping Genoa to make their margin incorporating TSMC markup. AMD essentially charges a price premium on / above foundry mark up and needs to stay 1.5 nodes ahead to make up the difference in price ensuring AMD gross margin objective.
Unlike Arc consumer GPU, where Intel risks being mauled by enthusiasts for shipping a less than whole product, in the commercial space Intel Xeon at xx% whole ships so long as customers can program the device. In this competitive situation where Genoa is shipping, Intel does not wait.
Dr. Ian: [Intel] digesting their current processor inventories (as stated by CEO Pat Gelsinger).
Camp Marketing; “Digesting” a clever term referring to the Intel Skylake / Cascade Lakes monopoly surplus overhang sitting in use and in secondary channels for resale, 400 M units worth were sold. This can be a good thing on refurbishing the installed base to dGPU compute if DDR 4 is not system bus limited for sGPU compute on cache starved XSL/XCL control plane processing. The channel and installed base definitively want to keep these servers financially productive and acceleration is the key.
Pursuant “digesting” Intel dumped on AMD in q3 and q4 back generation Core and Cascade Lakes driving consumer components margin take on OEM price making in relation the Intel offer to variable cost for Intel in q3 and q4 and specific to AMD in q4. In q4 Xeon and Epyc are the only production categories that paid ahead earning Intel $723 and AMD $733 net per unit. Core and Ryzen/Radeon as I described q4 and Intel Core in q3 and q4 delivered net push against variable cost.
Dr Ian: referring to Sapphire Rapids, “we already know that it will be using >1600 mm2”
Camp Marketing Notation; 350 to 400 mm2 is very much in the traditional Intel sweet spot for LCC manufacturability
Dr. Ian speculating resurgence in Xeon D application specific variants?
Xeon D is a dud. No generation was supplied beyond slim volume and many sold off into NAS appliances. Difficult to say what Intel can do here [on over] segmentation [?] where past 'D' attempts were essentially rejected by the customer base.
that can't be right. AMD's own reports show most of their growth was from ASP increases, not volume. three million milan chips in one quarter shatters their past records multiple times over.
Camp Marketing has AMD commercial shipments for the year higher than Mercury Research on channel data on 10-Q/K financial reconciliation.
My commercial estimate includes Epyc and Threadripper. Epyc in quarter volume is not regular but sporadic on what the analyst believes are opportunistic production windows in relation wafer starts and AMD full line production category volume – start’s tradeoff. TSMC appears agile when it comes to production / tooling change.
2021 = range 8,320,645 to 9,620,695 units dependent q2 volume roll over into q3;
Q1 = 1,099,950
Q2 = 4,331,103 which is Rome run end production into inventory
Q3 = 1,189,776 which could be roll over from Q2
Q4 = 2,999,867
30% are Threadripper
2020 = range 4,168,967 to 4,560,973 units dependent q3 volume roll over into q4;
Q1 = 449,332
Q2 = 846,604
Q3 = 2,404,558 where some of this volume may roll over into Q4
Q4 = 1,567,108
15.2% are Threadripper
2019 = 5,714,393 of which 76.1% is Threadripper; Naples run end enters q2 2019
2018 = 6,795,562 of which 83.8% is Threadripper
For channel share AMD Milan commercial in relation Intel Ice Lake commercial, I have AMD at 28.45% for channel market share over the prior two quarters [q3-q4] and production volume share, prior two quarters, on AMD and Intel financials on channel price data at 17.09%.
Epyc $1K ASP 2021 on channel supply data;
Q1 = Milan only @ $2915.97
Q2 = Milan only @ $3155.48
Q3 = Milan only @ $3605.95
Q4 = Milan only @ $3932.50
Epyc ASP is typically driven by a skew to top core bin sales / demand fortifying that product space
TR $1K ASP 2021 on channel data;
Q1 39x0 only = $2115.35
Q2 39x0 only = $2074.77
Q3 39x0 only = $2303.44
Q4 39x0 only = $2367.36
There are two ways to calculate OEM price 1) $1K stakeholder / 3 is a traditional metric sharing the product value so there are no sales arguments; 1/3rd foundry, 1/3 to AMD, 1/3 to OEM representing NRE and margin potential. This is a highest volume procurement method and typically requires a full product line sale of grade SKUs mirroring what's coming out of finished goods production. SKUs the OEM does not want are brokered off reducing their overall purchase cost. 2) $1K / 3 x 1.55 is a typical AMD direct customer markup but it can go up to x2 on smaller volumes and specific core grade sales. Both of these are standard methods of pricing if you're in business of compute or OEM. Derivatives of OEM and SI procurement would / can include Epyc + desktop and mobile bundles all negotiated into a quarterly procurement agreement.
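A minimal sketch of the two pricing methods just described, for anyone who wants to run the numbers. It plugs in the Q4 Milan $1K ASP quoted above purely as an example input. That figure is an average across SKUs rather than a single part's list price, so the outputs are illustrative only, and the function names are made up for this sketch.

```python
def oem_price_stakeholder(price_1k: float) -> float:
    """Method 1: $1K price split three ways (foundry / AMD / OEM)."""
    return price_1k / 3


def oem_price_direct(price_1k: float, markup: float = 1.55) -> float:
    """Method 2: $1K / 3 x markup; 1.55 typical, up to 2.0 on small volume."""
    return price_1k / 3 * markup


q4_milan_asp = 3932.50  # Q4 2021 Epyc $1K ASP from the comment above

print(f"Stakeholder / 3:      ${oem_price_stakeholder(q4_milan_asp):,.2f}")
print(f"Direct, x1.55 markup: ${oem_price_direct(q4_milan_asp):,.2f}")
print(f"Direct, x2 markup:    ${oem_price_direct(q4_milan_asp, 2.0):,.2f}")
```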
AMD 2021 all up produced 119,108,089 units and holds 29.06% overall x86 market share.
Hifihedgehog the best part of the seeking alpha page, NO links to where he gets this " information " from, for all we know, its either made up by him, or how HE views the data. either way, useless posts from him is what it looks like.
My data is my own in primary research for the Federal Trade Commission. That primary research is mainly ebay WW channel supply data queried at high frequency for fidelity that AMD, Intel, Nvidia through in-house personnel also maintain, and where I duplicate that in-house function of which I am well aware as a former Cyrix, ARM, NexGen, AMD, Samsung, Intel, IDT Centaur employee or consultant. I've been in my FTC role since May 1998 that is an academic studies role for which I receive no compensation, however, am contracted by USDOJ to recover Intel Inside price fix for which I receive a percent of the federal procurement 'overcharge' recovery. I also represent 27 States AG and 82 class actions as relator, expert / advocate or witness.
Channel supply data is relied on in my academic studies role for Federal Trade Commission and United States Department of Justice retained by Congress of the United States on federal attorney enlistment; FTC v Intel Dockets 9288 and 9341 Intel production micro economist, general systems assessment and currently for Docket 9341 consent order monitoring includes AMD, Intel, Nvidia and VIA, and might as well include ARM Holdings on the competitive wrangling.
The data is public for transparency otherwise under Docket 9341 discovery requirement only AMD, Intel, Nvidia and Via would see the data. I found that ineffective for regulation and remedial activity and it's my decision charged in the task by FTC and Congress at 15 USC 5.
So base data is ebay the industry relies on it for industrial management decision making where ebay data replaced the Intel supply cipher, in 2016, on signal cipher SEC violation 'looking ahead in time up to eight quarters to project Intel revenue and margin' and where ebay is simply real time data although projectable. Following ebay data precisely is an outstanding industry management tool for executive decision making.
Specific management decision making ebay data confirms component by product category down to the grade SKU quarterly volumes for Intel and AMD competitively speaking, for managing and even determining compliment board house production volume, for channel inventory management and financial industry relies for assessment.
The second primary activity is preparing the ebay channel data; supply, volume, $1K price for production economic assessment. cost, price, margin primarily auditing for price less than cost sales. 10 - Q/K are relied for financial assessment comparing channel data for determining CPU volume discounts. Finally, for estimating by product category volumes per quarter relying on the channel data as a check. The data is also good for determining fabrication yield and by TDP and frequency splits, all sorts of component related production assessments.
The third primary research activity is systems analysis, the fourth legal assessment for monitoring AMD, Intel, Nvidia, Via compliance although Via does not really count other than one component of docket 9341. Fifth moves to assessment responsibility in technocracy, regulation and remedial activities associated with Docket 9341 Is responsible for Intel discontinuation of supply signal cipher, discontinuation of Intel Inside and multiple limiting archetypes associated with Intel Inside, securing Intel Inside processor and processor in computer buyer price fix recovery I expect to be completed this year, and monitoring Intel reconfiguration from producing for supply (that holds channels financially and has a high cost) to producing for actual demand; it's all about Intel and industry cost optimization essentially removing monopoly restraints and there are channel cartel issues also being addressed and remedied.
blah blah blah blah without sources linked in the blah blah blah you post, it's almost meaningless, as no one can see it for themselves, and compare what it says, vs what you interpret the data as being. in the end, it's personal opinion.
Qusar, I said AMD, Intel and Nvidia and I will add Mercury Research all rely on ebay data as the industry management tool that is for tracking supply, production and economics on an Intel model generally known as Total Cost Total Revenue, and 10 Q/K financial assessment is just that, and we validate each other's work. There is no one I'm aware who has challenged AMD, Intel, Nvidia, Mercury, JPR although JPR base data varies from my own but is still complementary. So do your research. mb
sure thing there mike brahzone, sure thing. again with no links to the data you are looking at, means some one could be looking at different data, and come to a different conclusion. but what ever, maybe you dont post sources, because you cant.
Hifihedgehog, my observations are a collaborative form of group contribution that also offer data for thesis development / refinement and decision making. Mostly for industrial management but also engineering decision making frameworks.
Definition of SPAM. send the same message indiscriminately to (large numbers of recipients). Or irrelevant or inappropriate messages sent on the internet to a large number of recipients.
My contributions are collaborative and unique in every occurrence and are meant to spark insight and add value. Please consider your reversal, sorry, but think about it.
> Stop spamming us with your Seeking Alpha armchair critiques of the market.
It's easy enough to ignore, if you don't care to read it.
I don't mind getting some market insights, because that's not something I generally pay much attention to. However, the business end of things can shed much light into the behavior of these companies - what products they introduce and when.
whatthe123, Clarification I noted in q3 and q4 2021 'Milan only' volume but that is not correct on AMD losing its 7nm cost advantage to iSF10/7 on TSMC markup adding to AMD cost on TSMC foundry price to AMD. I said up comment string and here Genoa has been shipping in risk volume q3 and q4 to sustain AMD gross margin on customer price incorporating the TSMC mark up. On Rome risk production volume q3 into q4 2019, Genoa has likely shipped minimally 300 K to date up to 447,986 units. I also note Sapphire Rapids shipping in risk volumes in the same time period because at xx% whole Intel will not wait when commercial customers can program the device in this AMD competitive situation. mb
I personally want Ice Lake. The data center I run does some small cloud hosting specifically for SAP & SAP HANA. Right now Ice Lake isn't certified to run production SAP HANA on VMware. Being able to use Ice Lake instead of any previous Xeon Scalable means you don't need L CPUs to run huge amounts of RAM. Also means I can use a 2 socket instead of a 4 socket server which is cheaper to purchase.
schujj07, well, Ice Lake is certainly on clearance sale. Are flash arrays still used for data bases or do you need DRAM on the CPU system bus? L CLr is on channel sale and in higher availability than Ice. What do you know of Barlow Pass? Does Optane work for structured data or transaction processing? mb
People still use all flash SANs for DBs. In fact a lot of major SAN vendors are going all flash on their high-end. You get much better storage density, lower power consumption, and massively higher iops with the flash. That doesn't even count the higher reliability of flash vs. spinning disk.
SAP HANA is an in RAM DB. Ice Lake is certified for production HANA physical appliances but not for VMware. HANA has a very specific way in which it is covered for PRD. Say your DB is 900GB, you need 900GB RAM just for that VM. Without the L series CPUs you cannot get that much RAM on a single socket for non Ice Lake Xeons. That means you are required to have dual sockets. However, that one VM gets every ounce of RAM and CPU from both sockets by SAP requirements. Your 900GB DB now gets 1.5TB RAM. With Ice Lake I can do that "cheaply" on a single socket with 1TB RAM and then have a DEV or QAS DB running on the other socket. This is one reason we want to have Epyc eventually get PRD certified.
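The sizing logic in that example can be sketched roughly as below. The 768GB per-socket figure for the older non-L Xeons is an assumption, inferred from the 1.5TB dual-socket total above, and the whole-socket allocation rule is modeled only as described in this comment, not taken from SAP documentation.

```python
import math


def hana_allocation(db_gb: int, ram_per_socket_gb: int) -> tuple[int, int]:
    """Return (sockets consumed, RAM in GB handed to the PRD VM).

    Models the rule described above: the production VM takes whole
    sockets, and gets all of the RAM attached to every socket it uses.
    """
    sockets = math.ceil(db_gb / ram_per_socket_gb)
    return sockets, sockets * ram_per_socket_gb


# Assumed 768GB/socket on older non-L Xeons vs. 1TB/socket on Ice Lake.
print(hana_allocation(900, 768))    # two sockets, 1536GB (1.5TB)
print(hana_allocation(900, 1024))   # one socket, 1024GB (1TB)
```

In the second case the other socket stays free for a DEV or QAS database, which is the saving being described.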
Optane is supported, only in App Direct mode, for PRD and does help a lot on restarting the massive DB. However, only Optane P100 is supported and only on Cascade Lake CPUs. Again this is all in a VMware environment but if you are a cloud provider that is what you are going to use. Also if you run on prem there still isn't any reason to not be virtual just for ease of migration and restart on host failure. https://wiki.scn.sap.com/wiki/plugins/servlet/mobi...
The other pain with HANA are storage requirements. It is hard to find a hyper-converged storage that is certified for PRD. Most certified storage are physical SANs with FC connections. I would love to run it on something like VMware vSAN instead. The more local access of vSAN vs traditional SAN makes latency lower. I can also get higher iops and use any disk I want. For example, an HP SAN won't work with any non HP branded disk (vendor locking). Those disks are then sold at a massive markup. 960GB 1DWPD SAS SSD refurbished run $1k/disk with new being like $1500/disk. Getting that same size and endurance from a place like CDW brings the cost down to under $500/drive for SAS or under $300/drive for NVMe (Micron 7300 pro for example). While my license cost is higher for vSAN, I can load up my host with all NVMe storage and use Optane SSD for my write cache (I actually have an array like this right now). Running that on 25GbE gives awesome performance that is easily scalable. I can easily add more disk and if need be add faster NICs to handle more data.
schujj07, thank you for a thorough and detailed assessment of your SAP HANA platform and requirement, wants and benefits, subsystem pros and cons, price differences / tradeoffs, very interesting. mb
My own take on Ice Lake SP is that it's not a bad CPU, just badly-timed. It delivers needed platform enhancements, AVX-512 improvements, better IPC, and an aggregate increase in throughput vs. Cascade Lake (thanks to higher core-counts).
That's not to say the lower peak clock speed and power-efficiency aren't areas of disappointment. However, other than some single-thread scenarios, I'm not aware of anything about Ice Lake that's actually *worse* than Cascade Lake.
My general understanding is it was too little too late to have broad market appeal. Were it not for AMD's inability to deliver greater volume combined with Intel's flexibility to discount their products as much as needed to encourage purchases, it probably would have hurt Intel quite badly.
> it was too little too late to have broad market appeal.
That's basically what I was trying to say. If it had come out when originally planned, it would've been seen in a rather different light. Especially if the 10 nm+ node on which it's made had performed better.
I'd suggest that's specific to the Sky Lake update of Xeon D. I think the Broadwell generation did rather well. Intel merely forgot one of the key ingredients that made it good: power-efficiency.
So, it's plausible there could be an E-core based equivalent in the future. However, it's equally plausible that some of the ongoing Atom product lines are already growing into the niches where Xeon D was initially successful (e.g. power-constrained edge servers for things like cellular base stations).
mode_13th, Atom into base station. I monitor for industrial embedded Atom in the channel and they don't exist and Atom sales at Tremont / Jasper into consumer markets are way down from prior generations. What does spark "at the edge" for "cell base stations" is ARM up against x86. ARM has two network infrastructure fronts. One from the edge up and one from core; data center, down network 'head end' infrastructure . . . building a railroad from two end points toward middle.
Avoton and Rangeley, followed by Denverton, were meant to quash ARM incursion at the edge and did not. ARM owns cell base station.
> Atom sales at Tremont / Jasper into consumer markets are way down from prior generations.
I presume that's because they're low-margin products. So, Intel is de-prioritizing them, given that it's constrained on the supply-side.
Also, I can tell you that other component shortages are making life hard for OEMs and ODMs. There might be less "pull" for these CPUs from their end, if they're having to divert what components they can get towards *their* higher-margin products. Also, because when you *can* get components in short supply, the prices are inflated - making lower-margin products much less profitable (if at all).
That motherboard photo shows how unserious enterprise is about performance. Notice how the RAM boards have no tall aggressive-looking spreaders, nor rhinestone designs, nor RGB.
"Yes, it sounds like what Intel’s competition is doing today, but ultimately it’s the right thing to do"
I think it would be a step in the wrong direction. Their Foveros base IO tile seems a better solution for the future than the non-scalable sprawling distance between io tile and compute tiles.
they're still using foveros when the bandwidth is required like their aurora gpu. having an IOD is pretty much required to avoid the design flaw of sapphire rapids where they had to mirror features on every die. sapphire rapids approach may end up faster for memory access but it makes it difficult to scale up and down the stack.
The Intel LGA 4677 is going up against AMD's LGA 6096, while many of the pins will be for power the other half *must do something*. It's probably future-proofing for PCIe 6 and additional DDR6 channels.
It's nice to know that the socket will last a few extra generations, if that's the takeaway.
AMD is also going 12 channel DDR5 for next Epyc. Intel is only going to be 8 channel DDR5. Once again Intel didn't put RAM density into their decision making for their newest servers. I wonder if after SPR they will go to 12 channel or will they be late to the game again like they were with 8 channel.
Further note on RAM density. When virtualizing, RAM is your biggest constraint when it comes to number of VMs that can be run on a host. While hypervisors do RAM compression, ballooning, and other things to allow the over allocation of RAM, performance drops very quickly across all VMs on the host once RAM over allocation happens. I've seen performance tank to the point of applications failing at a 10% RAM over allocation. The hosts I manage are all dual socket 32c/64t Epyc Rome's with 1TB RAM. I could easily add more VMs to each host if I had extra RAM. I'm at a steady state 10-15% CPU usage and 50% RAM usage. The most popular DIMMs are 64GB for DDR4. Zen4 will give me 768GB/socket (1DPC) vs 512GB for Intel. This is why RAM density is so important for virtualization and Intel is behind again.
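The per-socket figures above fall straight out of channels x DIMMs per channel x DIMM size. A quick sketch, using the 64GB DIMM size and 1DPC from the comment; the function name is only illustrative.

```python
def ram_per_socket_gb(channels: int, dimm_gb: int,
                      dimms_per_channel: int = 1) -> int:
    """Max RAM per socket for a given channel count and DIMM size."""
    return channels * dimms_per_channel * dimm_gb


DIMM_GB = 64  # the "most popular" DIMM size quoted above

zen4 = ram_per_socket_gb(channels=12, dimm_gb=DIMM_GB)  # 12-channel DDR5
spr = ram_per_socket_gb(channels=8, dimm_gb=DIMM_GB)    # 8-channel DDR5

print(f"12 channel, 1DPC: {zen4}GB/socket")
print(f" 8 channel, 1DPC: {spr}GB/socket")
print(f"Extra RAM per socket: {zen4 - spr}GB ({zen4 / spr:.0%} of the 8 channel figure)")
```

At the same DIMM size and 1DPC, the four extra channels are worth 50% more RAM per socket.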
hh . . . I'm recognizing the author for a valuable contribution in a continuing audit. Anywhere you see "mb", here, Seeking Alpha, on tech tube I'm auditing looking for valuable contributions to a whole audit. Provide some valuable observation and I'll recognize you too.
Note people don't generally get back, return to, what's going on in the comment string. They post (for posterity?) and then there is no interaction no feedback loop. In engineering that can be the cause and is referred to as 'systems error'. Confirming, registering the connection is important and a best practice. mb
You are absolutely spamming the comment section with "mb" multiple times as the entire message. That is by definition, spamming - the reason you are doing it does not matter. This is common sense etiquette for comment sections.
As for crap, you have been posting lots of messages making all kinds of wild and unsubstantiated claims with respect to units/wafers sold, margins, etc. without providing any sources. For all we know, you are just pulling those numbers out of your ass. It is not our job to hunt down your sources for you to verify your claims. In fact, the only time I could see that you linked to a source, it was to a site that always publishes sensationalist and contradictory analyst reports within days/hours of each other. In addition, those reports are typically based on technical inaccuracies/unrealistic assumptions of economists, who I can only assume, have no real technical knowledge or are intentionally acting maliciously. So, any analysis from that site is far from a trustworthy source of information.
Crap might have been a bit harsh, but the constant spamming is incredibly frustrating and, frankly, makes one unwilling to give you the benefit of the doubt on your non-spam messages.
Ok Vlad I acknowledge your rationale pursuant the method I meant to recognize another individual's observations interesting to me as analysts and auditor and will reconsider how to do that without the initial's track marks Mike
If you're interested in keeping track of which comments you've read, one option is to refresh the page once per day and just search for all posts with the previous day's date. That's basically what I do, when I want to track a discussion. It also helps me avoid burning time reading these comments more than once/day.
I'm in a similar situation. We have dual socket, 2x 8 core Ice Lake Gold Xeons, 512GB of RAM per node, and we're hitting RAM constraints way before CPU. Even with dozens of Windows Server and Linux VMs, CPU sits under 25% utilization, RAM goes above 50%, which we want to avoid for failover reasons (approval requests to add RAM are met with "reduce RAM on VMs" ugh)
This seems like a perfect use case for tiered memory (see my post about CXL memory). Because oversubscription is so painful, you need to have RAM for your guests' full memory window. However, that's not to say that all of the RAM needs to be running at full speed. For instance, the "free" RAM in a machine that's serving as disk cache will tend to be fairly light duty-cycle and is an easy target for demoting to a slower memory tier. Watch this space.
Use of CXL to extend RAM into a RAM pool is an interesting option. Right now that isn't a thing but could be in the next couple years for sure. I wonder how they will do redundancy for a RAM pool. If a host crashes that can take down quite a few VMs. However, if a RAM pool crashes that could take down 1/2 your data center. In many ways I think it would have to be a setup like physical SANs. For sure this will be interesting to watch how it is done over the next decade. At first I can see this being too expensive for anyone who isn't like AWS or massive companies. My guess is for smaller companies with their own data centers it will be at least 10 years before it is cheap enough for us to implement this solution.
> I wonder how they will do redundancy for a RAM pool.
For one thing, Intel is contributing CXL memory patches that allow hot insertion/removal. Of course, if a CXL memory device fails that your VM is using, then it's toast.
There are techniques mainframes use to survive this sort of thing, but I'm not sure if that's the route CXL memory is headed down.
> if a RAM pool crashes that could take down 1/2 your data center.
I think the idea of CXL is to be more closely-coupled to the host than that. While it does offer coherency across multiple CPUs and accelerators, I doubt you'd use CXL for communication outside of a single chassis.
"I think the idea of CXL is to be more closely-coupled to the host than that. While it does offer coherency across multiple CPUs and accelerators, I doubt you'd use CXL for communication outside of a single chassis."
From the little bit I have read about CXL memory, what I get from it is you would have a pool or two in each rack. In the data center everything has to be redundant otherwise you can have issues. SAN's have dual controllers, hosts are never loaded to full capacity to allow for failover, etc... Would a CXL pool have dual controllers and mirror the data in RAM to the second controller? I'm sure they will use some of the knowledge from mainframes to figure out how to do this. I'm just not an engineer so I am doing nothing more than speculating.
> Would a CXL pool have dual controllers and mirror the data in RAM to the second controller?
Interesting question. While the CXL protocol might enable cache-coherence across multiple CPUs and accelerators, I think that won't extend to memory mirroring. That would mean that a CXL memory device should implement any mirroring functionality, internally. Not ideal, of course. And I could be wrong about what CXL 2.0 truly supports. I guess we'll have to wait and see.
Just to be clear about what I meant, when I write data to one memory device, CXL ensures that write is properly synchronized with all other CXL devices. However, if a CPU tries to write out the same data to two different CXL memory devices, I doubt there's any way to be sure they're mutually synchronized.
In other words, if you have two devices issuing writes to the same address, which is backed by mirrored memory, the first device might be first to write that address on the first memory module, but second to write it on the second memory module. So, the values will now be inconsistent.
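The ordering problem described above can be shown with a toy model. This is only a sketch: the two "memory modules" are plain Python dictionaries and the arrival orders are hard-coded assumptions, so nothing here models real CXL hardware or its protocol.

```python
# Toy model: two devices write the same address, which is mirrored on two modules.
# Each module simply applies writes in the order they happen to arrive.

def apply_writes(arrival_order):
    """Return final memory contents after applying (address, value) writes in order."""
    memory = {}
    for address, value in arrival_order:
        memory[address] = value  # the last write to an address wins
    return memory

ADDRESS = 0x1000                # hypothetical mirrored address
write_from_a = (ADDRESS, "A")   # write issued by device A
write_from_b = (ADDRESS, "B")   # write issued by device B

# Assumed arrival orders: A reaches module 1 first, B reaches module 2 first.
module_1 = apply_writes([write_from_a, write_from_b])
module_2 = apply_writes([write_from_b, write_from_a])

mirrors_agree = module_1[ADDRESS] == module_2[ADDRESS]
print(module_1[ADDRESS], module_2[ADDRESS], mirrors_agree)  # B A False
```

Without something that forces both modules to see the writes in the same order, the two copies end up holding different values.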
I think you could put 3 light memory usage VMs with one heavier usage VM, giving more VMs (at greater CPU utilization) and allowing one of the VMs (using 1/4 of a core) to have half as much memory - but if the user needed more memory then get them to pay for more than 1/4 core; have them pay for 2 cores (that they don't fully utilize) to get double the memory. If you won't buy bigger DIMMs you have to recover the allocatable memory somehow.
The good (painful) news is that there are 512GB DDR5 DIMMs, and that the sweet (in a few years) spot will also be 2x of the DDR4 sizes, so you'll be able to get more memory (after the prices no longer eat the budget away). That means for 1/2M you could get 24TB of memory into the slots, if 2 CPUs can access that much; they don't try to save an address line.
That 12 channel and CXL (hopefully 2.0) is coming is expected speculation - they should do it, and not wait too long.
My theory is that the extra pins not accounted for by the above will go into moving the 2P interconnect via Infinity Fabric over PCIe into connect over an extra set of CXL 2.0 (for the encryption) lanes - freeing up more PCIe lanes; leaving 160-192 lanes as standard, instead of just 128.
More memory and bandwidth, more PCIe lanes, and CXL 2.0 (which is announced), along with more cores (and their 700W boost) will set them ahead across the board; except for single thread performance (and hybrid E-cores, so they'll be a close second for power; with enough difference in price to pay for the electricity).
"I think you could put 3 light memory usage VMs with one heavier usage VM, giving more VMs (at greater CPU utilization) and allowing one of the VMs (using 1/4 of a core) to have half as much memory - but if the user needed more memory then get them to pay for more than 1/4 core; have them pay for 2 cores (that they don't fully utilize) to get double the memory. If you won't buy bigger DIMMs you have to recover the allocatable memory somehow."
That isn't really how virtualization works. You have a bit of the idea right in that a VM will be placed onto a host that has the free resources. However, no one will be doing this by hand unless they have only 2 hosts. In VMware there is a tool called Distributed Resource Scheduler (DRS) that will automatically place VMs on the correct host in a cluster as well as migrate VMs between hosts for load balancing.
There is no way to give a system only 1/4 core. The smallest amount of CPU that is able to be given is 1 Virtual CPU (that can be a physical core or a hyperthread). I cannot tell you on which physical CPU that vCPU will be run. Until a system needs CPU power, that vCPU sits idle. Once the system needs compute it goes to the hypervisor to ask for compute resources. The hypervisor then looks at what physical resources are available and then gives that system time on the physical hardware.
As physical core counts have gone up it has gotten much easier for the hypervisor to schedule CPU resources without having the requesting system wait for said resources. When you used to have only dual 4c/8t or 8c/16t CPUs, you could easily have systems waiting for CPU resources if you were over-allocated on vCPU. In this case a dual 8c/16t server will have 32 vCPU but you could have enough VMs on the server that you have allocated a total of 64 vCPUs. A VM with 4 vCPU has to wait until there are 4 threads available (with the Meltdown/Spectre mitigations it would be 2c/4t or 4c) before it can get the physical CPU time. It can happen that, say, one system has 16 vCPU out of the 32 vCPU on the server and will be waiting almost forever for CPU resources. Since the scheduling isn't like getting in line, if only 8 vCPUs free up, that 16 vCPU VM is left to wait while 2x 4 vCPU VMs get time on the CPU. The hosts I'm running all have 128 vCPU and that makes the CPU resource contention much less of an issue since at any time you are almost assured of free CPU resources. For example I have allocated over 160 vCPU to one server and never have I had an issue where VMs are waiting for compute. I would probably need to be in the 250+ vCPU allocated before I run into CPU resource contention. With these high core count server CPUs, the biggest limiting factor for number of VMs running on a host has changed from CPU to RAM.
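The numbers in this comment can be worked through with a short sketch. It is a toy model of the all-vCPUs-at-once behaviour described above, not VMware's actual scheduler; the function names and the VM names are made up for illustration.

```python
def overcommit_ratio(allocated_vcpus, hardware_threads):
    """Allocated vCPUs divided by the hardware threads the host exposes."""
    return allocated_vcpus / hardware_threads

def schedule_once(free_threads, waiting_vms):
    """One greedy pass: a VM runs only if ALL of its vCPUs fit in the free threads.

    waiting_vms is a list of (name, vcpus) pairs.
    Returns (running, still_waiting, free_threads_left).
    """
    running, still_waiting = [], []
    for name, vcpus in waiting_vms:
        if vcpus <= free_threads:
            running.append(name)
            free_threads -= vcpus
        else:
            still_waiting.append(name)
    return running, still_waiting, free_threads

# Dual 8c/16t host: 32 hardware threads, 64 vCPUs allocated.
old_host_ratio = overcommit_ratio(64, 32)     # 2.0
# 128-thread host with 160 vCPUs allocated.
new_host_ratio = overcommit_ratio(160, 128)   # 1.25

# Only 8 threads free up: the 16 vCPU VM keeps waiting, two 4 vCPU VMs run.
running, waiting, free_left = schedule_once(
    8, [("vm-16vcpu", 16), ("vm-4vcpu-a", 4), ("vm-4vcpu-b", 4)]
)
print(old_host_ratio, new_host_ratio, running, waiting, free_left)
```

The wide VM is skipped rather than queued ahead of the narrow ones, which is the "not like getting in line" behaviour the comment describes.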
From the RAM side it will take a long time until 512GB DDR5 LRDIMMs are available. However, I could easily see the 128GB DDR5 RDIMM being the most popular size, like 64GB is right now for DDR4. For a company like where I work which does small cloud hosting, going from dual socket 8 Channel 64GB DIMMs (1TB total at 1DPC) to dual socket 12 Channel 128GB DIMMs (3TB total at 1DPC) is a huge boost.
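The capacity figures above are straightforward multiplication; a minimal sketch follows. The helper name is hypothetical, and the last line assumes 2 DIMMs per channel to reproduce the 24TB figure quoted earlier in the thread, which that comment does not state explicitly.

```python
def total_memory_gb(sockets, channels_per_socket, dimms_per_channel, dimm_gb):
    """Installed capacity = sockets x channels x DIMMs per channel x DIMM size."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb

ddr4_gb = total_memory_gb(2, 8, 1, 64)     # dual socket, 8 channels, 1DPC, 64GB DIMMs
ddr5_gb = total_memory_gb(2, 12, 1, 128)   # dual socket, 12 channels, 1DPC, 128GB DIMMs
max_gb = total_memory_gb(2, 12, 2, 512)    # assumption: 2DPC with 512GB DIMMs

print(ddr4_gb / 1024, ddr5_gb / 1024, max_gb / 1024)  # 1.0 3.0 24.0 (TB)
```

Going from 8 to 12 channels and doubling the DIMM size triples capacity per host at 1DPC.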
"There is no way to give a system only 1/4 core." Semantics - Put 4 VMs on one physical core. One example explanation found with one minute of searching: https://superuser.com/a/698959/
Not semantics at all. While I can have 4 different VMs each with a single CPU on one physical core, that doesn't mean they get 1/4 core. Here is an example with 4 VMs lets call the VMs A, B, C, & D. The host machine has a single physical non SMT CPU. VM A is running something that is using continuous CPU (say y-cruncher but only 25% CPU load). VMs B, C, & D want to run something on the CPU (say OS updates). Those VMs cannot each get 25% of the CPU all at the same time. They have to wait until VM A is done requesting that CPU access until 1 of those next VMs can request CPU for the updates. I cannot have the CPU running at 100% load with those VMs all running code simultaneously (in parallel for lack of a better term) on the CPU. The requests for CPU time are done in serial not parallel. Therefore you cannot give 1/4 of a CPU to a VM.
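A toy timeline makes the "serial, not parallel" point concrete. This is a plain round-robin sketch with made-up demands, not how VMware or any real hypervisor actually picks the next vCPU; it only shows that a single non-SMT core holds exactly one VM per time slot.

```python
from collections import Counter

def run_single_core(demands, slots):
    """Round-robin over VMs that still have work; exactly one VM per time slot.

    demands maps VM name -> number of time slots of work it wants.
    Returns a list with the VM that ran in each slot (None if the core was idle).
    """
    remaining = dict(demands)
    order = list(demands)
    timeline = []
    next_index = 0
    for _ in range(slots):
        for step in range(len(order)):
            vm = order[(next_index + step) % len(order)]
            if remaining[vm] > 0:
                remaining[vm] -= 1
                timeline.append(vm)
                next_index = (next_index + step + 1) % len(order)
                break
        else:
            timeline.append(None)  # nobody has work: the core sits idle
    return timeline

timeline = run_single_core({"A": 2, "B": 2, "C": 2, "D": 2}, slots=8)
share = Counter(timeline)
print(timeline)  # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D']
```

Averaged over the whole window each VM got a quarter of the slots, but in any given slot three of the four VMs were waiting.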
That link you gave is talking about something different. When you setup a VM in VMware, Hyper-V, etc...you are asked to specify the number of CPUs for the system. You have 2 options for giving the number of CPUs (Cores & Sockets). 99.9% of the time it doesn't matter how you do it. You say you are giving VM 4 CPUs it gets 4 CPUs. When you look at the settings in VMware, you see that is 4 sockets @ 1 core /socket. However, I can change that to 1 socket @ 4 cores/socket or 2 sockets @ 2 cores/socket, etc...The OS doesn't care how it is setup as it is getting the CPU you told it was going to get. Where that matters, is in some software licensing is done Per Socket so you might get charged a lot more if your software thinks it is 4 sockets vs 1 socket for licensing. This does not mean I'm giving one system 1/4 (0.25) a CPU vs 4 CPUs.
FYI I am an expert in virtualization. I have my VMware Data Center Virtualization Certificate and am a VMware Certified Professional and I run a data center. I have been the lead VMware Admin for the last 3.5 years.
> When you look at the settings in VMware, you see that is 4 sockets @ 1 core /socket. > However, I can change that to 1 socket @ 4 cores/socket or 2 sockets @ 2 cores/socket, > etc...The OS doesn't care how it is setup as it is getting the CPU you told it was going to get. > Where that matters,
Where that matters is memory latency! Also, not overloading the interconnect bus between the sockets.
Performance-wise, it's nearly always best to minimize the split between sockets. The only time that doesn't hold is if there's a job involving multiple threads that are each bandwidth-intensive and don't share data with each other. This is fairly rare.
"Where that matters is memory latency! Also, not overloading the interconnect bus between the sockets."
In VMware that doesn't matter. It is going to assign the CPU based on what is available at any given second. That means a VM with 4 vCPU might get Core 0 & 12 on CPU 0 and Core 14 & 23 on CPU 1. There is no way I can park a VM on specific sockets only. The Socket/Cores selection in VMware is nothing more than telling the OS how it is getting its CPUs.
This is where you replied, and your last comment where it's come to. Yes you should write a blog: VMWare Admin and Certified Professional discovers problem that can't be solved.
If Intel stays with 8-channel DDR5 on their platform for EMR/SRF, it could be that they're planning on CXL-attached memory as the main way to add capacity & some additional bandwidth. This could both be more cost-effective and scale to larger memory capacities.
Any bandwidth shortfall might also be offset by in-package HBM, at least for higher-end SKUs.
It'll be interesting to see if Intel supports hybrid EMR + SRF configurations. If any of their customers are interested in Big + Little combos, that will be an easy way to experiment with it (i.e. one of each CPU type, in a dual-socket server).
I think it would be more interesting if they disclosed more details. I *do* like seeing them move more aggressively with their E-cores. Improving power-efficiency is a good thing.
Any news about replacement of LGA-2066 Xeon W-22xx? They got no update by IceLake for obvious reason, but with SPR new goldencove it would be great to see something new in this line too...
Also, because their HEDT and Xeon W CPUs tend to have the same cores as the server Xeons, whatever is holding up their SPR server products is likewise blocking their workstations.
144 Comments
Silver5urfer - Thursday, February 17, 2022 - link
Ian: Which is a good thing.
100% agreed with that Ian. DC do not like this hybrid nonsense, esp if we look at VMWare and their licensing systems plus how the VMs and Containers work with the Hypervisors on top of the reduced instruction set is a mess. Plus a scheduler must operate at such hyperscale it will be a gigantic waste of money.
Intel had to do this on Mainstream because of their IPC target and SMT performance target vs AMD, who are very much ahead in SMT specifically. Also the whole LGA1700 CPUs run hot, with far higher heat density than RKL which was already too hot. So they had to axe the P cores to make them Clock at 5GHz and get the maximum SMT performance too. Now they knew that E cores will get them that performance needed they segment them with Raptor Lake now, having more Cores on E side to get that SMT competitiveness vs upcoming monster Zen 4.
As for Xeon they do not have to clock at such high frequencies plus the SMT performance is already there due to that Multisocket system and other bells and whistles.
Finally on the AMD vs Intel side, looks like Intel will be more competitive with AMD when their E cores Xeon comes out, rough guess. Also this move is done by both AMD and Intel because they want to stop ARM Server processors which do not have SMT technology but high density in Cores.
Good to know some roadmap. All I want to see is a real successor to X299. AMD pathetically axed the Threadrippers horribly, Gen 1 got 2 CPU refresh but 3rd was purposefully axed to get more cash on the sTRX socket. And now no Zen 3 based Threadripper nor a damn 3DV enhanced Threadripper (on top of how they didn't care for 3DV on AM4 because Zen 4 needs to be strong and more sales from new chipset / socket AM5).
ballsystemlord - Thursday, February 17, 2022 - link
I share your frustration at Threadripper's loss.
Abort-Retry-Fail - Friday, February 18, 2022 - link
Threadripper Pro is competing with itself. I know - I've been breaking one in for 9 months. There are no limits for the TR Pro in content creation for performance and efficiency in HEDT. Castle Peak sWRX8 is 'Zen2' - future variants will be 'slobber-knockers' to the competition.
No offense, but Chipzillah continues to shoot itself in both feet. Everyone knows 'Intel 7' is 10nm +++ 'Enhanced SuperFin' regardless of marketing and 'branding'
drothgery - Friday, February 18, 2022 - link
Yes, but most people don't know TSMC N7 being routinely called "7nm" is branding too.
nandnandnand - Friday, February 18, 2022 - link
The only question is whether Intel 7 is comparable to TSMC N7, Intel 4 to TSMC N4, etc.
kwohlt - Sunday, February 20, 2022 - link
Most people don't know that "10nm +++" is also marketing and branding. They're just names. If Intel 7 is comparable to TSMC 7nm, then they're competitors and the rename makes sense. Pointing out that the original plan was to call it 10nm ESF is rather pointless and borderline misleads people into thinking the node is a generation behind, technologically (it's not).
Spunjji - Tuesday, February 22, 2022 - link
Indeed, it's not. After 6 years Intel finally massaged its 10nm process into something that's roughly as good as TSMC N7, so the rename makes sense as a marketing break from the absolute failboat that was the 10nm / 10nm(+) / 10nm+(+[SF]) / 10nm++(+[ESF]) nomenclature.
mattj0707 - Tuesday, February 22, 2022 - link
It IS in fact on par with TSMC's 7nm and in some areas, it is better than TSMC's 7nm node. In two areas specifically, Intel 7 outperforms TSMC. That is transistor density (Intel 7 node is denser than TSMC) and Intel 7 outperforms TSMC 7 in transistor leakage and drive current. That is exactly why Intel's Alder Lake chips are easily boosting to 5 GHz on mobile w/ little power consumption and their desktop chips are boosting to sustained clocks above 5 GHz. And Intel's new release of their KS SKUs can achieve clocks on multiple cores up to 5.5 GHz sustained. TSMC N7 could never reach that sort of drive current and those clocks. It was only recently that AMD's Ryzen chips were able to briefly boost to 4.9 or 5 GHz. TSMC's N7 is a good node, especially for power efficiency, but it is just a fact that Intel 7 outperforms it.
mode_13h - Wednesday, February 23, 2022 - link
I assume most know this, but clock speed isn't just a function of the process node. It also has to do with circuit design. A chip designed with longer critical paths will have less ability to reach high clockspeeds, but the tradeoff is that it's probably doing more real work per cycle.
And this might reveal a downside of AMD's strategy of sharing chiplets between desktop and servers. Since servers place a greater premium on power-efficiency, that keeps downward pressure on clock speed and therefore greater incentive to utilize longer critical paths. Meanwhile, Intel tweaks their core designs and fabs completely separate dies for server CPUs vs. other markets.
mattj0707 - Tuesday, February 22, 2022 - link
Intel 7 is definitely comparable to TSMC N7. Node names don't have any correlation to the actual features, performance or efficiency of the transistors themselves. We know for a fact that Intel 7 outperforms TSMC N7 in a few areas like transistor density, leakage, and drive current. So regardless of whether it's called Intel 7 or 10 nm++++, its performance is on par with TSMC's N7, so who cares what it's called at the end of the day. The performance is what matters, not the marketing name.
duploxxx - Friday, February 18, 2022 - link
The only reason there is no new or enhanced Threadripper is due to OEMs: they did not want to invest in it for their workstation lines and prefer to stay with the "financial funded intel" Xeon WS lines... Easy money for OEMs, and users are the ones that have no choice... it's called abuse of power, the usual Intel stuff. So the Threadripper market remained a niche and this is something difficult for AMD as they have so many EPYC delivery requests.
Mike Bruzzone - Friday, February 18, 2022 - link
duploxxx, Thread ripper over the last four years actually sold near equivalent Epyc; 13,676,597 unit and 14,269,999 respectively. Niche, well, E5 16xx product generations range 2.5 M to 5 M which is what TR displaced. E5/Scalable 2P workstation the components alone regardless of integration into to workstation my data only covers the components are 17x TR volumes. TR all generations has been a profitable niche for AMD. mb
kwohlt - Sunday, February 20, 2022 - link
Right, but Epyc is even more profitable than TR, and if AMD is struggling to meet demand on the more important Epyc chips, it doesn't make sense to divert capacity to make Zen 3 based TRs instead.
Mike Bruzzone - Sunday, February 20, 2022 - link
kwohlt, I acknowledge agree AMD is going for commercial margin #1 Epyc #2 accelerator refocusing to GPU acceleration, Xilinx there too and deemphasizing AMD consumer GPU gaming when commercial [direct] customers write their own code but would not be surprised to see HP and Lenovo TR5K workstations showing up in secondary market 24 months from now and then there are 1P and 2P Epyc certainly workstation worthy at top of the frequency stack. Think about some Milan entering Genoa provided to HP and Lenovo as AMD Milan contract completion award sustaining their own TR niche strongholds on direct customer sales is among the reasons I believe T5K does exist but not talked about. TR5K may also be needed as a March/April '22 AMD gross margin support parallel Rembrandt main deck dependent 3D volumes and acceptance. I have 15 M octa 3D and 1 M TR5K. mb
Calin - Friday, February 18, 2022 - link
"Ian: Which is a good thing."
The datacenter customers do want their E-cores and P-cores.
As such, they will buy (depending on their wants) 100,000 processors with E-cores only and 100,000 processors with P-cores.
The "let's have some E-cores and some P-cores in the same chip" is not on their radar.
schujj07 - Friday, February 18, 2022 - link
He didn't say that datacenter customers don't want E-cores and P-cores. All he said was that it isn't wanted on the same CPU socket like the consumer space.
drothgery - Friday, February 18, 2022 - link
Largely an artifact of everyone going all-cloud all the time, I'd guess. A server owned and operated by a not very global business could use an E-core block to keep the lights on when it needs to be on but isn't very busy, but a cloud service provider is going to want to sell those unused cycles to someone else.
Mike Bruzzone - Friday, February 18, 2022 - link
I am fairly certain on production economic assessment TR 5K exists up to 1 M units, that HP and Lenovo sold workstation customer direct and will not be seen until they show up in the secondary market. This put the margin in AMD and OEM's pocket rather than the channel who likely would have inflated their own TR5K margin take. In both instances AMD avoids product inventoried and/or sitting on the shelf at Zen 3 run end production 'overpriced' while also reducing channel financial ability to purchase something else that moves in higher volume including from AMD.
Ampere 3090ti is in the same position, sold direct. The Kingpin card is a commercial product in consumer disguise and requires the cooling infrastructure in place to integrate this subsystem into a cluster. Think white space limited high-density municipality located high frequency financial trading 24kW racks where the customer understands and has the cooling infrastructure in place to implement the 500W dGPU. Again in this instance, Nvidia with contract IDM take the margin and by keeping the card out of the channel and free from channel inflated pricing prevents whatever volume of 3090ti from sitting on the shelf, overpriced, and when they are sold the potential that in this example channel's inflated margin take will go to procurement of product other than Nvidia.
AMD and Nvidia are both controlling how margin from the sale of their products comes back to them. And by keeping TR5K and 3090ti out of the channel ensure margin earned on those products goes back to them, and not Intel, who has benefited primarily for channel Ampere inflated margin take general funding Intel new procurements and not generally Nvidia new procurements.
Mike Bruzzone, Camp Marketing
Dr_b_ - Thursday, February 24, 2022 - link
Agreed, where is HEDT. not enough PCIe lanes.
ballsystemlord - Thursday, February 17, 2022 - link
So, will they get AVX-512 working on the E-cores, or will in-generation server differentiation start to occur based on which ISA support you have?
Kamen Rider Blade - Thursday, February 17, 2022 - link
It'll probably be a P-core only feature.
E-cores don't have that much die area to work with.
AVX-512 eats up A LOT of die space.
IntelUser2000 - Thursday, February 17, 2022 - link
The server E cores are different from client E cores.
Remember Xeon Phi? It took the Silvermont Atom cores and *heavily* modified them. Added AVX-512 too.
Also, AVX-512 on the E cores are going to end up smaller. The Xeon Phi AVX-512 was much smaller than the one on Skylake.
whatthe123 - Thursday, February 17, 2022 - link
if they're looking for MT throughput, shaving some cache/bandwidth might be worth it for AVX-512. It is actually more performant per watt.
Calin - Friday, February 18, 2022 - link
But it is dead space if you don't use it, and might be replaced by cache, improved logic units and so on.
It could make sense to put AVX-512 only in P-cores Xeons, and leave the "efficiency" market to some other technology (accelerators, GPUs, FPGAs, ...).
Kevin G - Thursday, February 17, 2022 - link
AVX-512 support does need additional and wider registers to function. However, the instructions themselves were designed to be cracked into 256 bit chunks for the execution units to handle. The benefit is that no additional die space has to be used in execution units but the catch is that peak throughput does not change between AVX2 and AVX-512 as the same amount of work is being done per cycle using 256 bit wide SIMD units. There can still be some performance increases due to the additional registers and some efficiency gains with the new instructions but nowhere near what doubling the execution width would do for SIMD heavy code.
Intel has made a mess of their ISA and it is time for them to clean things up with some standardization.
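The throughput argument above can be checked with simple arithmetic. This is a sketch under stated assumptions: two SIMD units per core and 32-bit elements are illustrative numbers, not figures for any specific Intel core, and the function name is made up.

```python
import math

def peak_elements_per_cycle(instr_bits, unit_bits, units, element_bits=32):
    """Peak SIMD elements processed per cycle when wide instructions are
    cracked into unit-width micro-ops."""
    uops_per_instr = math.ceil(instr_bits / unit_bits)   # 512b on 256b units -> 2 uops
    elements_per_instr = instr_bits // element_bits
    instrs_per_cycle = units / uops_per_instr
    return elements_per_instr * instrs_per_cycle

avx2_on_256 = peak_elements_per_cycle(256, 256, units=2)       # 16.0
avx512_cracked = peak_elements_per_cycle(512, 256, units=2)    # 16.0, same peak
avx512_native = peak_elements_per_cycle(512, 512, units=2)     # 32.0, doubled width

print(avx2_on_256, avx512_cracked, avx512_native)
```

Cracking keeps peak throughput equal to AVX2 on the same units; only genuinely 512 bit wide execution units double it.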
JayNor - Saturday, February 19, 2022 - link
There were more registers and wider registers added for avx512
https://en.wikipedia.org/wiki/AVX-512#Extended_reg...
JayNor - Saturday, February 19, 2022 - link
Intel provides 512 bit fma units for avx512... so they aren't cracking these into 256 bit operations for the FMAs. Their high end server chips have dual avx512 FMA units per core. I've seen reports that the SPR chips, even down to 8 core versions, will all have dual avx512 FMAs per core.
https://www.intel.com/content/www/us/en/architectu...
Mike Bruzzone - Sunday, February 20, 2022 - link
mb
Mike Bruzzone - Sunday, February 20, 2022 - link
mb
kpb321 - Thursday, February 17, 2022 - link
I'm kinda surprised the E core only Xeon is so far out. After seeing the performance of 4 E cores in roughly the same die space as one P core for the consumer chips on multithreaded stuff it seemed like such an obvious move. I'd expect the E cores to do even better in servers as they can't run the P cores as high up the power/performance curve in the server chips so they'll lose some of their clock speed advantage.
Calin - Friday, February 18, 2022 - link
On the other hand, more cores usually put more pressure on the memory subsystem. Maybe using 4x the E-cores instead of the P-cores is too much.
Remember that, when tasks take twice as long to complete you have twice as many tasks "in flight" and you need twice the memory.
So, a "machine gun" approach of nibbling on many tasks is less efficient (in average time for execution and average memory use) than a few big cores.
As always, your mileage may vary.
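The "twice as long means twice as many in flight" point is Little's Law: tasks in flight = arrival rate x time per task. A minimal sketch, with a made-up request rate and per-task memory footprint:

```python
def tasks_in_flight(arrival_rate_per_s, seconds_per_task):
    """Little's Law: average number of tasks in the system."""
    return arrival_rate_per_s * seconds_per_task

def memory_needed_gb(arrival_rate_per_s, seconds_per_task, gb_per_task):
    """Memory held by all in-flight tasks, assuming a fixed footprint per task."""
    return tasks_in_flight(arrival_rate_per_s, seconds_per_task) * gb_per_task

# Hypothetical load: 100 tasks/s, 0.25 GB per task.
big_cores_gb = memory_needed_gb(100, 0.5, 0.25)     # 0.5 s per task on big cores
small_cores_gb = memory_needed_gb(100, 1.0, 0.25)   # 1.0 s per task on small cores

print(big_cores_gb, small_cores_gb)  # 12.5 25.0
```

At the same request rate, doubling per-task latency doubles the memory tied up by unfinished work.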
Mike Bruzzone - Friday, February 18, 2022 - link
mb
Hifihedgehog - Monday, February 21, 2022 - link
Please ban Mike Bruzzone. He just keeps replying mb to everyone. Do we really need to see your initials everywhere, Mike?
Mike Bruzzone - Monday, February 21, 2022 - link
Hifhedgehog - Hi Hedgehog, when I read someone's observation and find it important, first to me as an analyst, and second to my Federal Trade Commission audit enlisted by federal attorneys retained by Congress of the United States, by my initials, my intent is to notify the author I found value in their observation and that I read it.
You observe I also respond, raise inquires, collaboratively contribute, constructively confront as we are here, more or less, I don't support boycott, and give credit where I see credit due. I admit I initialed one comment twice was a typo. I can't respond to all observations I find valuable, but I can acknowledge when I find value I recognize the contribution.
Thanks for your observation. mb
mode_13h - Tuesday, February 22, 2022 - link
> to my Federal Trade Commission audit
This is interesting. If there's any public information about this audit, feel free to give us a link. Entirely up to you, but it would be educational for some here.
I think it's valuable for the public to gain a greater understanding of the role played by the federal government in industry and the economy. Most people don't understand how instrumental it is to the markets and industries that we all take for granted.
> I also respond, raise inquires, collaboratively contribute, constructively confront
I find your posts informative and informed, even if they're often so deep and outside my domain that I often just skim them. Thanks for contributing.
Mike Bruzzone - Tuesday, February 22, 2022 - link
Mode_13
Thank you and appreciate what I observe as your sense of enterprise across and incorporating practice areas.
My general comment string is here I rely on Seeking Alpha;
https://seekingalpha.com/user/5030701/comments
My analysis is here also at Seeking Alpha who has changed blog spot availability and I'll address that in the future on how I post primarily, slide sets;
https://seekingalpha.com/user/5030701/instablogs
I comment from time to time and simultaneously post data on The Next Platform simply search Mike Bruzzone and Next Platform.
I also comment on Semi Wiki.
Exhibits validating my FTC and USDOJ credential can be found on Pacer associated with this Federal Court of Claims case matter: 1:21-cv-01261-RTH. Otherwise, I'm engaged in numerous litigations with Intel pursuant Intel Inside price fix recovery and other issues associated with tech gang land are searchable my counsel is patience and persistence in the face of Intel associate network falsities.
Pursuant Intel I can vouch for monopoly remedial advances beginning at Krzanich and continuing under Gelsinger but far from complete pursuant Intel legal department and legal network, on Intel generally recovering from monopolization, sabotage and robbery, both consumer associated with Intel Inside price fix and from the Entity itself on employee's engaged in cartel product laundering thefts primarily from DCG.
Mike Bruzzone, Camp Marketing
GeoffreyA - Wednesday, February 23, 2022 - link
I also appreciate your contributions, Mike, even though they're largely out of my grasp of understanding.
Mike Bruzzone - Thursday, February 24, 2022 - link
GeoffreyA, you're welcome I'm open to inquiry, observation, adds, point - counter point anytime. mb
Kamen Rider Blade - Thursday, February 17, 2022 - link
Now, for Intel to bring PURE P-core & E-core CPU's to the Desktop product stack and STOP being stingy on the core counts.
Leave Hybridization for the Mobile market.
dontlistentome - Friday, February 18, 2022 - link
P core only here as some of the i5s. E core only (I want this for the low power) on the way soon branded as Pentium/Celerons. 8 cores apparently. Which is what i'd like as long as they don't bork the GPU bits too much.
8x E plus a reasonable GPU that can drive 3 decent screens is all I want for my desktop, especially if it can run fanless in a heatsink case.
Kangal - Friday, February 18, 2022 - link
Hybrid Processing doesn't make much/any sense for Servers, Desktops, and even Office PCs; all devices that are hooked to the wall.
It does make sense for portable computing. Where you're trying to find a balance between performance and energy usage.
Where we have good benefit for large and thick laptops, we instead see massive benefits on small and thin phones.
So hindsight 20/20, but Intel should have started working on it back in 2013 or so when it saw it was viable for ARM architecture.
So around 2015-2016, we should've had Intel 15W APUs built on 7nm nodes, with a 2+4 design. Where they would have (i7-6600u) Skylake and Cherry Trail (x7-8750) meshed together. This would've made them more competitive against the upcoming Ryzen architectures. Even Apple wouldn't have made the leap as soon as they did. Basically it would've bought Intel more wiggle room and time to implement their new architecture (P/Very Large cores), which should have been a Server-First approach. And it would ensure Microsoft puts the work in, to have these hybrid computing supported properly in software. Especially when trying to implement it from laptops to desktops.
Now?
It's a mixture, and I actually think the Ryzen 6000 approach is better. And it pales in comparison to macOS and the M1, M1P, and M1X chips. Whilst, the server market is sliding towards AMD, it looks like it might be overtaken by ARMv9 solutions in the next 5+ years.
Wereweeb - Friday, February 18, 2022 - link
It's nonsense to think that hybrid cores are just for perf/watt. They're overall a more efficient architecture.
Thanks to Amdahl's Law it's very good to have two-four performance-focused cores to drive the main thread of the main process(es). But the rest should continue to be PPA-balanced cores - and today the Atom-derived E-cores appear to have better PPA.
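For reference, Amdahl's Law gives the speedup as 1 / ((1 - p) + p / n) for a parallel fraction p on n cores. The sketch below adds one assumption that is not part of the classic formula: a `serial_core_speed` factor, to show what a faster core does when it runs the serial part. The 90% parallel fraction and the 1.5x factor are illustrative numbers only.

```python
def amdahl_speedup(parallel_fraction, n_cores, serial_core_speed=1.0):
    """Speedup over one baseline core.

    serial_core_speed is an illustrative extension: how much faster the core
    running the serial portion is compared with the baseline core.
    """
    serial_time = (1 - parallel_fraction) / serial_core_speed
    parallel_time = parallel_fraction / n_cores
    return 1 / (serial_time + parallel_time)

uniform = amdahl_speedup(0.9, 16)                           # 16 identical cores
hybrid = amdahl_speedup(0.9, 16, serial_core_speed=1.5)     # faster core on the serial part
ceiling = 1 / (1 - 0.9)                                     # limit as n_cores grows

print(round(uniform, 2), round(hybrid, 2), round(ceiling, 2))
```

Because the serial part dominates once there are many cores, speeding up the core that runs it raises the total more than adding further identical cores would.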
Only the ARM equivalent of "Little cores", like the Cortex-A55, belong only on mobile devices. And that's because they're optimized mostly for power. Intel only has X1-like and A78-like cores (I.e. performance-focused and PPA-balanced) so they're already doing their job correctly.
And yes, AMD should indeed split their core designs into a performance-focused core
Wereweeb - Friday, February 18, 2022 - link
(continuation) and a PPA-focused core. That would allow them to boost the single-thread performance to better handle the main threads of processes, while not losing sight of the need to balance the PPA of the rest of the processor.
nandnandnand - Friday, February 18, 2022 - link
And it would allow AMD to take advantage of the optimizations being done for Intel. x86 games and applications will just "know" what to do with heterogenous microarchitectures after a while.
Kangal - Sunday, February 20, 2022 - link
First of all, you misunderstood what I wrote.
I didn't insinuate that Intel's E-cores are good/bad. I wrote that the combinations of P+E is bad for server duties (ie Hybrid Processing). Having a setup that is Homogeneous Processing makes much more sense for servers, and even ARM figured this out early. It fixes a lot of bugs, issues, and security flaws you may have on the software side... especially knowing that you're catering to multiple tasks and multiple users. And to add to this, where Hybrid Processing is great for computing where energy is a limited quantity, you don't really have this issue when it is connected to the grid. I'm not even talking about thermals, but just the access to electricity.
...now a little bit of background:
Intel has been big about recycling their cores.
From the primitive Pentiums, to more advanced Pentiums, to rebranded Celeron cores, and further miniaturised as the Atom cores. These are analogous to "very small" Cortex A53/A55/A510 cores. I think Intel has finally put that architecture out to retirement.
I think their early Core2, evolved into Core-i, and then to Core-i Sandybridge, and then morphed further for Core-i Skylake. The subsequent iterations have been a refresh on the Core-i Skylake architecture. These are analogous to "medium" Cortex A78/A710 cores. I read that it was this microarchitecture which was adopted by Intel, and then further miniaturised, which resulted in the new Intel E-cores. These E-cores are more analogous to "small" Cortex A73 cores. Based on that analysis/rumour, I don't see too many improvements coming to them in the future.
Intel's latest "very large" cores are huge. The new P-cores are based on an entirely new microarchitecture. So it's understandable that they won't be too optimised, and will be leaving both performance and efficiency on the table. In subsequent evolutions it should catch up. That's been the historical precedent.
...that was a mouthful, but needed to be said first...
So with that all in context, we are in the transition phase at the moment. There's the current products of Intel servers based on their old cores (Skylake-variant), upcoming servers based on a large array of E-cores, and the premium servers using a smaller array of large P-cores. The market will still be dominated by AMD, whose "large cores" are more analogous to the Cortex-X1/X2, and they will offer a better balance between the options. In time, you will find Intel throws more money, time, effort at evolving their P-cores, their bread and butter. And these advances will catch up with or surpass their solutions using E-cores, that much is obvious.
It is likely that the server market will get busy, and most or at least the lower-level stuff, will be lost to solutions built on ARM v9. So the Intel E-core servers will become obscure, and likely phased out by Intel themselves. AMD will be fighting for the top crown with their next-gen processors (Zen 4/5) using newer microarchitecture and techniques like 3D-Cache. Intel may still be able to grasp the top-end premium server market using new-generations of their P-cores. So that's what the future is shaping up to be. But forget about a combined Hybrid Processing server either from ARM, Intel, or AMD.... those will be designed for portable devices like outlined above.
GeoffreyA - Sunday, February 20, 2022 - link
"I think their early Core2, evolved into Core-i, and then to Core-i Sandybridge, and then morphed further for Core-i Skylake."
Depending on how one looks at it, the current P cores (or Golden Cove) are in an unbroken descent from the P6 microarchitecture, structures being widened and bits and pieces added over the years. Sandy Bridge, while still under this line, had some notable alterations, such as the physical register file and micro-op cache. Indeed, SB seems to have laid down how a core ought to be designed; and since then, Skylake, Sunny, and Golden Cove haven't done anything new except making everything wider.
The E-cores descend from Atom, which was an in-order design reminiscent of the P5 Pentium, with some modernisations: SMT, SSE, higher clocks, etc. Along the way, they've implemented out-of-order execution and gradually built up the strength of these cores, till, as of Gracemont, they're on par with or faster than Skylake while using less power. People laugh at this idea but I believe that this lineage will someday replace the P branch. (Or perhaps an ARM or RISC-V design will supersede both.)
Qasar - Sunday, February 20, 2022 - link
" The new P-cores are based on an entirely new micro architecture "
i doubt that, IF they were an entirely new micro architecture, would they not be Gen 1, and not Gen 12 ?
Jp7188 - Thursday, February 24, 2022 - link
I don't disagree with you, but I do disagree that Intel's motive for hybrid was efficiency. They had to do it to compete. I have a 5950x and a 12900k both set to unlimited power, both on the same central water loop. In one of my workloads the 5950x uses 123 watts and hovers around 55C; the 12900k in the same workload uses 317 watts and is constantly riding the thermal throttle at 100C.
AMD is already waaaaay more efficient without moving to hybrid. Why should they bother with the complexity?
mode_13h - Thursday, February 24, 2022 - link
> I do disagree that Intel's motive for hybrid was efficiency.
In the case of desktops, the benefit of the E-cores isn't power-efficiency, but rather area-efficiency. In the same area as 2 P-cores, Intel added 8 E-cores. Given a roughly 2:1 ratio in P-to-E performance, this should yield the performance of a 12 P-core chip at the area (i.e. price) of only 10 P-cores.
Also, if you look at the marginal power added by those E-cores, I do think there's a good case to be made that they burn less power than 4 P-cores would.
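The area arithmetic above can be checked with a short sketch. It takes the ratios from the comment (8 E-cores in the area of 2 P-cores, roughly 2:1 P-to-E performance) and assumes an 8P+8E configuration like the 12900K; the ratios are rough forum estimates, not die measurements.

```python
# All quantities are in "P-core equivalents".
P_CORES = 8          # assumed 8P+8E part
E_CORES = 8
E_AREA = 2 / 8       # 8 E-cores fit in the area of 2 P-cores
E_PERF = 1 / 2       # roughly 2:1 P-to-E performance

area = P_CORES + E_CORES * E_AREA   # core area, in P-core units
perf = P_CORES + E_CORES * E_PERF   # throughput, in P-core units

print(f"area of {area:g} P-cores, throughput of {perf:g} P-cores")
print(f"throughput per unit area vs. all-P design: {perf / area:.2f}x")
```

With these inputs the chip comes out at the area of 10 P-cores and the multi-threaded throughput of 12, which is where the perf/$ argument comes from.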
mode_13h - Thursday, February 24, 2022 - link
TLDR; it's something Intel did to provide better perf/$, if not also perf/W.
nandnandnand - Friday, February 18, 2022 - link
It's here to stay, whether you like it or not, and it will be better, at least after it has been around for a few years.
ksec - Thursday, February 17, 2022 - link
I was expecting PCI-E 6.0 on a 2024/2025 timeline. An earliest estimate of 2026 seems longer than usual.
It is unfortunate AMD doesn't have an answer to Sierra Forest. At least not as far as I am aware.
ksec - Thursday, February 17, 2022 - link
( AMD Bergamo to me is a cache-size variant of Zen 4, which is different to what Intel is doing here )
sgeocla - Friday, February 18, 2022 - link
AMD has 128-core Bergamo coming next year and 256-core Turin in 2024, at the same time as Sierra Forest, presumably if Intel can execute and not delay this like all the other launches.
TSMC's execution has been best in class and so has AMD's in recent years in the DC.
Zen3 core is double the size of Intel E cores but Zen4D is going to cut down on the cache size and probably AVX-512 and other things cloud hyperscalers don't need so Zen4D is going to be AMD's E cores only with much higher IPC and better energy efficiency.
Sierra Forest is a response to what AMD is doing and not the other way around and it will be released after AMD's chip as well.
And if Intel has further delays, or can't make enough volume, or their process has a hiccup, then it will be a complete disaster.
They are betting the farm with ramping capex so much as to have negative cash flow on top of large amounts of debt.
schujj07 - Friday, February 18, 2022 - link
From what I have read Bergamo is expected to have performance about that of Zen3. That would put the IPC a good 30% higher than the current Gracemont E-cores.
"AMD Bergamo is going to be the cloud variant. On a call before the event, Mark Papermaster and Forrest Norrod said that these are different chips that leverage the same ISA, but that there are cache tweaks here to get to 128 cores. The idea behind Bergamo is that cloud computing workloads are different than the traditional workloads and so AMD can optimize for more cores per socket instead of optimizing for HPC performance. AMD also is looking at the Zen 4c to provide better power efficiency per core. If we look to Arm vendors, the Zen 4c is seemingly aligning AMD’s offerings more towards a customized cloud CPU product like the Ampere Altra (Max) instead of a traditional large core CPU." https://www.servethehome.com/amd-bergamo-to-hit-12...
While that isn't exactly an Intel E-core, Zen was already vastly more power efficient than Core. Therefore it isn't out of the question that they could have E-core power levels but Zen3 performance. That said we won't really know until it is released later this year.
Mike Bruzzone - Sunday, February 20, 2022 - link
Impact of Intel Plant/Equipment/Construction investment here; look in the comment string for the sufficiently detailed financial production assessment posted on January 30:
https://seekingalpha.com/article/4481960-intel-q4-...
Best outcome Intel succeeds in gaining process leadership and IDM + foundry reconfiguration
Worst outcome Intel reconfigures under Chapter 11 bankruptcy in the middle of construction.
mb
lemurbutton - Thursday, February 17, 2022 - link
Basically, Intel won't be competitive in the server market until 2024 when it will have node parity with AMD and probably core-count parity as well with its E-core product.
schujj07 - Friday, February 18, 2022 - link
Before 12th Gen was released people were expecting it would take Intel until 2025 to get parity on desktop and server. While 12th Gen is impressive, AMD has been dragging their feet on Zen3D probably because they have no reason to release the CPU. They are selling every CPU right now and Zen3 is still competitive with 12th Gen, albeit a bit slower, so no rush on things.
Mike Bruzzone - Friday, February 18, 2022 - link
schujj07 ; under the covers report here;
Yes, AMD sold all AMD could sell from 676,000 wafers in 2021; 119,108,089 components is a company record averaging 30 M per quarter up 55.8% from 2020.
Zen 3D is hot for the package area to pull off the heat on potentially the material composite vis-a-vis composite relied for TR / Epyc licensed from Fujitsu for heat dissipation. 3D definitively requires a cooling solution. 32 MiB SRAM slice adds + 12W you can figure it out from Epyc cache TDP variance and 5800X OC's hits 147W and slightly higher. 3D at 105W is a phantom. 3D is also expensive + $45 from TSMC that's to AMD $210 before OEM mark up x1.55 is fairly traditional for AMD to the OEM but it can go up to x2. I estimated AMD made 15 M 5800X 3D (no 5900 on heat primarily) and am starting to wonder where all the AMD 3D hype has gone because there hasn't been a word in weeks. mb
_abit - Saturday, February 19, 2022 - link
LOL, the fud is strong with this one:
3d cache is laid over the region of existing die cache, which doesn't get as hot as compute logic does. So no, it will not result in hotspots, and Si is a fairly good conductor of heat. The previous hotspots retain roughly the same thermal contact and conductivity capacity. There is probably no more than a 10% overall increase of heat, which may well be offset and then some by amd's new power efficiency features in zen 3+. Who knows, it may even run cooler. I assume it will be 6nm indeed, but even at 7nm, it is doable.
If you want zen 3d, you probably want games. And if you want games, there's no point overclocking the cpu manually, you may even lose boost clocks. So no, it matters not what 5800x OC hits, even more so if the 3d version is zen 3 +.
45$ to bind chips in the millions scale sounds way too much. Where are you getting those numbers?
What OEM markups? And if your vendors have it, why are you bringing that against amd? Shill much?
Mike Bruzzone - Sunday, February 20, 2022 - link
abit, thanks for your thermal observations. I trust 3D can be inexpensive data base and modeling space not just a game toy. On the thermals I follow you and we shall see and I acknowledge your thoughts and we can pick this conversation up when there are knowns. On the package area that is not TR package area for heat transfer out is my concern the package solution was not entirely thought out. Lid off 5900 engineering sample held by Dr. Su for all to see, yea, because in the lab it's being sprayed with freon or whatever is relied today from a spray can.
On power efficiency features good thought. 6 nm, V5x shrink? Maybe.
Where's the $45 come from? TSMC cost : price of adding a 32 MiB SRAM slice to 7 nm 5800X.
SRAM slice is 36 mm2; $5.40 for fabrication, x2 operating cost to dice and test; markup on marginal cost = marginal revenue = price = $10.80 NOW + $10.80 package and final test = $21.60 finished good then x2 markup = $43.20. It's the mark ups that add to price and AMD is not an Apple with reoccurring revenue products.
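The markup chain in the paragraph above can be replayed step by step. This sketch only uses the commenter's own figures ($5.40 fabrication, x2 for dice and test, $10.80 package and final test, x2 finished-goods markup); none of them are confirmed TSMC or AMD numbers.

```python
# Replaying the commenter's cost chain for the SRAM slice.
fabrication = 5.40                            # claimed fab cost of the 36 mm2 slice
diced_and_tested = fabrication * 2            # x2 operating cost to dice and test
package_and_final_test = 10.80                # claimed packaging and final test adder
finished_good = diced_and_tested + package_and_final_test
price_to_amd = finished_good * 2              # x2 markup on the finished good

for label, value in [
    ("diced and tested", diced_and_tested),
    ("finished good", finished_good),
    ("price to AMD", price_to_amd),
]:
    print(f"{label}: ${value:.2f}")
```

The chain lands at $43.20, which is the basis for the roughly $45 figure quoted earlier in the thread; half of that final number is markup rather than hard cost.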
Note Vermeer across full run grade SKUs priced to AMD, the 2 ccx dodeca and hexadeca are expensive to produce and for all core grade SKUs the average TSMC price to AMD is $163 for V5x, essentially x2 Intel design production cost on TSMC 7 nm fabrication in parity with iSF10/7 but there is still packaging and finished goods mark up; marginal cost $81.71 + marginal revenue $81.71 = $163.43 now + $45 thereabouts for 3D = $208 to AMD. Otherwise TSMC can produce a one ccx V5x for around $39 hard cost but then add OpEx and mark up at x2 to AMD. At V5x end run on average octa + i/o might have fallen $140 to $154 but TSMC increases pricing. Matisse 3x full run averaged $131.32 but now Vermeer 5x is $163.43 + 24% and maybe AMD took a couple points more too.
If I'm within 10% that's good enough for government work. However, I will point out a materials engineer who costs on the molecular weight of inputs will laugh at some of my tried and true Intel micro production economic methods.
Mike Bruzzone FTC Docket 9341 Intel consent order monitor. Docket 9288 and 9341 federal attorneys enlisted discovery aid and micro production economics. Former Cyrix, ARM, NexGen, AMD, IDT Centaur employee or consultant.
_abit - Sunday, February 20, 2022 - link
It is a mystery then how amd managed to catch up with intel's margins with all them inflated production costs ;)
Mike Bruzzone - Sunday, February 20, 2022 - link
abit, AMD, large cache applications performance strategy for the price premium AMD always seeks to make sustaining AMD gross incorporating by necessity TSMC foundry mark up where AMD needs to stay 1 to 1.5 nodes ahead of Intel to charge a high (say a premium) price, for component area performance advantage, while continuing to make up TSMC margin within AMD price. That's why AMD has to be selling Genoa as of q3 2021 on iSF10/7 reaching cost parity with TSMC 7, AMD had to move to the next node.
On a design production basis that includes OpEx Intel cost is 54% less than TSMC price to AMD and on a manufacturing cost basis 77% less than TSMC price to AMD for desktop components albeit Intel cost range is similar to TSMC at the same 7 nm process node.
mb
Mike Bruzzone - Sunday, February 20, 2022 - link
abit, AMD has not caught up with Intel's margins yet; q4 at 50% and 53.6% respectively but getting closer. Intel got caught in q3 and q4 having to bundle in a lot of minimally at cost (maybe some freebie too) product into sales deals to clean out Xeon and Core inventory that was rotting on Intel shelves. Intel offed it to OEM dealers and channels. Core line only earned Intel variable cost in q3 and q4. And Intel burping slack surplus product caught up to AMD in q4 where AMD consumer product was also 'forced' down to earning only its variable cost coverage. Only Xeon and Epyc margin paid ahead in q4. mb
_abit - Tuesday, February 22, 2022 - link
You are pulling stuff out your rear end. Amd has the advantage of not having to bother about process development. TSMC has a lot more customers, clients and output, so its lead in process RD is understandable.
All intel has to do to be in ruins is fail another node. Their targets for 10nm were unrealistic, and it took intel years to get over that. Well, their targets for 7nm are just as unrealistic, so intel has a big chance to stumble once more by trying to get ahead of itself. Renaming things doesn't change anything.
Furthermore, intel is heavily leaning on advanced packaging technologies, where amd is more conservative. Even if intel does its own manufacturing at 100%, that's still a lot more additional cost and yield issues. It is yet another huge risk of messing things up.
Last and certainly not least, in desperation that it struggles to make a good cpu uarch, intel is aimlessly spraying new product plans in all directions. Rather than focusing on its weakness, intel is desperately trying to shove the market a bunch of stuff nobody asked about, as if that's a substitute to what intel's missing. Diverging efforts will only dilute and diminish intel's ability to produce quality where it actually matters. Intel is not listening to what clients are calling for, instead is pushing to shove them with stuff that makes no sense.
Sorry to break it to you, but intel is not out of the woods yet. It may be another couple of years before it produces a strong enterprise cpu. Amd's sole limitation was production capacity, and having surpassed intel by market cap puts amd in a position to book a lot more capacity, in the absence of an adequate enterprise solution from intel, and with higher output, amd may well double its server market share annually and capture 50% of it.
Mike Bruzzone - Wednesday, February 23, 2022 - link
"Amd has the advantage of not having to bother about process development"; yes
"TSMC has a lot more customers, clients and output, so its lead in process RD is understandable"; TSMC has mass, leverage and a lot of legacy equipment cash cow, much more than Intel.
"Intel targets for 10nm were unrealistic", agreed, but Intel was also ripped off for $350 billion in hard cost losses over the last decade that could have funded R&D, PE&C and Intel early retired capable employees. Ultimately Otellini regime sabotaged Intel and the Board participated in that sabotage and theft from the Entity.
"Their targets for 7nm are just as unrealistic" SF10/7x is in parity with TSMC 7 cost wise and TSMC has shown 5 nm is an incremental node on design process and tools. Could Intel screw it up? We'll know soon but 5 is incremental. Gelsinger said 5 nodes in 4 years, which on tick tock is 2.5 nodes. But 5 nodes on Rock's doubling of cost every node presents hurdles and potential deterrents.
"Furthermore, intel is heavily leaning on advanced packaging technologies" Intel is ahead in SIP and wants to sell package as a service between TSMC Chandler and Intel Rio Rancho. Ultimately this is about defining next generation of automated chip pack and materials handling and who best to do that SO Intel gets its way, TSMC. Front end has to mesh with backend production and materials handling wise.
"Intel is aimlessly spraying new product plans in all directions" when hasn't Intel but after Rocket, Cooper and Ice I suspect Intel has rethought its customer advance audits and I know Intel is cost optimizing which means getting the product right and eliminating waste; unnecessary product and surplus. On Intel division investments I have no comment on no or too little data.
I think Intel is out of the woods but has not won the race.
"Amd's sole limitation was production capacity", agree. 676,817 wafers in 2021 produced 211,207,983 good die both octa ccx and APU plus 54,786,409 i/o for 118,917,697 that in the end I tallied upwards of 119 M which is a corporate record.
https://seekingalpha.com/instablog/5030701-mike-br...
"Amd may well double its server market share annually and capture 50% of it"
On Epyc high volume determination AMD saw a doubling in production volume from 2020 to 2021 minimally plus 82.6% and corporations plan on round numbers so AMD's aim was to double Epyc production in 2021 and likely did.
From a production standpoint where Epyc Milan and Xeon Ice lake are run end entering Genoa in risk production since q3 and Sapphire Rapids in risk production because Intel never waits at xx% whole product on production volume here stated last two quarters on financial reconciliation.
Commercial production share
AMD = 17.09%
Intel = 82.91%
Adding AMD Rome to Milan v Ice Lake only back to market share on channel
AMD = 80.13%
Intel = 18.86%
Adding Intel Cascade Lakes establishing Intel still maintains monopoly share of the commercial server and workstation market.
AMD = 3%
Intel = 97%
Adding AMD Naples and Intel Skylake
AMD = 1.81%
Intel = 98.19%
Adding Intel Broadwell v4
AMD = 1.00%
Intel = 99.00%
Adding Haswell v3
AMD = 0.55%
Intel = 99.45%
Pursuant Mercury Research AMD server share at 10.7% the closest I can get is AMD Milan + Rome + Naples v Intel Ice and Cascade Lake Gold Silver refresh;
AMD = 15.92%
Intel = 84.08%
mb
gescom - Thursday, February 24, 2022 - link
"AMD = 15.92%
Intel = 84.08%"
Ok, same old same old, what about new systems shipped 2020, 2021? I think you'd see a radically different % picture.
JayNor - Saturday, February 19, 2022 - link
Raja says Intel is shipping SPR in q1 and sampling SPR-HBM, and add DSA, tiled matrix operations, bfloat16, pcie5, cxl, ddr5 ... so when will AMD match all those features?
Qasar - Saturday, February 19, 2022 - link
jaynor, and you actually believe what intel says ? all things considered, that track record for the last few years, sucks. IF you weren't an intel shill, then you would believe this, when it is actually out.
schujj07 - Sunday, February 20, 2022 - link
Well they have 1 more month to make Q1. So far I haven't seen any product announcements for SPR.
mode_13h - Monday, February 21, 2022 - link
Aside from DSA and AMX, I think the answer is their next Epyc.
How much DSA and AMX are really worth is yet to be determined. I'd imagine Epyc's additional cores can more than make up for the lack of DSA. As for AMX, I expect it'll offer unmatched inferencing performance *for a CPU*, but the world seems to be moving beyond CPUs for that sort of thing. Time will tell.
mode_13h - Tuesday, February 22, 2022 - link
DSA, for any who don't know, refers to the Data Streaming Accelerator engine, in Sapphire Rapids. It's a "high-performance data copy and transformation accelerator". There's some info on it, here:
https://01.org/blogs/2019/introducing-intel-data-s...
The thing to keep in mind is that anything one of these engines does could also be performed by a CPU thread. Obviously, DSA is much smaller (and more limited) than a CPU core, but if you've got more cores (i.e. Epyc's 64 cores vs. SPR's 56 cores), then that's more threads (16, in this case) you could spend on async data movement, if necessary.
So, while DSA might be a win in terms of performance per mm^2 of silicon, I don't see it as a huge differentiator or net advantage for SPR.
Spunjji - Tuesday, February 22, 2022 - link
As always with the Intel pushers, they're not interested in the utility of Intel's unique features so much as them not being things that AMD have. They never talk about the unique features that AMD have / have had that have similar niche appeal, for example the VM security features AMD built into Epyc.
mode_13h - Wednesday, February 23, 2022 - link
FWIW, I try to stay non-partisan. My goal is to try and present a realistic interpretation of the facts, as I understand them. Regardless of whether it's a key differentiator, DSA is undeniably new and interesting.
Same for AMX, I might add (i.e. new and interesting) - can't wait to see its real-world performance!
Spunjji - Tuesday, February 22, 2022 - link
Blah.
kgardas - Friday, February 18, 2022 - link
There are rumors spreading that even SR will provide some of its features like avx512, amx, hbme etc. as a software defined (purchased) feature. Would be glad if this would be mentioned here -- if it's already settled or not.
mode_13h - Monday, February 21, 2022 - link
There's not much to write about it, until Intel announces something. All we know is that they're laying the groundwork for hardware feature licensing.
zodiacfml - Friday, February 18, 2022 - link
Finally, Intel Atom's aspirations coming to fruition. Back then, if i remember correctly, it was meant to provide a staggering amount of cores and as a low power CPU for IoT/cheap devices. But no, Intel left Atom 1-2 Generations behind Core for years until recently
Mike Bruzzone - Friday, February 18, 2022 - link
Adding some analytical data and notations to Dr. Ian’s commentary, think of it as a side bar;
Dr. Ian; “Intel is quoting more shipments of its latest Xeon products in December than AMD shipped in all of 2021, and the company is launching the next generation Sapphire Rapids Xeon Scalable platform later in 2022”
Camp Marketing; on INTC 10 Q/K on channel product category and price data Intel sold 10,216,112 Xeon in q3 and 10,105,561 in q4. Intel quarterly shipments are down from 40 M units per quarter at Skylake/Cascade Lake peak supply.
Dr Ian, always tactful, “response to Ice Lake Xeon has been mixed”
Camp Marketing: On CEO Gelsinger and DCG GM Rivera 1 M unit Ice Lake sold in q4, this analyst has calculated approximately 1.3 M full run to date and if 2 M that’s no more than Pentium Pro P6 large cache server volume 1997-97. Xeon Ice is a late market run end dud ala Dempsey between Netburst and Core. Ice suffers the same offered between Cascade Lakes and Sapphire Rapids. Customers want Sapphire not Ice.
This analyst has Sapphire Rapids shipping since q3 parallel Genoa 5 nm, both in customer direct risk production volume and the reason is AMD lost its 7 nm area for performance cost advantage on TSMC markup in q3. Subsequently, AMD had to start shipping Genoa to make their margin incorporating TSMC markup. AMD essentially charges at price premium on / above foundry mark up and needs to stay 1.5 nodes ahead to make up the difference in price insuring AMD gross margin objective.
In the commercial space, unlike Arc consumer GPU where Intel risks being mauled by enthusiasts shipping a less than whole product, Intel Xeon at xx% whole ships so long as customers can program the device. In this competitive situation where Genoa is shipping, Intel does not wait.
Dr. Ian: [Intel] digesting their current processor inventories (as stated by CEO Pat Gelsinger).
Camp Marketing; “Digesting” a clever term referring to the Intel Skylake / Cascade Lakes monopoly surplus overhang sitting in use and in secondary channels for resale, 400 M units worth were sold. This can be a good thing on refurbishing the installed base to dGPU compute if DDR 4 is not system bus limited for sGPU compute on cache starved XSL/XCL control plane processing. The channel and installed base definitively want to keep these servers financially productive and acceleration is the key.
Pursuant “digesting” Intel dumped on AMD in q3 and q4 back generation Core and Cascade Lakes driving consumer components margin take on OEM price making in relation the Intel offer to variable cost for Intel in q3 and q4 and specific to AMD in q4. In q4 Xeon and Epyc are the only production categories that paid ahead earning Intel $723 and AMD $733 net per unit. Core and Ryzen/Radeon as I described q4 and Intel Core in q3 and q4 delivered net push against variable cost.
Dr Ian: referring to Sapphire Rapids, “we already know that it will be using >1600 mm2”
Camp Marketing Notation; 350 to 400 mm2 is very much in the traditional Intel sweet spot for LCC manufacturability
Dr. Ian speculating resurgence in Xeon D application specific variants?
Xeon D is a dud. No generation was supplied beyond slim volume and many sold off into NAS appliances. Difficult to say what Intel can do here [on over] segmentation [?] where past 'D' attempts were essentially rejected by the customer base.
Mike Bruzzone, Camp Marketing
Mike Bruzzone - Friday, February 18, 2022 - link
Adding for clarity, AMD sold 2,999,867 Milan in q4 2021. mb
whatthe123 - Saturday, February 19, 2022 - link
that can't be right. AMD's own reports show most of their growth was from ASP increases, not volume. three million milan chips in one quarter shatters their past records multiple times over.
Mike Bruzzone - Saturday, February 19, 2022 - link
whattthe, thanks for the inquiry,
Camp Marketing has AMD commercial shipments for the year higher than Mercury Research on channel data on 10-Q/K financial reconciliation.
My commercial estimate includes Epyc and Threadripper. Epyc in quarter volume is not regular but sporadic on what the analyst believes are opportunistic production windows in relation wafer starts and AMD full line production category volume – start’s tradeoff. TSMC appears agile when it comes to production / tooling change.
2021 = range 8,320,645 to 9,620,695 units dependent q2 volume roll over into q3;
Q1 = 1,099,950
Q2 = 4,331,103 which is Rome run end production into inventory
Q3 = 1,189,776 which could be roll over from Q2
Q4 = 2,999,867
30% are Threadripper
2020 = range 4,168,967 to 4,560,973 units dependent q3 volume roll over into q4;
Q1 = 449,332
Q2 = 846,604
Q3 = 2,404,558 where some of this volume may roll over into Q4
Q4 = 1,567,108
15.2% are Threadripper
2019 = 5,714,393 of which 76.1% is Threadripper
Naples run end enters q2 2019
2018 = 6,795,562 of which 83.8% is Threadripper
For channel share AMD Milan commercial in relation Intel Ice Lake commercial, I have AMD at 28.45% for channel market share over the prior two quarters [q3-q4] and production volume share, prior two quarters, on AMD and Intel financials on channel price data at 17.09%.
Epyc $1K ASP 2021 on channel supply data;
Q1 = Milan only @ $2915.97
Q2 = Milan only @ $3155.48
Q3 = Milan only @ $3605.95
Q4 = Milan only @ $3932.50
Epyc ASP is typically driven by a skew to top core bin sales / demand fortifying that product space
TR $1K ASP 2021 on channel data;
Q1 39x0 only = $2115.35
Q2 39x0 only = $2074.77
Q3 39x0 only = $2303.44
Q4 39x0 only = $2367.36
There are two ways to calculate OEM price 1) $1K stakeholder / 3 is a traditional metric sharing the product value so there are no sales arguments; 1/3rd foundry, 1/3 to AMD, 1/3 to OEM representing NRE and margin potential. This is a highest volume procurement method and typically requires a full product line sale of grade SKUs mirroring what's coming out of finished goods production. SKUs the OEM does not want are brokered off reducing their overall purchase cost. 2) $1K / 3 x 1.55 is a typical AMD direct customer markup but it can go up to x2 on smaller volumes and specific core grade sales. Both of these are standard methods of pricing if you're in business of compute or OEM. Derivatives of OEM and SI procurement would / can include Epyc + desktop and mobile bundles all negotiated into a quarterly procurement agreement.
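The two pricing methods described above reduce to a one-line formula. The sketch below applies them to the Q4 Milan $1K ASP from the list earlier in this comment; the function name is hypothetical, and the /3 split and x1.55 to x2 markups are the commenter's stated rules of thumb, not published AMD terms.

```python
def oem_price(list_price_1k, markup=1.0):
    """Commenter's rule of thumb: one third of the $1K price, times a markup.

    markup=1.0  -> method 1, the three-way stakeholder split
    markup=1.55 -> method 2, typical direct-customer markup
    markup=2.0  -> method 2, upper end for small volumes
    """
    return list_price_1k / 3 * markup


MILAN_Q4_ASP = 3932.50  # Q4 Milan $1K ASP quoted in the comment

for label, markup in [("1/3 split", 1.0), ("x1.55 direct", 1.55), ("x2 small volume", 2.0)]:
    print(f"{label}: ${oem_price(MILAN_Q4_ASP, markup):,.2f}")
```

On these inputs the estimated OEM price ranges from about $1,311 to about $2,622 per unit, depending on which method and markup applies.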
AMD 2021 all up produced 119,108,089 units and holds 29.06% overall x86 market share.
Complete report is here;
https://seekingalpha.com/instablog/5030701-mike-br...
Mike Bruzzone, Camp Marketing
Hifihedgehog - Monday, February 21, 2022 - link
Stop spamming us with your Seeking Alpha armchair critiques of the market.
hh
Qasar - Monday, February 21, 2022 - link
Hifihedgehog the best part of the seeking alpha page, NO links to where he gets this " information " from, for all we know, its either made up by him, or how HE views the data. either way, useless posts from him is what it looks like.
Mike Bruzzone - Monday, February 21, 2022 - link
Qasar,
My data is my own in primary research for the Federal Trade Commission. That primary research is mainly ebay WW channel supply data queried at high frequency for fidelity that AMD, Intel, Nvidia through in-house personnel also maintain, and where I duplicate that in-house function which I am well aware as a former Cyrix, ARM, NexGen, AMD, Samsung, Intel, IDT Centaur employee or consultant. I've been in my FTC role since May 1998 that is an academic studies role for which I receive no compensation, however, am contracted by USDO to recover Intel Inside price fix for which I receive a percent of the federal procurement 'overcharge' recovery. I also represent 27 States AG and 82 class actions as relator, expert / advocate or witness.
Channel supply data is relied in my academic studies role for Federal Trade Commission and United States Department of Justice retained by Congress of the United States on federal attorney enlistment; FTC v Intel Dockets 9288 and 9341 Intel production micro economist, general systems assessment and currently for Docket 9341 consent order monitoring includes AMD, Intel, Nvidia and VIA, and might as well include ARM Holdings on the competitive wrangling.
The data is public for transparency otherwise under Docket 9341 discovery requirement only AMD, Intel, Nvidia and Via would see the data. I found that ineffective for regulation and remedial activity and it's my decision charged in the task by FTC and Congress at 15 USC 5.
So base data is ebay the industry relies on it for industrial management decision making where ebay data replaced the Intel supply cipher, in 2016, on signal cipher SEC violation 'looking ahead in time up to eight quarters to project Intel revenue and margin' and where ebay is simply real time data although projectable. Following ebay data precisely is an outstanding industry management tool for executive decision making.
Specific management decision making ebay data confirms component by product category down to the grade SKU quarterly volumes for Intel and AMD competitively speaking, for managing and even determining complement board house production volume, for channel inventory management and financial industry relies for assessment.
The second primary activity is preparing the ebay channel data; supply, volume, $1K price for production economic assessment. cost, price, margin primarily auditing for price less than cost sales. 10 - Q/K are relied for financial assessment comparing channel data for determining CPU volume discounts. Finally, for estimating by product category volumes per quarter relying on the channel data as a check. The data is also good for determining fabrication yield and by TDP and frequency splits, all sorts of component related production assessments.
The third primary research activity is systems analysis, the fourth legal assessment for monitoring AMD, Intel, Nvidia, Via compliance although Via does not really count other than one component of docket 9341. Fifth moves to assessment responsibility in technocracy, regulation and remedial activities associated with Docket 9341 Is responsible for Intel discontinuation of supply signal cipher, discontinuation of Intel Inside and multiple limiting archetypes associated with Intel Inside, securing Intel Inside processor and processor in computer buyer price fix recovery I expect to be completed this year, and monitoring Intel reconfiguration from producing for supply (that holds channels financially and has a high cost) to producing for actual demand; it's all about Intel and industry cost optimization essentially removing monopoly restraints and there are channel cartel issues also being addressed and remedied.
Mike Bruzzone, Camp Marketing
Qasar - Monday, February 21, 2022 - link
blah blah blah blah. without sources linked in the blah blah blah you post, it's almost meaningless, as no one can see it for themselves and compare what it says vs what you interpret the data as being. in the end, it's personal opinion.
Mike Bruzzone - Monday, February 21, 2022 - link
Qasar, I said AMD, Intel and Nvidia and I will add Mercury Research all rely on ebay data as the industry management tool that is for tracking supply, production and economics on an Intel model generally known as Total Cost Total Revenue, and 10 Q/K financial assessment is just that, and we validate each other's work. There is no one I'm aware who has challenged AMD, Intel, Nvidia, Mercury, JPR although JPR base data varies from my own but is still complementary. So do your research. mb
Qasar - Tuesday, February 22, 2022 - link
sure thing there mike brahzone, sure thing. again with no links to the data you are looking at, means someone could be looking at different data, and come to a different conclusion. but whatever, maybe you don't post sources, because you can't.
Mike Bruzzone - Monday, February 21, 2022 - link
Hifihedgehog, my observations are a collaborative form of group contribution that also offer data for thesis development / refinement and decision making. Mostly for industrial management but also engineering decision making frameworks.
Definition of SPAM: send the same message indiscriminately to (large numbers of recipients). Or irrelevant or inappropriate messages sent on the internet to a large number of recipients.
My contributions are collaborative and unique in every occurrence and are meant to spark insight and add value. Please consider your reversal, sorry, but think about it.
Mike Bruzzone, Camp Marketing
mode_13h - Tuesday, February 22, 2022 - link
> Stop spamming us with your Seeking Alpha armchair critiques of the market.
It's easy enough to ignore, if you don't care to read it.
I don't mind getting some market insights, because that's not something I generally pay much attention to. However, the business end of things can shed much light into the behavior of these companies - what products they introduce and when.
Mike Bruzzone - Sunday, February 20, 2022 - link
whatthe123, Clarification I noted in q3 and q4 2021 'Milan only' volume but that is not correct on AMD losing its 7nm cost advantage to iSF10/7 on TSMC markup adding to AMD cost on TSMC foundry price to AMD. I said up comment string and here Genoa has been shipping in risk volume q3 and q4 to sustain AMD gross margin on customer price incorporating the TSMC mark up. On Rome risk production volume q3 into q4 2019, Genoa has likely shipped minimally 300 K to date up to 447,986 units. I also note Sapphire Rapids shipping in risk volumes in the same time period because at xx% whole Intel will not wait when commercial customers can program the device in this AMD competitive situation. mb
Hifihedgehog - Monday, February 21, 2022 - link
hh
schujj07 - Saturday, February 19, 2022 - link
I personally want Ice Lake. The data center I run does some small cloud hosting specifically for SAP & SAP HANA. Right now Ice Lake isn't certified to run production SAP HANA on VMware. Being able to use Ice Lake instead of any previous Xeon Scalable means you don't need L CPUs to run huge amounts of RAM. Also means I can use a 2 socket instead of a 4 socket server which is cheaper to purchase.
Mike Bruzzone - Sunday, February 20, 2022 - link
schujj07, well, Ice Lake is certainly on clearance sale. Are flash arrays still used for data bases or do you need DRAM on the CPU system bus? L CLr is on channel sale and in higher availability than Ice. What do you know of Barlow Pass? Does Optane work for structured data or transaction processing? mb
schujj07 - Sunday, February 20, 2022 - link
People still use all flash SANs for DBs. In fact a lot of major SAN vendors are going all-flash on their high-end. You get much better storage density, lower power consumption, and massively higher iops with the flash. That doesn't even count the higher reliability of flash compared to spinning disk.
SAP HANA is an in RAM DB. Ice Lake is certified for production HANA physical appliances but not for VMware. HANA has a very specific way in which it is covered for PRD. Say your DB is 900GB, you need 900GB RAM just for that VM. Without the L series CPUs you cannot get that much RAM on a single socket for non Ice Lake Xeons. That means you are required to have dual sockets. However, that one VM gets every ounce of RAM and CPU from both sockets by SAP requirements. Your 900GB DB now gets 1.5TB RAM. With Ice Lake I can do that "cheaply" on a single socket with 1TB RAM and then have a DEV or QAS DB running on the other socket. This is one reason we want to have Epyc eventually get PRD certified.
Optane is supported, only in App Direct mode, for PRD and does help a lot on restarting the massive DB. However, only Optane P100 is supported and only on Cascade Lake CPUs. Again this is all in a VMware environment but if you are a cloud provider that is what you are going to use. Also if you run on prem there still isn't any reason to not be virtual just for ease of migration and restart on host failure. https://wiki.scn.sap.com/wiki/plugins/servlet/mobi...
The other pain with HANA are storage requirements. It is hard to find a hyper-converged storage that is certified for PRD. Most certified storage are physical SANs with FC connections. I would love to run it on something like VMware vSAN instead. The more local access of vSAN vs traditional SAN makes latency lower. I can also get higher iops and use any disk I want. For example, an HP SAN won't work with any non HP branded disk (vendor locking). Those disks are then sold at a massive markup. 960GB 1DWPD SAS SSD refurbished run $1k/disk with new being like $1500/disk. Getting that same size and endurance from a place like CDW brings the cost down to under $500/drive for SAS or under $300/drive for NVMe (Micron 7300 pro for example). While my license cost is higher for vSAN, I can load up my host with all NVMe storage and use Optane SSD for my write cache (I actually have an array like this right now). Running that on 25GbE gives awesome performance that is easily scalable. I can easily add more disk and if need by add faster NICs to handle more data.
Mike Bruzzone - Sunday, February 20, 2022 - link
schujj07, thank you for a thorough and detailed assessment of your SAP HANA platform and requirement, wants and benefits, subsystem pros and cons, price differences / tradeoffs, very interesting. mb
Hifihedgehog - Monday, February 21, 2022 - link
hh
mode_13h - Monday, February 21, 2022 - link
My own take on Ice Lake SP is that it's not a bad CPU, just badly-timed. It delivers needed platform enhancements, AVX-512 improvements, better IPC, and an aggregate increase in throughput vs. Cascade Lake (thanks to higher core-counts).
That's not to say the lower peak clock speed and power-efficiency aren't areas of disappointment. However, other than some single-thread scenarios, I'm not aware of anything about Ice Lake that's actually *worse* than Cascade Lake.
Spunjji - Tuesday, February 22, 2022 - link
My general understanding is it was too little too late to have broad market appeal. Were it not for AMD's inability to deliver greater volume combined with Intel's flexibility to discount their products as much as needed to encourage purchases, it probably would have hurt Intel quite badly.
mode_13h - Wednesday, February 23, 2022 - link
> it was too little too late to have broad market appeal.
That's basically what I was trying to say. If it had come out when originally planned, it would've been seen in a rather different light. Especially if the 10 nm+ node on which it's made had performed better.
mode_13h - Monday, February 21, 2022 - link
> Xeon D is a dud.
I'd suggest that's specific to the Sky Lake update of Xeon D. I think the Broadwell generation did rather well. Intel merely forgot one of the key ingredients that made it good: power-efficiency.
So, it's plausible there could be an E-core based equivalent in the future. However, it's equally plausible that some of the ongoing Atom product lines are already growing into the niches where Xeon D was initially successful (e.g. power-constrained edge servers for things like cellular base stations).
Mike Bruzzone - Tuesday, February 22, 2022 - link
mode_13h, Atom into base station. I monitor for industrial embedded Atom in the channel and they don't exist and Atom sales at Tremont / Jasper into consumer markets are way down from prior generations. What does spark "at the edge" for "cell base stations" is ARM up against x86. ARM has two network infrastructure fronts. One from the edge up and one from core; data center, down network 'head end' infrastructure . . . building a railroad from two end points toward the middle.
Avoton and Rangeley, followed by Denverton, were meant to quash ARM incursion at the edge and did not. ARM owns cell base station.
mb
mode_13h - Wednesday, February 23, 2022 - link
> Atom sales at Tremont / Jasper into consumer markets are way down from prior generations.
I presume that's because they're low-margin products. So, Intel is de-prioritizing them, given that it's constrained on the supply-side.
Also, I can tell you that other component shortages are making life hard for OEMs and ODMs. There might be less "pull" for these CPUs from their end, if they're having to divert what components they can get towards *their* higher-margin products. Also, because when you *can* get components in short supply, the prices are inflated - making lower-margin products much less profitable (if at all).
Oxford Guy - Saturday, February 19, 2022 - link
That motherboard photo shows how unserious enterprise is about performance. Notice how the RAM boards have no tall aggressive-looking spreaders, nor rhinestone designs, nor RGB.
Oxford Guy - Saturday, February 19, 2022 - link
And I bet there’s no skull on the storage. Truly unfortunate that so much performance is being left on the table.
JayNor - Saturday, February 19, 2022 - link
"Yes, it sounds like what Intel’s competition is doing today, but ultimately it’s the right thing to do"
I think it would be a step in the wrong direction. Their Foveros base IO tile seems a better solution for the future than the non-scalable sprawling distance between io tile and compute tiles.
whatthe123 - Sunday, February 20, 2022 - link
they're still using foveros when the bandwidth is required like their aurora gpu. having an IOD is pretty much required to avoid the design flaw of sapphire rapids where they had to mirror features on every die. sapphire rapids approach may end up faster for memory access but it makes it difficult to scale up and down the stack.
Rοb - Saturday, February 19, 2022 - link
The Intel LGA 4677 is going up against AMD's LGA 6096, while many of the pins will be for power the other half *must do something*. It's probably future-proofing for PCIe 6 and additional DDR6 channels.
It's nice to know that the socket will last a few extra generations, if that's the takeaway.
schujj07 - Sunday, February 20, 2022 - link
AMD is also going 12 channel DDR5 for next-gen Epyc. Intel is only going to be 8 channel DDR5. Once again Intel didn't put RAM density into their decision making for their newest servers. I wonder if after SPR they will go to 12 channel or will they be late to the game again like they were with 8 channel.
schujj07 - Sunday, February 20, 2022 - link
Further note on RAM density. When virtualizing, RAM is your biggest constraint when it comes to number of VMs that can be run on a host. While hypervisors do RAM compression, ballooning, and other things to allow the over allocation of RAM, performance drops very quickly across all VMs on the host once RAM over allocation happens. I've seen performance tank to the point of applications failing at a 10% RAM over allocation. The hosts I manage are all dual socket 32c/64t Epyc Rome's with 1TB RAM. I could easily add more VMs to each host if I had extra RAM. I'm at a steady state 10-15% CPU usage and 50% RAM usage. The most popular DIMMs are 64GB for DDR4. Zen4 will give me 768GB/socket (1DPC) vs 512GB for Intel. This is why RAM density is so important for virtualization and Intel is behind again.
Mike Bruzzone - Sunday, February 20, 2022 - link
mb
Hifihedgehog - Monday, February 21, 2022 - link
hh
Hifihedgehog - Monday, February 21, 2022 - link
See how ridiculous it looks when you just reply with an mb? Stop with the spam.
Mike Bruzzone - Monday, February 21, 2022 - link
hh . . . I'm recognizing the author for a valuable contribution in a continuing audit. Anywhere you see "mb", here, Seeking Alpha, on tech tube I'm auditing looking for valuable contributions to a whole audit. Provide some valuable observation and I'll recognize you too.
Note people don't generally get back, return to, what's going on in the comment string. They post (for posterity?) and then there is no interaction no feedback loop. In engineering that can be the cause and is referred to as 'systems error'. Confirming, registering the connection is important and a best practice. mb
vlad42 - Monday, February 21, 2022 - link
No you're not, you are just spamming the comment section with utter crap. Go away.
Mike Bruzzone - Monday, February 21, 2022 - link
Vlad42, please offer your thesis on why 'SPAM' and 'CRAP' and prove your thesis. Appreciate the diligence. mb
vlad42 - Tuesday, February 22, 2022 - link
You are absolutely spamming the comment section with "mb" multiple times as the entire message. That is, by definition, spamming - the reason you are doing it does not matter. This is common sense etiquette for comment sections.
As for crap, you have been posting lots of messages making all kinds of wild and unsubstantiated claims with respect to units/wafers sold, margins, etc. without providing any sources. For all we know, you are just pulling those numbers out of your ass. It is not our job to hunt down your sources for you to verify your claims. In fact, the only time I could see that you linked to a source, it was to a site that always publishes sensationalist and contradictory analyst reports within days/hours of each other. In addition, those reports are typically based on technical inaccuracies/unrealistic assumptions of economists, who I can only assume, have no real technical knowledge or are intentionally acting maliciously. So, any analysis from that site is far from a trustworthy source of information.
Crap might have been a bit harsh, but the constant spamming is incredibly frustrating and, frankly, makes one unwilling to give you the benefit of the doubt on your non-spam messages.
Mike Bruzzone - Tuesday, February 22, 2022 - link
Ok Vlad, I acknowledge your rationale pursuant to the method I meant to recognize another individual's observations interesting to me as analyst and auditor, and will reconsider how to do that without the initials' track marks. Mike
mode_13h - Wednesday, February 23, 2022 - link
If you're interested in keeping track of which comments you've read, one option is to refresh the page once per day and just search for all posts with the previous day's date. That's basically what I do, when I want to track a discussion. It also helps me avoid burning time reading these comments more than once/day.
kwohlt - Monday, February 21, 2022 - link
I'm in a similar situation. We have dual socket, 2x 8 core Ice Lake Gold Xeons, 512GB of RAM per node, and we're hitting RAM constraints way before CPU. Even with dozens of Windows Server and Linux VMs, CPU sits under 25% utilization, RAM goes above 50%, which we want to avoid for failover reasons (approval requests to add RAM are met with "reduce RAM on VMs" ugh)
mode_13h - Monday, February 21, 2022 - link
This seems like a perfect use case for tiered memory (see my post about CXL memory). Because oversubscription is so painful, you need to have RAM for your guests' full memory window. However, that's not to say that all of the RAM needs to be running at full speed. For instance, the "free" RAM in a machine that's serving as disk cache will tend to be fairly light duty-cycle and is an easy target for demoting to a slower memory tier. Watch this space.
schujj07 - Monday, February 21, 2022 - link
Use of CXL to extend RAM into a RAM pool is an interesting option. Right now that isn't a thing but could be in the next couple years for sure. I wonder how they will do redundancy for a RAM pool. If a host crashes that can take down quite a few VMs. However, if a RAM pool crashes that could take down 1/2 your data center. In many ways I think it would have to be a setup like physical SANs. For sure this will be interesting to watch how it is done over the next decade. At first I can see this being too expensive for anyone who isn't like AWS or massive companies. My guess is for smaller companies with their own data centers it will be at least 10 years before it is cheap enough for us to implement this solution.
mode_13h - Tuesday, February 22, 2022 - link
> I wonder how they will do redundancy for a RAM pool.
For one thing, Intel is contributing CXL memory patches that allow hot insertion/removal. Of course, if a CXL memory device fails that your VM is using, then it's toast.
There are techniques mainframes use to survive this sort of thing, but I'm not sure if that's the route CXL memory is headed down.
> if a RAM pool crashes that could take down 1/2 your data center.
I think the idea of CXL is to be more closely-coupled to the host than that. While it does offer coherency across multiple CPUs and accelerators, I doubt you'd use CXL for communication outside of a single chassis.
schujj07 - Tuesday, February 22, 2022 - link
"I think the idea of CXL is to be more closely-coupled to the host than that. While it does offer coherency across multiple CPUs and accelerators, I doubt you'd use CXL for communication outside of a single chassis."
From the little bit I have read about CXL memory, what I get from it is you would have a pool or two in each rack. In the data center everything has to be redundant otherwise you can have issues. SANs have dual controllers, hosts are never loaded to full capacity to allow for failover, etc... Would a CXL pool have dual controllers and mirror the data in RAM to the second controller? I'm sure they will use some of the knowledge from mainframes to figure out how to do this. I'm just not an engineer so I am doing nothing more than speculating.
mode_13h - Wednesday, February 23, 2022 - link
> Would a CXL pool have dual controllers and mirror the data in RAM to the second controller?
Interesting question. While the CXL protocol might enable cache-coherence across multiple CPUs and accelerators, I think that won't extend to memory mirroring. That would mean that a CXL memory device should implement any mirroring functionality, internally. Not ideal, of course. And I could be wrong about what CXL 2.0 truly supports. I guess we'll have to wait and see.
mode_13h - Wednesday, February 23, 2022 - link
Just to be clear about what I meant, when I write data to one memory device, CXL ensures that write is properly synchronized with all other CXL devices. However, if a CPU tries to write out the same data to two different CXL memory devices, I doubt there's any way to be sure they're mutually synchronized.
In other words, if you have two devices issuing writes to the same address, which is backed by mirrored memory, the first device might be first to write that address on the first memory module, but second to write it on the second memory module. So, the values will now be inconsistent.
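The ordering problem described above can be sketched as a toy, fully deterministic model. This is an illustration only, not CXL code: `apply_writes`, the device names `dev_a` / `dev_b`, and the byte values are all hypothetical, and each memory module is modeled as a single cell where the last write to arrive wins.

```python
def apply_writes(arrival_order):
    """Return the final value of one address on one memory module.

    arrival_order is a list of (writer, data) tuples in the order the
    module receives them. Toy assumption: last write wins.
    """
    value = None
    for _writer, data in arrival_order:
        value = data
    return value


# Two devices write different data to the same mirrored address.
# Module 1 happens to see dev_a first; module 2 happens to see dev_b first.
module_1 = apply_writes([("dev_a", 0xAA), ("dev_b", 0xBB)])
module_2 = apply_writes([("dev_b", 0xBB), ("dev_a", 0xAA)])

# The two mirrors now hold different values for the same address.
mirrors_consistent = module_1 == module_2
print(hex(module_1), hex(module_2), mirrors_consistent)
```

Both modules received both writes, yet they end up disagreeing, which is why mirroring would need either a single ordering point or support inside the memory device itself.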
Rοb - Monday, February 21, 2022 - link
I think you could put 3 light memory usage VMs with one heavier usage VM, giving more VMs (at greater CPU utilization) and allowing one of the VMs (using 1/4 of a core) to have half as much memory - but if the user needed more memory then get them to pay for more than 1/4 core; have them pay for 2 cores (that they don't fully utilize) to get double the memory. If you won't buy bigger DIMMs you have to recover the allocatable memory somehow.
The good (painful) news is that there are 512GB DDR5 DIMMs, and that the sweet (in a few years) spot will also be 2x of the DDR4 sizes, so you'll be able to get more memory (after the prices no longer eat the budget away). That means for 1/2M you could get 24TB of memory into the slots, if 2 CPUs can access that much; they don't try to save an address line.
That 12 channel and CXL (hopefully 2.0) is coming is expected speculation - they should do it, and not wait too long.
My theory is that the extra pins not accounted for by the above will go into moving the 2P interconnect via Infinity Fabric over PCIe into connect over an extra set of CXL 2.0 (for the encryption) lanes - freeing up more PCIe lanes; leaving 160-192 lanes as standard, instead of just 128.
More memory and bandwidth, more PCIe lanes, and CXL 2.0 (which is announced), along with more cores (and their 700W boost) will set them ahead across the board; except for single thread performance (and hybrid E-cores, so they'll be a close second for power; with enough difference in price to pay for the electricity).
schujj07 - Tuesday, February 22, 2022 - link
"I think you could put 3 light memory usage VMs with one heavier usage VM, giving more VMs (at greater CPU utilization) and allowing one of the VMs (using 1/4 of a core) to have half as much memory - but it the user needed more memory then get them to pay for more than 1/4 core; have them pay for 2 cores (that they don't fully utilize) to get double the memory. If you won't buy bigger DIMMs you have to recover the allocatable memory somehow."
That isn't really how virtualization works. You have a bit of the idea right in that a VM will be placed onto a host that has the free resources. However, no one will be doing this by hand unless they have only 2 hosts. In VMware there is a tool called Distributed Resource Scheduler (DRS) that will automatically place VMs on the correct host in a cluster as well as migrate VMs between hosts for load balancing.
There is no way to give a system only 1/4 core. The smallest amount of CPU that is able to be given is 1 Virtual CPU (that can be a physical core or a hyperthread). I cannot tell you on which physical CPU that vCPU will be run. Until a system needs CPU power, that vCPU sits idle. Once the system needs compute it goes to the hypervisor to ask for compute resources. The hypervisor then looks at what physical resources are available and then gives that system time on the physical hardware.
As physical core counts have gone up it has gotten much easier for the hypervisor to schedule CPU resources without having the requesting system wait for said resources. When you used to have only dual 4c/8t or 8c/16t CPUs, you could easily have systems waiting for CPU resources if you were over allocated on vCPU. In this case a dual 8c/16t server will have 32 vCPU but you could have enough VMs on the server that you have allocated a total of 64 vCPUs. A VM with 4 vCPU has to wait until there are 4 threads available (with the Meltdown/Spectre mitigations it would be 2c/4t or 4c) before it can get the physical CPU time. It can happen that say one system has 16 vCPU out of the 32 vCPU on the server and will be waiting almost forever for CPU resources. Since the scheduling isn't like getting in line, if only 8 vCPU frees up that 16 vCPU is left to wait while 2x 4 vCPU VMs get time on the CPU. The hosts I'm running all have 128 vCPU and that makes the CPU resource contention much less of an issue since at any time you are almost assured of free CPU resources. For example I have allocated over 160 vCPU to one server and never have I had an issue where VMs are waiting for compute. I would probably need to be in the 250+ vCPU allocated before I run into CPU resource contention. With these high core count server CPUs, the biggest limiting factor for number of VMs running on a host has changed from CPU to RAM.
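A toy model of the "not like getting in line" behavior described above, for illustration only. It assumes the strict all-or-nothing rule the comment describes (a VM runs only if all of its vCPUs fit in the free hardware threads at once); `schedule_tick` and the VM names are hypothetical, and real hypervisor schedulers are considerably more elaborate.

```python
def schedule_tick(free_threads, waiting):
    """Place waiting VMs onto free hardware threads for one tick.

    waiting is a list of (name, vcpus) tuples. A VM is placed only if
    ALL of its vCPUs fit at once (toy strict co-scheduling); otherwise
    it is skipped and smaller VMs behind it may still run.
    Returns (running_names, still_waiting_names).
    """
    running, still_waiting = [], []
    for name, vcpus in waiting:
        if vcpus <= free_threads:
            running.append(name)
            free_threads -= vcpus
        else:
            still_waiting.append(name)
    return running, still_waiting


# Only 8 threads free up: the 16 vCPU VM keeps waiting while the two
# 4 vCPU VMs behind it get CPU time.
running, waiting = schedule_tick(8, [("big-16", 16), ("small-a", 4), ("small-b", 4)])
print(running, waiting)
```

With 128 threads on the host the same call places all three VMs, which matches the point that higher core counts make this kind of contention much rarer.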
From the RAM side it will take a long time until 512GB DDR5 LRDIMMs are available. However, I could easily see the 128GB DDR5 RDIMM being the most popular size, like 64GB is right now for DDR4. For a company like where I work which does small cloud hosting, going from dual socket 8 Channel 64GB DIMMs (1TB total at 1DPC) to dual socket 12 Channel 128GB DIMMs (3TB total at 1DPC) is a huge boost.
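The capacity figures quoted in this thread are simple products of sockets, memory channels, DIMMs per channel, and DIMM size. A minimal sketch of that arithmetic (the function name `host_ram_gb` is made up for illustration):

```python
def host_ram_gb(sockets, channels_per_socket, dimm_gb, dimms_per_channel=1):
    """Total installed RAM in GB for a host with identical DIMMs."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


# Per-socket comparison with 64GB DIMMs at 1DPC: 12 channels vs 8 channels.
twelve_channel_socket = host_ram_gb(1, 12, 64)   # 768 GB
eight_channel_socket = host_ram_gb(1, 8, 64)     # 512 GB

# Dual-socket hosts: 8 channel with 64GB DIMMs vs 12 channel with 128GB DIMMs.
ddr4_host = host_ram_gb(2, 8, 64)      # 1024 GB = 1TB
ddr5_host = host_ram_gb(2, 12, 128)    # 3072 GB = 3TB

print(twelve_channel_socket, eight_channel_socket, ddr4_host, ddr5_host)
```

Going from the 1TB host to the 3TB host is a 3x increase in RAM at the same one DIMM per channel.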
Rοb - Thursday, February 24, 2022 - link
"There is no way to give a system only 1/4 core." Semantics - Put 4 VMs on one physical core. One example explanation found with one minute of searching: https://superuser.com/a/698959/
schujj07 - Thursday, February 24, 2022 - link
Not semantics at all. While I can have 4 different VMs each with a single CPU on one physical core, that doesn't mean they get 1/4 core. Here is an example with 4 VMs, let's call the VMs A, B, C, & D. The host machine has a single physical non SMT CPU. VM A is running something that is using continuous CPU (say y-cruncher but only 25% CPU load). VMs B, C, & D want to run something on the CPU (say OS updates). Those VMs cannot each get 25% of the CPU all at the same time. They have to wait until VM A is done requesting that CPU access until 1 of those next VMs can request CPU for the updates. I cannot have the CPU running at 100% load with those VMs all running code simultaneously (in parallel for lack of a better term) on the CPU. The requests for CPU time are done in serial not parallel. Therefore you cannot give 1/4 of a CPU to a VM.
That link you gave is talking about something different. When you set up a VM in VMware, Hyper-V, etc...you are asked to specify the number of CPUs for the system. You have 2 options for giving the number of CPUs (Cores & Sockets). 99.9% of the time it doesn't matter how you do it. You say you are giving VM 4 CPUs it gets 4 CPUs. When you look at the settings in VMware, you see that is 4 sockets @ 1 core /socket. However, I can change that to 1 socket @ 4 cores/socket or 2 sockets @ 2 cores/socket, etc...The OS doesn't care how it is setup as it is getting the CPU you told it was going to get. Where that matters, is in some software licensing is done Per Socket so you might get charged a lot more if your software thinks it is 4 sockets vs 1 socket for licensing. This does not mean I'm giving one system 1/4 (0.25) a CPU vs 4 CPUs.
FYI I am an expert in virtualization. I have my VMware Data Center Virtualization Certificate and am a VMware Certified Professional and I run a data center. I have been the lead VMware Admin for the last 3.5 years.
mode_13h - Friday, February 25, 2022 - link
> When you look at the settings in VMware, you see that is 4 sockets @ 1 core /socket.
> However, I can change that to 1 socket @ 4 cores/socket or 2 sockets @ 2 cores/socket,
> etc...The OS doesn't care how it is setup as it is getting the CPU you told it was going to get.
> Where that matters,
Where that matters is memory latency! Also, not overloading the interconnect bus between the sockets.
Performance-wise, it's nearly always best to minimize the split between sockets. The only time that doesn't hold is if there's a job involving multiple threads that are each bandwidth-intensive and don't share data with each other. This is fairly rare.
schujj07 - Friday, February 25, 2022 - link
"Where that matters is memory latency! Also, not overloading the interconnect bus between the sockets."
In VMware that doesn't matter. It is going to assign the CPU based on what is available at any given second. That means a VM with 4 vCPU might get Core 0 & 12 on CPU 0 and Core 14 & 23 on CPU 1. There is no way I can park a VM on specific sockets only. The Socket/Cores selection in VMware is nothing more than telling the OS how it is getting its CPUs.
mode_13h - Saturday, February 26, 2022 - link
> There is no way I can park a VM on specific sockets only.
I see what you mean. You can tell it *how many* sockets to use, but not *which* sockets.
That's unfortunate that it has no concept of socket-affinity, but I realize it would make the scheduling problem much harder.
Rοb - Saturday, February 26, 2022 - link
mode_13h, see: https://docs.vmware.com/en/VMware-vSphere/7.0/com.... and https://docs.vmware.com/en/VMware-Integrated-OpenS...
Rοb - Saturday, February 26, 2022 - link
This is where you replied, and your last comment where it's come to. Yes you should write a blog: VMWare Admin and Certified Professional discovers problem that can't be solved.
mode_13h - Monday, February 21, 2022 - link
If Intel stays with 8-channel DDR5 on their platform for EMR/SRF, it could be that they're planning on CXL-attached memory as the main way to add capacity & some additional bandwidth. This could both be more cost-effective and scale to larger memory capacities.
Any bandwidth shortfall might also be offset by in-package HBM, at least for higher-end SKUs.
mode_13h - Monday, February 21, 2022 - link
It'll be interesting to see if Intel supports hybrid EMR + SRF configurations. If any of their customers are interested in Big + Little combos, that will be an easy way to experiment with it (i.e. one of each CPU type, in a dual-socket server).
Spunjji - Tuesday, February 22, 2022 - link
General thoughts on this: not particularly exciting. Intel seem to be more dynamic in the consumer arena at the moment.
mode_13h - Wednesday, February 23, 2022 - link
I think it would be more interesting if they disclosed more details. I *do* like seeing them move more aggressively with their E-cores. Improving power-efficiency is a good thing.
kgardas - Thursday, February 24, 2022 - link
Any news about replacement of LGA-2066 Xeon W-22xx? They got no update by IceLake for obvious reason, but with SPR new goldencove it would be great to see something new in this line too...
mode_13h - Thursday, February 24, 2022 - link
I'm sure it'll come after SPR launches. I think that's Intel's current priority.
That said, maybe you can find some roadmaps of their Xeon W platform.
mode_13h - Thursday, February 24, 2022 - link
Also, because their HEDT and Xeon W CPUs tend to have the same cores as the server Xeons, whatever is holding up their SPR server products is likewise blocking their workstations.
Bulis - Monday, February 28, 2022 - link
In gaming news today, noobs will stay noobs and pro players will earn high MMR, end of the story. https://getawayshootout.us
mode_13h - Tuesday, March 1, 2022 - link
spammer.