39 Comments
BushLin - Wednesday, September 28, 2022 - link
Neat. The first question which enters my head is: what are the software requirements to make use of the accelerators? i.e. will you need rewritten compression binaries, network drivers specifically making use of the hashing acceleration hardware, new database software... Or are any of these features going to simply make existing software much more efficient?
Jorgp2 - Wednesday, September 28, 2022 - link
Most software packages support it out of the box; they're not really new things.
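Whether existing software benefits transparently also depends on the kernel having enumerated the devices. Here is a minimal sketch of a quick check, assuming a Linux system where the idxd driver exposes DSA/IAA devices under /sys/bus/dsa/devices (the sysfs path is an assumption about a typical setup):

```c
/* Hedged sketch: list accelerator devices enumerated by the Linux idxd
 * driver. The sysfs path below is an assumption; adjust for your kernel. */
#include <stdio.h>
#include <dirent.h>

int main(void) {
    const char *path = "/sys/bus/dsa/devices"; /* idxd bus for DSA/IAA */
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");  /* no devices found, or driver not loaded */
        return 1;
    }
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')
            printf("found device: %s\n", entry->d_name);
    }
    closedir(dir);
    return 0;
}
```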
Alexey Milovidov - Wednesday, October 12, 2022 - link
You can check the pull request to ClickHouse adding support for QPL: https://github.com/ClickHouse/ClickHouse/pull/3665...
It required some effort, but it is fairly straightforward.
The patches are merged into the main branch and available in the latest release under an experimental flag.
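For readers curious what "adding support" looks like at the code level, here is a hedged minimal sketch of compressing a buffer through Intel's Query Processing Library (QPL), the C library that fronts the In-Memory Analytics Accelerator. This illustrates the library's public job API, not the ClickHouse patch itself, and the exact flag combination is an assumption:

```c
/* Hedged sketch: DEFLATE compression via Intel QPL's job API.
 * qpl_path_hardware targets the IAA unit; qpl_path_software runs the
 * same codec on the CPU, which is the usual fallback. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <qpl/qpl.h>

int main(void) {
    uint8_t src[4096], dst[8192];
    memset(src, 'A', sizeof(src));   /* trivially compressible input */

    uint32_t size = 0;
    qpl_get_job_size(qpl_path_hardware, &size);
    qpl_job *job = (qpl_job *)malloc(size);
    qpl_init_job(qpl_path_hardware, job);

    job->op            = qpl_op_compress;
    job->next_in_ptr   = src;
    job->available_in  = sizeof(src);
    job->next_out_ptr  = dst;
    job->available_out = sizeof(dst);
    job->level         = qpl_default_level;
    job->flags         = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_DYNAMIC_HUFFMAN;

    qpl_status status = qpl_execute_job(job);
    if (status == QPL_STS_OK)
        printf("compressed %zu -> %u bytes\n", sizeof(src), job->total_out);
    else
        fprintf(stderr, "qpl_execute_job failed: %d\n", status);

    qpl_fini_job(job);
    free(job);
    return 0;
}
```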
JVlocal - Friday, October 21, 2022 - link
Hello, BTS SNIR 2.

JVlocal - Friday, October 21, 2022 - link

The answer to question D is "dolphin".

JVlocal - Friday, October 21, 2022 - link

Congratulations, you have read everything; now wait patiently.

JVlocal - Friday, October 21, 2022 - link

Hello, BTS SNIR 2.

SarahKerrigan - Wednesday, September 28, 2022 - link
Looks like a decent set of blocks, and a nice way to differentiate from AMD. (Also looks like the kind of stuff RISC/UNIX and mainframe platforms have been doing for years - especially Oracle, but also IBM.)

name99 - Thursday, September 29, 2022 - link
And, for that matter, Apple: AMX of course even has the same name!
Data Streaming and the crypto/network acceleration stuff are done via DMA.
The most interesting case is the "In-Memory Analytics" block, where I can't tell from the article what Intel is ACTUALLY doing (in-memory compression?).
It is possible (unclear but certainly possible) that the A15 has genuine PiM hardware. I'm still hoping that we'll eventually get decent M2 cross-sections, or an A16 cross-section which shed light on this. I discuss my thinking about this in vol5 (Packaging) of my M1 books at https://github.com/name99-org/AArch64-Explore (all massively updated about a week ago).
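On the DMA-driven blocks mentioned above: the Data Streaming Accelerator is programmed by filling in a 64-byte descriptor and posting it to a memory-mapped work-queue portal. A hedged sketch of a memmove offload using the Linux idxd uapi follows; the device node name, portal mapping, and flag choices are assumptions about a typical dedicated-queue configuration, not a guaranteed recipe:

```c
/* Hedged sketch: offload a memmove to Intel DSA through a dedicated work
 * queue. Requires a configured idxd work queue (e.g. set up with
 * accel-config) and a CPU with MOVDIR64B; compile with -mmovdir64b. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <immintrin.h>
#include <linux/idxd.h>

int main(void) {
    /* Device node name is an assumption; it depends on WQ configuration. */
    int fd = open("/dev/dsa/wq0.0", O_RDWR);
    if (fd < 0) { perror("open wq"); return 1; }

    /* Map the work-queue submission portal. */
    void *portal = mmap(NULL, 0x1000, PROT_WRITE, MAP_SHARED, fd, 0);
    if (portal == MAP_FAILED) { perror("mmap portal"); return 1; }

    static char src[4096] = "hello, accelerator";
    static char dst[4096];

    /* The device writes its status into this 32-byte-aligned record. */
    static struct dsa_completion_record comp __attribute__((aligned(32)));
    comp.status = 0;

    struct dsa_hw_desc desc __attribute__((aligned(64))) = { 0 };
    desc.opcode          = DSA_OPCODE_MEMMOVE;
    desc.flags           = IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CRAV; /* want completion record */
    desc.src_addr        = (uintptr_t)src;
    desc.dst_addr        = (uintptr_t)dst;
    desc.xfer_size       = sizeof(src);
    desc.completion_addr = (uintptr_t)&comp;

    /* MOVDIR64B posts the whole 64-byte descriptor in one shot. */
    _movdir64b(portal, &desc);

    /* Spin until the device reports completion (fine for a demo). */
    while (comp.status == 0)
        _mm_pause();

    printf("status=%u dst=\"%s\"\n", comp.status, dst);
    munmap(portal, 0x1000);
    close(fd);
    return 0;
}
```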
evilpaul666 - Thursday, September 29, 2022 - link
Intel's doing a lot better now that they're not spending all their cash buying back shares of their stock. They should have some really nice products rolling out of the new fabs we paid for in a couple of years.

YukaKun - Thursday, September 29, 2022 - link
I need the finer details of the NGINX test. There are a couple of things I don't like initially:
- Is the accelerator code outside of the kernel maintained by regular OS patching, or by Intel?
- Is this going to be in the regular build of OpenSSL or will it require an Intel-specific one?
It's nice and all to see this, but vendor lock-in for something as ubiquitous as a web server worries me.
Regards.
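On the OpenSSL question above: the usual pattern is a stock OpenSSL that dynamically loads a vendor engine (Intel publishes one for QAT), rather than an Intel-specific OpenSSL build. A hedged sketch of loading such an engine through OpenSSL's standard ENGINE API; the engine id "qatengine" matches Intel's published engine, but treat its availability on a given system as an assumption:

```c
/* Hedged sketch: ask a stock OpenSSL to offload crypto to a named engine.
 * Uses only the standard ENGINE API; note that this API is deprecated in
 * OpenSSL 3.x in favor of providers. */
#include <stdio.h>
#include <openssl/engine.h>

int main(void) {
    ENGINE_load_builtin_engines();          /* also picks up dynamic engines */

    ENGINE *e = ENGINE_by_id("qatengine");  /* engine id is an assumption */
    if (!e) {
        fprintf(stderr, "engine not found; software crypto will be used\n");
        return 1;
    }
    if (!ENGINE_init(e)) {
        fprintf(stderr, "engine found but failed to initialize\n");
        ENGINE_free(e);
        return 1;
    }
    /* Route all supported algorithms through the engine by default. */
    ENGINE_set_default(e, ENGINE_METHOD_ALL);
    printf("offloading via engine: %s\n", ENGINE_get_name(e));

    ENGINE_finish(e);
    ENGINE_free(e);
    return 0;
}
```

In nginx the equivalent is typically a single `ssl_engine` directive in the main config, and the engine ships as a separate package. If that holds here, it partly answers the maintenance question: the offload path is patched on the engine package's cadence, not the kernel's.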
edzieba - Thursday, September 29, 2022 - link
Sapphire Rapids is rather an odd duck. It's an 'unreleased' CPU that's been in the hands of major customers for months to years (depending on iteration) in volume, a 'general release' server CPU that's effectively a semi-custom design built as a collaboration between a handful of HPC users for very specific tasks, and a 'Xeon' that's more of a side offshoot of the Xeon line... except the Willow Cove Xeons are MIA, so from the outside it appears to fill that hole in the lineup, even if it was never intended to fulfil that role.
Bruzzone - Thursday, September 29, 2022 - link
I question whether Sapphire Rapids can fill the mainstream channel, standardized as it is on dirt-cheap XSL/XCL platforms, with SR-specific hardware arriving ahead of the software. How can OEMs provide a boxed, plug-and-play 'off the shelf' solution if the platform needs an entire CS department to code optimized applications? Beyond Intel's direct 'business of compute' customers, how many years of proven use cases before 'safe' general platform validation is established? mb

Bruzzone - Thursday, September 29, 2022 - link
New Xeon Ice Lake in relation to Epyc Milan and Rome: WW competitive channel data here at slides #3 through #9. See AMD jab, cross, and crush Intel.

See the Intel Skylake/Cascade Lake replacement cycle that dog-piles onto AMD.

AMD server channel share today is 68% (Rome + Milan) in relation to Intel Ice Lake.
https://seekingalpha.com/instablog/5030701-mike-br...
The only server that moved in the channel in high-velocity volume in Q3 is Rome.

Here's the channel % SKU volume, sales, and trade-in trend:

Noteworthy: Rome sells off 27.7% since July 2.
Rome 64C at 31.4% of available today, down 6.8% since July 2
48C at 2.5%, down 74%
32C at 23.3%, down 49%
24C at 15%, up 19%
16C at 19%, flat
12C at 1%, down 49%
8C at 7.7%, down 50%
A very robust sales trend, and the objective appears plain: a press to sell down inventoried stock by 50%.
Milan overall availability increases 11.8% since July 2.

Milan 64C at 46.3% of available today, up 32% since July 2
56C at 2.9%, flat
48C at 3.2%, flat
32C at 16.4%, down 23.2% (the major mover is the 7543P)
28C at 3.2%, up 220%
24C at 13.6%, flat
16C at 13.2%, down 3% (the major mover is the 7313)
8C at 1.2%, up 50%

Milan sales are a little flattish, with the channel focused on clearing Rome.
Xeon Ice in total increases 23.7% since July 2.
P40C at 3.2%, down 20%
P38C at 3.0%, up 53%
P36C at 7.2%, down 20% (the major mover is the 8360Y)
P32C at 19.8%, up 15.6%
G32C at 6.5%, up 29%
G28C at 12.8%, up 52%
G24C at 7.1%, up 6%
G20C at 0.27%, flat
G18C at 2.4%, flat
G16C at 6.1%, up 39%
G12C at 1%, up 166%
G8C at 7.2%, up 25%
S20C at 2.2%, up 13%
S16C at 2.8%, up 31%
S12C at 7.7%, up 11.5%
S10C at 0.6%, up 66%
S8C at 5.6%, up 40%
All W at 2%, up 36%
Ice sales overall seem a little flattish.
Specific to Intel Xeon Skylake and Cascade Lake:

On a trade-in, something replaces it, and the question is always: what?

Within the Xeon Skylake to Cascade Lake stack?

Note the XSL/XCL movers since July are XCL 28C and the dreg Silver and Bronze 8C; the middle is trading, but it's maintenance. Rome took the Q3 enterprise budget and made everything else look like it was standing still.

Migrates from Skylake and Cascade Lake to Xeon Ice?
Migrates to AMD?
Migrates to ARM?
4C channel availability today represents 2.38%, increasing 29% in the prior 11 weeks.
6C = 1.92%, up 7.3% (is returning)
8C = 11.25%, down 24.4% (is selling)
10C = 6.86%, down 17.3% (is selling)
12C = 9.77%, down 9.7%
14C = 14.43%, up 6.1%
16C = 12.96%, up 7.2%
18C = 6.59%, down 5.1%
20C = 9.59%, up 6%
22C = 2.6%, up 12% (returns)
24C = 8.63%, up 3.9%
26C = 2.03%, up 1.1%
28C = 10.98%, down 4.2% (selling)
mb
Qasar - Saturday, October 1, 2022 - link
More fake, made-up data from the fraudster, who posts things that no one can verify OR confirm.

Ryan Smith - Saturday, October 8, 2022 - link
That's enough. Both of you.

watersb - Thursday, September 29, 2022 - link
An HPC scientist calls out Intel for years of delay. (Twitter thread via Hacker News today)
https://twitter.com/nicolehemsoth/status/157523361...
From the framing at the start of this article, is the scientist referring to Sapphire Rapids platforms?
I had thought CXL to be a relatively new technology. But if it's ever to show up in a commercial customer's data center, it would need years of bespoke deployment in HPC machines.
Eagle07 - Friday, September 30, 2022 - link
@Watersb, Yes and no. Aurora was originally supposed to be a Xeon Phi supercomputer...

"Originally announced in April 2015, Aurora was planned to be delivered in 2018 and have a peak performance of 180 petaFLOPS. The system was expected to be the world's most powerful system at the time. The system was intended to be built by Cray based on Intel's 3rd generation Xeon Phi (Knights Hill microarchitecture). In November 2017 Intel announced that Aurora has been shifted to 2021 and will be scaled up to 1 exaFLOPS. The system will likely become the first supercomputer in the United States to break the exaFLOPS barrier. As part of the announcement Knights Hill was canceled and instead be replaced by a "new platform and new microarchitecture specifically designed for exascale"."
Alas, here we are in 2022 and no Aurora. They actually built an AMD/Nvidia testbed while waiting for it... But AMD has launched Frontier, and will probably launch El Capitan before Aurora gets out the door.
Eagle07 - Friday, September 30, 2022 - link
So no, I think they are talking in general about Aurora; but yes, Intel canceled the product meant for the supercomputer, then delayed its replacement, Sapphire Rapids and Xe... till March of next year?

Meanwhile, El Capitan seems on schedule for 2023 with Genoa and MI300.
Silma - Thursday, September 29, 2022 - link
The question is: does it make sense to add accelerator units to a general processor, or should they be part of a separate chip/card? I wonder if there's any advantage vs. offloading everything to CUDA or a similar solution.

m53 - Thursday, September 29, 2022 - link
AMX is looking good.
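For those wanting to see what AMX looks like from userspace, here is a hedged minimal sketch of an INT8 tile multiply using the documented intrinsics. The Linux permission request via arch_prctl is required before touching tile state (kernel 5.16+), and the tile shapes are illustrative assumptions:

```c
/* Hedged sketch: one INT8 AMX tile multiply-accumulate.
 * Compile with: gcc -O2 -mamx-tile -mamx-int8 amx_demo.c
 * Needs Linux >= 5.16 and a Sapphire Rapids class CPU. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <immintrin.h>

#define ARCH_REQ_XCOMP_PERM 0x1023   /* from asm/prctl.h */
#define XFEATURE_XTILEDATA  18

/* 64-byte tile configuration blob, per the AMX spec. */
struct tile_config {
    uint8_t  palette_id;
    uint8_t  start_row;
    uint8_t  reserved[14];
    uint16_t colsb[16];   /* bytes per row of each tile */
    uint8_t  rows[16];    /* rows of each tile */
};

int main(void) {
    /* Ask the kernel for permission to use AMX tile state. */
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA)) {
        perror("arch_prctl(ARCH_REQ_XCOMP_PERM)");
        return 1;
    }

    /* Configure three tiles for C(int32) += A(int8) x B(int8). */
    struct tile_config cfg = { 0 };
    cfg.palette_id = 1;
    cfg.rows[0] = 16; cfg.colsb[0] = 64;  /* tmm0: C, 16x16 int32 */
    cfg.rows[1] = 16; cfg.colsb[1] = 64;  /* tmm1: A, 16x64 int8  */
    cfg.rows[2] = 16; cfg.colsb[2] = 64;  /* tmm2: B, int8 (VNNI-packed in real code) */
    _tile_loadconfig(&cfg);

    static int8_t  a[16][64], b[16][64];
    static int32_t c[16][16];
    memset(a, 1, sizeof(a));   /* constant data, so packing layout is moot */
    memset(b, 2, sizeof(b));

    _tile_loadd(1, a, 64);     /* load A with a 64-byte row stride */
    _tile_loadd(2, b, 64);     /* load B with a 64-byte row stride */
    _tile_zero(0);             /* clear the accumulator tile */
    _tile_dpbssd(0, 1, 2);     /* C += A . B, signed int8 dot products */
    _tile_stored(0, c, 64);    /* store C with a 64-byte row stride */
    _tile_release();

    printf("c[0][0] = %d (expect %d)\n", c[0][0], 1 * 2 * 64);
    return 0;
}
```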
noobmaster69 - Thursday, September 29, 2022 - link

This is really interesting, but raises a lot of questions. If the machine is used for running containers/VMs, do the virtualized workloads have access to the accelerators? This would apply to a cloud environment where a customer is renting just a portion of the machine, say 4 cores, as well as to internal use where the customer owns the hardware but chooses to run containers for security and scalability reasons. If these blocks are usable by VMs/containers, is there a way to monitor and control resource utilization so one VM/container doesn't hog the acceleration blocks at the expense of other VMs/containers on the same system?

The other question I have is around cost. While individually these blocks aren't huge, collectively they're not small either. That's going to have at least some impact on the number of chips per wafer Intel is able to produce and will inevitably drive the cost of these chips up (all else being equal). Are these acceleration blocks going to be implemented across the entire product stack, or will customers be able to pick and choose what they need à la carte, with the option of a discounted version of SR if they don't need/want any of the acceleration blocks?
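On the sharing question, at least for the DSA/IAA family the hardware mechanism is shared work queues submitted to with ENQCMD, which tags each submission with a per-process PASID for IOMMU isolation and reports back-pressure when the queue is full. A hedged sketch of the submit-with-retry idiom (compare the dedicated-queue sketch earlier); the device path, bound retry count, and fallback policy are all assumptions:

```c
/* Hedged sketch: submitting to a *shared* DSA work queue with ENQCMD.
 * Unlike MOVDIR64B, ENQCMD is non-posted: it reports failure when the
 * queue is full, so tenants get back-pressure instead of silently
 * starving each other. Compile with -menqcmd. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <immintrin.h>
#include <linux/idxd.h>

/* Retry a shared-queue submission a bounded number of times. */
static int submit_bounded(void *portal, struct dsa_hw_desc *desc, int max_tries) {
    for (int i = 0; i < max_tries; i++) {
        /* _enqcmd returns 0 if the device accepted the descriptor. */
        if (_enqcmd(portal, desc) == 0)
            return 0;
        _mm_pause();   /* brief back-off before retrying */
    }
    return -1;         /* queue persistently full: caller should fall back */
}

int main(void) {
    int fd = open("/dev/dsa/wq0.0", O_RDWR);   /* shared WQ node: assumption */
    if (fd < 0) { perror("open"); return 1; }
    void *portal = mmap(NULL, 0x1000, PROT_WRITE, MAP_SHARED, fd, 0);
    if (portal == MAP_FAILED) { perror("mmap"); return 1; }

    static char src[64] = "tenant payload", dst[64];
    static struct dsa_completion_record comp __attribute__((aligned(32)));

    struct dsa_hw_desc desc __attribute__((aligned(64))) = { 0 };
    desc.opcode          = DSA_OPCODE_MEMMOVE;
    desc.flags           = IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CRAV;
    desc.src_addr        = (uintptr_t)src;
    desc.dst_addr        = (uintptr_t)dst;
    desc.xfer_size       = sizeof(src);
    desc.completion_addr = (uintptr_t)&comp;

    if (submit_bounded(portal, &desc, 1000) != 0) {
        fprintf(stderr, "accelerator busy; a real app would memcpy instead\n");
        return 1;
    }
    while (comp.status == 0)
        _mm_pause();
    printf("done: %s\n", dst);
    return 0;
}
```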
trivik12 - Thursday, September 29, 2022 - link
Sapphire Rapids is an example of Intel doing too many things at once, just like the 10nm plans: first time making a Xeon with chiplets, a custom process (Enhanced SuperFin), DDR5, PCIe 5.0, CXL, and many others. Plus, Intel was plagued with non-tech folks making decisions, which made things worse. Hopefully, with Pat back since early 2021, decision-making is done by the right folks. It all depends on how Intel executes on Granite Rapids/Sierra Forest/Falcon Shores. Only time will tell. For now Intel is operating in niche segments while AMD is taking market share in generic x86 servers, which have been the cash cow for Intel. I wonder where things will be in a 2-year time frame.

Threska - Thursday, September 29, 2022 - link
Intel does more than CPUs, so it helps them to be able to do more.

schujj07 - Thursday, September 29, 2022 - link
Intel has been selling off a lot of their non-CPU businesses over the last 2 years.

scrizz - Friday, September 30, 2022 - link
By "a lot" you mean NAND and mobile comms?

quadibloc - Thursday, September 29, 2022 - link
I don't suppose AMD could afford to purchase IBM's mainframe business. That would be one way to compete with this.

nandnandnand - Thursday, September 29, 2022 - link
I like how comments were deleted, but there are spam comments.

Ryan Smith - Friday, September 30, 2022 - link
Comments get checked a few times a day. I can't watch them 24/7.

dataman24 - Saturday, October 15, 2022 - link
Interesting exhibit. It would have been great to see more details around the various tests conducted.

In my opinion, if one cannot recreate the tests (aside from the hardware features), then posting numbers doesn't make much sense.
Example: ClickHouseDB
- What is the scale factor used for testing?
- Any reason why only Q4.1 was the focus?
- Why not present results from the entire Star Schema Benchmark?

TensorFlow + ResNet
- Which pre-trained model was used?
- It would have been helpful if a link to the data set and the model had also been published.