Comments for Intel Goes Full XPU: Falcon Shores to Combine x86 and Xe For Supercomputers

Intel Goes Full XPU: Falcon Shores to Combine x86 and Xe For Supercomputers

by Ryan Smith on 2/17/2022 2:30 PM EST

Post Your Comment
Please log in or sign up to comment.

Comments Locked

28 Comments

Back to Article

name99 - Thursday, February 17, 2022 - link
"All of this hardware, in turn, is overseen at the software level by Intel’s oneAPI software stack, which is designed to abstract away many of the hardware differences to allow easier multi-architecture development."

Except that, apparently, it doesn't.
I was recently part of a Twitter thread complaining that the primary reason Gaudi (and earlier Nervana) were gaining little traction was the absence of decent SW. I raised oneAPI and was told that it did nothing that was actually of interest to the serious large-scale NNU teams.

I know very little about oneAPI or NNU hardware, so I have no comment. But it does feel like oneAPI (like so much from Intel these days) exists a lot more on marketing slides and a lot less in actual customers hands.
edzieba - Monday, February 21, 2022 - link
A twitter complaints thread is probably not the best source for API adoption information.
mode_13h - Sunday, February 27, 2022 - link
> it does feel like oneAPI (like so much from Intel these days) exists a lot more
> on marketing slides and a lot less in actual customers hands.

Not exactly. At some level, it's a CPU/GPU API, being built upon SYCL and Data-Parallel C++. Their OpenVINO framework has a backend for it that supports their GPUs, which I know because we're using it.

Surprisingly, it seems OpenVINO still lacks backends for any Nervana or Habana ASIC, but they do have some level of support for Movidius and their GNA.

https://docs.openvino.ai/latest/openvino_docs_IE_D...
ballsystemlord - Thursday, February 17, 2022 - link
Sort of like Intel will be on 10nm within the next year? ;)
michael2k - Friday, February 18, 2022 - link
What do you mean? They’ve been on 10nm forever and just renamed their latest 10nm node I7
ballsystemlord - Sunday, March 6, 2022 - link
I was poking fun at Intel's recent slip with respect to the launch of their 10nm node. I was contrasting their lack of staying on schedule there with their current plans being likely to slip.
JayNor - Thursday, February 17, 2022 - link
The servethehome article, "Intel AXG on Falcon Shores Arctic Sound-M and Future Products at Investor Meeting 2022", mentioned that in a prior interview Raja made a note "LightBender", in reference to Falcon Shores' leading IO ... his interpretation that this meant Silicon Photonics.

I believe late 2024 to 2025 does match up with an in-package photonics roadmap Intel's Blum has presented.
Kevin G - Friday, February 18, 2022 - link
Silicon photonics is one of those aces Intel has had up their sleeves for awhile. Thus far they've only been leveraging that to produce some optical transceivers for Cisco.

Intel has been buying up networking companies with an eye of using their own advanced manufacturing and packaging to move those acquisitions forward. Intel's tile strategy will pay off here as the switch IO can be built using their silicon photonics as a discrete tile while the switch ASIC portion will be their own die. Then put in a few x86 dies for management and some memory controller dies to round out the product for the networking segment.

For the HPC side of things, those same tile used in the networking segment can be fully integrated as well for more internode bandwidth and lower latency. The switch ASIC portion wouldn't necessarily have to be Ethernet based as it could be something unique like PCIe over optics that leverages CXL to build a flat memory space and handle coherency.
name99 - Friday, February 18, 2022 - link
Or Silicon Photonics is the next Optane, a theoretically promising technology that Intel will fsck up through a combination of management incompetence, attempted market segmentation, tying it to intel products rather than allowing it to become generic technology, and so on...
emvonline - Saturday, February 19, 2022 - link
I think the optane analogy is that Optane works, but few people want it and are willing to pay for it. So it doesnt matter and no one cares. Many people have Si photonics technology. Do people want it and are they willing to pay for it. Was Optane mentioned at the investor meeting? I didnt hear it
brucethemoose - Monday, February 21, 2022 - link
My impression of Optane is that its neither here nor there.

It doesn't have the durability or speed or even the cost/GB to replace slow DRAM, and SLC flash is good enough(TM) for the vast majority of non-volatile storage. The niche its useful in is really small.

Photonics, on the other hand, has a much wider range of potential usefulness... depending on how it shakes out.
mode_13h - Monday, February 28, 2022 - link
> a theoretically promising technology that Intel will fsck up

I'm a fan of Optane, but how much of its failure can really be attributed to Intel's failure to execute? It simply didn't deliver the promised endurance, meaning it really can't be used like slow DRAM.

The bigger failure was the 3D part. From what I understand, the latest gen Optane is still only 4 layers, which represents the biggest hurdle it has in competing with NAND. That makes it simply too expensive for most storage applications. Is that also due to bad management, or simply a technology that ran out of gas before the competition?
mode_13h - Monday, February 28, 2022 - link
> Then put in a few x86 dies for management

Does the networking industry want x86 for this? From what I understand, networking gear is one area that MIPS and POWER cores have remained entrenched. If I were using those and looking for something different, I'd go to ARM or RISC-V, rather than x86.

And let's not forget that Altera FPGAs still have ARM cores. So, it's not as if Intel is completely blinded by x86's shortcomings.

> it could be something unique like PCIe over optics that leverages CXL
> to build a flat memory space and handle coherency.

Uh, that seems like a mishmash of buzzwords. But I get the idea of wanting CXL protocol over photonics. If you wanted to do that, I'm sure it'd make more sense to swap out the PHY layer, than try to run PCIe over optical, given how much of PCIe 6.0 signalling seems designed around the limitations and issues of copper.
KurtL - Friday, February 18, 2022 - link
This is an evolution that has been years in the making. First there were the USA pre-exascale systems based on NVIDIA GPUs and IBM POWER8 and POWER9 processors that integrated the CPU in the NVLINK fabric to create a joint memory space. Now you see a similar architecture with Saphire Rapids + Ponte Vecchio in Aurora and AMD Trento + MI250X in Frontier and LUMI. On the other hand Fugaku with its A64fx processors has shown what people in HPC already know for quite a while: there are algorithms that cannot benefit a lot from vector or matrix computations acceleration but can still benefit a lot from GPU-like memory bandwidth. And not only the logically separate memory spaces in current x86-based GPU systems but also the physical separation in systems with unified memory limits the gains that can be obtained in many applications. Now that scalar cores and GPU cores can more easily be combined on a single chip or using the tile/chiplet approach, it is only natural that both are combined in a single socket and linked to joint memory controllers. You can already see the benefits of such an approach in the Apple M1, a chip that in some applications can play way above its league because of the close integration of all compute resources both saving power (as data transport is expensive power wise) and making it more efficient to combine the various compute resources.
mode_13h - Monday, February 28, 2022 - link
> You can already see the benefits of such an approach in the Apple M1,
> a chip that in some applications can play way above its league because
> of the close integration of all compute resources both saving power
> (as data transport is expensive power wise)

Source? Everyone seems to be holding up the M1 as an example of this or that, but I see these claims backed by little or no data.

I doubt the M1's graphics performance benefits specifically from tight integration between the CPU and GPU, but rather from simply having so much memory bandwidth at the disposal of a powerful iGPU. Lots of focus goes into optimizing APIs and games to work well across separate memory spaces. If it were such a huge liability, we'd see dGPUs hit a wall, before they could reach very high performance levels.
duploxxx - Friday, February 18, 2022 - link
Intel is expecting it to offer better than 5x the performance-per-watt and 5x the memory capacity of their current platforms.

that is not so difficult knowing 2 years to come and measuring from its existing Xeon platform which fails to deliver performance... Cascade R and Ice Lake are underperforming and Every one is waiting for the so hyped Sapphire rappids which is already delayed a few times.. lets hope it will still show up 2H2022
Calin - Tuesday, February 22, 2022 - link
Their current platform have "low-memory" and "high-memory" processors (the "L" version I think)
The Xeon Platinum 8360h supports 1.12TB, the 8360HL supports 4.5TB.
So, the HL already has 4x memory capacity of the 8360H. Going to 5x is groundbreaking
mode_13h - Monday, February 28, 2022 - link
> that is not so difficult knowing 2 years to come

We already know roughly what the memory technology landscape will look like, in that time. So, please tell us where they plan to get 5x capacity AND 5x bandwidth (relative to Ice Lake SP).
Rudde - Friday, February 18, 2022 - link
The memory bandwidth can be easily achieved with HBM3, but the memory capacity would require a lot of stacks, I imagine. To that end they could use a hybrid approach with Optane for capacity and HBM3 for bandwidth. The perf/W and core density could be achieved by using E-cores.
Calin - Tuesday, February 22, 2022 - link
The "Efficiency" cores have better performance per watt, but not so much better. For a 5x, you need to go to simpler hardware (think 4,000 of GPU-like simple cores instead of 64 large cores).
Also, memory access (read, write, synchronization, caches, ...) cost a lot of energy - in some cases comparable to processing itself.
mode_13h - Monday, February 28, 2022 - link
> The perf/W and core density could be achieved by using E-cores.

They already said it's integrating Xe-like GPU cores. That's where they're getting the big performance boost.
michael2k - Friday, February 18, 2022 - link
So is it me or does this sound like a unified memory pool that both cpu and GPU have equal access to?
mode_13h - Monday, February 28, 2022 - link
Yes, if the GPU and CPU are being joined in-package, that would be the obvious thing to do.

However, CXL means you can still have coherent memory pools, even with them residing in different devices.
erinadreno - Saturday, February 19, 2022 - link
Is it just me or Intel just got obsessed with water in the past 5 years?
PeachNCream - Monday, February 21, 2022 - link
Maybe longer. The * Bridge series chips crossed over water and that was the first thing that came to mind off the top of my head. I'm not going to do the research, but I bet there are other bodies of water in Intel's history as well. I'm sure someone with more knowledge of CPU history could chime in, but I do get where you're going with this.
JKflipflop98 - Wednesday, March 9, 2022 - link
The code names are based on where the part is designed. If it's designed in Oregon, it get a pacific NW name. If it's drawn up in Israel it gets a much drier handle.
mode_13h - Monday, February 28, 2022 - link
Yes... bridges, wells, streams, creaks, lakes, rapids, coves, sounds. I think the bigger point is that these are geographical features that are plentiful, making them ripe for use as codenames.

Some non-water naming themes they've used involve *mont, *pass, *forest, knights*, and I'm sure there are others that don't come to mind.

I like how AMD used the theme of islands, where they had a family named after a group of islands and the individual members were named after individual islands. I guess there aren't a ton of island chains, though. Still, seems like they could've continued that a bit longer.
mode_13h - Sunday, February 27, 2022 - link
> 5x increase in memory capacity, and a 5x increase in memory bandwidth

I wish they were clearer about what they're using as a basis for comparison. Cynically, one could say 5x the capacity of current GPUs and 5x the bandwidth of current Xeons.

Intel Goes Full XPU: Falcon Shores to Combine x86 and Xe For Supercomputers

Post Your Comment

28 Comments

Back to Article

name99 - Thursday, February 17, 2022 - link

edzieba - Monday, February 21, 2022 - link

mode_13h - Sunday, February 27, 2022 - link

ballsystemlord - Thursday, February 17, 2022 - link

michael2k - Friday, February 18, 2022 - link

ballsystemlord - Sunday, March 6, 2022 - link

JayNor - Thursday, February 17, 2022 - link

Kevin G - Friday, February 18, 2022 - link

name99 - Friday, February 18, 2022 - link

emvonline - Saturday, February 19, 2022 - link

brucethemoose - Monday, February 21, 2022 - link

mode_13h - Monday, February 28, 2022 - link

mode_13h - Monday, February 28, 2022 - link

KurtL - Friday, February 18, 2022 - link

mode_13h - Monday, February 28, 2022 - link

duploxxx - Friday, February 18, 2022 - link

Calin - Tuesday, February 22, 2022 - link

mode_13h - Monday, February 28, 2022 - link

Rudde - Friday, February 18, 2022 - link

Calin - Tuesday, February 22, 2022 - link

mode_13h - Monday, February 28, 2022 - link

michael2k - Friday, February 18, 2022 - link

mode_13h - Monday, February 28, 2022 - link

erinadreno - Saturday, February 19, 2022 - link

PeachNCream - Monday, February 21, 2022 - link

JKflipflop98 - Wednesday, March 9, 2022 - link

mode_13h - Monday, February 28, 2022 - link

mode_13h - Sunday, February 27, 2022 - link

Log in

Don't have an account? Sign up now