25 Comments
webdoctors - Thursday, March 21, 2019 - link
Do you know this? "most efficient one for the country’s first Exascale supercomputer." Curious which benchmarks were used to derive that statement.
blu42 - Thursday, March 21, 2019 - link
They'll be using gen12 (AKA Xe) Intel GPUs -- must be leaps and bounds more efficient than stuffing xeons in racks, and about as efficient as using NVidia GPUs ; )
TeXWiller - Thursday, March 21, 2019 - link
It's the first time I have heard that statement. The Aurora was probably late due to the unmentionable technical difficulties, so they bumped the project straight to the next stage of delivery.
mode_13h - Friday, March 22, 2019 - link
Yes, exactly. I'd be surprised if there was enough transparency in their purchasing process to support that statement. We'll probably never know exactly why they chose Intel.
HStewart - Thursday, March 21, 2019 - link
I would like to find some specs on these machines. It sounds like a combination of new technology coming:
Xeons based on a new architecture
A new GPU architecture, which could be the reason why Phi is going away
New Optane memory, which could explain why Intel does not need Micron anymore.
Yojimbo - Thursday, March 21, 2019 - link
It was Micron that broke the relationship with Intel, not the other way around. Micron and Intel are chasing different markets for NAND and 3D XPoint. Micron wants to make money through volume, and Intel can afford to not worry about the margins of the actual parts and instead think of them as part of the value of their entire platform.
Phi went away because of a convergence of deep learning and HPC, I believe, and because Intel could never get much traction for it outside supercomputers. Intel was obviously already developing a discrete GPU, but in an interview I read soon after the existence of the new A21 supercomputer came out (without many public details), an Intel scientist said that Intel had been planning what was going into the A21 machine for a while, but that they moved its schedule up in response to the cancellation of Aurora and the shutdown of the Xeon Phi line.
Laubzega - Thursday, March 21, 2019 - link
More information here: https://www.nextplatform.com/2019/03/18/intel-to-t...
HStewart - Thursday, March 21, 2019 - link
Interesting article, but a couple of comments on this:
1. Sunny Cove should also have a single-thread performance increase because of enhancements to multiple execution units.
2. Gen 11 graphics are replacements for existing iGPUs and have MX130-level or higher performance. Gen 12 is what Xe graphics are and should have much higher performance. I believe there will also be consumer-level parts. It must be a significant performance increase to shut down the Phi processors.
3. It would be foolish to think Intel was not working on its own fab changes for Optane. I think Micron did not fit their needs.
Yojimbo - Friday, March 22, 2019 - link
It doesn't matter what Intel was or wasn't doing. The choice to buy the fab was entirely Micron's; they bought it out because they wanted to. I am sure that at some point it was clear between the two partners that they wouldn't work together any more, but that doesn't mean Micron had to buy the plant. Micron bought the plant because they believe in their own future for 3D XPoint, using technology that has diverged from Intel's.
A telling fact is that it is Micron that is changing from floating gate to charge trap for their 3D NAND, while Intel is continuing with floating gate. Intel's perspective just did not fit with Micron's interests, and Micron has plenty of money now to go their own way.
The selling point of the Phi processors was their ability to run unmodified x86 code and supposedly to benefit somewhat from slight modifications to standard code. But I think that to really use them as accelerators, the modifications needed were along the lines of what a GPU requires. GPUs outperformed Xeon Phi overall, I guess, judging by how commercial customers never latched onto Xeon Phi much. Once Intel saw that happening, they would have known they needed a replacement for Xeon Phi. But I think the arrival of deep learning sounded the death knell for Phi earlier than it would otherwise have been sounded.
As far as the performance of Xe goes, you have to take into account that what Intel can shove down people's throats is not the only factor to consider. The DOE were most likely not completely satisfied with the way the Aurora machine was shaping up. That would have spurred Intel to change strategies and dump Phi even if, at that point in time, they had only an inaccurate idea of the performance they would get out of Xe.
blu42 - Friday, March 22, 2019 - link
@Yojimbo, your guess re the Phi is quite close -- while it could run unmodified x86 code, its proper utilization required programming techniques arguably more convoluted than those for GPUs, thus it was regularly outperformed by similarly-TDP'd GPUs. Abstractly speaking, Phi had the worst of both x86 and GPU worlds.
Kevin G - Monday, March 25, 2019 - link
Depends on the generation of Xeon Phi. The first wave did have to leverage specialized compilers to get any sort of acceleration: its 512-bit vector instructions were only found alongside those Pentium 1 based cores.
The second generation, Knights Landing, was far superior, with normal SSE and AVX implementations plus AVX-512. So as long as you were not running original Xeon Phi code on Knights Landing, you had backwards compatibility with ordinary x86 software. Not a terrible thing for legacy code, but Intel did miss their mark by only getting the vision of backwards compatibility right on the second generation of products.
The kicker for Knights Landing was that it had 16 GB of HMC-based memory, which required tuning code around its unique NUMA model. Otherwise it was bandwidth-starved, with only six DDR4 memory channels feeding up to 72 Airmont cores.
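For anyone curious what that NUMA tuning looked like in practice, here is a minimal, illustrative C sketch (not from this thread) that opts a hot buffer into the on-package memory via libmemkind's hbwmalloc interface. It assumes a Knights Landing node booted with the MCDRAM in flat mode and libmemkind installed; the file name and build line are placeholders.

/* Minimal sketch: place a bandwidth-hungry buffer in KNL's on-package memory.
   Assumes flat-mode MCDRAM and libmemkind; build with: gcc hbw_demo.c -lmemkind */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    size_t n = 1u << 20;                             /* ~8 MB of doubles */
    int use_hbw = (hbw_check_available() == 0);      /* 0 => high-bandwidth node present */
    double *buf = use_hbw ? hbw_malloc(n * sizeof *buf)
                          : malloc(n * sizeof *buf); /* fall back to the DDR4 channels */
    if (!buf)
        return 1;

    for (size_t i = 0; i < n; i++)                   /* touch the hot working set */
        buf[i] = 0.5 * (double)i;
    printf("buf[last] = %f, placed in %s\n", buf[n - 1], use_hbw ? "MCDRAM" : "DDR4");

    if (use_hbw)                                     /* free with the matching allocator */
        hbw_free(buf);
    else
        free(buf);
    return 0;
}

On real KNL clusters the same placement was often done without code changes via numactl --membind, but explicit allocation gives finer control over which arrays live in the fast 16 GB.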
blu42 - Wednesday, March 27, 2019 - link
True. KNC should have been what KNL was, but then again there's a causality link MIC->AVX-512, so it seems Intel had to figure out what they wanted first, and that cost them the product line.
mode_13h - Friday, March 22, 2019 - link
Phi got killed off because it couldn't compete with GPUs in perf/watt or perf/mm^2 (and thereby probably also perf/$). The only reason anybody ever had for justifying Xeon Phi was to run legacy multi-threaded code. If you're using modern libraries/software, then it couldn't compete with GPUs.
Intel had to fail at an x86-based GPU-compute competitor before they could move beyond it. I think the internal politics of x86 were too strong at Intel.
mode_13h - Friday, March 22, 2019 - link
Look at the expected deployment date. It is surely using yet-to-be-released/announced products.
Actually, I think you have it backwards. Phi going away is what finally made room for their compute-oriented GPU product. As long as Phi continued to look viable, they were probably reluctant to put resources behind a more purist GPU approach.
Kevin G - Monday, March 25, 2019 - link
Xeon Phi died because of Intel's 10 nm delays. Even now, Intel is only pumping out a handful of small 10 nm parts (71 mm^2) which have a good portion of their die disabled (the graphics). On 14 nm, Knights Landing was 683 mm^2, with the successor Knights Hill being a similarly large chip but on a 10 nm process. By the time Intel is able to ship Knights Hill, they could end up shipping after the successor to nVidia's Volta architecture arrives. Had Intel shipped Knights Hill last year as originally envisioned, they would be far more competitive.
HStewart - Tuesday, March 26, 2019 - link
Or maybe Phi got killed off by its replacement, the Xe series, which is more efficient and can also help in the graphics market, which Phi was never really designed for.
Yojimbo - Thursday, March 21, 2019 - link
"As it turns out, Intel’s approach was considered as the most efficient one for the country’s first Exascale supercomputer."Not sure about that. It's more a matter of politics. Intel and Cray were awarded the original Aurora contract, but the DOE was apparently not happy with the way the system was shaping up, probably because of the deep learning performance of Xeon Phi Knights Hill chips that were supposed to go into the system, which was poor comparatively with GPUs. The DOE wanted an accelerated supercomputer but in general wants to spread out its purchases between at least two architectures. That throws out an IBM/NVIDIA system, and my guess is it also throws out anything relying on NVIDIA accelerators. Intel was already developing a discrete GPU and said they could get it out the door by 2021. There is a competition among countries going on to get to exascale first, and Intel was now in position to negotiate to take the money set aside for Aurora in addition to more money added to it to obtain a contract for a delayed and expanded system that would be the first American system to reach exascale.
I imagine if they weren't Intel they wouldn't have been able to get the DOE to commit to something like that. After all, Intel isn't known for their GPUs. Intel is really on the hot seat to deliver here, I'd imagine.
TeXWiller - Thursday, March 21, 2019 - link
"The DOE wanted an accelerated supercomputer but in general wants to spread out its purchases between at least two architectures. That throws out an IBM/NVIDIA system, and my guess is it also throws out anything relying on NVIDIA accelerators."
I'm sure you meant to say that throws in the IBM/NVidia (and maybe AMD/NVidia) systems? ;)
Summit and such, along with Aurora, were part of the same CORAL multi-laboratory pre-exascale contract (the A is as in Argonne), which was then reformulated for the later delivery of the Intel/Cray system, at which point the performance target was adjusted.
Ktracho - Thursday, March 21, 2019 - link
Certainly the government, for national security reasons, doesn't want there to be just one company that can supply the parts for a supercomputer. However, I wonder if Cray will make products similar to Aurora available to other customers, and give them the choice between Intel and NVIDIA GPUs. I can't imagine all customers being happy with spending millions and being limited to one choice for GPUs.
TeXWiller - Thursday, March 21, 2019 - link
Very likely. Cray wouldn't develop an interconnect for just one customer, and the Xeons used will be the next step for Intel in general. So the components must be usable with the various software stacks customers have. Intel probably makes the case for their coherent chip-to-chip interconnect with their own accelerators, however, just like NVidia currently does with NVLink combined with Power chips.
Yojimbo - Friday, March 22, 2019 - link
I would wager that Cray will make NVIDIA GPUs available in their commercial Shasta systems. Perlmutter, for example, is a Shasta-based supercomputer, to be delivered in 2020, that includes NVIDIA GPU compute nodes. Cray seemed to enter into a close partnership with Intel under the Xeon Phi program and it bit them in the ass. Since then, they seem to have diversified their strategy a bit.
Yojimbo - Friday, March 22, 2019 - link
No, it throws them out. It's not that the DOE does not want IBM/NVIDIA machines at all; it's that they don't want to rely exclusively on any one architecture. If Summit, Sierra, and Aurora were all IBM/NVIDIA systems, they would be purchasing only one architecture under the program. And even though A21 is delayed to a later generation, with Intel's strategy currently in flux, it would really limit their purchasing options for Crossroads and NERSC-9. If they ended up with NVIDIA accelerators for all supercomputers delivered between 2018 and 2021, it would go strongly against their philosophy.
TeXWiller - Friday, March 22, 2019 - link
Oh, I didn't mean to imply that Aurora was supposed to be a Power system. Intel/Cray just missed the first delivery stage due to various challenges and upgraded the plan for a later-stage delivery.
mode_13h - Friday, March 22, 2019 - link
When they tout exaFLOPS, are we certain they're talking about fp64? Or could they be fudging things and really talking about fp16 or some specialized deep learning-flavored datatype?
Just run the numbers and you'll see what I mean. Nvidia's V100 clocks about 7 TFLOPS of fp64. Let's say Intel's biggest Xe manages about 10 (could be higher, but probably less than 20, and I'm just talking ballparks here). So you'd need 100k of those to reach an exaFLOPS. I'm pretty sure that's far bigger than anything we've ever seen. Can it be done for $500M? Hmmm... At $5k per GPU, my guess is it'd be a stretch (some of that amount has to go for CPUs, RAM, storage, Optane DIMMs, power, racks, networking, etc.).
I think it more likely they're talking about fp16 performance.
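To make the ballpark above explicit, here is a tiny C sketch of the arithmetic. Every input is the commenter's assumption (roughly 10 TFLOPS of fp64 per Xe GPU, $5k per GPU, a ~$500M budget), not a published spec.

/* Back-of-the-envelope check of the exaFLOPS estimate above; all inputs are assumptions. */
#include <stdio.h>

int main(void)
{
    const double target_flops  = 1e18;    /* 1 exaFLOPS, assuming fp64 */
    const double flops_per_gpu = 10e12;   /* assumed ~10 TFLOPS fp64 per Xe GPU */
    const double price_per_gpu = 5000.0;  /* assumed $5k per GPU */
    const double budget        = 500e6;   /* ~$500M contract value */

    double gpus_needed = target_flops / flops_per_gpu;   /* 100,000 GPUs */
    double gpu_cost    = gpus_needed * price_per_gpu;    /* $500M just for GPUs */

    printf("GPUs needed:      %.0f\n", gpus_needed);
    printf("GPU cost alone:   $%.0fM\n", gpu_cost / 1e6);
    printf("Budget left over: $%.0fM for CPUs, memory, network, storage...\n",
           (budget - gpu_cost) / 1e6);
    return 0;
}

Under those assumptions the GPUs alone would eat the entire budget, which is the thrust of the fp16 suspicion above.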
HStewart - Tuesday, March 26, 2019 - link
A couple of things to keep in mind. There is a performance leak about Gen 11 graphics, which are the integrated-graphics replacement at roughly 1 TFLOP inside notebooks, while Xe is the Gen 12 discrete graphics.
https://hothardware.com/news/intel-gen-11-gpu-benc...
My guess is that for Gen 12 graphics there will be multiple levels of GPU -- from integrated graphics for notebooks to consumer cards, higher-end gaming cards, and professional cards.
Keep in mind also that these Cray computers are not just GPUs; they also have Covey Lake based Xeons in the picture. So these CPUs also have AVX-512, or possibly something even better.
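For concreteness, here's a minimal, illustrative AVX-512 sketch -- not from the thread, just a plain fp64 axpy written with AVX-512F intrinsics -- showing the 8-wide double-precision lanes these Xeons (and the Phi line before them) expose. The file name and build line are placeholders.

/* Illustrative only: y = a*x + y, eight doubles per instruction via AVX-512F.
   Build with: gcc -O2 -mavx512f axpy512.c */
#include <stdio.h>
#include <immintrin.h>

static void daxpy512(double a, const double *x, double *y, size_t n)
{
    __m512d va = _mm512_set1_pd(a);           /* broadcast a to all 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m512d vx = _mm512_loadu_pd(x + i);
        __m512d vy = _mm512_loadu_pd(y + i);
        /* one fused multiply-add covers 8 double-precision elements */
        _mm512_storeu_pd(y + i, _mm512_fmadd_pd(va, vx, vy));
    }
    for (; i < n; i++)                        /* scalar tail */
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    double x[10], y[10];
    for (int i = 0; i < 10; i++) { x[i] = i; y[i] = 1.0; }
    daxpy512(2.0, x, y, 10);
    printf("y[9] = %f\n", y[9]);              /* expect 19.0 */
    return 0;
}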