8 Comments
Yojimbo - Tuesday, September 20, 2022
My guess is that the late arrival of Sapphire Rapids and Genoa has something to do with the delay. But it may also have to do with the US government's new restrictions on NVIDIA's China dealings. There were news reports that said "The company said the ban, which affects its A100 and H100 chips designed to speed up machine learning tasks, could interfere with completion of developing the H100, the flagship chip it announced this year." Perhaps NVIDIA's Chinese operations were key in the design and/or testing of the DGX servers.
quorm - Tuesday, September 20, 2022
Didn't Tom's Hardware just have an article yesterday saying that Nvidia was diverting as many A100/H100 as possible to China before the ban takes effect? Maybe that explains this unusual release order.
Yojimbo - Tuesday, September 20, 2022
Could be. You think the Chinese are more interested in the HGX than the DGX? Supposedly, though, they wanted to release the H100 earlier, and it hasn't been component shortages that have delayed the release. Why don't they have enough chips for both the DGX and their server partners by now? Unless they were able to delay production with TSMC in some economically beneficial way (no guarantee of that).
bernstein - Tuesday, September 20, 2022
Compared to the 4090, the H100 PCIe looks pretty bad for plain FP vector stuff: at 83 TFLOPS the former is almost twice as fast... or is the 4090 artificially limited for FP64?
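For a rough sanity check, here's back-of-envelope math from the launch spec sheets (the exact figures are assumptions pulled from public specs, and they're theoretical peaks, not benchmarks):

```python
# Back-of-envelope peak-throughput comparison from public spec sheets.
# These are theoretical maxima (cores x clock x 2 FLOPs per FMA), not
# measured numbers; the exact figures below are assumptions.
specs = {
    #             FP32 TFLOPS  FP64 TFLOPS  (vector, non-tensor)
    "RTX 4090":  (82.6,        1.3),   # FP64 at 1/64 the FP32 rate
    "H100 PCIe": (51.2,        25.6),  # FP64 at 1/2 the FP32 rate
}
for gpu, (fp32, fp64) in specs.items():
    print(f"{gpu:10s}  FP32 {fp32:5.1f} TFLOPS   FP64 {fp64:5.1f} TFLOPS")
# By these peaks the 4090 leads on FP32 vector throughput (~1.6x),
# but the H100 PCIe is roughly 20x faster at FP64, since Ada ships
# almost no FP64 hardware.
```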
Yojimbo - Tuesday, September 20, 2022
The 4090 only has a small number of FP64 units, for code compatibility reasons. Its FP64 performance is abysmal because the hardware isn't on the chip.

As far as theoretical max FP32 performance goes, the 4090 is of course a much better price/performance proposition. The two cards are for different segments. The H100 has 80 GB of HBM memory, NVLink capability, comes with 5 years of software licensing, and has been validated for servers, something that takes a significant outlay of money. You wouldn't buy an H100 PCIe for your desktop system if you're looking to do some ML training; you'd get a 4090. But if you wanted to move training to your datacenter, linking the compute resources across your entire organization and perhaps scaling it to multiple GPUs, then you'd get H100s, and the difference in cost would be well worth it. It would take a lot of expertise to get the 4090 to work like the H100 does out of the box, and you wouldn't have access to NVIDIA's AI Enterprise licenses or their support.
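To make the "scaling it to multiple GPUs" point concrete, here's a minimal, hypothetical sketch of multi-GPU data-parallel training with PyTorch's DistributedDataParallel (the model and hyperparameters are stand-ins, not anyone's real setup):

```python
# Hypothetical minimal sketch: multi-GPU data-parallel training with
# PyTorch DDP. Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # NCCL backend for GPU comms
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).to(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])  # grads all-reduced per step
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                       # toy training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                          # DDP syncs gradients here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

NCCL will ride NVLink automatically when it's present; on 4090s the same code falls back to PCIe for the gradient all-reduce, which is where the consumer cards start to hurt at scale.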
quorm - Tuesday, September 20, 2022
It's interesting that for FP64 these are still nowhere near competitive with the MI250, particularly per watt. Guess all the new supercomputers using AMD were good decisions.
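For reference, rough peak-spec math on that per-watt claim (the TFLOPS and board-power figures below are assumptions from public datasheets, and peak specs say little about delivered application performance):

```python
# Peak FP64 per watt from public spec sheets; figures are assumptions
# and theoretical peaks, not measured application throughput.
boards = {
    #             vector FP64, matrix/tensor FP64, board power (W)
    "MI250":     (45.3,        90.5,               560),  # OAM peak power
    "H100 PCIe": (26.0,        51.0,               350),
}
for gpu, (vec, mat, watts) in boards.items():
    print(f"{gpu:10s}  vector {vec/watts:.3f}  matrix {mat/watts:.3f}  TFLOPS/W")
# On these peak figures the MI250 leads, but only by roughly 10% per watt
# in both vector and matrix FP64; real HPC codes may land anywhere.
```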
Yojimbo - Tuesday, September 20, 2022
Supercomputing is about more than FP64 these days (though that is still the most important), and it's unclear what the real-world application performance difference is using the MI250 architecture, which seems to double up on FP64 execution units. I'm not sure exactly what they are doing, but it may be similar to the doubled-up FP32 units NVIDIA implemented on Ampere. That didn't result in a proportional increase of performance in games, though in rendering tasks it seemed to maintain better computational efficiency than in gaming. If that is what AMD did, I don't know how it plays out in HPC.

But yes, NVIDIA isn't willing to downgrade their tensor cores for the benefit of FP64. NVIDIA makes billions of dollars per year from AI-related data center business. Supercomputing is a drop in the bucket compared to that. AMD and Intel, on the other hand, don't have much of a data center GPU compute business and are looking to break into the game by winning supercomputer contracts. Whether those decisions to go with AMD and Intel were good ideas, though, will take some more time to determine, I think.
gue2212 - Friday, October 28, 2022
I thought I had heard Nvidia's Grace ARM CPUs would go with Hopper!?