8 Comments
snakeeater3301 - Wednesday, December 6, 2023 - link
2x the HPC perf-per-watt of Grace Hopper (unclear by what metric) is a sneaky benchmark, since Grace is a monstrous 72-core Arm V2 CPU and Nvidia always uses the ultimate hardline figures for its 700 watt TDP, which assumes simultaneously using all the FP64 cores, FP32 cores, and all the tensor cores (almost no application does that). MI300A has 308 CUs whose rated FP64 teraflops exactly equal their rated FP32 teraflops, which means higher rated FP64 performance at the cost of lower rated FP32 performance, while the FP64 cores in H100's implementation take up a lot less die area.

Also, 72 V2 cores vs 24 Zen3 cores: obviously the V2s are going to consume a heck of a lot more power than MI300A's 24 Zen3 cores.
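The "unclear by what metric" complaint is the crux: perf-per-watt is just a rated throughput figure divided by TDP, so the headline number swings entirely on which rating gets used. A minimal sketch of that arithmetic (all figures below are illustrative assumptions, not actual Grace Hopper or MI300A specs):

```python
# Perf-per-watt = rated throughput / TDP, so the claim depends entirely
# on which throughput figure is chosen. Numbers are illustrative only.

def perf_per_watt(tflops: float, tdp_watts: float) -> float:
    """Convert a rated TFLOPS figure and a TDP into GFLOPS per watt."""
    return tflops * 1000.0 / tdp_watts

# Hypothetical accelerator rated two different ways at the same 700 W TDP:
vector_fp64_tflops = 60.0    # assumed vector FP64 rating
tensor_fp64_tflops = 120.0   # assumed matrix/tensor FP64 rating
tdp = 700.0

print(f"vector basis: {perf_per_watt(vector_fp64_tflops, tdp):.1f} GFLOPS/W")
print(f"tensor basis: {perf_per_watt(tensor_fp64_tflops, tdp):.1f} GFLOPS/W")
# The choice of metric alone doubles the headline perf-per-watt number.
```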
mdriftmeyer - Wednesday, December 6, 2023 - link
MI300A are Zen4C chiplets.
Ryan Smith - Thursday, December 7, 2023 - link
No. They are standard Zen 4 CCDs: 8-core CCDs, 3 CCDs in total. They are certainly not clocked as high as they can go, though. Only 3.7GHz or so.
mdriftmeyer - Thursday, December 7, 2023 - link
Thanks for the correction. I just read the white paper. But as far as correctness goes, I'm far more accurate than the original poster, yet you didn't note that.
Dante Verizon - Wednesday, December 6, 2023 - link
More cores generally bring greater energy efficiency.
mode_13h - Thursday, December 7, 2023 - link
> Nvidia always uses the ultimate hardline figures for their 700 watt TDP

Datacenter CPUs and GPUs actually stay in their power envelope, no matter whose they are. This is accomplished by clock-throttling, as necessary.
> simultaneously using all the FP64 cores, FP32 cores and all the tensor cores
Highly doubtful, since the fp64 "cores" are probably bolt-ons to the fp32 ones and they almost certainly share the same vector registers.
> MI300A has 308 CU which do the rated teraflops FP64 performance and the exactly
> equivalent rated teraflops FP32 performance means higher rated FP64 performance
> at the cost of lower rated FP32 performance
What's odd is that this was true of MI200, per my understanding, but AMD has a slide showing the MI300 has 2x the vector fp32 TFLOPS of its vector fp64 TFLOPS. It's only the tensor operations where they're the same. However, that's basically a non-issue, since AI workloads would use lower-precision types like tf32, fp16/bf16, or 8-bit, where MI300 offers even more performance (4x, 8x, and 16x, respectively).
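To make the ratios above concrete, a rated peak figure is just CUs × ops-per-CU-per-clock × clock, so doubling the per-clock FP32 rate relative to FP64 produces exactly the 2x vector ratio on AMD's slide. A quick sketch (the CU count, clock, and per-CU throughputs here are illustrative assumptions, not confirmed MI300 specs):

```python
# Sketch of how peak ("rated") TFLOPS figures are derived.
# Formula: CUs * (ops per CU per clock for the data type) * clock (GHz) / 1000.
# All hardware numbers below are assumptions for illustration.

def peak_tflops(num_cus: int, ops_per_cu_per_clock: int, clock_ghz: float) -> float:
    """Peak throughput in TFLOPS: total ops per clock across all CUs, times clock rate."""
    return num_cus * ops_per_cu_per_clock * clock_ghz / 1000.0

# Hypothetical accelerator: 304 CUs at 2.1 GHz.
CUS, CLOCK_GHZ = 304, 2.1
fp64 = peak_tflops(CUS, 128, CLOCK_GHZ)   # assumed 128 FP64 ops/CU/clock
fp32 = peak_tflops(CUS, 256, CLOCK_GHZ)   # assumed 2x the FP64 rate

print(f"vector fp64: {fp64:.1f} TFLOPS")
print(f"vector fp32: {fp32:.1f} TFLOPS (ratio {fp32 / fp64:.0f}x)")
```

The same formula explains the lower-precision tensor multiples: each halving of operand width lets the same silicon retire proportionally more operations per clock.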
fahadse0 - Monday, December 11, 2023 - link
The scalability of MFT systems is not only a technical advantage but also a cost-effective solution. Organizations can scale their how to send video through email file transfer capabilities in alignment with business growth, avoiding unnecessary expenditures on extensive infrastructure upgrades.
mode_13h - Tuesday, December 12, 2023 - link
This seems like search engine optimization spam.