8 Comments

  • snakeeater3301 - Wednesday, December 6, 2023 - link

    2x the HPC perf-per-watt of Grace Hopper (unclear by what metric) is a sneaky benchmark, since Grace is a monstrous 72-core ARM V2 CPU, and Nvidia always uses the ultimate hardline figures for their 700 watt TDP, since it's simultaneously using all the FP64 cores, FP32 cores and all the tensor cores (almost no application does that). MI300A has 308 CU which do the rated teraflops FP64 performance and the exactly equivalent rated teraflops FP32 performance means higher rated FP64 performance at the cost of lower rated FP32 performance, and the FP64 cores in H100's implementation take a lot less die area.

    Also, 72 V2 cores vs 24 Zen3 cores: obviously the V2s are going to consume a heck of a lot more power than MI300A's 24 Zen3 cores.
  • mdriftmeyer - Wednesday, December 6, 2023 - link

    MI300A's CPU cores are Zen4C chiplets.
  • Ryan Smith - Thursday, December 7, 2023 - link

    No. They are standard Zen 4 CCDs. 8 core CCDs, 3 CCDs in total.

    They are certainly not clocked as high as they can go, though. Only 3.7GHz or so.
  • mdriftmeyer - Thursday, December 7, 2023 - link

    Thanks for the correction. I just read the white paper. But as correctness goes, I'm far more accurate than the original poster, yet you didn't note that.
  • Dante Verizon - Wednesday, December 6, 2023 - link

    More cores generally bring greater energy efficiency.
  • mode_13h - Thursday, December 7, 2023 - link

    > Nvidia always uses the ultimate hardline figures for their 700 watt TDP

    Datacenter CPUs and GPUs actually stay in their power envelope, no matter whose they are. This is accomplished by clock-throttling, as necessary.

    > simultaneously using all the FP64 cores, FP32 cores and all the tensor cores

    Highly doubtful, since the fp64 "cores" are probably bolt-ons to the fp32 ones and they almost certainly share the same vector registers.

    > MI300A has 308 CU which do the rated teraflops FP64 performance and the exactly
    > equivalent rated teraflops FP32 performance means higher rated FP64 performance
    > at the cost of lower rated FP32 performance

    What's odd is that this was true of MI200, per my understanding, but AMD has a slide showing the MI300 has 2x the vector fp32 TFLOPS of its vector fp64 TFLOPS. It's only the tensor operations where they're the same. However, that's basically a non-issue, since AI workloads would use lower-precision types like tf32, fp16/bf16, or 8-bit, where MI300 offers even more performance (4x, 8x, and 16x, respectively).
  • fahadse0 - Monday, December 11, 2023 - link

    The scalability of MFT systems is not only a technical advantage but also a cost-effective solution. Organizations can scale their how to send video through email file transfer capabilities in alignment with business growth, avoiding unnecessary expenditures on extensive infrastructure upgrades.
  • mode_13h - Tuesday, December 12, 2023 - link

    This seems like search engine optimization spam.
