Kevin G - Thursday, June 9, 2022 - link
For tasks that scale well across multiple nodes, the CPU + GPU combination will be a winner.

It wouldn't surprise me if the next iteration has an optional FPGA chiplet and the generation after that includes on-package fabrics. Things are becoming very, very interesting.
davide445 - Friday, June 10, 2022 - link
Seems to me a concept similar to the Tachyum Prodigy universal processor.

Gc - Friday, June 10, 2022 - link
Nit: Regarding "first company to pair HBM with CPU cores", Fujitsu ARM A64FX cores use HBM2.

https://images.anandtech.com/doci/15869/1534898193...
mode_13h - Sunday, June 12, 2022 - link
I caught that, as well.

Also worth noting: Xeon Phi (KNL) paired a mesh of enhanced Silvermont Atom cores with 16 GB of in-package MCDRAM (+ Purley's usually 6-channel DDR4 external DRAM), back in 2016.
https://www.anandtech.com/show/9794/a-few-notes-on...
Khanan - Friday, June 10, 2022 - link
“CNDA 3” same typo here as well :D

Khanan - Friday, June 10, 2022 - link
This will probably destroy Intel's offerings, and it's possible it will beat Nvidia's stuff too.

mode_13h - Sunday, June 12, 2022 - link
Based on ...?

zamroni - Friday, July 1, 2022 - link
based on sighting in rear view mirror

mode_13h - Sunday, June 12, 2022 - link
In the MI210 comments, you spent a dozen posts attacking me because:

"Just don’t talk about unreleased stuff and praise Intel for things they didn’t do."
Seems to me like you're talking about unreleased stuff and praising AMD for things they didn't do.
mode_13h - Sunday, June 12, 2022 - link
Also, praising unreleased products as superior is definitely something a shill would do. In the MI210 thread, I never said Arctic Sound would be faster/better, just that I was more interested in using it, based on my prior experience with Intel's GPU stack. In spite of my caveats, you stated:

"If two people come and say you’re a Intel shill or biased I would start thinking about myself and not endlessly deflect everything."
Just because you and some other random account I'd never seen before (supdawgwtfd) dropped in and attacked me. So, let's hope nobody else in this thread thinks you're acting like a shill.
ET - Saturday, June 11, 2022 - link
As a developer I'm in two minds about this. On one hand, there are times when using the CPU for specific parts of the code path can be helpful, and not having to transfer data for that will certainly help. On the other hand, one of the benefits of a discrete CPU is that it can have a large RAM space relatively cheaply. Moving data to CPU RAM can help free GPU RAM. When GPU RAM is used for both CPU and GPU, there's less RAM for algorithms that need a lot of GPU data.

Optimally, I think it could be best to have the CPU and GPU share RAM but still have access to a large pool of slower, expandable RAM.
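[Editor's note: the staging trade-off described above can be made concrete. Below is a minimal sketch using the CUDA runtime API as a stand-in (the AMD HIP equivalents are near-identical); buffer names and sizes are illustrative, not from any real workload.]

```cuda
// With separate CPU and GPU memories, a large intermediate buffer can be
// staged out to the bigger, cheaper host DRAM to free scarce HBM/VRAM.
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t N = 1 << 28;                  // ~1 GiB of floats (illustrative)
    float *d_buf;
    cudaMalloc(&d_buf, N * sizeof(float));     // lives in scarce GPU memory

    // ... some kernel fills d_buf here (elided) ...

    // Stage the result out to host RAM so the GPU memory can be reused.
    float *h_buf = (float *)malloc(N * sizeof(float));
    cudaMemcpy(h_buf, d_buf, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_buf);                           // GPU memory freed for the next stage

    // On an APU with one shared physical pool, this copy buys nothing:
    // "CPU RAM" and "GPU RAM" are the same capacity, which is ET's concern.
    free(h_buf);
    return 0;
}
```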
sgeocla - Saturday, June 11, 2022 - link
The MI300 has unified memory, and with CXL.mem you can have access to as much memory as you want. That way you don't have to manage different CPU/GPU memory locations and can just pass pointers around as you want. And with 3D V-Cache the CPU and GPU can both have gigabytes of local cache in front of the HBM.

mode_13h - Sunday, June 12, 2022 - link
It'll be interesting to see how well this scales to multiple APUs. The CPU cores embedded within a single APU will be a lot closer to those GPU dies, but not much closer to the GPU dies in another package than before. And if CPU <-> GPU communication now ends up getting routed through some other CPU cores, that could be a bit of a bottleneck.
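[Editor's note: the "just pass pointers around" model sgeocla describes is sketched below, using CUDA managed memory as a hedged stand-in. On a hardware-coherent APU like the MI300 the same single-pointer style would apply without the driver-managed page migration that managed memory implies.]

```cuda
#include <cuda_runtime.h>

// Both CPU and GPU touch the same allocation through one pointer.
__global__ void scale(float *x, size_t n, float k) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= k;
}

int main() {
    const size_t N = 1 << 20;
    float *x;
    cudaMallocManaged(&x, N * sizeof(float));      // one pointer, visible to both

    for (size_t i = 0; i < N; ++i) x[i] = 1.0f;    // CPU writes directly
    scale<<<(N + 255) / 256, 256>>>(x, N, 2.0f);   // GPU operates on the same pointer
    cudaDeviceSynchronize();                       // wait before the CPU reads back

    float first = x[0];                            // CPU reads the GPU's result
    cudaFree(x);
    return (first == 2.0f) ? 0 : 1;
}
```

No cudaMemcpy appears anywhere; that is the programming-model simplification unified memory buys, independent of whether the coherence is done by page migration or, as on an APU, by shared physical HBM.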