27 Comments
webdoctors - Tuesday, September 8, 2020 - link
Weird, I thought HBM latency was not as good as regular DRAM. Also, the memory capacity is tiny, right? You see cards with only 16GB of HBM, but servers can have 256GB or more of regular DRAM.

surt - Tuesday, September 8, 2020 - link
HBM generally has better latency and bandwidth characteristics than DRAM, in some scenarios nearly twice as good. The trade-off you correctly note is capacity.

edzieba - Tuesday, September 8, 2020 - link
HBM has improved bandwidth, but access latency suffers (up to 20% worse than DDR: https://arxiv.org/pdf/1704.08273.pdf), similar to GDDR vs. DDR. As your interface becomes more parallel, the latency to retrieve any given bit goes up (wider and lower-clocked vs. narrower and higher-clocked bus).

anonomouse - Tuesday, September 8, 2020 - link
The “HBM” in that paper is based on Hybrid Memory Cube, not the JEDEC HBM. HMC has a very different interface and access model from HBM2, so it’s not clear that the latency conclusions from it apply at all. In particular, HMC has a high-speed serdes component for the actual link between the memory stacks and the main processor, whereas HBM2 keeps the interface wide (which necessitates the silicon interposer).

brucethemoose - Tuesday, September 8, 2020 - link
HBM2E supports up to 24GB a stack, but IDK if anyone is shipping more than 16GB.

psychobriggsy - Tuesday, September 8, 2020 - link
4 HBM2E stacks could handle up to 96GB I believe, with 24GB (12-Hi) stacks, although more likely it will be 32GB and 64GB configurations. HBM shouldn't have higher latency than off-package DRAM, surely!

It could be that the system uses the HBM as a massive L4 cache, or memory is partitioned under software control so you can put the data where you like, and so on.
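Those capacity figures are simple multiplication; a quick back-of-envelope check in Python, using the per-stack sizes discussed in the thread (whether 24GB 12-Hi stacks actually ship is an open question):

```python
# Total HBM capacity for a 4-stack design at the per-stack sizes
# mentioned above (including the hypothetical 12-Hi 24GB stack).
stacks = 4
for gb_per_stack in (8, 16, 24):
    total = stacks * gb_per_stack
    print(f"{stacks} x {gb_per_stack}GB stacks -> {total}GB total")
```

This reproduces the 32GB and 64GB configurations, and the 96GB upper bound, from the comment above.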
CajunArson - Tuesday, September 8, 2020 - link
72 cores? HBM and DDR?

I remember when the usual suspects called that a dumb idea 5 years ago, when Intel actually put it on the market. Of course, they don't make chips.
Looks like the people who do make chips were busy taking notes and copying the idea.
TeXWiller - Tuesday, September 8, 2020 - link
Just think of all the crazy glue required between the memory chips for pinning, blocking, streaming and the general scratchpaddery! ;)

edzieba - Tuesday, September 8, 2020 - link
Larrabee / Xeon Phi (same die, different names) really was ahead of its time. Whatever happened to HMC, anyway?

SarahKerrigan - Tuesday, September 8, 2020 - link
HMC is dead. IBM's OMI sometimes feels like a bit of a spiritual sequel to it ("move memory controller to the endpoint and fan out from it; connect to the CPU with fast serial links") but doesn't do the stacking part.

Richard Trauben - Thursday, September 10, 2020 - link
Micron dropped support. HBM bandwidth is ~10X HMC on the roadmap.

HBM: 128 bytes @ single-digit GHz; pinout stays within the package.
HMC: <1 byte @ tens of GHz; pinout leaves the package.
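To make that wide-and-slow vs. narrow-and-fast contrast concrete, here is an illustrative peak-bandwidth sketch in Python. The widths and transfer rates are rough assumed figures in the spirit of the comment above, not datasheet values:

```python
def peak_bandwidth_gbs(width_bytes: float, transfer_rate_ghz: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times transfers per ns."""
    return width_bytes * transfer_rate_ghz

# HBM2-style: very wide (1024-bit = 128-byte) interface at a low-ish rate,
# kept on-package over an interposer.
hbm_stack = peak_bandwidth_gbs(128, 2.0)

# HMC-style: a narrow serialized link (assume 16 lanes, 1 bit each) at a
# much higher rate, leaving the package.
hmc_link = peak_bandwidth_gbs(16 / 8, 30.0)

print(f"HBM-style stack: {hbm_stack:.0f} GB/s, HMC-style link: {hmc_link:.0f} GB/s")
```

Even at a far lower clock, the wide interface wins on raw throughput, which is the point being made.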
Santoval - Wednesday, September 16, 2020 - link
HBM killed it.

Wilco1 - Tuesday, September 8, 2020 - link
Slow Atom cores with bolted-on AVX512 at just 1.4GHz were never a good idea. People did take notes indeed: A64FX is about 2.5 times as fast as Xeon Phi and 4 times more power efficient.

Given Zeus will be even faster than Neoverse N1, it should be ridiculously quick. With 72 cores it'll beat EPYC 2 and 3. Making supercomputers out of cores this fast actually makes sense.
Spunjji - Wednesday, September 9, 2020 - link
Exactly that. The problem with the Larrabee project right from the start was using the wrong tools for the wrong jobs. Until now Intel has only ever really had an x86 hammer (ignoring the Itanic), so they treat every problem like an x86 nail.

Wilco1 - Wednesday, September 9, 2020 - link
And then the nail fails as expected and gets cancelled... Anyone remember Quark? Using an ancient 80486 to enter the microcontroller market was just insanity, and proof Intel had completely lost its marbles.

TeXWiller - Wednesday, September 9, 2020 - link
Maybe Intel used the Quark SBC projects as a public beta test for their later inclusion in the PCHs as the ME and the Innovation Engine. But claiming that publicly would be just paranoid. =)

There is a pattern of continuous integration here though, as always for Intel. One could see the Phi already being part of the Xeon architecture, particularly in light of the heavy AVX512 clock frequency offsets. That would certainly solve the Phi's issue with high-bandwidth network-related processing in certain workloads, an issue Mellanox marketing could drill into.
Duncan Macdonald - Tuesday, September 8, 2020 - link
HBM2E has similar latency to ordinary DRAM and much higher bandwidth (>400GB/sec per stack, and 4 stacks for this design gives a possible bandwidth of >1.6TB/sec). HBM2E capacity is lower than standard DRAM: the max so far is 16GB per stack, giving a maximum total of 64GB for this design. The HBM2E stacks are best thought of as local storage (somewhat like a huge L4 cache) for arrays and data that are very frequently accessed.

High-capacity DRAM is normally slower, as large DIMMs (e.g. 256GB) need buffering circuits due to the number of chips on the DIMM, and even without that, the bandwidth per DDR4 channel (under 30GB/sec) is less than one tenth of the bandwidth of an HBM2E stack.
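The arithmetic in that comparison, sketched in Python. The per-stack figure is the lower bound quoted above; the DDR4 number assumes a DDR4-3200 channel:

```python
# Aggregate HBM2E bandwidth vs. a single DDR4 channel, peak numbers only.
hbm2e_stack_gbs = 400            # >400 GB/s per HBM2E stack (lower bound)
stacks = 4
ddr4_channel_gbs = 25.6          # DDR4-3200: 8 bytes * 3200 MT/s

total_hbm_gbs = hbm2e_stack_gbs * stacks      # 1600 GB/s, i.e. >1.6 TB/s
ratio = ddr4_channel_gbs / hbm2e_stack_gbs    # one DDR4 channel vs. one stack

print(f"4 stacks: {total_hbm_gbs} GB/s; a DDR4 channel is {ratio:.1%} of one stack")
```

One DDR4 channel delivers well under a tenth of a single stack's bandwidth, which is the "less than one tenth" claim above.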
Tabalan - Tuesday, September 8, 2020 - link
Hmm, any info on when Zeus will be released (is it A77-based)? The N1 core was announced 1.5 years ago and there are still no leaks about N2. The Cortex-A series gets a new core every year; I hoped for the same schedule for servers.

In this news ( https://www.anandtech.com/show/15738/epi-backed-si... ) SiPearl expects to release a Zeus-based SoC in 2022, which is so far away...
anonomouse - Tuesday, September 8, 2020 - link
Appears that it’ll be announced at Arm Dev Summit next month.

Tabalan - Tuesday, September 8, 2020 - link
Thanks. Btw, can we get an edit button please? It would be helpful against the effects of a momentary brain fart.
GreenReaper - Tuesday, September 8, 2020 - link
I don't see how this is a "blunder". We paid for this in part, through grants - the public have a right to know.

Andrei Frumusanu - Tuesday, September 8, 2020 - link
Usually you disclose things in a controlled and detailed manner, not through a random background wall poster in a Twitter image.

mode_13h - Tuesday, September 8, 2020 - link
And are you privy to details of Airbus airframes currently under development?

mode_13h - Tuesday, September 8, 2020 - link
My point is that if the venture is intended to be commercially viable, then that's how you should see it: a subsidized private venture.

If it's open source IP that you want, then look to university projects and the like, which are more research-focused.
Spunjji - Wednesday, September 9, 2020 - link
The public pay for private ventures through the money they spend, too. Never really understood this notional boundary.

Kangal - Thursday, September 10, 2020 - link
It's all an "arms race" (pun intended) over which country can push the envelope technology-wise. At the end of the day, the most powerful nation isn't the one with the highest population, or the freest economy, or even the most natural resources. It's a culmination of all of the above, plus the technology to bring ideas into reality.

I made the point a few weeks back: say you teleported the entire nation of Japan from the year 2000 back in time to 1900, so 100 years backwards. They would decimate and dominate the world financially and militarily. Or, if the point wasn't made abundantly clear, you could take them back another 100 years, to 1800, and even pit the entire world's population and nations against them. They would win, quite easily too.
So the next "main race" between nations is in the information technology sector. That's why everyone is so touchy about intellectual property, and others are so interested in leaks. Case in point: iPhone leaks bombarding my feed.
Thud2 - Friday, September 11, 2020 - link
"surrounded by various IP whose labels are too small to be legible."

ENHANCE!