70 Comments
jragonsoul - Friday, January 22, 2016 - link
Wow... impressive and damn near 1st gen HBM performance. I wonder how 2nd gen HBM will do compared...
Stuka87 - Friday, January 22, 2016 - link
It has HBM2 in one of the charts. It's listed as 1TB/sec.
phoenix_rizzen - Friday, January 22, 2016 - link
And there's even an article about HBM2 on this very site, posted just yesterday. ;)
ToTTenTranz - Friday, January 22, 2016 - link
HBM rev.2 doubles the bandwidth, quadruples the memory density and is in volume production right now. GDDR5X will only start production in half a year.
Samus - Saturday, January 23, 2016 - link
HBM requires entirely new memory controllers, generally graphics chips need to be built from the ground up to support it, and considering its cost and capacity restrictions per block (even HBM2) I suspect GDDR5X will be the industry standard for the next few years.
Keeping in mind the reasons we've been using GDDR5 for the last decade: GPUs haven't been bandwidth starved. Because the industry has been stuck on 28nm for the last 6 years, GPUs haven't actually pushed memory technology. Now that 20nm is a reality for next-gen GPUs, that's going to change, but not dramatically. With GDDR5X, HBM just isn't needed... yet. But it will be. It's a superior memory architecture to DDR, but so was RAMBUS. It just wasn't necessary for the applications they were pushing it into.
Yojimbo - Sunday, January 24, 2016 - link
GDDR5 has been used for 7 1/2 years, not 10. 28nm graphics cards came out 4 years ago, not 6. Why do you exaggerate instead of using the real numbers?
GPUs haven't been bandwidth starved because 1) GDDR5 bandwidths have increased somewhat since launch, though not much for a while, through wider buses and faster clocks and 2) compression technology has gotten better. They knew memory bandwidth would be a problem and I'm guessing settled on putting the work and money into compression specifically to solve that problem. The DRAM nodes have seen a slowdown in shrinkage along with the processor nodes, I believe. They are just entering 20nm GDDR5 now. It's reasonable to believe that in a world where the processor nodes had been shrunk faster, DRAM nodes also would have been able to, and would have been made to in order to keep up. AMD's Fury/Nano seems like they would have been bandwidth starved, or at least used a whole lot of power for its memory subsystem, without HBM1 even though it was still on a 28nm process because their color compression is not as good as Maxwell's.
I don't think HBM2 faces a capacity restriction compared to GDDR5. GDDR5 can probably still keep up with the graphics bandwidth requirements of the upcoming 14nm/16nm GPUs if pushed, but while consuming much more power. I'm guessing the biggest use for HBM2 in the beginning will probably be for compute where the bandwidth is more important. Another use will be where power consumption is at a premium. It all has to do with what people are willing to pay for, as Huang said.
Samus - Sunday, January 24, 2016 - link
Calm the fuck down, maybe you shouldn't read between the lines. The point of my post, which is correct, is that with the introduction of GDDR5X, HBM is currently not needed, costs too much, and essentially needs more than a memory controller built around it (such as next-gen GPU architectures).
As I said, it will be a few years before HBM becomes mainstream. I'll meet you back here in a few years...
Yojimbo - Sunday, January 24, 2016 - link
"Calm the fuck down, maybe you shouldn't read between the lines."How was I not calm? You're not calm. Read between what lines? You said lots of things that were inaccurate. Not sure why you're meeting me back here in a few years. I never said HBM will or won't "become mainstream" whatever that means,
0razor1 - Monday, January 25, 2016 - link
Both of y'all. The bandwidth is not needed if you consider pumping more lanes as a 'cheap' solution. A 512-bit card (290) can be rivalled by a 970 (256-bit). However, I think we all know how memory starved the 970/980 is despite the delta color compression. 8 GHz memory is what cripples it when you overclock the card.
I see a card like the 960 benefiting from such tech at even a 128-bit bus. If you argue otherwise, please, buy stuff, experiment, BIOS edit and remove power limits, and you'll be better informed.
Source? I've owned the 280x (7970), 290x, 960, 970.
BurntMyBacon - Monday, January 25, 2016 - link
@Samus: "Calm the fuck down, ..."Can't say as Yojimbo sounds particularly irate here. There are some good counter-points that aren't mutually exclusive to your own points. At the end of the day, I think you both agree that HBM won't initially be used on mainstream cards, though perhaps you dispute its usefulness in premium products.
@Yojimbo: "I'm guessing the biggest use for HBM2 in the beginning will probably be for compute where the bandwidth is more important. Another use will be where power consumption is at a premium."
BurntMyBacon - Monday, January 25, 2016 - link
@Yojimbo: "GDDR5 has been used for 7 1/2 years, not 10. 28nm graphics cards came out 4 years ago, not 6. Why do you exaggerate instead of using the real numbers?"I don't think Samus was exaggerating. Rather, Samus was simply trying to make a point and didn't feel the need to spend the time to get exact numbers for a fairly well know fact that the industry has been held up as a whole by lack of process improvements that will be present in the next gen graphics chips. I'm sure Samus thought the estimates thrown out, while not precise would be close enough to get the point across. To be fair, while they were off the mark, the point of the post still stands:
@Samus: "HBM requires entirely new memory controllers, generally graphics chips need to be built from the ground up to support it, and considering its cost and capacity restrictions per block (even HBM2) I suspect GDDR5X will be the industry standard for the next few years."
rarson - Friday, January 29, 2016 - link
A huge advantage of HBM over GDDR5 is packaging. Even if performance of GDDR5X were similar to HBM 1, the same capacity of memory will take up a lot more PCB space. Easily demonstrated by the Fury cards, which are tiny compared to other GDDR5 cards.
Personally, I don't see how GDDR5X has any hope of competing with HBM (especially HBM 2) in the high-end graphics card market, but GDDR5X, if it's cheap enough, could be a decent replacement for lower end cards that like to use DDR3, or better yet, a faster replacement for main system RAM. That's what I'd like to see: GDDR5X on the motherboard, and HBM 2 on the graphics card. In large quantities of both, please.
Despoiler - Monday, January 25, 2016 - link
HBM actually doesn't require new memory controllers in the sense that you are presenting. If you knew what you were talking about you'd know that the memory controller is the logic slice that is part of the HBM stack, i.e. it's off the GPU die. The GPU die only contains a PHY.
Your argument that HBM was created primarily for bandwidth is incorrect. HBM was primarily created because GDDR is hitting a power wall. The clock speeds are so high that the power budget is too much to maintain. Clock speeds also are not infinite. We are at that wall also. HBM solves those issues for many years to come.
Also, 20nm? That got skipped. 14/16nm is what AMD and Nvidia are using. Don't even try to say they are close enough so we should "read between the lines."
xenol - Tuesday, January 26, 2016 - link
My guess is GDDR memory is going to be around for a while to supply the lower end GPUs or other areas where it may not make sense to use HBM for some reason or another.
DanNeely - Friday, January 22, 2016 - link
Agreed. It looks really promising as a bridge tech for use in more price sensitive market segments where the cost of the interposer needed for HBM would be prohibitive. With time I suspect the interposer costs will drop; both because the tech itself becomes more mature and because they can be made on older processes that will get steadily cheaper due to lack of general demand.
For this year's cards I suspect the $30 price point that's been floated for the first-generation ones AMD's using now will probably mean most cards at the sub-$200 price point will continue to use conventional GDDR of some sort. There'll probably be some exceptions (especially if smaller/cheaper interposers are available for 1/2 stack implementations); but compact/low profile variant cards have never been a big segment of the market due to both the extra cost involved and the poorer cooling from smaller heatsinks and fewer fans.
extide - Friday, January 22, 2016 - link
Yeah I don't think we will see HBM cards anywhere NEAR the $200 price point, at least this gen. My bet is that we will see one GPU from AMD with HBM. Then nVidia will have one or two, GP100 for sure, GP104, maybe. Then the next cards down will be GDDR5X, and then below that DDR3/4, for the ultra cheap oem stuff. I bet we will see GDDR5 phased out pretty quick.
DanNeely - Saturday, January 23, 2016 - link
I could've sworn I read that GP104 and GP106 were going to have dual HBM2/GDDR5(X?) memory controllers; but couldn't find where I did when I went back to look again. The 106 having one would only make sense if an HBM2 1060 variant was under consideration. HBM2 on midlevel GPUs would make most sense in mobile markets where margins are higher and physical size a more important concern; but if the tech's available and HBM2 supplies aren't prohibitively tight I'd expect someone to make a custom board model for the desktop.
Alexvrb - Saturday, January 23, 2016 - link
It's been done before with DDR and GDDR support on the same chip. I hope they market it better than in previous days, where you'd have two cards using the same GPU designation with both as options.
Samus - Saturday, January 23, 2016 - link
Even if they made two revisions of the chips with different memory controllers, I wouldn't expect them to perform any differently. Even a next-gen GPU architecture built on 20nm is unlikely to saturate a GDDR5X configuration.
Yojimbo - Sunday, January 24, 2016 - link
GDDR5X will still use more power and be larger than HBM2. HBM2 seems like it will also get to market faster. So there are reasons other than performance to use it on mobile chips rather than GDDR5/GDDR5X.
Samus - Sunday, January 24, 2016 - link
Do you have anything constructive to contribute other than trolling all of my posts, Yojimbo? You didn't even respond to what I wrote. GDDR5X and HBM are the same size (they're both built on the same process node), it's just that HBM is vertical and DDR is planar. HBM requires a thick PCB because of the interposer layer and additional packaging, making it unrealistic for embedded solutions in the near term. That's why not a single one has been announced.
As far as power consumption? Did you not read the chart? GDDR5X reduced power consumption significantly, and since we don't even know what HBM2 is rated at yet, it's safe to assume that after the added power consumption of the interposer, package substrate and more complex memory controller, they're virtually equal, albeit with HBM2 being significantly faster. But as it stands, the performance isn't even needed and won't be needed for some time.
Yojimbo - Sunday, January 24, 2016 - link
I did respond to what you wrote. You replied to DanNeely, dismissing his notion that HBM may be used in mobile applications because GDDR5X is likely to perform well enough. I wrote in support of what he said by saying that performance is not the only thing to consider.
By mobile, I assume DanNeely is talking about laptops. Why would a GP104 or GP106 or any other "mid-level GPU" be in an SOC?
GDDR5X chips and HBM chips do not take up the same PCB footprint. Here you can see the thickness of the R9 Nano PCB:
http://images.hardwarecanucks.com/image//skymtl/GP...
Here you can see how the RAM chips are not high:
http://images.hardwarecanucks.com/image//skymtl/GP...
It doesn't look unrealistic to put it in a laptop.
It's easy to imagine that nothing has been announced as far as using HBM in mobile because 1) NVIDIA isn't using HBM yet and the only thing they've specifically paired it with in their presentations so far are mezzanine boards, as far as I know, and 2) AMD neither has a mobile presence nor deep pockets at the moment. An attack on the high end desktop space was probably a better strategic choice than suddenly trying to chase OEMs for contracts to get into laptops while there were limited HBM chips.
I see nothing in that chart that suggests that GDDR5X, while more power efficient than GDDR5, can be "safely assumed" to be as power efficient as HBM 2. Not sure why you think the HBM interposer or packaging or even memory controller is going to consume more power than the PCB traces, packaging, and memory controller that GDDR5X uses. I haven't seen anything about that. In fact, from an EETimes article: http://www.eetimes.com/author.asp?section_id=36&am...
"Stacking ICs that communicate with one another to minimize the signal interconnect is an emerging trend in low-power design. We have seen several cases where processors and memory are stacked over a silicon interposer that makes connections using Through Silicon Vias (TSV). These interposers provide a low capacitance signal interconnect between die, thus reducing the I/O active power consumption."
I'm not sure why you think I'm attacking you just because I am disagreeing with you. I wouldn't be surprised at all if HBM2 is hardly seen outside the compute market (I would be surprised if HBM2 isn't seen heavily in the compute market) in 2016. A lot has to do with pricing and availability of GDDR5, GDDR5X, and HBM2. My point was that performance is not the only consideration. High performance laptops are likely not very cost sensitive, so saving on power and space may be worth the higher cost. So I think DanNeely's suggestion seems plausible at this point.
Samus - Monday, January 25, 2016 - link
tl/dr
See ya back here in a few years to discuss all those laptops you're saying HBM-equipped GPUs will be in...
h4rm0ny - Tuesday, January 26, 2016 - link
>>"tl/dr"Too long for you, maybe. But I read it and found it a very interesting post. Thank you for writing it, Yojimbo.
DanNeely - Monday, January 25, 2016 - link
If nothing else, TDP not spent on GDDR can be spent on bumping the GPU clock for a bit of extra speed.
Alexvrb - Saturday, January 30, 2016 - link
Samus, you're missing the point. It's actually super easy to saturate GDDR5X. You take the extra bandwidth and shrink the memory bus and make a smaller/cheaper/lower power chip with the same performance as last-gen tech. So yes, there will be some wide-bus implementations that really don't need all the bandwidth... but there will be mid-range and lower chips that really could use more bandwidth, but due to power, temp, and cost constraints they make do with a narrower bus.
In any case where the bus isn't that wide (again, cards on a budget), IF there were two different versions (GDDR5 vs GDDR5X) marketed under the same name, there could and probably would be a performance difference. Now, generally the high-end cards are strictly using the faster memory anyway. If you look at cards with DDR vs GDDR for example, historically this has been something that happened in mid or low-end cards. Right where you would end up with narrower buses! So the likelihood of performance differences is high... just like it has been in the past.
The differences won't be nearly as vast as DDR vs GDDR days, however.
Lolimaster - Tuesday, January 26, 2016 - link
AMD plans to use HBM from top to bottom, including APUs, as soon as possible.
Oxford Guy - Friday, January 22, 2016 - link
The word latency doesn't appear anywhere.
We know DDR4 RAM on the desktop uses less voltage than DDR3, but despite the speed increase, latency increased a lot to compensate for that voltage reduction.
Is it that latency is immaterial for GPU RAM?
emn13 - Friday, January 22, 2016 - link
"A lot" being on the order of going from 5.6 to 6.6 nanoseconds for decent (not absolutely top-of-the line gear). Higher end stuff appears to differ less, including a DDR4 kit advertised at 3Ghz 15CL which would be just 5ns of latency (which may be almost too good to be true).There does seem to be a mild increase in latency, but nothing to write home about - though prices are still clearly higher.
Gigaplex - Saturday, January 23, 2016 - link
That's still a 15%-20% increase in latency. That's pretty significant, especially for latency-dependent workloads.
colinisation - Saturday, January 23, 2016 - link
Unless I am mistaken, CL ratings refer to latency in terms of clock cycles. So going from DDR3 CL12 to DDR4 CL16 will still mean an overall reduction in latency.
Alexvrb - Saturday, January 23, 2016 - link
That depends on the clock speeds. Looking at high-speed, low latency performance kits on Newegg I would come up with a different comparison. What about a 2 x 8GB 1866 CL8 DDR3 kit vs. 2 x 8GB 2400 CL12 DDR4? I'm not even comparing 4GB modules, so it's not like I'm playing to DDR3's strengths.
http://www.newegg.com/Product/Product.aspx?Item=N8...
http://www.newegg.com/Product/Product.aspx?Item=N8...
Those are two of the lowest-latency kits I saw in 2 x 8. If you go up or down you can favor one tech over the other obviously, but since we weren't talking density I think this is more than fair.
BurntMyBacon - Monday, January 25, 2016 - link
@Alexvrb: "What about a 2 x 8GB 1866 CL8 DDR3 kit vs. 2 x 8GB 2400 CL12 DDR4?"For those that don't want to do the math themselves:
2 x 8GB 1866 CL8 DDR3 = 4.2ns
2 x 8GB 2400 CL12 DDR4 = 5ns
That is, the DDR4 memory has about 19% more latency than the DDR3 in this comparison.
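For anyone who wants to rerun or extend these numbers, a minimal Python sketch of the arithmetic used in this sub-thread (CL divided by the effective transfer rate); the kit figures are the ones quoted above, not anything from the article itself. Note that the strict first-word latency would divide by the I/O clock (half the transfer rate), which doubles the absolute figures but leaves the DDR3/DDR4 ratio unchanged.

```python
# CAS latency arithmetic for the kits discussed above.
# Thread convention: latency_ns = CL / effective transfer rate in GT/s.

def cas_latency_ns(cl, transfer_rate_mtps):
    return cl / (transfer_rate_mtps / 1000.0)  # MT/s -> GT/s

ddr3 = cas_latency_ns(8, 1866)     # ~4.3 ns (rounded to 4.2 ns above)
ddr4 = cas_latency_ns(12, 2400)    # 5.0 ns
penalty = 100 * (ddr4 / ddr3 - 1)  # ~17%, or ~19% with the 4.2 ns rounding
print(f"DDR3-1866 CL8:  {ddr3:.2f} ns")
print(f"DDR4-2400 CL12: {ddr4:.2f} ns")
print(f"DDR4 penalty:   {penalty:.0f}%")
```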
Alexvrb - Saturday, January 30, 2016 - link
Thanks Burnt. Which goes back to my point... you can pick fantasy numbers and say that DDR4 is better in all regards, or you can look at actual memory kits and discover that (at least thus far) DDR3 actually holds some slight edge in terms of latency.
DDR4 certainly wins on bandwidth and capacity, however. Availability is soon going to be an issue for DDR3 too. I only took exception with the specific example given in terms of latency.
TeXWiller - Friday, January 22, 2016 - link
GPUs typically use threads, small caches and buffers to hide the latency. Due to the regular workloads and low-cost threading, the hiding is usually successful.
10101010 - Friday, January 22, 2016 - link
Do you have any thoughts on how GPU threads will work with a super-wide 4096-bit memory bus? Will the compilers be good enough to make this wide bus work well for most apps?
ddriver - Friday, January 22, 2016 - link
The wider the better, considering that unlike CPUs, GPUs work with large sets of sequential data - vertices, textures and whatnot - 4096 bits are only 512 bytes. Fetching more data per work cycle means compute units will be better fed. Now, it won't result in a significant improvement, since for sequential data only the first access comes at a penalty, and subsequent access latency is masked by the prefetcher, but it means more data per work cycle, thus lower clock frequency and lower power consumption.
Samus - Saturday, January 23, 2016 - link
Exactly. GPU workloads, at least graphics, are very "predictable."
TeXWiller - Friday, January 22, 2016 - link
If you're referring to the HBM, there are multiple independent (8) channels per stack providing granularity for the memory controller, if I have understood correctly. Those channels certainly are not accessed in lock-step, so there should be plenty of parallelism available to serve the needs of the execution units, and room for improving the controllers. The applications and compilers will try their best to avoid using the external (off-chip) memory, as they have done so far.
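As a rough illustration of that channel structure, a small Python sketch using the commonly cited first-gen HBM figures (8 independent 128-bit channels per stack, roughly 1 Gbps per pin, 4 stacks on a Fury-class card); these numbers are assumptions for illustration, not anything stated in this thread:

```python
# Back-of-the-envelope sketch of a first-gen HBM layout (assumed figures).
CHANNELS_PER_STACK = 8
CHANNEL_WIDTH_BITS = 128
PIN_RATE_GBPS = 1.0
STACKS = 4

stack_width = CHANNELS_PER_STACK * CHANNEL_WIDTH_BITS   # 1024 bits per stack
total_width = stack_width * STACKS                      # 4096-bit aggregate bus
peak_gbs = total_width * PIN_RATE_GBPS / 8              # ~512 GB/s peak

# The controller can schedule each of the 32 channels independently,
# which is where the request-level parallelism comes from.
print(f"{total_width}-bit bus, {CHANNELS_PER_STACK * STACKS} channels, ~{peak_gbs:.0f} GB/s peak")
```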
10101010 - Saturday, January 23, 2016 - link
Thanks; the independent channels are what I was wondering about. Looks like 2016 is going to be an amazing year for the GPU.
ddriver - Friday, January 22, 2016 - link
Are you sure latency is higher to compensate for the voltage reduction? Because from an engineering perspective, that doesn't make a lot of sense.
Alexvrb - Saturday, January 23, 2016 - link
For GPUs? Yeah, bandwidth is king. That's why they never bring up latency. For CPUs? It depends, but there are situations where "slower" memory that's lower latency has advantages. The only time I'd really concern myself with bandwidth with main RAM is if you're building something with a powerful (relatively speaking) iGPU and no discrete GPU, and plan on taxing said iGPU. Such as a 512 shader APU, especially when overclocking the onboard graphics.
tviceman - Friday, January 22, 2016 - link
If GM200 had 14 GHz GDDR5X RAM, its bandwidth would be 672 GB/s.
The Von Matrices - Friday, January 22, 2016 - link
If Hawaii/390X had 14 GHz GDDR5X RAM, its bandwidth would be 896 GB/s.
Eden-K121D - Friday, June 17, 2016 - link
HaHa
nandnandnand - Saturday, January 23, 2016 - link
It can't beat HBM1 on bandwidth per watt. HBM2 is even better. I assume it won't be more than a couple of years before we hear about an HBM3.
patrickjp93 - Tuesday, February 2, 2016 - link
It's better than 1st gen HBM. Using the same number of chips as on the 980 Ti at the same clock rate, you now get 672GB/s in bandwidth. That's because the bus width doubles per chip.
extide - Friday, January 22, 2016 - link
I bet we will see this GDDR5X almost entirely wipe out GDDR5. On the ultra low end we will still see DDR3 (maybe DDR4 now), and the midrange and performance segments will probably pretty much all move to GDDR5X, as it's a lot more cost effective to raise the clock than it is to widen the bus. Then the halo segment will be HBM2.
BrokenCrayons - Monday, January 25, 2016 - link
I really hope that the days of 64 and 128 bit DDR3 as video card memory are mostly behind us and GDDR5 or 5X takes over that role. Even on a 64 bit bus, GDDR5 usually offers 40GB/s versus DDR3's awful 14GB/s. With such limited bandwidth, there's almost no point in purchasing a low end GPU. Just use whatever iGPU is sitting on the CPU package and be done with it. Maybe that will finally change with HBM positioned to take over the top end of GPUs and relegate GDDR to mid- and low-end graphics.
Then again, if Intel or AMD release processors with dedicated HBM, it might make owning even a mid range GPU as pointless as a low end one is now.
DanNeely - Monday, January 25, 2016 - link
The cards using DDR3 now are low end ones with profit margins so tiny that even one or two extra dollars on better RAM would wipe out the profit margin. They'll switch to DDR4 as soon as the prices cross; and since GPUs are an area where throughput is more important than latency, they'll get more benefit from it than CPUs do.
GDDR is a premium product and will never come to the entry level GPU market segment. A few years from now we may see GDDR get squeezed out as HBM matures and gets cheap enough to push down the product stack; but DDR will remain on the bottom end until/unless DDR itself is replaced on the CPU front. By HBM, or HMC, or WideIO, possibly by something else. (From what I've read, HBM was tuned for GPU workloads and has too much latency/random IO penalty to be a good replacement on CPUs, HMC was targeted at HPC workloads and is expected to be much more expensive to implement, while WideIO is being aimed at phones/tablets. General purpose CPUs seem to've been missed by all three to one extent or another. My WAG would be that WideIO would probably be the closest fit to what we'd want for a mainstream x86 CPU, and the extra capacity needed could be gotten by just scaling the bus wider than for a phone/tablet CPU; but until Intel or AMD speak about future plans, who knows.)
BrokenCrayons - Tuesday, January 26, 2016 - link
There really aren't that many DDR3 equipped cards for sale even now. Previous gen GPUs that are old stock still on sale are, but at least on the NV side of the house, the GT 730 ($50-$60 USD) and up can all be purchased with GDDR5, leaving ONLY the GT 720 as the outstanding retail GPU that has nothing but DDR3. As far as I can tell, DDR3's days are over and I'm not certain that DDR4 is good enough to end up on the extreme low end. Even if it does, a narrow 64-bit bus will limit performance to the point where Intel's GT2 parts will offer comparable (and possibly better) end user experiences.
zodiacfml - Friday, January 22, 2016 - link
Nice. But the real discussion here is how much more expensive this is than the previous one, or how much cheaper it is compared to HBM or HBM2.
beginner99 - Saturday, January 23, 2016 - link
Exactly. I'm sure it's cheaper than HBM, else why develop it? Lack of interposer and using existing GDDR5 ecosystems makes this pretty obvious. Question is how much? And is it cheaper than GDDR5?
Mr Perfect - Saturday, January 23, 2016 - link
Okay, can someone explain what "n" stands for, as used in 8n and 16n prefetch? I've not seen that abbreviation before.
know of fence - Sunday, January 24, 2016 - link
In a prefetch buffer architecture, when a memory access occurs to a row the buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the IO pins, without the need for individual column address requests. This assumes the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64 bit CPU accesses a 16-bit-wide DRAM chip, it will need 4 adjacent 16 bit datawords to make up the full 64 bits. A 4n prefetch buffer would accomplish this exactly ("n" refers to the IO width of the memory chip; it is multiplied by the burst depth "4" to give the size in bits of the full burst sequence). An 8n prefetch buffer on an 8 bit wide DRAM would also accomplish a 64 bit transfer.
The prefetch buffer depth can also be thought of as the ratio between the core memory frequency and the IO frequency. In an 8n prefetch architecture (such as DDR3), the IOs will operate 8 times faster than the memory core (each memory access results in a burst of 8 datawords on the IOs). Thus a 200 MHz memory core is combined with IOs that each operate eight times faster (1600 megabits per second). If the memory has 16 IOs, the total read bandwidth would be 200 MHz x 8 datawords/access x 16 IOs = 25.6 gigabits per second (Gbit/s), or 3.2 gigabytes per second (GB/s). Modules with multiple DRAM chips can provide correspondingly higher bandwidth.
https://en.wikipedia.org/wiki/Synchronous_dynamic_...
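To make the quoted arithmetic concrete, here is a small Python sketch of the same calculation. The DDR3 numbers are taken straight from the quote; the GDDR5X line is only an assumed illustration of a 16n prefetch part under this simplified model, not a published spec.

```python
# Per-chip bandwidth under the model quoted above:
# core clock x prefetch depth x IO width.
def chip_bandwidth_gbit_s(core_mhz, prefetch, io_width_bits):
    return core_mhz * 1e6 * prefetch * io_width_bits / 1e9

# DDR3 example from the quote: 200 MHz core, 8n prefetch, 16 IOs.
ddr3 = chip_bandwidth_gbit_s(200, 8, 16)      # 25.6 Gbit/s = 3.2 GB/s
# Hypothetical GDDR5X-style chip: 16n prefetch, 32 IOs, 750 MHz core
# (i.e. 12 Gbps per pin under this simplified model).
gddr5x = chip_bandwidth_gbit_s(750, 16, 32)   # 384 Gbit/s = 48 GB/s per chip

print(f"DDR3 chip:       {ddr3:.1f} Gbit/s ({ddr3 / 8:.1f} GB/s)")
print(f"16n GDDR5X chip: {gddr5x:.1f} Gbit/s ({gddr5x / 8:.1f} GB/s)")
```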
Mr Perfect - Sunday, January 24, 2016 - link
Thanks. So it's an abbreviation for dataword? 4n is read as 4 datawords, 8n as 8 datawords, etc?
know of fence - Sunday, January 24, 2016 - link
N is not a unit, not everything has to be, because units are conventionally separated by a space like so: 256 GB. n just stands for "times", it's just an index or a variable in the sense of "n-times" or "to the nth degree". But yeah, it can mean that 8 datawords can be prefetched at a time, or so I gather from the above wiki-quote. I was asking myself the same question BTW.
Lolimaster - Saturday, January 23, 2016 - link
Considering AMD plans to use HBM2 or HBM3 for an APU, GDDR5X is short-lived.
Lolimaster - Saturday, January 23, 2016 - link
For low and low/mid cards you will only need 2GB of VRAM, which means 1 stacked chip with 256GB/s of bandwidth (HBM2).
junky77 - Saturday, January 23, 2016 - link
But HBM also operates differently, leading to different optimal efficiency in different scenarios.
529th - Saturday, January 23, 2016 - link
What will the 4096 bit bus mean with Mantle/DX12/Vulkan now that more cores will be used?
SunLord - Sunday, January 24, 2016 - link
I'm gonna assume we'll see HBM2 replace GDDR5 in the high end (Fury/390/980 Ti class) but GDDR5X in the future high-mid range (380/960) class.
tygrus - Sunday, January 24, 2016 - link
Doesn't sound like anything new. Just another die shrink while doubling the pre-fetch. The basic principle for the last 16+ years since the start of DDR SDRAM. The only reason they re-used GDDR5 in the name was because the signals are the same as GDDR5.
GDDR# was a fork of the DDR# standards and has often incorporated features that later find their way into DDR# standards. There is not a direct GDDR5 = DDR5 or similar relationship, because a GPU can be designed for specific memory without some of the limitations that apply to DIMMs in sockets attached to CPUs that are further away.
yhselp - Sunday, January 24, 2016 - link
It looks like GDDR5X could be faster than 1st gen HBM; on a 384-bit bus it could achieve 672 GB/s vs. HBM's 512 GB/s. And if they manage to achieve, say, 12 Gbps on a 512-bit bus, the same way AMD manages 6 Gbps GDDR5 on a 512-bit bus, we could be looking at a GDDR5X limit of about 768 GB/s - that seems mighty impressive for a serial memory technology.
If I'm not mistaken, a while back NVIDIA announced they won't achieve their initial goal of 1 TB/s on their first gen HBM. Is it possible that GDDR5X stays competitive performance-wise with HBM for longer than expected? How about "budget" versions of future flagships utilizing GDDR5X instead of HBM? After factoring in color compression a GDDR5X card might achieve a theoretical maximum very close to 1 TB/s - wouldn't that be enough even for 4K and VR?
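The figures in that comment follow from the usual bus-width times per-pin-rate estimate. A quick Python sketch with the numbers floating around this thread; the 14 Gbps and 12 Gbps entries are hypothetical speeds from the discussion, not announced products.

```python
# Peak bandwidth estimate: bus width (bits) x per-pin rate (Gbps) / 8 bits per byte.
def peak_bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

configs = [
    ("GDDR5X 384-bit @ 14 Gbps (hypothetical)", 384, 14),   # 672 GB/s
    ("GDDR5X 512-bit @ 12 Gbps (hypothetical)", 512, 12),   # 768 GB/s
    ("GDDR5  512-bit @ 6 Gbps (290X/390X)",     512, 6),    # 384 GB/s
    ("HBM1   4096-bit @ 1 Gbps (Fury)",         4096, 1),   # 512 GB/s
]
for name, bits, rate in configs:
    print(f"{name}: {peak_bandwidth_gb_s(bits, rate):.0f} GB/s")
```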
damianrobertjones - Monday, January 25, 2016 - link
...Which probably means that it can go a LOT faster but, as is usually the case, you wouldn't want to do that as they'd prefer to go for more $$$$$
Shadowmaster625 - Monday, January 25, 2016 - link
So why isn't it called GDDR6?
Oxford Guy - Tuesday, January 26, 2016 - link
I guess because marketeers like X so much.
06GTOSC - Tuesday, January 26, 2016 - link
Seems like for truly high end applications, HBM will be the standard. But this will enable higher performance mid range cards one or two steps down from the highest end cards.
JonnyDough - Monday, February 1, 2016 - link
Their naming scheme is crap. They should name it GDDR6. Why? Because when someone does a search on eBay for GDDR5X it will also give GDDR5 options, and the same goes for other sites. Should these sites fix their searches? Yes, but they won't/don't, and in the end it makes it confusing for consumers.
seanr - Wednesday, February 3, 2016 - link
Too bad we didn't skip DDR4 memory as system memory. Even when graphics cards first came out with GDDR4, the change wasn't all that impressive. But when I bought an HD 4870, with GDDR5, I was ecstatic. This new 5X seems cool but more like the DDR3 to 4 transition.