
  • osxandwindows - Monday, July 25, 2016 - link

    Interesting stuff. Wonder if this can be used as additional local storage for the entire system?
  • Intel999 - Monday, July 25, 2016 - link

    Yes, they stated that the SSD could be used as local storage too.
  • osxandwindows - Monday, July 25, 2016 - link

    That's nice.
  • ddriver - Tuesday, July 26, 2016 - link

    I honestly don't see the point in this. It might be useful for older mobos which don't have M.2 slots, as a way to get fast SYSTEM storage, but as storage for the GPU it doesn't make sense.

    Currently SSDs cannot even max out the bandwidth of a PCIe 3.0 slot, and GPUs themselves are interfaced through x16 slots, which means they get much better than SSD bandwidth from system RAM (of which you can have plenty) and can also access the system SSD without any significant penalty, at least relative to the typical bandwidth, IOPS and latency of an SSD. Lastly, it is not like SSDs are ANYWHERE NEAR the bandwidth and latency of GPU memory; 2-3 GB/s in the case of the fastest SSDs is SLOW AS A SLOTH compared to the 200-300 GB/s of a typical upper-mid-range GPU.
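    For scale, the gap being argued here works out like this. A quick sketch using the rough figures quoted in the comment (illustrative assumptions, not benchmarks):

```python
# Rough streaming-time comparison using the figures quoted above:
# ~3 GB/s for a fast NVMe SSD, ~250 GB/s for upper-mid-range VRAM.
# These numbers are illustrative assumptions, not measurements.

GIB = 1024 ** 3

def stream_seconds(dataset_bytes, bandwidth_bytes_per_s):
    """Time to move a dataset at a given sustained bandwidth."""
    return dataset_bytes / bandwidth_bytes_per_s

dataset = 64 * GIB              # one huge film frame, per the thread below
ssd_bw = 3 * GIB                # fast NVMe SSD, sequential
vram_bw = 250 * GIB             # typical upper-mid-range GPU memory

print(f"SSD:  {stream_seconds(dataset, ssd_bw):6.1f} s")
print(f"VRAM: {stream_seconds(dataset, vram_bw):6.2f} s")
# The SSD is nearly two orders of magnitude slower, which is the
# crux of the "slow as a sloth" argument.
```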
  • Spunjji - Tuesday, July 26, 2016 - link

    It doesn't have to be anywhere near the bandwidth of the GPU memory to be relevant. That's what tiers of storage are about. What it does mean is you can keep storage as close as possible to where it needs to be. It also means you can store way, way more data there than you can fit in system RAM.

    There are probably only limited scenarios where this makes sense, though, because the systems it will be used in will most likely have other SSD storage on the PCIe bus somewhere else in the system.
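    The tiering idea can be sketched abstractly. A toy two-tier store (all names and sizes invented for illustration): reads are served from a small fast tier when possible, and misses promote data up from the large slow tier:

```python
from collections import OrderedDict

class TwoTierStore:
    """Toy model of tiered storage: a small fast tier (think VRAM)
    in front of a large slow tier (think an on-card SSD). Purely
    illustrative; real drivers are far more sophisticated."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()       # LRU order: oldest first
        self.slow = {}                  # effectively unlimited
        self.fast_capacity = fast_capacity
        self.hits = self.misses = 0

    def write(self, key, value):
        self.slow[key] = value          # everything lands in the big tier

    def read(self, key):
        if key in self.fast:            # fast-tier hit
            self.fast.move_to_end(key)
            self.hits += 1
            return self.fast[key]
        self.misses += 1                # miss: promote from the slow tier
        value = self.slow[key]
        self.fast[key] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)   # evict least recently used
        return value

store = TwoTierStore(fast_capacity=2)
for i in range(4):
    store.write(i, f"block-{i}")
store.read(0); store.read(0); store.read(3)
print(store.hits, store.misses)   # 1 hit (second read of 0), 2 misses
```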
  • ddriver - Tuesday, July 26, 2016 - link

    The purpose of having it closer is so that it can be accessed faster. But when the medium is so slow, it defeats the purpose. Latency is not a real issue in large data sets, as it can be completely masked out by buffering. Performance gains from this will be minuscule and likely offset by the hit on the thermal budgets as elaborated in the comment below.
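    The "masked out by buffering" claim is the classic double-buffering pattern: fetch chunk N+1 while processing chunk N. A minimal sketch, with `time.sleep()` standing in for SSD reads and GPU kernels (all durations invented):

```python
import time
from concurrent.futures import ThreadPoolExecutor

FETCH_S, COMPUTE_S, CHUNKS = 0.02, 0.02, 5

def fetch(i):
    time.sleep(FETCH_S)          # stand-in for an SSD read
    return f"chunk-{i}"

def compute(chunk):
    time.sleep(COMPUTE_S)        # stand-in for a GPU kernel

def serial():
    t0 = time.perf_counter()
    for i in range(CHUNKS):
        compute(fetch(i))
    return time.perf_counter() - t0

def pipelined():
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)           # prime the pipeline
        for i in range(1, CHUNKS + 1):
            chunk = pending.result()
            if i < CHUNKS:
                pending = pool.submit(fetch, i)   # next transfer in flight
            compute(chunk)                        # overlaps with the fetch
    return time.perf_counter() - t0

print(f"serial ~{serial():.2f}s, pipelined ~{pipelined():.2f}s")
# Pipelining hides every fetch except the first behind compute.
```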

    SSD bandwidth is simply dreadfully low for GPU compute scenarios, and moving the SSD closer will not make the SSD any faster than it already is, it is simply going to remove some negligible overheads, at the extra cost of the hardware, the loss of thermal headroom and throttling performance losses and the need to change your code to target that extra functionality.

    This will make more sense for something like Optane, if it lives up to the performance hype and is not overly expensive, but for flash SSDs it is entirely pointless.
  • DanNeely - Tuesday, July 26, 2016 - link

    It'd be useless for realtime rendering like in our games; but the amount of data used in professional rendering for Hollywood is insane. Ars puts it above 64 GB per frame (and with frame rates well below 1 per second). In that case, using up not just all of the GPU's RAM but all of the CPU's RAM as well and needing to stream data from SSDs is plausible. If that's the case, moving the storage closer to the GPU to cut latency should speed things up; especially in cases where you've got multiple GPUs in your renderbox and don't have enough PCIe lanes to give them all that much dedicated SSD bandwidth.

    That said, I'm interested in whether there are any benchmarks tech sites could run that hit these levels of detail, or if we'll have to hope for posts from VFX studios to see how much these actually can help.
  • ddriver - Tuesday, July 26, 2016 - link

    This is a very arbitrary number, it involves actual assets, and it concerns software rendering done on the CPU in large rendering farms with fairly slow, high-latency interconnects. AMD is demoing it with video, but looking at those 850 MB/s for the non-SSD configuration, it does look like the benchmark was written to give an artificial advantage to showcase their concept in an unrealistic scenario which will fail to deliver in practice.

    With PCIe you have DMA, and with a x16 v3 slot you have ~11 GB/sec bandwidth (out of ~16 theoretical). Latency is quite low too, in the realm of tens of microseconds. Also, modern GPUs can work asynchronously, meaning that you don't have to waste processing time to do the transfers, you can transfer the next buffer while you process the current one, eliminating all subsequent latency penalties, leaving only the initial one, which is quite literally nothing for such workloads.
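    The ~11 of ~16 GB/s figure checks out roughly: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b line encoding, and packet/protocol overhead then eats a further chunk. A back-of-the-envelope check (the overhead fraction is an assumption, not a spec value):

```python
# Back-of-the-envelope check of the PCIe 3.0 x16 numbers above.
# 8 GT/s per lane with 128b/130b line encoding; the ~30% knocked off
# for TLP/protocol overhead is a rough assumption, not a spec value.

lanes = 16
gt_per_s = 8e9                    # PCIe 3.0: 8 GT/s per lane
encoding = 128 / 130              # 128b/130b line code

raw_gb_s = lanes * gt_per_s * encoding / 8 / 1e9
print(f"theoretical: {raw_gb_s:.2f} GB/s per direction")   # ~15.75

protocol_overhead = 0.30          # assumed packet/DMA overheads
practical = raw_gb_s * (1 - protocol_overhead)
print(f"practical:   ~{practical:.0f} GB/s")               # ~11
```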

    What I am saying is that it is practically 100% possible to get the benefits AMD demos "with SSD" without integrating the SSD onto the GPU. SSDs are far from even coming close to saturating a PCIe x16 link, and the link latency is negligible next to that of the SSD. You don't really save much by moving SSDs onto the GPU, but you increase the cost, complexity and TDP. So it is not merely redundant; I am highly skeptical that the PROs will be able to outweigh the CONs.
  • close - Wednesday, July 27, 2016 - link

    I bet the patented "ddriver 5.25-inch HDD" would be much better for this purpose - both faster and cheaper than any SSD or current-day HDD, according to your back-of-the-napkin calculations. It must be a huge burden to be the best engineer around but with nothing to show for it. ;) [/s]

    At frames in the tens or hundreds of GBs there's only so much you can do with any kind of RAM before it starts spilling. And when it does, you might be better off with an additional storage tier, slower but closer to the GPU.
  • eachus - Sunday, July 31, 2016 - link

    "The purpose of having it closer is so that it can be accessed faster. But when the medium is so slow, it defeats the purpose. Latency is not a real issue in large data sets, as it can be completely masked out by buffering."

    Except when it can't. I used to write this type of software professionally, I still do on occasion now that I am retired. A good example of the latency problem would be an implementation of the Simplex algorithm for linear programming problems. Rather than recompute the basis it is normal to take some number of steps creating vectors such that you multiply the basis by a vector at each step. After so many steps you recompute the basis (select a square matrix out of the data and invert it). So 50 (say) steps, then select the new basis and invert it.

    For most problem sizes, the time to collect the new basis dominates the algorithm. (If you go too many steps without recomputing, you start taking wrong steps.) The amount of data in the new basis is (relatively) small so bandwidth is not a problem, but the complete dataset is often more than n Gigabytes, where n is the size you can keep in main memory. ;-)

    You can create problems which need most of main memory to store the basis, but those problems are usually (transportation or assignment) problems that have special solutions.

    I've also dealt with other algorithms that need access to a large dataset, but where the next section to be read in is selected almost at random. (Some graph partitioning problems...)
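    The "collecting the basis dominates" point is a latency-bound access pattern: many small reads from effectively random locations in a dataset that doesn't fit in RAM. A crude cost model (every figure below is invented for illustration) shows why per-access latency, not bandwidth, sets the floor:

```python
# Crude cost model for gathering a basis: m columns read from random
# locations in a dataset that doesn't fit in RAM. All figures are
# invented for illustration; the shape of the result is the point.

def gather_seconds(n_reads, bytes_per_read, latency_s, bandwidth_b_s):
    """Each random read pays full access latency plus transfer time."""
    return n_reads * (latency_s + bytes_per_read / bandwidth_b_s)

m = 50_000                  # basis columns to collect
col = 32 * 1024             # assumed 32 KiB per column

nvme = gather_seconds(m, col, latency_s=80e-6, bandwidth_b_s=3e9)
hdd = gather_seconds(m, col, latency_s=8e-3, bandwidth_b_s=200e6)

print(f"NVMe gather: {nvme:.1f} s   HDD gather: {hdd:.1f} s")
# Per-access latency dominates: the NVMe gather is roughly two orders
# of magnitude faster even though its sequential bandwidth is only
# ~15x higher.
```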
  • MLSCrow - Tuesday, July 26, 2016 - link

    I'm really sorry that you don't see a point to this. I'm equally sorry that you feel that working on a new and brilliantly innovative technology that can provide benefit in certain situations has no "practical applications" and is completely "pointless". If you cannot see any benefit to this, then I'm sorry for your lack of vision.

    To spend money on R&D for a new and innovative product that doesn't necessarily target a mainstream market really isn't the move of a "desperate" company. It's the move of a company with vision that is moving forward, a company that is willing to spend money on inventing new things, which realistically, no desperate company would waste money on. A desperate company would be as stringent and efficient about their bottom line as possible. The fact that they are doing these things is a sign that the company is getting back on track and doing well. Their new GPUs are incredible. Their new Zen CPUs are going to be incredible. Their stock is soaring. They're gaining market share and investor confidence as well as confidence in themselves, and it's showing.

    Again, this isn't to replace VRAM, as that is the fastest buffer there is, but it will provide a new and fast buffer between system storage and VRAM that can store much more information than system RAM. Let's say that, on average, the typical power user has 16GB of system RAM. 4K videos can be over 10 times that size. 8K videos and beyond will push you to over half a terabyte. Having two SSDs in RAID 0 gives you the fastest buffer that can store the entire file, skipping many steps required to send the data to the GPU. This can provide a large benefit to video editors, reducing latency by reducing the number of hops, potentially reducing stuttering and other related issues, and it is an option that I feel is a great addition to a GPU. To offer customization on a GPU is a first and honestly, who wouldn't want the option of customization if you can have it?
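    Whether those totals hold depends heavily on codec and bit depth, but the uncompressed working-set arithmetic is straightforward (frame size and fps figures below are illustrative assumptions):

```python
# Working-set arithmetic for uncompressed video frames. Bit depth and
# frame rate below are illustrative assumptions; compressed file sizes
# vary wildly by codec.

def frame_bytes(width, height, bytes_per_pixel=6):   # assume 16-bit RGB
    return width * height * bytes_per_pixel

def seconds_of_footage(width, height, fps, budget_bytes):
    """How much footage fits in a given memory budget."""
    return budget_bytes / (frame_bytes(width, height) * fps)

GIB = 1024 ** 3
for name, w, h in [("4K", 3840, 2160), ("8K", 7680, 4320)]:
    fb = frame_bytes(w, h)
    secs = seconds_of_footage(w, h, 30, 16 * GIB)
    print(f"{name}: {fb / (1024 ** 2):.0f} MiB/frame, "
          f"~{secs:.0f} s of 30 fps footage in 16 GiB of RAM")
# Even 16 GiB of system RAM holds only seconds of uncompressed
# footage, which is the argument for a big nearby buffer.
```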

    I don't need an extra gas tank in my car, I have one already, but if you told me I could have an additional one that stores way more gas than not only my default tank, but more gas than a gas station itself, with better, higher octane gas than I could get from a gas station, which will then improve the performance of my vehicle as well as reduce the frequency with which I need to stop to fill up, pffft, who wouldn't want that? Be realistic. Your negative comments are excessive, leaving one to conclude that you have bias, which then discredits your position.
  • ddriver - Tuesday, July 26, 2016 - link

    I'd say there are more reasons to feel sorry about yourself:

    1 - you don't see how this is technically redundant
    2 - you can't present any meaningful scenario that will show a benefit
    3 - you claim that "Their new Zen CPU's are going to be incredible", and while I say "it better be" making such claims reveals you as a fanboy, so your level of intellect is already in question.
    4 - you are awful at making analogies; a more apt analogy would be to integrate a fuel tank into the engine to "bring gas closer to the engine, because that hose to the tank is so long and it has latency", which even you should know is entirely, 100% unnecessary.
    5 - you can't discern the difference between basic concepts such as negativism vs realism, efficiency vs redundancy and innovation vs desperation.

    Once again AMD are wasting budget on unnecessary stuff, and it is not like they are in a position to afford it. This concept could work nicely with something BETTER than NAND flash, like for example Optane (again, if it lives up to the hype). This is HBM all over again: they went for HBM prematurely, and not because they were innovative, but because they were desperate; their GPUs were poor performers and power hogs, and HBM offered a tiny performance boost and TDP drop, but it wasn't a game changer. Fury did not become the king of gaming, nor even a particularly well-selling product.

    A few months from now PCIe v4 will be mainstream and it will double the bandwidth, further increasing the amount of data you can stream between PCIe devices (may I add, directly, without having to go through the CPU), rendering this whole idea even less beneficial than it already is.
  • MLSCrow - Tuesday, July 26, 2016 - link

    ddriver, you've discredited yourself with your trolling. If you don't like this technology from AMD, then simply don't invest in it. Those that have a need, will. Everything else you have to say is /yawn at this point. Make something better. Prove that it's worthless. Run tests, show your results, and then get back to us about how pointless it is, if you can.
  • close - Wednesday, July 27, 2016 - link

    ddriver doesn't understand this kind of stuff. He reads diagonally and imagines GPU X + Y GB VRAM + SSD must be worse than GPU X + Y GB VRAM + system storage simply because he doesn't understand tiering and thinks that if an SSD is slower than RAM, then adding it as a lower tier must somehow slow down the higher tiers o_O.

    As a sidenote to better understand this sorry dude he insists that his "design" for a 5.25" hard drive is better (faster/cheaper) than anything on the market but there's a conspiracy among storage manufacturers to keep HDDs and SSDs small, slow and expensive. Keep this in mind before wasting too much time writing long replies to him. You'll just confuse him. ;)

    Yes indeed he is biased. But not necessarily towards a specific company (although AMD is always the good target) but towards the engineers that actually put products on the market while he twiddles his thumbs in frustration on Anandtech.
  • ddriver - Wednesday, July 27, 2016 - link

    You are a real nitwit, aren't you? How hard is it to understand that NAND flash is the bottleneck here, and that an SSD on the GPU will not be any faster than an SSD on PCIe? That PCIe v3, much less the upcoming v4, has ample bandwidth and negligible latency which would in no way impede GPU performance; that PCIe has DMA and GPUs are asynchronous, so transferring data will not take away from GPU work cycles - all you have to do is implement basic, good old buffering. It is so simple and obvious, yet you fail to grasp it.

    But hey, why don't you repeat a few more times how silly my concept of independent-head 5.25" HDDs is? I am sure that will make you feel better, since that is such a substantiated argument ;) Oh wait, that's right, you haven't and couldn't possibly substantiate it, because you lack the intellect to grasp it, much less to discredit it. You know no better than what the greedy and lazy industry crams down your throat to milk you for profit. And anything better is outside of your little cozy box of mainstream conformism, and you are not a big "outside the box thinker" - come to think of it, you aren't exactly an "inside the box thinker" either. You are just a repeater, who wouldn't know innovation if it took a dump on your face :D
  • close - Thursday, July 28, 2016 - link

    No, it actually makes me sad that guys like you think that some shitty napkin calculations are always better than anybody else's, including those of accomplished engineers with a lot more to show for it than you.

    Which reminds me of this: http://www.commitstrip.com/en/2016/06/02/thank-god...

    Now it's my turn: does insisting on your shitty ideas that somehow never seem to pan out make you feel better about yourself? I mean, you're everywhere on AT commenting, using big words, but for some reason you always miss the mark. That's some pent-up frustration right there.
  • jabber - Friday, July 29, 2016 - link

    Oh don't worry, he'll probably call it revolutionary when Nvidia does it with a Quadro card.
  • fanofanand - Tuesday, August 2, 2016 - link

    With how hard times have been for HDD manufacturers, if they had a silver bullet you don't think they would have used it by now? Any additional moving parts inside a HDD enclosure increases failure points. Get over yourself.
  • msroadkill612 - Saturday, April 29, 2017 - link

    Speaking from down the pike a bit, you have been prescient, sir. Well said too.

    I don't get it either. How can you violently hate something you patently don't understand, and then have to bore and bad-vibe everybody with it?

    What's not to like? A GPU with 4 GB/s+ of onboard storage, immediately adjacent to the GPU and VRAM for extra speed, and using ~no system resources. A true graphics co-processor.
  • Samus - Tuesday, July 26, 2016 - link

    Incredibly innovative idea from AMD. I don't expect that much anymore, but I love surprises.

    And although the dev kits are unreasonably priced, at retail there isn't a good reason "SSG" cards would command more than a $200 premium over non-SSG equivalents, especially since this could all be done natively within the Polaris GPU, the circuitry routing and additional power requirements are negligible, and they probably won't bundle M.2 SSDs.

    The real issue is going to be the potential benefits. It's hard to believe this will be any faster than going over an ultra-fast PCIe 3.0 interface to another PCIe 3.0 device (the M.2 drive) just a few inches away. It isn't like any of this stuff is saturating the bus, and NAND is just too slow to be used in latency-sensitive tasks where VRAM is required.

    Can't wait for the benchmarks.
  • ddriver - Tuesday, July 26, 2016 - link

    So then how is it "incredibly innovative" if it doesn't seem to have any practical applications? Because it is new? In this regard putting just about anything on a GPU would be "incredibly innovative"... Let's put a bar code printer on the GPU so it can print diagnostics on paper when something goes wrong... or something like that LOL.

    To me this move from AMD doesn't spell out "innovation" but rather "desperation". Sadly, I think they wouldn't be doing stuff like that if they were confident in their future products. At this point it doesn't look like this is about anything more than some hype and making a few thousand dollars on those wonderfully priced developer kits, because somehow a $250 GPU with an added M.2 slot more than justifies a $10K purchase, whose purpose would likely be to prove SSD-on-GPU is pointless. But hey, at least it will generate more heat.
  • Spunjji - Tuesday, July 26, 2016 - link

    Your reasoning doesn't make sense. A company experimenting with new products is not "desperate" unless they absolutely have to make them sell. This is a beta so it's not like they have bet the farm on it. Their customers will decide whether or not it makes sense for them.
  • ddriver - Tuesday, July 26, 2016 - link

    It is not just a company, it is a company that struggles to compete. AMD have a long history of trying to make up for their hardware with various gimmicks, ranging from barely useful to entirely pointless.

    This particular "product" - aside from being applicable in a very, very narrow market niche - will also offer a very, very mediocre performance advantage, hopefully enough to be worth the price premium and to pay for the R&D it took. Knowing AMD, it will likely be yet another loss on a very tall mountain of losses.

    The very fact that AMD are not showing any charts or claiming improvement figures goes to show they literally have nothing, because if they did, they'd be more than willing to use it.

    Lastly - fast M.2 SSDs are notorious for quickly hitting throttling temperatures, where performance degrades significantly. Plus, imagine the effect on a GPU that already reaches 90°C of slapping two 90°C SSDs on the back of the PCB - the GPU will throttle EVEN MORE than it usually does, likely diminishing any tiny benefits you may get from putting SSDs on board.

    Finally - don't take my word for it, just look at AMD's track record. Those guys are very good at shooting themselves in the foot, even when they have a decent architecture as a foundation. How much confidence can one have in a company that struggles to stay in business and hasn't really made money in, like, forever?
  • ddriver - Tuesday, July 26, 2016 - link

    I'd go for a neat marketing moniker for it - the Radeon Pro SSG with MAT technology, that is Mutually Assured Throttling, where the GPU and the SSD work in a delicate and mutually beneficial tandem - the GPU ensures the SSDs throttle sooner, the SSDs ensure the GPU throttles sooner and further. Plus I just bet being strapped onto a literal hotplate will do wonders for the reliability of the SSDs.

    But let's wait and see how AMD will capitalize on this pointless "yet-another-amazing-idea", they can't make money with good products that make sense, perhaps they will score big on stuff that doesn't make sense.
  • MLSCrow - Tuesday, July 26, 2016 - link

    I now regret having responded to a previous post of yours. It's clear now that you simply do not like AMD. Whether or not you lost money in the past having invested in them, I don't know, but again, your negative comments are overly excessive. It doesn't matter what AMD has done in the past; the people running the company now are not the same people that were running it before. This is a new company, and what they are doing and have been doing is nothing short of awesome. I welcome Polaris with open arms and a thinner wallet, and I welcome Zen equally. AMD is indeed an innovative company that may have invented some things that didn't pan out well, but for the most part they've invented things that have changed the world of computing forever. The first (and still used) 64-bit x86 instruction set, which Intel had to straight-up copy. The first multi-core CPUs, the first APUs, great GPUs over the history of their acquisition of ATI, Mantle, without which Vulkan and DX12 may not have existed, HSA, which is the right way to evolve processing. Compute performance, etc.

    In the past, AMD had incredibly poor management. Hector Ruiz almost saw to the destruction of the entire company. His successors weren't able to restore the strength of the company until recently, and the decision to take a chance on a new and different type of CPU architecture backfired (a great attempt at something different, which could have set them apart, perhaps, if it hadn't been rushed, over-hyped, and poorly designed out of the gate - but you can't really hold it against them for trying). But the leadership of Lisa Su, as well as the decision to bring back arguably one of the greatest CPU architects of all time - giving him free rein to hire his own team and the freedom to design from the ground up - along with providing Raja with RTG to keep him around, are all paying off now.

    If not for AMD we wouldn't be where we are today or moving in the right direction like we are now toward superior technologies. Also, if not for AMD, we'd all be paying premium prices with unlimited ceilings due to monopolized markets, which their existence and products ensure we aren't. If anything, even haters of AMD should appreciate their existence and efforts simply in reducing prices of their more favored products. Whether you like em or hate em, it's good that they are around.
  • ddriver - Tuesday, July 26, 2016 - link

    You are wrong - I'd like to see AMD back on its feet, not because I like it, but for the sake of competition. And AMD does tend to offer better value for their products, but not because of noble aspirations, only because they are currently the underdog. But as good as their value might be, it doesn't help much when they can't compete in terms of absolute performance. And this includes power efficiency as well.

    Hopefully Zen won't be yet another flop; they desperately need to get back in the game, not that they did that much better back in the days of Athlon, when Intel could only offer garbage. But it is AMD after all: they've been making a lot of promises, they did waste R&D budget on lots of uarch iterations, and they all sucked. It doesn't look like they have their priorities intact. I have a theory that Intel is secretly keeping AMD "alive" just so that it looks less of a monopoly.

    It would be quite foolish for ANYONE to LIKE any CORPORATION; it is not like any of them are there to do any of us favors. They do what they do not for the sake of the product or the consumer, they do it for the sake of money. A lot of the "AMD" love is based on sympathy for an underdog, a lot of it is a product of AMD being forced into a more "giving" position, and a lot of it is pure moronic fanboyism. But don't fool yourself that AMD is any better than Intel; if they had the upper hand they'd be just as bad. It is the same with regular people - the rich can get away with anything, the poor NEED to be good because they don't have the money to afford to be bad.
  • cocochanel - Tuesday, July 26, 2016 - link

    He doesn't hate AMD. He is just an Nvidia shareholder. There is a bunch of them patrolling the top tech websites out there causing so much misery. They just want to see AMD fold up, that's all there is to it. And they'll pull all the tricks out of the bag to do so.
  • ddriver - Tuesday, July 26, 2016 - link

    I don't gamble, and that includes buying anyone's shares ;) Also, I have over 60 radeons under my roof running and crunching numbers 24/7, simply because they are the best bang for the buck for FP64 performance. In fact the only nvidia product I currently own is the GPU that came with a laptop as the only option.
  • Samus - Tuesday, July 26, 2016 - link

    You're just now picking up on ddriver's anti-AMD trolling? He's been going at it a while now, since at least Fiji, under the defense that he likes to see competition, while at the same time never being critical of nVidia and their anti-competitive nature (dating back to 3dfx), their price fixing, or their proprietary atmosphere (they like closed systems - Linux drivers being a rare exception), which caused them to lose consideration for any game console now and moving forward.

    Then there is the obvious fact that AMD doesn't actually make bad products; their entry-level gaming cards and their FireGL professional cards are all industry mainstays. SolidWorks in particular favors AMD GPUs over nVidia.
  • ddriver - Tuesday, July 26, 2016 - link

    Yeah, let's hope I will someday get to reach the level of excellence of you fangirls and cheer at mediocrity. I have never criticized nvidia? Willing to bet on it?

    After a cursory search I was able to find those, there are probably plenty more:

    http://www.anandtech.com/show/7764/the-nvidia-gefo...

    LOL, epic? Crippling FP64 performance further from 1/24 to 1/32 - looks like yet another nvidia architecture I'll be skipping due to abysmal compute performance per $ ratio...

    http://www.anandtech.com/show/9516/nvidia-announce...

    Way to go, 8 times the difference is equal to almost no difference. I guess you almost won't care if your boss cuts your salary 8 times right?

    http://www.anandtech.com/show/7525/nvidia-33182-ga...

    I just wish they were not so botched ... And it is not like this is an isolated case, I've seen a ton of people complaining about the same issue to which nvidia remains indifferent. My next GPU will be a radeon...

    http://www.anandtech.com/show/7897/nvidia-announce...

    And now, a limited time offer, you can get TWO titans at the price of THREE!

    http://www.anandtech.com/show/8069/nvidia-releases...

    OpenCL + ATi gives you about 10 times better bang for the buck. Only people looking to waste money will go for nvidia compute solutions.

    http://www.anandtech.com/show/9779/nvidia-announce...

    It is too expensive even at educational pricing level. Cheap nvidia pushing for profit even on such a low volume market as dev platforms... way to go...
    Also, I don't see OpenCL mentioned anywhere, so thanks but no thanks!

    http://www.anandtech.com/show/9681/nvidia-announce...

    Once again, disappointing FP64 performance. FP64 is very important for workstation workloads.
    Sadly, in order to compete AMD is adopting the same strategy.

    Naturally, nvidia's product value is so low that it's been long since I even considered buying it, and thus logically, I care very little about what nvidia does. Do you bother criticizing stuff you don't care about? You are only confusing me having standards with trolling because you are a clueless wannabe who pretends to be competent while failing hard at it, a typical victim of consumerism.
  • fanofanand - Tuesday, August 2, 2016 - link

    Very impressive, that was the best defense of a trolling accusation that I can recall seeing. Now go back to your anti-AMD trolling. :P
  • fanofanand - Tuesday, August 2, 2016 - link

    Optane didn't show massive increases either, must not be worthwhile and was clearly an act of desperation from Intel. /s
  • evancox10 - Tuesday, July 26, 2016 - link

    These are Radeon Pro cards meant for very specific business/HPC applications, and they will be priced for that market (i.e. measured in $1,000s, not $100s). I doubt you will see these in consumer cards anytime soon.
  • Bullwinkle J Moose - Tuesday, October 4, 2016 - link

    That's nice....
    But not as nice as fast simultaneous reads and writes.
    How fast can it copy/paste to and from the RAID SSDs?
    Not very fast at all, I bet.

    Update your test procedures NOW AnandTech

    There's a Storm Coming!
  • SunLord - Monday, July 25, 2016 - link

    Since they have two slots, do they run them in RAID 0 to give more speed? I'd think speed over capacity would be more useful for this.
  • prisonerX - Tuesday, July 26, 2016 - link

    There is no RAID since there is no OS or filesystem. The question is if they'll read them concurrently, and the answer is "duh."
  • Ryan Smith - Tuesday, July 26, 2016 - link

    Yes, they are in RAID-0.
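    For anyone wondering what RAID 0 means mechanically here: data is dealt out across the two drives in fixed-size stripes, so a sequential transfer can hit both drives at once. A toy striping sketch (stripe size invented, tiny for readability):

```python
# Toy RAID-0 striping: logical data is split into fixed-size stripes
# dealt round-robin across member drives, so a sequential read can
# pull from both drives concurrently. Stripe size here is invented
# and tiny; real arrays use e.g. 64-128 KiB stripes.

STRIPE = 4            # bytes per stripe (tiny, for demonstration)

def stripe_write(data, n_drives):
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), STRIPE):
        drives[(i // STRIPE) % n_drives] += data[i:i + STRIPE]
    return drives

def stripe_read(drives):
    chunks = []
    offsets = [0] * len(drives)
    d = 0
    while offsets[d] < len(drives[d]):
        chunks.append(drives[d][offsets[d]:offsets[d] + STRIPE])
        offsets[d] += STRIPE
        d = (d + 1) % len(drives)
    return bytes(b"".join(chunks))

data = b"ABCDEFGHIJKLMNOPQRSTUVWX"       # 24 bytes = 6 stripes
drives = stripe_write(data, 2)
print(bytes(drives[0]))   # stripes 0, 2, 4
print(bytes(drives[1]))   # stripes 1, 3, 5
assert stripe_read(drives) == data       # lossless round trip
```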
  • Eden-K121D - Monday, July 25, 2016 - link

    I can't wait for PCIe 4.0 with 32GT/s bi-directional bandwidth
  • Bullwinkle J Moose - Wednesday, October 5, 2016 - link

    At PCIe 4.0, why not just build the graphics card like a motherboard and add USB 3.1 / Thunderbolt and a couple of M.2 SSDs

    Hell, just plug 3 or 4 quad-core Intel CPUs directly into the graphics card at that point

    Why settle for virtual machines when Heavy Mettle will do?

    HEY!

    I was F*^&%ng Kidding!

    OMG, Their thinking about it now....

    What haz I done?
  • Bullwinkle J Moose - Wednesday, October 5, 2016 - link

    yeah, I know Their should be they're

    so.....THERE!
  • Communism - Monday, July 25, 2016 - link

    If they are going to price these things sky-high, why wouldn't people just get GP100 with NVLink and just use system ram?

    All in all, this seems incredibly pointless.
  • testbug00 - Tuesday, July 26, 2016 - link

    why would AMD price them insanely high? AMD isn't NVidia or Intel. Even if acting like Intel or Nvidia would help them sometimes.
  • just4U - Tuesday, July 26, 2016 - link

    and anyway... one would think that such things would be more relevant to Nvidia's GP102.. not yet out as far as I know.
  • testbug00 - Tuesday, July 26, 2016 - link

    that depends what is required to make this work. It could be Nvidia could design their GP102 around this and have it out when AMD has it out.
  • Michael Bay - Tuesday, July 26, 2016 - link

    AMD dreams of being exactly like nV and Intel; it just fails.
  • Samus - Tuesday, July 26, 2016 - link

    Their professional graphics division has turned an annual net profit since the acquisition of ATI.

    That says volumes, considering almost no other AMD division has done so consistently. Innovations and niche products like this, Mantle, and so on keep them afloat.
  • eachus - Sunday, July 31, 2016 - link

    The "insanely high" pricing for the beta prototypes is because these are "bleeding edge" prototypes. I spent years of my life on the bleeding edge, both as a customer and as a producer. If you have an application where this can save you thousands of dollars a month? AMD will hear from you, and provide the needed support to get the prototypes working. Then, next year, perhaps with a Vega 10 GPU instead, they will produce a commercial product.

    I know of several seismic processing groups who would kill for this product if it can cut the processing time on their biggest (and deepest) visualizations by a week or so.

    Oh, by the way: note that Intel may, by next year, have a 3D XPoint memory card to put in instead of a flash card.
  • just4U - Tuesday, July 26, 2016 - link

    I am not really sure what to make of your comment.. Trolling? Hmm.. Both AMD and indeed Nvidia sell professional graphics solutions that can be quite expensive due to the feature set and drivers involved. Think upwards of 7X the cost of a GP100. Aside from that, this is a developer kit..
  • Demiurge - Tuesday, July 26, 2016 - link

    NVLink does not work that way!!!! ("Windmills do not work that way!!!!")

    1) Paths to the CPU/main memory often get saturated by the processing being much faster than the point-to-point bus they are connected to. The idea here is to avoid the external bus and keep data local (close to the processing element). The larger the working dataset, the more evident this becomes.

    2) GDDR5X or HBM2 memory is not cheap; MLC flash is by comparison. This is similar to the concept behind using the paging file (or virtual memory) on an HDD to enhance system memory for programs running on PCs. I don't know how much a GPU with 128GB to 1TB of GDDR5X or HBM2 memory would cost, but I can guarantee ~$10K for a development kit and another $10K for the SSDs over the 1-2 year lifetime is a bargain. That's really what the hidden proposition is here... value and scarcity of a comparable solution.
  • Communism - Tuesday, July 26, 2016 - link

    Their SSD solution is 5 GB/s.

    PCIe 3.0 x16 is 16 GB/s per direction.

    NVLink is 40 GB/s per direction.

    This "solution" is completely and utterly pointless unless it costs a negligible amount over how much a polaris alone costs.

    We're being told it costs 10K for the card alone, sans the SSDs.
  • Eden-K121D - Tuesday, July 26, 2016 - link

    Nvlink doesn't work with intel CPUs
  • ZeDestructor - Tuesday, July 26, 2016 - link

    Right now it doesn't at all. Supermicro, Wistron and Quanta have already demoed systems with NVLink for inter-GPU communication and PCIe 3.0 x16 for GPU-CPU communication. Not quite as good as NVLink all the way, but for huge datasets, a potentially huge boost to have more data shared across GPUs.
  • Nagorak - Tuesday, July 26, 2016 - link

    Use some logic. If this was "totally pointless" they wouldn't have released it at all. It's a totally niche product as it is. If it had no actual use they wouldn't have squandered resources on coming up with it and rolling it out.
  • Communism - Tuesday, July 26, 2016 - link

    Your assumption that everything made has a valid purpose is a flawed premise.

    The spinning off of GF from AMD saved them no money as the entire deal was a way for the executives to siphon more money for themselves.

    The purchase of SeaMicro by AMD was completely and utterly pointless from the start, and any engineer from AMD I would assume is smart enough to know why. It was obviously an executive decision to siphon more money for themselves.
  • prisonerX - Tuesday, July 26, 2016 - link

    Your tinfoil hat has cut off the circulation to your head.
  • Communism - Tuesday, July 26, 2016 - link

    "No good deed goes unpunished" ._.

    No more free analysis for people on this shill infested site.
  • smilingcrow - Tuesday, July 26, 2016 - link

    I think you may need analysis be it free or otherwise.
  • Michael Bay - Tuesday, July 26, 2016 - link

    Oh look, gommie opens his trap about "free".
  • Notmyusualid - Tuesday, July 26, 2016 - link

    Indeed.
  • yannigr2 - Tuesday, July 26, 2016 - link

    Your assumption that you know better than a multi billion company doesn't look so valid.
  • yannigr2 - Tuesday, July 26, 2016 - link

    Fanboys know better than companies.
  • prisonerX - Tuesday, July 26, 2016 - link

    It doesn't cost $10K per card. That's for the developer kit. They haven't announced retail pricing.
  • Demiurge - Tuesday, July 26, 2016 - link

    If two cars are stopped at a red light, which one is faster?
  • pogostick - Tuesday, July 26, 2016 - link

    the red one.
  • silverblue - Tuesday, July 26, 2016 - link

    Unless it's the Milky Way cars.
  • Snorklax - Tuesday, July 26, 2016 - link

    Imagine you have a 2TB dataset you need your GPU to process.

    In all cases, this dataset will have to be read from slow storage and written to slow storage. No amount of NVLink is going to help unless you can get the entire dataset into RAM. If you have 2TB of RAM, I'm guessing that the product is not for you.

    But if the GPU has direct read/write access to the slow storage, you would get rid of all the latency of having to go through the system to fetch or write the data. So your GPU is wrangling with the 2TB of data all by itself, with zero impact on the rest of the system.

    How is this pointless?
  • Notmyusualid - Tuesday, July 26, 2016 - link

    With a name like Communism, how do you expect to win a logical argument against him?

    Regardless, what you say makes 1000% more sense than his comment, so rest assured many of us can see that.
  • ddriver - Tuesday, July 26, 2016 - link

    Latency would not be an issue in this scenario, because streaming will mask it; there will only be the initial latency. So instead of processing your data set in, say, 60 minutes, you will be processing it in 59 minutes, 59 seconds and 99 hundredths if you access data through the PCIe bus.
  • smilingcrow - Tuesday, July 26, 2016 - link

    Potentially for a sequential read but what about random?
  • yannigr2 - Tuesday, July 26, 2016 - link

    This doesn't look like something that will sell in hundreds of thousands of cards, more like a few thousand at best. And for a few thousand people/corporations, 1TB of SSD storage could be enough, could offer enough acceleration, and could be a better alternative than going Nvidia and NVLink and then trying to build a system with half a terabyte or more of system memory.
  • vladx - Tuesday, July 26, 2016 - link

    Few thousand? If they manage to sell 100 of these, AMD will consider themselves "lucky".
  • FMinus - Tuesday, July 26, 2016 - link

    A university here runs around ~650 FirePRO W9100s alone; what makes you think they only sell low volumes of those cards, aside from your stupidity?
  • vladx - Tuesday, July 26, 2016 - link

    Because the price is 10k/piece and the use cases are much narrower than for an ordinary workstation card.
  • vladx - Tuesday, July 26, 2016 - link

    No university in the world will buy such an esoteric solution.
  • silverblue - Tuesday, July 26, 2016 - link

    No, the dev kit is $10K, but I suppose that won't stop AMD charging a lot if it has customers who will pay it.
  • vladx - Tuesday, July 26, 2016 - link

    I was only referring to the developer kit, obviously, since that's the only price we got, duh.
  • eachus - Sunday, July 31, 2016 - link

    Ever heard of the Large Synoptic Survey Telescope? https://www.lsst.org/ These cards will be very useful for building up a base image of the sky, and noting changes with each image. One problem: since the telescope has a 3200 megapixel CCD, it would take 32 or more of these, in separate workstations. Not a problem really, dozens of astronomers will choose sectors of the images to work with. The number of "events" to process, per image, will be in the thousands.

    But to me, the real use case is seismic processing. You spend hours, often days of supercomputer time coming up with one three-dimensional image, several gigabytes in size. Then an analyst downloads the image to a workstation, and tries to make sense of it. The pretty seismic images you may have seen are pretty, but are also the result of hours of post-processing by a skilled analyst to create an image which is easy to read. Having the data local, rather than supplied by a server, will take lots of coffee breaks out of the process. (Points where you wait three or four minutes for the next image to be created.)
  • D. Lister - Monday, July 25, 2016 - link

    "Meanwhile actual memory management/usage/tiering is handled by a combination of the drivers and developer software, so developers will need to code specifically for it as things stand."

    ...and all the developers need to do, before they specifically code for this one product, is to pay a paltry $10,000, and hope that this is indeed a real product, that will have long enough support and supply by AMD for it to be worth the developers' work and expense. Yeah, sounds like a winning idea already. If they keep clutching at straws like this, I might start thinking they may be drowning or something.
  • rhysiam - Monday, July 25, 2016 - link

    I'm confused. Surely even on-card PCIe NVMe drives are going to be substantially slower and higher-latency than going to system RAM. So in any case where system RAM can act as the buffer, this solution loses.

    Then if you really need a larger buffer than system ram can provide, is going via the system actually such a significant bottleneck that an onboard NVMe drive over a system one makes a sizeable difference? Even a local drive will have a PCIe interface to navigate as well as the latency inherent in reading the flash. So we're already into the hundreds of microseconds territory. Does a system M.2 NVMe drive actually add that much more?

    I'm guessing the answer is "yes", or this would be utterly pointless. Am I missing something?
  • jjj - Tuesday, July 26, 2016 - link

    Could be designed for xpoint and then, we don't have the data for xpoint to speculate.
  • haukionkannel - Tuesday, July 26, 2016 - link

    That is exactly what I was thinking. Intel's XPoint would be a perfect match for this technology!
    Very interesting to see real tests with this system.
  • Atari2600 - Tuesday, July 26, 2016 - link

    Yep. XPoint would work very well.

    That or its a means to get the software infrastructure up and running for future higher-powered APUs, or for future HBM based cards that have large memory pools - and can benefit immediately/quicker from this branch of software development.
  • prisonerX - Tuesday, July 26, 2016 - link

    You're conveniently assuming that all those system paths are sitting there idle. If you have 1TB of data you need to feed the video card, you will be using (and filling) busses that are also being used for other things. The local M2 PCIe only acts as a point-to-point connection to the GPU and is guaranteed to be available. Not only is the latency much lower (since there is no contention) but it never changes. You need very little buffering since you're streaming all the data into the processing pipeline from a reliable source.

    Add to this the fact that since the on-card drives are private and persistent you can cache intermediate data instead of having to reprocess it. Again very useful for very large data sets.
  • bryanlarsen - Tuesday, July 26, 2016 - link

    This smells like a semi-custom part. I bet that an oil and gas company approached AMD and offered to underwrite R&D in exchange for a volume commitment and a support guarantee.
  • beginner99 - Tuesday, July 26, 2016 - link

    This actually makes sense, in contrast to the product itself. I mean if you have such complex data sets, why would you run them on something as "slow" as a Polaris 10 GPU?
  • Demiurge - Tuesday, July 26, 2016 - link

    The most insightful comment I've read thus far.
  • Communism - Tuesday, July 26, 2016 - link

    Sounds like a massive collusion deal between executives to siphon company funds tbh.

    This thing makes exactly zero sense.
  • Kjella - Tuesday, July 26, 2016 - link

    I think so too, I'm guessing someone wrote a very special piece of software for machines with lots of GPUs, lots of SSDs and found the bottleneck was PCIe bus contention. Then they approached AMD to make special accelerator cards, these cards have already paid for themselves. Now AMD is just testing the waters to see if there's other potential users.

    The dev kit price is just a hint, if you're even considering this you have a $100k+ engineer/IT specialist writing custom code and you've tapped out on COTS hardware already and you will need production hardware, support etc. and you'll probably make a custom million dollar deal with AMD about it. If you're not in that category, then this product is not for you.
  • asdacap - Tuesday, July 26, 2016 - link

    Did not see this coming, but in some sense its a good idea to have a secondary storage. The question is, why SSD? Why not some SO-DIMM slots? Not enough capacity?

    Also interesting to see that PCIe is actually a bottleneck... which is alleviated by nonvolatile memory?
  • tamalero - Tuesday, July 26, 2016 - link

    Pretty sure the answer is obvious. If they did SO-DIMM memory only, it would be useless as storage.
    By using SSDs, they can serve as storage for the video card AND serve as storage for normal everyday usage.
    The first makes it a very niche product; the second makes it more appealing for broader usage.
  • smilingcrow - Tuesday, July 26, 2016 - link

    Just finding space for 32 SO-DIMM slots on a graphics card would be a challenge and this assumes you can even buy 32GB sticks so if not make that 64 slots.
    Then there's the cost (~$5k), heat etc.
    It's a non starter.
  • zmeul - Tuesday, July 26, 2016 - link

    I saw this: https://i.imgur.com/IUIixNF.png
    where did they get that 4590MB/s from? there isn't an NVMe SSD in existence that can do that... to my knowledge
    unless they put them in RAID0 since the card supports 2 m.2 SSDs, but compared it against a single m.2 ?!?!?!
    this makes no sense
  • testbug00 - Tuesday, July 26, 2016 - link

    they used the FPS for that. As 92/17 * 848 (I think that is what the one on the left says) = about 4600.
  • testbug00 - Tuesday, July 26, 2016 - link

    worded badly, that number is how much the frames use. Each 8K frame is about 50MB.
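    A quick sanity check of that arithmetic in Python (assuming, as the slide suggests, ~848 MB/s at 17 fps through the system path, and 92 fps from the on-card SSDs):

```python
# Implied per-frame size and SSG-path throughput from the demo figures.
fps_system, rate_system_mbs = 17, 848   # fps and MB/s over the system path
fps_ssg = 92                            # fps when streaming from on-card SSDs

frame_mb = rate_system_mbs / fps_system  # size of one 8K frame in MB
rate_ssg_mbs = frame_mb * fps_ssg        # throughput needed at 92 fps

print(round(frame_mb, 1))    # 49.9 -> matches the "about 50MB" per frame
print(round(rate_ssg_mbs))   # 4589 -> matches the ~4590 MB/s on the slide
```

    Note ~50 MB per 8K frame is roughly 12 bits per pixel, which suggests the demo footage was subsampled or lightly compressed rather than full 24-bit RGB.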
  • zmeul - Tuesday, July 26, 2016 - link

    same deal
    how do you get 4.5GB/s out of a SSD?
  • Eden-K121D - Tuesday, July 26, 2016 - link

    Raid 0
  • Ryan Smith - Tuesday, July 26, 2016 - link

    Bingo.
  • Demiurge - Tuesday, July 26, 2016 - link

    More likely that the data is not overcoming the internal SSD cache RAM and is being burst out. Though, I will admit, I haven't done the numbers, but it is possible.
  • TheITS - Tuesday, July 26, 2016 - link

    Looks to me like a very niche case where the video card needs more memory to work with and the system RAM is already being maxed just talking with the CPU, so it'd hurt performance more to share the faster system memory than putting slower memory on the GPU.
  • serendip - Tuesday, July 26, 2016 - link

    For those use cases where 32 GB GPU RAM or even 128 GB system RAM isn't enough... like what exactly? It seems a bit odd to need so much local storage for a single card. If it's for HPC for oil and gas, wouldn't a cluster of GPUs work better?
  • HrD - Tuesday, July 26, 2016 - link

    I'm confused about the GPU in use here. Some sites report that according to the information they got from AMD "The Radeon Pro SSG has a single graphics processor based on Fiji architecture, also used in the company's dual-GPU Radeon Pro Duo." while others (such as Anandtech) report that again according to information given by AMD the card uses a Polaris chip. Which one is it?
  • FMinus - Tuesday, July 26, 2016 - link

    All those new Radeon PRO cards are based on Polaris.
  • BMNify - Tuesday, July 26, 2016 - link

    It's clearly not Polaris.
  • HollyDOL - Tuesday, July 26, 2016 - link

    This is a case where I could imagine a good use of 3D X-Point instead of those SSDs...
  • SentinelBorg - Tuesday, July 26, 2016 - link

    This was exactly my thought after reading the first few lines. 3D X-Point would be perfect for this.
  • mjcutri - Tuesday, July 26, 2016 - link

    This card isn't about speed, it's about capacity. I used to run FEA in SolidWorks and would run out of memory (vid + sys) all the time on complex models. This would have helped tremendously.
    Like the article said, it's directed at very specific cases - Oil & gas, FEA within cad, etc. It's not directed at consumers in any way.
  • vladx - Tuesday, July 26, 2016 - link

    Good luck selling $750 worth of hardware for $9999, AMD. Very innovative indeed LOL
  • FMinus - Tuesday, July 26, 2016 - link

    They are not selling to you, they are selling to companies which can afford to pay such prices. Do you really think any FirePRO, RadeonPRO, Quadro, Tesla chip is worth what they are asking for, considering most of those products are based on consumer variants which sell for ten times less? No, but the market is different, thus the pricing is adjusted.
  • vladx - Tuesday, July 26, 2016 - link

    > They are not selling to you, they are selling to companies which can afford to pay such prices.

    If it was only addressed to big companies and not developers as well, they wouldn't have presented it as a developer kit.

    > Do you really think any FirePRO, RadeonPRO, Quadro, Tesla chip is worth what they are asking for...

    Yes, those cards are worth their price since they cover a lot more use cases, while this solution's applications can be counted on one hand.
  • xrror - Tuesday, July 26, 2016 - link

    I like the theory posted earlier that this card exists because a large customer (like oil & gas, government project, etc.) desired a solution for a specific usage case.

    What's completely maddening though, is this inadvertently really is a tease for something that AMD has the rather unique product portfolio to have made a truly custom SKU for and didn't.

    If you consider AMD's APU experience, and that they would already plan (I assume!) for Polaris to be used in their APU architecture + Polaris is already designed with unified memory and native virtualization support:

    I imagine a custom "APU" or SoC that is mainly Polaris + an NVMe storage controller specifically for this application to be well within AMD's capability.

    I really hope they have an opportunity to make a gen 2 of this project, just because it's so different. It also could morph into a sort of "expandable FireGL" or nVidia Tesla competitor?
  • BMNify - Tuesday, July 26, 2016 - link

    There are pictures floating around from the back of the card, exposing the PCB. We can clearly see that there are no GDDR5 memory chips soldered onto the PCB. The huge GPU package has a huge load of capacitors and there are one 8-pin and one 6-pin power connector, suggesting a TBP of >225-300 Watt.

    Raja did not mention Polaris for this product, neither does AMD in any info text. Other sites are already reporting that the devkit uses Fiji.

    Where does your info with Polaris 10 come from?
  • BMNify - Tuesday, July 26, 2016 - link

    confirmation by Robert Hallock:
    https://twitter.com/Thracks/status/757992332583067...
  • MLSCrow - Tuesday, July 26, 2016 - link

    I have to admit, although this product may only target an extremely niche market, the fact that AMD continues to innovate and come out with new technologies, even if those technologies are eventually mimicked and improved upon by other companies, is still a breath of fresh air every time. I like this product, even if I don't have a particular use for it, I still like that they are trying new things, having new visions, boldly going....ok sorry.
  • xenol - Tuesday, July 26, 2016 - link

    This might be something that actually uses M.2's bandwidth, as nothing else I've seen outside of benchmarks does.
  • pogostick - Tuesday, July 26, 2016 - link

    This comment board is much worse than usual. What's with all the vitriol?
  • AnnonymousCoward - Tuesday, July 26, 2016 - link

    Raw 8K 24-bit 60Hz should require ~6GB/s (7680*4320*60*24/1e9/8), and 92Hz ~9.2GB/s. How does 4.6GB/s get the job done?
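    Plugging the numbers into that formula (a rough check; the real demo figures depend on the frame format used):

```python
def raw_video_gbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bandwidth in decimal GB/s."""
    return width * height * fps * bits_per_pixel / 8 / 1e9

print(round(raw_video_gbps(7680, 4320, 60), 2))  # 5.97 GB/s at 60 Hz
print(round(raw_video_gbps(7680, 4320, 92), 2))  # 9.16 GB/s at 92 Hz
```

    The ~50 MB/frame quoted above is about half of raw 24-bit RGB (~99.5 MB), so presumably the demo frames were subsampled or lightly compressed, which would be how 4.6 GB/s covers 92 Hz.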
  • Mugur - Wednesday, July 27, 2016 - link

    Going back to Earth, I think that it would be a great idea to bundle a mainstream card like a Radeon RX 480 with an M.2 SSD for a good price.
  • Chaser - Thursday, July 28, 2016 - link

    Unexpected and impressive. I buy Nvidia (for now), but WTG AMD!
  • msroadkill612 - Saturday, April 29, 2017 - link

    A belated post.

    Its an awesome notion. News to me. Well done Ryan.

    Recent RAID 0 benches I have seen indicate such an array could easily exceed the bandwidth of 4 lanes, or 4GB/s.

    They got (MB/s, single drive vs RAID 0):
    "ATTO Read: 2491 → 3314 (+33%)
    ATTO Write: 1568 → 3034 (+93%)"

    with and without RAID on an Intel mobo with 3x native M.2 slots.

    What's very interesting & important is that RAID 0 brings read and write speeds into ~parity. Many operations require both, so the slower dictates the pace.

    If so, the effective pace really has doubled by striping 2 drives.

    AMD could well do much better numbers on their cards.

    A terabyte of space seems excessive, given 8GB is generous VRAM now (a big jump for coders to get their heads around and use), but I think the Samsung SSDs are fastest in the 512GB size.

    Yes, the prospects of it being used as a resource by the CPU via the 16-lane GPU bus seem good. It would be the fastest storage on the system.

    Now you have me wondering, Ryan. Could it be a killer feature on future top-end Vega GPU cards?

    It's pretty consistent with their stated Vega philosophy/features (as of Apr 2017): memory sharing/layers of memory like L1/L2/L3 cache on a CPU.

    A massive L3 GPU cache perhaps?

    We know they will add HBM2, why not an SSD controller?

    If it's real for Vega, curiously they have gone both ways: VRAM is much faster, and they have added a ~memory layer that is much slower, but huge.
  • msroadkill612 - Saturday, April 29, 2017 - link

    Clarification: the Intel mobo above had 3x M.2 sockets native, but only 2 were used for a RAID 0 array.
