31 Comments
SaolDan - Wednesday, April 6, 2016 - link
Neat!
slayek - Wednesday, April 6, 2016 - link
Neat. I'm wondering what software can put these beasts to good use? A lot of computational chemistry software still doesn't use CUDA, or if it does, only uses it for simple stuff.
Eidigean - Wednesday, April 6, 2016 - link
SpaceX uses NVIDIA GPUs to simulate the chemical mixing within virtual combustion chambers, down to the molecular and nanosecond scales, while designing their methane-based Raptor engine. They've been waiting for NVLink to speed up their work even further.
Here is their demonstration from a lecture at the GPU Technology Conference (I've queued it up to where it gets interesting, but feel free to rewind and watch it all): https://youtu.be/vYA0f6R5KAI?t=1456
Shadow7037932 - Wednesday, April 6, 2016 - link
That was an interesting presentation. Thanks for the link.
esoel_ - Wednesday, April 6, 2016 - link
But does it run Crysis in VR?
Jon Tseng - Wednesday, April 6, 2016 - link
... With Iray ray tracing...
madwolfa - Wednesday, April 6, 2016 - link
But will it run Crysis..?
Jokes aside... great-looking machine. The amount of horsepower packed in one box is astonishing.
bill.rookard - Wednesday, April 6, 2016 - link
As is the power consumption. The GPUs are allegedly rated at about 300 W each, along with two high-core-count Xeons and associated hardware. At full tilt, these things could be pushing almost 3 kW. There is no such thing as a free ride... and those are pretty darn impressive machines.
madwolfa - Wednesday, April 6, 2016 - link
"ASCI White consumed 3 megawatts of power. The DGX-1 will consume ~3 KW (maybe a little less, I doubt they'd design it to run at >90% power supply spec). So 3.44x the performance for >1000x less power. ~3500x more energy efficient."https://www.reddit.com/r/hardware/comments/4dhh0s/...
webdoctors - Wednesday, April 6, 2016 - link
Just amazing how far we've come in 10 years.
SirFlamenco - Wednesday, June 10, 2020 - link
4 years later and the DGX-3 eats it for breakfast.
pwingert - Thursday, April 7, 2016 - link
In the words of Ash Ketchum, renowned Pokémon trainer... Science sure is great!
pwingert - Thursday, April 7, 2016 - link
Crank up the nukes, Homer! Nvidia's DGX-1 data center is starving for power again!
DanNeely - Wednesday, April 6, 2016 - link
Is NVLink fast enough/low enough in latency that it could allow two GPUs in SLI to combine their RAM instead of having to keep duplicate copies of all game assets?
Ryan Smith - Wednesday, April 6, 2016 - link
It's not being discussed here, since the initial focus is all on compute. However, it's a discussion we're going to need to have again once consumer Pascal approaches. One NVLink is ~100 pins/traces, so that may be an issue.
(If I had to take a guess, any kind of SLI sharing may be something that requires dev involvement, a la DX12 explicit multi-adapter.)
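(For reference, on the compute side this kind of memory pooling already exists: CUDA's peer-to-peer API lets a kernel on one GPU dereference memory that physically lives on another GPU, and on NVLink-connected P100s that traffic rides NVLink. A minimal sketch, with device indices, buffer size, and the toy kernel chosen purely for illustration; whether consumer SLI ever gets an equivalent is exactly the open question above.)

#include <cstdio>
#include <cuda_runtime.h>

// Kernel launched on GPU 0 that reads an array resident in GPU 1's memory.
__global__ void read_remote(const float* remote, float* local, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) local[i] = remote[i] * 2.0f;
}

int main() {
    int ok = 0;
    cudaDeviceCanAccessPeer(&ok, 0, 1);        // can GPU 0 map GPU 1's memory?
    if (!ok) { printf("no peer access between GPU 0 and GPU 1\n"); return 1; }

    const int n = 1 << 20;
    float *on_gpu1 = nullptr, *on_gpu0 = nullptr;

    cudaSetDevice(1);
    cudaMalloc(&on_gpu1, n * sizeof(float));   // "remote" allocation on GPU 1

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);          // GPU 0 may now dereference GPU 1 pointers
    cudaMalloc(&on_gpu0, n * sizeof(float));

    read_remote<<<(n + 255) / 256, 256>>>(on_gpu1, on_gpu0, n);
    cudaDeviceSynchronize();
    printf("kernel on GPU 0 read memory allocated on GPU 1\n");

    cudaFree(on_gpu0);
    cudaSetDevice(1);
    cudaFree(on_gpu1);
    return 0;
}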
boeush - Wednesday, April 6, 2016 - link
80 GB/s for NVLink between interposers (assuming all 4 links are slaved together in a dual-GPU card), vs. 720 GB/s for the 16 GB of HBM2 on-interposer - seems like just a tad asymmetrical...
Almost a factor of 10 difference, so definitely a case of NUMA.
extide - Wednesday, April 6, 2016 - link
Well, even if you used all four NVLinks directly between two GPUs, that's still only 80 GB/sec - around 10% of the speed of the native HBM2 VRAM (~720 GB/sec, I believe) - so... probably not fast enough to do something like that without some major performance issues. You can see it is really designed for compute, and I have a feeling we will not see NVLink in the consumer incarnations of these GPUs; instead NV will switch to an over-the-PCIe SLI system, like AMD did with Hawaii/XDMA.
frenchy_2001 - Wednesday, April 6, 2016 - link
And yet, 16 lanes of PCIe Gen3 have less than 16 GB/s of bandwidth. So a single NVLink is already faster, and all four NVLinks make it over 5x faster.
Moreover, NVLink seems to be designed for memory transfer (NUMA capabilities), while PCIe offers only a fraction of that capability.
So you may be right (doubtful that they'd implement 4 lanes of NVLink for SLI), but performance-wise it would definitely make sense and be faster than PCIe or main RAM access.
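(To put the numbers in this sub-thread side by side, here's a rough sketch of how long a hypothetical 4 GB asset copy would take over each path, using the figures quoted above - ~16 GB/s for PCIe 3.0 x16, ~20 GB/s per NVLink, ~80 GB/s for four links, ~720 GB/s for local HBM2. The 4 GB payload is made up for illustration:)

#include <cstdio>

int main() {
    const double payload_gb = 4.0;  // hypothetical asset/frame-buffer payload
    const struct { const char* name; double gb_per_s; } paths[] = {
        {"PCIe 3.0 x16", 16.0},
        {"1x NVLink",    20.0},
        {"4x NVLink",    80.0},
        {"local HBM2",  720.0},
    };
    for (const auto& p : paths)
        printf("%-12s %6.1f GB/s -> %7.1f ms for %.0f GB\n",
               p.name, p.gb_per_s, payload_gb / p.gb_per_s * 1000.0, payload_gb);
    return 0;
}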
extide - Thursday, April 7, 2016 - link
Yeah, it would be faster, but SLI doesn't use direct RAM access; it just needs to transfer the frame buffer from the secondary card(s) to the main one and also pass some synchronization data. NVLink is just too many pins: the existing SLI connector is something like 10 pins, while NVLink is ~100 pins per link. The connectors for that would be rather expensive.
iwod - Wednesday, April 6, 2016 - link
Why Intel Xeon E5 v3 when v4 is available? Surely Nvidia knew this.
extide - Wednesday, April 6, 2016 - link
Probably because they have been designing this system for a while. I wouldn't be surprised if they switch to v4 CPUs during production, since they are compatible with the same socket/platform. Most of the heavy lifting on these is going to be done by the GPUs, not the CPUs, so I doubt NV really cares much.
iwod - Wednesday, April 6, 2016 - link
Well, it would make sense to switch to v4 just for the power savings, even if the tasks are not CPU-bound.
remosito - Thursday, April 7, 2016 - link
With a power consumption of 3,000 watts, the few measly watts saved by v4 are not gonna make much of a difference.
Des_Eagle - Thursday, April 7, 2016 - link
My research group could really use one of these... this thing is made for computational electromagnetics.
HideOut - Thursday, April 7, 2016 - link
So why didn't they release this monster with the newest Intel v4 dual-socket Xeons with 22 cores per socket?
NerroEx - Tuesday, April 12, 2016 - link
Only ~8 TB of SSD storage?? Seems a bit too small to me... A server should have way more, right? Or am I just terribly misinformed? If you're going to have a massive server, you're gonna have to spend well over half a million on this.
will1956 - Tuesday, April 12, 2016 - link
My guess is it would be linked up to high/ultra-high-speed external storage. The InfiniBand EDR links can handle ~100 Gb/sec each, and this has 4 of them.
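(A rough aggregate, assuming all four EDR ports could be driven at once - plenty to keep external storage fed:)

#include <cstdio>

int main() {
    const double ports = 4.0, edr_gbit_per_s = 100.0;            // 4x InfiniBand EDR
    const double aggregate_gbyte_per_s = ports * edr_gbit_per_s / 8.0;
    printf("aggregate network bandwidth: ~%.0f GB/s\n", aggregate_gbyte_per_s);  // ~50 GB/s
    return 0;
}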
BrownCoat40 - Tuesday, April 12, 2016 - link
With the architecture of the components in this build, it makes me wonder if GPUs and GDDR will become discrete components, much like the CPU and DDR already are. Think of it: motherboards with slots for both a CPU and a GPU, and variable amounts of RAM for each. Not that I truly mind whether that does or doesn't happen, but it's an interesting thought to wonder what computers will look like in 10+ years.
benzosaurus - Wednesday, April 13, 2016 - link
Looks like Nvidia's now changed the metric from "can it run Crysis?" to "how many Crysis instances can it run simultaneously?"
Eden-K121D - Wednesday, April 13, 2016 - link
It would help in processing data from radio telescope arrays.
lzhao403 - Tuesday, May 17, 2016 - link
8 GPUs in one box? Wondering how they handle the heat.