WithoutWeakness - Tuesday, March 27, 2018 - link
But can it run Crysis in 4K?
ToTTenTranz - Tuesday, March 27, 2018 - link
Asking the real questions.
Holliday75 - Tuesday, March 27, 2018 - link
We've moved on from Crysis and 4K. Now it's how many coins can it mine?
Notmyusualid - Wednesday, March 28, 2018 - link
@Holliday75 Indeed, and you beat me to it.
I'm guessing ~100 MH/s from each GPU, ×16 = 1.6 GH/s.
Power will be ~220 W each (maybe less for these newer babies), ×16 = 3.52 kW.
Add in a couple of meaty Platinum 8180s, which can draw 205 W each but will likely sit at forty-something watts (off-the-cuff guess) while idling away. Couple that with some mammoth motherboard that wants a couple hundred watts, plus all that RAM and the unused NVMe SSDs, and we'll round that up to 500 W.
So my guesstimate is ~4 kW; let's throw in 5% PSU conversion losses (I suppose they are going to be good ones), and we are looking at ~4.2 kW of continuous power draw for Ethereum mining. Less than 5 kW total for sure, though.
So $?
Annually:
Coins mined: 46.46973093 ETH
Power cost (at 10 c/kWh, USD): $3,679.20
Profit (USD): $17,398.55 per year
Days to break even: 8,391.50
If we re-run this with my UK energy costs:
Days to break even: 9,610.94
Clearly I am on holiday too, to even bother with this response.
In addition, someone ran V100s on an Amazon AWS cluster for nearly an hour and, with all costs considered, came out at *negative* $25k USD per year. Interesting though, and well written up.
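For anyone who wants to poke at the numbers, here is the same back-of-envelope math as a quick Python sketch. The hash rate, power draw, PSU loss, system price, and annual revenue are the assumptions above (March 2018 mining-calculator figures), not measured DGX-2 numbers:

```python
# Back-of-envelope DGX-2 Ethereum mining estimate.
# All inputs are the guesses from the comment above, not measured figures.

GPUS = 16
POWER_PER_GPU_W = 220        # assumed per-V100 draw while mining
HOST_POWER_W = 500           # CPUs, motherboard, RAM, SSDs (rounded up)
PSU_LOSS = 0.05              # assumed 5% conversion loss
PRICE_PER_KWH = 0.10         # 10 c/kWh
SYSTEM_PRICE = 399_000       # DGX-2 list price in USD
ANNUAL_REVENUE = 21_078      # ~46.47 ETH/yr at the then-current ETH price (assumed)

total_kw = (GPUS * POWER_PER_GPU_W + HOST_POWER_W) / 1000 * (1 + PSU_LOSS)
annual_power_cost = total_kw * 24 * 365 * PRICE_PER_KWH
annual_profit = ANNUAL_REVENUE - annual_power_cost
days_to_break_even = SYSTEM_PRICE / (annual_profit / 365)

print(f"Draw: {total_kw:.2f} kW")                     # ~4.2 kW
print(f"Power cost: ${annual_power_cost:,.0f}/yr")    # ~$3,700/yr
print(f"Profit: ${annual_profit:,.0f}/yr")            # ~$17,400/yr
print(f"Break even: {days_to_break_even:,.0f} days")  # ~8,400 days, roughly 23 years
```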
SiSiX - Tuesday, March 27, 2018 - link
I don't know about 4K, but I would think it could finally play it at 640x480 at the lowest settings... probably. ;)
Jon Tseng - Wednesday, March 28, 2018 - link
Think you'll struggle. May still have to dial down the FSAA settings. :-p
Santoval - Friday, March 30, 2018 - link
I am pretty sure it can run Crysis 4 at 16K with plenty of GPU and CPU power to spare for other stuff.
THE1ABOVEALL - Thursday, May 3, 2018 - link
Sarcasm got deleted out of your dictionary, I see.
The Hardcard - Tuesday, March 27, 2018 - link
It would be interesting to see a comparison between the DGX-2 and the POWER9 systems with NVLink to the processors. I don't know offhand how many GPUs you can stuff in the IBM, but it seems like there is a lot more bandwidth.
It is notable that NVIDIA went with Xeons, either because POWER would be redundant here or because of some combination of price/performance/energy advantages.
The Hardcard - Tuesday, March 27, 2018 - link
OK, quick check: 6 GPUs max in the IBM box. But if they built an NVLink switch for it, it would attach over PCIe 4 vs. PCIe 3. But again, at what price and energy usage?
Probably also a question of software availability and development. That bandwidth, though.
o0rainmaker0o - Tuesday, June 19, 2018 - link
IBM doesn't use the PCIe bus for GPU-GPU links (it has PCIe 4 available for other uses, however). Bandwidth for GPU-GPU (and CPU-GPU) communication is 300 GB/s, the same as the DGX-2 above. The difference is that all 16 GPUs (4x V100 for air-cooled servers) in the IBM box enjoy that bandwidth, not just the 4 GPUs on each card (because they are attached to a PCIe bus in regular x86 servers). IBM has another advantage: GPU memory can be extended into system RAM (a maximum of 2 TB per server), so it is not limited to "just" 16 GB or 32 GB per card. GPU bandwidth and memory sizing are really important for scaling GPU jobs past a few V100 cards. IBM's Distributed Deep Learning library, for example, has already been shown to scale to 512 GPUs in a cluster-like design.
Kvaern1 - Tuesday, March 27, 2018 - link
Well, don't call gamers useless. Footing the research bill for supercomputers, self-driving cars and whatnot since AMD went defunct.
StrangerGuy - Tuesday, March 27, 2018 - link
All NV needs to do is sell one DGX-2 and it would make more profit than all of Vega combined.
Yojimbo - Tuesday, March 27, 2018 - link
First of all, NVIDIA started investing in GPU compute when AMD was at its peak. Second of all, NVIDIA's data center business is profitable; highly profitable, in fact.
The major part gaming plays is that it gives NVIDIA economies of scale when designing and producing the chips.
Yojimbo - Tuesday, March 27, 2018 - link
It's also worth mentioning that gaming benefits from NVIDIA's AI efforts in turn. Without AI there would probably be no denoising solution for real-time ray tracing, or at least it would run a lot more slowly, because there would be no tensor cores and no work on TensorRT.
Santoval - Friday, March 30, 2018 - link
When did real-time ray tracing actually become a thing? I heard about the recently announced ray tracing extensions to the DirectX 12 and Vulkan APIs, but is this something that makes sense (performance/FPS-wise) for the current generation of GPUs, or are they just preparing for the next one?
If we are talking about the current generation, which GPU would be powerful enough to fully ray trace a game at 30+ fps? Would the 1080 Ti be powerful enough, even at 1080p (higher resolutions must be out of the question)? Or are they just talking about "ray tracing effects", with perhaps 15-20% of each scene ray traced and the rest of it rasterized?
jbo5112 - Monday, April 23, 2018 - link
Pixar's RenderMan software has been used by pretty much every winner of a Best Visual Effects Oscar, and the software itself even won a lifetime achievement Academy Award. RenderMan did not support ray tracing until Pixar made Cars (2006), because it was too computationally expensive for movies. However, computational speed has been ramping up quickly.
Real-time ray tracing started in 2005 with a SIGGRAPH demo. In 2008 Intel did a 15-30 fps demo of Enemy Territory: Quake Wars using real-time ray tracing on a 16-core, ~3 GHz machine. This was their third attempt, but I couldn't find any info on the first two. They did a 40-80 fps demo in 2010 of the 2009 title Wolfenstein, but they had to use a cluster of 4 computers, each with a Knights Ferry card. For comparison, a single V100 chip is 5x as powerful as that entire cluster, and that's not counting the tensor cores.
While ray tracing is currently much slower, it scales better with more complex scenes. GPU acceleration has also taken off. NVIDIA is now starting to push the idea of using real-time ray tracing for some effects on their Volta chips (NVIDIA RTX technology).
WorldWithoutMadness - Wednesday, March 28, 2018 - link
Will someone use it for mining?
steve wilson - Wednesday, March 28, 2018 - link
Correct me if I'm wrong, but isn't it cheaper to run 2 DGX-1s and get the same compute power? It also uses less power: $399,000.00 vs $298,000.00, and 10 kW vs 7 kW.
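Quick per-GPU arithmetic on those figures, as a sketch (prices and power numbers are the ones quoted above, not official spec sheets):

```python
# Per-GPU comparison of one DGX-2 versus two DGX-1s, using the list price
# and power figures quoted in the comment above (not official spec sheets).

systems = {
    "1x DGX-2": {"gpus": 16, "price_usd": 399_000, "power_kw": 10.0},
    "2x DGX-1": {"gpus": 16, "price_usd": 298_000, "power_kw": 7.0},
}

for name, s in systems.items():
    print(f"{name}: ${s['price_usd'] / s['gpus']:,.0f} per GPU, "
          f"{s['power_kw'] * 1000 / s['gpus']:.0f} W per GPU")
# 1x DGX-2: $24,938 per GPU, 625 W per GPU
# 2x DGX-1: $18,625 per GPU, 438 W per GPU
```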
A5 - Wednesday, March 28, 2018 - link
For the people buying these, density is just as important as those other factors. They have a certain amount of rack space dedicated to DGX units, and now they can get double the GPU performance (along with more RAM/CPU/storage) in the same space as before.
mode_13h - Monday, April 2, 2018 - link
No. Rack space isn't *that* valuable, especially when you consider the energy efficiency penalty of all the NVSwitches.
eSyr - Wednesday, March 28, 2018 - link
One of the selling points there is that the DGX-2 equips its V100s with 32 GB of HBM instead of 16 GB. The other one is that it employs a much faster fabric. These two could drive significant improvements in performance on workloads that hit memory or bandwidth bottlenecks on the DGX-1.
mode_13h - Monday, April 2, 2018 - link
No, the memory increase is across-the-board. So, DGX-1 will also now ship with 32 GB per V100.
Yojimbo - Friday, March 30, 2018 - link
It depends on the workload. Compute performance is useless if it is heavily constrained by communication bandwidth. The DGX-2 allows every GPU in the node direct memory access at 300 GB/s to all the HBM memory in the node (the memory on a GPU's own package is still 900 GB/s, of course). That's one big pool of 512 GB. Two DGX-1s linked via InfiniBand or Ethernet would have two pools of 256 GB (after upgrading to 32 GB V100s). Within one pool (node), each GPU would enjoy only 50 or 100 GB/s of direct memory access bandwidth, and between pools the latency for memory operations would be much higher and the bandwidth much lower than in the DGX-2, where the operations are carried out within one pool.
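To put rough numbers on that, here is a toy sketch of how long it takes to move a 1 GB tensor over the links discussed above. The intra-node bandwidths are the figures from this thread; the inter-node figure assumes a single 100 Gb/s InfiniBand link, and latency and protocol overhead are ignored:

```python
# Toy estimate: time to move a 1 GB tensor between two GPUs over the links
# discussed above. Bandwidths are nominal; latency and overhead are ignored.

payload_gb = 1.0

links_gb_s = {
    "DGX-2 NVSwitch (any GPU pair)":       300.0,  # full NVLink bandwidth to every GPU in the node
    "DGX-1 NVLink (best-case path)":       100.0,
    "DGX-1 NVLink (weaker path)":           50.0,
    "Between nodes, 100 Gb/s InfiniBand":   12.5,  # assumed single link; 100 Gb/s = 12.5 GB/s
}

for name, bw_gb_s in links_gb_s.items():
    print(f"{name:36s} {payload_gb / bw_gb_s * 1000:6.1f} ms")
# prints roughly 3.3, 10, 20 and 80 ms respectively
```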
mode_13h - Monday, April 2, 2018 - link
You're not wrong. The point of this thing is mainly to efficiently scale up neural network training to larger models than you could fit on the 8 V100s that a DGX-1 can host.
If you don't need more than 8x V100s in a single box, then you probably wouldn't be buying this.
eSyr - Wednesday, March 28, 2018 - link
DGX-1 has only 8 × 16 GB = 128 GB of GPU memory.
frenchy_2001 - Wednesday, March 28, 2018 - link
Not anymore. It may take some time to trickle down to the OEMs, but all V100s are now fitted with 32 GB of HBM2, and NV will keep selling the DGX-1.
So older DGX-1s will have 16 GB/GPU, but newer ones, comparable to the DGX-2, will have 32 GB.
The biggest differences will be density and fabric. The DGX-2 allows all 16 GPUs to share their memory.