jjj - Wednesday, December 20, 2017 - link
Bah but this is last week's Titan, aren't they launching a new one this week?
WorldWithoutMadness - Thursday, December 21, 2017 - link
Titan A, next week Titan G, next 2 weeks Titan I, next month Titan N, and lastly Titan A
Sshowster - Saturday, December 23, 2017 - link
*Slow Clap*
Luscious - Wednesday, December 27, 2017 - link
Don't forget the dual Volta Titan ZZ
mode_13h - Wednesday, December 27, 2017 - link
...in case anyone wondered why they never released Pascal as Titan P. I guess Nvidia knows their market too well for that.
mode_13h - Wednesday, December 27, 2017 - link
Well, they'll presumably release the GV102 in some more affordable flavor of Titan. Titan V0?
And maybe they'll eventually release a GV100 in a PCIe card with all four HBM2 stacks... although that'll probably receive Quadro branding & price point, like the Quadro GP100.
mode_13h - Tuesday, January 30, 2018 - link
So, after Titan Xp and Titan V(ista), next should be Titan 7. That's a definite buy, but I'd skip Titan 8. Maybe go for Titan 8.1, or just hold out for Titan 10.
CajunArson - Wednesday, December 20, 2017 - link
OK, while I agree that nobody should buy a Titan V for gaming, you say: "Already after Battlefield 1 (DX11) and Ashes (DX12), we can see that Titan V is not a monster gaming card, though it still is faster than Titan Xp."
Uh... yeah, that's wrong. Anything that's faster than a Titan XP *is* a monster gaming card by definition. It's just not a very good purchase for $3000 since it's not really targeted towards gaming.
nathanddrews - Wednesday, December 20, 2017 - link
I agree with this. Cost aside - it's the best gaming card on the market. Unless NVIDIA launches the 1180/2080/whatever80 soon, this card will be in Terry Crews' Old Spice rig by the end of the month.
P-P-P-POWWWWERRRRRRRRRR!
Ryan Smith - Wednesday, December 20, 2017 - link
"Uh... yeah that's wrong. Anything that's faster than a Titan XP *is* a monster gaming card by definition."The big issue right now is that it's not consistently faster, especially at the 99th percentile Or for that matter, not as bug-free as it needs to be.
I'm going to be surprised if it doesn't get better with later drivers. But for the moment, even if you throw out the price, it's kind of janky in games.
mode_13h - Wednesday, December 27, 2017 - link
How much of that could be due to the disabled HBM2 stack? Since the GPU was designed to have 4 fully-functional stacks, perhaps there are some bottlenecks when running it with only 3.
Golgatha777 - Sunday, December 24, 2017 - link
It's a monster card and the best by definition. It's also a terrible value proposition.
FreckledTrout - Wednesday, December 20, 2017 - link
Nice sense of humor with the "But Can It Run Crysis?" tab. First card that can pull 60fps on Crysis at 4k.
Pork@III - Wednesday, December 20, 2017 - link
"First card that can pull 60fps on Crysis at 4k"
Such a video card cannot possibly exist. :)
tipoo - Wednesday, December 20, 2017 - link
Does that mean we can now kill that meme in the comments?
Ian Cutress - Wednesday, December 20, 2017 - link
Back when it was released, it was all about running 1080p Max.
Today it's running 4K Max.
Next challenge is 8K Max settings, given the Dell UP3218K exists.
jabbadap - Wednesday, December 20, 2017 - link
Funny 2160p with 4x SSAA is actually rendering at 7680x4320 (aka not so really "8K") and scaling it back to 2160p...
Ryan Smith - Wednesday, December 20, 2017 - link
Of course we'll need to run it with 4x SSAA at 8K, to ensure there are no jaggies...
HollyDOL - Wednesday, December 20, 2017 - link
I'd need a microscope to tell a difference at that resolution.
jbo5112 - Thursday, December 21, 2017 - link
Yes, but what about those of us who want to run Crysis on 3 of the 8k Dell UP3218K monitors?
WB312 - Friday, December 22, 2017 - link
I am from the future, we still can't run Crysis.
mode_13h - Wednesday, December 27, 2017 - link
HDMI 2.1 already supports 10k @ 120 Hz, so the bar moves ever higher.
Yojimbo - Wednesday, December 20, 2017 - link
Small error: "Rather both Tesla P100 and Titan V ship with 80 SMs enabled, making for a total of 5120 FP32 CUDA cores and 672 tensor cores."
80 x 8 = 640. The P100 and Titan V each have 640 tensor cores. 672 is what a full GV100 has.
jabbadap - Wednesday, December 20, 2017 - link
Yeah, and that would be Tesla V100; P100 is Pascal.
Yojimbo - Wednesday, December 20, 2017 - link
Yeah, true. I didn't notice that.
rocky12345 - Wednesday, December 20, 2017 - link
Great preview of the new Titan V card, thank you. I really got a kick out of the "But Can It Run Crysis?" part, and you actually gave numbers to back up that question, which of course it can handle pretty well.
007ELmO - Wednesday, December 20, 2017 - link
of course...it's only a $3000 card
djayjp - Wednesday, December 20, 2017 - link
Crysis at 8K equivalent! 4K = 8 million pixels, x4 supersampled becomes 32 MP. Though that means there'd be no AA if run on an 8K monitor... and what about the 99th percentile? ;)
nedjinski - Wednesday, December 20, 2017 - link
what will nvidia do about that nagging crypto mining problem?
lazarpandar - Wednesday, December 20, 2017 - link
Make it cost 3k.
Machine learning is more valuable than crypto.
tipoo - Wednesday, December 20, 2017 - link
This would be a pretty bad choice for mining.
The tensor cores don't work on any current mining algorithm. The CUDA cores have a small uplift. Two 1080s would be much faster miners.
lazarpandar - Wednesday, December 20, 2017 - link
Really disappointed that there isn't a TensorFlow performance tab, since there is literally a physical portion of the GPU dedicated to tensor cores.
Ryan Smith - Wednesday, December 20, 2017 - link
The good news is that we're working on that for the full review. The deep learning frameworks are a lot harder to test, and we were running out of time ahead of the holiday break, so it had to get pulled. It's definitely looking interesting though.
SharpEars - Wednesday, December 20, 2017 - link
I applaud the increase in double-precision, but 12 GB of VRAM, seriously? For a $3k card?
DanNeely - Wednesday, December 20, 2017 - link
Blame problems with either the memory controller and/or the interposer connecting the HBM2. NVidia kept the core count the same vs. Tesla, but dropped one of the 4 RAM stacks; so we know that something related to that was the biggest failure point in turning dies into even more expensive Tesla cards.
A Titan V refresh might get the 4th stack for 16GB in a year or so if yields improve enough to justify it; otherwise the question for more RAM becomes if/when 8GB stacks of HBM2 are available in sufficient quantity.
extide - Saturday, December 23, 2017 - link
Vega FE and the 16GB Mac Pro versions use the 8GB stacks, so they are available to some degree...
mode_13h - Wednesday, December 27, 2017 - link
Nvidia's market segmentation tactics won't allow for 16 GB of HBM2 at such a bargain price. You might get it in the form of a Quadro GV100, however, for >= 2x the price of a Titan V.
beisat - Wednesday, December 20, 2017 - link
I fully expect the rumours to be true and Volta to be skipped for gaming by NV - already we heard of a replacement (Ampere) and this points that way too. Just a feeling though...
Qwertilot - Wednesday, December 20, 2017 - link
Well, there's no way they'd ever include a lot of the compute features in gaming cards, so the two were always going to be pretty different.
What they end up calling everything is a bit of a moot point :)
crysis3? - Wednesday, December 20, 2017 - link
So why'd you run the benchmark on the original Crysis that no one benchmarks? I assume the Titan V cannot get 60fps maxed out on Crysis 3, then.
Ryan Smith - Wednesday, December 20, 2017 - link
The "But can it run Crysis" joke started with the original Crysis in 2007. So it was only appropriate that we use it for that test. Especially since it let us do something silly like running 4x supersample anti-aliasing.
crysis3? - Wednesday, December 20, 2017 - link
ah
SirPerro - Wednesday, December 20, 2017 - link
They make it pretty clear everywhere this card is meant for ML training.
It's the only scenario where it makes sense financially.
Gaming is a big NO at 3K dollars per card. Mining is a big NO with all the cheaper specific chips for the task.
On ML it may mean halving or quartering the training time on a workstation, and if you have it running 24/7 for hyperparameter tuning it pays for itself compared to the accumulated costs of Amazon or Google cloud machines.
An SLI of Titans and you can train huge models in under a day on a local machine. That's a great thing to have.
mode_13h - Wednesday, December 27, 2017 - link
The FP64 performance indicates it's also aimed at HPC. One has to wonder how much better it could be at each, if it didn't also have to do the other.
And for multi-GPU, you really want NVLink - not SLI.
takeshi7 - Wednesday, December 20, 2017 - link
Will game developers be able to use these tensor cores to make the AI in their games smarter? It would be cool if AI shifted from the CPU to the GPU.
DanNeely - Wednesday, December 20, 2017 - link
First and foremost, that depends on whether mainstream Volta cards get tensor cores.
Beyond that, I'm not sure how much it'd help there directly. AFAIK what Google/etc. are doing with machine learning and neural networks is very different from typical game AI.
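For a concrete sense of why tensor cores map to neural-network math rather than branchy game AI, here is a minimal sketch using CUDA 9's WMMA intrinsics (an illustration of the programming model, not code from the article): one warp cooperatively computes a 16x16x16 matrix multiply-accumulate with FP16 inputs and an FP32 accumulator.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes D = A*B + C on a 16x16x16 tile (FP16 in, FP32 accumulate).
// Requires a Volta-class GPU; compile with: nvcc -arch=sm_70
__global__ void tensor_tile_mma(const half *A, const half *B, float *C)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);             // start the accumulator at zero
    wmma::load_matrix_sync(a_frag, A, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    wmma::store_matrix_sync(C, acc_frag, 16, wmma::mem_row_major);
}
```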
tipoo - Wednesday, December 20, 2017 - link
They're more for training the neural nets than actually executing a game's AI routine.
hahmed330 - Wednesday, December 20, 2017 - link
Finally a card that can properly nail Crysis!
crysis3? - Wednesday, December 20, 2017 - link
Closer to 55fps if it were Crysis 3 maxed out.
crysis3? - Wednesday, December 20, 2017 - link
Because he benchmarked the first Crysis.
praktik - Wednesday, December 20, 2017 - link
Actually probably both XP and V could run 4k Crysis pretty well - do we need 4xssaa @ 4k??
Ryan Smith - Wednesday, December 20, 2017 - link
"do we need 4xssaa"
If it were up to me, the answer to that would always be yes. Jaggies suck.
tipoo - Wednesday, December 20, 2017 - link
Do they plan on exposing fast FP16 in software? When consumer Volta launches, maybe?
Ryan Smith - Wednesday, December 20, 2017 - link
Nothing has been announced at this time.
Keldor314 - Wednesday, December 20, 2017 - link
The part of the article about Volta no longer having a superscalar architecture is incorrect. Although there is only one warp scheduler per SM partition (what do you call those things, anyway?), each clock cycle only serves half a warp, so it takes two clock cycles for an instruction to feed into one of the execution pipelines, but during the second cycle the warp scheduler is free to issue a second instruction to one of the other pipelines. IIRC, Fermi did this too.
mode_13h - Wednesday, December 27, 2017 - link
Also, the part about per-thread PC and stack is misleading. Warps are still executing (or not executing) from a single instruction sequence. The threads within a warp are not concurrently executing different instructions, nor are threads being dynamically shuffled between different warps - at least, not at a hardware level.
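As a rough illustration of the scheduling change being debated here (a sketch of CUDA 9's documented Volta semantics, not code from the article or these comments): diverged threads in a warp may now interleave forward progress, so warp-synchronous code has to mark its reconvergence points explicitly with __syncwarp().

```cuda
// Sketch of Volta's independent thread scheduling as exposed by CUDA 9:
// the two sides of the branch may interleave, and the explicit __syncwarp()
// marks where the warp must reconverge before any warp-synchronous code.
__global__ void diverge_then_reconverge(int *data)
{
    int i = threadIdx.x;
    if (i % 32 < 16)
        data[i] *= 2;   // one side of the divergent branch
    else
        data[i] += 1;   // the other side
    __syncwarp();       // explicit reconvergence point (new in CUDA 9)
    // Warp-level shuffles or reductions are safe to use after this point.
}
```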
MrSpadge - Wednesday, December 20, 2017 - link
> Sure, compute is useful. But be honest: you came here for the 4K gaming benchmarks, right?
Actually, no: I came for compute, power and voltage.
jabbadap - Wednesday, December 20, 2017 - link
Interesting, so it has full floating point compute capability: 1x FP64 -> 2x FP32 -> 4x FP16, plus tensor cores. But that half precision is only for CUDA? So no Direct3D 12 minimum floating point precision.
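On the CUDA side, the double-rate FP16 path is reached through packed half2 math; a minimal sketch assuming the standard cuda_fp16.h intrinsics (not tied to anything in the article):

```cuda
#include <cuda_fp16.h>

// Packed FP16: each __half2 carries two 16-bit values, and __hfma2 issues a
// fused multiply-add on both lanes at once, which is how the double-rate
// FP16 path is reached from CUDA on GP100/GV100-class GPUs.
__global__ void axpy_fp16x2(int n, __half2 a, const __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __hfma2(a, x[i], y[i]);   // y = a*x + y, two elements per op
}
```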
Native7i - Wednesday, December 20, 2017 - link
So it looks like the V series is focused on machine learning and development.
Maybe the rumors are correct about Ampere replacing Pascal...
extide - Saturday, December 23, 2017 - link
Maybe. I mean, GP100 was very different from GP102 on down, so they could do the same thing...
maroon1 - Wednesday, December 20, 2017 - link
Correct me if I'm wrong, but Crysis Warhead running at 4K with 4x SSAA means it is rendering at 8K (4 times as many pixels as 4K) and then downscaling to 4K.
Ryan Smith - Wednesday, December 20, 2017 - link
Yes and no. Under the hood it's actually using a rotated grid, so it's a little more complex than just rendering it at a higher resolution.
The resource requirements are very close to 8K rendering, but it avoids some of the quality drawbacks of scaling down an actual 8K image.
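For a sense of scale, the sample count at 4K with 4x SSAA works out to exactly the pixel count of an 8K frame (plain arithmetic, independent of the rotated-grid detail above):

$$3840 \times 2160 \times 4 = 33{,}177{,}600 = 7680 \times 4320$$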
Frenetic Pony - Wednesday, December 20, 2017 - link
A hell of a lot of "It works great, but only if you buy and program exclusively for Nvidia!" stuff here. Reminds me of Sony's penchant for exclusive lock-in stuff over a decade ago when they were dominant. It didn't work out for Sony then, and this is worse for customers, as they'll need to spend money on both dev and hardware.
I'm sure some will be shortsighted enough to do so. But with Google straight up outbuying Nvidia for AI researchers (reportedly up to, or over, 10 million for just a 3-year contract), it's not a long-term bet I'd make.
tuxRoller - Thursday, December 21, 2017 - link
I assume you've not heard of CUDA before?
NVIDIA had long been the only game in town when it comes to GPGPU HPC.
They're really a monopoly at this point, and researchers have no interest in making their jobs harder by moving to a new ecosystem.
mode_13h - Wednesday, December 27, 2017 - link
OpenCL is out there, and AMD has had some products that were more than competitive with Nvidia, in the past. I think Nvidia won HPC dominance by bribing lots of researchers with free/cheap hardware and funding CUDA support in popular software packages. It's only with Pascal that their hardware really surpassed AMD's.
tuxRoller - Sunday, December 31, 2017 - link
OCL exists, but CUDA has MUCH higher mindshare. It's the de facto HPC framework used and taught in schools.
mode_13h - Sunday, December 31, 2017 - link
True that CUDA seems to dominate HPC. I think Nvidia did a good job of cultivating the market for it.
The trick for them now is that most deep learning users use frameworks which aren't tied to any Nvidia-specific APIs. I know they're pushing TensorRT, but it's certainly not dominant in the way CUDA dominates HPC.
tuxRoller - Monday, January 1, 2018 - link
The problem is that even the GPU-accelerated NN frameworks are still largely built first using CUDA. Torch, Caffe and TensorFlow offer varying levels of OCL support (generally between some and none).
Why is this still a problem? Well, where are the OCL 2.1+ drivers? Even 2.0 is super patchy (mainly due to Nvidia not officially supporting anything beyond 1.2). Add to this their most recent announcements about merging OCL into Vulkan and you have yourself an explanation for why CUDA continues to dominate.
My hope is that Khronos announces Vulkan 2.0, with OCL being subsumed, very soon. Doing that means vendors only have to maintain a single driver (with everything consuming SPIR-V), and Nvidia would, basically, be forced to offer OpenCL-next. Bottom line: if they can bring the OCL functionality into Vulkan without massively increasing the driver complexity, I'd expect far more interest from the community.
mode_13h - Friday, January 5, 2018 - link
Your mistake is focusing on OpenCL support as a proxy for AMD support. Their solution was actually developing MIOpen as a substitute for Nvidia's cuDNN. They have forks of all the popular frameworks to support it - hopefully they'll get merged in, once ROCm support exists in the mainline Linux kernel.
Of course, until AMD can answer the V100 on at least power-efficiency grounds, they're going to remain an also-ran in the market for training. I think they're a bit more competitive for inferencing workloads, however.
CiccioB - Thursday, December 21, 2017 - link
What are you suggesting?
GPUs are very customized pieces of silicon, and you have to code for them with optimizations for each single architecture if you want to exploit them to the maximum.
If you think that people buy $10,000 cards to be put in $100,000 racks for multi-million-dollar servers just to use unoptimized, unsupported, unguaranteed open source code in order to make AMD fanboys happy, well, no, that's not how the industry works.
Grow up.
mode_13h - Wednesday, December 27, 2017 - link
I don't know if you've heard of OpenCL, but there's no reason why a GPU needs to be programmed in a proprietary language.
It's true that OpenCL has some minor issues with performance portability, but the main problem is Nvidia's stubborn refusal to support anything past version 1.2.
Anyway, lots of businesses know about vendor lock-in and would rather avoid it, so it sounds like you have some growing up to do if you don't understand that.
CiccioB - Monday, January 1, 2018 - link
Grow up. I repeat: no one is wasting millions on uncertified, unsupported libraries. Let's not even talk about entire frameworks.
If you think that researchers with budgets of millions are nerds working in a garage with lock-in-avoidance strategies as their first thought in the morning, well, grow up, kid.
Nvidia provides the resources to allow them to exploit their expensive HW to the most of its potential, reducing time and other associated costs - also when upgrading the HW to a better one. That's what counts when investing millions for a job.
For your kid's homemade AI joke, you can use whatever alpha library with zero support and certification. Others have already grown up.
mode_13h - Friday, January 5, 2018 - link
No kid here. I've shipped deep-learning-based products to paying customers for a major corporation.
I've no doubt you're some sort of Nvidia shill. Employee? Maybe you bought a bunch of their stock? Certainly sounds like you've drunk their Kool-Aid.
Your line of reasoning reminds me of how people used to say businesses would never adopt Linux. Now, it overwhelmingly dominates cloud, embedded, and underpins the Android OS running on most of the world's handsets. Not to mention it's what most "researchers with budgets of millions" use.
tuxRoller - Wednesday, December 20, 2017 - link
"The integer units have now graduated their own set of dedicates cores within the GPU design, meaning that they can be used alongside the FP32 cores much more freely."Yay! Nvidia caught up to gcn 1.0!
Seriously, this goes to show how good the gcn arch was. It was probably too ambitious for its time as those old gpus have aged really well it took a long time for games to catch up.
CiccioB - Thursday, December 21, 2017 - link
"Nvidia caught up to GCN 1.0!"
Yeah! It is known to the entire universe that it is Nvidia that trails AMD in performance.
Luckily they managed to get this Volta out in time before the bankruptcy.
tuxRoller - Wednesday, December 27, 2017 - link
I'm speaking about architecture, not performance.
CiccioB - Monday, January 1, 2018 - link
New, bigger, costlier architectures with lower performance = fail
tuxRoller - Monday, January 1, 2018 - link
Ah, troll.
CiccioB - Wednesday, December 20, 2017 - link
Useless card
Vega = #poorvolta
StrangerGuy - Thursday, December 21, 2017 - link
AMD can pay me half their marketing budget and I will still do better than them... by doing exactly nothing. Their marketing is worse than being in a state of non-existence.
mode_13h - Wednesday, December 27, 2017 - link
It's true. All they had to do was pay some grad students to optimize HPC and deep learning software for their GPUs. They could've done that for the price of only a couple of marketing people's salaries.
CiccioB - Monday, January 1, 2018 - link
That would not be a surprise.
AMD's strategy on SW support has always been to leave others (usually not professionals) to do the job at their own cost. The result is that AMD HW has never had decent SW support other than for gaming (and that's only because Sony and MS spend money on improving gaming performance for their consoles).
tipoo - Friday, December 22, 2017 - link
Sarcasm? There's no Vega built up to this scale.
mode_13h - Wednesday, December 27, 2017 - link
It *is* pretty big and burns about as much power. Yet it's nowhere near as fast at deep learning. Even with its lower purchase price, it's still not operationally cost-competitive with GV100.
If you look at its feature set, it was really aimed at HPC and deep learning. In the face of Volta's tensor cores, it kinda fell flat on the latter front.
Keermalec - Wednesday, December 20, 2017 - link
What about mining benchmarks?
tipoo - Friday, December 22, 2017 - link
Would be in line with the CUDA improvements. I.e., two 1080s would be much better at mining. Most of the uplift is in tensor performance, which no algo uses.
Cryio - Wednesday, December 20, 2017 - link
Wait wait wait.
Crysis Warhead at 4K, Very High with 4 times Supersampling? I think you mean Multisampling.
I don't think this could manage 4K60 at max settings with 4xSSAA, lol.
Ryan Smith - Thursday, December 21, 2017 - link
"I think you mean Multisampling."Nope, supersampling.=)
mode_13h - Wednesday, December 27, 2017 - link
Tile rendering FTMFW.
Kevin G - Wednesday, December 20, 2017 - link
"For our full review hopefully we can track down a Quadro GP100"
YES. The oddity here is that the GP100 might end up being better than the Titan V at gaming due to having 128 ROPs vs. 96 ROPs and even higher memory bandwidth.
Outside of half-precision matrix multiplication, the Titan V should be roughly ~43% faster than the GP100 in professional workloads, due mainly to the difference in ALU counts. Boost clocks are a meager 25 MHz difference. Major deviations beyond that 43% difference would be where the architectures differ. There is a chance benchmarks would come in below that 43% mark if memory bandwidth comes into play.
CiccioB - Thursday, December 21, 2017 - link
Absolute boost frequency is meaningless, as we have already seen that those values are not respected anywhere with the use of Boost 3.0. You can be higher or lower.
What is important is the power draw and the temperatures. These are the limiting factors to reach and sustain the boost frequency and beyond.
With an 800+ mm^2 die and 21 billion transistors, you may expect that consumption is not as low as for a 14-billion-transistor die, and that the frequencies cannot be sustained as long.
What is promising is that if this is the power draw of such a monster chip, the consumer-grade chips made on the same process will run really cool, and the frequencies can be pushed really high to soak up all the thermal and power headroom.
Just imagine a Volta/Ampere successor to GP104 consuming 300+ W, same as the new Vega-based custom cards.
#poorVega
croc - Wednesday, December 20, 2017 - link
I can't take a Titan-as-compute seriously if its double precision is disabled. That, to me, makes it only aimed at graphics. Yet this whole article is presenting the whole Titan family as 'compute machines'...
MrSpadge - Thursday, December 21, 2017 - link
It has all compute capability enabled: FP16, FP32, FP64 and the tensor cores. The double-speed FP16 is just not (yet?) exposed to graphics applications.
CiccioB - Thursday, December 21, 2017 - link
In fact, this one has 1/2 FP64 computing capacity with respect to FP32.
At least read the first chapter of the review before commenting.
mode_13h - Wednesday, December 27, 2017 - link
The original Titan/Black/Z and this Titan V have uncrippled FP64. It's only the middle two generations - Titan X and Titan Xp - that use consumer GPUs with a fraction of the FP64 units.
Zoolook13 - Thursday, December 21, 2017 - link
The figure for tensor operations seems fishy. It's not based on tf.complex128 I guess; more probably tf.uint8 or tf.int8, and then it's no longer FLOPS, maybe TOPS?
I hope you take a look at that when you flesh out the TensorFlow part.
If they can do 110 TFLOPS of tf.float16, then it's very impressive but I doubt that.
Ryan Smith - Thursday, December 21, 2017 - link
It's float 16. Specifically, CUDA_R_16F.
http://docs.nvidia.com/cuda/cublas/index.html#cubl...
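For readers curious what a CUDA_R_16F tensor-op call looks like, here is a sketch of the kind of cuBLAS GEMM that routes through the tensor cores; the handle, device buffers, and matrix sizes are placeholders, and this is not the benchmark code used for the article:

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical FP16 GEMM with FP32 accumulation on the tensor cores.
// d_A, d_B, d_C are assumed to be device buffers allocated elsewhere;
// m, n, k are placeholder sizes, not the values used for the article.
void tensor_gemm(cublasHandle_t handle,
                 const __half *d_A, const __half *d_B, __half *d_C,
                 int m, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);  // allow tensor core math
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                 m, n, k,
                 &alpha,
                 d_A, CUDA_R_16F, m,    // FP16 inputs
                 d_B, CUDA_R_16F, k,
                 &beta,
                 d_C, CUDA_R_16F, m,
                 CUDA_R_32F,            // accumulate in FP32
                 CUBLAS_GEMM_DFALT_TENSOR_OP);
}
```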
CheapSushi - Thursday, December 21, 2017 - link
Would be amazing if tensor core support was incorporated into game AI and also OS AI assistants, like Cortana.
edzieba - Thursday, December 21, 2017 - link
"Each SM, in turn, contains 64 FP32 CUDA cores, 64 INT32 CUDA cores, 32 FP64 CUDA cores, 8 tensor cores, and a significant quantity of cache at various levels."
IIRC, the '64 FP32 cores' and '32 FP64 cores' are one and the same: the FP64 cores can operate as a pair of FP32 cores (same as GP100 can do two FP16 operations with the FP32 cores).
Ryan Smith - Thursday, December 21, 2017 - link
According to NVIDIA, the FP64 CUDA cores are distinct silicon. They are not the FP32 cores.
Notmyusualid - Friday, December 22, 2017 - link
Eth, simple O/C: 82 MH/s.
I bow before thee...
Dugom - Saturday, December 23, 2017 - link
Will you test the 388.71?
The 388.59 doesn't officially support the TITAN V...
Nate Oh - Saturday, December 23, 2017 - link
Yes, it does. On page 7 of 388.59 Release Notes: "New Product Support: Added support for the NVIDIA TITAN V" [1].
[1] https://us.download.nvidia.com/Windows/388.59/388....
karthik.hegde - Sunday, December 24, 2017 - link
Why is no one talking about actual FLOPS vs. peak FLOPS? Clearly, sustaining the constant 110 TFLOPS that the Titan has at its disposal is simply not possible. What's the consistent FLOPS it can achieve before memory bandwidth becomes a bottleneck? And when 12GB of VRAM isn't enough to hold all your data (neural net training), you're doing only as well as previous gens.
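A back-of-the-envelope bound, assuming NVIDIA's 110 TFLOPS tensor peak and the Titan V's 652.8 GB/s of HBM2 bandwidth: a kernel needs roughly 168 FP16 FLOPs per byte of memory traffic before compute rather than bandwidth is the limit, and a square half-precision GEMM has an arithmetic intensity of about n/3 FLOPs per byte, so matrices need to be at least on the order of n ≈ 500 (in practice well into the thousands, with good on-chip reuse) before the tensor cores become the bottleneck.

$$\frac{110\ \text{TFLOPS}}{652.8\ \text{GB/s}} \approx 168\ \tfrac{\text{FLOPs}}{\text{byte}},\qquad \text{FP16 GEMM intensity} \approx \frac{2n^3}{3\cdot 2n^2\ \text{bytes}} = \frac{n}{3}$$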
mode_13h - Wednesday, December 27, 2017 - link
That's why you use batching, sampling, and ultimately pay the big bucks for their Tesla hardware.
Shaklee3 - Wednesday, December 27, 2017 - link
To the authors: what matrix size and what sample application did you use to hit 100 TFLOPS on the tensor benchmark?
mode_13h - Thursday, December 28, 2017 - link
You might have better luck getting a response either on Twitter or perhaps this thread:
https://forum.beyond3d.com/threads/nvidia-volta-sp...
In fact, the first post on that page seems to answer your question.