AMD Dives Deep On Asynchronous Shading

by Ryan Smith on 3/31/2015 4:40 AM EST

72 Comments

  • MobiusPizza - Tuesday, March 31, 2015

    So, what's Nvidia's API answer to this? I wish AMD and Nvidia could work on a common API for once, for the sake of humanity.
  • Byte - Tuesday, March 31, 2015

    We really need AMD to pull some wins lest we lose them forever!!! (Switched to all Radeon for mining, though I wish I had NV cards for gaming.)
  • dragonsqrrl - Tuesday, March 31, 2015

    "Switched to all Radeon for mining"

    Jeez I feel sorry for you, hopefully you didn't invest heavily into bitcoin mining.
  • bill5 - Tuesday, March 31, 2015

    Didn't most people pay for their rigs with mining, at least back in those days? So yeah, no investment, just free hardware.
  • dragonsqrrl - Wednesday, April 1, 2015

    lol... pay for their rigs? No, although some who got into it early were able to recover the cost of their graphics card(s) if they mined continuously for several months. Unfortunately, with the introduction of mining ASICs that didn't last long, and the point I was trying to make was that a lot of people jumped into the bitcoin rush too late and got burned.
  • FlushedBubblyJock - Monday, April 6, 2015

    I can't believe AMD, who swore they would never create proprietary tech and software, has done exactly that above and released their "our GPUs only" software, $quid$vr.
    Greed and bad practice locking out nVidia users, shame on them.
  • Spoelie - Tuesday, March 31, 2015

    DX12, Vulkan
  • blanarahul - Tuesday, March 31, 2015

    As time passes, I find myself wishing more and more that game developers would abandon DX12 for Vulkan. Drivers shouldn't be much of a problem for Vulkan because it's a low-level API. Also, since it has a more modern programming model than OpenGL, it is more accessible to coders. Gaming is the only reason I haven't switched to Linux already. Vulkan could help me make the switch.
  • r3loaded - Tuesday, March 31, 2015

    Gaming on Linux still isn't happening until we get decent drivers on par with or better than Windows.
  • Michael Bay - Tuesday, March 31, 2015

    Gaming on Linux is never happening.
    Who needs additional work for meagre gains?
  • Marc GP - Tuesday, March 31, 2015

    For something that's never happening, it's weird that Steam for Linux already has more than a thousand games.

    http://www.extremetech.com/gaming/201055-potent-pe...
  • niva - Tuesday, March 31, 2015

    Yeah, but these guys are hardcore gamers.
  • JonnyDough - Thursday, April 2, 2015

    Hardly; most of those games are indie games suited for the casual gamer.
  • shane007 - Sunday, April 5, 2015

    900+ of those are absolute crap that look worse than the Atari 2600 days!
  • Flunk - Sunday, April 5, 2015

    That's a worthless metric. What percentage of Steam users are on Linux? That's what really matters.

    Oh wait, we have that:

    Windows 95.43%
    Mac OS 3.43%
    Linux 1.06%

    Source (March 2015):
    http://store.steampowered.com/hwsurvey/
  • medi03 - Monday, April 6, 2015

    Valve.
  • Kyururin - Wednesday, April 8, 2015

    AMD, can you close shop? I am sick of fanboys bashing each other. If you do close shop, I can see the happy faces of Nvidia fanboys cheering when they buy a $10000 GeForce 205 or, if they are willing, add another $5000 for an over-the-top extreme edition GeForce 205 with an insane 50MHz overclock over the stock model, a ginormous 32MB of extra RAM, and a 1mm increase in the fan's blade diameter for ultra cooling during harsh gaming.
  • bug77 - Tuesday, March 31, 2015

    You do know that nvidia uses the same driver for Linux and Windows. And Solaris and BSD.
  • FlushedBubblyJock - Wednesday, April 1, 2015

    That one knows nothing but the amd fan pangs. So no.
  • Flunk - Sunday, April 5, 2015

    Yeah, because the Windows and Linux kernels are 100% binary compatible. Yeah...
  • Ryan Smith - Tuesday, March 31, 2015

    VR Direct for VR: http://www.anandtech.com/show/8526/nvidia-geforce-...

    And for general graphics, DX12/Vulkan
  • LiviuTM - Tuesday, March 31, 2015

    1. The Khronos Group, in charge of developing OpenGL, Vulkan and OpenCL, among other standards, is presided over by Neil Trevett from Nvidia. Both AMD and Nvidia are key members (or Promoters, as they are called) of the consortium.
    2. As mentioned in the article, DirectX 12 will be able to use asynchronous shading, which means all AMD GCN and Nvidia Maxwell 2 cards will benefit. I would be really surprised if Microsoft did not involve both AMD and Nvidia (and Intel too) in the process of building DirectX 12. After all, AMD, Intel and Nvidia are the only GPU manufacturers in the PC space.

    Cheers
  • WaltC - Tuesday, March 31, 2015

    It's competition that pushes technology forward, not "cooperation". And the "common API" is D3D/DX, which Microsoft develops. (Nobody really "owns" OpenGL, so it's always far behind DX, IMO, and continuously playing catch-up.) Mantle is AMD's custom API, but if DX12 is any indication, AMD won't need Mantle for much of anything going forward, but we'll see. In a real sense, Mantle is AMD's way of keeping a fire lit under Microsoft's DX development posterior...;)
  • extide - Tuesday, March 31, 2015

    Khronos 'owns' OpenGL, and it is constantly updated with features to keep it pretty much at parity with DX. Also, Mantle will turn into Vulkan, which is going to be the "DX12 of OpenGL".
  • FlushedBubblyJock - Monday, April 6, 2015

    I think Angry Birds requires OpenGL 2.1+ or 3.0 or something like that, so there's your OpenGL gaming. "How great!"
    /sarc
  • jospoortvliet - Thursday, April 2, 2015

    I suppose you've never heard of this thing called Wikipedia, a collaborative effort pushing competition out of the online encyclopedia business... Or this thing called 'Linux', the world's largest engineering project. WordPress, perhaps?
  • jospoortvliet - Thursday, April 2, 2015

    I am saying - competition and free market are powerful tools, but don't dismiss the power of collaboration... Humans are as collaborative as they are competitive.
  • JonnyDough - Thursday, April 2, 2015

    The big RAM companies showed us that, eh? I miss paying out the arse for technology.
  • sonicmerlin - Sunday, April 5, 2015

    Well the price of RAM is still higher than it was when I built my computer in 2011.
  • FlushedBubblyJock - Thursday, April 2, 2015

    It's so, so sickening ... the amd fanbase wailed and moaned forever about open source blah bla blah whine gurgle gurgle opencl open gl open up their piehole and cry and complain and condemn... blah blah blah blah blah blah - for YEARS MAN

    Now they're going to be at it again after their holy god amd the pure and sinless love shack just puked over mantle the proprietary tech they swore they would never do and we heard it for years here the holy and pure good guys amd ...

    OMG I'M PUKING AGAIN. NOW IT'S GOING TO BE CRIES FOR VULKAN, AND THE LIES AND THE SKIES ARE NO DOUBT LIMITLESS, but the end game won't be forthcoming...

    Just like Mantle, this shader junk means crap for gamers, in such a small unrealized niche that PhysX is a hundred multiverses larger in comparison... but of course that was worthless to all the amd fanboys when they weren't screaming proprietary tech and hatred for nVidia...

    PUKING !

    What about when nVidia came out with the ambient occlusion shader patch for all its cards and it was shown in games here? All the amd fans said they hated shaders/shading, who needs it, it sucks...
    After watching and experiencing endless covered-up, denied frauds by amd, I have nothing but pure suspicion on this new vaporware matter.
  • Esteban4u - Tuesday, March 22, 2016

    Great rant! Keep 'em coming!
  • FlushedBubblyJock - Thursday, April 2, 2015

    Why would humanity be saved by zero competition?
    Why do you people wail and moan for competition, then demand total, no-winners cooperation?
    I think the insanity has gone too far.
  • at80eighty - Tuesday, March 31, 2015

    wicked stuff & great read.

    I may have missed it - but will this process benefit all games, or does it fall on game developers to take advantage of async during development?
  • at80eighty - Tuesday, March 31, 2015

    oops, missed the very last paragraph somehow.
  • FlushedBubblyJock - Thursday, April 2, 2015

    it's frikkin vaporware - apparently amd has 29 hidden unused technologies inside its years-old cores and they were the 1st eva' myan !!! to have it integrated like a cheese grater - frikkin parmesano baby !!!
    Problem being it was never frikkin put to use ! We hear this CRAP every time ! The spaghetti don't have no parmesan on it cause amd can't shake it out ! NVIDIA will have to do it first, then we can hear how amd has had it in their cores for frucking decades ! Then nVidia's implementation will work great and amd will lag for at least 2 years... but boy their cores had it first cause they the technology 'venter...

    Remember, they, amd the pure, had frikkin triangle fractals up the wazoo pre-GCN yadda yadda yiddi yeah man AMD AM frikkin D in their hd2900 core.... TESSELLATORS MAN !

    then amd got their tessy a later butttt kicckkkkkeedddd for so frikkin long..

    AMD HAS ALL FUTURE TECH UNUSED DEEP IN ITS GCNEXT CORES !!!! IT'S ALLL THERE !
    I'M RISING ON A DRAGON CLOUD TO RAGE3D HEAVEN
  • zlatan - Tuesday, March 31, 2015

    Ryan, if you want to refer to the queues, then your table is wrong. GCN 1.0 supports 1 queue/ACE, and GCN 1.1 and 1.2 support 16 queues/ACE.
    This is the correct data:
    AMD GCN 1.2 (285) 1 Graphics + 64 Compute / 64 Compute
    AMD GCN 1.1 (290 Series) 1 Graphics + 64 Compute / 64 Compute
    AMD GCN 1.1 (260 Series) 1 Graphics + 64 Compute / 64 Compute
    AMD GCN 1.0 (7000/200 Series) 1 Graphics + 2 Compute / 2 Compute

    If you want to refer to the queue engines, then this is the correct data:
    AMD GCN 1.2 (285) 1 Graphics + 8 Compute / 8 Compute
    AMD GCN 1.1 (290 Series) 1 Graphics + 8 Compute / 8 Compute
    AMD GCN 1.1 (260 Series) 1 Graphics + 8 Compute / 8 Compute
    AMD GCN 1.0 (7000/200 Series) 1 Graphics + 2 Compute / 2 Compute
    NVIDIA Maxwell 2 (900 Series) 1 Graphics + 1 Compute / 1 Compute
    NVIDIA Maxwell 1 (750 Series) 1 Graphics / 1 Compute
    NVIDIA Kepler GK110 (780/Titan) 1 Graphics / 1 Compute
    NVIDIA Kepler GK10x (600/700 Series) 1 Graphics / 1 Compute
  • zlatan - Tuesday, March 31, 2015

    Sorry, GCN 1.1 and 1.2 support 8 queues/ACE.
  • FlushedBubblyJock - Thursday, April 2, 2015

    Who cares, it's amd vaporware in any and every case, thief included.
  • Krteq - Tuesday, March 31, 2015

    Is there any chance you could shed some more light on NVIDIA Maxwell 2 GPU queue support? Perhaps some architecture diagrams that clarify this? THX in advance
  • Ryan Smith - Wednesday, April 1, 2015

    I don't have anything at this time for Maxwell 2. The last diagram I have from NVIDIA on queues was for GK110's HyperQ feature: http://images.anandtech.com/doci/6446/HyperQ.png
  • Krteq - Wednesday, April 1, 2015

    THX Ryan. Could you please revise the table and correct the queue counts for GCN?
  • Ryan Smith - Wednesday, April 1, 2015

    The queue counts are correct. Keep in mind we're counting engines, not queues within an engine.
  • Krteq - Thursday, April 2, 2015

    Nope, it isn't. If you apply this to the uarch of nV GPUs, there is only one warp scheduler, which outputs up to 32 streams (queues).
  • StereoPixel - Thursday, April 2, 2015

    Ryan, special for you - http://abload.de/img/gfxcomputedx122nu5c.jpg
  • Ryan Smith - Thursday, April 2, 2015

    That table is incorrect when it comes to Maxwell 2. It has 32 engines in mixed mode.
  • Krteq - Thursday, April 2, 2015

    Where is your info from? Those are NOT engines, but streams (queues). There is only one compute engine (warp scheduler) in nV GPUs.
  • StereoPixel - Wednesday, April 8, 2015

    No. Maxwell 2 has 1 queue engine (1 command processor) and 32 simultaneous compute queues.
    http://abload.de/img/gfxcomputedx12pzu7w.jpg
  • StereoPixel - Tuesday, March 31, 2015

    LOL, Ryan
    If you can read it --- http://abload.de/img/56577f9ulj.jpg
    8 ACEs = 64 compute queues, because one ACE has 8 compute queues.
    Your table is incorrect.
  • StereoPixel - Tuesday, March 31, 2015

    Oh, Ryan
    Special for you -- http://abload.de/img/78767566nug1.jpg (from AMD's PDF)
    9 devices = 64+ queues (!!!)
  • Krteq - Tuesday, March 31, 2015

    Yes, 9 "devices" = Graphics Command Processor (scheduler) + ACEs
  • StereoPixel - Tuesday, March 31, 2015

    AMD GCN 1.2 (285) 1 Graphics + 8 ACEs = 64+ Compute (64+ queues)
    AMD GCN 1.1 (290 Series) 1 Graphics + 8 ACEs = 64+ Compute (64+ queues)
    AMD GCN 1.1 (260 Series) 1 Graphics + 2 ACEs = 16+ Compute (16+ queues)
    AMD GCN 1.0+ (Kabini) 1 Graphics + 4 ACEs = 8+ Compute (8+ queues)
    AMD GCN 1.0 (7000/200 Series) 1 Graphics + 2 ACEs = 4+ Compute (4+ queues)*
    NVIDIA Maxwell 2 (900 Series) 1 Graphics + 1 Compute = 32 Compute (32 queues)
    NVIDIA Maxwell 1 (750 Series) 1 Graphics = 1 Compute (32 queues)
    NVIDIA Kepler GK110 (780/Titan) 1 Graphics = 1 Compute (32 queues)

    *S.I. GCN supports 2 hardware compute queues per ACE, according to AMD's OpenCL Programming Guide PDF (see p. 1-13)
  • StereoPixel - Tuesday, March 31, 2015

    Fixed. If you want to refer to device/compute queues: http://abload.de/img/78767566nug1.jpg

    AMD GCN 1.2 (285) 1 Graphics + 8 ACEs = 9 devices = 64+ Compute (64+ queues)
    AMD GCN 1.1 (290 Series) 1 Graphics + 8 ACEs = 9 devices = 64+ Compute (64+ queues)
    AMD GCN 1.1 (260 Series) 1 Graphics + 2 ACEs = 3 devices = 16+ Compute (16+ queues)
    AMD GCN 1.0+ (Kabini) 1 Graphics + 4 ACEs = 5 devices = 8+ Compute (8+ queues)
    AMD GCN 1.0 (7000/200 Series) 1 Graphics + 2 ACEs = 3 devices = 4+ Compute (4+ queues)*
    NVIDIA Maxwell 2 (900 Series) 1 Graphics + 1 Compute = 1 device = 32 Compute (32 queues)
    NVIDIA Maxwell 1 (750 Series) 1 Graphics = 1 device = 1 Compute (32 queues)
    NVIDIA Kepler GK110 (780/Titan) 1 Graphics = 1 device = 1 Compute (32 queues)
  • creed3020 - Tuesday, March 31, 2015

    Thanks for this interesting post. I've always been interested in synchronous vs. asynchronous spatial information processing from the perspective of what I do for a living. Seeing how this same principle applies to my graphics card is also very interesting as it shows what tremendous gain is available with just some simple theory applied in highly complex practical ways.
  • siliconwars - Tuesday, March 31, 2015

    Nvidia are so full of shit. They've got nothing to compare to ACEs in hardware; that's why their VR is in such bad shape. They've been talking up VR Direct for months and now finally speak of a 40-50% VR SLI improvement "if lucky". Valve and Oculus are already getting a near doubling on AMD hardware.
  • junky77 - Tuesday, March 31, 2015

    It always amazes me how the multi-threading talk, which is at least a decade old for the end developer, is only now popping up for DX.
    BTW, how does the OpenGL pipeline compare?
  • MrSpadge - Tuesday, March 31, 2015

    Yeah, when reading this DX11 vs. DX12 stuff it's (sometimes) like: "they did WHAT?!"
    I always wondered why it seemed so hard for games to use more than 1 CPU core. They've got so much independent stuff going on per frame. At least now I'm starting to understand, and at the same time the tools available to developers are getting better. Good news!
  • marvee - Wednesday, April 1, 2015

    Why isn't he fixing his table?
  • Ryan Smith - Wednesday, April 1, 2015

    The queue counts are correct. Keep in mind we're counting engines, not queues within an engine.
  • fritz1969 - Thursday, April 2, 2015

    Nvidia has CUDA Multi-Process Service (MPS) on top of Hyper-Q https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Proc... (see page 11)
    http://docs.nvidia.com/cuda/samples/6_Advanced/sim... Hyper-Q allows 32 simultaneous hardware-managed connections.
    AMD has 8 ACEs with 8 compute queues each, instead of 64 ACEs with 1 compute queue each. So the question is: is this grouping of 8 compute queues on the hardware side the better way to do asynchronous shading (or any asynchronous parallel job)? A typical gaming PC has not 32 CPU cores but 4 or 8, and the best way to make optimal use of 32 parallel GPU jobs would be to have 32 cores on the CPU side giving them something asynchronous to do. Maybe Nvidia needs to do the same grouping into 4 or 8 ACEs on the software/driver side. Nowadays only AMD is eager to show how well its cards work on DX12. So, is Nvidia slower with DX12 drivers, or are Nvidia cards inferior on DX12?
  • Alexvrb - Friday, April 3, 2015

    I don't know how to say this without sounding like a jerk so I apologize in advance. That's quite possibly the dumbest thing I can ever recall you saying, and I've generally liked your articles. "We're counting engines, not queues" WHY? Do you also plan to rank GPUs by the number of shader cores, even across completely different architectures? I'd like a heads up if this is the case.

    "Meanwhile Maxwell 2 has 32 queues" <- This means 32 queues, yes? Chart reflects this.
    AMD 8 ACE x 8 queues = 64 queues <- This means 64 queues, yes? Chart fails to reflect this.
  • StereoPixel - Thursday, April 2, 2015

    http://abload.de/img/gfxcomputedx122nu5c.jpg
    It is the correct table from a developer.
  • Alexvrb - Friday, April 3, 2015

    Ryan, there was already an article on your sister site THG that detailed a lot of this. But what they nailed that you missed was the capabilities of each ACE. Each ACE can handle up to 8 queues. So when re-evaluating your "GPU Queue Engine Support" chart I think you'll find that GCN chips with 8 ACEs can handle 64 queues - quite a lot even compared to the latest Maxwell. Even early GCN designs can handle 16, which really is plenty for those chips.
  • FlushedBubblyJock - Monday, April 6, 2015

    "Of course to really measure this we will need games that can use async shaders and VR hardware – both of which are in short supply right now – but the potential benefits are clear."

    Yes, the potential benefits are clear, as clear as Bulldozer's massively multi-threaded super dooper cores, if only those jerks writing code would do it correctly, and Microsoft had better make a patch for Windows performance!
    YES IT'S VERY CLEAR.

    " And if AMD has their way, both VR and regular developers will be taking much greater advantage of the capabilities of asynchronous shading."

    YES, AMD will surely "get their way"... like they did with the above mega threaded everything for bullsnooozer.
  • Shahnewaz - Thursday, April 9, 2015

    Anyone spot the misinformation?
    The Asynchronous Compute Engine says up to 8 ACEs per GPU; then it says each ACE can manage up to 8 queues.
    That means a total of up to 8*8 = 64 queues!
    Hello AnandTech?
  • Alexvrb - Thursday, April 9, 2015

    Yeah everyone spotted it but Ryan is sticking to the numbers stubbornly. He claims he's counting queue engines, not queues, which is silly - even IF Nvidia's implementation actually went from 1 engine to 32 independent engines. The actual number of queues is the only thing that matters, not the method by which the GPU achieves this.
  • albert89 - Friday, April 10, 2015

    Can anyone help answer this question?
    From the 'GPU engine support table', if you have an Nvidia Kepler GK208, then how many 'pure compute' do you have?
  • albert89 - Saturday, April 11, 2015

    It appears that Nvidia uses algorithms to generate equivalent compute performance, while AMD uses a sort of hyper-threading or pathways as a way to give equivalent performance. Can anyone correct or enlighten me on this issue?
  • akamateau - Saturday, May 2, 2015

    @Ryan Smith

    Here's the question of the year.

    When running the 3DMark API Overhead test with an Intel CPU driving a Radeon 3xx dGPU, how does the Asynchronous Shading benefit factor into the performance?

    As a consumer about to spend $1000 on a dGPU, don't I deserve to know how an Intel CPU or an AMD CPU will impact the performance of my rather large investment?

    Isn't that the reason that you write benchmark review articles? To educate the consumer so they make educated choices?

    Doesn't deliberately ignoring AMD CPUs when benching AMD dGPUs cast a stain on your journalistic integrity?
  • Mahigan - Friday, August 21, 2015

    A few errors worth pointing out in your graphs.

    1. AMD GCN 1.1 (290 Series) and GCN 1.2 have 8 Asynchronous Compute Engines, each capable of queuing 8 tasks. That's a queue of 64: 8 ACEs and 64 queues.

    2. Kepler and Maxwell have a single ACE-like unit called the Grid Management Unit. It is this single unit which can queue 1 graphics + 31 compute, or 32 compute.

    AMD's solution contains 8 pipelines, out of order
    nVIDIA's solution contains 1 pipeline, in order

    May I suggest: http://docs.nvidia.com/cuda/samples/6_Advanced/sim...
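
Several comments above (zlatan, StereoPixel, Alexvrb, Shahnewaz, Mahigan) argue that the chart counts queue engines rather than queues. The arithmetic behind the disagreement is simply total queues = engines × queues per engine. The Python sketch below tabulates both numbers using only figures claimed in the thread (8 ACEs × 8 queues for GCN 1.1/1.2, a single engine holding 32 queues for Maxwell 2); these are commenters' claims, not verified vendor specifications, and the `QueueConfig` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueConfig:
    """Hypothetical record of one GPU's compute queue layout, as claimed in the thread."""
    name: str
    compute_engines: int    # ACEs on GCN; one Grid Management Unit on Maxwell 2 (per Mahigan)
    queues_per_engine: int  # compute queues each engine can hold

    @property
    def total_queues(self) -> int:
        # Total hardware compute queues = engines x queues per engine.
        return self.compute_engines * self.queues_per_engine


# Figures taken from the comments above; not verified against vendor documentation.
CONFIGS = [
    QueueConfig("AMD GCN 1.2 (285)", 8, 8),
    QueueConfig("AMD GCN 1.1 (290 Series)", 8, 8),
    QueueConfig("NVIDIA Maxwell 2 (900 Series)", 1, 32),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.compute_engines} engine(s), {cfg.total_queues} queues")
```

Under these assumptions, counting engines gives 8 versus 1 while counting queues gives 64 versus 32. Ryan's replies count Maxwell 2 differently (32 engines in mixed mode), so which column a chart should show is exactly the point the thread is arguing over.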
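
Mahigan contrasts one in-order pipeline with several independent ones. One general way to see why the number of independent queues can matter is head-of-line blocking: in a single in-order queue, a job that is still waiting on its inputs holds up everything queued behind it. The toy Python model below assumes one dispatch per queue per cycle and unlimited execution resources; it illustrates the general idea only, not how either vendor's hardware actually schedules work, and `dispatch_times` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# A job is (name, cycle at which its inputs become ready). Hypothetical toy model.
Job = Tuple[str, int]


def dispatch_times(queues: List[List[Job]]) -> Dict[str, int]:
    """Return the cycle at which each job is dispatched.

    Each queue is drained strictly in order: a job dispatches at the later of
    its own ready cycle and one cycle after the job ahead of it in the same
    queue. Separate queues do not wait on each other.
    """
    times: Dict[str, int] = {}
    for queue in queues:
        previous = -1  # no job dispatched yet in this queue
        for name, ready in queue:
            start = max(ready, previous + 1)
            times[name] = start
            previous = start
    return times


if __name__ == "__main__":
    # Job "A" is stalled until cycle 10; "B" and "C" are ready immediately.
    single_queue = [[("A", 10), ("B", 0), ("C", 0)]]
    three_queues = [[("A", 10)], [("B", 0)], [("C", 0)]]
    print("one in-order queue:", dispatch_times(single_queue))
    print("three queues:      ", dispatch_times(three_queues))
```

With the stalled job at the head, the single queue pushes B and C back to cycles 11 and 12, while giving each job its own queue lets B and C dispatch at cycle 0.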
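
Asynchronous shading, the article's subject, lets compute work run alongside graphics work instead of after it, so it can use shader capacity the graphics work leaves idle. A rough way to reason about the potential benefit is the toy Python model below: graphics work is a list of phases, each with a duration and the fraction of shader capacity it keeps busy, and compute work is measured in milliseconds of a fully occupied GPU. It assumes compute soaks up idle capacity perfectly, with no contention or scheduling overhead, so it gives an optimistic bound rather than a prediction; `frame_time_ms` and the example frame are hypothetical.

```python
from typing import List, Tuple

# (duration in ms, fraction of shader capacity the graphics work keeps busy)
Phase = Tuple[float, float]


def frame_time_ms(phases: List[Phase], compute_work_ms: float, async_compute: bool) -> float:
    """Toy frame-time model (hypothetical, optimistic).

    compute_work_ms is compute work expressed as milliseconds of a fully
    occupied GPU. Without async compute it runs after the graphics work.
    With async compute it first soaks up idle capacity during each graphics
    phase, and only the leftover runs after the graphics work finishes.
    """
    graphics_ms = sum(duration for duration, _ in phases)
    remaining = compute_work_ms
    if async_compute:
        for duration, busy in phases:
            idle_capacity = duration * (1.0 - busy)
            remaining -= min(remaining, idle_capacity)
    return graphics_ms + remaining


if __name__ == "__main__":
    # Hypothetical frame: a half-occupied phase, then a mostly occupied one.
    frame = [(4.0, 0.5), (6.0, 0.75)]
    print("serial compute:", frame_time_ms(frame, 4.0, async_compute=False))
    print("async compute: ", frame_time_ms(frame, 4.0, async_compute=True))
```

In this made-up frame the same 4 ms of compute work adds 4 ms when serialized but only 0.5 ms when it can overlap the graphics phases; real gains depend on how much capacity is actually idle and on contention the model ignores.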
