Original Link: https://www.anandtech.com/show/9740/directx-12-geforce-plus-radeon-mgpu-preview
GeForce + Radeon: Previewing DirectX 12 Multi-Adapter with Ashes of the Singularity
by Ryan Smith on October 26, 2015 10:00 AM EST
![](https://images.anandtech.com/doci/9740/DirectX12_678x452.png)
Throughout this year we’ve looked at several previews and technical demos of DirectX 12 technologies, both before and after the launch of Windows 10 in July. As the most significant update to the DirectX API since DirectX 10 in 2007, the release of DirectX 12 marks the beginning of a major overhaul of how developers will program for modern GPUs. So to say there’s quite a bit of interest in it – both from consumers and developers – would be an understatement.
In putting together the DirectX 12 specification, Microsoft and their partners planned for the long haul, present and future. DirectX 12 has a number of immediately useful features that have developers grinning from ear to ear, but at the same time, given that another transition like this will not happen for many years (if at all), DirectX 12 and the update to the underlying display driver foundation were meant to be very forward looking and to pack in as many advanced features as would be reasonable. Consequently the first retail games, such as this quarter’s Fable Legends, will just scratch the surface of what the API can do, as developers are still in the process of understanding the API and writing new engines around it, and GPU driver developers are similarly still hammering out their code and improving their DirectX 12 functionality.
Of everything that has been written about DirectX 12 so far, the bulk of the focus has been on the immediate benefits of the low-level nature of the API, and this is for a good reason. The greatly reduced driver overhead and better ability to spread out work submission over multiple CPU cores stands to be extremely useful for game developers, especially as the CPU submission bottleneck is among the greatest bottlenecks facing GPUs today. Even then, taking full advantage of this functionality will take some time as developers have become accustomed to minimizing the use of draw calls to work around the bottleneck, so it is safe to say that we are at the start of what is going to be a long transition for gamers and game developers.
A little farther out on the horizon than the driver overhead improvements are DirectX 12’s improvements to multi-GPU functionality. Traditionally the domain of drivers – developers have little control under DirectX 11 – DirectX 12’s explicit controls extend to multi-GPU rendering as well. It is now up to developers to decide if they want to use multiple GPUs and how they want to use them. And with explicit control over the GPUs along with the deep understanding that only a game’s developer can have for the layout of their rendering pipeline, DirectX 12 gives developers the freedom to do things that could never be done before.
That brings us to today’s article, an initial look into the multi-GPU capabilities of DirectX 12. Developer Oxide Games, responsible for the popular Star Swarm demo we looked at earlier this year, has taken the underlying Nitrous engine and is ramping up for the 2016 release of the first retail game using the engine, Ashes of the Singularity. As part of their ongoing efforts to use Nitrous as a testbed for DirectX 12 technologies, and in conjunction with last week’s Steam Early Access release of the game, Oxide has sent over a very special build of Ashes.
What makes this build so special is that it’s the first game demo for DirectX 12’s multi-GPU Explicit Multi-Adapter (AKA Multi Display Adapter) functionality. We’ll go into more detail on Explicit Multi-Adapter shortly, but in short it is one of DirectX 12’s two multi-GPU modes, and thanks to the explicit controls offered by the API, it allows for disparate GPUs to be paired up. Going beyond what SLI and CrossFire can do, EMA allows dissimilar GPUs to be used in conjunction with each other, and productively at that.
So in an article only fitting for the week of Halloween, today we will be combining NVIDIA GeForce and AMD Radeon cards into a single system – a single rendering setup – to see how well Oxide’s early implementation of the technology works. It may be unnatural and perhaps even a bit unholy, but there’s something undeniably awesome about watching a single game rendered by two dissimilar cards in this fashion.
A Brief History on Multi-GPU with Dissimilar GPUs
Before we dive into our results, let’s talk briefly about the history of efforts to render games with multiple, dissimilar GPUs. After the development of PCI Express brought about the (re)emergence of NVIDIA’s SLI and AMD’s CrossFire, both companies eventually standardized their multi-GPU rendering efforts across the same basic technology. Using alternate frame rendering (AFR), NVIDIA and AMD would have the GPUs in a multi-GPU setup each render a separate frame. With the drivers handing off frames to each GPU in an alternating manner, AFR was the most direct and most compatible way to offer multi-GPU rendering as it didn’t significantly disrupt the traditional game rendering paradigms. There would simply be two (or more) GPUs rendering frames instead of one, with much of the work abstracted by the GPU drivers.
Using AFR allowed for relatively rapid multi-GPU support, but it did come with tradeoffs as well. Alternating frames meant that inter-frame dependencies needed to be tracked and handled, which in turn meant that driver developers had to add support for games on a game-by-game basis. Furthermore the nature of distributing the work meant that care needed to be taken to ensure each GPU rendered at an even pace so that the resulting in-game motion was smooth, a problem AMD had to face head-on in 2013. Finally, because AFR had each GPU rendering whole frames, it worked best when GPUs were as identical as possible in performance; a performance gap would at best require a faster card to spend some time waiting on the slower card, and at worst exacerbate the aforementioned frame pacing issues. As a result NVIDIA only allows identical cards to be paired up in SLI, and AMD only allows a slightly wider variance (typically cards using the same GPU).
In 2010 LucidLogix set out to do one better, leveraging their graphics expertise to develop their Hydra technology. By using a combination of hardware and software, Hydra could intercept DirectX and OpenGL calls and redistribute them to split up rendering over multiple, and for the first time, dissimilar GPUs. Long a dream within the PC gaming space (and the subject of a few jokes), the possibilities for using dissimilar GPUs via Hydra were immense – pairing up GPUs not only from different vendors, but of differing performance as well – resolving some of AFR’s shortcomings while allowing gamers to do things such as reuse old video cards and still receive a performance benefit.
However in the long run the Hydra technology failed to catch on. The process of splitting up API calls, having multiple GPUs render them, and compositing them back together to a single frame proved to be harder than LucidLogix expected, and as a result Hydra’s compatibility was poor and performance gains were limited. Coupled with the cost of the hardware, the licensing, and the fact that Hydra boards were never SLI certified (preventing typical NVIDIA SLI operation) meant that Hydra had a quick exit from motherboards.
In the end what LucidLogix was attempting was a valiant effort, but in retrospect one that was misguided. Working at the back-end of the rendering chain and manipulating API calls can work, but it is a massive amount of effort and it has hardware developers aiming at a moving target, requiring constant effort to keep up with new games. AMD and NVIDIA’s driver-level optimizations don’t fare too much better in this respect; there are vendor-specific shortcuts such as NVAPI that simplify this some, but even AMD and NVIDIA have to work to keep up with new games. This is why they need to issue driver updates and profile updates so frequently in order to get the best performance out of CrossFire and SLI.
But what if there was a better way to manage multiple GPUs and assign work to them? Would it be possible to do a better job working from the front-end of the rendering chain? This is something DirectX 12 sets out to answer with its multi-adapter modes.
DirectX 12 Multi-GPU
In DirectX 12 there are technically three different modes for multi-adapter operation. The simplest of these modes is what Microsoft calls Implicit Multi-Adapter. Implicit Multi-Adapter is essentially the lowest rung of multi-adapter operation, intended to allow developers to use the same AFR-friendly techniques as they did with DirectX 11 and before. This model retains the same limited ability for game developers to control the multi-GPU rendering process, which limits the amount of power they have, but also limits their responsibilities as well. Consequently, just as with DirectX 11 multi-GPU, in implicit mode much of the work is offloaded to the drivers (and practically speaking, AMD and NVIDIA).
While the implicit model has the most limitations, the lack of developer responsibilities also means it’s the easiest to implement. In an era where multi-platform games are common, even after developers make the switch to DirectX 12 they may not want to undertake the effort to support Explicit Multi-Adapter, as the number of PC owners with multiple high-powered GPUs is a fraction of the total PC gaming market. And in that case, with help from driver developers, implicit is the fastest path towards supporting multiple GPUs.
What’s truly new to DirectX 12 then are its Explicit Multi-Adapter (EMA) modes. As implied by the name, these modes require game developers to explicitly program for multi-GPU operation, specifying how work will be assigned to each GPU, how memory will be allocated, how the GPUs will communicate, etc. By giving developers explicit control over the process, they have the best chance to extract the most multi-GPU performance out of a system, as they have near-absolute control over both the API and the game, allowing them to work with more control and more information than any of the previously discussed multi-GPU methods. The cost of using explicit mode is resources: with great power comes great responsibility, and unlike implicit mode, game developers must put in a fair bit of work to make explicit mode work, and more work yet to make it work well.
Within EMA there are two different ways to address GPUs: linked mode and unlinked mode. Unlinked mode is essentially the baseline mode for EMA, and offers the bulk of EMA’s features. Linked mode on the other hand builds on unlinked mode by offering yet more functionality in exchange for much tighter restrictions on what adapters can be used.
The ultimate purpose of unlinked mode is to allow developers to take full advantage of all DirectX 12 capable GPU resources in a system, at least so long as they are willing to do all of the work required to manage those resources. Unlinked mode, as opposed to linked mode and implicit multi-adapter, can work with DX12 GPUs from any vendor, providing just enough abstraction to allow GPUs to exchange data but putting everything else in the developer’s hands. Depending on what developers want to do, unlinked mode can be used for anything from pairing up two dGPUs to pairing up a dGPU with an iGPU, with the GPUs being a blank slate of sorts for developers to use as they see fit for whatever algorithms and technologies they opt to use.
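For readers curious what this looks like at the API level, the snippet below is a minimal, hedged sketch of how an unlinked setup begins: enumerating every hardware adapter in the system and creating an independent D3D12 device on each one. It is our own illustration of the publicly documented DXGI/D3D12 calls, not anything from Oxide's engine.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Enumerate every hardware adapter and create an independent, "unlinked" D3D12 device
// on each one. Error handling is omitted for brevity.
std::vector<ComPtr<ID3D12Device>> CreateUnlinkedDevices()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue; // skip the WARP software adapter

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
        {
            // Each device has its own memory, queues, and command processors; anything
            // shared between devices has to be set up explicitly by the application.
            devices.push_back(device);
        }
    }
    return devices;
}
```

From there it is entirely up to the application which of those devices it actually uses and for what.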
As the base mode for DirectX 12 multi-GPU, unlinked mode presents each GPU as its own device, with its own memory, its own command processor, and more, accurately representing the layout of the physical hardware. What DirectX 12’s EMA brings to the table that’s new is that it allows developers to exchange data between GPUs, going beyond just finished, rendered images and potentially exchanging partially rendered frames, buffers, and other forms of data. It’s the ability to exchange multiple data types that gives EMA its power and its flexibility, as without it, it wouldn’t be possible to implement much more than AFR. EMA is the potential for multiple GPUs to work together, be it similar or disparate; no more and no less.
If this sounds very vague that’s because it is, and that in turn is because the explicit API outstrips what today’s hardware is capable of. Compared to on-board memory, any operations taking place over PCI Express are relatively slow and high latency. Some GPUs in turn handle this better than others, but at the end of the day the PCIe bus is still a bottleneck at a fraction of the speed of local memory. That means while GPUs can work together, they must do so intelligently, as we’re not yet at the point where GPUs can quickly transfer large amounts of data between each other.
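Mechanically, those transfers go through shared, cross-adapter resources. The sketch below is again our own illustration of the documented API rather than any shipping engine's code, with placeholder device names: a heap is created on a hypothetical secondary device with the cross-adapter sharing flags, exported as a shared handle, and opened on the primary device so both GPUs can place resources in it.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: share one cross-adapter heap between two already-created,
// unlinked devices. Error handling is omitted for brevity.
void ShareCrossAdapterHeap(ID3D12Device* deviceSecondary, ID3D12Device* devicePrimary,
                           ComPtr<ID3D12Heap>& heapSecondary, ComPtr<ID3D12Heap>& heapPrimary)
{
    // 1. Create a heap on the secondary GPU that other adapters are allowed to open.
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes = 64ull * 1024 * 1024; // e.g. enough room for a shared frame buffer
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    heapDesc.Flags = D3D12_HEAP_FLAG_SHARED | D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER;
    deviceSecondary->CreateHeap(&heapDesc, IID_PPV_ARGS(&heapSecondary));

    // 2. Export the heap as an NT handle, then open that handle on the primary device.
    HANDLE sharedHandle = nullptr;
    deviceSecondary->CreateSharedHandle(heapSecondary.Get(), nullptr, GENERIC_ALL,
                                        nullptr, &sharedHandle);
    devicePrimary->OpenSharedHandle(sharedHandle, IID_PPV_ARGS(&heapPrimary));
    CloseHandle(sharedHandle);

    // 3. From here each device places its own resource in its view of the heap; the
    //    secondary GPU writes data (e.g. a finished frame) and the primary GPU reads it.
}
```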
Because EMA is a blank slate, it ultimately falls to developers to put it to good use; DirectX 12 just supplies the tools. Traditional AFR implementations are one such option, and so are splitting up workloads in other fashions such as split-frame rendering (SFR), or even methods where one GPU doesn’t render a complete frame or fractions of a complete frame, passing off frames at different stages to different GPUs.
But practically speaking, a lot of the early focus on EMA development and promotion is on dGPU + iGPU, and this is because the vast majority of PCs with a dGPU also have an iGPU. Relative to even a $200 dGPU, an iGPU is going to offer a fraction of the performance, but it’s also a GPU resource that is otherwise going unused. Epic Games has been experimenting with using EMA to have iGPUs do post-processing, as finished frames are relatively small (a 1080p60 stream of FP32 frames works out to only about 2GB/sec, a fraction of PCIe 3.0 x16’s bandwidth), post-processing is fairly lightweight in resource requirements, and it typically has a predictable processing time.
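That bandwidth figure is easy to sanity check with some back-of-envelope math, assuming an uncompressed 4-channel FP32 frame buffer:

```cpp
// Back-of-envelope check of the ~2GB/sec figure for handing finished frames to an iGPU
// (assumes an uncompressed RGBA FP32 frame buffer at 1920x1080, 60 frames per second).
constexpr double kBytesPerPixel  = 4.0 * 4.0;                               // 16 bytes (4 channels x 4 bytes)
constexpr double kPixelsPerFrame = 1920.0 * 1080.0;                         // ~2.07M pixels
constexpr double kBytesPerSecond = kPixelsPerFrame * kBytesPerPixel * 60.0; // ~1.99 GB/sec
constexpr double kPCIe3x16GBps   = 15.75;                                   // ~15.75 GB/sec per direction
```

Even this worst-case uncompressed stream consumes only around an eighth of the bus’s theoretical bandwidth, which goes some way to explaining why post-processing offload is such an attractive first use of EMA.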
Moving on, building on top of unlinked mode is EMA’s linked mode. Linked mode is by and large the equivalent of SLI/CrossFire for EMA, and is designed for systems where all GPUs being used are near-identical. Within linked mode all of the GPUs are pooled and presented to applications as a single GPU, just with multiple command processors and multiple memory pools due to the limits of the PCIe bus. Because linked mode is restricted to similar GPUs, developers gain even more power and control, as linked GPUs will be from the same vendor and use the same data formats at every step.
Broadly speaking, linked mode will be both easier and harder for developers to use relative to unlinked mode. Unlike unlinked mode there are certain assumptions that can be made about the hardware and what it’s capable of, and developers won’t need to juggle the complications of using GPUs from multiple vendors at once. On the other hand this is the most powerful mode because of all of the options it presents developers, with more complex rendering techniques likely to be necessary to extract the full performance benefit of linked mode.
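At the API level, linked mode is exposed through "nodes": a single D3D12 device reports how many physical GPUs are in the link, and per-node command queues and resources are selected with bit masks. The snippet below is a minimal sketch of that documented mechanism with placeholder names, not code from any shipping game.

```cpp
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical sketch: with linked adapters a single device reports multiple "nodes",
// and per-node queues and resources are selected with bit masks.
void CreateQueueOnSecondNode(ID3D12Device* device, ComPtr<ID3D12CommandQueue>& queueNode1)
{
    UINT nodeCount = device->GetNodeCount(); // number of physical GPUs in the link
    if (nodeCount < 2)
        return;                              // nothing to do on a single-GPU system

    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
    queueDesc.NodeMask = 1u << 1;            // bit 1 selects the second GPU in the link
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queueNode1));
}
```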
Ultimately, one point that Microsoft and developers have continually reiterated in their talks is that explicit multi-adapter, like so many other low-level aspects of DirectX 12, is largely up to the developers to put to good use. The API provides a broad set of capabilities – tempered a bit by hardware limitations and how quickly GPUs can exchange data – but unlike DirectX 11 and implicit multi-adapter, it’s developers that define how GPUs should work together. So whether a game supports any kind of EMA operation, and whether this means combining multiple dGPUs from the same vendor, multiple dGPUs from different vendors, or a dGPU and an iGPU, is a question of software more than it is of hardware.
Ashes of the Singularity: Unlinked Explicit Multi-Adapter with AFR
Built on Oxide’s Nitrous engine, Ashes of the Singularity will be the first full game released using the engine. Oxide had previously used the engine as part of their Star Swarm technical demo, showcasing the benefits of vastly improved draw call throughput under Mantle and DirectX 12. As one might expect then, for their first retail game Oxide is developing a game around Nitrous’s DX12 capabilities, with an eye towards putting a large number of draw calls to good use and developing something that might not have been as good looking under DirectX 11.
That resulting game is Ashes of the Singularity, a massive-scale real time strategy game. Ashes is a spiritual successor of sorts to 2007’s Supreme Commander, a game with a reputation for its technical ambition. Similar to Supreme Commander, Oxide is aiming high with Ashes, and while the current alpha is far from optimized, they have made it clear that even the final version of the game will push CPUs and GPUs hard. Between a complex game simulation (including ballistic and line of sight checks for individual units) and the rendering resources needed to draw all of those units and their weapons effects in detail over a large playing field, I’m expecting that the final version of Ashes will be the most demanding RTS we’ve seen in some number of years.
Because of its high resource requirements Ashes is also a good candidate for multi-GPU scaling, and for this reason Oxide is working on implementing DirectX 12 explicit multi-adapter support into the game. For Ashes, Oxide has opted to start by implementing support for unlinked mode, both because this is a building block for implementing linked mode later on and because from a tech demo point of view this allows Oxide to demonstrate unlinked mode’s most nifty feature: the ability to utilize multiple dissimilar (non-homogenous) GPUs within a single game. EMA with dissimilar GPUs has been shown off in bits and pieces at developer events like Microsoft’s BUILD, but this is the first time an in-game demo has been made available outside of those conferences.
In order to demonstrate EMA and explicit synchronization in action, Oxide has started things off by using a basic alternate frame rendering implementation for the game. As we briefly mentioned in our technical overview of DX12 explicit multi-adapter, EMA puts developers in full control of the rendering process, which for Oxide meant implementing AFR from scratch. This includes assigning frames to each GPU, handling frame buffer transfers from the secondary GPU to the primary GPU, and most importantly of all controlling frame pacing, which is typically the hardest part of AFR to get right.
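In broad strokes, the skeleton of such an AFR loop under unlinked EMA looks something like the sketch below. This is purely our own illustration with hypothetical helper functions (RecordAndSubmitFrame, CopyFrameToPrimary, PaceAndPresent) standing in for an engine's real submission path, and not Oxide's code; the key DirectX 12 pieces are the per-device command queues and a shared cross-adapter fence (created with D3D12_FENCE_FLAG_SHARED | D3D12_FENCE_FLAG_SHARED_CROSS_ADAPTER) that tells the primary GPU when the secondary GPU's frame is ready to be copied over.

```cpp
#include <d3d12.h>

// Placeholder hooks into the engine, declared only so the sketch is self-contained.
void RecordAndSubmitFrame(ID3D12CommandQueue* queue, UINT64 frame);
void CopyFrameToPrimary(ID3D12CommandQueue* primaryQueue, UINT64 frame);
void PaceAndPresent(UINT64 frame);

// Illustrative AFR skeleton: alternate whole frames between two unlinked devices and use
// a shared cross-adapter fence to synchronize the frame buffer copy back to the primary.
void RunAfrLoop(ID3D12CommandQueue* queues[2], ID3D12Fence* sharedFence[2], UINT primaryGpu)
{
    const UINT gpuCount = 2;
    for (UINT64 frame = 1; ; ++frame)
    {
        UINT gpu = static_cast<UINT>(frame % gpuCount); // even frames -> GPU 0, odd -> GPU 1

        RecordAndSubmitFrame(queues[gpu], frame);       // record & execute this GPU's command lists

        // The rendering GPU signals its shared fence once the frame is complete...
        queues[gpu]->Signal(sharedFence[gpu], frame);

        if (gpu != primaryGpu)
        {
            // ...and the primary GPU waits on that fence before copying the finished frame
            // buffer across PCIe into its own memory for presentation.
            queues[primaryGpu]->Wait(sharedFence[gpu], frame);
            CopyFrameToPrimary(queues[primaryGpu], frame);
        }

        PaceAndPresent(frame); // under EMA, frame pacing is entirely the application's job
    }
}
```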
Because Oxide is using a DX12 EMA AFR implementation here, this gives Ashes quite a bit of flexibility as far as GPU compatibility goes. From a performance standpoint the basic limitations of AFR are still in place – due to the fact that each GPU is tasked with rendering a whole frame, all utilized GPUs need to be close in performance for best results – otherwise Oxide is able to support a wide variety of GPUs with one generic implementation. This includes not only AMD/AMD and NVIDIA/NVIDIA pairings, but GPU pairings that wouldn’t typically work for Crossfire and SLI (e.g. GTX Titan X + GTX 980 Ti). But most importantly of course, this allows Ashes to support using an AMD video card and an NVIDIA video card together as well. In fact beyond the aforementioned performance limitations, Ashes’ AFR mode should work on any two DX12 compliant GPUs.
From a technical standpoint, Oxide tells us that they’ve had a bit of a learning curve in getting EMA working for Ashes – particularly since they’re the first – but that they’re happy with the results. Obviously the fact that this even works is itself a major accomplishment, and in our experience frame pacing with v-sync disabled and tearing enabled feels smooth on the latest generation of high-end cards. Otherwise Oxide is still experimenting with the limits of the hardware and the API; they’ve told us that so far they’ve found that there’s plenty of bandwidth over PCIe for shared textures, and meanwhile they’re incurring a roughly 2ms penalty in transferring data between GPUs.
With that said and to be very clear here, the game itself is still in its alpha state, and the multi-adapter support is not even at alpha (ed: nor is it in the public release at this time). So Ashes’ explicit multi-adapter support is a true tech demo, intended to first and foremost show off the capabilities of EMA rather than what performance will be like in the retail game. As it stands the AFR-enabled build of Ashes occasionally crashes at load time for no obvious reason when AFR is enabled. Furthermore there are stability/corruption issues with newer AMD and NVIDIA drivers, which has required us to use slightly older drivers that have been validated to work. Overall while AMD and NVIDIA have their DirectX 12 drivers up and running, as has been the case with past API launches it’s going to take some time for the two GPU firms to lock down every new feature of the API and driver model and to fully knock out all of their driver bugs.
Finally, Oxide tells us that going forward they will be developing support for additional EMA modes in Ashes. As the current unlinked EMA implementation is stabilized, the next thing on their list will be to add support for linked EMA for better performance on similar GPUs. Oxide is still exploring linked EMA, but somewhat surprisingly they tell us that unlinked EMA already unlocks much of the performance of their AFR implementation. A linked EMA implementation in turn may only improve multi-GPU scaling by a further 5-10%. Beyond that, they will also be looking into alternative implementations of multi-GPU rendering (e.g. work sharing of individual frames), though that is farther off and will likely hinge on other factors such as hardware capabilities and the state of DX12 drivers from each vendor.
The Test
For our look at Ashes’ multi-adapter performance, we’re using Windows 10 with the latest updates on our GPU testbed. This provides plenty of CPU power for the game, and we’ve selected sufficiently high settings to ensure that we’re GPU-bound at all times.
For GPUs we’re using NVIDIA’s GeForce GTX Titan X and GTX 980 Ti, along with AMD’s Radeon R9 Fury X and R9 Fury for the bulk of our testing. As roughly comparable cards in price and performance, the GTX 980 Ti and R9 Fury X are our core comparison cards, with the GTX Titan X and R9 Fury backing them up. Meanwhile we’ve also done a limited amount of testing with the GeForce GTX 680 and Radeon HD 7970 to showcase how well Ashes’ multi-adapter support works on older cards.
Finally, on the driver side of matters we’re using the most recent drivers from AMD and NVIDIA that work correctly in multi-adapter mode with this build of Ashes. For AMD that’s Catalyst 15.8 and for NVIDIA that’s release 355.98. We’ve also thrown in single-GPU results with the latest drivers (Catalyst 15.10 and release 358.50 respectively) to quickly showcase where single-GPU performance stands with these newest drivers.
| CPU: | Intel Core i7-4960X @ 4.2GHz |
| --- | --- |
| Motherboard: | ASRock Fatal1ty X79 Professional |
| Power Supply: | Corsair AX1200i |
| Hard Disk: | Samsung SSD 840 EVO (750GB) |
| Memory: | G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26) |
| Case: | NZXT Phantom 630 Windowed Edition |
| Monitor: | Asus PQ321 |
| Video Cards: | AMD Radeon R9 Fury X, ASUS STRIX R9 Fury, AMD Radeon HD 7970, NVIDIA GeForce GTX Titan X, NVIDIA GeForce GTX 980 Ti, NVIDIA GeForce GTX 680 |
| Video Drivers: | NVIDIA Release 355.98, NVIDIA Release 358.50, AMD Catalyst 15.8 Beta, AMD Catalyst 15.10 Beta |
| OS: | Windows 10 Pro |
Ashes GPU Performance: Single & Mixed High-End GPUs
Since seeing is believing, we’ll start things off with a look at some direct captured recordings of Ashes running the built-in benchmark. These recordings showcase a Radeon R9 Fury X and a GeForce GTX 980 Ti in AFR mode, with each video swapping the primary and secondary video card. Both videos were captured with our 2560x1440 settings and with v-sync on, though YouTube limits 60fps videos to 1080p at this time.
Overall you’d be hard-pressed to find a difference between the two videos. No matter which GPU is primary, both setups render correctly and without issue, showcasing that DirectX 12 explicit multi-adapter and Ashes’ AFR implementation on top of it is working as expected.
Diving into our results then, let’s start with performance at 2560x1440.
It’s interesting to note that the settings were chosen first and the cards second. So the fact that the GTX 980 Ti and R9 Fury X end up being so close in average performance comes as a pleasant surprise. With AFR performance gains dependent in part on how similar the two cards are, this should give us a better look at scaling than a pair of cards that differ widely in performance would.
In any case what we find is that with a single card setup the GTX 980 Ti and R9 Fury X are within 5% of each other with the older driver sets we needed to use for AFR compatibility. Both AMD and NVIDIA do see some performance gains with newer driver sets, with NVIDIA picking up 8% while AMD picks up 5%.
But more importantly let’s talk about multi-GPU setups. If everything is working correctly and there are no unexpected bottlenecks, then on paper in mixed GPU setups we should get similar results no matter which card is the primary. And indeed that’s exactly what we find here, with only 1.4fps (2%) separating the GeForce + Radeon setups. Using the Radeon Fury X as the primary card gets the best results at 70.8fps, while swapping the order to let the GTX 980 Ti lead gives us 69.4fps.
However would you believe that the mixed GPU setups are faster than the homogeneous setups? Trailing the mixed setups is the R9 Fury X + R9 Fury setup, averaging 67.1fps and trailing the slower mixed setup by 3.5%. Slower still – and unexpectedly so – is the GTX 980 Ti + GTX Titan X setup, which averages just 61.6fps, some 12% slower than the GTX 980 Ti + Fury X setup. The card setup is admittedly somewhat unusual here – in order to consistently use the GTX 980 Ti as the primary card we had to make the secondary card a GTX Titan X, seeing as how we don’t have another GTX 980 Ti or a third-tier GM200 card comparable to the R9 Fury – but even so the only impact here should be that the GTX Titan X doesn’t get to stretch its legs quite as much since it needs to wait on the slightly slower GTX 980 Ti primary in order to stay in sync.
Looking at performance from the perspective of overall performance gains, the extra performance we see here isn’t going to be chart-topping – an optimized SLI/CF setup can get better than 80% gains – but overall the data here confirms our earlier raw results: we’re actually seeing a significant uptick in performance with the mixed GPU setups. R9 Fury X + GTX 980 Ti is some 75% faster than a single R9 Fury X while GTX 980 Ti + R9 Fury X is 64% faster than a single GTX 980 Ti. Meanwhile the dual AMD setup sees a 66% performance gain, followed by the dual NVIDIA setup at only 46%.
The most surprising thing about all of this is that the greatest gains are with the mixed GPU setups. It’s not immediately clear why this is – if there’s something more efficient about having each vendor and their drivers operating one GPU instead of two – or if AMD and NVIDIA are just more compatible than either company cares to admit. Either way this shows that even with Ashes’ basic AFR implementation, multi-adapter rendering is working and working well. Meanwhile the one outlier, as we briefly discussed before, is the dual NVIDIA setup, which just doesn’t scale quite as well.
Finally let’s take a quick look at the GPU utilization statistics from MSI Afterburner, with our top-performing R9 Fury X + GTX 980 Ti setup. Here we can see the R9 Fury X in its primary card role operate at near-100% load the entire time, while the GTX 980 Ti secondary card is averaging close to 90% utilization. The difference, as best as we can tell, is the fact that the secondary card has to wait on additional synchronization information from the primary card, while the primary card is always either rendering or reading in a frame from the secondary card.
3840x2160
Now we’ll kick things up a notch by increasing the resolution to 3840x2160.
Despite the fact that this was just a resolution increase, the performance landscape has shifted by more than we would expect here. The top configuration is still the mixed GPU configuration, with the R9 Fury X + GTX 980 Ti setup taking the top spot. However the inverse configuration of the GTX 980 Ti + R9 Fury X isn’t neck-and-neck this time, rather it’s some 15% slower. Meanwhile the pokey-at-2560 dual NVIDIA setup is now in second place by a hair, trailing the mixed GPU setup by 10%. Following that at just 0.1fps lower is the dual AMD setup with 46.7fps.
These tests are run multiple times and we can consistently get these results, so we are not looking at a fluke. At 3840x2160 the fastest setup by a respectable 10.5% margin is the R9 Fury X + GTX 980 Ti setup, which for an experimental implementation of DX12 unlinked explicit multi-adapter is quite astonishing. “Why not both” may be more than just a meme here…
From a technical perspective the fact that there’s now a wider divergence between the mixed GPU setups is unexpected, but not necessarily irrational. If the R9 Fury X is better at reading shared resources than the GTX 980 Ti – be it due to hardware differences, driver differences, or both – then that would explain what we’re seeing. Though at the same time we can’t rule out the fact that this is an early tech demo of this functionality.
As you’d expect from those raw numbers, the best perf gains are with the R9 Fury X + GTX 980 Ti, which picks up 66% over the single-GPU setups. After that the dual AMD GPU setup picks up 50%, the dual NVIDIA setup a more respectable 46%, and finally the mixed GTX 980 Ti + R9 Fury X setup tops out with a 40% performance gain.
Finally, taking a quick look at GPU frametimes as reported by Ashes’ internal performance counters, we can see that across all of the multi-GPU setups there’s a bit of consistent variance going on. Overall there’s at least a few milliseconds of difference between the alternating frames, with the dual NVIDIA setup faring the best while the dual AMD setup fares the worst. Otherwise the performance leader, the R9 Fury X + GTX 980 Ti, only averages a bit more variance than the dual NVIDIA setup. At least in this one instance, there’s some evidence to suggest that NVIDIA secondary cards have a harder time supplying frame data to the primary card (regardless of its make), while NVIDIA and AMD primary cards are similar in read performance.
Ashes GPU Performance: Single & Mixed 2012 GPUs
While Ashes’ multi-GPU support sees solid performance gains with current-generation high-end GPUs, we wanted to see if those gains would extend to older DirectX 12 GPUs. To that end we’ve put the GeForce GTX 680 and the Radeon HD 7970 through a similar test, running the Ashes benchmark at 2560x1440 with Medium image quality and no MSAA.
First off, unlike our high-end GPUs there’s a distinct performance difference between our AMD and NVIDIA cards. The Radeon HD 7970 performs 22% better here, averaging 30fps to the GTX 680’s 24.5fps. So right off the bat we’re entering an AFR setup with a moderately unbalanced set of cards.
Once we do turn on AFR, two very different things happen. The GTX 680 + HD 7970 setup is an outright performance regression, with performance down 40% from the single GTX 680. On the other hand the HD 7970 + GTX 680 setup sees an unexpectedly good performance gain from AFR, picking up a further 55% to 46.4fps.
As this test covers a smaller number of combinations it’s not clear where the bottlenecks are, but it’s nonetheless very interesting how we get such widely different results depending on which card is in the lead. In the GTX 680 + HD 7970 setup, either the GTX 680 is a bad leader or the HD 7970 is a bad follower, and this leads to the setup spinning its proverbial wheels. Otherwise letting the HD 7970 lead and the GTX 680 follow sees a bigger performance gain than we would have expected for a moderately unbalanced setup with a pair of cards that were never known for their efficient PCIe data transfers. So long as you let the HD 7970 lead, at least in this case you could absolutely get away with a mixed GPU pairing of older GPUs.
First Thoughts
Wrapping up our first look at Ashes of the Singularity and DirectX 12 Explicit Multi-Adapter, when Microsoft first unveiled the technology back at BUILD 2015, I figured it would only be a matter of time until someone put together a game utilizing the technology. After all, Epic and Square already had their tech demos up and running. However with the DirectX 12 ecosystem still coming together here in the final months of 2015 – and that goes for games as well as drivers – I wasn’t expecting something quite this soon.
As it stands the Ashes of the Singularity multi-GPU tech demo is just that, a tech demo for a game that itself is only in Alpha testing. There are still optimizations to be made and numerous bugs to be squashed. But despite all of that, seeing AMD and NVIDIA video cards working together to render a game is damn impressive.
Seeing as this build of Ashes is a tech demo, I’m hesitant to read too much into the precise benchmark numbers we’re seeing. That said, the fact that the fastest multi-GPU setup was a mixed AMD/NVIDIA GPU setup was something I wasn’t expecting and definitely makes it all the more interesting. DirectX 11 games are going to be around for a while longer yet, so we’re likely still some time away from a mixed GPU gaming setup being truly viable, but it will be interesting to see just what Oxide and other developers can pull off with explicit multi-adapter as they become more familiar with the technology and implement more advanced rendering modes.
Meanwhile it’s interesting to note just how far the industry as a whole has come since 2005 or even 2010. GPU architectures have become increasingly similar and tighter API standards have greatly curtailed the number of implementation differences that would prevent interoperability. And with Explicit Multi-Adapter, Microsoft and the GPU vendors have laid down a solid path for allowing game developers to finally tap the performance of multiple GPUs in a system, both integrated and discrete.
The timing couldn’t be any better either. As integrated GPUs have consumed the low-end GPU market and both CPU vendors devote more die space than ever to their respective integrated GPUs, using a discrete GPU leaves an increasingly large amount of silicon unused in the modern gaming system. Explicit multi-adapter in turn isn’t the silver bullet to that problem, but it is a means to finally putting the integrated GPU to good use even when it’s not a system’s primary GPU.
However with that said, it’s important to note that what happens from here is ultimately more in the hands of game developers than hardware developers. Given the nature of the explicit API, it’s now the game developers that have to do most of the legwork on implementing multi-GPU, and I’m left to wonder how many of them are up to the challenge. Hardware developers have an obvious interest in promoting and developing multi-GPU technology in order to sell more GPUs – which is how we got SLI and Crossfire in the first place – but software developers don’t have that same incentive.
Ultimately as gamers all we can do is take a wait-and-see approach to the whole matter. But as DirectX 12 game development ramps up, I am cautiously optimistic that positive experiences like Ashes will help encourage other developers to plan for multi-adapter support as well.