Original Link: https://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm



About a year and a half ago AMD kicked off the public half of a race to improve the state of graphics APIs. Dubbed "Mantle", AMD’s in-house API for their Radeon cards stripped away the abstraction and inefficiencies of traditional high-level APIs like DirectX 11 and OpenGL 4, and instead gave developers a means to access the GPU in a low-level, game console-like manner. The impetus: with a low-level API, engine developers could achieve better performance than with a high-level API, sometimes vastly exceeding what DirectX and OpenGL could offer.

While AMD was the first such company to publicly announce their low-level API, they were not the last. 2014 saw the announcement of APIs such as DirectX 12, OpenGL Next, and Apple’s Metal, all of which would implement similar ideas for similar performance reasons. It was a renaissance in the graphics API space after many years of slow progress, and one desperately needed to keep pace with the progress of both GPUs and CPUs.

In the PC graphics space we’ve already seen how early versions of Mantle perform, with Mantle offering some substantial boosts in performance, especially in CPU-bound scenarios. As awesome as Mantle is though, it is currently a de-facto proprietary AMD API, which means it can only be used with AMD GPUs; what about NVIDIA and Intel GPUs? For that we turn towards DirectX, Microsoft’s traditional cross-vendor API that will be making the same jump as Mantle, but using a common API for the benefit of every vendor in the Windows ecosystem.

DirectX 12 was first announced at GDC 2014, where Microsoft unveiled the existence of the new API along with their planned goals, a brief demonstration of very early code, and limited technical details about how the API would work. Since then Microsoft has been hard at work on DirectX 12 as part of the larger Windows 10 development effort, culminating in the release of the latest Windows 10 Technical Preview, Build 9926, which is shipping with an early preview version of DirectX 12.


GDC 2014 - DirectX 12 Unveiled: 3DMark 2011 CPU Time: Direct3D 11 vs. Direct3D 12

With the various pieces of Microsoft’s latest API finally coming together, today we will be taking our first look at the performance future of DirectX. The API is stabilizing, video card drivers are improving, and the first DirectX 12 application has been written; Microsoft and their partners are finally ready to show off DirectX 12. To that end, today we’ll be looking at DirectX 12 through Oxide Games’ Star Swarm benchmark, our first DirectX 12 application and a true API efficiency torture test.

Does DirectX 12 bring the same kind of performance benefits we saw with Mantle? Can it resolve the CPU bottlenecking that DirectX 11 struggles with? How well does the concept of a low-level API work for a common API with disparate hardware? Let’s find out!



The Current State of DirectX 12 & WDDM 2.0

Although DirectX 12 is up and running in the latest public release of Windows 10, it and many of its related components are still under development. Windows 10 itself is still feature-incomplete, so what we’re looking at here today doesn’t even qualify as beta software. As a result today’s preview should be taken as just that: an early preview. There are still bugs, and performance and compatibility are subject to change. But as of now everything is far enough along that we can finally get a reasonable look at what DirectX 12 is capable of.

From a technical perspective the DirectX 12 API is just one part of a bigger picture. Like Microsoft’s last couple of DirectX 11 minor version upgrades, DirectX 12 goes hand-in-hand with a new version of the Windows Display Driver Model, WDDM 2.0. In fact WDDM 2.0 is the biggest change to WDDM since the driver model was introduced in Windows Vista, and as a result DirectX 12 itself represents a very large overhaul of the Windows GPU ecosystem.

Top: Radeon R9 290X. Bottom: GeForce GTX 980

Microsoft has not released too many details on WDDM 2.0 so far – more information will be released around GDC 2015 – but WDDM 2.0 is based around enabling DirectX 12, adding the necessary features to the kernel and display drivers in order to support the API above it. Among the features tied to WDDM 2.0 are DX12’s explicit memory management and dynamic resource indexing, both of which wouldn’t have been nearly as performant under WDDM 1.3. WDDM 2.0 is also responsible for some of the more basic CPU efficiency optimizations in DX12, such as changes to how memory residency is handled and how DX12 applications can more explicitly control residency.

The overhauling of WDDM for 2.0 means that graphics drivers are impacted as well as the OS, and like Microsoft, NVIDIA and AMD have been preparing for WDDM 2.0 with updated graphics drivers. These drivers are still a work in progress, and as a result not all hardware support is enabled and not all bugs have been worked out.

DirectX 12 Support Status
Architecture                      Current Status   Supported At Launch
AMD GCN 1.2 (285)                 Working          Yes
AMD GCN 1.1 (290/260 Series)      Working          Yes
AMD GCN 1.0 (7000/200 Series)     Buggy            Yes
NVIDIA Maxwell 2 (900 Series)     Working          Yes
NVIDIA Maxwell 1 (750 Series)     Working          Yes
NVIDIA Kepler (600/700 Series)    Working          Yes
NVIDIA Fermi (400/500 Series)     Not Active       Yes

In short, the latest products from both AMD and NVIDIA are up and running under WDDM 2.0, but not all of their earlier products are. In AMD’s case GCN 1.0 cards are supported under their WDDM 2.0 driver, but we are encountering texturing issues in Star Swarm that do not occur with GCN 1.1 and later. Meanwhile in NVIDIA’s case, as is common for NVIDIA beta drivers, they only ship with support enabled for their newer GPUs – Kepler, Maxwell 1, and Maxwell 2 – with Fermi support disabled. Both AMD and NVIDIA have already committed to supporting DirectX 12 (and by extension WDDM 2.0) on GCN 1.0 and later and Fermi and later respectively, so while we can’t test these products today, they should be working by the time DirectX 12 ships.

Also absent for the moment is a definition for DirectX 12’s Feature Level 12_0 and DirectX 11’s 11_3. Separate from the low-level API itself, DirectX 12 and its high-level counterpart DirectX 11.3 will introduce new rendering features such as volume tiled resources and conservative rasterization. While all of the above listed video cards will support the DirectX 12 low-level API, only the very newest video cards will support FL 12_0, and consequently be fully DX12 compliant on both a feature and API basis. Like so many other aspects of DirectX 12, Microsoft is saving any discussion of feature levels for GDC, at which time we should find out what the final feature requirements will be and which (if any) current cards will fully support FL 12_0.

Finally, with Microsoft’s announcement of their Windows 10 plans last month, Microsoft is also finally clarifying their plans for the deployment of DirectX 12. Because DirectX 12 and WDDM 2.0 are tied at the hip, and by extension tied to Windows 10, DirectX 12 will only be available on Windows 10. Windows 8/8.1 and Windows 7 will not be receiving DirectX 12 support.

DirectX 12 Supported OSes
OS            Will Support DX12?   Required WDDM Version
Windows 10    Yes                  2.0
Windows 8.1   No                   N/A
Windows 8     No                   N/A
Windows 7     No                   N/A

Backporting DirectX 12 to earlier OSes would require backporting WDDM 2.0 as well, which brings with it several issues due to the fact that WDDM 2.0 is a kernel component. Microsoft would either have to compromise on WDDM 2.0 features in order to make it work on these older kernels, or alternatively would have to more radically overhaul these kernels to accommodate the full WDDM 2.0 feature set, the latter of which is a significant engineering task and carries a significant risk of breaking earlier Windows installations. Microsoft has already tried this once before in backporting parts of Direct3D 11.1 and WDDM 1.2 to Windows 7, only to discover that even that smaller-scale project had compatibility problems. A backport of DirectX 12 would in turn be even more problematic.

The bright side of all of this is that with Microsoft’s plans to offer Windows 10 as a free upgrade for Windows 7/8/8.1 users, the issue is largely rendered moot. Though DirectX 12 isn’t being backported, Windows users will instead be able to jump forward for free, so unlike Windows 8 this will not require spending money on a new OS just to gain access to the latest version of DirectX. This in turn is consistent with Microsoft’s overall plans to bring all Windows users up to Windows 10 rather than letting the market get fragmented among different Windows versions (and risk repeating another XP), so the revelation that DirectX 12 will not get backported has largely been expected since Microsoft’s Windows 10 announcement.

Meanwhile we won’t dwell on the subject too much, but DirectX 12 being limited to Windows 10 does open up a window of opportunity for Mantle and OpenGL Next. With Mantle already working on Windows 7/8 and OpenGL Next widely expected to be similarly portable, these APIs will be the only low-level APIs available to earlier Windows users.



Star Swarm & The Test

For today’s DirectX 12 preview, Microsoft and Oxide Games have supplied us with a newer version of Oxide’s Star Swarm demo. Originally released in early 2014 as a demonstration of Oxide’s Nitrous engine and the capabilities of Mantle, Star Swarm is a massive space combat demo that is designed to push the limits of high-level APIs and demonstrate the performance advantages of low-level APIs. Due to its use of thousands of units and other effects that generate a high number of draw calls, Star Swarm can push over 100K draw calls, a massive workload that causes high-level APIs to simply crumple.

Because Star Swarm generates so many draw calls, it is essentially a best-case scenario test for low-level APIs, exploiting the fact that high-level APIs can’t effectively spread out the draw call workload over several CPU threads. As a result the performance gains from DirectX 12 in Star Swarm are going to be much greater than in most (if not all) video games, but nonetheless it’s an effective tool to demonstrate the performance capabilities of DirectX 12 and to showcase how it is capable of better distributing work over multiple CPU threads.

It should be noted that while Star Swarm itself is a synthetic benchmark, the underlying Nitrous engine is relevant and is being used in multiple upcoming games. Stardock is using the Nitrous engine for their forthcoming Star Control game, and Oxide is using the engine for their own game, set to be announced at GDC 2015. So although Star Swarm is still a best case scenario, many of its lessons will be applicable to these future games.

As for the benchmark itself, we should also note that Star Swarm is a non-deterministic simulation. The benchmark is based on having two AI fleets fight each other, and as a result the outcome can differ from run to run. The good news is that although it’s not a deterministic benchmark, the benchmark’s RTS mode is reliable enough to keep the run-to-run variation low enough to produce reasonably consistent results. Among individual runs we’ll still see some fluctuations, while the benchmark will reliably demonstrate larger performance trends.


Star Swarm RTS Mode

The Test

For today’s preview Microsoft, NVIDIA, and AMD have provided us with the necessary WDDM 2.0 drivers to enable DirectX 12 under Windows 10. The NVIDIA driver is 349.56 and the AMD driver is 15.200. At this time we do not know when these early WDDM 2.0 drivers will be released to the public, though we would be surprised not to see them released by the time of GDC in early March.

In terms of bugs and other known issues, Microsoft has informed us that there are some known memory and performance regressions in the current WDDM 2.0 path that have since been fixed in interim builds of Windows. In particular the WDDM 2.0 path may see slightly lower performance than the WDDM 1.3 path for older drivers, and there is an issue with memory exhaustion. For this reason Microsoft has suggested that a 3GB card is required to use the Star Swarm DirectX 12 binary, although in our tests we have been able to run it on 2GB cards seemingly without issue. Meanwhile DirectX 11 deferred context support is currently broken in the combination of Star Swarm and NVIDIA's drivers, causing Star Swarm to immediately crash, so these results are with D3D 11 deferred contexts disabled.

For today’s article we are looking at a small range of cards from both AMD and NVIDIA to showcase both performance and compatibility. For NVIDIA we are looking at the GTX 980 (Maxwell 2), GTX 750 Ti (Maxwell 1), and GTX 680 (Kepler). For AMD we are looking at the R9 290X (GCN 1.1), R9 285 (GCN 1.2), and R7 260X (GCN 1.1). As we mentioned earlier, support for Fermi and GCN 1.0 cards will be forthcoming in future drivers.

Meanwhile on the CPU front, to showcase the performance scaling of Direct3D we are running the bulk of our tests on our GPU testbed with 3 different settings to roughly emulate high-end Core i7 (6 cores), i5 (4 cores), and i3 (2 cores) processors. Unfortunately we cannot control for our 4960X’s L3 cache size; however, that should not be a significant factor in these benchmarks.

DirectX 12 Preview CPU Configurations (i7-4960X)
Configuration      Emulating
6C/12T @ 4.2GHz    Overclocked Core i7
4C/4T @ 3.8GHz     Core i5-4670K
2C/4T @ 3.8GHz     Core i3-4370

Though not included in this preview, AMD’s recent APUs should slot between the 2 and 4 core options thanks to the design of AMD’s CPU modules.

CPU: Intel Core i7-4960X @ 4.2GHz
Motherboard: ASRock Fatal1ty X79 Professional
Power Supply: Corsair AX1200i
Hard Disk: Samsung SSD 840 EVO (750GB)
Memory: G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26)
Case: NZXT Phantom 630 Windowed Edition
Monitor: Asus PQ321
Video Cards: AMD Radeon R9 290X
             AMD Radeon R9 285
             AMD Radeon R7 260X
             NVIDIA GeForce GTX 980
             NVIDIA GeForce GTX 750 Ti
             NVIDIA GeForce GTX 680
Video Drivers: NVIDIA Release 349.56 Beta
               AMD Catalyst 15.200 Beta
OS: Windows 10 Technical Preview 2 (Build 9926)

Finally, while we’re going to take a systematic look at DirectX 12 from both a CPU standpoint and a GPU standpoint, we may as well answer the first question on everyone’s mind: does DirectX 12 work as advertised? The short answer: a resounding yes.

Star Swarm GPU Scaling - Extreme Quality (4 Cores)



CPU Scaling

Diving into our look at DirectX 12, let’s start with what is going to be the most critical component for a benchmark like Star Swarm, the CPU scaling.

Because Star Swarm is designed to exploit the threading inefficiencies of DirectX 11, the biggest gains from switching to DirectX 12 on Star Swarm come from removing the CPU bottleneck. Under DirectX 11 the bulk of Star Swarm’s batch submission work happens under a single thread, and as a result the benchmark is effectively bottlenecked by single-threaded performance, unable to scale out with multiple CPU cores. This is one of the issues DirectX 12 sets out to resolve, with the low-level API allowing Oxide to more directly control how work is submitted, and as such better balance it over multiple CPU cores.
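To make the threading difference concrete, here is a minimal structural sketch in Python of what "spreading batch submission over multiple cores" looks like: the frame's draw calls are split into chunks, each worker records its own command list, and the lists are then handed to the queue in their original order. This is not the Direct3D 12 API and not Oxide's code; `record_command_list`, `split_evenly`, and `build_frame` are hypothetical names, and because of Python's interpreter lock the sketch shows the shape of the work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(draw_calls):
    """Turn one chunk of draw calls into a list of commands (stand-in work)."""
    return [("draw", mesh_id) for mesh_id in draw_calls]


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def build_frame(draw_calls, workers):
    """Record one command list per worker in parallel, then submit them in order."""
    chunks = split_evenly(draw_calls, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in chunk order, so draw order is preserved.
        command_lists = list(pool.map(record_command_list, chunks))
    return [cmd for command_list in command_lists for cmd in command_list]


# Hypothetical frame of 10 draw calls recorded by 4 workers.
frame = build_frame(list(range(10)), workers=4)
```

The single-threaded case is simply `workers=1`, which is roughly the situation the DirectX 11 path finds itself in: all of the recording work lands on one thread regardless of how many cores are available.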

Star Swarm CPU Scaling - Extreme Quality - GeForce GTX 980

Star Swarm CPU Scaling - Extreme Quality - Radeon R9 290X

Starting with a look at CPU scaling on our fastest cards, what we find is that besides the absurd performance difference between DirectX 11 and DirectX 12, performance scales roughly as we’d expect among our CPU configurations. Star Swarm's DirectX 11 path, being single-threaded bound, scales very slightly with clockspeed and core count increases. The DirectX 12 path on the other hand scales up moderately well from 2 to 4 cores, but doesn’t scale up beyond that. This is due to the fact that at these settings, even pushing over 100K draw calls, both GPUs are solidly GPU limited. Anything more than 4 cores goes to waste as we’re no longer CPU-bound. Which means that we don’t even need a highly threaded processor to take advantage of DirectX 12’s strengths in this scenario, as even a 4 core processor provides plenty of kick.
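The plateau beyond 4 cores follows from a simple bottleneck model, sketched below in Python. The model assumes a frame takes as long as the slower of the CPU and the GPU, and that the CPU side has a small serial portion plus a portion that divides evenly across cores. Both the model and every number in it (2ms serial, 40ms parallel, 15ms GPU) are illustrative assumptions, not measurements from our testbed.

```python
def frame_rate(cores, serial_cpu_ms, parallel_cpu_ms, gpu_ms):
    """Frames per second when each frame costs max(CPU time, GPU time)."""
    cpu_ms = serial_cpu_ms + parallel_cpu_ms / cores
    frame_ms = max(cpu_ms, gpu_ms)  # the slower side sets the pace
    return 1000.0 / frame_ms


# Hypothetical workload: 2ms serial CPU work, 40ms parallel CPU work, 15ms GPU work.
for cores in (2, 4, 6):
    fps = frame_rate(cores, serial_cpu_ms=2.0, parallel_cpu_ms=40.0, gpu_ms=15.0)
    print(f"{cores} cores: {fps:.1f} fps")
```

With these made-up inputs the 2 core case is CPU-bound (22ms of CPU work per frame), while the 4 and 6 core cases both fall under the 15ms GPU time and therefore report the same frame rate. That is the same shape as our results: once the GPU is the limit, additional cores have nothing left to speed up.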

Meanwhile this setup also highlights the fact that under DirectX 11, there is a massive difference in performance between AMD and NVIDIA. In both cases we are completely CPU bound, with AMD’s drivers only able to deliver 1/3rd the performance of NVIDIA’s. Given that this is the original Mantle benchmark I’m not sure we should read into the DirectX 11 situation too much since AMD has little incentive to optimize for this game, but there is clearly a massive difference in CPU efficiency under DirectX 11 in this case.

Star Swarm D3D12 CPU Scaling - Extreme Quality

Having effectively ruled out the need for 6 core CPUs for Star Swarm, let’s take a look at a breakdown across all of our cards for performance with 2 and 4 cores. What we find is that Star Swarm and DirectX 12 are so efficient that only our most powerful card, the GTX 980, finds itself CPU-bound with just 2 cores. For the AMD cards and other NVIDIA cards we can get GPU bound with the equivalent of an Intel Core i3 processor, showcasing just how effective DirectX 12’s improved batch submission process can be. In fact it’s so efficient that Oxide is running both batch submission and a complete AI simulation over just 2 cores.

Star Swarm CPU Batch Submission Time (4 Cores)

Speaking of batch submission, if we look at Star Swarm’s statistics we can find out just what’s going on with batch submission. The results are nothing short of incredible, particularly in the case of AMD. Batch submission time is down from dozens of milliseconds or more to just 3-5ms for our fastest cards, an improvement of just over a whole order of magnitude. For all practical purposes the need to spend CPU time to submit batches has been eliminated entirely, with upwards of 120K draw calls being submitted in a handful of milliseconds. It is this optimization that is at the core of Star Swarm’s DirectX 12 performance improvements, and going forward it could potentially benefit many other games as well.
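For a sense of scale, the figures above can be turned into an average cost per draw call. The short Python calculation below takes the 3-5ms submission time and the 120K draw call count from the text; pairing them up is our own rough assumption, since the two figures need not come from the same frame, so the result should be read as an order-of-magnitude estimate only.

```python
def ns_per_draw_call(submission_ms, draw_calls):
    """Average submission time per draw call, in nanoseconds."""
    return submission_ms * 1_000_000 / draw_calls


low = ns_per_draw_call(3.0, 120_000)   # best case quoted above
high = ns_per_draw_call(5.0, 120_000)  # worst case quoted above
print(f"{low:.1f} to {high:.1f} ns per draw call")
```

In other words, under these assumptions each draw call costs on the order of a few tens of nanoseconds of submission time.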


Another metric we can look at is actual CPU usage as reported by the OS, as shown above. In this case CPU usage more or less perfectly matches our earlier expectations: with DirectX 11 both the GTX 980 and R9 290X show very uneven usage with 1-2 cores doing the bulk of the work, whereas with DirectX 12 CPU usage is spread out evenly over all 4 CPU cores.

At the risk of being redundant, what we’re seeing here is exactly why Mantle, DirectX 12, OpenGL Next, and other low-level APIs have been created. With single-threaded performance struggling to increase while GPUs continue to improve by leaps and bounds with each generation, something must be done to allow games to better spread out their rendering & submission workloads over multiple cores. The solution to that problem is to eliminate the abstraction and let the developers do it themselves through APIs like DirectX 12.



GPU Scaling

Switching gears, let’s take a look at performance from a GPU standpoint, including how well Star Swarm performance scales with more powerful GPUs now that we have eliminated the CPU bottleneck. Until now Star Swarm has never been GPU bottlenecked on high-end NVIDIA cards, so this is our first time seeing just how much faster Star Swarm can get until it runs into the limits of the GPU itself.

Star Swarm GPU Scaling - Extreme Quality (4 Cores)

As it stands, with the CPU bottleneck swapped out for a GPU bottleneck, Star Swarm starts to favor NVIDIA GPUs right now. Even accounting for performance differences, NVIDIA ends up coming out well ahead here, with the GTX 980 beating the R9 290X by over 50%, and the GTX 680 some 25% ahead of the R9 285, both values well ahead of their average lead in real-world games. With virtually every aspect of this test still being under development – OS, drivers, and Star Swarm – we would advise not reading into this too much right now, but it will be interesting to see if this trend holds with the final release of DirectX 12.

Meanwhile it’s interesting to note that largely due to their poor DirectX 11 performance in this benchmark, AMD sees the greatest gains from DirectX 12 on a relative basis and comes close to seeing the greatest gains on an absolute basis as well. The GTX 980’s performance improves by 150% and 40.1fps when switching APIs; the R9 290X improves by 416% and 34.6fps. As for AMD’s Mantle, we’ll get back to that in a bit.
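Since the relative and absolute gains are quoted together, the underlying frame rates can be backed out with a little arithmetic: if a gain of G fps equals P percent of the starting value, the starting value is G / (P / 100). The Python sketch below applies that to the two cards; the outputs are implied values derived from the rounded figures in the text, not numbers read off the chart, so expect them to be off by a few tenths.

```python
def implied_frame_rates(gain_percent, gain_fps):
    """Return (before, after) frame rates implied by a relative and absolute gain."""
    before = gain_fps / (gain_percent / 100.0)
    return before, before + gain_fps


gtx980_dx11, gtx980_dx12 = implied_frame_rates(150, 40.1)
r9290x_dx11, r9290x_dx12 = implied_frame_rates(416, 34.6)

print(f"GTX 980: {gtx980_dx11:.1f} -> {gtx980_dx12:.1f} fps")
print(f"R9 290X: {r9290x_dx11:.1f} -> {r9290x_dx12:.1f} fps")
print(f"DX12 lead for GTX 980: {gtx980_dx12 / r9290x_dx12 - 1:.0%}")
print(f"AMD DX11 as a fraction of NVIDIA DX11: {r9290x_dx11 / gtx980_dx11:.2f}")
```

The implied figures line up with the other claims in this article: the GTX 980 ends up a bit over 50% ahead of the R9 290X under DirectX 12, and the R9 290X delivers roughly a third of the GTX 980's performance under DirectX 11.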

Star Swarm GPU Scaling - Extreme Quality (2 Cores)

Having already established that even 2 CPU cores is enough to keep Star Swarm fed on anything less than a GTX 980, the results are much the same here for our 2 core configuration. Other than the GTX 980 being CPU limited, the gains from enabling DirectX 12 are consistent with what we saw for the 4 core configuration. Which is to say that even a relatively weak CPU can benefit from DirectX 12, at least when paired with a strong GPU.

However the GTX 750 Ti result in particular also highlights the fact that until a powerful GPU comes into play, the benefits today from DirectX 12 aren’t nearly as great. Though the GTX 750 Ti does improve in performance by 26%, this is a far cry from the 150% of the GTX 980, or even the gains for the GTX 680. While AMD is terminally CPU limited here, NVIDIA can get just enough out of DirectX 11 that a 2 core configuration can almost feed the GTX 750 Ti. Consequently in the NVIDIA case, a weak CPU paired with a weak GPU does not currently see the same benefits that we get elsewhere. However as DirectX 12 is meant to be forward looking – to be out before it’s too late – as GPU performance gains continue to outstrip CPU performance gains, the benefits even for low-end configurations will continue to increase.



DirectX 12 vs. Mantle, Power Consumption

Although the bulk of our coverage today is going to be focused on DirectX 12 versus DirectX 11, we also wanted to take a moment to stop and look at how DirectX 12 compares to AMD’s Mantle. Mantle offers an interesting point of contrast, both because it has been in beta longer than DirectX 12 and because it’s an even lower level API than DirectX 12. Since Mantle only needs to work on AMD’s GPUs and can be tweaked for AMD’s architectures, it offers AMD the chance to exploit their GPUs in a few additional ways that a common, cross-vendor API like DirectX 12 cannot.

Star Swarm - Direct3D 12 vs. Mantle (4 Cores) - Extreme Quality

With 4 cores we find that AMD achieves better results with Mantle than DirectX 12 across the board. The gains are never very great – a few percent here and there – but they are consistent and just outside our window of variability for the Star Swarm benchmark. With such a small gain there are a number of factors that can possibly explain this outcome – better developed drivers, better developed application, further benefits of working with a known hardware platform – so we can’t credit any one factor. But it’s safe to say that at least in this one instance, at this time, Star Swarm’s Mantle rendering path produces even better results than its DirectX 12 path on AMD cards.

Star Swarm - Direct3D 12 vs. Mantle (2 Cores) - Extreme Quality

On the other hand, Mantle doesn’t seem to be able to accommodate a two-core situation as well, with the 290X seeing a small but distinct performance regression from switching to Mantle from DirectX 12. Though we didn’t have time to look at an AMD APU for this article, it would be interesting to see if this regression occurs on their 2M/4C parts as well as it does here; AMD is banking heavily on low-level APIs like Mantle to help level the CPU playing field with Intel, so if Mantle needs 4 CPU cores to fully spread its wings with faster cards, that might be a problem.

Star Swarm CPU Batch Submission Time (4 Cores) - D3D vs. Mantle - Extreme Quality

Diving deeper, we can see that part of the explanation for our Mantle performance regression may come from the batch submission process. DirectX 12 is unexpectedly well ahead of Mantle here, with batch submission taking on average a bit more than half as long as it does under Mantle. As batch submission times are highly correlated to CPU bottlenecking on Star Swarm, this would imply that DirectX 12 would bottleneck later than Mantle in this instance. That said, since we’re so strongly GPU-bound right now it’s not at all clear if either API would be CPU bottlenecked any time soon.

Update: Oxide Games has emailed us this evening with a bit more detail about what's going on under the hood, and why Mantle batch submission times are higher. When working with large numbers of very small batches, Star Swarm is capable of throwing enough work at the GPU such that the GPU's command processor becomes the bottleneck. For this reason the Mantle path includes an optimization routine for small batches (OptimizeSmallBatch=1), which trades GPU power for CPU power, doing a second pass on the batches in the CPU to combine some of them before submitting them to the GPU. This bypasses the command processor bottleneck, but it increases the amount of work the CPU needs to do (though note that in AMD's case, it's still several times faster than DX11).

This feature is enabled by default in our build, and by combining those small batches this is the likely reason that the Mantle path holds a slight performance edge over the DX12 path on our AMD cards. The tradeoff is that in a 2 core configuration, the extra CPU workload from the optimization pass is just enough to cause Star Swarm to start bottlenecking at the CPU again. For the time being this is a user-adjustable feature in Star Swarm, and Oxide notes that in any shipping game the small batch feature would likely be turned off by default on slower CPUs.
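To illustrate the kind of tradeoff Oxide describes, here is a minimal Python sketch of a CPU-side pass that combines small batches before submission. Oxide has not described how its routine actually works, so everything here is an assumption for illustration: batches are modeled as hypothetical `(state_key, draw_count)` pairs, and adjacent batches are merged only when they share the same state and are both below a hypothetical `small_limit`.

```python
def combine_small_batches(batches, small_limit):
    """Merge adjacent batches that share a state key and are both under small_limit."""
    combined = []
    for state, count in batches:
        can_merge = (
            combined
            and combined[-1][0] == state        # same pipeline state as previous batch
            and combined[-1][1] < small_limit   # previous batch is still small
            and count < small_limit             # incoming batch is small too
        )
        if can_merge:
            combined[-1] = (state, combined[-1][1] + count)
        else:
            combined.append((state, count))
    return combined


# Hypothetical frame: six batches, most of them tiny.
batches = [("A", 2), ("A", 3), ("B", 1), ("B", 1), ("B", 50), ("A", 4)]
merged = combine_small_batches(batches, small_limit=8)
```

In this made-up example six batches become four while the total amount of drawn work stays the same. The extra loop over every batch is the CPU cost being paid, and the shorter list is what relieves pressure on the GPU's command processor, which is the exchange of CPU time for GPU throughput described above.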

Star Swarm CPU Batch Submission Time (4 Cores) - Small Batch Optimization

Star Swarm - Direct3D 12 vs. Mantle (4 Cores) - Small Batch Optimization

If we turn off the small batch optimization feature, what we find is that Mantle's batch submission time drops nearly in half, to an average of 4.4ms. With the second pass removed, Mantle and DirectX 12 take roughly the same amount of time to submit batches in a single pass. However as Oxide noted, there is a performance hit; the Mantle rendering path's performance goes from being ahead of DirectX 12 to trailing it. So given sufficient CPU power to pay the price for batch optimization, it can have a significant impact (16%) on improving performance under Mantle.

Star Swarm System Power Consumption (6 Cores)

Finally, we wanted to take a quick look at power consumption among cards and APIs. To once again repeat what we said earlier, Star Swarm is an imperfect, non-deterministic benchmark, and coupled with the in-development status of DirectX 12 everything here is subject to change. However we thought this was interesting enough to include in our evaluation.

As expected, the increased throughput from DirectX 12 and Mantle drives up system power consumption. With the CPU no longer the bottleneck, the GPU never gets a chance to idle and video card power consumption ramps up to full load.



Mid Quality Performance

Since our evaluation so far has been focused on performance with Star Swarm’s most resource intensive Extreme setting, we wanted to shake things up by trying a lower quality setting.

In this case Star Swarm’s various quality levels adjust both the CPU and GPU workload, with the Mid quality setting reducing both the number of draw calls generated and the amount of work generated per frame for the GPU. As a result we’re not adjusting just the CPU or the GPU workload, but it can give us an idea of what to expect from DirectX 12 and Star Swarm at lower settings more suitable for weaker systems.

Star Swarm D3D12 CPU Scaling - Mid Quality

Even with this lower quality setting, the CPU results tell us that only the GTX 980 is truly CPU bottlenecked with 2 cores. Everything else from the 290X on down can reach its GPU limit with a relatively weak CPU.

Star Swarm GPU Scaling - Mid Quality (4 Cores)

Star Swarm GPU Scaling - Mid Quality (2 Cores)

Overall the numbers are different, but the lineup is the same whether it’s Extreme quality or Mid quality. Every vendor still sees massive gains from enabling DirectX 12, though the overall gains aren’t quite as great as with Extreme quality. Meanwhile GTX 750 Ti in particular continues to see the weakest gains from DirectX 12, at only 14% for a 2 core configuration, thanks to a combination of NVIDIA’s lower CPU consumption and earlier GPU bottleneck.



Frame Time Consistency & Recordings

Last, but not least, we wanted to also look at frame time consistency across Star Swarm, our two vendors, and the various APIs available to them. Next to CPU efficiency gains, one of the other touted benefits of low-level APIs like DirectX 12 is the ability for developers to better control frame time pacing due to the fact that the API and driver are doing fewer things under the hood and behind an application’s back. Inefficient memory management operations, resource allocation, and shader compiling in particular can result in unexpected and undesirable momentary drops in performance. However, while low-level APIs can improve on this aspect, it doesn’t necessarily mean high-level APIs are bad at it. So it is an important distinction between bad/good and good/better.

On a technical note, these frame times are measured within (and logged by) Star Swarm itself. So these are not “FCAT” results that are measuring the end of the pipeline, nor is that possible right now due to the lack of an overlay option for DirectX 12.
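For readers who want to analyze a frame time log of their own, the Python sketch below shows one common way to summarize it: the average frame rate alongside the 99th percentile and worst frame times, since an average alone can hide momentary stalls. The log format, the function name `frame_time_summary`, and the sample data are all hypothetical; this is not the format Star Swarm writes, nor the method used to produce our graphs.

```python
def frame_time_summary(frame_times_ms):
    """Summarize a list of per-frame times (milliseconds)."""
    ordered = sorted(frame_times_ms)
    mean_ms = sum(ordered) / len(ordered)
    # Nearest-rank 99th percentile, computed with integer math: ceil(0.99 * n).
    rank = -(-99 * len(ordered) // 100)
    return {
        "avg_fps": 1000.0 / mean_ms,
        "mean_ms": mean_ms,
        "p99_ms": ordered[rank - 1],
        "worst_ms": ordered[-1],
    }


# Hypothetical log: 196 smooth frames at 15ms plus 4 stalls at 45ms.
log = [15.0] * 196 + [45.0] * 4
summary = frame_time_summary(log)
```

In this made-up log the mean frame time is barely affected by the four stalls, while the 99th percentile and worst-case values expose them immediately. Consistent results of the kind discussed here are those where the percentile figures stay close to the mean.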

Starting with the GTX 980, we can immediately see why we can’t always write off high-level APIs. Benchmark non-determinism aside, both DirectX 11 and DirectX 12 produce consistent frame times; one is just much, much faster than the other. Both on paper and subjectively in practice, Star Swarm has little trouble maintaining consistent frame times on the GTX 980. Even if DirectX 11 is slow, it is at least consistent.

The story is much the same for the R9 290X. DirectX 11 and DirectX 12 both produce consistent results, with neither API experiencing frame time swings. Meanwhile Mantle falls into the same category as DirectX 12, producing similarly consistent performance and frame times.

Ultimately it’s clear from these results that if DirectX 12 is going to lead to any major differences in frame time consistency, Star Swarm is not the best showcase for it. With DirectX 11 already producing consistent results, DirectX 12 has little to improve on.

Finally, along with our frame time consistency graphs, we have also recorded videos of shorter run-throughs on both the GeForce GTX 980 and Radeon R9 290X. With YouTube now supporting 60fps, these videos are frame-accurate representations of what we see when we run the Star Swarm benchmark, showing first-hand the overall frame time consistency among all configurations, and of course the massive difference in performance.



First Thoughts

Bringing our preview of DirectX 12 to a close, what we’re seeing today is both a promising sign of what has been accomplished so far and a reminder of what is left to do. As it stands much of DirectX 12’s story remains to be told – features, feature levels, developer support, and more will only finally be unveiled by Microsoft next month at GDC 2015. So today’s preview is much more of a beginning than an end when it comes to sizing up the future of DirectX.

But for the time being we’re finally at a point where we can say the pieces are coming together, and we can finally see parts of the bigger picture. Drivers, APIs, and applications are starting to arrive, giving us our first look at DirectX 12’s performance. And we have to say we like what we’ve seen so far.

With DirectX 12 Microsoft and its partners set out to create a cross-vendor but still low-level API, and while there was admittedly little doubt they could pull it off, there has always been the question of how well they could do it. What kind of improvements and performance could you truly wring out of a new API when it has to work across different products and can never entirely avoid abstraction? The answer as it turns out is that you can still enjoy all of the major benefits of a low-level API, not the least of which are the incredible improvements in CPU efficiency and multi-threading.

That said, any time we’re looking at an early preview it’s important to keep our expectations in check, and that is especially the case with DirectX 12. Star Swarm is a best case scenario and designed to be a best case scenario; it isn’t so much a measure of real world performance as it is technological potential.

But to that end, it’s clear that DirectX 12 has a lot of potential in the right hands and the right circumstances. It isn’t going to be easy to master, and I suspect it won’t be a quick transition, but I am very interested in seeing what developers can do with this API. With the reduced overhead, the better threading, and ultimately a vastly more efficient means of submitting draw calls, there’s a lot of potential waiting to be exploited.
