Original Link: https://www.anandtech.com/show/9659/fable-legends-directx-12-benchmark-analysis
Fable Legends Early Preview: DirectX 12 Benchmark Analysis
by Ryan Smith, Ian Cutress & Daniel Williams on September 24, 2015 9:00 AM EST
Update 2016/03/07: Well so much for that. Fable Legends has been canceled. So it will ultimately be another game that gets to claim the right as the first Unreal Engine 4 based DX12 game.
DirectX 12 is now out in the wild as a part of Windows 10 and the updated driver model WDDM 2.0 that comes with it. Unlike DX11, there are no major gaming titles at launch - we are now waiting for games to take advantage of DX12 and to see what difference it will make for the game playing experience. One of the main focal points of DX12 is draw calls, leveraging multiple processor cores to dispatch GPU workloads, rather than the previous model of a single core doing most of the work. DX12 brings about a lot of changes with the goal of increasing performance and offering an even more immersive experience, but it does shift some support requirements, such as SLI or Crossfire, onto the engine developers. We tackled two synthetic tests earlier this year, Star Swarm and 3DMark, but due to timing and other industry events, we are waiting for a better time to test the Ashes of the Singularity benchmark as the game nears completion. In the meantime, a PR team got in contact with us regarding the upcoming Fable Legends title using the Unreal 4 engine, and an early access preview benchmark that came with it. Here are our results so far.
Fable Legends
Fable Legends is an Xbox One/Windows 10 exclusive free to play title built by Lionhead Studios in Unreal Engine 4. The game, styled as a ‘cooperative action RPG’, consists of asymmetrical multiplayer matches with attackers trying to raid a base and the defender playing more of a tower defense position.
The benchmark provided is more of a graphics showpiece than a representation of the gameplay, in order to show off the capabilities of the engine and the DX12 implementation. As a result, we unfortunately didn't get to see any of the gameplay, which would seem to focus more on combat. This is one of the first DirectX 12 benchmarks available - Ashes of the Singularity by Stardock was released just before IDF, but due to scheduling we have not had a chance to dig into that one yet. This will therefore be our first look at a DirectX 12 game engine with a game attached.
Official Trailer
This benchmark pans through several outdoor scenes in a fashion similar to the Unigine Valley benchmark, focusing more on landscapes, distance drawing and tessellation rather than an upfront first-person perspective. Graphical effects such as dynamic global illumination are computed on the fly, making subtle differences in the lighting, and the benchmark shows an accelerated day/night cycle, similar to the large Grand Theft Auto benchmark. The engine itself draws on DX12 explicit features such as ‘asynchronous compute, manual resource barrier tracking, and explicit memory management’ that either allow the application to better take advantage of available hardware or open up options that allow developers to better manage multi-threaded applications and GPU memory resources respectively. The updated engine has had several additions to implement these visual effects, and the developers have promised that use of DirectX 12 will help to improve both the experience and performance.
The Test
The software provided to us is a prerelease version of Fable Legends, with early drivers, so ultimately the performance at this point is most likely not representative of the game at launch and should improve before release. What we will see here is more of a broad picture painting how different GPUs will scale when DX12 features are thrown into the mix. In fact, AMD sent us a note that there is a new driver available specifically for this benchmark which should improve the scores on the Fury X, although it arrived too late for this pre-release look at Fable Legends (Ryan did the testing but is covering Samsung’s 950 Pro launch in Korea at this time). It can underscore just how early in the game and driver development cycle DirectX 12 is for all players. But as with most important titles, we expect drivers and software updates to continue to drive performance forward as developers and engineers come to understand how the new version of DirectX works.
With that being said, there do not appear to be any stability issues with the benchmark as it stands, and we have had time to test graphics cards going back a few generations for both AMD and NVIDIA. Our pre-release package came with three test standards at 1280x720, 1920x1080 and 4K. We also attempted to test a number of these combinations with multiple CPU core and thread count simulations in order to emulate a number of popular CPUs in the market.
CPU: | Intel Core i7-4960X in 3 modes: 'Core i7' - 6 Cores, 12 Threads at 4.2 GHz; 'Core i5' - 4 Cores, 4 Threads at 3.8 GHz; 'Core i3' - 2 Cores, 4 Threads at 3.8 GHz |
Motherboard: | ASRock Fatal1ty X79 Professional |
Power Supply: | Corsair AX1200i |
Hard Disk: | Samsung SSD 840 EVO (750GB) |
Memory: | G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26) |
Case: | NZXT Phantom 630 Windowed Edition |
Monitor: | Asus PQ321 |
Video Cards: | AMD Radeon R9 Fury X, AMD Radeon R9 290X, AMD Radeon R9 285, AMD Radeon HD 7970, NVIDIA GeForce GTX 980 Ti, NVIDIA GeForce GTX 970 (EVGA), NVIDIA GeForce GTX 960, NVIDIA GeForce GTX 680, NVIDIA GeForce GTX 750 Ti |
Video Drivers: | NVIDIA Release 355.82, AMD Catalyst Cat 15.201.1102 |
OS: | Windows 10 |
This Test
All the results in this piece are on discrete GPUs. The benchmark outputs a score, which is merely the average frame rate multiplied by a hundred, but it also dumps an extensive data log where it tracks over 186 different elements of the system every frame, such as compute time for various effects. Our testing takes on three roles – direct GPU comparison of average frame rates at 1080p and 720p in our i7-4960X mode, CPU scaling at each resolution with the GTX 980 Ti and AMD Fury X, and then a deep analysis of the percentile data of these two graphics cards at each resolution and each CPU configuration.
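As a quick illustration of how the score relates to the numbers we report, the sketch below converts a score back into an average frame rate and a mean frame time. The function names and the example score are our own placeholders rather than anything from the benchmark, and the frame time conversion assumes the average frame rate is total frames divided by total run time.

```python
def score_to_fps(score: float) -> float:
    """The benchmark score is the average frame rate multiplied by 100."""
    return score / 100.0


def fps_to_frame_time_ms(fps: float) -> float:
    """Mean time per frame in milliseconds for a given average frame rate."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / fps


# Hypothetical score for illustration only, not a measured result.
example_score = 6000
avg_fps = score_to_fps(example_score)
print(f"{avg_fps:.1f} FPS, {fps_to_frame_time_ms(avg_fps):.2f} ms per frame")
```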
Graphics Performance Comparison
With the background and context of the benchmark covered, we now dig into the data and see what we have to look forward to with DirectX 12 game performance. This benchmark has preconfigured batch files that will launch the utility at either 3840x2160 (4K) with settings at ultra, 1920x1080 (1080p) also on ultra, or 1280x720 (720p) with low settings more suited for integrated graphics environments.
When dealing with 3840x2160 resolution, the GTX 980 Ti has a single digit percentage lead over the AMD Fury X, but both are above the bare minimum of 30 FPS no matter what the CPU.
When dealing with the i5 and i7 at 1920x1080 ultra settings, the GTX 980 Ti still has that single digit percentage lead, but at Core i3 levels of CPU power the difference is next to zero, suggesting we are CPU limited even though the frame difference from i3 to i5 is minimal. If we look at the range of cards under the Core i7 at this point, the interesting thing here is that the GTX 970 just about hits that 60 FPS mark, while some of the older generation cards (7970/GTX 680) would require compromises in the settings to push it over the 60 FPS barrier at this resolution. The GTX 750 Ti doesn’t come anywhere close, suggesting that this game (under these settings) is targeting upper mainstream to lower high end cards. It would be interesting to see if there is an overriding game setting that ends up crippling this level of GPU.
At the 720p low settings, the Core i7 pushes everything above 60 FPS, but you need at least an AMD 7970/GTX 960 to start going for 120 FPS if only for high refresh rate panels. We are likely being held back by CPU performance as illustrated by the GTX 970 and GTX 980 Ti being practically tied and the R9 290X stepping ahead of the pack. This makes it interesting when we consider integrated graphics, which we might test for a later article. It is worth noting that at the low resolution, the R9 290X and Fury X pull out a minor lead over the NVIDIA cards. The Fury X expands this lead with the i5 and i3 configurations, just rolling over to the double digit percentage gains.
CPU Scaling
When it comes to how well a game scales with a processor, DirectX 12 is somewhat of a mixed bag. This is due to two reasons. On one hand, it allows GPU commands to be issued by each CPU core, therefore removing the single core performance limit that hindered a number of DX11 titles and aiding configurations with fewer cores or lower clock speeds. On the other side of the coin, because it allows all the threads in a system to issue commands, it can pile on the work during heavy scenes, moving the cliff edge for high powered cards further down the line or making the visual effects at the high end very impressive, which is perhaps something benchmarking like this won’t capture.
For our CPU scaling tests, we took the two high end cards tested and placed them in each of our Core i7 (6C/12T), Core i5 (4C/4T) and Core i3 (2C/4T) environments, at three different resolution/setting configurations similar to the previous page, and recorded the results.
Looking solely at the GTX 980 Ti to begin, we see that for now the Fable benchmark only scales at the low resolution and graphics quality. Moving up to 1080p or 4K sees similar performance no matter what the processor – perhaps even a slight decrease at 4K, but this is well within a 2% variation.
On the Fury X, the tale is similar and yet stranger. The Fable benchmark is canned, so it should be running the same data each time – but in all three circumstances the Core i7 trails behind the Core i5. Perhaps in this instance there are too many threads on the processor contesting for bandwidth, giving some slight cache pressure (one wonders if some eDRAM might help). But again we see no real scaling improvement moving from Core i3 to Core i7 for our 1920x1080 and 3840x2160.
As we’ve seen in previous reviews, the effects of CPU scaling with regard to resolution depend on both the CPU architecture and the GPU architecture, with each GPU manufacturer performing differently and two different models in the same silicon family also differing in scaling results. To that end, we actually see a boost at 1280x720 with the AMD 7970 and the GTX 680 when moving from the Core i3 to the Core i7.
If we look at the rendering time breakdown between GPUs on high end configurations, we get the following data. Numbers here are listed in milliseconds, so lower is better:
Looking at the GTX 980 Ti and Fury X, we see that NVIDIA is significantly faster at GBuffer rendering, Dynamic Global Illumination, and Compute Shader Simulation & Culling. Meanwhile AMD pulls narrower leads in every other category, including the ambiguous 'other'.
Dropping down a couple of tiers with the GTX 970 and R9 290X, we see some minor variations. The R9 290X has good leads in dynamic lighting, and 'other', with smaller leads in Compute Shader Simulation & Culling and Post Processing. The GTX 970 benefits on dynamic global illumination significantly.
What do these numbers mean? Overall it appears that NVIDIA has a strong hold on deferred rendering and global illumination and AMD has benefits with dynamic lighting and compute.
Discussing Percentiles and Minimum Frame Rates
Up until this point we have only discussed average frame rates, which is an easy number to generate from a benchmark run. Discussing minimum frame rates is a little tricky, because it could be argued that the time taken to render the worst frame should be the minimum. All it then takes is a bad GPU request (misaligned texture cache) which happens infrequently to provide skewed data. To this end, thanks to the logging functionality of the benchmark, we are able to report the frame rate profiles of each run and percentile numbers.
For the GTX 980 Ti and AMD Fury X, we pulled out the 90th, 95th and 99th percentile data from the outputs, as well as plotting full graphs. For each of these data points, the 90th percentile should represent the frame rate (we’ll stick to reporting frame rates to simplify the matter) that a game will achieve or exceed during 90% of the frames. Similar logic applies to the 95th and 99th percentile data, which sit closer to the absolute minimum frame rate but should be more consistent between runs.
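To make the percentile idea concrete, here is a minimal sketch of how such numbers could be derived from a list of per-frame render times. It uses a simple nearest-rank percentile, which is our assumption and not necessarily how the benchmark or our own charts compute them, and the frame times are made up purely to show the effect of a single bad frame.

```python
import math


def percentile_frame_time(frame_times_ms, pct):
    """Nearest-rank percentile of per-frame render times, in milliseconds."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    ordered = sorted(frame_times_ms)
    # Rank of the slowest frame inside the fastest pct% of frames (1-based).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_fps(frame_times_ms, pct):
    """Frame rate that pct% of the frames meet or exceed."""
    return 1000.0 / percentile_frame_time(frame_times_ms, pct)


# Hypothetical log: 99 frames at 10 ms plus one 100 ms hitch.
frame_times = [10.0] * 99 + [100.0]
worst_frame_fps = 1000.0 / max(frame_times)

for pct in (90, 95, 99):
    print(f"{pct}th percentile: {percentile_fps(frame_times, pct):.1f} FPS")
print(f"worst single frame: {worst_frame_fps:.1f} FPS")
```

In this invented log the single hitch drags the absolute minimum down to 10 FPS, while the 90th, 95th and 99th percentile values all stay at 100 FPS, which is why we prefer percentiles over a raw minimum.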
This page (and the next) is going to be data heavy, but our analysis will discuss the effect of CPU scaling on percentile data on both GPUs in all three resolutions using all three CPUs. Starting with the GTX 980 Ti:
All three arrangements at 3840x2160 perform similarly, though there are slight regressions moving from the i3 to the i7 along most of the range, perhaps suggesting that having an excess of thread data has some issues. The Core i7 arrangement seems to have the upper hand at the low percentile (2%-4%) numbers as well.
At 1080p, the Core i7 gives greater results when the frame rate is above the average and we see some scaling effects when the scenes are simple (giving high frame rates). But for whatever reason, when the going gets tough the i7 seems to bottom out as we go beyond the 80th percentile.
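The point at which one configuration stops leading another can also be located programmatically rather than by eye. The sketch below builds a frame rate profile per configuration and reports the first percentile where a configuration that had been ahead falls behind. The helper names and the two frame time lists are hypothetical and only mimic the shape described above; they are not our measured data.

```python
import math


def frame_rate_profile(frame_times_ms, percentiles=range(1, 100)):
    """Map each percentile to the FPS met or exceeded by that share of frames."""
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    profile = {}
    for pct in percentiles:
        rank = max(1, math.ceil(pct * n / 100))  # nearest-rank percentile
        profile[pct] = 1000.0 / ordered[rank - 1]
    return profile


def first_crossover(profile_a, profile_b):
    """Lowest percentile where A, having led B earlier, drops below B."""
    a_has_led = False
    for pct in sorted(profile_a):
        if profile_a[pct] > profile_b[pct]:
            a_has_led = True
        elif a_has_led and profile_a[pct] < profile_b[pct]:
            return pct
    return None  # A never led, or never fell behind after leading


# Hypothetical runs: 'more_cores' is quicker on easy frames, slower on hard ones.
more_cores = [8.0] * 80 + [25.0] * 20
fewer_cores = [10.0] * 80 + [20.0] * 20

crossover = first_crossover(frame_rate_profile(more_cores),
                            frame_rate_profile(fewer_cores))
print("crossover percentile:", crossover)
```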
If we ever wanted to see a good representation of CPU scaling, the 720p graph is practically there – all except for the 85th percentile and up, which makes the data points pulled out in this region perhaps unrepresentative of the whole. The same issue might be affecting the 1080p results as well.
Discussing Percentiles and Minimum Frame Rates
Continuing from the previous page, we performed a similar analysis on AMD's Fury X graphics card. Same rules apply - all three resolution/setting combinations using all three system configurations. Results are given as frame rate profiles showing percentiles as well as choosing the 90th, 95th and 99th percentile values to get an indication of minimum frame rates.
Moving on to the Fury X at 4K, we see all three processor lineups performing similarly, giving us an indication that we are more GPU limited here. The Core i7 does sit slightly below the others though, giving slightly lower frame rates in easier scenes but a better frame rate when the going gets tough beyond the 95th percentile.
For 1080p, the results take a twist. It almost seems as if we have some form of reverse scaling, whereby more cores do more damage to the results. If we have a look at the breakdown provided by the in-game benchmark (given in milliseconds, so lower is better):
Three areas stand out as benefitting from fewer cores: Transparency and Effects, GBuffer Rendering and Dynamic Lighting. All three are related to illumination and how the illumination interacts with its surroundings. One reason springs to mind on this – with large core counts, too many threads are issuing work to the graphics card causing thread contention in the cache or giving the thread scheduler a hard time depending on what comes in as high priority.
Nevertheless, the situation changes when we move down again to 720p:
Here the Core i3 takes a nose dive as we become CPU limited in pushing out the frames.
Comparing Percentile Numbers Between the GTX 980 Ti and Fury X
As the two top end cards from both graphics silicon manufacturers were released this year, there was a big buzz about which is best for what. Ryan’s extensive review of the Fury X put the two cards head to head in a variety of contests. For DirectX 12, the situation is a little less clear cut for a number of reasons – games are yet to mature, drivers are also still in the development stage, and both sides competing here are having to rethink their strategies when it comes to game engine integration and the benefits that might provide. Up until this point DX12 contests have either been synthetic or have had some controversial issues. So for Fable Legends, we did some extra percentile based analysis for NVIDIA vs. AMD at the top end.
For this set of benchmarks we ran our 1080p Ultra test with any adaptive frame rate technology enabled and recorded the result:
For these tests, the usual rules apply – GTX 980 Ti and Fury X, in our Core i7/i5/i3 configurations at all three resolution/setting combinations (3840x2160 Ultra, 1920x1080 Ultra and 1280x720 Low). Data is given in the form of frame rate profile graphs, similar to those on the last page.
As always, Fable Legends is still in early access preview mode and these results may not be indicative of the final version, but at this point they still provide an interesting comparison.
At 3840x2160, the frame rate profiles from both cards look the same no matter the processor used (one could argue that the Fury X is mildly ahead on the i3 at low frame rates), but the 980 Ti has a consistent gap across most of the profile range.
At 1920x1080, the Core i7 model gives a healthy boost to the GTX 980 Ti in high frame rate scenarios, though this seems to be accompanied by an extended drop off region in high frame rate areas. It is also interesting that in the Core i3 mode, the Fury X results jump up and match the GTX 980 Ti almost across the entire range. This again points to some of the data we saw on the previous page – at 1080p somehow having fewer cores gave the results a boost due to lighting scenarios.
At 1280x720, as we saw in the initial GPU comparison page on average frame rates, the Fury X has the upper hand here in all system configurations. Two other obvious points are noticeable here – moving from the Core i5 to the Core i7, especially on the GTX 980 Ti, makes the easy frames go quicker and the harder frames take longer, but also when we move to the Core i3, performance across the board drops like a stone, indicating a CPU limited environment. This is despite the fact that with these cards, 1280x720 at low settings is unlikely to be used anyway.
Final Words
Non-final benchmarks are a tough element to define. On one hand, they do not show the full range of both performance and graphical enhancements and could be subject to critical rendering paths that cause performance issues. On the other side, they are near-final representations and aspirations of the game developers, with the game engine almost at the point of being comfortable. To say that a preview benchmark is somewhere from 50% to 90% representative of the final product is not much of a bold statement to make in these circumstances, but between those two numbers can be a world of difference.
Fable Legends, developed by Lionhead Studios and published by Microsoft, uses EPIC’s Unreal 4 engine. All the elements of that previous sentence have gravitas in the gaming industry: Fable is a well-known franchise, Lionhead is a successful game developer, Microsoft is Microsoft, and EPIC’s Unreal engines have powered triple-A gaming titles for the best part of two decades. With the right ingredients, therein lies the potential for that melt-in-the-mouth cake as long as the oven is set just right.
Convoluted cake metaphors aside, this article set out to test the new Fable Legends benchmark in DirectX 12. As it stands, the software build we received indicated that the benchmark and game are still in 'early access preview' mode, so improvements may happen down the line. Users are interested in how DX12 games will both perform and scale on different hardware and different settings, and we aimed to fill in some of those blanks today. We used several AMD and NVIDIA GPUs, mainly focusing on NVIDIA’s GTX 980 Ti and AMD’s Fury X, with Core i7-X (six cores with HyperThreading), Core i5 (quad core, no HT) and Core i3 (two cores, HT) system configurations. These two GPUs were also tested at 3840x2160 (4K) with Ultra settings, 1920x1080 with Ultra settings and 1280x720 with low settings.
On pure average frame rate numbers, we saw NVIDIA’s GTX 980 Ti lead by just under 10% in all configurations except for the 1280x720 settings, which gave the Fury X a substantial (10%+ on i5 and i3) lead. Looking at CPU scaling, scaling only ever really occurred at the 1280x720 settings anyway, with both AMD and NVIDIA showing a 20-25% gain moving from a Core i3 to a Core i7. Some of the older cards showed a smaller 7% improvement over the same test.
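For reference, the percentage leads and gains quoted here are plain relative differences against a baseline. A minimal sketch follows, with a hypothetical helper name and made-up frame rates rather than our measured results:

```python
def percent_gain(baseline_fps: float, new_fps: float) -> float:
    """Relative change of new_fps over baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline frame rate must be positive")
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Hypothetical averages for illustration only.
i3_fps, i7_fps = 80.0, 100.0
print(f"i3 -> i7 gain: {percent_gain(i3_fps, i7_fps):.1f}%")
print(f"i7 -> i3 loss: {percent_gain(i7_fps, i3_fps):.1f}%")
```

Note that the figure depends on which side is taken as the baseline: in this example the faster configuration is 25% ahead, while the slower one is only 20% behind.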
Looking through the frame rate profile data, specifically at the minimum benchmark percentile numbers, we saw an interesting pattern: on the Core i7 (six core, HT) platform, frame rates on complex frames were beaten by the Core i5 and even the Core i3 setups, despite the fact that the Core i7 performed better during the frames that are easier to compute. In our graphs, this gave a tilted axis akin to a seesaw:
When comparing the separate compute profile time data provided by the benchmark, it showed that the Core i7 was taking longer for a few of the lighting techniques, perhaps relating to cache or scheduling issues either at the CPU end or the GPU end which was alleviated with fewer cores in the mix. This may come down to a memory controller not being bombarded with higher priority requests causing a shuffle in the data request queue.
When we do a direct comparison for AMD’s Fury X and NVIDIA’s GTX 980 Ti in the render sub-category results for 4K using a Core i7, both AMD and NVIDIA have their strong points in this benchmark. NVIDIA favors illumination, compute shader work and GBuffer rendering where AMD favors post processing, transparency and dynamic lighting.
DirectX 12 is coming in with new effects to make games look better and new features to allow developers to extract performance out of our hardware. Fable Legends uses EPIC’s Unreal Engine 4 with added effects and represents a multi-year effort to develop the engine around DX12's feature set and ultimately improve performance over DX11. With this benchmark we have begun to peek a little into what actual graphics performance in games might be like, and whether DX12 benefits users on low powered CPUs or high-end GPUs more. That being said, there is a good chance that the performance we’ve seen today will change by release due to driver updates and/or optimizing the game code. Nevertheless, at this point it does appear that a reasonably strong card such as the 290X or GTX 970 is needed to get a smooth 1080p experience (at Ultra settings) with this demo.