Original Link: https://www.anandtech.com/show/11726/retesting-amd-ryzen-threadrippers-game-mode-halving-cores-for-more-performance
Retesting AMD Ryzen Threadripper’s Game Mode: Halving Cores for More Performance
by Ian Cutress on August 17, 2017 12:01 PM EST

For the launch of AMD’s Ryzen Threadripper processors, one of the features being advertised was Game Mode. This was a special profile under the updated Ryzen Master software that was designed to give the Threadripper CPU more performance in gaming, at the expense of peak performance in hard CPU tasks. AMD’s goal, as described to us, was to enable the user to have a choice: a CPU that can be fit for both CPU tasks and for gaming at the flick of a switch (and a reboot) by disabling half of the chip.
Initially, we interpreted this via one of AMD’s slides as half of the threads (simultaneous multi-threading off), as per the exact wording. However, in other places AMD had stated that it actually disables half the cores: AMD returned to us and said it was actually disabling one of the two active dies in the Threadripper processor. We swallowed our pride and set about retesting the effect of Game Mode.
A Rose By Any Other Name
It’s not very often we have to retract some of our content at AnandTech – research is paramount. However in this instance a couple of things led to confusion. First was assumption related: in the original piece, we had assumed that AMD was making Game mode available through both the BIOS and through the Ryzen Master software. Second was communication: AMD had described Game Mode (and specifically, the Legacy Compatibility Mode switch it uses) at the pre-briefing at SIGGRAPH as having half the threads, but offered in diagrams that it was half the cores.
Based on the wording, we had interpreted that this was the equivalent of SMT being disabled, and adjusted the BIOS as such. After our review went live, AMD published a post on its gaming blog and also reached out to us to inform us of the error: where we had tested the part of Game Mode that deals with legacy core counts, we had disabled SMT rather than disabling a die, turning the 16C/32T chip into a 16C/16T system rather than an 8C/16T system. We were informed that the settings that deal with this feature are more complex than simply SMT being disabled, and as such the feature is being offered primarily through Ryzen Master.
From AMD's Gaming Blog. Emphasis ours.
So for this review, we’re going to set the record straight, and test Threadripper in its Game Mode 8C/16T version. The previous review will be updated appropriately.
So What Is Game Mode?
For Ryzen Threadripper, AMD has defined two modes of operation depending on the use case. The first is Creator Mode, which is enabled by default. This enables full cores, full threads, and gives the maximum available bandwidth across the two active Threadripper silicon dies in the package, at the expense of some potential peak latency. In our original review, we measured the performance of Creator Mode in our benchmarks as the default setting, but also looked into the memory latency.
Each die can communicate with all four memory channels, but is only directly connected to two of them. Depending on where the data in DRAM is located, a core may have to look in near memory (the two channels attached to its own die) or far memory (the two channels attached to the other die). This is commonly referred to as non-uniform memory access (NUMA). In a uniform memory access (UMA) arrangement, such as Creator Mode, the system sees no difference between near memory and far memory, citing a single latency value for both, which is typically the average of the near and far latencies. At DDR4-2400, we recorded this as 108 nanoseconds.
Game Mode does two things over Creator Mode. First, it changes the memory from UMA to NUMA, so the system can distinguish between near and far memory. At DDR4-2400, that 108ns ‘average’ latency becomes 79ns for near memory and 136ns for far memory (as per our testing). The system will use up all available near memory first, before moving to the higher-latency far memory.
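As a quick sanity check on those near/far figures (assuming the UMA number really is just the mean of the two NUMA latencies, as described above), the arithmetic lines up:

$$\frac{79\ \text{ns} + 136\ \text{ns}}{2} = 107.5\ \text{ns} \approx 108\ \text{ns}$$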
Second, Game Mode disables the cores in one of the silicon dies. This isn’t a full shutdown of the 8-core Zeppelin die, just the cores. The PCIe lanes, the DRAM channels and the various IO are still active, but the cores themselves are power gated such that the system does not use them or migrate threads to them. In essence, the 16C/32T processor becomes 8C/16T, but still with quad-channel memory and 60 PCIe lanes: the 1950X becomes an uber 1800X, and the 1920X becomes an uber 1600X. The act of disabling dies is called ‘Legacy Compatibility Mode’, which ensures that all active cores have access to near memory at the expense of immediate bandwidth, but enables games that cannot handle more than 20 cores (some legacy titles) to run smoothly.
The core count on the left is the absolute core count, not the core count in Game Mode. Which is confusing.
Some users might see paying $999 for a processor and then disabling almost half of it as a major frustration (insert something about Intel integrated graphics). AMD’s argument is that the CPU is still good for gaming, and can offer a better gaming experience when given the choice. However, consider the mantra surrounding these big processors: gaming adaptability, or the ability to stream, transcode and game at the same time. It’s expected that in this mega-tasking (Intel’s term) scenario, having a beefy CPU helps, even though there will be some losses in game performance. Moving down to only 8 cores is likely to make this worse, so the only situation Game Mode assists is a user who purely wants a gaming machine but with quad-channel memory and all the PCIe lanes. There’s also a frequency argument – in a dual-die configuration, active threads can be positioned at thermally beneficial points of the design to ensure the maximum frequency. Again, AMD reiterates that it offers choice, and users who want to stick with all or half the cores are free to do so; this change in settings would have been available in the BIOS even if AMD did not provide a quick button for it.
As always, the proof is in the pudding. If there’s a significant advantage in gaming, then Game Mode will be a feather in AMD’s cap.
With regards to how the memory and memory latency operate, Game Mode still incorporates NUMA, ensuring near memory is used first. The memory latency results are the same as we tested before:
For the 1950X in the two modes, the results are essentially equal until we hit 8MB, which is the L3 cache limit per CCX. After this, the core bounces out to main memory, where Game Mode sits around 79ns when it probes near memory while Creator Mode averages 108ns. By comparison, the Ryzen 5 1600X seems to have a lower latency at 8MB (20ns vs 41ns), and then sits between the Creator and Game Mode results at 87ns. It would appear that the bigger downside of Creator Mode here is that main memory accesses are much slower than on a standard Ryzen part or in Game Mode.
If we crank up the DRAM frequency to DDR4-3200 for the Threadripper 1950X, the numbers change a fair bit:
Up until the 8MB boundary where the L3 gives way to main memory, everything is pretty much equal. At 8MB, however, the latency at DDR4-2400 is 41ns compared to 18ns at DDR4-3200. Then out in full main memory a pattern emerges: Creator Mode at DDR4-3200 is close to Game Mode at DDR4-2400 (87ns vs 79ns), but taking Game Mode to DDR4-3200 drops the latency down to 65ns.
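For readers curious how latency curves like these are typically generated, the usual approach is a pointer-chasing microbenchmark: fill a buffer of a given size with a randomized circular chain of pointers, then time how long each dependent load takes. The sketch below is a minimal illustration of that idea, not the tool used to produce our numbers.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Average nanoseconds per dependent load when chasing pointers through
// a buffer of 'bytes' bytes. Random ordering defeats the prefetchers.
double chase_ns(std::size_t bytes, std::size_t hops = 1 << 24) {
    std::size_t n = bytes / sizeof(std::size_t);
    std::vector<std::size_t> next(n);

    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i < n; ++i)
        next[order[i]] = order[(i + 1) % n];   // one big random cycle

    volatile std::size_t idx = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < hops; ++i)
        idx = next[idx];                       // each load depends on the last
    auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::nano>(t1 - t0).count() / hops;
}

int main() {
    for (std::size_t mb : {1, 4, 8, 16, 64, 256})
        std::printf("%4zu MB buffer : %.1f ns per access\n", mb, chase_ns(mb << 20));
}
```

Run over a sweep of buffer sizes, the jump at the 8MB L3 boundary and the plateau at DRAM latency fall straight out of a loop like this.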
Testing, Testing, One Two One Two
In our last review, we put the CPU in NUMA mode and disabled SMT. Both dies remained active, each thread had a full core's resources to itself, and each set of cores would communicate with its nearest memory; however, there was still die-to-die communication and more potential for far-memory accesses.
In this new testing, we use Ryzen Master to enable Game Mode, which switches to NUMA and disables one of the silicon dies, giving 8 cores and 16 threads.
Related Reading
- The AMD Ryzen Threadripper 1950X and 1920X: CPUs on Steroids
- The AMD Zen and Ryzen 7 Review: A Deep Dive on 1800X, 1700X and 1700
- The AMD Ryzen 5 1600X vs Core i5 Review: Twelve Threads vs Four at $250
- The AMD Ryzen 3 1300X and Ryzen 3 1200 CPU Review: Zen on a Budget
Test Bed
As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer.
| Test Setup | | |
|---|---|---|
| Processor | AMD Ryzen Threadripper 1950X (16C/32T, 3.4 GHz, 180W) | $999 |
| Motherboard | ASUS X399 ROG Zenith Extreme | $549 |
| Cooling | AMD's FX-9590 Bundled Liquid Cooler (220W) | ~$80 |
| Power Supply | Corsair AX860i | $198 |
| Memory | G.Skill Trident Z RGB DDR4-3200 C14 4x8GB | $440 |
| Settings | DDR4-2400 C15 (2DPC Support); DDR4-3200 C14 (Overclock) | |
| Video Cards | MSI GTX 1080 Gaming X 8GB | $599 |
| | ASUS GTX 1060 Strix 6GB | $349 |
| | Sapphire Nitro R9 Fury 4GB | $628 |
| | Sapphire Nitro RX 480 8GB | $399 |
| | Sapphire Nitro RX 460 4GB (CPU Tests) | $163 |
| | ASUS GTX 950 2GB 75W (SYSmark) | $??? |
| Hard Drives | Crucial MX200 1TB | $310 |
| | Crucial MX300 1TB (SYSmark) | $289 |
| Optical Drive | LG GH22NS50 | |
| Case | Open Test Bed | |
| OS | Windows 10 Pro 64-bit | $126 |
Where possible, we will extend our testing to include faster memory modules, either at the same time as the review or at a later date.
All Ryzen 7 1800X numbers in this review were redone on AGESA 1006 for an upcoming article.
Many thanks to...
We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.
Thank you to Sapphire for providing us with several of their AMD GPUs. We met with Sapphire back at Computex 2016 and discussed a platform for our future testing on AMD GPUs with their hardware for several upcoming projects. As a result, they were able to sample us the latest silicon that AMD has to offer. At the top of the list was a pair of Sapphire Nitro R9 Fury 4GB GPUs, based on the first generation of HBM technology and AMD’s Fiji platform. As the first consumer GPU to use HBM, the R9 Fury is a key moment in graphics history, and these Nitro cards come with 3584 SPs running at 1050 MHz on the GPU, with 4GB of HBM on a 4096-bit interface at 1000 MHz.
Further Reading: AnandTech’s Sapphire Nitro R9 Fury Review
Following the Fury, Sapphire also supplied a pair of their latest Nitro RX 480 8GB cards to represent AMD’s current performance silicon on 14nm (as of March 2017). The move to 14nm yielded significant power consumption improvements for AMD, which combined with the latest version of GCN helped bring the target of a VR-ready graphics card as close to $200 as possible. The Sapphire Nitro RX 480 8GB OC graphics card is designed to be a premium member of the RX 480 family, having a full set of 8GB of GDDR5 memory at 6 Gbps with 2304 SPs at 1208/1342 MHz engine clocks.
Further Reading: AnandTech’s AMD RX 480 Review
With the R9 Fury and RX 480 assigned to our gaming tests, Sapphire also passed on a pair of RX 460s to be used as our CPU testing cards. The amount of GPU power available can have a direct effect on CPU performance, especially if the CPU has to spend all its time dealing with the GPU display. The RX 460 is a nice card to have here, as it is powerful yet low on power consumption and does not require any additional power connectors. The Sapphire Nitro RX 460 2GB still follows on from the Nitro philosophy, and in this case is designed to provide power at a low price point. Its 896 SPs run at 1090/1216 MHz frequencies, and it is paired with 2GB of GDDR5 at an effective 7000 MHz.
We must also say thank you to MSI for providing us with their GTX 1080 Gaming X 8GB GPUs. Despite the size of AnandTech, securing high-end graphics cards for CPU gaming tests is rather difficult. MSI stepped up to the plate in good fashion and high spirits with a pair of their high-end graphics cards. The MSI GTX 1080 Gaming X 8GB graphics card is their premium air cooled product, sitting below the water cooled Seahawk but above the Aero and Armor versions. The card is large with twin Torx fans, a custom PCB design, Zero-Frozr technology, enhanced PWM and a big backplate to assist with cooling. The card uses a GP104-400 silicon die from a 16nm TSMC process, contains 2560 CUDA cores, and can run up to 1847 MHz in OC mode (or 1607-1733 MHz in Silent mode). The memory interface is 8GB of GDDR5X, running at 10010 MHz. For a good amount of time, the GTX 1080 was the king of the hill.
Further Reading: AnandTech’s NVIDIA GTX 1080 Founders Edition Review
Thank you to ASUS for providing us with their GTX 1060 6GB Strix GPU. To complete the high/low cases for both AMD and NVIDIA GPUs, we looked towards the GTX 1060 6GB cards to balance price and performance while giving a hefty crack at >1080p gaming in a single graphics card. ASUS lent a hand here, supplying a Strix variant of the GTX 1060. This card is even longer than our GTX 1080, with three fans and LEDs crammed under the hood. STRIX is now ASUS’ lower cost gaming brand behind ROG, and the Strix 1060 sits at nearly half a 1080, with 1280 CUDA cores but running at 1506 MHz base frequency up to 1746 MHz in OC mode. The 6 GB of GDDR5 runs at a healthy 8008 MHz across a 192-bit memory interface.
Further Reading: AnandTech’s ASUS GTX 1060 6GB STRIX Review
Thank you to Crucial for providing us with MX200 SSDs. Crucial stepped up to the plate as our benchmark list grows larger with newer benchmarks and titles, and the 1TB MX200 units are strong performers. Based on Marvell's 88SS9189 controller and using Micron's 16nm 128Gbit MLC flash, these are 7mm high, 2.5-inch drives rated for 100K random read IOPs and 555/500 MB/s sequential read and write speeds. The 1TB models we are using here support TCG Opal 2.0 and IEEE-1667 (eDrive) encryption and have a 320TB rated endurance with a three-year warranty.
Further Reading: AnandTech's Crucial MX200 (250 GB, 500 GB & 1TB) Review
Thank you to Corsair for providing us with an AX1200i PSU. The AX1200i was the first power supply to offer digital control and management via Corsair's Link system, but under the hood it commands a 1200W rating at 50C with 80 PLUS Platinum certification. This allows for a minimum 89-92% efficiency at 115V and 90-94% at 230V. The AX1200i is completely modular, running the larger 200mm design, with a dual ball bearing 140mm fan to assist high-performance use. The AX1200i is designed to be a workhorse, with up to 8 PCIe connectors for suitable four-way GPU setups. The AX1200i also comes with a Zero RPM mode for the fan, which due to the design allows the fan to be switched off when the power supply is under 30% load.
Further Reading: AnandTech's Corsair AX1500i Power Supply Review
Thank you to G.Skill for providing us with memory. G.Skill has been a long-time supporter of AnandTech over the years, for testing beyond our CPU and motherboard memory reviews. We've reported on their high capacity and high-frequency kits, and every year at Computex G.Skill holds a world overclocking tournament with liquid nitrogen right on the show floor.
Further Reading: AnandTech's Memory Scaling on Haswell Review, with G.Skill DDR3-3000
The 2017 Benchmark Suite
For our review, we are implementing our fresh CPU testing benchmark suite, using new scripts developed specifically for this testing. This means that with a fresh OS install, we can configure the OS to be more consistent, install the new benchmarks, maintain version consistency without random updates and start running the tests in under 5 minutes. After that it's a one-button press to start an 8-10hr test (with a high-performance core) with nearly 100 relevant data points in the benchmarks given below for CPUs, followed by our CPU gaming tests which run for 4-5 hours for each of the GPUs used. The CPU tests cover a wide range of segments; some will be familiar, while others are new to benchmarking in general but still highly relevant for the markets they come from.
Our new CPU tests go through six main areas. We cover the Web (we've got an un-updateable version of Chrome 56), general system tests (opening tricky PDFs, emulation, brain simulation, AI, 2D image to 3D model conversion), rendering (ray tracing, modeling), encoding (compression, AES, h264 and HEVC), office based tests (PCMark and others), and our legacy tests, throwbacks from another generation of bad code but interesting to compare.
All of our benchmark results can also be found in our benchmark engine, Bench.
A side note on OS preparation. As we're using Windows 10, there's a large opportunity for something to come in and disrupt our testing. So our strategy is multi-pronged: disable the ability to update as much as possible, disable Windows Defender, uninstall OneDrive, disable Cortana as much as possible, enable the high-performance mode in the power options, and disable the internal platform clock, which can drift away from being accurate if the base frequency drifts (and thus make the timing inaccurate).
Web Tests on Chrome 56
Sunspider 1.0.2
Mozilla Kraken 1.1
Google Octane 2.0
WebXPRT15
System Tests
PDF Opening
FCAT
3DPM v2.1
Dolphin v5.0
DigiCortex v1.20
Agisoft PhotoScan v1.0
Rendering Tests
Corona 1.3
Blender 2.78
LuxMark v3.1 CPU C++
LuxMark v3.1 CPU OpenCL
POV-Ray 3.7.1b4
Cinebench R15 ST
Cinebench R15 MT
Encoding Tests
7-Zip 9.2
WinRAR 5.40
AES Encoding (TrueCrypt 7.2)
HandBrake v1.0.2 x264 LQ
HandBrake v1.0.2 x264-HQ
HandBrake v1.0.2 HEVC-4K
Office / Professional
PCMark8
Chromium Compile (v56)
Legacy Tests
3DPM v1 ST / MT
x264 HD 3 Pass 1, Pass 2
Cinebench R11.5 ST / MT
Cinebench R10 ST / MT
CPU Gaming Tests
For our new set of GPU tests, we wanted to think big. There are a lot of users in the ecosystem that prioritize gaming above all else, especially when it comes to choosing the correct CPU. If there's a chance to save $50 and get a better graphics card for no loss in performance, then this is the route that gamers would prefer to tread. The angle here though is tough - lots of games have different requirements and cause different stresses on a system, with various graphics cards having different reactions to the code flow of a game. Then users also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, let us add in DirectX 12 titles that make it easier to use more CPU cores in a game to enhance fidelity.
Our original list of nine games planned in February quickly became six, due to the lack of professional-grade controls on Ubisoft titles. If you want to see For Honor, Steep or Ghost Recon: Wildlands benchmarked on AnandTech, please point Ubisoft Annecy or Ubisoft Montreal in my direction. While these games have in-game benchmarks worth using, unfortunately they do not provide enough frame-by-frame detail to the end user, despite using it internally to produce the data the user eventually sees (and it typically ends up obfuscated by another layer as well). I would instead perhaps choose to automate these benchmarks via inputs, however the extremely variable loading time is a strong barrier to this.
So we have the following benchmarks as part of our 4/2 script, automated to the point of a one-button run and out pops the results four hours later, per GPU. Also listed are the resolutions and settings used.
- Civilization 6 (1080p Ultra, 4K Ultra)
- Ashes of the Singularity: Escalation* (1080p Extreme, 4K Extreme)
- Shadow of Mordor (1080p Ultra, 4K Ultra)
- Rise of the Tomb Raider #1 - GeoValley (1080p High, 4K Medium)
- Rise of the Tomb Raider #2 - Prophets (1080p High, 4K Medium)
- Rise of the Tomb Raider #3 - Mountain (1080p High, 4K Medium)
- Rocket League (1080p Ultra, 4K Ultra)
- Grand Theft Auto V (1080p Very High, 4K High)
For each of the GPUs in our testing, these games (at each resolution/setting combination) are run four times each, with outliers discarded. Average frame rates, 99th percentiles and 'Time Under x FPS' data is sorted, and the raw data is archived.
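As a rough illustration of the post-processing involved (our actual pipeline is scripted, and the frame times below are made up), the headline metrics can all be derived from the raw frame-time log:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Metrics {
    double avg_fps;        // average frame rate over the run
    double p99_ms;         // 99th percentile frame time
    double time_under_s;   // total seconds spent below the FPS threshold
};

Metrics analyze(std::vector<double> frame_ms, double fps_threshold) {
    double total_ms = 0.0, under_ms = 0.0;
    double limit_ms = 1000.0 / fps_threshold;        // e.g. 30 FPS -> 33.3 ms
    for (double ms : frame_ms) {
        total_ms += ms;
        if (ms > limit_ms) under_ms += ms;           // frame rendered slower than threshold
    }
    std::sort(frame_ms.begin(), frame_ms.end());
    double p99 = frame_ms[static_cast<std::size_t>(0.99 * (frame_ms.size() - 1))];
    return { 1000.0 * frame_ms.size() / total_ms, p99, under_ms / 1000.0 };
}

int main() {
    std::vector<double> log = {16.2, 15.8, 33.1, 16.5, 17.0, 40.2, 16.1};  // hypothetical data
    Metrics m = analyze(log, 30.0);
    std::printf("avg %.1f FPS, 99th %.1f ms, %.2f s under 30 FPS\n",
                m.avg_fps, m.p99_ms, m.time_under_s);
}
```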
The four GPUs we've managed to obtain for these tests are:
- MSI GTX 1080 Gaming X 8G
- ASUS GTX 1060 Strix 6G
- Sapphire Nitro R9 Fury 4GB
- Sapphire Nitro RX 480 8GB
In our testing script, we save a couple of special things for the GTX 1080 here. The following tests are also added:
- Civilization 6 (8K Ultra, 16K Lowest)
This benchmark, with a little coercion, can be run beyond the specifications of the monitor being used, allowing for 'future' testing of GPUs at 8K and 16K with some amusing results. We are only running these tests on the GTX 1080, because there's no point watching a slideshow more than once.
CPU System Tests
Our first set of tests is our general system tests. This set of tests is meant to emulate what people usually do on a system, like opening large files or processing small stacks of data. This is a bit different from our office testing, which uses more industry-standard benchmarks; a few of the benchmarks here are relatively new and different.
All of our benchmark results can also be found in our benchmark engine, Bench.
PDF Opening
First up is a self-penned test using a monstrous PDF we once received in advance of attending an event. While the PDF was only a single page, it had so many high-quality layers embedded that it was taking north of 15 seconds to open and gain control on the mid-range notebook I was using at the time. This made it a great candidate for our 'let's open an obnoxious PDF' test. Here we use Adobe Reader DC, and disable all the update functionality within. The benchmark sets the screen to 1080p, opens the PDF in fit-to-screen mode, and measures the time from sending the command to open the PDF until it is fully displayed and the user can take control of the software again. The test is repeated ten times, and the average time is taken. Results are in milliseconds.
There's not much between the Threadripper CPUs here, but frequency wins the day.
FCAT Processing: link
One of the more interesting workloads that has crossed our desks in recent quarters is FCAT - the tool we use to measure and visually analyze stuttering in gaming due to dropped or runt frames. The FCAT process requires enabling a color-based overlay onto a game, recording the gameplay, and then parsing the video file through the analysis software. The software is mostly single-threaded, however because the video is basically in a raw format, the file size is large and requires moving a lot of data around. For our test, we take a 90-second clip of the Rise of the Tomb Raider benchmark running on a GTX 980 Ti at 1440p, which comes in around 21 GB, and measure the time it takes to process through the visual analysis tool.
Similar to PDF opening, single threaded performance wins out.
Dolphin Benchmark: link
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.
Dolphin likes single thread performance as well, although interpreting this graph is giving me somewhat of a headache. Game Mode seems to give a small improvement here.
3D Movement Algorithm Test v2.1: link
This is the latest version of the self-penned 3DPM benchmark. The goal of 3DPM is to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in. It affords a ~25% speed-up over v2.0, which means new data.
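To illustrate the sort of change described (a generic sketch, not the actual 3DPM source), passing the particle struct by reference avoids a copy on every call, and keeping the math in double avoids the compiler inserting double-to-float-to-double conversions:

```cpp
struct Particle {
    double x, y, z;      // position
    double vx, vy, vz;   // velocity
};

// v2.0 style (illustrative): the struct is copied on the way in and out,
// and the float temporaries force double -> float -> double round-trips.
Particle step_by_value(Particle p, double dt) {
    p.x += static_cast<float>(p.vx * dt);
    p.y += static_cast<float>(p.vy * dt);
    p.z += static_cast<float>(p.vz * dt);
    return p;                               // copied again on return
}

// v2.1 style (illustrative): pass by reference, stay in double throughout.
void step_by_reference(Particle& p, double dt) {
    p.x += p.vx * dt;
    p.y += p.vy * dt;
    p.z += p.vz * dt;
}

int main() {
    Particle p{0, 0, 0, 1.0, 2.0, 3.0};
    p = step_by_value(p, 0.01);       // old approach: copies + precision churn
    step_by_reference(p, 0.01);       // new approach: updated in place
    return 0;
}
```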
Our first pure multithreaded test, and the 1950X wins with 32 threads. The 1920X beats the 1950X in Game Mode, due to 24 threads beating 16 threads. The 1800X edges out the 1950X-GM due to frequency.
DigiCortex v1.20: link
Despite being a couple of years old, the DigiCortex software is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation. The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.
Unfortunately we had issues with the 1920X posting a result.
DigiCortex requires a mash of CPU frequency and DRAM performance to get a good result, although the 1950X in any mode regresses the result, even in Game Mode, suggesting it is more sensitive to overall DRAM latency.
Agisoft Photoscan 1.0: link
Photoscan stays in our benchmark suite from the previous version, however now we are running on Windows 10 so features such as Speed Shift on the latest processors come into play. The concept of Photoscan is translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the model. The algorithm has four stages, some single threaded and some multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.
The variable-threaded nature of Agisoft shows that in our workflow, a mix of cores, IPC and frequency is required to win. The quad-channel memory and lower crosstalk of the 1950X in Game Mode seem to deliver a marginal improvement over the 1950X in Creator Mode.
CPU Rendering Tests
Rendering tests are a long-time favorite of reviewers and benchmarkers, as the code used by rendering packages is usually highly optimized to squeeze every little bit of performance out. Sometimes rendering programs end up being heavily memory dependent as well - when you have that many threads flying about with a ton of data, having low latency memory can be key to everything. Here we take a few of the usual rendering packages under Windows 10, as well as a few new interesting benchmarks.
All of our benchmark results can also be found in our benchmark engine, Bench.
Corona 1.3: link
Corona is a standalone package designed to assist software like 3ds Max and Maya with photorealism via ray tracing. It's simple - shoot rays, get pixels. OK, it's more complicated than that, but the benchmark renders a fixed scene six times and offers results in terms of time and rays per second. The official benchmark tables list user submitted results in terms of time, however I feel rays per second is a better metric (in general, scores where higher is better seem to be easier to explain anyway). Corona likes to pile on the threads, so the results end up being very staggered based on thread count.
Corona loves threads. Game Mode goes behind the 1800X due to frequency.
Blender 2.78: link
For a renderer that has been around for what seems like ages, Blender is still a highly popular tool. We managed to wrap up a standard workload into the February 5 nightly build of Blender and measure the time it takes to render the first frame of the scene. Being one of the bigger open source tools out there, it means both AMD and Intel work actively to help improve the codebase, for better or for worse on their own/each other's microarchitecture.
Blender loves threads.
LuxMark v3.1: Link
As a synthetic, LuxMark might come across as somewhat arbitrary as a renderer, given that it's mainly used to test GPUs, but it does offer both an OpenCL and a standard C++ mode. In this instance, aside from seeing the comparison in each coding mode for cores and IPC, we also get to see the difference in performance moving from a C++ based code-stack to an OpenCL one with a CPU as the main host.
Like Blender, LuxMark is all about the thread count. Ray tracing is very nearly a textbook case for easy multi-threaded scaling, although a couple of things pop up in the OpenCL version. Aside from the scores being lower, the jump from 1920X to 1950X isn't that great, and the quad-channel DRAM of the 1950X in Game Mode puts it over the 1800X.
POV-Ray 3.7.1b4: link
Another regular benchmark in most suites, POV-Ray is another ray-tracer but has been around for many years. It just so happens that during the run up to AMD's Ryzen launch, the code base started to get active again with developers making changes to the code and pushing out updates. Our version and benchmarking started just before that was happening, but given time we will see where the POV-Ray code ends up and adjust in due course.
POV-Ray loves threads.
Cinebench R15: link
The latest version of CineBench has also become one of those 'used everywhere' benchmarks, particularly as an indicator of single thread performance. High IPC and high frequency gives performance in ST, whereas having good scaling and many cores is where the MT test wins out.
Multithreaded results are as expected, and single thread seems to benefit a bit from more DRAM channels, although 200 MHz is enough to put the 1800X over the 1950X in Game Mode.
CPU Web Tests
One of the issues when running web-based tests is the nature of modern browsers to automatically install updates. This means any sustained period of benchmarking will invariably fall foul of the 'it's updated beyond the state of comparison' rule, especially when browsers will update if you give them half a second to think about it. Despite this, we were able to find a series of commands to create an un-updatable version of Chrome 56 for our 2017 test suite. While this means we might not be on the bleeding edge of the latest browser, it makes the scores between CPUs comparable.
All of our benchmark results can also be found in our benchmark engine, Bench.
SunSpider 1.0.2: link
The oldest web-based benchmark in this portion of our test is SunSpider. This is a very basic JavaScript algorithm tool, and ends up being more a measure of IPC and latency than anything else, with most high-performance CPUs scoring around the same. The basic test is looped 10 times and the average taken; we run the basic test 4 times.
Mozilla Kraken 1.1: link
Kraken is another Javascript based benchmark, using the same test harness as SunSpider, but focusing on more stringent real-world use cases and libraries, such as audio processing and image filters. Again, the basic test is looped ten times, and we run the basic test four times.
Google Octane 2.0: link
Along with Mozilla, as Google is a major browser developer, having peak JS performance is typically a critical asset when comparing against the other OS developers. In the same way that SunSpider is a very early JS benchmark, and Kraken is a bit newer, Octane aims to be more relevant to real workloads, especially in power constrained devices such as smartphones and tablets.
WebXPRT 2015: link
While the previous three benchmarks do calculations in the background and represent a score, WebXPRT is designed to be a better interpretation of visual workloads that a professional user might have, such as browser based applications, graphing, image editing, sort/analysis, scientific analysis and financial tools.
Overall, all of our web benchmarks show a similar trend. Very few web frameworks offer multi-threading – the browsers themselves are barely multi-threaded at times – so Threadripper's vast thread count is underutilized. What wins the day on the web are a handful of fast cores with high single-threaded performance, and it becomes a balance between cores and cross-core communication.
CPU Encoding Tests
One of the interesting elements of modern processors is encoding performance. This includes encryption/decryption, as well as video transcoding from one video format to another. In the encrypt/decrypt scenario, this remains pertinent to on-the-fly encryption of sensitive data - a process that more modern devices are leaning on for software security. Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who want to upload the output from their video camera in real time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.
All of our benchmark results can also be found in our benchmark engine, Bench.
7-Zip 9.2: link
One of the freeware compression tools that offers good scaling performance between processors is 7-Zip. It runs under an open-source license, is fast, and is an easy-to-use tool for power users. We run the benchmark mode via the command line for four loops and take the output score.
At the request of a few users, we've gone back through our saved benchmark data and pulled out the separate compression and decompression numbers for 7-Zip. AMD clearly wins in decompression by a long way with all the threads, and the 1800X beats the 1950X in Game Mode due to frequency.
WinRAR 5.40: link
For the 2017 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack (33 video files in 1.37 GB, 2834 smaller website files in 370 folders in 150 MB) of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test 10 times and take the average of the last five runs when the benchmark is in a steady state.
WinRAR encoding is another test that doesn't scale up especially well with thread counts. After only a few threads, most of its MT performance gains have been achieved. The balance here is with memory and frequency, to which the 1800X wins. The 1800X takes a sizeable gain over the 1950X in Game Mode too, likely due to far memory latency.
AES Encoding
Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.
HandBrake v1.0.2 H264 and HEVC: link
As mentioned above, video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. First consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file-size, trade quality for file-size, or all of the above can increase encoding rates to help accelerate decoding rates. Alongside Google's favorite codec, VP9, there are two others that are taking hold: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H265) that is aimed to provide the same quality as H264 but at a lower file-size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content.
Handbrake is a favored tool for transcoding, and so our test regime takes care of three areas.
Low Quality/Resolution H264: Here we transcode a 640x266 H264 rip of a 2 hour film, and change the encoding from Main profile to High profile, using the very-fast preset.
High Quality/Resolution H264: A similar test, but this time we take a ten-minute double 4K (3840x4320) file running at 60 Hz and transcode from Main to High, using the very-fast preset.
HEVC Test: Using the same video in HQ, we change the resolution and codec of the original video from 4K60 in H264 into 4K60 HEVC.
CPU Office Tests
The office programs we use for benchmarking aren't specific programs per-se, but industry standard tests that hold weight with professionals. The goal of these tests is to use an array of software and techniques that a typical office user might encounter, such as video conferencing, document editing, architectural modeling, and so on and so forth.
All of our benchmark results can also be found in our benchmark engine, Bench.
Chromium Compile (v56)
Our new compilation test uses Windows 10 Pro, VS Community 2015.3 with the Win10 SDK to compile a nightly build of Chromium. We've fixed the test for a build in late March 2017, and we run a fresh full compile in our test. Compilation is the typical example given of a variable threaded workload - some of the compile and linking is linear, whereas other parts are multithreaded.
For our test, we compile a version of v56 under MSVC and report the time in 'Compiles per Day', a more scalable metric to represent over time. Other publications might perform this test differently (Ars Technica uses a clang-cl compiler with VC++ linking, for example).
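The conversion itself is simple: divide the number of seconds in a day by the measured wall-clock compile time. As a purely illustrative example, a compile that takes an hour works out to:

$$\text{Compiles per Day} = \frac{86400\ \text{s/day}}{3600\ \text{s per compile}} = 24\ \text{compiles/day}$$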
One of the interesting data points in our test is the Compile. Because this test requires a lot of cross-core communication and DRAM, we get an interesting metric where the 1950X still comes out on top due to the core counts, but because the 1920X has fewer cores per CCX, it actually falls behind the 1950X in Game Mode and the 1800X despite having more cores.
PCMark8: link
Although PCMark8 has been around for several years, Futuremark has maintained it to remain relevant in 2017. On the scale of complicated tasks, PCMark focuses more on the low-to-mid range of professional workloads, making it a good indicator for what people consider 'office' work. We run the benchmark from the commandline in 'conventional' mode, meaning C++ over OpenCL, to remove the graphics card from the equation and focus purely on the CPU. PCMark8 offers Home, Work and Creative workloads, with some software tests shared and others unique to each benchmark set.
Strangely, PCMark 8's Creative test seems to be failing across the board. We're trying to narrow down the issue.
CPU Legacy Tests
Our legacy tests represent benchmarks that were once at the height of their time. Some of these are industry standard synthetics, and we have data going back over 10 years. All of the data here has been rerun on Windows 10, and we plan to go back several generations of components to see how performance has evolved.
All of our benchmark results can also be found in our benchmark engine, Bench.
3D Particle Movement v1
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC wins in the single thread version, whereas the multithread version has to handle the threads and loves more cores. This is the original version, written in the style of a typical non-computer-science student coding up an algorithm for their theoretical problem, and it comes without any non-obvious optimizations beyond what the compiler already performs (for example, no padding to avoid false sharing).
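For anyone wondering what such a non-obvious optimization might look like (again, a generic illustration rather than the 3DPM code), padding per-thread accumulators so that each sits on its own 64-byte cache line stops threads invalidating each other's cache lines:

```cpp
#include <thread>
#include <vector>

constexpr int kThreads = 8;

// Naive layout: adjacent counters share a 64-byte cache line, so updates from
// different threads keep bouncing that line between cores (false sharing).
struct NaiveCounters {
    long counts[kThreads];
};

// Padded layout: each counter is aligned to its own cache line.
struct alignas(64) PaddedCounter {
    long count = 0;
};

int main() {
    static PaddedCounter counters[kThreads];
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([t] {
            for (int i = 0; i < 10000000; ++i)
                ++counters[t].count;      // each thread only touches its own line
        });
    for (auto& w : workers) w.join();
    return 0;
}
```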
CineBench 11.5 and 10
Cinebench is a widely known benchmarking tool for measuring performance relative to MAXON's animation software Cinema 4D. Cinebench has been optimized over a decade and focuses on purely CPU horsepower, meaning if there is a discrepancy in pure throughput characteristics, Cinebench is likely to show that discrepancy. Arguably other software doesn't make use of all the tools available, so the real world relevance might purely be academic, but given our large database of data for Cinebench it seems difficult to ignore a small five minute test. We run the modern version 15 in this test, as well as the older 11.5 and 10 due to our back data.
x264 HD 3.0
Similarly, the x264 HD 3.0 package we use here is also kept for historical regression data. The latest version is 5.0.1, and encodes a 1080p video clip into a high-quality x264 file. Version 3.0 performs the same test on a 720p file, and in most circumstances the software hits its performance limit on high-end processors, but it still works well for mainstream and low-end parts. Also, this version only takes a few minutes, whereas the latest can take over 90 minutes to run.
The 1950X: the first CPU to score higher on the 2nd pass of this test than it does on the first pass.
Civilization 6
First up in our CPU gaming tests is Civilization 6. Originally penned by Sid Meier and his team, the Civ series of turn-based strategy games is a cult classic, and many an excuse for an all-nighter trying to get Gandhi to declare war on you due to an integer overflow. Truth be told, I never actually played the first version, but I have played every edition from the second to the sixth, including the fourth as voiced by the late Leonard Nimoy. It is a game that is easy to pick up, but hard to master.
Benchmarking Civilization has always been somewhat of an oxymoron – for a turn-based strategy game, the frame rate is not necessarily the important thing, and in the right mood, something as low as 5 frames per second can be enough. With Civilization 6 however, Firaxis went hardcore on visual fidelity, trying to pull you into the game. As a result, Civilization can be taxing on graphics and CPUs as we crank up the details, especially in DirectX 12.
Perhaps a more pertinent benchmark would be during the late game, when in the older versions of Civilization it could take 20 minutes to cycle around the AI players before the human regained control. The new version of Civilization has an integrated ‘AI Benchmark’, although it is not part of our benchmark portfolio yet, due to technical reasons we are still trying to solve. Instead, we run the graphics test, which provides an example of a mid-game setup at our settings.
At both 1920x1080 and 4K resolutions, we run the same settings. Civilization 6 has sliders for MSAA, Performance Impact and Memory Impact. The latter two refer to detail and texture size respectively, and are rated from 0 (lowest) to 5 (extreme). We run our Civ6 benchmark in position four for performance (ultra) and 0 on memory, with MSAA set to 2x.
For reviews where we include 8K and 16K benchmarks (Civ6 allows us to benchmark extreme resolutions on any monitor) on our GTX 1080, we run the 8K tests similar to the 4K tests, but the 16K tests are set to the lowest option for Performance.
All of our benchmark results can also be found in our benchmark engine, Bench.
We tested both the 1950X and the 1920X at their default settings, as well as the 1950X in Game Mode, shown as 1950X-GM.
MSI GTX 1080 Gaming 8G Performance (1080p, 4K, 8K, 16K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
Ashes of the Singularity Escalation
Seen as the holy child of DirectX 12, Ashes of the Singularity (AoTS, or just Ashes) was the first title to actively explore as many of DirectX 12's features as it possibly could. Stardock, the developer behind the Nitrous engine which powers the game, has ensured that the real-time strategy title takes advantage of multiple cores and multiple graphics cards, in as many configurations as possible.
As a real-time strategy title, Ashes is all about responsiveness, during both wide open shots and concentrated battles. With DirectX 12 at the helm, the ability to issue more draw calls per second allows the engine to work with substantial unit depth and effects that other RTS titles had to rely on combined draw calls to achieve, which made some combined unit structures ultimately very rigid.
Stardock clearly understands the importance of an in-game benchmark, ensuring that such a tool was available and capable from day one; with all the additional DX12 features in use, being able to characterize how they affected the title was important for the developer. The in-game benchmark performs a four-minute, fixed-seed battle environment with a variety of shots, and outputs a vast amount of data to analyze.
For our benchmark, we run a fixed v2.11 version of the game due to some peculiarities of the splash screen added after the merger with the standalone Escalation expansion, and have an automated tool to call the benchmark on the command line. (Prior to v2.11, the benchmark also supported 8K/16K testing, however v2.11 has odd behavior which nukes this.)
At both 1920x1080 and 4K resolutions, we run the same settings. Ashes has dropdown options for MSAA, Light Quality, Object Quality, Shading Samples, Shadow Quality, Textures, and separate options for the terrain. There are several presets, from Very Low to Extreme: we run our benchmarks at Extreme settings, and take the frame-time output for our average, percentile, and time under analysis.
All of our benchmark results can also be found in our benchmark engine, Bench.
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
Shadow of Mordor
The next title in our testing is a battle of system performance with the open world action-adventure title, Middle Earth: Shadow of Mordor (SoM for short). Produced by Monolith and using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.
A 2014 game is fairly old to be testing now, however SoM has a stable code and player base, and can still stress a PC down to the ones and zeroes. At the time, SoM was unique, offering a dynamic screen resolution setting allowing users to render at high resolutions that are then scaled down to the monitor. This form of natural oversampling was designed to let the user experience a truer vision of what the developers wanted, assuming you had the graphics hardware to power it but had a sub-4K monitor.
The title has an in-game benchmark, which we run with an automated script that implements the graphics settings, selects the benchmark, and parses the frame-time output that is dumped on the drive. The graphics settings include standard options such as Graphical Quality, Lighting, Mesh, Motion Blur, Shadow Quality, Textures, Vegetation Range, Depth of Field, Transparency and Tessellation. There are standard presets as well.
We run the benchmark at 1080p and a native 4K, using our 4K monitors, at the Ultra preset. Results are averaged across four runs and we report the average frame rate, 99th percentile frame rate, and time under analysis.
All of our benchmark results can also be found in our benchmark engine, Bench.
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
Rise of the Tomb Raider (1080p, 4K)
One of the newest games in the gaming benchmark suite is Rise of the Tomb Raider (RoTR), developed by Crystal Dynamics, and the sequel to the popular Tomb Raider which was loved for its automated benchmark mode. But don’t let that fool you: the benchmark mode in RoTR is very much different this time around.
Visually, the previous Tomb Raider pushed realism to the limits with features such as TressFX, and the new RoTR goes one stage further when it comes to graphics fidelity. This leads to an interesting set of requirements in hardware: some sections of the game are typically GPU limited, whereas others with a lot of long-range physics can be CPU limited, depending on how the driver can translate the DirectX 12 workload.
Where the old game had one benchmark scene, the new game has three different scenes with different requirements: Spine of the Mountain (1-Valley), Prophet’s Tomb (2-Prophet) and Geothermal Valley (3-Mountain) - and we test all three (and yes, I need to relabel them - I got them wrong when I set up the tests). These are three scenes designed to be taken from the game, but it has been noted that scenes like 2-Prophet shown in the benchmark can be the most CPU limited elements of that entire level, and the scene shown is only a small portion of that level. Because of this, we report the results for each scene on each graphics card separately.
Graphics options for RoTR are similar to other games of this type, offering some presets or allowing the user to configure texture quality, anisotropic filter levels, shadow quality, soft shadows, occlusion, depth of field, tessellation, reflections, foliage, bloom, and features like PureHair, which builds on the TressFX used in the previous game.
Again, we test at 1920x1080 and 4K using our native 4K displays. At 1080p we run the High preset, while at 4K we use the Medium preset which still takes a sizable hit in frame rate.
It is worth noting that RoTR is a little different to our other benchmarks in that it keeps its graphics settings in the registry rather than a standard ini file, and unlike the previous TR game the benchmark cannot be called from the command-line. Nonetheless we scripted around these issues to automate the benchmark four times and parse the results. From the frame time data, we report the averages, 99th percentiles, and our time under analysis.
All of our benchmark results can also be found in our benchmark engine, Bench.
#1 Geothermal Valley Spine of the Mountain
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
#2 Prophet’s Tomb
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
#3 Spine of the Mountain Geothermal Valley
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
ASUS GTX 1060 Strix 6G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
Rocket League
Hilariously simple pick-up-and-play games are great fun. I'm a massive fan of the Katamari franchise for that reason – pressing start on a controller and rolling around, picking up things to get bigger, is extremely simple. Until we get a PC version of Katamari that I can benchmark, we'll focus on Rocket League.
Rocket League combines the elements of pick-up-and-play, allowing users to jump into a game with other people (or bots) to play football with cars with zero rules. The title is built on Unreal Engine 3, which is somewhat old at this point, but it allows users to run the game on super-low-end systems while still taxing the big ones. Since the release in 2015, it has sold over 5 million copies and seems to be a fixture at LANs and game shows. Users who train get very serious, playing in teams and leagues with very few settings to configure, and everyone is on the same level. Rocket League is quickly becoming one of the favored titles for e-sports tournaments, especially when e-sports contests can be viewed directly from the game interface.
Based on these factors, plus the fact that it is an extremely fun title to load and play, we set out to find the best way to benchmark it. Unfortunately for the most part automatic benchmark modes for games are few and far between. Partly because of this, but also on the basis that it is built on the Unreal 3 engine, Rocket League does not have a benchmark mode. In this case, we have to develop a consistent run and record the frame rate.
Read our initial analysis on our Rocket League benchmark on low-end graphics here.
With Rocket League, there is no benchmark mode, so we have to perform a series of automated actions, similar to a racing game having a fixed number of laps. We take the following approach: Using Fraps to record the time taken to show each frame (and the overall frame rates), we use an automation tool to set up a consistent 4v4 bot match on easy, with the system applying a series of inputs throughout the run, such as switching camera angles and driving around.
It turns out that this method is nicely indicative of a real bot match, driving up walls, boosting and even putting in the odd assist, save and/or goal, as weird as that sounds for an automated set of commands. To maintain consistency, the commands we apply are not random but time-fixed, and we also keep the map the same (Aquadome, known to be a tough map for GPUs due to water/transparency) and the car customization constant. We start recording just after a match starts, and record for 4 minutes of game time (think 5 laps of a DIRT: Rally benchmark), with average frame rates, 99th percentile and frame times all provided.
The graphics settings for Rocket League come in four broad, generic settings: Low, Medium, High and High FXAA. There are advanced settings in place for shadows and details; however, for these tests, we keep to the generic settings. For both 1920x1080 and 4K resolutions, we test at the High preset with an unlimited frame cap.
All of our benchmark results can also be found in our benchmark engine, Bench.
MSI GTX 1080 Gaming 8G Performance (1080p, 4K graphs)
Sapphire Nitro R9 Fury 4G Performance (1080p, 4K graphs)
Sapphire Nitro RX 480 8G Performance (1080p, 4K graphs)
With Ryzen, we encountered some odd performance issues when using NVIDIA-based video cards that caused those cards to significantly underperform. Equally strangely, however, the issues we have with Ryzen on Rocket League with NVIDIA GPUs seem to almost vanish when using Threadripper. There are still no easy wins here, as Intel seems to take Rocket League in its stride, but Game Mode still helps the 1950X. The Time Under graphs give some cause for concern, with the 1950X consistently sitting at the bottom of that graph.
Grand Theft Auto
The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
For our test we have scripted a version of the in-game benchmark. The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data.
There are no presets for the graphics options on GTA, allowing the user to adjust options such as population density and distance scaling on sliders, but others such as texture/shadow/shader/water quality from Low to Very High. Other options include MSAA, soft shadows, post effects, shadow resolution and extended draw distance options. There is a handy option at the top which shows how much video memory the options are expected to consume, with obvious repercussions if a user requests more video memory than is present on the card (although there’s no obvious indication if you have a low-end GPU with lots of GPU memory, like an R7 240 4GB).
To that end, we run the benchmark at 1920x1080 using an average of Very High on the settings, and also at 4K using High on most of them. We take the average results of four runs, reporting frame rate averages, 99th percentiles, and our time under analysis.
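The 'time under' analysis mentioned above can be sketched in a similar way: it measures how much of the run is spent on frames slower than a chosen frame-rate threshold. The 60 FPS cutoff and the sample frame times below are placeholders for illustration only.

```python
# A rough sketch of a 'time under' style metric: the share of the benchmark
# run spent on frames slower than a chosen frame-rate threshold. The 60 FPS
# cutoff and the sample frame times are placeholders for illustration.
def time_under(frame_times_ms, fps_threshold=60.0):
    """Percentage of total run time spent on frames below the FPS threshold."""
    cutoff_ms = 1000.0 / fps_threshold
    slow_time = sum(t for t in frame_times_ms if t > cutoff_ms)
    return 100.0 * slow_time / sum(frame_times_ms)

# Example: a mostly smooth ~60 FPS cadence with a handful of slow frames.
sample = [16.6] * 200 + [40.0, 55.0, 33.4]
print(f"Time spent under 60 FPS: {time_under(sample):.1f}%")
```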
All of our benchmark results can also be found in our benchmark engine, Bench.
[Graphs: MSI GTX 1080 Gaming 8G, ASUS GTX 1060 Strix 6G, Sapphire Nitro R9 Fury 4G, and Sapphire Nitro RX 480 8G performance at 1080p and 4K]
Analyzing Creator Mode and Game Mode
In this review, we posted every graph with both the Creator Mode results (as default) and the Game Mode results (as 1950X-GM) for the Threadripper 1950X. There were a number of trends worth pointing out.
The first big answer is that in (almost) every multi-threaded benchmark that relied on all the threads pushing out data, Game Mode scored considerably lower than Creator Mode. In our test suite, I earmarked 19 different tests that are designed to scale with thread count, and the results ranged from +1% (Octane) down to -48% (Corona) and -45% (LuxMark). To summarize: for anything that wanted serious throughput, Game Mode was not the right mode to be in. But anyone could have told you that.
The next element is the single-threaded tests in the suite. There are 10 of these if we include the four legacy benchmarks, and for the most part they all land within 5% of the Creator Mode results: some are above and some are below, but nothing drastic. Two of the benchmarks, however, did get significant jumps from using Game Mode: Dolphin (+9%) and Agisoft Stage 3 (+38%). Agisoft is probably a hollow victory, as overall that test only gains 1%.
We also run a few variable-threaded loads, and the results here really depend on how parallel the task is. As stated before, Agisoft goes up 1%, and perhaps surprisingly our Compile benchmark goes down 14%. One would have thought that the lower memory latency of Game Mode might counteract the lack of threads, especially when the L3 victim cache is of little use, but overall it would seem that our compile test prefers the extra threads. WinRAR is a known memory-loving test, so Game Mode picked up a 3% win there, and variable-threaded web benchmarks such as WebXPRT also picked up a 9% win.
CPU Gaming Tests
Now we turn to our gaming tests. Because we test six different games with four different GPUs at two different resolutions, and in each case take averages and 99th percentiles, I’m going to present this data in a set of different ways. First, the overall gains based on the resolution:
Game Mode Gains Over Creator Mode
            1080p     4K
Average     +0.6%     +0.6%
99th        +14.3%    +8.0%
The two conclusions we can draw here are that Game Mode benefits 99th percentile frame rates the most, and that it affects 1080p gaming more than 4K gaming.
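As a note on how these tables are built, each entry boils down to the relative change of a Game Mode result against the matching Creator Mode result, averaged across the tests in that bucket. A trivial sketch with placeholder numbers:

```python
# A trivial sketch of the arithmetic behind the tables: each percentage is the
# change of a Game Mode result relative to the matching Creator Mode result,
# and a bucket value is the mean of those changes. Numbers are placeholders.
def gain_pct(game_mode, creator_mode):
    return 100.0 * (game_mode - creator_mode) / creator_mode

# Hypothetical (Game Mode FPS, Creator Mode FPS) pairs for one resolution bucket.
pairs = [(91.2, 90.5), (64.0, 66.1), (120.3, 118.8)]
gains = [gain_pct(gm, cm) for gm, cm in pairs]
print(f"Mean gain: {sum(gains) / len(gains):+.1f}%")
```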
This next table breaks it down by graphics card:
Game Mode Gains Over Creator Mode
                    1080p     4K
NVIDIA GTX 1080
  Average           -3.1%     0.0%
  99th              +1.6%     +1.9%
NVIDIA GTX 1060
  Average           -0.6%     +0.1%
  99th              +3.1%     +1.9%
AMD R9 Fury
  Average           +2.5%     +1.5%
  99th              +26.2%    +14.4%
AMD RX 480
  Average           +3.6%     +0.6%
  99th              +25.0%    +14.1%
Again, the data shows that 99th percentiles fare better than averages, and the AMD cards get a better uplift than the NVIDIA cards.
Now let us break it down by game tests.
Game Mode Gains Over Creator Mode
                            1080p     4K
Civilization 6
  Average                   -2.1%     -1.8%
  99th                      -5.3%     -3.1%
Ashes of the Singularity
  Average                   -3.2%     -0.1%
  99th                      -2.2%     -0.6%
Shadow of Mordor
  Average                   -0.3%     0.0%
  99th                      -4.5%     +0.1%
Both Civilization 6 and Ashes of the Singularity show slight decreases when running in Game Mode, with Civilization even regressing over 5% in 99th percentile data at 1080p. Shadow of Mordor has a marginal gain at 4K, mainly in the 99th percentile data, but well within the margin of error.
Game Mode Gains Over Creator Mode
            1080p     4K
RoTR-1
  Average   -1.3%     +0.1%
  99th      +4.2%     +0.4%
RoTR-2
  Average   +2.4%     +1.8%
  99th      +43.7%    +21.9%
RoTR-3
  Average   +2.3%     +1.4%
  99th      +17.9%    +11.7%
Rise of the Tomb Raider has three test stages, and almost all of them benefit from Game Mode. Again, 99th percentiles go up (+43.7% for the Prophet's Tomb test at 1080p), and 1080p gets a better deal than 4K.
Game Mode Gains Over Creator Mode
                        1080p     4K
Rocket League
  Average               +0.8%     +1.0%
  99th                  +9.1%     +2.9%
Grand Theft Auto V
  Average               +6.9%     +2.2%
  99th                  +49.2%    +29.4%
The last two games are Rocket League and Grand Theft Auto, with Rocket League getting a small bump in 99th percentiles while GTA jumps up by double digits. For GTA, those big spikes at 1080p come from ~100% gains on AMD cards. Similarly at 4K, while NVIDIA cards see nearly no benefit, AMD cards gain 50-73%.
Conclusions on CPU Gaming
Looking at the overall data, the worst loss was -10% at 4K for Civilization 6, and it's almost a complete mix of positive and negative results across the 256 data points we tested. The takeaway is that on average Game Mode benefits certain games, like RoTR and GTA, really well, but not games like Ashes or Shadow of Mordor. On average that equates to a +8% boost in 99th percentile frame rates at 4K, or a +14% boost in 99th percentile frame rates at 1080p, with the gains mostly limited to AMD cards.
If a user wants to play certain games on Threadripper with an AMD card, they should be in Game Mode. There are some losses in some titles, but as a catch-all, the gains in the games where it does work are noticeable, especially at lower resolutions.
How Does it Compare to Our Testing at 16C/16T?
Interestingly, the results for almost all benchmarks were lower in 8C/16T mode than in 16C/16T mode. Despite the cross-die communication penalty of keeping both dies active, it would appear that having all the raw cores at its disposal counteracts most of those losses, especially when each die preferentially communicates with its own DRAM channels where possible.
In the following table:
- On the left is AMD's Game Mode vs Creator Mode.
- On the right is SMT disabled vs Creator Mode.
- Both non-Creator data sets have NUMA enabled.
For example, at 16C/16T we saw a +4% average FPS improvement at 1080p, but at 8C/16T this is only +0.6%. Before, we had a +26.5% gain in 99th percentile numbers at 1080p, but now this is only +14.3%. The individual game numbers follow a similar pattern: on the right, at 1080p in 16C/16T mode, the per-game results are within ~0.1% of Creator Mode, but on the left, in 8C/16T mode, we see an average loss of 3% in some of the tests. In the pure CPU benchmarks, some tests like Dolphin had a +33% increase at 16C/16T, but only a +9% increase at 8C/16T.
The only upside to running at 8C/16T over 16C/16T would seem to be power consumption. In 8C/16T Game Mode, we saw an all-thread power consumption of 125W. In the SMT-disabled 16C/16T mode, this was 170W, closer to the default Creator Mode figure of 177W. One of AMD's reasons for implementing Game Mode like this was that certain games will not accept the number of threads on offer; in the situations above, both of the new modes tested present 16 threads, at which point disabling SMT would appear to be preferable for performance.
Conclusions
In this mini-test, we compared AMD's Game Mode as originally envisioned by AMD. Game Mode sits as an extra option in the AMD Ryzen Master software, alongside Creator Mode, which is enabled by default. Game Mode does two things. First, it adjusts the memory configuration: rather than seeing the DRAM as one uniform block of memory with an 'average' latency, the system splits the memory into near memory closest to the active CPU and far memory for DRAM connected via the other silicon die. Second, it disables the cores on one of the silicon dies, but retains that die's PCIe lanes, IO, and DRAM support. This removes cross-die thread migration, offers lower-latency memory for applications that need it, and aims to lower the latency of the cores used for gaming by simplifying the layout. The downside of Game Mode is raw performance when peak CPU throughput is needed: by disabling half the cores, any throughput-limited task loses half of its resources. The argument here is that Game Mode is designed for games, which rarely use more than 8 cores, while optimizing memory latency and PCIe connectivity.
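For readers who want to check what a given mode actually exposes to the operating system, a quick way is to ask Windows how many NUMA nodes and logical processors it currently reports. The sketch below uses two standard kernel32 calls via Python's ctypes; it is an inspection aid only, not part of AMD's Ryzen Master tooling.

```python
# A minimal sketch (Windows only) for querying how many NUMA nodes and logical
# processors the OS currently reports, using two standard kernel32 functions.
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

highest_node = wintypes.ULONG(0)
if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest_node)):
    raise ctypes.WinError(ctypes.get_last_error())

ALL_PROCESSOR_GROUPS = 0xFFFF  # value from WinNT.h
logical_cpus = kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)

print(f"NUMA nodes reported: {highest_node.value + 1}")
print(f"Logical processors reported: {logical_cpus}")
```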
A simpler way to imagine Game Mode is this: enabling Game Mode brings the top tier Threadripper 1950X down to the level of a Ryzen 7 processor for core count at around the same frequency, but still gets the benefits of quad channel memory and all 60 PCIe lanes for add-in cards. In this mode, the CPU will preferentially use the lower latency memory available first, attempting to ensure a better immediate experience. You end up with an uber-Ryzen 7 for connectivity.
AMD states that a Threadripper in Game Mode will have lower latency than a Ryzen 7, as well as a higher boost and a larger boost window (up to 4 cores rather than 2).
In our testing, we did the full gamut of CPU and CPU Gaming tests, at 1080p and 4K with Game Mode enabled.
The CPU results were perhaps to be expected: single-threaded tests with Game Mode enabled performed similarly to both the Ryzen 7 and the standard 1950X, but multi-threaded results were almost halved compared to the 1950X, and came in slightly slower than the Ryzen 7 1800X due to the lower all-core turbo.
The CPU gaming tests were instead a mixed bunch. Any performance difference from Game Mode over Creator Mode was highly dependent on the game, on the graphics card, and on the resolution. Overall, the results could be placed into buckets:
- Noted minor losses in Civilization 6, Ashes of the Singularity and Shadow of Mordor
- Minor loss to minor gain on GTX 1080 and GTX 1060 overall in all games
- Minor gain for AMD cards on Average Frame Rates, particularly RoTR and GTA
- Sizeable (10-25%) gain for AMD cards on 99th Percentile Frame Rates, particularly RoTR and GTA
- Gains are more noticeable for 1080p gaming than for 4K gaming
- Most gains across the board are on 99th Percentile data
Which leads to the following conclusions:
- No real benefit on GTX 1080 or GTX 1060, stay in Creator Mode
- Benefits for Rise of the Tomb Raider, Rocket League and GTA
- Benefit more at 1080p, but still gains at 4K
The pros and cons of enabling Game Mode are meant to be along the lines of faster, lower-latency gaming at the expense of raw compute power. The fact that it requires a reboot to switch between Creator Mode and Game Mode is the main detractor: if it were a simple in-OS switch, then it could be enabled for specific titles on specific graphics cards just before the game is launched. That will never be possible, due to how PCs decide what resources are available at boot. That being said, perhaps AMD has missed a trick here.
Could AMD have Implemented Game Mode Differently?
By virtue of misinterpreting AMD's slide deck, and testing a bunch of data with SMT disabled instead, we ended up with an interesting avenue as to how users might do something akin to Game Mode without using AMD's specific implementation. It also raises the question of whether AMD implemented and labeled the Game Mode environment in the right way.
By enabling NUMA and disabling SMT, the 16C/32T chip moves down to 16C/16T. It still has 16 full cores, but has to deal with communication across the two eight-core silicon dies. Nonetheless, it still lets cores access the lowest-latency memory nearest to them, and it still allows certain games that need fewer total threads to work. It should, by the description alone, enable the 'legacy' part of legacy gaming.
The underlying performance analysis between the two modes becomes this:
- In 16C/16T mode, performance in CPU benchmarks was higher than in 8C/16T mode.
- In 16C/16T mode, performance in CPU gaming benchmarks was also higher than in 8C/16T mode.
Some of the conclusions drawn from AMD's defined 8C/16T Game Mode for CPU gaming actually change in 16C/16T mode: games that saw slight regressions with 8 cores became neutral, or even saw slight improvements, with 16 cores, particularly at 1080p.
One of the main detractors to the 8C/16T mode is that it requires a full restart to enable it. Disabling SMT could theoretically be done at the OS level before certain games are launched. If the OS can determine which logical core IDs belong to the primary threads and which are the hyperthreads, it could perhaps simply refuse to dispatch in-flight threads to the hyperthreads, allowing only one thread per core. (There's a small matter of statically shared resources to deal with as well.) The mobile world deals with thread migration between fast cores and slow cores every day, and some cores can be hot-plug disabled on the fly. One could postulate that Windows could do something similar with the equivalent of hyperthreads.
Would this issue need to be solved by Windows, or by AMD? I suspect a combination of both, really.
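In the meantime, something in this spirit can already be approximated per process from user space: restrict a game's CPU affinity to one logical processor per physical core, which leaves SMT enabled globally but keeps that game off the sibling threads. Below is a rough sketch using the psutil library; it assumes psutil is installed, that SMT siblings are enumerated as adjacent logical CPU IDs (0/1, 2/3, and so on), and uses a placeholder process ID.

```python
# A rough sketch of restricting one process to a single logical processor per
# physical core, approximating 'SMT off' for that process without a reboot.
# Assumptions: psutil is installed, SMT siblings have adjacent logical CPU IDs
# (0/1, 2/3, ...), and 1234 is a placeholder process ID.
import psutil

def pin_to_physical_cores(pid):
    proc = psutil.Process(pid)
    logical = psutil.cpu_count(logical=True)
    physical = psutil.cpu_count(logical=False)
    if logical == physical:
        return  # SMT is already disabled; nothing to do
    # Keep every second logical CPU, per the adjacency assumption above.
    proc.cpu_affinity(list(range(0, logical, 2)))

pin_to_physical_cores(1234)  # hypothetical game process ID
```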
Update:
Robert Hallock, on AMD's Threadripper webcast, has stated that the Windows scheduler is not capable of specifically zeroing out a full die to achieve the same effect. The UMA/NUMA implementation can be managed with the Windows scheduler, assigning threads to where the data is (or data to where the threads are), but fully disabling a die still requires a restart.