Original Link: https://www.anandtech.com/show/11857/memory-scaling-on-ryzen-7-with-team-groups-night-hawk-rgb
Memory Scaling on Ryzen 7 with Team Group's Night Hawk RGB
by Ian Cutress & Gavin Bonshor on September 27, 2017 11:05 AM EST
A large number of column inches have been put towards describing and explaining AMD's new underlying scalable interconnect: the Infinity Fabric. A superset of HyperTransport, this interconnect is designed to enable both the CPUs and GPUs from AMD to communicate quickly, at high bandwidth, low latency, and low power, with the ability to scale out to large systems. One of the results of the implementation of Infinity Fabric on the processor side is that it runs at the frequency of the DRAM in the system, with a secondary potential uplift in performance when using faster memory. The debate between enthusiasts, consumers and the general populace regarding Ryzen's memory performance has been an ever-raging topic since the AGESA 1.0.0.6 BIOS updates were introduced several weeks ago. We dedicated some time to test the effect of high-performance memory on Ryzen using Team Group's latest Night Hawk RGB memory.
Memory Scaling on Ryzen 7: AMD's Infinity Fabric
Typically overlooked by many when outlining components for a new system, memory can play a key role in system operation. For the last ten years, memory speed has been generally inconsequential to performance for consumers: we tested this for DDR3 for Haswell and DDR4 for Haswell-E, and three major conclusions came out of that testing:
- As long as a user buys something above the bargain basement specification, performance is better than the worst,
- Performance tapers to a point with memory, very quickly hitting large price increases for little gain,
- The only major performance gain that scales comes from integrated gaming
So it is perhaps not surprising to read in forums that the general pervasive commentary is that “memory speed over DDR4-2400 does not matter and is a con by manufacturers”. This has the potential to change with AMD's Infinity Fabric, where the interconnect speed between sets of cores is directly linked with the memory speed. For any workload that transfers data between cores or out to main memory, the speed of the Infinity Fabric can potentially directly influence the performance. Despite the fact that pure speed isn’t always the ‘be all and end all’ of establishing performance gains, it has the potential to provide some gains with this new interconnect design.
The Infinity Fabric (hereafter shortened to IF) consists of two fabric planes: the Scalable Control Fabric (SCF) and the Scalable Data Fabric (SDF).
The SCF is all about control: power management, remote management and security and IO. Essentially when data has to flow to different elements of the processor other than main memory, the SCF is in control.
The SDF is where main memory access comes into play. There's still management here - being able to organize buffers and queues in order of priority assists with latency, and the organization also relies on a speedy implementation. The slide below is aimed more towards the IF implementation in AMD's server products, such as power control on individual memory channels, but is still relevant to accelerating consumer workflow.
AMD's goal with IF was to develop an interconnect that could scale beyond CPUs, groups of CPUs, and GPUs. In the EPYC server product line, IF connects not only cores within the same piece of silicon, but silicon within the same processor and also processor to processor. Two important factors come into the design here: power (usually measured in energy per bit transferred) and bandwidth.
The bandwidth of the IF is designed to match the bandwidth of each channel of main memory, creating a solution that should potentially be unified without resorting to large buffers or delays.
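To put a rough number on the bandwidth of a single memory channel, the sketch below computes the peak theoretical figure at a few data rates. It assumes the standard 64-bit (8-byte) DDR4 channel width; the function name is illustrative, and the results are theoretical peaks, not measurements. It makes no claim about the width or internal clocking of the Infinity Fabric itself.

```python
# Peak theoretical bandwidth of one DDR4 channel.
# Assumption: a standard 64-bit (8-byte) data bus per channel.
BYTES_PER_TRANSFER = 8


def channel_bandwidth_gbs(data_rate_mts: float) -> float:
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes) for one channel."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


for rate in (2400, 2666, 2933, 3200):
    print(f"DDR4-{rate}: {channel_bandwidth_gbs(rate):.1f} GB/s per channel")
```

A dual channel configuration doubles these figures, again only as a theoretical ceiling.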
Discussing IF in the server context is a bit beyond the scope of what we are testing in this article, but the point we're trying to get across is that IF was built with a wide scope of products in mind. On the consumer platform, while IF isn't necessarily used to such a large degree as in server, the potential for the speed of IF to affect performance is just as high.
AGESA 1.0.0.6 (aka AGESA 1006) and Memory Support
At the time of the launch of Ryzen, a number of industry sources privately disclosed to us that the platform side of the product line was rushed. There was little time to do full DRAM compatibility lists, even with standard memory kits in the marketplace, and this led to a few issues for early adopters to try and get matching kits that worked well without some tweaking. Within a few weeks this was ironed out when the memory vendors and motherboard vendors had time to test and adjust their firmware.
Overriding this was a lower than expected level of DRAM frequency support. During the launch, AMD had promised that Ryzen would be compatible with high speed memory, however reviewers and customers were having issues with higher speed memory kits (3200 MT/s and above). These issues have been addressed via a wave of motherboard BIOS updates built upon an updated version of the AGESA (AMD Generic Encapsulated Software Architecture), specifically up to version 1.0.0.6.
Given that the Ryzen platform itself has matured over the last couple of months, now is the time for a quick test on the scalability of AMD's Zen architecture to see if performance can scale consistently with raw memory frequency, or if any performance gains are achieved at all. For this testing we are using Team Group's latest Night Hawk RGB memory kit at several different memory straps under our shorter CPU and CPU gaming benchmark suites.
Recommended Reading
- The AMD Zen and Ryzen 7 Review: A Deep Dive on 1800X, 1700X and 1700
- The AMD Ryzen 5 1600X vs Core i5 Review: Twelve Threads vs Four at $250
- The AMD Ryzen 3 1300X and Ryzen 3 1200 CPU Review: Zen on a Budget
- The AMD Ryzen Threadripper 1950X and 1920X: CPUs on Steroids
- Retesting AMD Ryzen Threadripper’s Game Mode: Halving Cores for More Performance
Team Group's Night Hawk RGB Memory
16GB of DDR4-3000 CL16 (TF2D416G3000HC16CDC01)
For our testing, Team Group provided us with a dual channel kit from its T-Force Night Hawk RGB range. This is a 2x8 GB kit rated at DDR4-3000 with latency timings of 16-18-18-38. The T-Force Night Hawk DDR4-3000’s are a fairly middle of the road (spec wise) dual channel offering when compared to some of the high-end kits, but the Night Hawk kit aims at a much more reasonable balance of price point and performance. With the high-performance kits, the price is paid on the way the memory is binned - with kits like the Night Hawk RGB, there is a small 'RGB tax' over non-RGB mono-color variants, which usually comes in at around $5-$10 depending on the manufacturer. The RGB element is purely for aesthetics, so while on paper and financially it makes less sense to opt for the RGB option over the mono-colored version, our discussions with vendors gave an insight into this market. As far as we were told, RGB sales are growing faster than anyone ever expected - system design customisability is becoming an important consideration of a PC build.
For the memory itself, Team Group has gone with a rather eccentric hawk inspired heatsink design as the brand. The modules end up 1.73”/44mm in height, so for context, the Noctua NH-D14 CPU cooler has a clearance for memory of up to 45mm, so this kit should just about fit.
Under the heat sinks, Team Group has opted to use single sided Samsung B-die ICs. These memory ICs are highly favoured by extreme overclockers for their potential overclockability and frequency scaling, as well as the ability to really tighten the latencies; at very high frequencies and tight latencies, some of the more synthetic tests that competitive overclockers love make a difference, and memory manufacturers use that as a marketing tool when it comes to B-die. That being said, despite sending us a memory kit using B-die, Team Group did say, however, that in future it could change the ICs in the kits depending on market pricing and availability of such modules. This is disappointing, but not completely unexpected as other companies also do this. Our normal policy applies when this is the case: if this were to occur, we would want the model number to change to reflect this. There are attempts online by competitive overclockers to identify which memory modules use certain ICs, so if one model number had several IC versions, it would be very confusing to organise.
The Team Group T-Force Night Hawk RGB DDR4-3000 kit comes with a global lifetime guarantee in the US, and supports RGB LED customisation. This particular kit is synchronizable with ASUS motherboards via ASUS Aura Sync. The purpose of platforms like Aura Sync is to allow users to colour match their existing products, such as peripherals, motherboards, VGA and even RGB LED strips such as BitFenix’s Alchemy range and virtually all of Cablemod’s current line-up.
Memory Straps
In the interest of achieving and obtaining consistent results, the Team Group Night Hawk RGB DDR4-3000 kit has been left at the rated latencies, which was achieved by enabling the XMP profile, and changing the memory strap/multiplier to achieve the desired frequency each time through the GIGABYTE UEFI BIOS. On Ryzen in this BIOS, due to memory strap limitations and the inability to support 100 MHz straps, the only way to run the memory at DDR4-3000 would be to run the 2933 memory strap and then adjust the base clock up from 100 MHz to 102.3 MHz. This would technically overclock the processor and in doing so, would skew the results from the other straps tested such as 2400MT/s etc.
So to keep everything on an even keel throughout, all of the settings in the BIOS except the memory settings remained at default. The memory had its XMP profile enabled, the base clock put back to 100 MHz, and different memory straps were tested with identical latency timings. All other tests were run at 16-18-18, as per the memory kit.
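The base clock adjustment described above can be checked with a little arithmetic. This is a minimal sketch that assumes the memory data rate and the CPU clock both scale linearly with the base clock; the function name is illustrative, and the 3.0 GHz figure is the Ryzen 7 1700 base frequency.

```python
# Base clock needed to reach a target data rate from a fixed memory strap.
# Assumption: memory data rate scales linearly with the base clock (BCLK).

def required_base_clock(target_mts: float, strap_mts: float,
                        default_bclk_mhz: float = 100.0) -> float:
    """Return the base clock in MHz that lifts strap_mts to target_mts."""
    return default_bclk_mhz * target_mts / strap_mts


bclk = required_base_clock(target_mts=3000, strap_mts=2933)
print(f"Required base clock: {bclk:.1f} MHz")

# Assumption: the CPU clock is a fixed multiple of BCLK, so a 3.0 GHz base
# frequency rises by the same ratio as the base clock.
cpu_base_mhz = 3000 * bclk / 100.0
print(f"CPU base frequency at that base clock: {cpu_base_mhz:.0f} MHz")
```

The roughly 2% rise in CPU frequency is the side effect that would have skewed comparisons against the other straps.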
For the testing we are using a Ryzen 7 processor, specifically the Ryzen 7 1700. AMD's official listed support for this processor depends on the amount of memory and the memory type. The short answer to this support is that when using one memory module per channel, the Ryzen 7 1700 is designed to support DDR4-2666, but when two memory modules per channel are used, the support drops to DDR4-2400. For this test, because downclocking is easy enough, we start testing from the DDR4-2400 data rate and go through the rated memory speed of the processor to the speed the memory is rated to.
We also overclock the memory beyond its rated speed. Each kit will offer different levels of overclocking performance, as it depends on the quality of the kit, the processor memory controller, and the motherboard, but we were able to push this memory kit all the way to DDR4-3333. At this speed it was stable in all our testing, and equates to a 10% increase on the rated frequency of the kit. It was interesting to see how much of an effect this speed would really manifest in our testing.
For users that are unfamiliar with this sort of image, this is the common CPU-Z tool that most professionals use to quickly probe the underlying hardware and speeds in the system. This is the main memory tab of the software, showing that we are using DDR4 in dual channel mode and have a total of 16 gigabytes. The NB Frequency, where NB historically stands for 'North Bridge', is the frequency that the Infinity Fabric is running at. In this case above, we get it running at 1663 MHz.
Below is the frequency and sub-timings for the memory itself. It shows a kit running at 1663.3 MHz, with 16-18-18 sub timings and a 1T command rate. I can already hear some of our readers with questions: why does it say the memory is running at 1663.3 MHz? I thought it was being run at DDR4-3333? So the key thing here is the difference between the frequency of the memory and how the memory works.
For a given frequency of the memory, say 1000 MHz, the system will perform 1000 million full clock cycles every second. These are full cycles, alternating from a peak voltage to a low voltage and back again within a single cycle. Modern memory, such as DDR4, is memory that runs at a Double Data Rate - this is what DDR stands for. What this means is that an action or a transfer can occur twice per cycle, usually each time the voltage alternates from peak to trough. This is also referred to as transferring on the clock cycle edges. The final result is that every cycle we get two transfers, so DDR4 at 1666 MHz is another way of saying DDR4 at 3333 mega transfers per second, or MT/s. Memory is quoted in terms of transfers per second, hence DDR4-3000 or DDR4-3333.
There is often user confusion here, with memory kits being listed as DDR4 at 3000 MHz when they mean DDR4 at 3000 MT/s (Ed: I'm pretty sure everyone on the AnandTech staff is guilty of this at some point). For this review, and any memory reviews going forward, AnandTech is going to keep consistency in how we represent numbers. Typically we will quote the MT/s value, as this is what is listed on the kit, and specifically state when we are talking about the frequency (in Hz) or the data rate (MT/s), and use 'speed' as the generic term.
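The relationship between clock frequency and data rate is simple enough to express in a few lines. This is a minimal sketch with illustrative function names, using the DDR4-3000 rating of the kit and the 1663.3 MHz CPU-Z reading quoted above as inputs.

```python
# DDR memory transfers data on both clock edges, so the data rate in MT/s
# is twice the clock frequency in MHz.
TRANSFERS_PER_CYCLE = 2


def clock_mhz_from_data_rate(data_rate_mts: float) -> float:
    """Convert a data rate in MT/s to the memory clock in MHz."""
    return data_rate_mts / TRANSFERS_PER_CYCLE


def data_rate_from_clock_mhz(clock_mhz: float) -> float:
    """Convert a memory clock in MHz to the data rate in MT/s."""
    return clock_mhz * TRANSFERS_PER_CYCLE


# The kit's rated DDR4-3000 expressed as a clock frequency.
print(clock_mhz_from_data_rate(3000))

# The 1663.3 MHz CPU-Z reading expressed as a data rate.
print(round(data_rate_from_clock_mhz(1663.3), 1))
```

The second value lands just under the nominal DDR4-3333 setting, which is why the two ways of quoting the same speed look so different.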
In this review, we will be testing the following combinations of data rate and latencies:
- DDR4-2400 16-18-18
- DDR4-2666 16-18-18 (Ryzen 7 Supported at 1DPC)
- DDR4-2800 16-18-18
- DDR4-2933 16-18-18 (Nearest to memory kit rating)
- DDR4-3066 16-18-18
- DDR4-3200 16-18-18
- DDR4-3333 16-18-18 (10%+ overclock)
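Because the timings are held at 16-18-18 clock cycles across every data rate in the list above, the absolute latency in nanoseconds shrinks as the data rate rises. The sketch below converts the CAS latency (the first of the three timings) into nanoseconds for each tested setting. The function name is illustrative, and this covers only the CAS component, not total memory access latency.

```python
# CAS latency in nanoseconds for a fixed CL at several data rates.
# One memory clock cycle lasts 2000 / data_rate_mts nanoseconds, because the
# clock in MHz is half the data rate in MT/s.

def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Return the CAS latency in nanoseconds."""
    return cas_cycles * 2000.0 / data_rate_mts


TESTED_DATA_RATES = (2400, 2666, 2800, 2933, 3066, 3200, 3333)

for rate in TESTED_DATA_RATES:
    print(f"DDR4-{rate} CL16: {cas_latency_ns(16, rate):.2f} ns")
```

This means a faster strap at identical timings benefits from both higher bandwidth and lower absolute latency, so the two effects are not separated in this style of test.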
Test Bed Setup
As per our testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory. With this test setup, we are using the BIOS to set the frequency using the provided straps on the GIGABYTE Aorus AX370-Gaming 5 motherboard.
Test Setup
Processor | AMD Ryzen 7 1700, 65W, $300 MSRP, 8 Cores, 16 Threads, 3.0 GHz Base, 3.7 GHz Turbo
Motherboard | GIGABYTE AX370-GAMING 5
Cooling | Thermaltake Floe Riing RGB 360
Power Supply | Thermaltake Toughpower Grand 1200 W Gold PSU
Memory | Team Group Night Hawk RGB DDR4-3000 16-18-18, 2x8 GB, 1.35 V
Video Card | ASUS GTX 980 STRIX (1178 MHz Base, 1279 MHz Boost)
Hard Drive | Crucial MX300 1 TB
Case | Open Test Bed
Operating System | Windows 10 Pro
Many thanks to...
We must thank the following companies for kindly providing hardware for our multiple test beds.
Thank you to ASUS for providing us with GTX 980 Strix GPUs. At the time of release, the STRIX brand from ASUS was aimed at silent running, or to use the marketing term: '0dB Silent Gaming'. This enables the card to disable the fans when the GPU is dealing with low loads well within temperature specifications. These cards equip the GTX 980 silicon with ASUS' Direct CU II cooler and 10-phase digital VRMs, aimed at high-efficiency conversion. Along with the card, ASUS bundles GPU Tweak software for overclocking and streaming assistance.
The GTX 980 uses NVIDIA's GM204 silicon die, built upon their Maxwell architecture. This die is 5.2 billion transistors for a die size of 398 mm2, built on TSMC's 28nm process. A GTX 980 uses the full GM204 core, with 2048 CUDA Cores and 64 ROPs with a 256-bit memory bus to GDDR5. The official power rating for the GTX 980 is 165W.
The ASUS GTX 980 Strix 4GB (or the full name of STRIX-GTX980-DC2OC-4GD5) runs a reasonable overclock over a reference GTX 980 card, with frequencies in the range of 1178-1279 MHz. The memory runs at stock, in this case, 7010 MHz. Video outputs include three DisplayPort connectors, one HDMI 2.0 connector, and a DVI-I.
Further Reading: AnandTech's NVIDIA GTX 980 Review
Thank you to Crucial for providing us with MX300 SSDs. Crucial stepped up to the plate as our benchmark list grows larger with newer benchmarks and titles, and the 1TB MX300 units are strong performers. Based on Marvell's 88SS1074 controller and using Micron's 384Gbit 32-layer 3D TLC NAND, these are 7mm high, 2.5-inch drives rated for 92K random read IOPS and 530/510 MB/s sequential read and write speeds.
The 1TB models we are using here support TCG Opal 2.0 and IEEE-1667 (eDrive) encryption and have a 360TB rated endurance with a three-year warranty.
Further Reading: AnandTech's Crucial MX300 (750 GB) Review
CPU Performance, Short Form
For our quick reviews, we use our short form testing method.
Video Conversion – Handbrake v1.0.2
Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, and can trade performance for file size, trade quality for file size, or all of the above, whether to increase encoding rates or to help accelerate decoding rates. Alongside Google's favorite codec, VP9, there are two others that are taking hold: H264, the older codec, which is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H265), which aims to provide the same quality as H264 but at a lower file size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content.
Handbrake is a favored tool for transcoding, and so our test regime takes care of three areas.
Low Quality/Resolution H264: Here we transcode a 640x266 H264 rip of a 2 hour film, and change the encoding from Main profile to High profile, using the very-fast preset.
High Quality/Resolution H264: A similar test, but this time we take a ten-minute double 4K (3840x4320) file running at 60 Hz and transcode from Main to High, using the very-fast preset.
HEVC Test: Using the same video in HQ, we change the resolution and codec of the original video from 4K60 in H264 into 4K60 HEVC.
The biggest gains in Handbrake came in the HQ test where we gained up to an extra +21% in performance for DDR4-3333 over DDR4-2400. The fact that we don't see the same gains in the HEVC test is likely down to the algorithm.
Compression – WinRAR 5.40
For the 2017 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack (33 video files in 1.37 GB, 2834 smaller website files in 370 folders in 150 MB) of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test 10 times and take the average of the last five runs when the benchmark is in a steady state.
Like with Handbrake, the system seemed to scale pretty well in WinRAR with a ~16% performance gain going from DDR4-2400 to DDR4-3333.
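The WinRAR methodology of running the test ten times and averaging only the last five can be sketched as follows. The function name and the timings are hypothetical and purely for illustration; they are not results from this review.

```python
# Steady-state averaging: discard the early runs, which are affected by DRAM
# caching warming up, and average only the final runs.
from statistics import mean


def steady_state_mean(run_times, keep_last=5):
    """Return the mean of the last keep_last entries in run_times."""
    if len(run_times) < keep_last:
        raise ValueError("not enough runs to form a steady-state average")
    return mean(run_times[-keep_last:])


# Hypothetical timings in seconds for ten consecutive runs.
example_runs = [52.1, 49.8, 48.9, 48.5, 48.4, 48.3, 48.4, 48.2, 48.3, 48.3]
print(f"Steady-state result: {steady_state_mean(example_runs):.2f} s")
```

Dropping the warm-up runs keeps first-run file system and caching effects from inflating the reported time.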
3D Movement Algorithm Test v2.1
This is the latest version of the self-penned 3DPM benchmark. The goal of 3DPM is to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in. It affords a ~25% speed-up over v2.0, which means new data.
Although more of a raw CPU benchmark, it shows here that memory isn’t a massive factor, as regardless of memory speed, we encountered marginal performance gains.
POV-Ray 3.7
Another regular benchmark in most suites, POV-Ray is another ray-tracer but has been around for many years. It just so happens that during the run up to AMD's Ryzen launch, the code base started to get active again with developers making changes to the code and pushing out updates. Our version and benchmarking started just before that was happening, but given time we will see where the POV-Ray code ends up and adjust in due course.
POV-Ray might be a fruitful benchmark for testing memory stability, but our performance variation between memory speeds was within the margin of error.
7-Zip 9.2
One of the freeware compression tools that offers good scaling performance between processors is 7-Zip. It runs under an open-source licence, and is a fast, easy-to-use tool for power users. We run the benchmark mode via the command line for four loops and take the output score.
Some compression tools can be susceptible to memory performance, as our WinRAR results show. 7-Zip has a small performance boost as we rise up through the stack, although the differences above DDR4-2666 are fairly minimal.
Gaming Performance
Ashes of the Singularity (DX12)
Seen as the holy child of DX12, Ashes of the Singularity (AoTS, or just Ashes) has been the first title to actively go and explore as many of the DX12 features as it possibly can. Stardock, the developer behind the Nitrous engine which powers the game, has ensured that the real-time strategy title takes advantage of multiple cores and multiple graphics cards, in as many configurations as possible.
Performance with Ashes over our different memory settings was varied at best. The DDR4-2400 value can certainly be characterized as the lowest number near to ~45-46 FPS, while everything else is rounded to 50 FPS or above. Depending on the configuration, this could be an 8-10% difference in frame rates by not selecting the worst memory.
Rise Of The Tomb Raider (DX12)
One of the newest games in the gaming benchmark suite is Rise of the Tomb Raider (RoTR), developed by Crystal Dynamics, and the sequel to the popular Tomb Raider which was loved for its automated benchmark mode. But don’t let that fool you: the benchmark mode in RoTR is very much different this time around.
Visually, the previous Tomb Raider pushed realism to the limits with features such as TressFX, and the new RoTR goes one stage further when it comes to graphics fidelity. This leads to an interesting set of requirements in hardware: some sections of the game are typically GPU limited, whereas others with a lot of long-range physics can be CPU limited, depending on how the driver can translate the DirectX 12 workload.
We encountered insignificant performance differences in RoTR on the GTX 980. The 3.3 FPS increase in average framerates from top to bottom does not exactly justify the price difference between DDR4-2400 and DDR4-3333 when using a GTX 980 - not in this particular game at least.
Thief
Thief has been a long-standing title in the hearts of PC gamers since the introduction of the very first iteration back in 1998 (Thief: The Dark Project). Thief is the latest reboot in the long-standing series and renowned publisher Square Enix took over the task from where Eidos Interactive left off back in 2004. The game itself uses the UE3 engine and is known for optimised and improved destructible environments, large crowd simulation and soft body dynamics.
For Thief, there are some small gains to be had from moving through from DDR4-2400 to DDR4-2933, around 5% or so, however after this the performance levels out.
Total War: WARHAMMER
Not only is the Total War franchise one of the most popular real time tactical strategy titles of all time, but Sega has delved into multiple worlds such as the Roman Empire, the Napoleonic era, and even Attila the Hun. More recently the franchise has tackled the popular WARHAMMER series. The developers, Creative Assembly, have integrated DX12 into their latest RTS battle title, aiming to take advantage of the benefits that DX12 can provide. The game itself can come across as very CPU intensive, and is capable of pushing any top end system to its limits.
Even though Total War: WARHAMMER is a very CPU-performance-focused benchmark, memory had barely any effect on the results.
Conclusions on Ryzen DDR4 Scaling
It is pretty clear to see that Ryzen can be fairly dependent on memory frequency, but it depends very much on the sort of test and the nature of the workload's memory accesses. On the benchmarks where it matters, our memory kit was able to push performance up by over 20%, although the few benchmarks where this happened were outnumbered by benchmarks that showed zero or only a very minor effect. Some gaming titles had up to a 5-10% difference in average frame rates, but others had zero change.
To Infinity and Beyond
Determining the sweet spot for Ryzen from our small batch of testing is not so straightforward. Our quick testing would seem to suggest that there are performance gains to be had, with slow progress as the data rate increases. A few benchmarks seemed to hit the performance inflection point around the DDR4-2933/3066 boundary - or basically where the Team Group Night Hawk RGB DDR4-3000 memory kit is positioned.
Aside from the benefit of having faster memory, the speed directly adjusts the potential of AMD’s Infinity Fabric. The IF is AMD's new scalable interconnect found in the Zen CPUs, Vega GPUs, and likely the next few generations of products. Infinity Fabric connects and manages the data flow from each of the cores to each other, as well as to the additional controllers on board. But the effect of faster DRAM and faster IF, on paper, should be a mutually beneficial improvement, and one would take a reasonable guess that AMD will aim to increase both as new generations of products come to market.
Final Thoughts
Depending on how the results are digested, and how the software can effectively use the new AMD Zen microarchitecture, a relatively decent set of DDR4-3000 (or thereabouts) memory seems to be a good inflection point for users that want to invest in faster memory. Obviously using tighter sub-timings should help as well, which we'll likely explore in a separate review.
The Team Group Night Hawk RGB memory has served our testing needs well out of the box and it seems like a very reasonable purchase for Ryzen users looking to add a high-performance memory kit. Unfortunately there is no guarantee in the quality of the ICs on board, with Team Group stating that the type of ICs could change over the lifetime of the product - this will mean that the overclocking capabilities may change depending on the ICs. The memory kit we used in this testing is currently available from Newegg for $173 with a white heatspreader, or $156 with a black heatspreader. Interestingly the black version running at a faster DDR4-3200 is listed at a cheaper $164, but is currently out of stock.
DRAM Price Comparison: 2x8GB DDR4-3000 with RGB (9/27)
Kit | Black Heatspreader | White Heatspreader
Team Group Night Hawk RGB | $156 (Newegg), CL16-18-18 | $173 (Newegg), CL16-18-18
Corsair Vengeance RGB | $160 (Amazon), CL15-17-17 | $180 (Newegg), CL15-17-17
G.Skill Trident Z RGB | $186 (Newegg), CL15-16-16 | -
GeIL Super Luce RGB | - | $160 (Newegg), CL16-18-18
ADATA XPG Spectrix RGB | $180 (Amazon), CL16-18-18 | -
For other RGB-based kits running 2x8 GB at DDR4-3000, Corsair's Vengeance RGB is $180 in white or $160 in black, with GeIL's Super Luce also at $160. By comparison, ADATA and G.Skill offer similar kits in black only, at $180 and $186 respectively.
Testing and Analysis by Gavin Bonshor
Additional Commentary by Ian Cutress