Original Link: https://www.anandtech.com/show/7364/memory-scaling-on-haswell



‘How much does memory speed matter?’ is a question often asked when dealing with mainstream processor lines.  Depending on the platform, the answers might very well be different.  Similar to our comparisons with Ivy Bridge, today we publish our results for 26 different memory timings across 45 benchmarks, all using a G.Skill memory kit.

In our previous memory scaling article with an Ivy Bridge CPU, testing memory from DDR3-1333 to DDR3-2400 afforded two main results – (a) the high end memory kit offered up to a 20% improvement, but (b) this improvement was restricted to certain memory limited tests.  In order to be more thorough, our tests in this article take a single memory kit, the G.Skill 2x4GB DDR3-3000 12-14-14 1.65V kit, through 26 different combinations of memory speed and CAS latency to see whether it is better to choose one set of timings over another.  Benchmarks chosen include my standard array of real world benchmarks, some of which are memory limited, as well as several gaming titles on IGP, single GPU and multi-GPU setups, recording both average and minimum frame rates.

The Problem with Memory Speed

As mentioned in the Ivy Bridge memory scaling article, one of the main issues with reporting memory speeds is the exclusion of the CAS Latency, or tCL.  When a user purchases memory, it comes as a set number of sticks, each of a certain size, memory speed, set of subtimings and voltage.  The order of importance of these factors runs as follows:

1.      Amount of memory

2.      Number of sticks of memory

3.      Placement of those sticks in the motherboard

4.      The MHz of the memory

5.      If XMP/AMP is enabled

6.      The subtimings of the memory

I use this order on the basis that each point is more important than the one that follows it:

  • A system will be slow due to lack of memory before the speed of the memory is an issue (point 1)
  • In order to take advantage of the number of memory channels of the CPU, we must have a number of sticks that is a multiple of the number of memory channels (point 2), known as dual channel/tri channel/quad channel operation.
  • In order to ensure that we have dual (or tri/quad) channel operation these sticks need to be in the right slots of the motherboard – most motherboards support two DIMM slots per channel and we need at least one memory stick for each channel
  • If the MHz of the memory is more than the CPU is rated for (1333, 1600, 1866+), then the user needs to apply XMP/AMP in order to benefit from the additional speed.  Otherwise the system will run at the CPU defaults.
  • Subtimings, such as tCL, are used in conjunction with the MHz to provide the overall picture when it comes to performance.

A user can go out and buy two memory kits, both DDR3-2400, but in reality (as shown in this review), they can perform differently and have different prices.  The reason for this will be in the sub-timings of each memory kit: one might be 9-11-10 (2400 C9), and the other 11-11-11 (2400 C11).  So whenever someone boasts about a particular memory speed, ask for subtimings.
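
To put numbers on that, CAS latency can be converted into the time the memory takes to start returning data: tCL is counted in memory clock cycles, and the memory clock is half the DDR3 data rate.  A quick sketch (the kits listed are just examples):

    def cas_latency_ns(ddr_rate, tcl):
        # tCL is counted in memory clock cycles; the memory clock is half the DDR data rate.
        memory_clock_mhz = ddr_rate / 2.0          # e.g. DDR3-2400 -> 1200 MHz
        return tcl * 1000.0 / memory_clock_mhz     # cycle time (ns) x tCL

    for rate, tcl in [(1333, 9), (2400, 9), (2400, 11), (3000, 12)]:
        print(f"DDR3-{rate} C{tcl}: {cas_latency_ns(rate, tcl):.1f} ns")
    # DDR3-1333 C9: 13.5 ns, 2400 C9: 7.5 ns, 2400 C11: 9.2 ns, 3000 C12: 8.0 ns

This is why a 2400 C9 kit can command a premium over a 2400 C11 kit despite the identical headline speed.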

G.Skill DDR3-3000 C12 2x4GB Memory Kit: F3-3000C12D-8GTXDG

For this review, G.Skill supplied us with a pair of DDR3 modules from their TridentX range, rated at DDR3-3000.  This is at the absolute high end of memory kits, with very few kits going faster in terms of MHz.  Of course, this MHz race comes at a price premium: $690 for 8 GB.  This memory kit uses single-sided Hynix MFR ICs, known for their high MHz numbers, and while there are large heat-spreaders on each stick, these can be removed, reducing the height from 5.4 cm to 3.9 cm.

Hynix MFR based memory kits are used by extreme overclockers to hit the high MHz numbers.  Recently YoungPro from Australia took one of these memory sticks and hit DDR3-4400 MHz (13-31-31 sub-timings) to reach #1 in the world in pure MHz.

Test Setup

Processor: Intel Core i7-4770K Retail @ 4.0 GHz (4 cores, 8 threads, stock 3.5 GHz, 3.9 GHz Turbo)
Motherboard: ASRock Z87 OC Formula/AC
Cooling: Corsair H80i (Intel stock cooler for pre-testing)
Power Supply: Corsair AX1200i Platinum
Memory: G.Skill TridentX 2x4 GB DDR3-3000 12-14-14 kit
Memory Settings: 1333 C7 through to XMP (3000 12-14-14)
Discrete Video Cards: AMD HD 5970, AMD HD 5870
Video Drivers: Catalyst 13.6
Hard Drive: OCZ Vertex 3 256 GB
Optical Drive: LG GH22NS50
Case: Open Test Bed
Operating System: Windows 7 64-bit
USB 3 Testing: OCZ Vertex 3 240 GB with SATA-to-USB 3.0 adaptor

With this test setup, we are using the BIOS to set the following combinations of MHz and subtimings:

Almost all of these combinations are available for purchase.  For any combination of MHz and CAS, we attempt the tightest sub-timings first, e.g. 2400 9-9-9 1T at 1.65 volts.  If this setting is unstable, we move to 9-10-9, then 9-10-10, then 9-11-10 and so on until the combination is stable.

There is an odd twist when dealing with DDR3-3000.  As Haswell does not offer a DDR3-3000 memory strap, in order to reach 3000 MHz we actually have to use the DDR3-2933 strap and raise the base clock (BCLK) to 102.3 MHz.  This leads to a slight advantage in CPU throughput when using DDR3-3000, which does come through in several benchmarks.  In order to keep things even, our 4.0 GHz CPU has its multiplier reduced for 3000 C12 to keep the overall system speed the same, albeit with a slight BCLK advantage.
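
The arithmetic is simple enough: the effective memory speed is the strap multiplied by the BCLK ratio, and the CPU multiplier comes down to compensate for the higher base clock.  A rough sketch of the numbers involved (the 39x multiplier here is my assumption of the adjustment used):

    # Effective speeds with a 102.3 MHz base clock, as used for the DDR3-3000 setting.
    bclk = 102.3                  # MHz, up from the 100 MHz default
    memory_strap = 2933           # highest memory strap Haswell accepts
    print(f"Memory: {memory_strap * bclk / 100:.0f} MT/s")    # ~3000

    cpu_multiplier = 39           # assumption: dropped from 40x to offset the higher BCLK
    print(f"CPU: {cpu_multiplier * bclk / 1000:.2f} GHz")     # ~3.99 GHz, small BCLK advantage remains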

At the time of testing, DDR3-3000 C12 was the highest-MHz memory kit available, but since then 3100 C12 kits have come to market, pushing price margins even higher at $1000 for 8 GB.  The problem at this speed is that the BCLK overclock needed to reach it also overclocks the rest of the system slightly, which will skew performance results in favor of the high end kit.

Benchmarks

For this test, we use the following real world and compute benchmarks:

CPU Real World:
 - WinRAR 4.2
 - FastStone Image Viewer
 - Xilisoft Video Converter
 - x264 HD Benchmark 4.0
 - TrueCrypt v7.1a AES
 - USB 3.0 MaxCPU Copy Test

CPU Compute:
 - 3D Particle Movement, Single Threaded and MultiThreaded
 - SystemCompute ‘2D Explicit’
 - SystemCompute ‘3D Explicit’
 - SystemCompute nBody
 - SystemCompute 2D Implicit

IGP Compute:
 - SystemCompute ‘2D Explicit’
 - SystemCompute ‘3D Explicit’
 - SystemCompute nBody
 - SystemCompute MatrixMultiplication
 - SystemCompute 3D Particle Movement

For what should be obvious reasons, there is no point in running synthetic tests when dealing with memory.  A synthetic test will tell you if the peak speed or latency is higher or lower – that is not a number that necessarily translates into the real world unless you can detect the type and size of all the memory accesses used within a real world environment.  The real world is more complex than a simple boost in memory read/write peak speeds.

For each of the 3D benchmarks we use an ASUS HD 6950 (flashed to HD6970) for the single GPU tests, the HD 4600 in the CPU for IGP, and a HD 5970+5870 for a lopsided tri-GPU test.

Gaming:
 - Dirt 3, Avg and Min FPS, 1360x768
 - Bioshock Infinite, Avg and Min FPS, 1360x768
 - Tomb Raider, Avg and Min FPS, 1360x768
 - Sleeping Dogs, Avg and Min FPS, 1360x768

Firstly, I want to go through enabling XMP in the BIOS of all the major vendors.



Enabling XMP with ASUS, GIGABYTE, ASRock and MSI on Z87

By default, memory should adhere to specifications set by JEDEC (formerly known as the Joint Electron Device Engineering Council).  These specifications state what information should be stored in the memory EEPROM, such as manufacturer information, serial number, and other useful data.  Part of this is a set of specifications for standard memory speeds, including (for DDR3) 1066 MHz, 1333 MHz and 1600 MHz, which a system will fall back to if no other information is available.

An XMP, or (Intel-developed) Extreme Memory Profile, is an additional set of values stored in the SPD EEPROM which the BIOS can detect and apply.  Most DRAM has space for two additional SPD profiles, sometimes referred to as an ‘enthusiast’ and an ‘extreme’ profile; however most consumer oriented modules may only have one XMP profile.  The XMP profile is typically the one advertised on the memory kit – if the capability of the memory deviates in any way from the specified JEDEC timings, a manufacturer must use an XMP profile.

Thus it is important that the user enables XMP!  It is not plug and play!

At big computing events and gaming LANs there are plenty of enthusiasts who boast about buying the best hardware for their system.  When I ask what memory they are running and then actually probe the system (using CPU-Z), I sometimes find that the user, after buying expensive memory, has not enabled XMP!  It sounds like a joke story, but this happened several times at my last iSeries LAN in the UK – people boasting about high performance memory, but because they did not enable it in the BIOS, were still running at DDR3-1333 C9.

So enable XMP with your memory!

Here is how:

Step 1: Enter the BIOS

This is typically done by pressing DEL or F2 during POST/startup.  Users who have enabled fast booting under Windows 8 will have to use vendor software to enable ‘Go2BIOS’ or a similar feature.

Step 2: Enable XMP

Depending on your motherboard manufacturer, this will be different.  I have taken images from the four major motherboard manufacturers to show where the setting is on some of the latest Z87 motherboard models.

On the ASUS Z87-Pro, the setting is on the EZ-Mode screen.  Where it says ‘XMP’ in the middle, click on this button and navigate to ‘Profile 1’:

If you do not get an EZ mode (some ROG boards go straight to advanced mode), then the option is under the AI Tweaker tab, in the AI Overclock Tuner option.

For ASRock motherboards, navigate to OC Tweaker and scroll down to the DRAM Timing Configuration.  Adjust the ‘Load XMP Setting’ option to Profile 1.

For GIGABYTE motherboards, such as the Z87X-UD3H with the new HD mode BIOS, the XMP setting sits under Home -> Standard, as shown below:

Finally, on MSI motherboards, select the OC option on the left hand side and XMP should be in front of you:

I understand that setting XMP may seem trivial to most of AnandTech’s regular readers, however for completeness (and given how often XMP seems to go unenabled at events) I wanted to include this mini-guide.  Of course different BIOS versions on different motherboards may have moved the options around a little – either head to enthusiast forums, or if it is a motherboard I have reviewed, I post all the screenshots of the BIOS I tested with as a guide.



As mentioned previously, real world testing is where users should be feeling the benefits of spending up to 13x more on memory, rather than in a synthetic test.  A synthetic test exacerbates a specific type of loading to get peak results in terms of memory read/write and latency timings, most of which are not indicative of the pseudo-random nature of real-world workloads (opening email, applying logic).  There are several situations which might fall under the typical scrutiny of a real world loading, such as video conversion/video editing.  It is at this point we consider whether the CPU caches are too small and the system is relying on frequent memory accesses because the CPU cannot be fed with enough data.  It is in these circumstances that memory speed is important, and it is all down to how the video converter is programmed rather than a carte blanche statement that all video converters benefit from faster memory.  As we will see in the IGP Compute section of this review, anything that can leverage the IGP cores can be a ripe candidate for increased memory speed.

Our tests in the CPU Real World section come from our motherboard reviews in order to emulate potential scenarios that a user may encounter.

USB 3.0 Copy Test with MaxCPU

We transfer a set size of files from the 120GB OCZ Vertex3 connected via SATA 6 Gbps on the motherboard to the 240 GB OCZ Vertex3 SSD with a SATA 6 Gbps to USB 3.0 converter via USB 3.0 using DiskBench, which monitors the time taken to transfer.  The files transferred are a 9.2 GB set of 7539 files across 1011 folders – 95% of these files are small typical website files, and the rest (90% of the size) are precompiled installers.  In an update to pre-Z87 testing, we also run MaxCPU to load up one of the threads during the test, which improves general performance by up to 15% by causing all the internal pathways to run at full speed.
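
DiskBench does the timing in our runs; purely as an illustration of the idea, a minimal Python sketch that copies a directory tree and records the wall-clock time (paths are placeholders) might look like:

    import shutil, time

    src = r"D:\copy_test_set"       # placeholder: the 9.2 GB, 7539-file test set
    dst = r"F:\copy_test_set"       # placeholder: the USB 3.0-attached SSD

    start = time.perf_counter()
    shutil.copytree(src, dst)
    print(f"Copy took {time.perf_counter() - start:.1f} s")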

Results are represented as seconds taken to complete the copy test, where lower is better.

The difference between the slowest and the fastest is around 2%, or 1 second in our test, meaning memory has little influence over USB 3.0 copy speed, even with the CPU loaded.

WinRAR 4.2

With 64-bit WinRAR, we compress the set of files used in the motherboard review USB speed tests.  WinRAR x64 3.93 attempts to use multithreading when possible, and provides a good test of how a system copes with a variable threaded load.  WinRAR 4.2 does this a lot better!  If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.

Up first, WinRAR 3.93, with results expressed in terms of seconds to compress.  Lower is better.

Using the older version of WinRAR shows a 31% advantage moving from 1333 C9 to 3000 C12, although 2400 C9/2666 C10/2800 C11 have a good showing.

WinRAR 4.2 results next:

We see similar results with the later version of WinRAR – here having at least 1866 MHz memory makes the grade in terms of time, with lower CAS latency helping (1866 C8 / 2133 C9 / 2400 C9 / 2666 C11).

FastStone Image Viewer 4.2

FastStone Image Viewer is a free piece of software I have been using for quite a few years now.  It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters.  It also has a bulk image conversion tool, which we use here.  The software currently operates only in single-thread mode, which should change in later versions of the software.  For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.  Results shown are in seconds, lower is better.
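
FastStone itself is the software being timed here; as a rough illustration of the same kind of single-threaded workload (not the tool we benchmark), a Pillow-based sketch with a hypothetical source folder would be:

    import glob, os, time
    from PIL import Image

    start = time.perf_counter()
    for path in glob.glob(r"D:\images\*.jpg"):         # hypothetical source folder
        with Image.open(path) as im:
            out = im.resize((640, 480)).convert("P")   # 640x480, palette mode for GIF output
            out.save(os.path.splitext(path)[0] + ".gif")
    print(f"Converted in {time.perf_counter() - start:.1f} s")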

FastStone is purely a CPU limited benchmark, with little variation and no trend in the results.  Discrepancies are part of the statistical variation expected with any result.

Xilisoft Video Converter 7

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices.  By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD APP for AMD GPUs.  For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU.  The time taken to convert these videos gives us our result in seconds, where lower is better.

Similar to WinRAR, to avoid the ultra-slow speeds, anything above 1866 MHz seems to be the right way to go here.

Video Conversion - x264 HD Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps.  This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed.  The benchmark performs a 2-pass encode, and the results shown are the average frame rate of each pass performed four times.  Higher is better this time around.
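
The benchmark wraps the standalone x264 encoder and scripts the two passes itself; run by hand, an equivalent two-pass encode at the same bitrate looks roughly like the following (the source filename is hypothetical, and the benchmark's exact settings differ):

    import subprocess

    src = "720p_source.y4m"         # hypothetical 1280x720 source clip
    for pass_no in (1, 2):
        subprocess.run(["x264", "--pass", str(pass_no), "--bitrate", "3963",
                        "--stats", "x264.stats", "-o", "out.264", src], check=True)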

The higher frequency memory performs best, but to get at least a 5% speed-up, DDR3-1866 is once again the place to start.

For whatever reason the 1333 C9 and 3000 C12 get a bad showing, but it seems as long as we avoid 1333 C9, any speed is reasonable for a 5-6% increase.

TrueCrypt v7.1a AES

One of Anand’s common CPU benchmarks is TrueCrypt, a tool designed to encrypt data on a hard-drive using a variety of algorithms.  We take the program and run the benchmark mode using the fastest AES encryption protocol over a 1GB slice, calculating the speed in GB/s.  Higher is better.

Similar to FastStone, there is nothing to differentiate the results.  The only oddball here is technically our slowest memory speed: 1333 C9.



One side of CPUs I like to exploit is compute capability: whether a variety of mathematical loads can stress the system in a way that real-world usage might not.  The benchmarks here are ones we developed for testing MP servers and workstation systems back in early 2013, such as grid solvers and Brownian motion code.  Please head over to the first of those reviews, where the mathematics and small snippets of code are available.

3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normal distribution random number generation, and vary in the amount of trigonometric operations, conditional statements, generation and rejection, fused operations, etc.  The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them all for a final score.  This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark.  The benchmark is also parallel between particles simulated, and we test the single thread performance as well as the multi-threaded performance.  Results are expressed in millions of particles moved per second, and a higher number is better.
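
The 3DPM source is covered in the linked review; purely as a minimal illustration of the style of algorithm involved (not the benchmark code itself), one movement kernel might look like:

    import time
    import numpy as np

    def move_particles(n_particles, n_steps, rng=np.random.default_rng(0)):
        # One 3DPM-style algorithm: pick a uniformly random direction on a sphere
        # (via trigonometry) and move each particle one unit step in that direction.
        pos = np.zeros((n_particles, 3))
        for _ in range(n_steps):
            theta = np.arccos(2.0 * rng.random(n_particles) - 1.0)   # polar angle
            phi = 2.0 * np.pi * rng.random(n_particles)              # azimuthal angle
            pos[:, 0] += np.sin(theta) * np.cos(phi)
            pos[:, 1] += np.sin(theta) * np.sin(phi)
            pos[:, 2] += np.cos(theta)
        return pos

    t0 = time.perf_counter()
    move_particles(100_000, 10)
    rate = 100_000 * 10 / (time.perf_counter() - t0) / 1e6
    print(f"{rate:.1f} million particle movements per second")

Each particle is generated, moved and discarded independently, which is why the working set rarely spills out of the CPU caches.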

Single threaded results:

For software that deals with one particle movement at a time and then discards it, very few memory accesses go beyond the caches into main DRAM.  As a result, we see little differentiation between the memory kits, except perhaps a loose automatic sub-timing setting at 3000 C12 causing a small decline.

Multi-Threaded:

With all the cores loaded, the caches should be more stressed with data to hold, although in the 3DPM-MT test we see less than a 2% difference in the results and no correlation that would suggest a direction of consistent increase.

N-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity.  Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other.  When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions.  The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code path.  We run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.
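
The benchmark itself is SSE-optimized native code; purely as an illustration of the all-pairs interaction it performs, a naive numpy sketch of a single time step (particle count reduced for the example) would be:

    import numpy as np

    def nbody_step(pos, vel, mass, dt=1e-3, eps=1e-3):
        # Naive all-pairs gravity: every particle feels every other particle.
        diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]       # (N, N, 3) separations
        dist3 = (np.sum(diff ** 2, axis=-1) + eps ** 2) ** 1.5     # softened |r|^3
        acc = np.sum(diff * (mass / dist3)[..., np.newaxis], axis=1)
        vel += acc * dt
        pos += vel * dt
        return pos, vel

    rng = np.random.default_rng(1)
    n = 1024                               # the article's run uses 10240 equal-mass particles
    pos = rng.random((n, 3))
    vel = np.zeros_like(pos)
    pos, vel = nbody_step(pos, vel, np.ones(n))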

Despite co-interaction of many particles, the fact that a simulation of this scale can hold them all in caches between time steps means that memory has no effect on the simulation.

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it.  This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step.  By choosing a regular grid, we reduce the levels of memory access required for irregular grids.  We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision.  The grid is isotropic and the boundary conditions are sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’ where the peak value is given in the results – higher is better.
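
As an illustration of what one explicit update involves (not our benchmark code itself), a minimal 2D five-point stencil sketch in numpy, with sink boundaries held at zero:

    import numpy as np

    def explicit_step_2d(u, alpha=0.2):
        # Each new interior node depends only on its neighbours from the previous
        # time step, so the whole grid can be updated in parallel.
        new = u.copy()
        new[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * (
            u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:] - 4.0 * u[1:-1, 1:-1]
        )
        new[0, :] = new[-1, :] = new[:, 0] = new[:, -1] = 0.0   # boundaries act as sinks
        return new

    n = 2 ** 10                                                  # one of the grid sizes swept
    u = np.random.default_rng(2).random((n, n)).astype(np.float32)
    for _ in range(100):
        u = explicit_step_2d(u)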

Two-Dimensional Grid:

In 2D we get a small bump over at 1600 C9 in terms of calculation speed, with all other results being fairly equal.  This would statistically be an outlier, although the result seemed repeatable.

Three Dimensions:

In three dimensions, the memory jumps required to access new rows of the simulation are far greater, resulting in L3 cache misses and accesses into main memory when the simulation is large enough.  At this boundary it seems that low CAS latencies work well, as do memory speeds > 2400 MHz.  2400 C12 seems a surprising result.

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method – instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations.  This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread.  The upside, as noted above, is a less stringent set of stability rules relating to time steps and grid spacing.  For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision.  Again our grid is isotropic with the boundaries acting as sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’ where the peak value is given in the results – higher is better.
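
The core of each implicit (ADI) sweep is a tridiagonal solve along every row and then every column; a sketch of the Thomas algorithm that performs one such solve (not our benchmark code) is below.  The column solves stride through memory, which is where DRAM speed can start to matter.

    import numpy as np

    def thomas(a, b, c, d):
        # Tridiagonal solver: a = sub-diagonal, b = diagonal, c = super-diagonal,
        # d = right-hand side (a[0] and c[-1] are unused).
        n = len(d)
        cp, dp, x = np.empty(n), np.empty(n), np.empty(n)
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):
            denom = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / denom
            dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    # One such solve per grid row, then per grid column, each ADI half-step.
    x = thomas(np.full(512, -1.0), np.full(512, 4.0), np.full(512, -1.0), np.ones(512))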

2D Implicit:

Despite the nature of implicit calculations, it would seem that as long as 1333 MHz is avoided, results are fairly similar, with 1866 C8 being a surprise outlier.



One of the touted benefits of Haswell is the compute capability afforded by the IGP.  For anyone using DirectCompute or C++ AMP, the compute units of the HD 4600 can be exploited as easily as any discrete GPU, although efficiency might come into question.  Shown in some of the benchmarks below, it is faster for some of our computational software to run on the IGP than the CPU (particularly the highly multithreaded scenarios). 

Grid Solvers - Explicit Finite Difference on IGP

As before, we test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision.  The grid is isotropic and the boundary conditions are sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’ where the peak value is given in the results – higher is better.

Two Dimensional:

The results on the IGP are 50% higher than those on the CPU, and it would seem that memory can make a difference as well.  As long as 1333 MHz is not chosen, there is at least a 2% gain to be had.  Otherwise, the next jump up is at 2666 MHz for another 2%, which might not be cost effective.

Three Dimensional:

The 3D results seem to be a little haphazard, with 1333 C7 and 2400 C9 both performing well.  1600 C11 definitely is out of the running, although anything 2400 MHz or above affords almost a 10%+ benefit.

N-Body Simulation on IGP

As with the CPU compute, we run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.

In terms of a workload that calculates FLOPs, the operational workload does not seem to be affected by memory.

3D Particle Movement on IGP

Similar to our CPU Compute algorithm, we calculate the random motion in 3D of free particles involving random number generation and trigonometric functions.  For this application we take the fastest true-3D motion algorithm and test a variety of particle densities to find the peak movement speed.  Results are given in ‘million particle movements calculated per second’, and a higher number is better.

Despite this result being over 35x the equivalent calculation on a fully multithreaded 4770K CPU (200 vs. 7000), again there seems little difference between memory speeds.  3000 C12 gets a small peak over the rest, similar to the n-Body test.

Matrix Multiplication on IGP

Matrix Multiplication occurs in a number of mathematical models, and is typically designed to avoid memory accesses where possible, optimizing the number of reads and writes depending on the registers available to each thread or batch of dispatched threads.  Here we have a crude MatMul implementation, and iterate through a variety of matrix sizes to find the peak speed.  Results are given in terms of ‘million nodes per second’ and a higher number is better.
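
As a rough sketch of how the result is recorded (peak throughput over a range of sizes), here using numpy on the CPU rather than the DirectCompute/C++ AMP code actually run on the IGP:

    import time
    import numpy as np

    def matmul_rate(n, rng=np.random.default_rng(3)):
        # Multiply two n x n single-precision matrices; report output nodes per second.
        a = rng.random((n, n), dtype=np.float32)
        b = rng.random((n, n), dtype=np.float32)
        t0 = time.perf_counter()
        np.matmul(a, b)
        return n * n / (time.perf_counter() - t0) / 1e6

    peak = max(matmul_rate(n) for n in (256, 512, 1024, 2048))
    print(f"Peak: {peak:.0f} million nodes per second")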

Matrix Multiplication on this scale seems to vary little between memory settings, although a shift towards the lower CL timings gives a marginally (though statistically minor) better result.

3D Particle Movement on IGP

Similar to our 3DPM Multithreaded test, except we run the fastest of our six movement algorithms with several million threads, each moving a particle in a random direction for a fixed number of steps.  Final results are given in million movements per second, and a higher number is better.

While there is a slight dip using 1333 C9, in general almost all of our memory timing settings perform roughly the same.  The peak shown using our memory kit at its XMP rated timings is presumably more due to the adjustments in BCLK which need to be made in order to hit this memory frequency.



The activity cited most often as benefiting from improved memory speeds is IGP gaming, and as shown in both of our tests of Crystalwell (4950HQ in the CRB, 4750HQ in the Clevo W740SU) – Intel’s version of Haswell with 128MB of L4 cache – having big and fast memory seems to help in almost all scenarios, especially when there is access to more and more compute units.  In order to pinpoint where exactly the memory helps, we are reporting both average and minimum frame rates from the benchmarks, using the latest Intel drivers available.  All benchmarks are also run at 1360x768 due to monitor limitations (which also makes for more relevant frame rate numbers).

Bioshock Infinite: Average FPS

Average frame rate numbers for Bioshock Infinite show a distinct well for anything at 1333 MHz.  Moving up to 1600 gives a healthy 4-6% boost, and then again to 1866 for a few more percent.  After that point the benefits tend to flatten out, and while there is a bump up again after 2800 MHz, it might not be cost effective, especially on IGP.

Bioshock Infinite: Minimum FPS

Unfortunately, minimum frame rates for Bioshock Infinite are a little all over the place – we see this in both of our dGPU tests, suggesting more an issue with the title itself than the hardware.

Tomb Raider: Average FPS

Similar to Bioshock Infinite, there is a distinct well at 1333 MHz memory.  Moving to 1866 MHz makes the problem go away, but as the MHz rises we get another noticeable bump over 2800 MHz.

Tomb Raider: Minimum FPS

The minimum FPS rates show that hole at 1333 MHz still, but everything over 1866 MHz gets away from it.

Sleeping Dogs: Average FPS

Sleeping Dogs seems to love memory – 1333 MHz is a dud but 2133 MHz is the real sweet spot (but 1866 MHz still does well).  CL seems to make no difference, and after 2133 MHz the numbers take a small dive, but back up by 2933 again.

Sleeping Dogs: Minimum FPS

Like the average frame rates, it seems that 1333 MHz is a bust, 1866 MHz+ does the business, and 2133 MHz is the sweet spot.



For our single discrete GPU testing, rather than the 7970s which normally adorn my test beds (and which were being used for other testing), I plumped for one of the HD 6950 cards I have.  This ASUS DirectCU II card was purchased pre-flashed to 6970 specifications, giving a little more oomph.  Discrete GPU setups are not often cited as an area where memory speed matters; however, we will let the results speak for themselves.

Dirt 3: Average FPS

Dirt 3 commonly benefits from boosts in both CPU and GPU power, showing near-perfect scaling in multi-GPU configurations.  When using our HD6950 however there seems to be little difference between memory settings with no trend.

Dirt 3: Minimum FPS

Minimum frame rates show a different story – Dirt 3 seems to prefer setups with a lower CL – MHz does not seem to have any effect.

Bioshock Infinite: Average FPS

Single GPU frame rates for Bioshock show no direct effect from memory changes, with less than 2% covering our range of tests.

Bioshock Infinite: Minimum FPS

One big sink in frame rates seems to be for 1333 C7, although given that C8 and C9 do not have this effect, I would presume that this is more a statistical outlier than an obvious trend.

Tomb Raider: Average FPS

Again, we see no obvious trend in average frame rates for a discrete GPU.

Tomb Raider: Minimum FPS

While minimum frame rates for Tomb Raider seem to have a peak (1600 C8) and a sink (2400 C12), this looks to be an exception rather than the norm, with minimum frame rates typically showing 35.8 – 36.0 FPS.

Sleeping Dogs: Average FPS

Frame rates for Sleeping Dogs vary between 49.3 FPS and 49.6 FPS, showing no distinct improvement for certain memory timings.

Sleeping Dogs: Minimum FPS

The final discrete GPU test shows a small 5% difference from 1600 C11 to 2400 C11, although other kits perform roughly in the middle.



Our final set of tests are a little more on the esoteric side, using a tri-GPU setup with a HD5970 (dual GPU) and a HD5870 in tandem.  While these cards are not necessarily the newest, they do provide some interesting results – particularly when we have memory accesses being diverted to multiple GPUs (or even to multiple GPUs on the same PCB).  The 5970 GPUs are clocked at 800/1000, with the 5870 at 1000/1250.

Dirt 3: Average FPS

It is pretty clear that memory has an effect: +13% moving from 1333 C9 to 2133 C9/2400 C10.  In fact, that 1333 C9 seems to be more of a sink than anything else – above 2133 MHz memory the performance benefits are minor at best.  It all depends if 186.53 FPS is too low for you and you need 200+.

Dirt 3: Minimum FPS

We see a similar trend in minimum FPS for Dirt3: 1333 C9 is a sink, but moving to 2133 C9/2400 C10 gives at least a 20% jump in minimum frame rates.

Bioshock Infinite: Average FPS

While differences in Bioshock Infinite average FPS are minor at best, 1333 MHz and 1600 C10/C11 are certainly at the lower end.  Anything at 1866 MHz or 2133 MHz seems to be the best bet here, especially in our case if we wanted to push for 120 FPS gaming.

Bioshock Infinite: Minimum FPS

Similar to Bioshock on IGP, minimum frame rates across the board seem to be very low, with minor differences giving large % rises.

Tomb Raider: Average FPS

Tomb Raider remains resilient to change across our benchmarks, with 1 FPS difference between the top and bottom average FPS results in our tri-GPU setup.

Tomb Raider: Minimum FPS

With our tri-GPU setup being a little odd (two GPUs on one PCB), Tomb Raider cannot seem to find much consistency for minimum frame rates, showing up to a 15% difference when compared to our 1600 C10 result which seems to be a lot lower than the rest.

Sleeping Dogs: Average FPS

Similar to other results, 1333 and 1600 MHz results give lower frame rates, along with the slower 1866 MHz C10/C11 options.  Anything 2133 MHz and above gives up to 8% more performance than 1333 C9.

Sleeping Dogs: Minimum FPS

Minimum frame rates are a little random in our setup, except for one constant – 1333 MHz memory does not perform.  Everything beyond that seems to be at the whim of statistical variance.



Pricing and the Effect of the Hynix Fire

When I started testing for this overview, I naturally headed over to Newegg in order to see the prices for memory kits using each of the timings we used.  A 2x4 GB memory kit covers most of the major user scenarios, and a 2x8 GB of each is often available for near-double the pricing.  As it stood at the beginning of August, we had the following pricing:

At the time, a 1333 C9 kit was the cheapest at $50, moving up through to $700 for our extreme DDR3-3000 C12 kit.  Anything 2666 MHz and above requires a larger bump in price; however, the move from 1333 C9 to 2400 C11 was, in the grand scheme of things, relatively small ($13), whereas jumping to 2400 C9 is a 2.16x increase.

However, on September 4th, fire struck Fab 1 and Fab 2 of SK Hynix’s operation in Wuxi, China.

Source: Kitguru

Reports vary, with some suggesting that these fabs were used for production of NVIDIA GDDR5, and others stating they were part of a general DRAM manufacturing plant.  In a statement, SK Hynix said that ‘there was no material damage to the fab equipment in the clean room, and thus we expect to resume operations in a short time period so that overall production and supply volume would not be materially affected’.

To put this into context, these Fabs combined produce 12-15% of the world’s supply of DRAM silicon:  Hynix themselves command 30% of the memory chip market and Reuters reports that this plant produces around 40-50 percent of Hynix’s total output.

Of course the initial reaction to the incident was directed at pricing.  Any suspension of manufacturing can cause other companies to raise their baseline pricing, or the reduction in supply will encourage the others to make the most of their own production.  Memory kits have been rising in price per gigabyte over the past year anyway, and the prediction of a 10-20% bump in price is not welcome.  Using the price tracking website camelcamelcamel.com, we chose a few 2x4 GB kits to see how prices have spiked:

A few memory kits show a bump around the Sep 4-10th timeframe, such as the Corsair 1866 C9 kit, the Kingston DDR3-2400 C11 kit and the Patriot 2133 C11 kit.  However the majority of kits did not in our small sample.  Going back to the original list of prices I obtained from Newegg, I got a fresh set of numbers:

Some pricing has obviously moved – 1333 C9 is now $15 more expensive, and the budget kits are clearly 1600 C9 and 2400 C11.  Most of the high end has not moved, although 2666 C11 is now under $100 for a 2x4 GB kit.  1866 C9 is $2 cheaper over the timeframe, but 2133 C9 is $8 more expensive than before.  The ultra-high end kits have not adjusted much.



The major enthusiast memory manufacturers are all playing a MHz race, to retail the highest MHz possible.  These are pretty much all Hynix MFR memory kits, known for their high MHz, but the ICs are still binned in their thousands to get one or two modules that hit the high notes.  With Haswell memory controllers happily taking DDR3-2800 in their stride, it all boils down to how useful this increase in memory speed is versus the cost of binning and producing it.

In terms of competitive overclocking, the results are real – where every MHz matters and real-world performance is not on the menu, enthusiasts are happy to take these high end kits to over 4000 MHz (recent news from G.Skill shows 4x4GB DDR3-4072 on Ivy Bridge-E, the fastest result ever in quad channel).  The reality is often that these enthusiasts are not purchasing the memory but are being seeded it, taking cost-effectiveness out of the equation.

In terms of real world usage, on our Haswell platform, there are some recommendations to be made.

Avoid DDR3-1333 (and DDR3-1600)

While memory speed did not necessarily affect our single GPU gaming results, for real-world or IGP use, memory speed above these sinks can afford a tangible (5%+) difference in throughput.  Based on current pricing after the Hynix fire, moving up may cost very little, as memory kits just above DDR3-1600 are now around the same price.

MHz Matters more than tCL, unless you compare over large MHz ranges

When discussing memory kits, there is often little difference in our testing when comparing different CAS latency numbers at the same MHz – the only issues ever came with DDR3-1333 C9 and DDR3-1600 C11, which we already suggested are best avoided.  Even so, above DDR3-2400 the benefits are minimal at best, with perhaps a few percentage points afforded in multiple GPU setups.

However, tCL might play a role when comparing large MHz differences, such as 2400 C12 vs. 1866 C8.  In order to provide an apt comparison, I mentally use a ‘performance index’, which is a value of MHz divided by tCL:

As a general rule, below 2666 MHz, my Performance Index provides an extremely rough guide as to which kits offer more performance than others.  In general we see 1333 C7 > 1333 C9, but 1333 C7 is worse than 2133 C11, for example.
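
As a worked example of the index, using kits mentioned above (values rounded):

    def performance_index(ddr_rate, tcl):
        # Extremely rough guide only: data rate (MT/s) divided by CAS latency.
        return ddr_rate / tcl

    kits = [("1333 C7", 1333, 7), ("1333 C9", 1333, 9), ("1866 C8", 1866, 8),
            ("2133 C11", 2133, 11), ("2400 C12", 2400, 12)]
    for name, rate, tcl in kits:
        print(f"{name}: PI = {performance_index(rate, tcl):.0f}")
    # 1333 C7 -> 190, 1333 C9 -> 148, 1866 C8 -> 233, 2133 C11 -> 194, 2400 C12 -> 200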

When presented with two kits, calculate this Performance Index.  If the kits are similar in number (within 10%), then take the kit with the higher MHz.  If we take the results of Dirt 3 minimum frame rates in a CFX configuration, and plot against the Performance Index listed above, we plot the following curve:

The graph shows the trend of diminishing returns in this benchmark - as the PI is higher, we reach an asymptotic limit.  Note the several results below the line at PIs of 167, 190 and 229 - these are the 1333 MHz memory kits, reinforcing the idea that at similar PI values, the higher MHz kit should be the one to go for.

Of course, cost is a factor – price rises more with MHz than with tCL, and thus the ‘work benefit’ analysis comes in: if buying a kit boosts your productivity by x percent, how long will it take to recover the cost?  In single GPU gaming, for our setup, the benefits seemed to be minimal, but I can see workload improvements from going for something faster than 1600 C9.

Remember the Order of Importance

As I mentioned at the beginning of this overview, the order of importance for memory should be:

1.      Amount of memory

2.      Number of sticks of memory

3.      Placement of those sticks in the motherboard

4.      The MHz of the memory

5.      If XMP/AMP is enabled

6.      The subtimings of the memory

I would always suggest a user buy more memory if they need it over buying a smaller amount of faster memory.  For gamers, common advice on forums is to take the sum of the memory of your GPUs and add 4GB or so for Windows 7/8 – that should be the minimum amount of memory in your system.  For most single GPU gamers that would put the number at 8GB (for anything bar a Titan or dual GPU card); for dual GPU gamers, above 8GB (the suggestion is to stick to a power of two).

G.Skill 2x4GB DDR3-3000 12-14-14 1.65V Kit: Do we need it?

Firstly, many thanks to G.Skill for the memory kit on which these tests were performed – 3000 MHz on air can be a tough thing to do without a kit that is actually rated to do it.  This kit has done the rounds on all the major Z87 overclocking motherboards, including the ASRock Z87 OC Formula in this test.

On the face of it, investing in this kit means a small bump to BCLK, which brings additional performance gains purely from bus speed, even at a slightly lower CPU multiplier.  Beyond this, there are only a very few scenarios where DDR3-3000 C12 beats anything 5x cheaper – a couple of our IGP compute benchmarks, a couple of IGP gaming scenarios and a couple of tri-GPU Crossfire games as well.  However, the argument then becomes whether the extra cost of the memory would be better spent on a discrete GPU outright (or on better GPUs).

The memory kit has one thing going for it – overclockers aiming for high MHz love the stuff.  In our testing, we hit 3100 C12 stable across the kit for daily use, one of the sticks hit 3300 MHz on air at very loose timings, and fellow overclocker K404 got one of the sticks to 3500 MHz on liquid nitrogen.  In the hands of overclockers with much more time on their hands (and knowledge of the subsystem), we have seen DDR3-4400 MHz as well.

At $690 for a 2x4 GB kit, veteran system builders are laughing.  It is a high price for a kit that offers little apart from a number parade.  Perhaps the thing to remember is that plenty of memory manufacturers are also aiming at high MHz – Corsair, Avexir, TeamGroup, Apacer and others.  If I had that money to spend on a daily Haswell system, I might plump for 4x8GB of DDR3-2400 C10 and upgrade the GPU with money left over.

Haswell Recommendations

For discrete GPU users, recommending any kit over another is a tough call.  In light of daily workloads, a good DDR3-1866 C9 kit will hit the curve at the right spot and remain cost effective.  Users with a few extra dollars in their back pocket might look towards 2133 C9/2400 C10, which moves a little further up the curve and has potential should a game come out that is heavily memory dependent.  Ultimately the same advice applies to multi-GPU users as well as IGP: avoid 1600 MHz and below.

One relevant question to this is whether memory speed matters in the laptop space.  It remains an untapped resource for memory manufacturers to pursue, mainly because it is an area where saving $5 here and there could mean the difference between a good and great priced product.  But even when faced with $2000+ laptops, 1333 MHz C9 and 1600 C9-11 still reign supreme.  I have been told that often XMP is not even an option on many models, meaning there are few opportunities for some pumped up SO-DIMM kits that have recently hit the market.

Addendum:

One point I should address which I failed to in the article: XMP ratings for a memory kit are validated for the density of that kit - i.e. two 2x8 GB DDR3-2400 C11 memory kits might not be stable at 2400 C11 when you put them together in the same system.  If a kit has a lot of headroom then it may be possible, but there is no guarantee.  The only guarantee is that if you purchase a single kit (4x8 GB 2400 C11, for example), it is confirmed to run at the rated timings.  This may in certain circumstances be slightly more expensive, but it saves the headache of two kits that will not work together at full density.  I would certainly recommend buying a single kit rather than gambling with two lower density kits, even if they are from the same family.  Remember: the rating on the kit is for the density of that kit.
