Original Link: https://www.anandtech.com/show/9482/intel-broadwell-pt2-overclocking-ipc



In the first part of our Broadwell coverage, we rushed to test both the i7-5775C and the i5-5675C in our new benchmarking suite against the previous generation of Haswell processors as well as AMD's Kaveri lineup. In Part 2, we have spent more time with the architecture to see how it stacks up against the last four years of Intel silicon, as well as probing its high-end overclocking capabilities.

Since Part 1, the news from Intel has been an interesting mix: reduced revenue from the PC segment, but rising expectations as we move from a dull Q2 into an exciting Q3 with back-to-school sales on the horizon and the release of Windows 10. Throw in details of Intel's delayed 10nm process node and the injection of Kaby Lake processors after Skylake to break the tick-tock model, and the next few years become interesting ones for the industry.

No matter the state of the tick-tock model (or what seems to be a tick-tock-tock with Kaby Lake), Intel's goals are still the same - improve the efficiency of the main processor design and boost peak performance through instruction per clock (IPC) gains with each new processor design release. Simply stating that you want an improvement in IPC and actually designing the semiconductor to deliver that boost are at opposite ends of the difficulty spectrum.

Broadwell vs. Haswell

Intel's line of Haswell (4th generation) processors was released in June 2013, followed by a small update called Haswell Refresh in mid-2014 with improved frequencies and a minor package upgrade to benefit temperatures. Haswell is the name of the architecture, an update over Ivy Bridge but built on the same 22nm process node. An architecture update can incorporate a number of things - either a paradigm shift in the underlying semiconductor design, or a step up from the previous iteration by aiming for the low-hanging fruit (areas that can be updated for the most gain with the least effort). As a result, architecture jumps usually produce big (5-25%) jumps in performance. This is a tock, to use Intel's nomenclature.

Intel's Tick-Tock Cadence
Microarchitecture Process Node Tick or Tock Release Year
Conroe/Merom 65nm Tock 2006
Penryn 45nm Tick 2007
Nehalem 45nm Tock 2008
Westmere 32nm Tick 2010
Sandy Bridge 32nm Tock 2011
Ivy Bridge 22nm Tick 2012
Haswell 22nm Tock 2013
Broadwell 14nm Tick 2014
Skylake 14nm Tock 2015
Kaby Lake ? 14nm Tock 2016 ?

The other half of the equation is a tick, the movement from a larger process node to a smaller one. This is by and large a scaled reduction in the masks used for the processor, but there are potential benefits based on the die area of the components of the processor and the connections within. Moving down to a smaller node typically does not change the base hardware underneath, but optimizations are made based on that die area reduction. With this in mind, we typically see smaller gains in performance (5-10%), but better improvements in power consumption due to smaller transistors needing less voltage (balanced against higher leakage currents). Overall, the goal of a process node change is typically efficiency, making it favored in mobile platforms.

Moving from Haswell to Broadwell on the desktop is a process node change, migrating from 22nm on Haswell to 14nm on Broadwell. As a result, the first processors released under the Broadwell nomenclature were mobile focused (Core M), and the desktop end of the stack is the last to be updated. The desktop situation is more subtle than that, though - Intel has released mid-powered versions of the processor with high end integrated graphics, an approach normally reserved for mobile devices or integrated devices such as all-in-ones. Perhaps it is then unsurprising that, while desktop processors normally launch under the -S or -DT naming scheme, Broadwell on the desktop is part of the -H line, normally reserved for mobile processors.

We've commented on Broadwell's minor architecture adjustments over Haswell before. They focus on reducing cache misses and keeping more predicted operations in flight at any one time, reducing the need to go back out to main memory and increasing throughput. This is mostly achieved by exploiting the area made available when functional units shrink with the node change - the out-of-order scheduler grows, the L2 TLB is enlarged to cope with more local misses and larger memory jump requests, and the page miss handler doubles in size.

This, according to Intel, accounts for a 5% increase in IPC (instructions per clock) by focusing on reducing the wait time for data for the traditional CPU part of the Broadwell processor.

In our initial review of the Broadwell processors, we saw that it was not as straightforward as this. The two CPUs we tested, the i7-5775C and the i5-5675C, are built to a 65W thermal design power, compared to the high end Haswell models at the 84/88W level. This means that for users looking for the next most powerful processor, the base frequencies of the Broadwell samples we had are lower, and that frequency deficit costs more performance than any IPC increase can recover.
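As a rough illustration of why this matters, the back-of-the-envelope sketch below treats single-thread throughput as the product of frequency and relative IPC, using the listed turbo clocks of the two chips and Intel's quoted 5% IPC figure. The numbers are assumptions for illustration, not measured results.

```python
# Back-of-the-envelope: single-thread throughput ~ frequency x relative IPC.
# Clocks are the listed turbo values; the +5% IPC figure is Intel's quoted
# architectural gain, used here purely as an illustrative assumption.
haswell_turbo_ghz = 3.9      # i7-4770K
broadwell_turbo_ghz = 3.7    # i7-5775C
broadwell_ipc_gain = 1.05    # assumed +5% IPC over Haswell

haswell_perf = haswell_turbo_ghz * 1.00
broadwell_perf = broadwell_turbo_ghz * broadwell_ipc_gain

print(f"Relative single-thread throughput: {broadwell_perf / haswell_perf:.3f}")
# ~0.996 -- a 5% IPC gain roughly cancels a ~5% frequency deficit.
```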


Core i7-5775C lining up with the Core i7-4790K

With a pure frequency (and TDP) handicap, it was difficult for Broadwell to win any CPU focused benchmark. To add another element into the mix, Broadwell's cache and memory system is also different:

Intel Desktop Processor Cache Comparison
  L1-D L1-I L2 L3 L4
Sandy Bridge i7 4 x 32 KB 4 x 32 KB 4 x 256 KB 8 MB -
Ivy Bridge i7 4 x 32 KB 4 x 32 KB 4 x 256 KB 8 MB -
Haswell i7 4 x 32 KB 4 x 32 KB 4 x 256 KB 8 MB -
Broadwell i7 (Desktop / Iris Pro 6200) 4 x 32 KB 4 x 32 KB 4 x 256 KB 6 MB 128 MB eDRAM
Haswell i5 4 x 32 KB 4 x 32 KB 4 x 256 KB 6 MB -
Broadwell i7 (i7-5700HQ / HD 5600) 4 x 32 KB 4 x 32 KB 4 x 256 KB 6 MB -

The Level 1 and Level 2 caches of each processor are the same, but at Level 3, where the Haswell i7 has 8MB, the Broadwell i7 only has 6MB. Aside from the improved branch predictor mentioned above to reduce cache misses, Broadwell also has a separate eDRAM die in the CPU package, weighing in at 128MB. This acts as a level 4 cache, with a latency between that of the L3 and main memory, resulting in fewer trips out to DRAM. This combination of architecture improvements and eDRAM, set against the smaller L3 cache, makes Broadwell an unknown in memory performance.

I have also included the Haswell i5 and the Broadwell-based i7-5700HQ in this table, showing that the Broadwell i7 L1/L2/L3 cache hierarchy is more akin to that of a desktop i5 processor and that Broadwell is available without the eDRAM. That being said, the i7-5700HQ is a processor destined for laptops, making discrete testing effectively impossible for us, and it sits outside the real-world context of the majority of Broadwell desktop owners.
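A common way to expose a cache hierarchy like the one in the table above is a pointer-chase test: walk a randomly permuted buffer of increasing size and time each dependent load. The sketch below shows the idea in Python with numpy; it illustrates the method only, since real latency tools (and the latency numbers discussed later) are written in C or assembly to avoid interpreter overhead, and the absolute figures here include that overhead.

```python
import time
import numpy as np

def chase_latency_ns(size_bytes, iters=200_000):
    """Average time per dependent load over a random cyclic permutation.

    When size_bytes fits in L1/L2/L3 (or the 128 MB eDRAM) the walk stays
    in that level; once it spills over, latency jumps to the next level.
    """
    n = size_bytes // 8                       # 8-byte indices
    perm = np.random.permutation(n)
    nxt = np.empty(n, dtype=np.int64)
    nxt[perm] = np.roll(perm, -1)             # one cycle visiting every slot
    idx, t0 = 0, time.perf_counter()
    for _ in range(iters):
        idx = nxt[idx]                        # each load depends on the last
    return (time.perf_counter() - t0) / iters * 1e9

for kb in (16, 256, 4096, 65536, 262144):     # spans L1 -> L2 -> L3 -> eDRAM -> DRAM
    print(f"{kb:>7} KB: {chase_latency_ns(kb * 1024):.1f} ns (includes Python overhead)")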

The reason for Broadwell's eDRAM comes from Intel's ‘Crystal Well’ strategy. Crystal Well is a designation given to a processor which has this eDRAM (and typically a larger integrated graphics package as well). Integrated graphics are historically inhibited by memory bandwidth, having to almost always reach out to main memory to process textures in graphics workloads. The eDRAM allows more data to be kept on-package between the graphics core and main memory, at a higher bandwidth, potentially improving output. By combining the high end integrated graphics with the eDRAM, Intel created these Broadwell processors as the fastest integrated graphics solution available on a socketable (replaceable processor) platform.

That being said, because Broadwell is the latest product from Intel, built on the most recent (and expensive) process node and equipped with eDRAM on a separate die in the package, these solutions do not come cheap. The pricing is almost in line with previous Haswell mainstream i7 processors, albeit with a lower thermal design power and beefier integrated graphics. As we determined in the Broadwell Part 1 review, desktop Broadwell holds the absolute integrated graphics performance crown, although an AMD APU system will be significantly more cost effective. Both platforms are hoping that the multi-GPU possibilities in DirectX 12 will work out in their favor.

This Review

We said we'd be back for part 2, and this is it. Here I wanted to cover what we couldn't previously due to early BIOS revisions and limited testing time - specifically how Broadwell performs when overclocking, and whether the Broadwell architecture is truly a step up over previous generations of Intel processors. In the last review our comparison point was the i7-4770K from Intel's Haswell line; for part two we also back-tested the i7-3770K from the Ivy Bridge platform and the i7-2600K from Sandy Bridge, covering the four most recent Intel processor architectures dating back to January 2011. We also have data on older benchmarks going back further. All four of the most recent architectures are tested at their stock speeds and at a constant 3 GHz (at DDR3-1866 C9) to find out how IPC has improved. (Incidentally I did find an i7-750 and a Q9550 in my CPU bin for the next two generations back, but have no motherboards for testing our more recent benchmarks. I'll see what I can put together for a mini-piece later in the year.)

Test setup

Test Setup
Processor
Intel Core i7-5775C 65W 4C/8T 3.3 GHz / 3.7 GHz Broadwell
Intel Core i7-4770K 84W 4C/8T 3.5 GHz / 3.9 GHz Haswell
Intel Core i7-3770K 77W 4C/8T 3.5 GHz / 3.9 GHz Ivy Bridge
Intel Core i7-2600K 95W 4C/8T 3.4 GHz / 3.8 GHz Sandy Bridge
Motherboards MSI Z97A Gaming 6 (LGA1150)
ASRock Z77 OC Formula (LGA1155)
Cooling Cooler Master Nepton 140XL
Power Supply OCZ 1250W Gold ZX Series
Memory G.Skill RipjawsZ 4x4 GB DDR3-1866 9-11-11 Kit
Video Cards ASUS GTX 980 Strix 4GB
MSI GTX 770 Lightning 2GB (1150/1202 Boost)
ASUS R7 240 2GB
Hard Drive Crucial MX200 1TB
Optical Drive LG GH22NS50
Case Open Test Bed
Operating System Windows 7 64-bit SP1

Many thanks to

Thank you to AMD for providing us with the R9 290X 4GB GPUs.
Thank you to ASUS for providing us with GTX 980 Strix GPUs and the R7 240 DDR3 GPU.
Thank you to ASRock and ASUS for providing us with some IO testing kit.
Thank you to Cooler Master for providing us with Nepton 140XL CLCs.
Thank you to Corsair for providing us with an AX1200i PSU.
Thank you to Crucial for providing us with MX200 SSDs.
Thank you to G.Skill and Corsair for providing us with memory.
Thank you to MSI for providing us with the GTX 770 Lightning GPUs.
Thank you to OCZ for providing us with PSUs.
Thank you to Rosewill for providing us with PSUs and RK-9100 keyboards.



Overclocking Broadwell

For any user that has overclocked an Intel processor since Sandy Bridge, there is not much new to see here. Overclocking, for those unfamiliar with the term, means adjusting the settings of the system to make a component run faster, typically outside its specifications and at the expense of power but with the benefit of a faster system.

There is a large community around overclocking, with motherboard manufacturers providing special options to make overclocking easier, as well as bigger and better CPU coolers to move the extra heat away from the processor faster and keep it cool. Some users use liquid cooling, either prebuilt arrangements or custom designs, on the processor, the graphics card, or both. One original purpose of overclocking was to buy a cheap component and end up with performance similar to an expensive one. Since 2011, Intel has restricted overclocking to a few high end models, meaning that the goal is now to make the fastest, faster.

Asking a processor to run faster than its specifications requires more power, usually provided in the form of extra voltage. This raises the energy lost as heat in the system, which has to be removed, and power consumption goes up (usually efficiency also goes down). Financial services and high frequency trading are examples of industries that rely on the fastest possible response times regardless of efficiency, so overclocking is par for the course to get better results and make the trade faster than the next guy. Typically we are then left with individuals who need to process work quicker, or gamers looking for a better frame rate or the ability to increase settings without losing immersion. There is a separate group of individuals called extreme overclockers who are not concerned with everyday performance and insist on pushing the hardware to the limit by using coolants such as liquid nitrogen to remove the extra heat (350W+). These individuals live on the precipice of stability, only needing to be stable enough to run a benchmark and compare scores around the world. The best extreme overclockers are picked up by PC component manufacturers to help build future products (e.g. HiCookie and Sofos at GIGABYTE, NickShih at ASRock, Coolice and Shamino at ASUS) or by retailers to build a brand (8Pack at OverclockersUK).
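The power cost of chasing frequency can be put in rough terms with the standard first-order model for dynamic (switching) power, which is why small voltage bumps have an outsized effect on heat output. The figures below are illustrative, not measured from this chip.

```latex
% First-order model of dynamic CPU power: P is proportional to C V^2 f
P_{dyn} \approx C_{eff} \cdot V^2 \cdot f
% Illustrative example: pushing a 4.0 GHz part at 1.20 V to 4.4 GHz at 1.30 V
\frac{P_{new}}{P_{old}} = \frac{4.4}{4.0}\cdot\left(\frac{1.30}{1.20}\right)^2 \approx 1.29
```

In other words, a 10% frequency bump paid for with roughly 0.1 V of extra voltage costs on the order of 30% more dynamic power, all of which ends up as heat the cooler has to remove.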


Extreme overclocking at MSI’s HQ

Here at AnandTech, we mainly focus on 24/7 stability (although I have roots in the extreme overclocking community) as our diverse readership ranges from non-overclockers to enthusiasts. This means a good high end air cooler or a liquid cooler, in this case either the Cooler Master Nepton 140XL liquid cooler in a push/pull configuration with the supplied fans or a 2kg TRUE Copper air cooler with a 150CFM Delta fan. Both of these are more than sufficient to push the hardware for general overclocking and 24/7 use (though I hesitate to recommend the TRUE Copper for a regular system due to its mass, unless the system stays upright).


The Cooler Master Nepton 140XL

In our testing, we keep it relatively simple. The frequency of a modern Intel processor is determined by the base frequency (~100 MHz) and the multiplier (20-45+). These two numbers are multiplied together to give the final frequency, and our overclocking is performed by raising the multiplier.

The other variable in overclocking is voltage. All processors have an operating voltage out of the box, known as the VID or stock voltage. In general, a processor architecture will have a stock voltage within a certain range, and individual processors of that architecture will fall somewhere along that spectrum. As time goes on, we might find that the average VID falls on new processors within the same architecture due to improvements in the manufacturing process, but it is ultimately the luck of the draw. When a faster frequency is requested, the processor draws more power, and in order to remain stable the voltage should be increased. Most motherboards have an auto calibration tool for voltage based on the set frequency, though these tend to be very conservative values to ensure all processors are capable. Users can adjust the voltage with an offset (e.g. +0.100 volts) or in most cases can set the absolute voltage (e.g. 1.200 volts). For a given frequency there will be a minimum voltage at which the processor is stable, and finding it is by and large a case of trial and error. When the system works, the frequency/voltage combination is typically tested for stability using stress tests to ensure proper operation, as well as probing the temperatures of the system to avoid overheating, which causes the processor to override the settings and drop into a low voltage/frequency mode to cool down.

There is a tertiary concern in that when a processor is performing work, the voltage across the processor will drop. This can result in instability, and there are two ways to approach this - a higher initial voltage, or adjusting what is called the load line calibration which will react to this drop. Both methods have their downsides, such as power consumption or temperatures, but where possible most users should adjust the load line calibration. This ensures a constant voltage no matter the processor workload.

At AnandTech, our overclocking regime is thus - we test the system at default settings and note the stock voltage for the stock frequency. Then we set the processor multiplier one higher than normal, and set the voltage to the stock VID rounded down to the nearest 0.1 volts (e.g. a 1.227 V VID becomes 1.200 V). The system is then tested for stability, which in our case is a short regimen consisting of the POV-Ray benchmark, five minutes of the OCCT stress test and a run of 3DMark Fire Strike. If this regime passes, and the CPU remains below 95C throughout without throttling, we mark it as a success and raise the multiplier by one. If any test fails (the system does not boot, the system hangs or we get a blue screen), we raise the voltage by 0.025 volts and repeat the process at the same multiplier. All the adjustments are made in the BIOS, and we end up with an overall picture of how processor performance and temperature scale with voltage.
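Expressed as pseudocode, the loop looks roughly like the sketch below. It is a simplified model of the procedure just described (the one-multiplier step, the 0.025 V voltage step, the 95C ceiling); the stability and temperature functions are toy stand-ins, since in practice every step is a manual BIOS change followed by the POV-Ray/OCCT/Fire Strike run.

```python
import math

# Toy stand-ins for the manual steps: in reality each iteration is a BIOS
# change, a POV-Ray / 5-minute OCCT / Fire Strike run, and a temperature log.
def run_stability_suite(multiplier, voltage):
    # Hypothetical model: each extra bin needs roughly 50 mV more to hold stable.
    return voltage >= 1.20 + (multiplier - 37) * 0.05

def max_temperature(voltage):
    return 55 + (voltage - 1.1) * 250          # crude heat-vs-voltage model

def overclock(stock_multiplier=37, stock_vid=1.227, bclk=100.0,
              v_step=0.025, temp_limit_c=95.0):
    multiplier = stock_multiplier + 1
    voltage = math.floor(stock_vid * 10) / 10  # round the VID down: 1.227 -> 1.200
    log = []
    while True:
        stable = run_stability_suite(multiplier, voltage)
        too_hot = max_temperature(voltage) >= temp_limit_c
        log.append((multiplier * bclk / 1000, round(voltage, 3), stable, too_hot))
        if stable and not too_hot:
            multiplier += 1                    # success: raise the multiplier by one
        elif not stable and not too_hot:
            voltage += v_step                  # failure: add 0.025 V, retry same ratio
        else:
            break                              # thermal ceiling reached: stop
    return log

for freq, volts, stable, hot in overclock():
    print(f"{freq:.1f} GHz @ {volts:.3f} V  stable={stable}  overheated={hot}")
```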

Here are our results with the Broadwell i7-5775C in the MSI Z97A Gaming 6:

Our top result was 4.2 GHz on all cores, reaching 80C. When we selected 4.3 GHz, even with another 0.300 volts, the system would not be stable.

To a number of people, this is very disappointing. Previous Intel architectures have overclocked to between 4.4 GHz and 5.0 GHz, so any increase in base performance for Broadwell is overshadowed by the higher frequencies possible on older platforms. This has been an unfortunate trend in the overclocking performance of Intel's high end processors since Sandy Bridge:

Intel 24/7 Overclocking Expected Results in MHz
  Stock Speed Good OC Great OC
Sandy Bridge i7 3400 4700 4900
Ivy Bridge i7 3500 4500 4700
Haswell i7 3500 4300 4500
Broadwell i7 3300 4100 4300

Not mentioned in the table, but for Haswell a Devil's Canyon based processor (such as the i7-4790K) could yield an extra +100-200 MHz in temperature limited situations as we found during our testing.

It is worth noting at least two points here. When Intel reduces the process node size, the elements of the processor are smaller and removing the heat generated becomes more problematic. Some of this can be mitigated through the fundamental design of the processor, such as not placing heat-generating logic blocks next to each other when they are used in quick succession, which would create a hotspot. However, if a processor is fundamentally designed as a mobile-first platform, overclocking may not even be a consideration at the design phase and is merely tacked on as a ‘feature’ to certain models at the end.

Other methods have been used in the past to increase overclockability, such as changing the thermal interface material between the processor die and the heatspreader. Intel did this on its Devil’s Canyon line of processors as a ‘Haswell upgrade’, and most results showed that it afforded another 10ºC of headroom. To that end, many users interested in getting the most out of their Haswell processors found the best ways to remove the heatspreader (voiding the warranty) in exchange for better overclocking performance.

With all that said, it is important to consider what we are dealing with here in Broadwell. This is a Crystal Well design, which looks like this:

This is an image taken from when we reviewed the i7-4950HQ, the first Crystal Well based processor aimed specifically at high powered laptops and all-in-one devices. On the left is the processor die, and on the right is the eDRAM die, both on the same package. The thing to note here is that when the heatspreader is applied, different parts of the package will generate different amounts of heat. As a result, the cooling arrangement needs to be planned accordingly.

What I’m specifically getting to here is thermal paste application. Many users here will have different comments about the best way to apply thermal paste, and for those following the industry they will remember how suggested methods change over time based on the silicon in the package. For the most part, the suggested methods revolve around a pea-sized blob in the center of the processor and a heatsink with sufficient force to help spread the paste. This minimizes air bubbles which can cause worse performance.

As a personal side note, I heavily discourage the credit card/spreading method due to the air bubbles it introduces. The only arrangement where the spreading method should be used is sub-zero overclocking.

With Broadwell, I took the pea-sized blob approach, strapped on a big cooler, and went to work. Almost immediately the processor temperature under load rose to 90ºC, which seemed extremely high. I turned the system off, removed the cooler, and placed it back on without doing anything, and the temperature under load dropped a few degrees. After some trial and error, the best anecdotal temperature arrangement was for a line of thermal paste from top to bottom of the CPU (where the arrow on the corner of the CPU is in the bottom left).

Put bluntly, the reason this method works better than the pea is down to where the heat-generating spots on the CPU are. With a pea-sized blob in the middle and a slightly off-center mount, the paste can spread over the eDRAM rather than the processor die. A line ensures that both are covered, transferring heat to the cooler.

Now I should note that this method is useful when you are in a temperature limited overclock situation. It would seem that our CPU merely would not go above 4.2 GHz, regardless of the voltage applied. But in terms of thermal management, thermal paste application became important again.



Comparing IPC: Memory Latency and CPU Benchmarks

Being able to do more with less, in the processor space, allows a task to be completed quicker and often for less power. While the concept of having multiple cores has allowed many programs to be run at once, such as IM, web, compute and so forth, we are still limited by the fact that a lot of software relies on one line of code after another, pegging each software package to one core unless it can exploit a multithreaded list of operations. This is referred to as the serial part of the software, and is the basis for many early programming classes – getting the software to compile and complete is more important than speed. But the truth is that having a few fast cores helps more than several thousand super slow cores. This is where Instructions Per Clock (IPC) comes into play.

The principles behind extracting IPC are quite complex, as one might imagine. Ideally, every instruction a CPU gets would be read, executed and retired in one cycle; however, that is rarely the case. The processor has to fetch the instruction, decode it, gather the data (which depends on where the data is), perform work on the data, then decide what to do with the result. Moving data around has never been more complicated, and the ability of a processor to hide latency, pre-prepare data by predicting future events, or keep hold of previous results for potential future use is all part of the plan. All the while there is an external focus on keeping power consumption low and letting the frequency of the processor scale depending on what the target device actually is.

For the most part, Intel has successfully increased IPC with every generation of processor - in most cases 5-10% with a node change and 5-25% with an architecture change, with the most recent large jumps coming from the Core architecture and the Sandy Bridge architecture, each ushering in a new wave of computational power. As Haswell to Broadwell is a node change with minor silicon updates, we should expect some gain, but the main benefit should be efficiency from moving to a smaller node.

For this test we took Intel’s high-end i7 processors from the last four generations and set them to 3.0 GHz with HyperThreading disabled. As each platform uses DDR3, we set the memory on each to DDR3-1866 with a CAS latency of 9. From a pure cache standpoint, here is how each of the processors performed:

Both Haswell and Broadwell have a small lead through the Level 1 cache (32 KB) and Level 2 cache (256 KB). It all changes from 6MB onwards as a result of the different cache arrangements between the processors. As the Broadwell based i7-5775C only has 6MB of L3 cache, this seems to affect the 4MB data set range, but between the 8MB and 64MB values the memory latency for Broadwell is substantially lower than any other Intel processor here. This comes down to the eDRAM, which sticks around until 128MB.

Most memory accesses happen at the lower data set ranges as the system attempts to predict and prefetch the data needed. When data is not in the L1 cache, it counts as a cache miss and the L2 is checked; a miss there sends the request to L3, and an L3 miss goes out to the eDRAM (where present) and then DDR3. From this perspective, the Broadwell based processors should have a slight advantage when it comes to large amounts of data accesses. Based on our previous testing, this means integrated graphics or high intensity CPU/DRAM workloads such as databases or matrix operations.
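One way to reason about what the eDRAM buys you is the classic average memory access time (AMAT) model: each level's latency is paid by the accesses that fall through to it. The sketch below uses purely illustrative hit rates and latencies (not measured values from these chips) to show why an extra level between L3 and DRAM pulls the average down for a working set that is too big for L3 but fits mostly inside 128 MB.

```python
def amat(levels):
    """Average memory access time for a list of (hit_rate, latency_ns) levels.

    Every access that reaches a level pays its latency; hit_rate is the
    fraction of those accesses satisfied there, the rest fall through.
    """
    total, reach = 0.0, 1.0
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns       # everyone reaching this level pays its latency
        reach *= (1.0 - hit_rate)         # only the misses continue to the next level
    return total

# Illustrative numbers for a working set in the tens of MB: too big for L3,
# small enough to live mostly in the 128 MB eDRAM.
no_edram   = [(0.50, 1.2), (0.25, 3.5), (0.40, 12.0), (1.00, 85.0)]                  # L1, L2, L3, DRAM
with_edram = [(0.50, 1.2), (0.25, 3.5), (0.40, 12.0), (0.95, 40.0), (1.00, 85.0)]    # + eDRAM

print(f"Without eDRAM: {amat(no_edram):.1f} ns")
print(f"With eDRAM:    {amat(with_edram):.1f} ns")
```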

Here are the CPU results at 3.0 GHz:

Dolphin Benchmark: link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

Cinebench R15

Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.

Cinebench R15 - Single Threaded

Cinebench R15 - Multi-Threaded

Point Calculations – 3D Movement Algorithm Test: link

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC wins in the single thread version, whereas the multithread version has to handle the threads and loves more cores. For a brief explanation of the platform agnostic coding behind this benchmark, see my forum post here.
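3DPM itself is not public code, but the flavour of workload it represents - many independent particles each taking random steps in 3D - looks something like the sketch below. This is a generic stand-in to show the kind of floating-point work involved, not the benchmark's actual algorithm.

```python
import math
import random

def move_particles(n_particles=2_000, steps=500, seed=42):
    """Generic 3D random-walk kernel of the kind a 3DPM-style test stresses:
    lots of independent floating-point work per particle, trivially
    parallel across threads in a multi-threaded version."""
    rng = random.Random(seed)
    total_distance = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(steps):
            # Pick a random direction on the unit sphere and take a unit step.
            theta = math.acos(2.0 * rng.random() - 1.0)
            phi = 2.0 * math.pi * rng.random()
            x += math.sin(theta) * math.cos(phi)
            y += math.sin(theta) * math.sin(phi)
            z += math.cos(theta)
        total_distance += math.sqrt(x * x + y * y + z * z)
    return total_distance / n_particles   # mean end-to-end displacement

print(f"Mean displacement: {move_particles():.2f}")
```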

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

Compression – WinRAR 5.0.1: link

Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.

WinRAR 5.01, 2867 files, 1.52 GB

Image Manipulation – FastStone Image Viewer 4.9: link

Similarly to WinRAR, the FastStone test is updated for 2014 to the latest version. FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and thus single threaded performance is often the winner.

FastStone Image Viewer 4.9

Video Conversion – Handbrake v0.9.9: link

Handbrake is a media conversion tool initially designed to help convert DVD ISOs and Video CDs into more common video formats. The principle today is still the same, primarily as an output for H.264 + AAC/MP3 audio within an MKV container. In our test we use the same videos as in the Xilisoft test, and results are given in frames per second.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 2x4K

Rendering – PovRay 3.7: link

The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.

POV-Ray 3.7 Beta RC4

Synthetic – 7-Zip 9.2: link

As an open source compression tool, 7-Zip is a popular choice for making sets of files easier to handle and transfer. The software offers up its own benchmark, from which we report the result.

7-zip Benchmark

Overall: CPU IPC

*When this section was published initially, the timed benchmarks (those that rely on time rather than score) were calculated incorrectly. The text has been updated to reflect the new calculations.

Removing WinRAR as a benchmark that obviously benefits from the eDRAM, we get an interesting look at how each generation has evolved over time. Taking Sandy Bridge (i7-2600K) as the base, we get the following:

As we can see, performance gains are everywhere, although the total benefit is highly dependent on the benchmark in question. Cinebench in single-threaded mode, for example, gives a 16.7% gain from Sandy Bridge to Broadwell, whereas Dolphin, which is also single threaded, sees a 58.1% improvement. Overall, a move from Sandy Bridge to Broadwell gives an average ~21% improvement from an IPC perspective. That is an increase in pure, raw throughput before considering frequency or any difference in core counts.
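For reference, the normalization works roughly as in the sketch below: score-based tests scale directly, while timed tests have to be inverted so that "higher is better" throughout before averaging (the mistake flagged in the note above). The numbers in the example are placeholders to show the mechanics, not the review data.

```python
def relative_gain(baseline, result, higher_is_better=True):
    """Normalized performance of `result` against `baseline` (1.0 = equal)."""
    if higher_is_better:
        return result / baseline          # scores: bigger is faster
    return baseline / result              # timed runs: invert so bigger is faster

# Placeholder numbers purely to show the calculation, not measured results.
benchmarks = [
    ("Render score (points)", 100.0, 110.0, True),
    ("Emulator run (minutes)", 10.0,   8.0, False),
    ("Image batch (seconds)",  50.0,  45.0, False),
]

gains = [relative_gain(b, r, hib) for _, b, r, hib in benchmarks]
for (name, *_), g in zip(benchmarks, gains):
    print(f"{name:24s} {100 * (g - 1):+.1f}%")
print(f"Average gain: {100 * (sum(gains) / len(gains) - 1):+.1f}%")
```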

If we adjust this graph to show generation to generation improvement:

This graph shows something a little bit different. From these numbers:

Sandy Bridge to Ivy Bridge: Average ~5.0% Up
Ivy Bridge to Haswell: Average ~11.2% Up
Haswell to Broadwell: Average ~3.3% Up

Thus in a like-for-like environment, when eDRAM is not explicitly a driver for performance, Broadwell gives a 3.3% gain over Haswell. That’s a take-home message worth considering, and it also illustrates the difference in performance between an architecture update and a node change.

Cycling back to our WinRAR test, things look a little different. Ivy Bridge to Haswell gives only a 3.2% difference, but the eDRAM in Broadwell slaps on another 23.8% performance increase, dropping the benchmark from 76.65 seconds to 63.91 seconds. When eDRAM counts, it counts a lot.



Comparing IPC: Discrete Gaming

One of the big marketing elements of a new platform from Intel is the added benefit to gaming. Given the growth of the gaming industry this decade, it makes sense to target one of the most socially active industries going. There is a potential danger here though – a large portion of current gaming titles run into the law of diminishing returns when it comes to frequency and cores. Windows 10 and the DirectX 12 titles on the horizon hope to change that to a certain extent, but currently, with a good enough processor, the CPU is rarely what holds back pixel-pushing power, especially in single card scenarios.

For this set of tests, we kept things simple – a low end single R7 240 DDR3, an ex-high end GTX 770 Lightning and a top line GTX 980 on our standard CPU game set under normal conditions.

Alien: Isolation

If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.

For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.

Alien Isolation on ASUS R7 240 DDR3 2GB ($70)

Alien Isolation on MSI GTX 770 Lightning 2GB ($245)

Alien Isolation on ASUS GTX 980 Strix 4GB ($560)

Total War: Attila

The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.

For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.

Total War: Attila on ASUS R7 240 DDR3 2GB ($70)

Total War: Attila on MSI GTX 770 Lightning 2GB ($245)

Total War: Attila on ASUS GTX 980 Strix 4GB ($560)

Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark, relying only on the final part which combines a flight scene along with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames under 60 FPS (16.6ms).
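The "percentage of frames under 60 FPS" figure comes straight from the frame-time log: any frame that took longer than 16.6 ms missed the 60 FPS target. A minimal sketch of that bookkeeping, with made-up frame times, is below.

```python
def frame_stats(frame_times_ms, target_ms=16.6):
    """Average FPS plus the share of frames slower than the 60 FPS target."""
    total_s = sum(frame_times_ms) / 1000.0
    avg_fps = len(frame_times_ms) / total_s
    slow = sum(1 for t in frame_times_ms if t > target_ms)
    return avg_fps, 100.0 * slow / len(frame_times_ms)

# Made-up frame-time log (ms) purely to show the calculation.
log = [12.1, 14.0, 15.2, 33.4, 16.9, 13.5, 15.8, 41.0, 14.2, 15.0]
avg, pct_under_60 = frame_stats(log)
print(f"Average: {avg:.1f} FPS, frames under 60 FPS: {pct_under_60:.0f}%")
```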

Grand Theft Auto V on ASUS R7 240 DDR3 2GB ($70)

Grand Theft Auto V on MSI GTX 770 Lightning 2GB ($245)

Grand Theft Auto V on ASUS GTX 980 Strix 4GB ($560)

GRID: Autosport

No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.

GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.

GRID: Autosport on ASUS R7 240 DDR3 2GB ($70)

GRID: Autosport on MSI GTX 770 Lightning 2GB ($245)

GRID: Autosport on ASUS GTX 980 Strix 4GB ($560)

Middle-Earth: Shadows of Mordor

The final title in our testing is another battle of system performance with the open world action-adventure title, Shadows of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.

For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also redone at 3840x2160, also with Ultra settings, and we also test two cards at 4K where possible.

Shadows of Mordor on ASUS R7 240 DDR3 2GB ($70)

Shadows of Mordor on MSI GTX 770 Lightning 2GB ($245)

Shadows of Mordor on MSI GTX 770 Lightning 2GB ($245)

Shadows of Mordor on ASUS GTX 980 Strix 4GB ($560)

Shadows of Mordor on 2x ASUS GTX 980 Strix 4GB ($560)

Shadows of Mordor on 2x ASUS GTX 980 Strix 4GB ($560)

Conclusions on Gaming

From the graphs, it should be clear what the results say. On the whole, we saw 0-5% improvement in the gaming frame rates for these titles compared to Sandy Bridge, although one could consider that the better the graphics card, the bigger the marginal gain:

- With an R7 240, GTA had a 3.6% gain going from i7-2600K to the i7-5775C.
- The GTX 770 gave some odd results with Mordor, but Attila had a 5.5% gain overall.
- On the GTX 980, Grand Theft Auto had a 13.2% gain from the i7-2600K, and Attila saw a 5.7% gain from Haswell to Broadwell.

At the end of the day, the gains in gaming will come down to specific titles. That being said, Grand Theft Auto has recently been a good example of how to write a game (if you have the budget), and if the DX12 approach of cutting draw call overhead lets the CPU contribute more, we might see a bigger generational gain with DX12 titles in the future.



Generational Tests: Office and Web Benchmarks

For this review, as mentioned on the front page, we retested some of the older CPUs under our new methodology. We did this testing at stock frequencies as well as the fixed-frequency IPC testing, to see the ultimate real-world result when HyperThreading and frequency are added back into the mix. If you recall our Devil’s Canyon i7-4790K review, the 4.4 GHz turbo frequency of the i7-4790K was a tough one for the newer architecture to beat, purely because any IPC gains are nullified by the older processor having a lot more frequency. With the Broadwell based i7-5775C sitting at 3.7 GHz and only 65W, this is a tough task. But what about if you are still running the Sandy Bridge based i7-2600K?

Some users will notice that in our benchmark database, Bench, we keep data on the CPUs we’ve tested going back over a decade, along with the benchmarks we were running back then. For a few of these benchmarks, such as Cinebench R10, we do actually run them on the new CPUs as well, although for the sake of brevity and relevance we tend not to put this data in the review. Well, here are a few of those numbers too.

Cinebench R10 - Single Threaded Benchmark

Cinebench R10 - Multi-Threaded Benchmark

x264 HD Benchmark - 1st pass - v3.03

x264 HD Benchmark - 2nd pass - v3.03

With some of these benchmarks, due to applications using new instruction sets, having the newer processors with the new instructions can make a lot of difference. Even in Cinebench R10, moving from the Core 2 Quad Q9550 to a Broadwell can get a 2.5x speed-up in this old software.

For the rest of our CPU benchmarks, here is what the landscape looks like with the most recent architectures. All of our benchmark results can also be found in our benchmark engine, Bench.

Office Performance

The dynamics of CPU Turbo modes, both Intel and AMD, can cause concern in environments with a variable threaded workload. There is also the added issue of the motherboard remaining consistent, depending on how the motherboard manufacturer wants to add in their own boosting technologies over the ones that Intel would prefer they used. In order to remain consistent, we implement a unique OS-level high performance mode on all the CPUs we test, which should override any motherboard manufacturer performance mode.

Dolphin Benchmark: link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

WinRAR 5.0.1: link

Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.

WinRAR 5.01, 2867 files, 1.52 GB

3D Particle Movement

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC wins in the single thread version, whereas the multithread version has to handle the threads and loves more cores.

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

FastStone Image Viewer 4.9

FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and results are given in seconds.

FastStone Image Viewer 4.9

Web Benchmarks

On lower end processors, general usability is a big factor in the experience, especially as we move into the HTML5 era of web browsing. For our web benchmarks, we take four well known tests with Chrome 35 as a consistent browser.

Sunspider 1.0.2

Kraken 1.1

WebXPRT

Google Octane v2



Professional Performance: Windows

Agisoft Photoscan – 2D to 3D Image Manipulation: link

Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and the different phases of the model reconstruction require either fast memory, fast IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert them into a medium quality model. This benchmark typically takes around 15-20 minutes on a high end PC on the CPU alone, with GPUs reducing the time.

Agisoft PhotoScan Benchmark - Total Time

Cinebench R15

Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.

Cinebench R15 - Single Threaded

Cinebench R15 - Multi-Threaded

HandBrake v0.9.9: link

For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container.  Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 2x4K

Hybrid x265

Hybrid is a new benchmark, where we take a 4K 1500 frame video and convert it into an x265 format without audio. Results are given in frames per second.

Hybrid x265, 4K Video



Linux Performance

Built around several freely available benchmarks for Linux, Linux-Bench is a project spearheaded by Patrick at ServeTheHome to streamline about a dozen of these tests in a single neat package run via a set of three commands using an Ubuntu 11.04 LiveCD. These tests include fluid dynamics used by NASA, ray-tracing, OpenSSL, molecular modeling, and a scalable data structure server for web deployments. We run Linux-Bench and have chosen to report a select few of the tests that rely on CPU and DRAM speed.

C-Ray: link

C-Ray is a simple ray-tracing program that focuses almost exclusively on processor performance rather than DRAM access. The test in Linux-Bench renders a heavy complex scene offering a large scalable scenario.

Linux-Bench c-ray 1.1 (Hard)

NAMD, Scalable Molecular Dynamics: link

Developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes for extreme parallelization up to and beyond 200,000 cores. The reference paper detailing NAMD has over 4000 citations, and our testing runs a small simulation where the calculation steps per unit time is the output vector.

Linux-Bench NAMD Molecular Dynamics

NPB, Fluid Dynamics: link

Aside from LINPACK, there are many other ways to benchmark supercomputers in terms of how effective they are for various types of mathematical processes. The NAS Parallel Benchmarks (NPB) are a set of small programs originally designed for NASA to test their supercomputers in terms of fluid dynamics simulations, useful for airflow reactions and design.

Linux-Bench NPB Fluid Dynamics

Redis: link

Many online applications rely on key-value caches and data structure servers to operate. Redis is an open-source, scalable web technology with a strong developer base, but it also relies heavily on memory bandwidth as well as CPU performance.

Linux-Bench Redis Memory-Key Store, 1x

Linux-Bench Redis Memory-Key Store, 10x

Linux-Bench Redis Memory-Key Store, 100x



Generational Tests: Gaming Benchmarks on Low End

For our low end tests, we are using the integrated graphics on each CPU as well as the R7 240 as a discrete card. For a couple of the tests, the i7-2600K integrated graphics failed to run as it was not supported.

Alien: Isolation

If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.

For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.

Alien Isolation on Integrated Graphics

Alien Isolation on ASUS R7 240 DDR3 2GB ($70)

Total War: Attila

The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.

For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.

Total War: Attila on Integrated Graphics

Total War: Attila on ASUS R7 240 DDR3 2GB ($70)

Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark, relying only on the final part which combines a flight scene along with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames under 60 FPS (16.6ms).

Grand Theft Auto V on Integrated Graphics

Grand Theft Auto V on ASUS R7 240 DDR3 2GB ($70)

GRID: Autosport

No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.

GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.

GRID: Autosport on Integrated Graphics

GRID: Autosport on ASUS R7 240 DDR3 2GB ($70)

Middle-Earth: Shadows of Mordor

The final title in our testing is another battle of system performance with the open world action-adventure title, Shadows of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.

For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also redone at 3840x2160, also with Ultra settings, and we also test two cards at 4K where possible.

Shadows of Mordor on Integrated Graphics

Shadows of Mordor on ASUS R7 240 DDR3 2GB ($70)



Gaming Benchmarks: Mid-Range

Our mid-range GPUs are an ex-high end NVIDIA GTX 770, specifically an MSI Lightning edition which can now be purchased second hand for a mid-range amount, and an AMD R9 285 featuring a Tonga GPU and the GCN 1.2 architecture.

Alien: Isolation

If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.

For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.

Alien Isolation on MSI R9 285 Gaming 2GB ($240)

Alien Isolation on MSI GTX 770 Lightning 2GB ($245)

Total War: Attila

The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.

For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.

Total War: Attila on MSI R9 285 Gaming 2GB ($240)

Total War: Attila on MSI GTX 770 Lightning 2GB ($245)

Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark, relying only on the final part which combines a flight scene along with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames under 60 FPS (16.6ms).

Grand Theft Auto V on MSI R9 285 Gaming 2GB ($240)

Grand Theft Auto V on MSI GTX 770 Lightning 2GB ($245)

GRID: Autosport

No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.

GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.

GRID: Autosport on MSI R9 285 Gaming 2GB ($240)

GRID: Autosport on MSI GTX 770 Lightning 2GB ($245)

Middle-Earth: Shadows of Mordor

The final title in our testing is another battle of system performance with the open world action-adventure title, Shadows of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.

For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also redone at 3840x2160, also with Ultra settings, and we also test two cards at 4K where possible.

Shadows of Mordor on MSI R9 285 Gaming 2GB ($240)

Shadows of Mordor on MSI R9 285 Gaming 2GB ($240)

Shadows of Mordor on MSI GTX 770 Lightning 2GB ($245)

Shadows of Mordor on MSI GTX 770 Lightning 2GB ($245)



Gaming Benchmarks: High End

At the top of the line we take the best GPUs on the market from May 2015 - an AMD R9 290X and an NVIDIA GTX 980.

Alien: Isolation

If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.

For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.

Alien Isolation on MSI R9 290X Gaming LE 4GB ($380)

Alien Isolation on ASUS GTX 980 Strix 4GB ($560)

Total War: Attila

The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.

For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.

Total War: Attila on MSI R9 290X Gaming LE 4GB ($380)

Total War: Attila on ASUS GTX 980 Strix 4GB ($560)

Grand Theft Auto V

The latest, highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark, relying only on the final part, which combines a flight scene with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames rendered below 60 FPS (i.e. with frame times over 16.6 ms).
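As a rough illustration of how those two numbers are derived, the sketch below computes both metrics from a list of per-frame render times. The frame times shown are hypothetical placeholders rather than data from our runs, and the helper name summarize_frametimes is our own for illustration.

```python
# Minimal sketch: derive an average frame rate and the percentage of frames
# under 60 FPS from per-frame render times in milliseconds. The numbers here
# are hypothetical placeholders, not measured benchmark data.

def summarize_frametimes(frame_times_ms):
    frames = len(frame_times_ms)
    total_seconds = sum(frame_times_ms) / 1000.0

    # Average FPS over the whole run: frames rendered divided by total time.
    avg_fps = frames / total_seconds

    # A frame below 60 FPS is one that took longer than 1000/60 ~= 16.6 ms.
    slow_frames = sum(1 for t in frame_times_ms if t > 1000.0 / 60.0)
    pct_under_60 = 100.0 * slow_frames / frames

    return avg_fps, pct_under_60

# Hypothetical snippet of a frame time log (ms).
frame_times_ms = [14.2, 15.8, 16.9, 13.5, 22.1, 15.0, 16.2, 18.4, 14.9, 15.5]
avg_fps, pct_under_60 = summarize_frametimes(frame_times_ms)
print(f"{avg_fps:.1f} FPS average, {pct_under_60:.1f}% of frames under 60 FPS")
```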

Grand Theft Auto V on MSI R9 290X Gaming LE 4GB ($380)

Grand Theft Auto V on ASUS GTX 980 Strix 4GB ($560)

GRID: Autosport

No graphics test suite is complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID racing series. As with our previous racing tests, each update to the engine aims to add effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.

GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.

GRID: Autosport on MSI R9 290X Gaming LE 4GB ($380)

GRID: Autosport on ASUS GTX 980 Strix 4GB ($560)

Middle-Earth: Shadows of Mordor

The final title in our testing is another test of overall system performance with the open world action-adventure title, Shadows of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM aims for a high level of detail and complexity, despite being cut down from the original plans. The main story was written by the same writer as Red Dead Redemption, and the game received Zero Punctuation’s Game of the Year in 2014.

For testing purposes, SoM offers a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we run several tests using the in-game benchmark. For low end graphics we test at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also repeated at 3840x2160 with Ultra settings, and where possible we test two cards at 4K as well.

Shadows of Mordor on MSI R9 290X Gaming LE 4GB ($380)

Shadows of Mordor on MSI R9 290X Gaming LE 4GB ($380)

Shadows of Mordor on 2x MSI R9 290X Gaming LE 4GB ($380)

Shadows of Mordor on ASUS GTX 980 Strix 4GB ($560)

Shadows of Mordor on ASUS GTX 980 Strix 4GB ($560)

Shadows of Mordor on 2x ASUS GTX 980 Strix 4GB ($560)



Conclusions: Broadwell Overclocking, IPC and Generational Gain

For everyone who has been in the PC industry for a decade or more, several key moments stand out when it comes to a better processor in the market. The Core architecture made leaps and bounds over the previous Pentium 4 Prescott debacle, primarily due to a refocus on efficiency over raw frequency. The Sandy Bridge architecture also came with a significant boost, moving the Northbridge on die and simplifying design.

Since then, despite the perseverance of (or soon to be mildly delayed) Moore’s Law, performance is measured differently. Efficiency, core count, integrated graphics, heterogeneous system architecture and specific instruction sets are now used due to the ever expanding and changing paradigm of user experience. Something that is fast for both compute and graphics, and also uses near-zero power, is the holy grail of design. But let’s snap back to reality: software is still written one line of code at a time. The rate at which those lines are processed, particularly in response-driven scenarios, is paramount. This is why the ‘instructions per clock/cycle’ metric, IPC, is still an important aspect of modern computing.

The move from Haswell to Broadwell is a reduction in lithography node, from 22nm to 14nm, with only minor silicon changes, so Broadwell was a mobile-first design that launched in late 2014 with notebook parts. This is typical for node reductions, due to the focus on efficiency rather than outright performance. For the desktop parts, launched over six months later, we end up with an integrated-graphics-focused implementation purposely designed for all-in-one PCs and integrated systems rather than a mainstream, high end processor. The i7 and i5 are both targeted at 65W, rather than the 84W/88W of the previous architecture. This gives the CPUs a much lower frequency, and without a corresponding IPC change it makes the upgrade path most relevant for low end Haswell owners, those who are still several generations behind and want an upgrade, or those who specifically want an integrated graphics solution.

In our first look at Broadwell on the desktop, our recommendation that it would only appeal to those who need the best integrated graphics solution regardless of cost still stands. Part 2 has revealed that, clock-for-clock, Broadwell gives 3.3% better performance in our tests, although DRAM-focused workloads (WinRAR) can benefit by up to 25%; those cases are few and far between. If we compare it back several generations, that small IPC gain is wiped out by processors like the i7-4790K, which overpowers Broadwell in pure frequency, or even the i7-4770K, which still holds a frequency advantage. From an overall out-of-the-box CPU performance standpoint, the i7-5775C sits toe-to-toe with the i7-4770K with an average 1% loss. However, moving the comparison up to the i7-4790K, and due to that frequency difference, the Broadwell CPU sits an average 12% behind it, except in those specific tests that can use the eDRAM.
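For readers who want to see what a clock-for-clock comparison involves in practice, the sketch below normalizes benchmark scores by the frequency they were obtained at before comparing them. The scores and frequencies are arbitrary placeholders chosen only to illustrate the arithmetic; they are not our measured results.

```python
# Rough sketch of a clock-for-clock (IPC) comparison: divide each benchmark
# score by the frequency it was run at, then compare the per-clock results.
# All numbers below are arbitrary placeholders, not measured data.

def ipc_gain_percent(score_new, freq_new_ghz, score_old, freq_old_ghz):
    """Percentage gain in per-clock performance of 'new' over 'old'."""
    per_clock_new = score_new / freq_new_ghz
    per_clock_old = score_old / freq_old_ghz
    return 100.0 * (per_clock_new / per_clock_old - 1.0)

# Both CPUs locked to the same 3.0 GHz, so any score difference is
# down to architecture rather than frequency.
print(f"{ipc_gain_percent(1033.0, 3.0, 1000.0, 3.0):.1f}% per-clock gain")

# With different stock frequencies, the normalization strips out the
# frequency advantage before comparing architectures.
print(f"{ipc_gain_percent(1200.0, 4.0, 1000.0, 3.5):.1f}% per-clock gain")
```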

There’s not much to be gained with overclocking either. Our i7-5775C CPU reached 4.2 GHz, in line with Intel’s expectations for these processors. If we compare that to an overclocked 4.6 GHz i7-4790K, the 4790K is still the winner. Overclocking these Broadwell CPUs also requires care, due to the arrangement of the CPU die and the added eDRAM under the heatspreader. As a result, we suggest the line method of thermal paste application rather than the large-pea method.

Looking back on the generational improvements since Sandy Bridge is actually rather interesting. I remember using the i7-2600K, overclocking it to 5.0 GHz and being stunned at the time. Step forward 4.5 years and we have a direct 21% increase in raw performance per clock, along with the added benefits of faster memory and a chipset that offers a lot more functionality. If you’ve been following the technology industry lately, there is plenty of talk surrounding the upcoming launch of Skylake, an architectural update to Intel’s processor line on 14nm. I can’t wait to see how it performs in relation to the four generations tested in this article.

*When this article was initially published, there were inaccuracies in the calculation of the IPC gains in the timed benchmarks. The article has been updated to reflect the corrected figures; in light of the recalculation, the overall conclusions remain unchanged.

Interesting related links:

The Intel Broadwell Desktop Review: Core i7-5775C and Core i5-5675C Tested (Part 1)
AnandTech Bench CPU Comparison Tool
