Original Link: https://www.anandtech.com/show/14043/upgrading-from-an-intel-core-i7-2600k-testing-sandy-bridge-in-2019



One of the most popular processors of the last decade has been the Intel Core i7-2600K. The design was revolutionary: it offered a significant jump in single-core performance and efficiency, and the top-line processor was very overclockable. With the next few generations of processors from Intel being less exciting, or not giving users a reason to upgrade, the phrase 'I'll stay with my 2600K' became ubiquitous on forums, and is still in use today. For this review, we dusted off our box of old CPUs and put the 2600K through our 2019 benchmarks, both at stock and overclocked, to see if it is still a mainstream champion.


The Core i7 Family Photo

If you want to see all of our Core i7 benchmarks for each one of these CPUs, head over to anandtech.com/Bench

 

Why The 2600K Defined a Generation

Sit in a chair, lie back, and dream of 2010. It's a year when you looked at that old Core 2 Duo rig, or Athlon II system, and decided it was time for an upgrade. You had seen Nehalem, and that the Core i7-920 was a handy overclocker that kicked some butt. It was a pleasant time, until Intel went and gave the industry a truly disruptive product whose nostalgia still rings with us today.

 
The Core i7-2600K: The Fastest Sandy Bridge CPU (until 2700K)

That product was Sandy Bridge. AnandTech scored the exclusive on the review, and the results were almost impossible to believe, for many reasons. In our results at the time, it was far and away a leap ahead of anything else we had seen, especially given the thermal monstrosities that Pentium 4 had produced several years prior. Built on Intel’s 32nm process, the redesign of the core was a turning point in x86 performance, one which has not been felt since. It would be another eight years before AMD had its ‘Sandy Bridge’ (or perhaps more appropriately, its 'Conroe') moment with Ryzen. Intel managed to stand on the shoulders of its previous best product and score a Grand Slam.

In that core design, Intel shook things up considerably. One key component was the micro-op cache, which stores recently decoded instructions so that, when they are needed again, they can be reused without wasting power on another decode. For Intel with Sandy Bridge, and more recently for AMD with Ryzen, the inclusion of the micro-op cache has done wonders for single threaded performance. Intel also set about improving its simultaneous multi-threading, which it has branded HyperThreading for generations, slowly improving the core by making more of its resources dynamically allocated between threads, rather than statically partitioned and potentially losing performance.

The quad-core design of the top processor in the family on launch day, the Core i7-2600K, became a staple through Intel’s next five generations of the architecture, all the way through Ivy Bridge, Haswell, Broadwell, Skylake, and Kaby Lake. Since Sandy Bridge, while Intel has moved to smaller process nodes and taken advantage of lower power, it has been unable to recreate that singular jump in raw instruction throughput, instead delivering incremental 1-7% increases year on year and using the power budget to enlarge operational buffers, add execution ports, and extend instruction support.

With Intel unable to recreate the uplift of Sandy Bridge, and with the core microarchitecture defining a key moment in x86 performance, users who purchased a Core i7-2600K (I had two) stayed on it for a long time. So much so in fact that a lot of people expecting another big jump became increasingly frustrated – why invest in a Kaby Lake Core i7-7700K quad-core processor at 4.7 GHz turbo when the Sandy Bridge Core i7-2600K quad core processor is still overclocked to 5.0 GHz?

(Intel’s answer was typically for power consumption, and new features like PCIe 3.0 GPUs and storage. But that didn’t sway some users.)

This is why the Core i7-2600K defined a generation. It had staying power, much to Intel’s initial delight and subsequent frustration when users wouldn’t upgrade. We are now in 2019, and when Intel finally moved beyond four cores on the mainstream platform, users who could stomach the cost of DDR4 either upgraded to a new Intel system or went down the AMD route. But how does the Core i7-2600K hold up to 2019 workloads and games; or perhaps even better, how does the overclocked Core i7-2600K fare?

Compare and Contrast: Sandy Bridge vs. Kaby Lake vs. Coffee Lake

Truth be told, the Core i7-2600K was not the highest grade Sandy Bridge mainstream desktop processor. Months after the 2600K launched, Intel pushed a slightly higher clocked 2700K into the market. It performed almost the same, and overclocked to a similar amount, but cost a bit more. By this time, users who had made the jump were on the 2600K, and it stuck with us.

The Core i7-2600K was a 32nm quad-core processor with HyperThreading, offering a 3.4 GHz base frequency and a 3.8 GHz turbo frequency, with a listed 95W TDP. Back then, Intel’s TDP was more representative: in our recent test for this article, we measured an 88W peak power consumption when not overclocked. The processor also came with Intel HD 3000 integrated graphics, and supported DDR3-1333 memory as standard. Intel launched the chip with a tray price of $317.

For this article, I used the second i7-2600K I purchased back when they were new. It was tested at both its out-of-the-box frequency and an overclocked frequency of 4.7 GHz on all cores. This is a middling, conservative overclock – the best chips managed 5.0 GHz or 5.1 GHz in a daily system. In fact, I distinctly remember my first Core i7-2600K getting 5.1 GHz all-core, and up to 5.3 GHz, during an overclocking event in the middle of the Peak District one winter with a room temperature around 2C, where I was using a strong liquid cooler and 720mm of radiators. Unfortunately I crippled that chip over time, and now it won’t even boot at stock frequency and voltage. So we have to use my second chip, which wasn’t so great, but it is still a good representation of an overclocked processor. For these results, we also used overclocked memory, at DDR3-2400 C11.

It’s worth noting that since the launch of the Core i7-2600K, we have moved on from Windows 7 to Windows 10. The Core i7-2600K doesn’t even support AVX2 instructions, and wasn’t built for Windows 10, so it will be interesting to see how this plays out.

 
The Core i7-7700K: Intel's last Core i7 Quad Core with HyperThreading

The fastest and latest (final?) quad-core processor with HyperThreading that Intel released was the Core i7-7700K, which falls under the Kaby Lake family. This processor was built on Intel’s improved 14nm process, runs at a 4.2 GHz base frequency, and a 4.5 GHz turbo frequency. The 91W rated TDP, at stock, translated to 95W power consumption in our testing. It comes with Intel’s Gen9 HD 630 Graphics, and supports DDR4-2400 memory as standard. Intel launched the chip with a tray price of $339.

The Intel Core i7-7700K (91W) Review: The New Out-of-the-box Performance Champion

At the same time as the 7700K, Intel also launched its first overclockable dual-core with HyperThreading, the Core i3-7350K. During that review, we overclocked the Core i3 and compared it directly to the out-of-the-box Core i7-2600K, trying to answer the question of whether Intel had managed to make a dual-core reach similar performance to its old flagship processor. While the i3 had the upper hand in single threaded performance and memory performance, having two fewer cores ultimately made most tasks heavy work for the Core i3.

 
The Core i7-9700K: Intel's Latest Top Core i7 (now with 8 cores)

Our final processor for testing is the Core i7-9700K. This is not the flagship of the current Coffee Lake generation (which is the i9-9900K), but it has eight cores without HyperThreading. Going for the 9900K with double the threads would be just a little overkill, especially when it still has a tray price of $488. By contrast, the Core i7-9700K is ‘only’ sold in bulk at $374, with a 3.6 GHz base frequency and a 4.9 GHz turbo frequency. The 95W TDP runs up against the limits of Intel’s definition of TDP: in a consumer motherboard the chip will actually consume ~125W at full load. Memory support is DDR4-2666 as standard.

Upgrading an Overclocked Intel Core i7-2600K
Comparison CPUs
|  | Core i7-2600K | Core i7-2600K @ 4.7 GHz | Core i7-7700K | Core i7-9700K |
|---|---|---|---|---|
| Released | Jan 2011 | Jan 2011 | Jan 2017 | Oct 2018 |
| Price (1ku) | $317 | $317 | $339 | $374 |
| Process | 32nm | 32nm | 14nm | 14nm++ |
| uArch | Sandy Bridge | Sandy Bridge | Kaby Lake | Coffee Lake Refresh |
| Cores | 4, with HT | 4, with HT | 4, with HT | 8, no HT |
| Base Freq | 3.4 GHz | 4.7 GHz | 4.2 GHz | 3.6 GHz |
| Turbo Freq | 3.8 GHz | - | 4.5 GHz | 4.9 GHz |
| GPU Gen | Gen 6 | Gen 6 | Gen 9 | Gen 9.5 |
| GPU EUs | 12 | 12 | 24 | 24 |
| GPU Freq | 1350 MHz | 1350 MHz | 1150 MHz | 1200 MHz |
| DDR Support | DDR3-1333 | DDR3-2400 | DDR4-2400 | DDR4-2666 |
| PCIe | 2.0 x16 | 2.0 x16 | 3.0 x16 | 3.0 x16 |
| AVX | Yes | Yes | Yes | Yes |
| AVX2 | No | No | Yes | Yes |
| Thermal Interface | Solder | Solder | Grease | Solder |
| TDP | 95 W | N/A | 91 W | 95 W |

The Core i7-2600K is stuck on DDR3 memory, has PCIe 2.0 rather than PCIe 3.0 support, and although not tested here, isn’t built for NVMe storage. It will be interesting to see just how close the overclocked results are to the Core i7-7700K in our tests, and how much of a direct uplift is seen moving to something like the Core i7-9700K.

Pages In This Review

  1. Tackling the Core i7-2600K in 2019
  2. Sandy Bridge: Inside the Core Microarchitecture
  3. Sandy Bridge: Outside the Core
  4. Test Bed and Setup
  5. 2018 and 2019 Benchmark Suite: Spectre and Meltdown Hardened
  6. CPU Performance: System Tests
  7. CPU Performance: Rendering Tests
  8. CPU Performance: Office Tests
  9. CPU Performance: Encoding Tests
  10. CPU Performance: Web and Legacy Tests
  11. Gaming: World of Tanks enCore
  12. Gaming: Final Fantasy XV
  13. Gaming: Civilization 6
  14. Gaming: Ashes Classic
  15. Gaming: Strange Brigade
  16. Gaming: Grand Theft Auto V
  17. Gaming: Far Cry 5
  18. Gaming: Shadow of the Tomb Raider
  19. Gaming: F1 2018
  20. Power Consumption
  21. Analyzing the Results
  22. Conclusions and Final Words


Sandy Bridge: Inside the Core Microarchitecture

In the modern era, we are talking about chips roughly 100-200mm2 in size with up to eight high performance cores, built on the latest variants of Intel’s 14nm process or AMD’s use of GlobalFoundries (with TSMC upcoming). Back with Sandy Bridge, 32nm was a different beast. The manufacturing process was still planar without FinFETs, implementing Intel’s second generation High-K Metal Gate, and achieving 0.7x scaling compared to the previous, larger 45nm node. The Core i7-2600K was the largest quad-core die, coming in at 216 mm2 and 1.16 billion transistors, which compares to the latest Coffee Lake processors on 14nm offering eight cores at ~170 mm2 and over 2 billion transistors.

The big leap of the era was in the microarchitecture. Sandy Bridge promised (and delivered) a significant uplift in raw clock-for-clock performance over the previous generation Westmere processors, and forms the base schema for Intel’s latest chips almost a decade later. A number of key innovations were first made available at retail through Sandy Bridge, which have been built upon and iterated over many times to get to the high performance we have today.

Through this page, I have largely used Anand’s initial report on the microarchitecture from 2010 as a base, with additions based on a modern look at this processor design.

A Quick Recap: A Basic Out-of-Order CPU Core

For those new to CPU design, here’s a quick run through of how an out-of-order CPU works. Broadly speaking, a core is divided into the front end and back end, and data first comes into the front end.

In the front end, we have the prefetchers and branch predictors that will predict and pull in instructions from main memory. The idea here is that if you can predict what data and instructions are needed before they are needed, then you can save time by having that data close to the core when needed. The instructions are then placed into a decoder, which transforms each byte-code instruction into a number of ‘micro-operations’ that the core can then use. There are different types of decoders for simple and complex instructions – simple x86 instructions map easily to one micro-op, whereas more complex instructions can decode to several: an x86 add that reads one operand from memory, for example, splits into a load micro-op and an add micro-op. The ideal situation is a decode ratio as low as possible, although sometimes instructions can be split into more micro-ops if those can be run in parallel together (instruction level parallelism, or ILP).

If the core has a ‘micro-operation cache’, or uOp cache, then the result from each decoded instruction ends up there. The core can detect, before an instruction is decoded, whether that particular instruction has been decoded recently, and use the result from the previous decode rather than performing a full decode and wasting power.
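To make that mechanism concrete, here is a deliberately simplified sketch in Python of the check-before-decode idea. The UopCache class, its LRU policy, and the decode_fn callback are our illustrative stand-ins, not Intel's actual structures; only the principle (a hit skips the expensive decode) matches the description above.

```python
# A deliberately simplified uop-cache sketch: reuse decoded micro-ops when an
# instruction address was decoded recently, otherwise pay for a full decode.
from collections import OrderedDict

class UopCache:
    def __init__(self, capacity=1536):          # SNB stores ~1.5K micro-ops
        self.capacity = capacity
        self.entries = OrderedDict()            # address -> decoded micro-ops

    def fetch(self, address, decode_fn):
        if address in self.entries:             # hit: skip the decoders entirely
            self.entries.move_to_end(address)
            return self.entries[address]
        uops = decode_fn(address)               # miss: full, power-hungry decode
        self.entries[address] = uops
        if len(self.entries) > self.capacity:   # evict least recently used
            self.entries.popitem(last=False)
        return uops
```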

The uOps then go into an allocation queue; for modern cores this usually means the core can detect whether the instructions are part of a simple loop, or whether it can fuse uOps together to make the whole thing go quicker. The uOps are then fed into the re-order buffer, which forms the ‘back end’ of the core.

In the back end, starting with the re-order buffer, uOps can be rearranged depending on where the data each micro-op needs is. This buffer can rename and allocate uOps depending on where they need to go (integer vs FP), and depending on the core, it can also act as a retire station for complete instructions. After the re-order buffer, uOps are fed into the scheduler in a desired order to ensure data is ready and the uOp throughput is as high as possible.

The scheduler passes the uOps into the execution ports (the units that do the compute) as required. Some cores have a unified scheduler between all the ports, however some split the scheduler depending on integer operations or vector style operations. Most out-of-order cores can have anywhere from 4 to 10 ports (or more), and these execution ports will do the math required on the data given the instruction passed through the core. Execution ports can take the form of a load unit (load from cache), a store unit (store into cache), an integer math unit, a floating point math unit, vector math units, special division units, and a few others for special operations. After the execution port is complete, the data can then be held for reuse in a cache or pushed to main memory, while the instruction feeds into the retire queue and is finally retired.

This brief overview doesn’t touch on some of the mechanisms that modern cores use to help caching and data look-up, such as transaction buffers, stream buffers, tagging, etc., some of which get iterative improvements every generation. But usually, when we talk about ‘instructions per clock’ as a measure of performance, we aim to get as many instructions through the core (through the front end and back end) as possible – this relies on the decode strength of the front end, the prefetchers, the re-order buffers, and maximizing the execution port use, along with retiring as many completed instructions as possible every clock cycle.
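As a reminder of the arithmetic behind that metric: IPC is the number of instructions retired divided by the number of core clock cycles taken, and sustained performance is roughly IPC × frequency. This is why both the microarchitecture (IPC) and the clock speed matter in the comparisons later in this review.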

With this in mind, hopefully it will give context to some of Anand’s analysis back when Sandy Bridge was launched.

Sandy Bridge: The Front End

Sandy Bridge’s CPU architecture is evolutionary from a high level viewpoint but far more revolutionary in terms of the number of transistors that have been changed since Nehalem/Westmere. The biggest change for Sandy Bridge (and all microarchitectures since) is the micro-op cache (uOp cache).

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/uopcache.jpg

In Sandy Bridge, there’s now a micro-op cache that caches instructions as they’re decoded. There’s no sophisticated algorithm here, the cache simply grabs instructions as they’re decoded. When SB’s fetch hardware grabs a new instruction, it first checks to see if the instruction is in the micro-op cache; if it is, then the cache services the rest of the pipeline and the front end is powered down. The decode hardware is a very complex part of the x86 pipeline, and turning it off saves a significant amount of power.

The cache is direct mapped and can store approximately 1.5K micro-ops, which is effectively the equivalent of a 6KB instruction cache. The micro-op cache is fully included in the L1 instruction cache, and enjoys approximately an 80% hit rate for most applications. You get slightly higher and more consistent bandwidth from the micro-op cache vs. the instruction cache. The actual L1 instruction and data caches haven’t changed, they’re still 32KB each (for a total of 64KB of L1).
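As a rough sanity check on that equivalence (our arithmetic, assuming an average x86 instruction length of about four bytes): 1536 micro-ops × ~4 bytes per instruction ≈ 6KB of instruction cache.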

All instructions that are fed out of the decoder can be cached by this engine and as I mentioned before, it’s a blind cache - all instructions are cached. Least recently used data is evicted as it runs out of space. This may sound a lot like Pentium 4’s trace cache but with one major difference: it doesn’t cache traces. It really looks like an instruction cache that stores micro-ops instead of macro-ops (x86 instructions).

Along with the new micro-op cache, Intel also introduced a completely redesigned branch prediction unit. The new BPU is roughly the same footprint as its predecessor, but is much more accurate. The increase in accuracy is the result of three major innovations.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/BPU.jpg

The standard branch predictor is a 2-bit predictor. Each branch is marked in a table as taken/not taken with an associated confidence (strong/weak). Intel found that nearly all of the branches predicted by this bimodal predictor have a strong confidence. In Sandy Bridge, the bimodal branch predictor uses a single confidence bit for multiple branches rather than using one confidence bit per branch. As a result, you have the same number of bits in your branch history table representing many more branches, which can lead to more accurate predictions in the future.
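For readers new to the concept, a two-bit (bimodal) predictor is simple enough to sketch in a few lines of Python. This is a teaching toy, not Sandy Bridge's design; SNB's tweak described above is that the confidence bit is shared across multiple branches rather than stored per table entry.

```python
# Toy two-bit saturating-counter branch predictor. Counter states 0-1 predict
# not-taken, 2-3 predict taken; the low/high value of each pair is the
# weak/strong confidence.
class TwoBitPredictor:
    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        self.table = [1] * (1 << table_bits)      # start weakly not-taken

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2    # True means predict taken

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```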

Branch targets also got an efficiency makeover. In previous architectures there was a single size for branch targets, however it turns out that most targets are relatively close. Rather than storing all branch targets in large structures capable of addressing far away targets, SNB now includes support for multiple branch target sizes. With smaller target sizes there’s less wasted space and now the CPU can keep track of more targets, improving prediction speed.

Finally we have the conventional method of increasing the accuracy of a branch predictor: using more history bits. Unfortunately this only works well for certain types of branches that require looking at long patterns of instructions, and not well for shorter more common branches (e.g. loops, if/else). Sandy Bridge’s BPU partitions branches into those that need a short vs. long history for accurate prediction.

A Physical Register File

Compared to Westmere, Sandy Bridge moves to a physical register file. In Core 2 and Nehalem, every micro-op had a copy of every operand that it needed. This meant the out-of-order execution hardware (scheduler/reorder buffer/associated queues) had to be much larger as it needed to accommodate the micro-ops as well as their associated data. Back in the Core Duo days that was 80-bits of data. When Intel implemented SSE, the burden grew to 128-bits. With AVX however we now have potentially 256-bit operands associated with each instruction, and the amount that the scheduling/reordering hardware would have to grow to support the AVX execution hardware Intel wanted to enable was too much.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/PRF.jpg

A physical register file stores micro-op operands in the register file; as the micro-op travels down the OoO engine it only carries pointers to its operands and not the data itself. This significantly reduces the power of the out of order execution hardware (moving large amounts of data around a chip eats tons of power), and it also reduces die area further down the pipe. The die savings are translated into a larger out of order window.

The die area savings are key as they enable one of Sandy Bridge’s major innovations: AVX performance.

AVX

The AVX instructions support 256-bit operands, which as you can guess can eat up quite a bit of die area. The move to a physical register file enabled Intel to increase the OoO buffers to properly feed a higher throughput floating point engine. Intel clearly believes in AVX as it extended all of its SIMD units to 256-bit wide. The extension is done at minimal die expense. Nehalem has three execution ports and three stacks of execution units:

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/AVX1.jpg

Sandy Bridge allows 256-bit AVX instructions to borrow 128 bits of the integer SIMD datapath. This minimizes the impact of AVX on the execution die area while enabling twice the FP throughput: you get two 256-bit AVX operations per clock (plus one 256-bit AVX load).

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/AVX2.jpg

Granted you can’t mix 256-bit AVX and 128-bit integer SSE ops, however remember SNB now has larger buffers to help extract more instruction level parallelism (ILP).

Load and Store

The improvements to Sandy Bridge’s FP performance increase the demands on the load/store units. In Nehalem/Westmere you had three LS ports: load, store address and store data.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/memory1.jpg

In SNB, the load and store address ports are now symmetric so each port can service a load or store address. This doubles the load bandwidth compared to Westmere, which is important as Intel doubled the peak floating point performance in Sandy Bridge.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/memory2.jpg

There are some integer execution improvements in Sandy Bridge, although they are more limited. Add with carry (ADC) instruction throughput is doubled, while large scale multiplies (64 * 64) see a ~25% speedup.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/otherarch.jpg



Sandy Bridge: Outside the Core

With the growth of multi-core processors, managing how data flows between the cores and memory has been an important topic of late. We have seen a variety of different ways to move the data around a CPU, such as crossbars, rings, meshes, and in the future, completely separate central IO chips. The battle of the next decade (2020+), as mentioned previously here on AnandTech, is going to be the battle of the interconnect, and how it develops moving forward.

What makes Sandy Bridge special in this instance is that it was the first consumer CPU from Intel to use a ring bus that connects all the cores, the memory, the last level cache, and the integrated graphics. A similar design is still used in the eight-core Coffee Lake parts we see today.

The Ring Bus

With Nehalem/Westmere, all of the cores had their own private path to the last level (L3) cache. That’s roughly 1000 wires per core, and more wires consume more power, as well as being more difficult to implement the more you have. The problem with this approach is that it doesn’t scale well as more agents need access to the L3 cache.

As Sandy Bridge adds a GPU and video transcoding engine on-die that share the L3 cache, rather than laying out more wires to the L3, Intel introduced a ring bus.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/ringbus.jpg

Architecturally, this is the same ring bus used in Nehalem EX and Westmere EX. Each core, each slice of L3 (LLC) cache, the on-die GPU, media engine and the system agent (fancy word for North Bridge) all have a stop on the ring bus. The bus is made up of four independent rings: a data ring, request ring, acknowledge ring and snoop ring. Each stop for each ring can accept 32-bytes of data per clock. As you increase core count and cache size, your cache bandwidth increases accordingly.

Per core you get the same amount of L3 cache bandwidth as in high end Westmere parts - 96GB/s. Aggregate bandwidth is 4x that in a quad-core system since you get a ring stop per core (384GB/s).
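The arithmetic works out if we assume a ~3 GHz clock, since the ring runs at core speed: 32 bytes/clock × 3 GHz = 96 GB/s per stop, and four stops × 96 GB/s = 384 GB/s aggregate.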

This means that L3 latency is significantly reduced from around 36 cycles in Westmere to 26 - 31 cycles in Sandy Bridge, with some variable cache latency as it depends on what core is accessing what slice of cache. Also unlike Westmere, the L3 cache now runs at the core clock speed - the concept of the un-core still exists but Intel calls it the “system agent” instead and it no longer includes the L3 cache. (The term ‘un-core’ is still in use today to describe interconnects.)

With the L3 cache running at the core clock you get the benefit of a much faster cache. The downside is the L3 underclocks itself in tandem with the processor cores as turbo and idle modes come into play. If the GPU needs the L3 while the CPUs are downclocked, the L3 cache won’t be running as fast as it could had it been independent, or the system has to power on the core and consume extra power.

The L3 cache is divided into slices, one associated with each core. As Sandy Bridge has a fully accessible L3 cache, each core can address the entire cache. Each slice gets its own stop, and each slice has a full cache pipeline. In Westmere there was a single cache pipeline and queue that all cores forwarded requests to, but in Sandy Bridge it’s distributed per cache slice. The ring wire routing means that there is no big die area impact as more stops are added. Although each of the consumers/producers on the ring gets its own stop, the ring always takes the shortest path. Bus arbitration is distributed on the ring: each stop knows one clock in advance if there’s an empty slot on the ring.

The System Agent

For some reason Intel stopped using the term un-core with Sandy Bridge, instead calling the block the System Agent. (Again, un-core is now back in vogue for interconnects, IO, and memory controllers.) The System Agent houses the traditional North Bridge. You get 16 PCIe 2.0 lanes that can be split into two x8s. There’s a redesigned dual-channel DDR3 memory controller that finally restores memory latency to around Lynnfield levels (Clarkdale moved the memory controller off the CPU die and onto the GPU).

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/systemagent.jpg

The SA also has the DMI interface, display engine and the PCU (Power Control Unit). The SA clock speed is lower than the rest of the core and it is on its own power plane.

Sandy Bridge Graphics

Another large performance improvement in Sandy Bridge vs. Westmere is in the graphics. While the CPU cores show a 10 - 30% improvement in performance, Sandy Bridge graphics performance is easily double what Intel delivered with the previous generation (Clarkdale/Arrandale). Beyond the jump from 45nm to 32nm, SNB graphics improves through a significant increase in IPC.

The Sandy Bridge GPU is on-die built out of the same 32nm transistors as the CPU cores. The GPU is on its own power island and clock domain. The GPU can be powered down or clocked up independently of the CPU. Graphics turbo is available on both desktop and mobile parts, and you get more graphics turbo on Sandy Bridge.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/sharedL3.jpg

The GPU is treated like an equal citizen in the Sandy Bridge world; it gets equal access to the L3 cache. The graphics driver controls what gets into the L3 cache, and you can even limit how much cache the GPU is able to use. Storing graphics data in the cache is particularly important as it saves trips to main memory, which are costly from both a performance and power standpoint. Redesigning a GPU to make use of a cache isn’t a simple task.

SNB graphics (internally referred to as Gen 6 graphics) makes extensive use of fixed function hardware. The design mentality was that anything that could be described as a fixed function should be implemented in fixed function hardware. The benefit is performance/power/die area efficiency, at the expense of flexibility.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/EUimprovement.jpg

The programmable shader hardware is composed of shaders/cores/execution units that Intel calls EUs. Each EU can dual-issue, picking instructions from multiple threads. The internal ISA maps one-to-one with most DirectX 10 API instructions, resulting in a very CISC-like architecture. Moving to one-to-one API-to-instruction mapping increases IPC by effectively increasing the width of the EUs.

There are other improvements within the EU. Transcendental math is handled by hardware in the EU, and its performance has been sped up considerably. Intel told us that sine and cosine operations are several orders of magnitude faster than they were in its earlier graphics architectures.

In previous Intel graphics architectures, the register file was repartitioned on the fly. If a thread needed fewer registers, the remaining registers could be allocated to another thread. While this was a great approach for saving die area, it proved to be a limiter on performance. In many cases threads couldn’t be worked on, as there were no registers available for use. Intel moved from 64 to 80 registers per thread, and finally to 120 for Sandy Bridge. Scenarios where the register count limited the thread count were alleviated.

At the time, all of these enhancements resulted in 2x the instruction throughput per EU.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/SC2.jpg
Sandy Bridge vs. NVIDIA GeForce 310M Playing Starcraft 2

At launch there were two versions of Sandy Bridge graphics: one with 6 EUs and one with 12 EUs. All mobile parts (at launch) used 12 EUs, while desktop SKUs used either 6 or 12 depending on the model. Sandy Bridge was a step in the right direction for Intel, where integrated graphics were starting to become a requirement in anything consumer related, and Intel would slowly start to push up the percentage of die area dedicated to the GPU. Modern day equivalent desktop processors (2019) have 24 EUs (Gen 9.5), while future 10nm CPUs will have ~64 EUs (Gen11).

Sandy Bridge Media Engine

Sitting alongside the GPU is Sandy Bridge’s Media processor. Media processing in SNB is composed of two major components: video decode, and video encode.

The hardware accelerated decode engine is improved over the previous generation: the entire video pipeline is now decoded via fixed function units. This is in contrast to Intel’s pre-SNB design, which used the EU array for some video decode stages. As a result, Intel claims that SNB processor power is cut in half for HD video playback.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/decoder.jpg

The video encode engine was a brand new addition to Sandy Bridge. Intel took a ~3 minute 1080p 30Mbps source video and transcoded it to a 640 x 360 iPhone video format. The total process took 14 seconds and completed at a rate of roughly 400 frames per second.
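Those numbers are self-consistent if we assume a 30 fps source: a ~3 minute clip is ~180 s × 30 frames/s = 5400 frames, and 5400 frames ÷ 14 s ≈ 386 fps, in line with the quoted ~400 frames per second.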

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/encoder.jpg

The fixed function encode/decode mentality is now pervasive in any graphics hardware for desktops and even smartphones. At the time, Sandy Bridge was using 3mm2 of the die for this basic encode/decode structure.

New, More Aggressive Turbo

Lynnfield was the first Intel CPU to aggressively pursue the idea of dynamically increasing the core clock of active CPU cores while powering down idle cores. The idea is that if you have a 95W TDP for a quad-core CPU, but three of those four cores are idle, then you can increase the clock speed of the one active core until you hit a turbo limit.

In processors prior to Sandy Bridge, the assumption was that the CPU reaches its turbo power limit immediately upon enabling turbo. In reality however, the CPU doesn’t heat up immediately - there’s a period of time where the CPU isn’t dissipating its full power consumption - there’s a ramp.

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/nextgenturbo.jpg

Sandy Bridge takes advantage of this by allowing the PCU to turbo up active cores above TDP for short periods of time (up to 25 seconds). The PCU keeps track of available thermal budget while idle and spends it when CPU demand goes up. The longer the CPU remains idle, the more potential it has to ramp up above TDP later on. When a workload comes around, the CPU can turbo above its TDP and step down as the processor heats up, eventually settling down at its TDP. While SNB can turbo up beyond its TDP, the PCU won’t allow the chip to exceed any reliability limits.
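The principle is easy to model. Below is a toy Python sketch of a budget-based turbo controller: headroom is banked while the chip sits below TDP, and spent to run above it. All names and numbers are our illustrative choices; Intel's actual PCU algorithm and limits are not public.

```python
# Toy energy-budget turbo: time spent below TDP banks headroom (in joules),
# which a later burst can spend to run above TDP until the bank drains.
def simulate_turbo(demand_w, tdp_w=95.0, max_bank_j=250.0, dt_s=1.0):
    bank_j = 0.0
    for watts in demand_w:
        granted = watts if (watts <= tdp_w or bank_j > 0) else tdp_w
        bank_j += (tdp_w - granted) * dt_s     # below TDP banks, above TDP drains
        bank_j = min(max(bank_j, 0.0), max_bank_j)
        yield granted

# Long idle then a heavy burst: the burst holds 120W above TDP until the
# bank empties, then settles back down to the 95W TDP.
print(list(simulate_turbo([20.0] * 5 + [120.0] * 15)))
```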

https://images.anandtech.com/reviews/cpu/intel/sandybridge/arch/corengpupower.jpg

Both CPU and GPU turbo can work in tandem. Workloads that are more GPU bound running on SNB can result in the CPU cores clocking down and the GPU clocking up, while CPU bound tasks can drop the GPU frequency and increase CPU frequency. Sandy Bridge as a whole was a much more dynamic beast than anything that came before it.



Test Bed and Setup

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry, who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer. Where possible, we will extend our testing to include faster memory modules, either at the same time as the review or at a later date.

Test Setup

| CPU | Motherboard | BIOS | Cooling | Memory |
|---|---|---|---|---|
| Intel i7-9700K | ASRock Z370 Pro Gaming i7 | P3.20 | TRUE Copper | Corsair Vengeance 4x8GB DDR4-2666 |
| Intel i7-7700K | GIGABYTE X170 Extreme-ECC | F21e | Silverstone AR10-115XS* | G.Skill RipjawsV 2x16GB DDR4-2400 |
| Intel i7-2600K (OC) | ASRock Z77 OC Formula | P2.40 | TRUE Copper | GeIL Evo Veloce 2x8GB DDR3-2400 |
| Intel i7-2600K | ASRock Z77 OC Formula | P2.40 | TRUE Copper | G.Skill Ares 4x4GB DDR3-1333 |

GPU: Sapphire RX 460 2GB (CPU tests); MSI GTX 1080 Gaming 8G (gaming tests)
PSU: Corsair AX860i / Corsair AX1200i
SSD: Crucial MX200 1TB
OS: Windows 10 x64 RS3 1709, Spectre and Meltdown patched

*VRM supplemented with SST-FHP141-VF 173 CFM fans

Many thanks to...

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers

  • Sapphire RX 460 Nitro
  • MSI GTX 1080 Gaming X OC
  • Crucial MX200 + MX500 SSDs
  • Corsair AX860i + AX1200i PSUs
  • G.Skill RipjawsV, SniperX, FlareX
  • Crucial Ballistix DDR4
  • Silverstone coolers
  • Silverstone fans


Our New Testing Suite for 2019 and 2020

Spectre and Meltdown Hardened

In order to keep up to date with our testing, we have to update our software every so often to stay relevant. In our updates we typically implement the latest operating system, the latest patches, the latest software revisions, and the newest graphics drivers, as well as add new tests or remove old ones. As regular readers will know, our CPU testing revolves around an automated test suite, and depending on how the newest software works, the suite either needs to change, be updated, have tests removed, or be rewritten completely. Last time we did a full re-write, it took the best part of a month, including regression testing (testing older processors).

One of the key elements of our testing update for 2018 (and 2019) is the fact that our scripts and systems are designed to be hardened for Spectre and Meltdown. This means making sure that all of our BIOSes are updated with the latest microcode, and that all the steps are in place with our operating system updates. In this case we are using Windows 10 x64 Enterprise 1709 with the April security updates, which enforces 'Smeltdown' (our combined name) mitigations. Users might ask why we are not running Windows 10 x64 RS4, the latest major update – this is due to some new features which are giving uneven results. Rather than spend a few weeks learning to disable them, we’re going ahead with RS3, which has been widely used.

Our previous benchmark suite was split into several segments depending on how the test is usually perceived. Our new test suite follows similar lines, and we run the tests based on:

  • Power
  • Memory
  • Office
  • System
  • Render
  • Encoding
  • Web
  • Legacy
  • Integrated Gaming
  • CPU Gaming

Depending on the focus of the review, the order of these benchmarks might change, or some may be left out of the main review. All of our data will reside in our benchmark database, Bench, for which there is a new ‘CPU 2019’ section for all of our new tests.

Within each section, we will have the following tests:

Power

Our power tests consist of running a substantial workload for every thread in the system, and then probing the power registers on the chip to find out details such as core power, package power, DRAM power, IO power, and per-core power. This all depends on how much information is given by the manufacturer of the chip: sometimes a lot, sometimes not at all.

We are currently running POV-Ray as our main test for Power, as it seems to hit deep into the system and is very consistent. In order to limit the number of cores for power, we use an affinity mask driven from the command line.
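For those wanting to replicate the idea, pinning a process to a subset of logical cores is straightforward; the sketch below uses Python with psutil, and the POV-Ray executable name and arguments are placeholders rather than our actual harness commands.

```python
# Launch a workload restricted to a chosen set of logical CPUs, so per-core
# power can be measured as the active core count scales.
import subprocess
import psutil

def run_on_cores(cmd, cores):
    proc = subprocess.Popen(cmd)
    psutil.Process(proc.pid).cpu_affinity(cores)   # apply the affinity mask
    return proc.wait()

# Example: load only the first two logical cores (placeholder command line)
run_on_cores(["povray.exe", "benchmark.pov"], cores=[0, 1])
```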

Memory

These tests involve disabling all turbo modes in the system, forcing it to run at base frequency, and then implementing both a memory latency checker (Intel’s Memory Latency Checker works equally well for both platforms) and AIDA64 to probe cache bandwidth.

Office

  • Chromium Compile: Windows VC++ Compile of Chrome 56 (same as 2017)
  • PCMark10: Primary data will be the overview results – subtest results will be in Bench
  • 3DMark Physics: We test every physics sub-test for Bench, and report the major ones (new)
  • GeekBench4: By request (new)
  • SYSmark 2018: Recently released by BAPCo, currently automating it into our suite (new, when feasible)

System

  • Application Load: Time to load GIMP 2.10.4 (new)
  • FCAT: Time to process a 90 second ROTR 1440p recording (same as 2017)
  • 3D Particle Movement: Particle distribution test (same as 2017) – we also have AVX2 and AVX512 versions of this, which may be added later
  • Dolphin 5.0: Console emulation test (same as 2017)
  • DigiCortex: Sea Slug Brain simulation (same as 2017)
  • y-Cruncher v0.7.6: Pi calculation with optimized instruction sets for new CPUs (new)
  • Agisoft Photoscan 1.3.3: 2D image to 3D modelling tool (updated)

Render

  • Corona 1.3: Performance renderer for 3dsMax, Cinema4D (same as 2017)
  • Blender 2.79b: Render of bmw27 on CPU (updated to 2.79b)
  • LuxMark v3.1 C++ and OpenCL: Test of different rendering code paths (same as 2017)
  • POV-Ray 3.7.1: Built-in benchmark (updated)
  • CineBench R15: Older Cinema4D test, will likely remain in Bench (same as 2017)

Encoding

  • 7-zip 1805: Built-in benchmark (updated to v1805)
  • WinRAR 5.60b3: Compression test of directory with video and web files (updated to 5.60b3)
  • AES Encryption: In-memory AES performance. Slightly older test. (same as 2017)
  • Handbrake 1.1.0: Logitech C920 1080p60 input file, transcoded into three formats for streaming/storage:
    • 720p60, x264, 6000 kbps CBR, Fast, High Profile
    • 1080p60, x264, 3500 kbps CBR, Faster, Main Profile
    • 1080p60, HEVC, 3500 kbps VBR, Fast, 2-Pass Main Profile

Web

  • WebXPRT3: The latest WebXPRT test (updated)
  • WebXPRT15: Similar to 3, but slightly older. (same as 2017)
  • Speedometer2: Javascript Framework test (new)
  • Google Octane 2.0: Deprecated but popular web test (same as 2017)
  • Mozilla Kraken 1.1: Deprecated but popular web test (same as 2017)

Legacy (same as 2017)

  • 3DPM v1: Older version of 3DPM, very naïve code
  • x264 HD 3.0: Older transcode benchmark
  • Cinebench R11.5 and R10: Representative of different coding methodologies

Integrated and CPU Gaming

We have recently automated around a dozen games at four different performance levels. A good number of games will have frame time data, however due to automation complications, some will not. The idea is that we get a good overview of a number of different genres and engines for testing. So far we have the following games automated:

AnandTech CPU Gaming 2019 Game List

| Game | Genre | Release Date | API | IGP | Low | Medium | High |
|---|---|---|---|---|---|---|---|
| World of Tanks enCore | Driving / Action | Feb 2018 | DX11 | 768p Minimum | 1080p Medium | 1080p Ultra | 4K Ultra |
| Final Fantasy XV | JRPG | Mar 2018 | DX11 | 720p Standard | 1080p Standard | 4K Standard | 8K Standard |
| Shadow of War | Action / RPG | Sep 2017 | DX11 | 720p Ultra | 1080p Ultra | 4K High | 8K High |
| F1 2018 | Racing | Aug 2018 | DX11 | 720p Low | 1080p Med | 4K High | 4K Ultra |
| Civilization VI | RTS | Oct 2016 | DX12 | 1080p Ultra | 4K Ultra | 8K Ultra | 16K Low |
| Ashes: Classic | RTS | Mar 2016 | DX12 | 720p Standard | 1080p Standard | 1440p Standard | 4K Standard |
| Strange Brigade* | FPS | Aug 2018 | DX12 / Vulkan | 720p Low | 1080p Medium | 1440p High | 4K Ultra |
| Shadow of the Tomb Raider | Action | Sep 2018 | DX12 | 720p Low | 1080p Medium | 1440p High | 4K Highest |
| Grand Theft Auto V | Open World | Apr 2015 | DX11 | 720p Low | 1080p High | 1440p Very High | 4K Ultra |
| Far Cry 5 | FPS | Mar 2018 | DX11 | 720p Low | 1080p Normal | 1440p High | 4K Ultra |

*Strange Brigade is run in DX12 and Vulkan modes

For our CPU Gaming tests, we will be running on an NVIDIA GTX 1080. For the CPU benchmarks, we use an RX460 as we now have several units for concurrent testing.

In previous years we tested multiple GPUs on a small number of games – this time around, due to a Twitter poll I did which turned out exactly 50:50, we are doing it the other way around: more games, fewer GPUs.

Scale Up vs Scale Out: Benefits of Automation

One comment we get every now and again is that automation isn’t the best way of testing – there’s a higher barrier to entry, and it limits the tests that can be done. From our perspective, despite taking a little while to program properly (and get it right), automation means we can do several things:

  1. Guarantee consistent breaks between tests for cooldown to occur, rather than variable cooldown times based on ‘if I’m looking at the screen’
  2. It allows us to simultaneously test several systems at once. I currently run five systems in my office (limited by the number of 4K monitors, and space) which means we can process more hardware at the same time
  3. We can leave tests to run overnight, very useful for a deadline
  4. With a good enough script, tests can be added very easily

Our benchmark suite collates all the results and, as the tests are running, spits out data to a central storage platform, which I can probe mid-run to update data as it comes through. This also acts as a sanity check in case any of the data looks abnormal.

We do have one major limitation, and that rests on the side of our gaming tests. We are running multiple tests through one Steam account, some of which (like GTA) are online only. As Steam only lets one system play on an account at once, our gaming script probes Steam’s own APIs to determine if we are ‘online’ or not, and to run offline tests until the account is free to be logged in on that system. Depending on the number of games we test that absolutely require online mode, it can be a bit of a bottleneck.

Benchmark Suite Updates

As always, we do take requests. It helps us understand the workloads that everyone is running and plan accordingly.

A side note on software packages: we have had requests for tests on software such as ANSYS, or other professional grade software. The downside of testing this software is licensing and scale. Most of these companies do not particularly care about us running tests, and state it’s not part of their goals. Others, like Agisoft, are more than willing to help. If you are involved in these software packages, the best way to see us benchmark them is to reach out. We have special versions of software for some of our tests, and if we can get something that works, and is relevant to the audience, then we shouldn’t have too much difficulty adding it to the suite.



CPU Performance: System Tests

Our System Test section focuses significantly on real-world testing and user experience, with a slight nod to throughput. In this section we cover application loading time, image processing, simple scientific physics, emulation, neural simulation, optimized compute, and 3D model development, with a combination of readily available and custom software. For some of these tests, the bigger suites such as PCMark do cover them (we publish those values in our office section), although multiple perspectives are always beneficial. In all our tests we will explain in-depth what is being tested, and how we are testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Application Load: GIMP 2.10.4

One of the most important aspects of user experience and workflow is how fast a system responds. A good test of this is to see how long it takes for an application to load. Most applications these days, when on an SSD, load fairly instantly, however some office tools require asset pre-loading before being available. Most operating systems employ caching as well, so when certain software is loaded repeatedly (web browser, office tools), it can be initialized much quicker.

In our last suite, we tested how long it took to load a large PDF in Adobe Acrobat. Unfortunately that test was a nightmare to program for, and didn’t transfer over to Win10 RS3 easily. In the meantime we discovered an application that can automate this test, and we put it up against GIMP, a popular free and open-source photo editing tool, and the major alternative to Adobe Photoshop. We set it to load a large 50MB design template, and perform the load 10 times with 10 seconds in between each. Because of caching, the first 3-5 results are often slower than the rest, and the time to cache can be inconsistent, so we take the average of the last five results to show CPU processing on cached loading.
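In practice we use AppTimer for this, but the methodology is easy to sketch. The Python below is illustrative only: it times repeated launches and averages the last five, though it measures time to process exit rather than time to a usable window, and the GIMP executable and file names are hypothetical.

```python
# Time repeated application loads; later runs benefit from OS file caching,
# so report the average of the last five runs.
import subprocess
import time

def timed_loads(cmd, runs=10, gap_s=10):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd)                    # launch and wait for exit
        times.append(time.perf_counter() - start)
        time.sleep(gap_s)                      # pause between runs
    return sum(times[-5:]) / 5                 # cached-load average

print(timed_loads(["gimp-2.10.exe", "large_template.xcf"]))
```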

AppTimer: GIMP 2.10.4

Even overclocked, the 2600K doesn't quite reach the 7700K performance, while the 9700K with the higher single thread frequency takes a healthy lead.

FCAT: Image Processing

The FCAT software was developed to help detect microstuttering, dropped frames, and runt frames in graphics benchmarks when two accelerators were paired together to render a scene. Due to game engines and graphics drivers, not all GPU combinations performed ideally, which led to this software assigning a color to each rendered frame and dynamically recording the raw output using a video capture device.

The FCAT software takes that recorded video, which in our case is 90 seconds of a 1440p run of Rise of the Tomb Raider, and processes that color data into frame time data so the system can plot an ‘observed’ frame rate, and correlate that to the power consumption of the accelerators. This test, by virtue of how quickly it was put together, is single threaded. We run the process and report the time to completion.

FCAT Processing ROTR 1440p GTX980Ti Data

FCAT is another single threaded test, so we're seeing the same performance differences: the 2600K overclocked can't quite match the 7700K at stock, while the 9700K goes out into the lead.

3D Particle Movement v2.1: Brownian Motion

Our 3DPM test is a custom built benchmark designed to simulate six different particle movement algorithms of points in a 3D space. The algorithms were developed as part of my PhD., and while they ultimately perform best on a GPU, they provide a good idea of how instruction streams are interpreted by different microarchitectures.

A key part of the algorithms is the random number generation – we use relatively fast generation which ends up implementing dependency chains in the code. The upgrade over the naïve first version of this code solved for false sharing in the caches, a major bottleneck. We are also looking at AVX2 and AVX512 versions of this benchmark for future reviews.

For this test, we run a stock particle set over the six algorithms for 20 seconds apiece, with 10 second pauses, and report the total rate of particle movement, in millions of operations (movements) per second. We have a non-AVX version and an AVX version, with the latter implementing AVX512 and AVX2 where possible.
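To give a flavor of the kind of kernel 3DPM exercises, here is a toy numpy version of one movement algorithm (a random walk with unit-length steps). It is not the actual 3DPM code, and the particle and step counts are arbitrary.

```python
# Brownian-style particle movement: each particle takes a unit-length step in
# a random 3D direction every iteration.
import numpy as np

def brownian_steps(n_particles=100_000, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_particles, 3))
    for _ in range(n_steps):
        step = rng.normal(size=(n_particles, 3))
        pos += step / np.linalg.norm(step, axis=1, keepdims=True)
    return pos

print(brownian_steps()[:3])    # positions of the first three particles
```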

3DPM v2.1 can be downloaded from our server: 3DPMv2.1.rar (13.0 MB)

3D Particle Movement v2.1

3D Particle Movement v2.1 (with AVX)

As the 2600K does not have AVX2, it ends up severely lagging behind the 7700K/9700K when the program is optimized for the newer instructions.

Dolphin 5.0: Console Emulation

One of the more popular tests requested for our suite relates to console emulation. Being able to pick up a game from an older system and run it as expected depends on the overhead of the emulator: it takes a significantly more powerful x86 system to accurately emulate an older non-x86 console, especially if code for that console was written to abuse certain physical bugs in the hardware.

For our test, we use the popular Dolphin emulation software, and run a compute project through it to determine how close to a standard console system our processors can emulate. In this test, a Nintendo Wii would take around 1050 seconds.

The latest version of Dolphin can be downloaded from https://dolphin-emu.org/

Dolphin 5.0 Render Test

Dolphin gained substantial performance around the Haswell/Broadwell era, hence the incredible performance gain from the 2600K to the 7700K. Unfortunately, for some reason the overclocked CPU failed this test.

DigiCortex 1.20: Sea Slug Brain Simulation

This benchmark was originally designed for simulation and visualization of neuron and synapse activity, as is commonly found in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron / 1.8B synapse simulation, equivalent to a Sea Slug.

Example of a 2.1B neuron simulation

We report the results as the ability to simulate the data as a fraction of real-time, so anything above a ‘one’ is suitable for real-time work. Out of the two modes, a ‘non-firing’ mode which is DRAM heavy and a ‘firing’ mode which has CPU work, we choose the latter. Despite this, the benchmark is still affected by DRAM speed a fair amount.

DigiCortex can be downloaded from http://www.digicortex.net/

DigiCortex 1.20 (32k Neuron, 1.8B Synapse)

For memory related tests, we ran the systems at their Intel designated supported frequencies, except for the OC system, which got a healthy boost from DDR3-1333 to DDR3-2400. The results show the bump in performance, but even a 7700K at stock wins out. Jumping up to the 9700K gets added core performance.

y-Cruncher v0.7.6: Microarchitecture Optimized Compute

I’ve known about y-Cruncher for a while, as a tool to help compute various mathematical constants, but it wasn’t until I began talking with its developer, Alex Yee, a researcher from NWU and now software optimization developer, that I realized that he has optimized the software like crazy to get the best performance. Naturally, any simulation that can take 20+ days can benefit from a 1% performance increase! Alex started y-cruncher as a high-school project, but it is now at a state where Alex is keeping it up to date to take advantage of the latest instruction sets before they are even made available in hardware.

For our test we run y-cruncher v0.7.6 through all the different optimized variants of the binary, single threaded and multi-threaded, including the AVX-512 optimized binaries. The test is to calculate 250m digits of Pi, and we use the single threaded and multi-threaded versions of this test.

Users can download y-cruncher from Alex’s website: http://www.numberworld.org/y-cruncher/

y-Cruncher 0.7.6 Single Thread, 250m Digits
y-Cruncher 0.7.6 Multi-Thread, 250m Digits

y-cruncher is another benchmark that implements as many AVX acceleration functions as possible, showcasing how newer chips than Sandy Bridge have additional benefits.

Agisoft Photoscan 1.3.3: 2D Image to 3D Model Conversion

One of the ISVs that we have worked with for a number of years is Agisoft, who develop software called PhotoScan that transforms a number of 2D images into a 3D model. This is an important tool in model development and archiving, and relies on a number of single threaded and multi-threaded algorithms to go from one side of the computation to the other.

In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, one that is still more stringent than our 2017 test. We report the total time to complete the process.

Agisoft’s Photoscan website can be found here: http://www.agisoft.com/

Agisoft Photoscan 1.3.3, Complex Test

As a variable threaded test, the overclock on the 2600K gives a sizeable performance jump over stock performance, however the 7700K at stock gets almost the same size jump again. The extra cores in the 9700K just laugh at the rest of the chips in this comparison.



CPU Performance: Rendering Tests

Rendering is often a key target for processor workloads, lending itself to a professional environment. It comes in different formats as well, from 3D rendering through rasterization, such as games, or by ray tracing, and invokes the ability of the software to manage meshes, textures, collisions, aliasing, physics (in animations), and discarding unnecessary work. Most renderers offer CPU code paths, while a few use GPUs and select environments use FPGAs or dedicated ASICs. For big studios however, CPUs are still the hardware of choice.

All of our benchmark results can also be found in our benchmark engine, Bench.

Corona 1.3: Performance Render

An advanced performance based renderer for software such as 3ds Max and Cinema 4D, the Corona benchmark renders a generated scene as a standard under its 1.3 software version. Normally the GUI implementation of the benchmark shows the scene being built, and allows the user to upload the result as a ‘time to complete’.

We got in contact with the developer who gave us a command line version of the benchmark that does a direct output of results. Rather than reporting time, we report the average number of rays per second across six runs, as the performance scaling of a result per unit time is typically visually easier to understand.

The Corona benchmark website can be found at https://corona-renderer.com/benchmark

Corona 1.3 Benchmark

We can see the sizeable difference in performance between the 7700K and the 2600K, coming from microarchitecture updates and frequency; however, even overclocking the 2600K only halves that gap.

Blender 2.79b: 3D Creation Suite

A high profile rendering tool, Blender is open-source allowing for massive amounts of configurability, and is used by a number of high-profile animation studios worldwide. The organization recently released a Blender benchmark package, a couple of weeks after we had narrowed our Blender test for our new suite, however their test can take over an hour. For our results, we run one of the sub-tests in that suite through the command line - a standard ‘bmw27’ scene in CPU only mode, and measure the time to complete the render.

Blender can be downloaded at https://www.blender.org/download/

Blender 2.79b bmw27_cpu Benchmark

Similarly with Blender, the overclock only cuts the deficit between the 2600K and the stock 7700K in half. Add in an overclock to the 7700K, and that gap gets wider.

LuxMark v3.1: LuxRender via Different Code Paths

As stated at the top, there are many different ways to process rendering data: CPU, GPU, Accelerator, and others. On top of that, there are many frameworks and APIs in which to program, depending on how the software will be used. LuxMark, a benchmark developed using the LuxRender engine, offers several different scenes and APIs.

In our test, we run the simple ‘Ball’ scene on both the C++ and OpenCL code paths, but in CPU mode. This scene starts with a rough render and slowly improves the quality over two minutes, giving a final result in what is essentially an average ‘kilorays per second’.

LuxMark v3.1 C++
LuxMark v3.1 OpenCL

POV-Ray 3.7.1: Ray Tracing

The Persistence of Vision ray tracing engine is another well-known benchmarking tool, which was in a state of relative hibernation until AMD released its Zen processors, after which suddenly both Intel and AMD were submitting code to the main branch of the open source project. For our test, we use the built-in benchmark for all cores, called from the command line.

POV-Ray can be downloaded from http://www.povray.org/
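A rough sketch of that invocation is below; the --benchmark flag applies to the Unix builds, and our own Windows harness may drive the test differently, so treat this as illustrative only.

import subprocess
import time

# POV-Ray's Unix builds expose the built-in benchmark via "--benchmark";
# the newline on stdin covers a confirmation prompt, if the build presents
# one. Invocation on Windows differs, so this is a Unix-side sketch.
start = time.perf_counter()
subprocess.run(["povray", "--benchmark"], input="\n", text=True, check=True)
print(f"POV-Ray all-core benchmark: {time.perf_counter() - start:.1f} s")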

POV-Ray 3.7.1 Benchmark

POV-Ray is a little different, just because AVX2 is playing a part here in how well the newer processors perform. POV-Ray also prefers cores over threads, so having eight real cores means the 9700K gets a nice big lead.



CPU Performance: Office Tests

The Office test suite is designed around more industry-standard tests that focus on office workflows, system meetings, and some synthetics, but we also bundle compiler performance in with this section. For users that have to evaluate hardware in general, these are usually the benchmarks that most consider.

All of our benchmark results can also be found in our benchmark engine, Bench.

PCMark 10: Industry Standard System Profiler

Futuremark, now known as UL, has developed benchmarks that have become industry standards for around two decades. The latest complete system test suite is PCMark 10, upgrading over PCMark 8 with updated tests and more OpenCL invested into use cases such as video streaming.

PCMark splits its scores into about 14 different areas, including application startup, web, spreadsheets, photo editing, rendering, video conferencing, and physics. We post all of these numbers in our benchmark database, Bench, however the key metric for the review is the overall score.

PCMark10 Extended Score

Something like PCMark doesn't really show the scale of the differences, except in the main tests that are fully multithreaded where the 9700K pulls out a bigger lead. The 7700K only has a 17% lead over the 2600K, which goes down to 5% when compared to the overclocked version. This is perhaps more of an indication of how often you might feel the difference with a new 7700K over an overclocked 2600K: 5% of the time. It depends on your load balance, of course.

Chromium Compile: Windows VC++ Compile of Chrome 56

A large number of AnandTech readers are software engineers, looking at how the hardware they use performs. While compiling a Linux kernel is ‘standard’ for the reviewers who often compile, our test is a little more varied – we are using the Windows instructions to compile Chrome, specifically a Chrome 56 build from March 2017, as that was when we built the test. Google quite handily gives instructions on how to compile with Windows, along with a roughly 400k-file download for the repo.

In our test, using Google’s instructions, we use the MSVC compiler and ninja developer tools to manage the compile. As you may expect, the benchmark is variably threaded, with a mix of DRAM requirements that benefit from faster caches. Data procured in our test is the time taken for the compile, which we convert into compiles per day.
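The conversion from the raw compile time to our reported rate is trivial, as the sketch below shows (the example time is illustrative, not a measured result).

# Convert a measured compile time into the 'compiles per day' rate we report.
SECONDS_PER_DAY = 24 * 60 * 60

def compiles_per_day(compile_seconds):
    return SECONDS_PER_DAY / compile_seconds

print(compiles_per_day(5400))  # a 90-minute compile -> 16.0 compiles per day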

Compile Chromium (Rate)

Our compile test in this case loves the cores of the 9700K over SMT, but we again see the overclocked 2600K land in between the 7700K and the 2600K at stock. Even without an overclock on the 7700K, that's an easy gain to amortize.

3DMark Physics: In-Game Physics Compute

Alongside PCMark is 3DMark, Futuremark’s (UL’s) gaming test suite. Each gaming test consists of one or two GPU-heavy scenes, along with a physics test that is indicative of when the test was written and the platform it is aimed at. The main overriding tests, in order of complexity, are Ice Storm, Cloud Gate, Sky Diver, Fire Strike, and Time Spy.

Some of the subtests offer variants, such as Ice Storm Unlimited, which is aimed at mobile platforms with an off-screen rendering, or Fire Strike Ultra which is aimed at high-end 4K systems with lots of the added features turned on. Time Spy also currently has an AVX-512 mode (which we may be using in the future).

For our tests, we report in Bench the results from every physics test, but for the sake of the review we keep it to the most demanding of each scene: Ice Storm Unlimited, Cloud Gate, Sky Diver, Fire Strike Ultra, and Time Spy.

3DMark Physics - Cloud Gate
3DMark Physics - Sky Diver
3DMark Physics - Fire Strike
3DMark Physics - Time Spy

GeekBench4: Synthetics

A common tool for cross-platform testing between mobile, PC, and Mac, GeekBench 4 is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.

I’m including this test due to popular demand; the results do come across as overly synthetic, yet many users put a lot of weight behind it because it is compiled across different platforms (albeit with different compilers).

We record the main subtest scores (Crypto, Integer, Floating Point, Memory) in our benchmark database, but for the review we post the overall single and multi-threaded results.

Geekbench 4 - ST Overall
Geekbench 4 - MT Overall



CPU Performance: Encoding Tests

With the rise of streaming, vlogs, and video content as a whole, encoding and transcoding tests are becoming ever more important. Not only are more home users and gamers needing to convert video files into something more manageable, for streaming or archival purposes, but the servers that manage the output also juggle data and log files with compression and decompression. Our encoding tasks are focused around these important scenarios, with input from the community on the best implementation of real-world testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Handbrake 1.1.0: Streaming and Archival Video Transcoding

A popular open source tool, Handbrake is the anything-to-anything video conversion software that a number of people use as a reference point. The danger always lies in version numbers and optimization: for example, the latest versions of the software can take advantage of AVX-512 and OpenCL to accelerate certain types of transcoding and algorithms. The version we use here is a pure CPU play, with common transcoding variations.

We have split Handbrake up into several tests, using a Logitech C920 1080p60 native webcam recording (essentially a streamer recording), and convert it into two types of streaming formats and one for archival; a command-line approximation of the first profile is sketched after the list. The output settings used are:

  • 720p60 at 6000 kbps constant bit rate, fast setting, high profile
  • 1080p60 at 3500 kbps constant bit rate, faster setting, main profile
  • 1080p60 HEVC at 3500 kbps variable bit rate, fast setting, main profile
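Below is a hedged HandBrakeCLI approximation of the first (720p60 streaming) profile, wrapped in a small Python harness; the filenames are placeholders, and the exact switch set used for the review may differ from this sketch.

import subprocess

# A hedged approximation of the 720p60 streaming conversion via HandBrakeCLI.
cmd = [
    "HandBrakeCLI",
    "-i", "c920_source_1080p60.mp4",   # placeholder for the webcam recording
    "-o", "out_720p60.mp4",
    "-e", "x264",                       # H.264 via the x264 encoder
    "-b", "6000",                       # target bitrate, kbps
    "--encoder-preset", "fast",
    "--encoder-profile", "high",
    "-w", "1280", "-l", "720",          # scale to 720p
    "-r", "60", "--cfr",                # constant 60 fps output
]
subprocess.run(cmd, check=True)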

Handbrake 1.1.0 - 720p60 x264 6000 kbps Fast
Handbrake 1.1.0 - 1080p60 x264 3500 kbps Faster
Handbrake 1.1.0 - 1080p60 HEVC 3500 kbps Fast

7-zip v1805: Popular Open-Source Encoding Engine

Out of our compression/decompression tool tests, 7-zip is the most requested and comes with a built-in benchmark. For our test suite, we’ve pulled the latest version of the software and we run the benchmark from the command line, reporting the compression, decompression, and a combined score.
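A minimal sketch of that command-line run is below; the 'Avr:'/'Tot:' summary lines are what current builds print, but verify against your version before scripting around them.

import subprocess

# 7-Zip's built-in benchmark runs as "7z b"; we call it from the command
# line and pull out the compression/decompression summary lines.
proc = subprocess.run(["7z", "b"], capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
    if line.lstrip().startswith(("Avr:", "Tot:")):
        print(line)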

It is noted in this benchmark that the latest multi-die processors have very bi-modal performance between compression and decompression, performing well in one and badly in the other. There are also discussions around how the Windows scheduler is placing each thread. As we get more results, it will be interesting to see how this plays out.

Please note, if you plan to share out the Compression graph, please include the Decompression one. Otherwise you’re only presenting half a picture.

7-Zip 1805 Compression
7-Zip 1805 Decompression
7-Zip 1805 Combined

WinRAR 5.60b3: Archiving Tool

My compression tool of choice is often WinRAR, having been one of the first tools a number of my generation used over two decades ago. The interface has not changed much, although the integration with Windows right click commands is always a plus. It has no in-built test, so we run a compression over a set directory containing over thirty 60-second video files and 2000 small web-based files at a normal compression rate.

WinRAR is variable threaded but also susceptible to caching, so in our test we run it 10 times and take the average of the last five, leaving the test purely for raw CPU compute performance.
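A sketch of that run-ten-average-five methodology is below; the executable name, switches, archive name, and target directory are placeholders for our test set.

import os
import statistics
import subprocess
import time

# Run the compression job ten times and average the last five runs.
# 'a' adds files to an archive and '-m3' selects normal compression.
cmd = ["WinRAR", "a", "-m3", "test.rar", "testset/"]

times = []
for _ in range(10):
    if os.path.exists("test.rar"):
        os.remove("test.rar")          # start each run from a clean archive
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    times.append(time.perf_counter() - start)

# The first runs warm the file cache; the last five reflect raw CPU compute.
print(f"Average of last five runs: {statistics.mean(times[-5:]):.2f} s")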

WinRAR 5.60b3

One of our closest tests between an overclocked 2600K and the 7700K at stock is WinRAR. It's a variable threaded test, and doesn't seem to take advantage of any of the newer instructions offered by the 7700K. However, the 9700K's full set of real cores shows a big bonus, as does the upgraded DRAM.

AES Encryption: File Security

A number of platforms, particularly mobile devices, are now offering encryption by default with file systems in order to protect the contents. Windows based devices have these options as well, often applied by BitLocker or third-party software. In our AES encryption test, we used the discontinued TrueCrypt for its built-in benchmark, which tests several encryption algorithms directly in memory.

The data we take for this test is the combined AES encrypt/decrypt performance, measured in gigabytes per second. The software does use the AES instructions on processors that offer hardware acceleration, however not AVX-512.

AES Encoding



CPU Performance: Web and Legacy Tests

While more the focus of low-end and small form factor systems, web-based benchmarks are notoriously difficult to standardize. Modern web browsers are frequently updated, with no recourse to disable those updates, and as such there is difficulty in keeping a common platform. The fast paced nature of browser development means that version numbers (and performance) can change from week to week. Despite this, web tests are often a good measure of user experience: a lot of what most office work is today revolves around web applications, particularly email and office apps, but also interfaces and development environments. Our web tests include some of the industry standard tests, as well as a few popular but older tests.

We have also included our legacy benchmarks in this section, representing a stack of older code for popular benchmarks.

All of our benchmark results can also be found in our benchmark engine, Bench.

WebXPRT 3: Modern Real-World Web Tasks, including AI

The company behind the XPRT test suites, Principled Technologies, has recently released the latest web test, and rather than attach a year to the name has just called it ‘3’. This latest test (as we started the suite) has built upon and developed the ethos of previous tests: user interaction, office compute, graph generation, list sorting, HTML5, image manipulation, and even goes as far as some AI testing.

For our benchmark, we run the standard test which goes through the benchmark list seven times and provides a final result. We run this standard test four times, and take an average.

Users can access the WebXPRT test at http://principledtechnologies.com/benchmarkxprt/webxprt/

WebXPRT 3 (2018)

WebXPRT 2015: HTML5 and Javascript Web UX Testing

The older version of WebXPRT is the 2015 edition, which focuses on a slightly different set of web technologies and frameworks that are in use today. This is still a relevant test, especially for users interacting with not-the-latest web applications in the market, of which there are a lot. Web framework development is often very quick but with high turnover, meaning that frameworks are quickly developed, built upon, used, and then developers move on to the next; adjusting an application to a new framework is a difficult and arduous task, especially with rapid development cycles. This leaves a lot of applications as ‘fixed-in-time’, and relevant to user experience for many years.

Similar to WebXPRT3, the main benchmark is a sectional run repeated seven times, with a final score. We repeat the whole thing four times, and average those final scores.

WebXPRT15

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is an aggregated test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics. We report this final score.

Speedometer 2

Google Octane 2.0: Core Web Compute

A popular web test for several years, but now no longer being updated, is Octane, developed by Google. Version 2.0 of the test performs the best part of two-dozen compute related tasks, such as regular expressions, cryptography, ray tracing, emulation, and Navier-Stokes physics calculations.

The test gives each sub-test a score and produces a geometric mean of the set as a final result. We run the full benchmark four times, and average the final results.
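For the curious, the final-score reduction is just a geometric mean, as the sketch below shows; the sub-test scores here are illustrative values only.

import math

# Octane reduces its sub-test scores to a geometric mean for the final
# result; a minimal reimplementation of that reduction.
def geometric_mean(scores):
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

subtests = [21000, 35000, 18000, 42000]  # illustrative sub-test scores
print(f"Overall: {geometric_mean(subtests):.0f}")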

Google Octane 2.0

Mozilla Kraken 1.1: Core Web Compute

Even older than Octane is Kraken, this time developed by Mozilla. This is an older test that does similar computational mechanics, such as audio processing or image filtering. Kraken seems to produce a highly variable result depending on the browser version, as it is a test that browsers keenly optimize for.

The main benchmark runs through each of the sub-tests ten times and produces an average time to completion for each loop, given in milliseconds. We run the full benchmark four times and take an average of the time taken.

Mozilla Kraken 1.1

3DPM v1: Naïve Code Variant of 3DPM v2.1

The first legacy test in the suite is the first version of our 3DPM benchmark. This is the ultimate naïve version of the code, as if it was written by a scientist with no knowledge of how computer hardware, compilers, or optimization works (which, in fact, it was at the start). This represents a large body of scientific simulation out in the wild, where getting the answer is more important than it being fast (getting a result in 4 days is acceptable if it’s correct, rather than sending someone away for a year to learn to code and getting the result in 5 minutes).

In this version, the only real optimization was in the compiler flags (-O2, -fp:fast), compiling it in release mode, and enabling OpenMP in the main compute loops. The loops were not configured for function size, and one of the key slowdowns is false sharing in the cache. It also has long dependency chains based on the random number generation, which leads to relatively poor performance on specific compute microarchitectures.

3DPM v1 can be downloaded with our 3DPM v2 code here: 3DPMv2.1.rar (13.0 MB)

3DPM v1 Single Threaded
3DPM v1 Multi-Threaded

x264 HD 3.0: Older Transcode Test

This transcoding test is super old, and was used by Anand back in the day of Pentium 4 and Athlon II processors. Here a standardized 720p video is transcoded with a two-pass conversion, with the benchmark showing the frames-per-second of each pass. This benchmark is single-threaded, and between some micro-architectures we seem to actually hit an instructions-per-clock wall.

x264 HD 3.0 Pass 1
x264 HD 3.0 Pass 2



Gaming: World of Tanks enCore

Albeit different to most other commonly played MMOs (massively multiplayer online games), World of Tanks is set in the mid-20th century and allows players to take control of a range of military-based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming, who are based in Belarus, with the game’s soundtrack primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points, including a free-to-play element, as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank-based MMO is that it achieved eSports status when it debuted at the World Cyber Games back in 2012.

World of Tanks enCore is a demo application for a new and unreleased graphics engine penned by the Wargaming development team. Over time the new core engine will be implemented into the full game, upgrading the game’s visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system.

AnandTech CPU Gaming 2019 Game List
Game: World of Tanks enCore | Genre: Driving / Action | Release Date: Feb 2018 | API: DX11
Settings: IGP 768p Minimum | Low 1080p Medium | Med 1080p Ultra | High 4K Ultra

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

As with a lot of the CPU benchmarks, the overclocked 2600K sits between the 2600K at stock and the 7700K, at least up to 1080p Ultra. At 4K Ultra, the OC and 7700K are essentially the same performance, but the 2600K at stock certainly has a lower 95th percentile result.



Gaming: Final Fantasy XV

Upon arriving on PC earlier this year, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console, the fruit of a successful partnership with NVIDIA, with hardly any hint of the troubles during Final Fantasy XV's original production and development.

In preparation for the launch, Square Enix opted to release a standalone benchmark that they have since updated. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues.

Square Enix has patched the benchmark with custom graphics settings and bugfixes to be much more accurate in profiling in-game performance and graphical options. For our testing, we run the standard benchmark with a FRAPS overlay, taking a 6 minute recording of the test.

AnandTech CPU Gaming 2019 Game List
Game: Final Fantasy XV | Genre: JRPG | Release Date: Mar 2018 | API: DX11
Settings: IGP 720p Standard | Low 1080p Standard | Med 4K Standard | High 8K Standard

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

For Final Fantasy, all chips performed essentially the same from 4K upwards (the OC run failed at 8K for some reason), but at 1080p the OC chip sits almost exactly midway between the 2600K and the 7700K at stock.



Gaming: Civilization 6 (DX12)

Originally penned by Sid Meier and his team, the Civ series of turn-based strategy games are a cult classic, and many an excuse for an all-nighter trying to get Gandhi to declare war on you due to an integer overflow. Truth be told, I never actually played the first version, but every edition from the second to the sixth, including the fourth as voiced by the late Leonard Nimoy, is a game that is easy to pick up, but hard to master.

Benchmarking Civilization has always been somewhat of an oxymoron – for a turn based strategy game, the frame rate is not necessarily the important thing here, and in the right mood, something as low as 5 frames per second can be enough. With Civilization 6 however, Firaxis went hardcore on visual fidelity, trying to pull you into the game. As a result, Civilization can be taxing on graphics and CPUs as we crank up the details, especially in DirectX 12.

Perhaps a more poignant benchmark would be during the late game, when in the older versions of Civilization it could take 20 minutes to cycle around the AI players before the human regained control. The new version of Civilization has an integrated ‘AI Benchmark’, although it is not yet part of our benchmark portfolio, due to technical reasons we are trying to solve. Instead, we run the graphics test, which provides an example of a mid-game setup at our settings.

AnandTech CPU Gaming 2019 Game List
Game: Civilization VI | Genre: RTS | Release Date: Oct 2016 | API: DX12
Settings: IGP 1080p Ultra | Low 4K Ultra | Med 8K Ultra | High 16K Low

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

Civilization is a game that isn't frame rate driven per se, and having all the settings turned up helps a lot. However, even at 4K there's a difference in performance between the 2600K and the 7700K when both are at stock, which gets halved when the 2600K is overclocked.



Gaming: Ashes Classic (DX12)

Seen as the holy child of DirectX12, Ashes of the Singularity (AoTS, or just Ashes) has been the first title to actively explore as many of the DirectX12 features as it possibly can. Stardock, the developer behind the Nitrous engine which powers the game, has ensured that the real-time strategy title takes advantage of multiple cores and multiple graphics cards, in as many configurations as possible.

As a real-time strategy title, Ashes is all about responsiveness during both wide-open shots and concentrated battles. With DirectX12 at the helm, the ability to implement more draw calls per second allows the engine to work with substantial unit depth and effects that other RTS titles had to rely on combined draw calls to achieve, making some combined unit structures ultimately very rigid.

Stardock clearly understands the importance of an in-game benchmark, ensuring that such a tool was available and capable from day one; with all the additional DX12 features in use, being able to characterize how they affected the title was important for the developer. The in-game benchmark performs a four-minute fixed-seed battle environment with a variety of shots, and outputs a vast amount of data to analyze.

For our benchmark, we run Ashes Classic: an older version of the game before the Escalation update. The reason is that it is easier to automate, without a splash screen, but it still has a strong visual fidelity to test.

Ashes has dropdown options for MSAA, Light Quality, Object Quality, Shading Samples, Shadow Quality, Textures, and separate options for the terrain. There are several presets, from Very Low to Extreme: we run our benchmarks at the settings below, and take the frame-time output for our average and percentile numbers.

AnandTech CPU Gaming 2019 Game List
Game: Ashes: Classic | Genre: RTS | Release Date: Mar 2016 | API: DX12
Settings: IGP 720p Standard | Low 1080p Standard | Med 1440p Standard | High 4K Standard

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

For Ashes we see performance differences between the chips all the way up to 4K, where the 7700K and the overclocked 2600K perform almost the same. From 1440p down, however, the OC doesn't quite make the grade against the 7700K, showing the difference between the two architectures and platforms.



Gaming: Strange Brigade (DX12)

Strange Brigade is based in 1903 Egypt and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative-centric with a wide variety of different levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.

The game supports both the DirectX 12 and Vulkan APIs and houses its own built-in benchmark, which offers various options up for customization including textures, anti-aliasing, reflections, draw distance, and even allows users to enable or disable motion blur, ambient occlusion, and tessellation, among others. AMD has previously boasted about Strange Brigade's Vulkan implementation, which offers scalability for AMD multi-graphics-card configurations.

AnandTech CPU Gaming 2019 Game List
Game: Strange Brigade | Genre: FPS | Release Date: Aug 2018 | API: DX12
Settings: IGP 720p Low | Low 1080p Medium | Med 1440p High | High 4K Ultra

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

On Strange Brigade, all the chips (apart from the 2600K at stock) perform the same at 1080p and above, meaning that there's no reason to upgrade if this is the only title you play.



Gaming: Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark. The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data.

There are no presets for the graphics options on GTA, allowing the user to adjust options such as population density and distance scaling on sliders, but others such as texture/shadow/shader/water quality from Low to Very High. Other options include MSAA, soft shadows, post effects, shadow resolution and extended draw distance options. There is a handy option at the top which shows how much video memory the options are expected to consume, with obvious repercussions if a user requests more video memory than is present on the card (although there’s no obvious indication if you have a low end GPU with lots of GPU memory, like an R7 240 4GB).

AnandTech CPU Gaming 2019 Game List
Game: Grand Theft Auto V | Genre: Open World | Release Date: Apr 2015 | API: DX11
Settings: IGP 720p Low | Low 1080p High | Med 1440p Very High | High 4K Ultra

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

We see performance parity between the chips at 4K, but for all other resolutions and settings, the OC chip still can't quite reach the level of the 7700K, often sitting midway between the 7700K at stock and the 2600K at stock.



Gaming: Far Cry 5

The latest title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration.

Far Cry 5 does support Vega-centric features with Rapid Packed Math and Shader Intrinsics. Far Cry 5 also supports HDR (HDR10, scRGB, and FreeSync 2). We use the in-game benchmark for our data, and report the average/minimum frame rates.

AnandTech CPU Gaming 2019 Game List
Game: Far Cry 5 | Genre: FPS | Release Date: Mar 2018 | API: DX11
Settings: IGP 720p Low | Low 1080p Normal | High 4K Ultra

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, and High settings]

As with some other titles, there is parity at 4K, but below that resolution there's a large gap between the 2600K and 7700K that an overclock doesn't quite fill.



Gaming: Shadow of the Tomb Raider (DX12)

The latest instalment of the Tomb Raider franchise does less rising and lurks more in the shadows with Shadow of the Tomb Raider. As expected, this action-adventure follows Lara Croft, the main protagonist of the franchise, as she muscles through the Mesoamerican and South American regions looking to stop a Mayan apocalypse she herself unleashed. Shadow of the Tomb Raider is the direct sequel to the previous Rise of the Tomb Raider, developed by Eidos Montreal and Crystal Dynamics and published by Square Enix, hitting shelves across multiple platforms in September 2018. This title effectively closes the Lara Croft Origins story and has received critical acclaim since its release.

The integrated Shadow of the Tomb Raider benchmark is similar to that of the previous game Rise of the Tomb Raider, which we have used in our previous benchmarking suite. The newer Shadow of the Tomb Raider uses DirectX 11 and 12, with this particular title being touted as having one of the best implementations of DirectX 12 of any game released so far.

AnandTech CPU Gaming 2019 Game List
Game: Shadow of the Tomb Raider | Genre: Action | Release Date: Sep 2018 | API: DX12
Settings: IGP 720p Low | Low 1080p Medium | Med 1440p High | High 4K Highest

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

Unfortunately our overclocked system was having issues with the SoTR test, but our results show that from 1440p onwards, there should be good parity between the chips.



Gaming: F1 2018

Aside from keeping up-to-date on the Formula One world, F1 2017 added HDR support, which F1 2018 has maintained; otherwise, we should see any newer versions of Codemasters' EGO engine find its way into F1. Graphically demanding in its own right, F1 2018 keeps a useful racing-type graphics workload in our benchmarks.

We use the in-game benchmark, set to run on the Montreal track in the wet, driving as Lewis Hamilton from last place on the grid. Data is taken over a one-lap race.

AnandTech CPU Gaming 2019 Game List
Game: F1 2018 | Genre: Racing | Release Date: Aug 2018 | API: DX11
Settings: IGP 720p Low | Low 1080p Med | Med 4K High | High 4K Ultra

All of our benchmark results can also be found in our benchmark engine, Bench.

[Graphs: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings]

F1 2018 shows that the overclocked 2600K and the 7700K are basically equal from 1080p and higher.



Power Consumption

One of the risk factors in overclocking is driving the processor beyond its ideal point of power and performance. Processors are typically manufactured with a particular sweet spot in mind: the peak efficiency of a processor will be at a particular voltage and particular frequency combination, and any deviation from that mark will result in expending extra energy (usually for better performance).

When Intel first introduced the Skylake family, this efficiency point was a key element of its product portfolio. Some CPUs would test and detect the best efficiency point at POST, making sure that when the system was idle, the least power was drawn. When the CPU is actually running code, however, the system raises the frequency and voltage in order to offer performance away from that peak efficiency point. If a user pushes that frequency a lot higher, the voltage needs to increase, and power consumption rises.

So when overclocking a processor, either one of the newer ones or even an old processor, the user ends up expending more energy for the same workload, albeit to get the workload performed faster as well. For our power testing, we took the peak power consumption values during an all-thread version of POV-Ray, using the CPU internal metrics to record full SoC power.
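For readers wanting to replicate something similar at home, the sketch below samples Intel's RAPL package-energy counter on Linux and differentiates it into watts; this is an assumption-laden stand-in, as our own figures come from Windows-side tooling reading the CPU's internal metrics.

import time

# Sample the RAPL cumulative package-energy counter (microjoules) and
# convert deltas into watts, keeping the peak. A Linux-side sketch only;
# the counter wraps periodically, which this sketch ignores.
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

peak_w = 0.0
for _ in range(60):                        # sample for roughly a minute
    e0, t0 = read_energy_uj(), time.perf_counter()
    time.sleep(1.0)
    e1, t1 = read_energy_uj(), time.perf_counter()
    peak_w = max(peak_w, (e1 - e0) / 1e6 / (t1 - t0))  # uJ -> J -> W

print(f"Peak package power: {peak_w:.1f} W")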

Power (Package), Full Load

The Core i7-2600K was built on Intel’s 32nm process, while the i7-7700K and i7-9700K were built on variants of Intel’s 14nm process family. These latter two, as shown in the benchmarks in this review, have considerable performance advantages due to microarchitectural, platform, and frequency improvements that the more efficient process node offers. They also have AVX2, which draws a lot of power in our power test.

In our peak power results graph, we see the Core i7-2600K at stock (3.5 GHz all-core) hitting only 88 W, while the Core i7-7700K at stock (4.3 GHz all-core) comes in at 95 W. These results are both respectable, however adding the overclock to the 2600K, to hit 4.7 GHz all-core, shows how much extra power is needed. At 116 W, the 34% overclock is consuming 31% more power (for 24% more performance) compared to the 2600K at stock.
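As a quick sanity check on those percentages, using the frequencies and package power figures from our graph:

# Checking the overclock arithmetic quoted above.
stock_ghz, oc_ghz = 3.5, 4.7
stock_w, oc_w = 88.0, 116.0

print(f"Frequency uplift: {(oc_ghz / stock_ghz - 1) * 100:.1f}%")  # 34.3%
print(f"Extra power: {(oc_w / stock_w - 1) * 100:.1f}%")           # 31.8%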

The Core i7-9700K, with eight full cores, goes above and beyond this, drawing 124W at stock. While Intel’s power policy didn’t change between the generations, the way it ended up being interpreted did, as explained in our article here:

Why Intel Processors Draw More Power Than Expected: TDP and Turbo Explained

You can also learn about power control on Intel’s latest CPUs in our original Skylake review:

The Intel Skylake Mobile and Desktop Launch, with Architecture Analysis



Comparing the Quad Cores: CPU Tests

As a straight up comparison between what Intel offered in terms of quad cores, here’s an analysis of all the results for the 2600K, 2600K overclocked, and Intel’s final quad-core with HyperThreading chip for desktop, the 7700K.

On our CPU tests, the Core i7-2600K when overclocked to a 4.7 GHz all-core frequency (and with DDR3-2400 memory) offers anywhere from a 10-24% increase in performance over the stock settings with Intel's maximum supported memory frequency. Users liked the 2600K because of this: there were sizable gains to be had, and Intel's immediate replacements to the 2600K didn't offer the same level of boost or difference in performance.

However, when compared to the Core i7-7700K, Intel’s final quad-core with HyperThreading processor, users were able to get another 8-29% performance on top of that. Depending on the CPU workload, it is very easy to see how a user could justify getting the latest quad-core processor and feeling the benefits in more modern-day workloads, such as rendering or encoding, especially given how the gaming market has turned more into a streaming culture. For the more traditional workflows, such as PCMark or our legacy tests, gains of only 5-12% are seen, as these tests lean on the sort of workloads that were more relevant back in the 2600K's day.

As for the Core i7-9700K, which has eight full cores and now sits in the spot of Intel’s best Core i7 processor, performance gains are much more tangible, almost double in a lot of cases against an overclocked Core i7-2600K (and more than double against one at stock).

The CPU case is clear: Intel’s last quad core with HyperThreading is an obvious upgrade for a 2600K user, even before you overclock it, and the 9700K, which launched at almost the same price, is definitely an easy sell. The gaming side of the equation isn’t so rosy though.

Comparing the Quad Cores: GPU Tests

Modern games today run at higher resolutions and quality settings than when the Core i7-2600K first launched, and bring new physics features, new APIs, and new gaming engines that can take advantage of the latest advances in CPU instructions as well as CPU-to-GPU connectivity. For our gaming benchmarks, we test with four sets of settings on each game (720p, 1080p, 1440p-4K, and 4K+) using a GTX 1080, one of the previous generation's high-end gaming cards, and something that a number of Core i7 users might own for high-end gaming.

When the Core i7-2600K was launched, 1080p gaming was all the rage. I don’t think I purchased a monitor bigger than 1080p until 2012, and before then I was clan gaming on screens that could have been as low as 1366x768. The point here is that with modern games at older resolutions like 1080p, we do see a sizeable gain when the 2600K is overclocked. A 22% gain in frame rates from a 34% overclock sounds more than reasonable to any high-end focused gamer. Intel only managed to improve on that by 12% over the next few years to the Core i7-7700K, relying mostly on frequency gains. It’s not until the 9700K, with more cores and running games that actually know what to do with them, that we see another jump up in performance.

However, all those gains are muted at higher resolution settings, such as 1440p. Going from an overclocked 2600K to a brand new 9700K only gives a 9% increase in frame rates for modern games. At an enthusiast 4K setting, the results across the board are almost equal. As resolutions get higher, even with modern physics, instructions, and APIs, the bulk of the workload is still on the GPU, and even the Core i7-2600K is powerful enough for it. There is the odd title where having the newer chip helps a lot more, but it’s in the minority.

That is, at least on average frame rates. Modern games and modern testing methods now test percentile frame rates, and the results are a little different.

Here the results look a little worse for the Core i7-2600K and a bit better for the Core i7-9700K, but on the whole the broad picture is the same for percentile results as it is for average frame rates. In the individual results, we see some odd outliers, such as Ashes of the Singularity, where a stock 2600K was 15% down on percentiles at 4K yet the 9700K was only 6% ahead of an overclocked 2600K; but as with the average frame rates, it is really title dependent.



Upgrading from an Intel Core i7-2600K: Yes

Back in 2010-2011, life was simple. We were relishing benchmarks like CineBench R10 and SuperPI, and no-one had even thought of trying to transcode video on any sort of scale. In 2019, the landscape has changed: gamers gonna stream, designers gonna design, scientists gonna simulate, and emulators gonna emulate. The way that software is designed has changed substantially as well, with more care taken for memory allocations, multiple cores and threads, and with fast storage in mind. Compilers are smarter too, and all the optimizations for the older platforms are in those code bases.

We regularly speak to CPU architects that describe how they build new processors for the next generation: by analyzing modern workload requirements. In a future of machine learning, for example, we’re now seeing hardware on mobile processors dedicated to accelerating neural networks for things like smartphone photography. (It’s interesting that smartphone SoCs today, in day-to-day use, are arguably more diverse than desktops in that regard.)

Ultimately, benchmarks have changed too. What we tested back in 2011 in our Core i7-2600K review was indicative of the way people were using their computers then, and in 2019 we are testing how people are using their computers today. On some level, one expects that what would have been the balance of compute/storage/resources back then might have adjusted, and as a result, older parts may perform better or worse than expected.

For this review, I wanted to compare an eternal idol for enthusiast desktop computing with its more modern counterparts. The Sandy Bridge Core i7-2600K that was released in 2011 was an enthusiast's dream: significantly faster than the previous generation, priced right, and offering a substantial performance boost when overclocked. The fact that it overclocked well was the crux of its staying power: if users were seeing 20-40%+ performance from an overclock and some fast memory, then the several years of Intel offering baseline 3-8% performance increases were scoffed at, and users did not upgrade.


It's a Core i7 Family Photo

The Core i7-2600K was a quad core processor with HyperThreading. Intel launched five more families of Core i7 that were also quad core with HyperThreading: the Core i7-3770K, i7-4770K, i7-5775C, i7-6700K, and i7-7700K, before it moved up to six cores (with HT) in the 8700K and eight cores (no HT) in the 9700K. Each of those generations of quad cores offered slightly more frequency, sometimes new instructions, sometimes better transistor density, sometimes better graphics, and sometimes a better platform.

Features like new instructions, better integrated graphics, or the platform are valid reasons to push an upgrade, even if the raw performance gain in most tasks is minor. Moving to PCIe 3.0 for graphics, or moving to DDR4 to access higher capacity memory modules, or shifting to NVMe storage with more diverse chipset support all helped users that bypassed the popular 2600K.

In this review, we tested the Core i7-2600K at Intel’s recommended release settings (known as ‘stock’), and an overclocked Core i7-2600K, pushing up from 3.5 GHz all-core to 4.7 GHz all-core, and with faster memory. For comparison to newer CPUs, we chose the Core i7-7700K, Intel’s final Core i7 quad-core for the desktop, representing the best Intel has offered in a quad-core with HT package, and the Core i7-9700K, the latest high-end Core i7 processor.

The results from our testing paint an interesting picture, and as a result so do our conclusions. Our CPU testing was quite clear: in almost every test, the overclock on the 2600K was only able to halve the deficit between the 7700K and the 2600K when both were run at stock. Whenever the overclock gave 20% extra performance, the 7700K was another 20% ahead. The only benchmarks that differed were those with AVX2 code paths, where the 7700K had a massive lead because it supports AVX2. In all our CPU tests, the Core i7-9700K by comparison blew them all out of the water.

For anyone still using a Core i7-2600K for CPU testing, even when overclocked, it’s time to feel the benefits of an upgrade.

 

The GPU testing had a different result. From 2011 to 2019, enthusiast gamers have moved from 1080p in one of two directions: higher resolutions or higher framerates. The direction moved depends on the type of game played, and modern game engines are geared up to cater for both, and have been optimized for the latest hardware with the latest APIs.

For users going up in resolution, to 4K and beyond, the i7-2600K when overclocked performs just as well as the latest Core i7-9700K. The stock 2600K is a little behind, but not noticeably so unless you drill down into specific titles. The overclocked Core i7-2600K is still a great chip for high resolution 60 FPS gaming.

For users staying at 1080p (or 1440p) but looking at high frame rates to drive higher refresh rate displays, there is more of a tangible benefit here. Newer games on modern APIs can use more threads, and the higher number of draw calls required per frame (and for more frames) can be driven better with the latest Core i7 hardware. The Core i7-7700K gives a good boost, which can be bettered with the full eight cores of the Core i7-9700K. Both of these chips can be overclocked too, which we’ve not covered here.

The Bottom Line

Back during 2011 and 2012, I was a competitive overclocker, and my results were focused around using the Core i7-2600K as the base for pushing my CPU and GPUs to the limits. The day-to-day performance gains for any of my CPU or GPU tests were tangible, not only for work but also for gaming at 1080p.

Fast forward to 2019, and there are only one or two reasons to stick with that old system, even when overclocked. The obvious reason is cost: if you can't afford an upgrade, then that's a very legitimate reason not to, and I hope you're still having fun with it. The second reason not to upgrade is that the only thing you do, as an enthusiast gamer with a modern day graphics card, is game at 4K.

There are a million other reasons to upgrade, even to the Core i7-7700K: anything CPU related, memory support (capacity and speed), storage support, newer chipsets, newer connectivity standards, AVX2, PCIe 3.0, multi-tasking, gaming and streaming, NVMe. Or if you’re that way inclined, the RGB LED fad of modern components.

Back in my day, we installed games from DVDs and used cold cathodes for RGB.


Picture from 2006? – Battlefield 2 on a CRT.
Running an ATI X1900XTX on an AMD Athlon 3400+
