Original Link: https://www.anandtech.com/show/16343/intel-core-i710700-vs-core-i710700k-review-is-65w-comet-lake-an-option



Over the years, Intel’s consumer processor lineup has featured its usual array of overclocking ‘K’ models, and more recently the ‘F’ series that come without integrated graphics. The bulk of the lineup, however, is still made up of the versions without a suffix, the ‘nones’, like the Core i7-10700 in this review. These processors sit in the middle of the road, almost always having a 65 W TDP compared to the 91-125 W overclockable models, but also having integrated graphics, unlike the F family. What makes things interesting is pairing one of these 65 W parts against its 125 W overclocking counterpart, and asking whether the extra base and turbo frequency is actually worth the money in an era where motherboards don't seem to care about power.

Intel’s Core i7-10700 at 65 W: Is It Really 65 W?

The understanding of the way that Intel references its TDP (thermal design power) values has gone through a mini-revolution in the last few years. We have had almost a decade of quad-core processors at around 90 W and 65 W, and most of them would never reach these numbers even under turbo modes - for example, the Core i5-6600K was rated at 91 W, but peak power draw was only 83 W. This was the norm for a while, until Intel had to start boosting the core count. As we have slowly gone up in core count, from 4 to 6 to 8 and now 10, these numbers have come to seem almost arbitrary.

The reason comes down to what TDP really is. In the past, we used to assume that the peak power consumption of a processor was its TDP rating – after all, a ‘thermal design power’ figure for a processor was almost worthless if it didn’t account for the peak power dissipation. What makes Intel’s situation different (or confusing, depending on how you want to look at it) is that the company defines its TDP in the context of a 'base' frequency. The TDP is the maximum power under a sustained workload for which the base frequency is the minimum frequency guarantee. Intel defines a sustained workload as one in which the 'turbo budget' has expired, and the processor will achieve its best frequency above base frequency (but not turbo modes).

The point about ‘not turbo’ is the key element here. Intel’s TDP ratings are only in effect for the base frequency, not the turbo frequency. If a PC is built with a maximum power dissipation in mind, allowing a processor to turbo above that power might have catastrophic consequences for the thermal performance of that system. The other angle is that Intel never quotes the turbo power levels (also known as Power Limit 2, or PL2) alongside the other specifications, although they are technically in the specification documents when they get released.

On top of all this, motherboard manufacturers also get a say in how a processor performs. Because turbo power is only an optional suggestion from Intel, technically Intel will accept any value for the ceiling of the turbo power, and accept turbo under any circumstances if the motherboard manufacturer wants it. Motherboard manufacturers overengineer their motherboards to support longer turbo times (or overclocking), and so they will often ignore these Intel recommended values for PL2, allowing the processor to turbo harder for longer, and in a lot of cases of premium motherboards, indefinitely.

So why does all this matter with respect to this review? Well, my key comparison in this review is our new processor, the Core i7-10700, up against its overclocking counterpart, the Core i7-10700K. Aside from the suffix difference, the K variant has a TDP almost twice as high, and this manifests almost entirely in the base frequency difference.

Intel SKU vs SKU (an homage to Spy vs Spy)

| Intel Core i7-10700K | AnandTech | Intel Core i7-10700 |
|---|---|---|
| 8 C / 16 T | Cores / Threads | 8 C / 16 T |
| 3.8 GHz | Base Frequency | 2.9 GHz |
| 5.1 GHz | Peak Turbo (1-2C) | 4.8 GHz |
| 4.7 GHz | All-Core Turbo | 4.6 GHz |
| 2 x DDR4-2933, up to 128 GB | DRAM Support | 2 x DDR4-2933, up to 128 GB |
| 125 W | TDP / PL1 | 65 W |
| Intel UHD 630 | Integrated Graphics | Intel UHD 630 |
| $374 | Price (1ku) | $323 |

Even though the TDP is 125 W vs 65 W, the peak turbo frequency difference is only +300 MHz, and the all-core turbo difference is only +100 MHz. In contrast, the base frequency difference is +900 MHz, and that is ultimately what the user is paying for. But this base frequency only matters if the motherboard bothers to put a cap on turbo budgets.
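As a quick sanity check on the deltas quoted above, here is a minimal Python sketch that uses only the figures from the SKU comparison. The dictionary layout and the `delta` helper are illustrative names, not anything from Intel's documentation.

```python
# Frequencies in MHz and TDP in watts, taken from the SKU comparison above.
SPECS = {
    "i7-10700K": {"base": 3800, "peak_turbo": 5100, "all_core_turbo": 4700, "tdp_w": 125},
    "i7-10700": {"base": 2900, "peak_turbo": 4800, "all_core_turbo": 4600, "tdp_w": 65},
}


def delta(field):
    """Return how far the K model sits above the non-K model for one field."""
    return SPECS["i7-10700K"][field] - SPECS["i7-10700"][field]


for field in ("base", "peak_turbo", "all_core_turbo"):
    print(f"{field}: +{delta(field)} MHz")
```

The base frequency gap is by far the largest of the three, which is why it only shows up in practice when a motherboard actually enforces the power limit.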

The base frequency is more of a minimum guaranteed frequency than an absolute 'this is what you will get' value under a sustained workload. Intel likes to state that the base frequency is the guarantee, however if a processor can achieve a higher frequency while power limited, it will: if it can stay within that power value at 200 MHz above base frequency, it will run at the higher frequency. If this sounds familiar, this is how all AMD Ryzen processors work, however Intel only implements it when turbo is no longer available. This ends up being very processor dependent.

For the turbo, as mentioned, Intel has recommendations for power levels and turbo time in its documentation, however OEMs and motherboard manufacturers are free to ignore them, and routinely do. This is no more obvious than when comparing these two processors. What does this mean for end-users? Well, graphs like this.

Intel Peak Power Draw

The first time I saw these numbers, I was shocked. Why is this cheaper, and supposedly less powerful, version of this silicon running at a higher turbo power in a standard off-the-shelf Intel Z490 motherboard?

Welcome to our review. There’s going to be a lot of discussion on the page where we talk about power, frequency, and the quality of the silicon. There is also some discussion about benchmarking itself, because if we were to take an extreme view of everything, then benchmarking is pointless and I'm out of a job.

The Market

The Core i7-10700 and Core i7-10700K are both members of Intel’s 10th Generation ‘Comet Lake’ Core i7 family. This means they are based on Intel’s latest 14nm process variant (14+++, we think, Intel stopped telling us outright), but are essentially power and frequency optimized versions of Intel’s 6th Generation Skylake Core, except we get eight cores rather than four.

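Intel 10th Gen Comet Lake
Core i9 and Core i7

| AnandTech | Cores | Base Freq | TB2 2C | TB2 nT | TB3 2C | TVB 2C | TVB nT | TDP (W) | IGP | MSRP 1ku |
|---|---|---|---|---|---|---|---|---|---|---|
| Core i9 | | | | | | | | | | |
| i9-10900K | 10C/20T | 3.7 | 5.1 | 4.8 | 5.2 | 5.3 | 4.9 | 125 | 630 | $488 |
| i9-10900KF | 10C/20T | 3.7 | 5.1 | 4.8 | 5.2 | 5.3 | 4.9 | 125 | - | $472 |
| i9-10900 | 10C/20T | 2.8 | 5.0 | 4.5 | 5.1 | 5.2 | 4.6 | 65 | 630 | $439 |
| i9-10900F | 10C/20T | 2.8 | 5.0 | 4.5 | 5.1 | 5.2 | 4.6 | 65 | - | $422 |
| i9-10900T | 10C/20T | 1.9 | 4.5 | 3.7 | 4.6 | - | - | 35 | 630 | $439 |
| i9-10850K | 10C/20T | 3.6 | 5.0 | 4.7 | 5.1 | 5.2 | 4.8 | 125 | 630 | $453 |
| Core i7 | | | | | | | | | | |
| i7-10700K | 8C/16T | 3.8 | 5.0 | 4.7 | 5.1 | - | - | 125 | 630 | $374 |
| i7-10700KF | 8C/16T | 3.8 | 5.0 | 4.7 | 5.1 | - | - | 125 | - | $349 |
| i7-10700 | 8C/16T | 2.9 | 4.7 | 4.6 | 4.8 | - | - | 65 | 630 | $323 |
| i7-10700F | 8C/16T | 2.9 | 4.7 | 4.6 | 4.8 | - | - | 65 | - | $298 |
| i7-10700T | 8C/16T | 2.0 | 4.4 | 3.7 | 4.5 | - | - | 35 | 630 | $325 |

T = Low Power
F = No Integrated Graphics
K = Overclockable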

TB2/TB3 = Intel Turbo Boost 2 (any core in CPU), TB3 (specific core in CPU)
TVB = Thermal Velocity Boost (Spec = 70ºC); routinely ignored by motherboard vendors

Both CPUs are rated to run dual channel memory at DDR4-2933 speeds, and have 16 PCIe 3.0 lanes with support for Intel 400-series chipsets. These are socket LGA1200 processors, and are incompatible with other LGA115x motherboards.

Aside from the power and frequency differences, the other difference is the price: $335 MSRP for the Core i7-10700, and $387 MSRP for the Core i7-10700K. This is a +$52 difference, which is designed to enable better frequencies and overclocking on the K processor. The non-K processor may be shipped with Intel’s 65 W PCG-2015C thermal solution, depending on location, although the first thing you would want to do is to buy something/anything else to cool the processor with, given that it'll peak at 215 W in enthusiast systems.

On the competing side from AMD, the nearest solution is the Ryzen 5 5600X, a 65W version of Zen 3 with two fewer cores but higher IPC, with an MSRP of $300. This does come with a reasonably good default cooler. Our full review of the Ryzen 5 5600X can be found here.

This Review

The goal of this review was initially just to benchmark the Core i7-10700 and see where it fits into the market. As our testing results came into focus, it was clear that we had an interesting comparison on our hands against the Core i7-10700K, which we have also tested. This review covers the difference between the two, focusing primarily on where the i7-10700 lands compared to the competition, along with some of the complexities involved.

Test Setup

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer.

Test Setup

| Platform | CPUs | Motherboard | BIOS | Cooler | Memory |
|---|---|---|---|---|---|
| Intel LGA1200 | Core i9-10900K, Core i9-10850K, Core i7-10700K, Core i7-10700 | ASRock Z490 PG Velocita | P1.50 | TRUE Copper + SST* | Corsair DomRGB 4x8 GB DDR4-2933 |
| AMD AM4 | Ryzen 9 5900X, Ryzen 7 5800X, Ryzen 5 5600X | MSI MEG X570 Godlike | 1.B3 T13 | Noctua NHU-12S SE-AM4 | ADATA 2x32 GB DDR4-3200 |

GPU: Sapphire RX 460 2GB (CPU Tests), NVIDIA RTX 2080 Ti FE (Gaming Tests)
PSU: Corsair AX860i, Corsair AX1200i, Silverstone SST-ST1000-P
SSD: Crucial MX500 2TB
*TRUE Copper used with Silverstone SST-FHP141-VF 173 CFM fans. Nice and loud.

Many thanks to...

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers for CPU and Motherboard Reviews

| Sapphire RX 460 Nitro | NVIDIA RTX 2080 Ti | Crucial SSDs | Corsair PSUs |
| G.Skill DDR4 | ADATA DDR4 | Silverstone Coolers | Noctua Coolers |

A big thanks to ADATA for the AD4U3200716G22-SGN modules for this review. They're currently the backbone of our AMD testing.

Users interested in the details of our current CPU benchmark suite can refer to our #CPUOverload article which covers the topics of benchmark automation as well as what our suite runs and why. We also benchmark much more data than is shown in a typical review, all of which you can see in our benchmark database. We call it ‘Bench’, and there’s also a link on the top of the website in case you need it for processor comparison in the future.

If anyone is wondering why I've written the SKU of the processor on it with a sharpie, as per our lead image, it's because when you're shuffling through a box of them in low light, what is printed on the heatspreader can be difficult to read. With a permanent marker, it is much easier to read at a glance.

Read on for our full review.



Power Consumption

The nature of reporting processor power consumption has become, in part, a dystopian nightmare. Historically the peak power consumption of a processor, as purchased, is given by its Thermal Design Power (TDP, or PL1). For many markets, such as embedded processors, that value of TDP still signifies the peak power consumption. For the processors we test at AnandTech, either desktop, notebook, or enterprise, this is not always the case.

Modern high performance processors implement a feature called Turbo. This allows, usually for a limited time, a processor to go beyond its rated frequency. Exactly how far the processor goes depends on a few factors, such as the Turbo Power Limit (PL2), whether the peak frequency is hard coded, the thermals, and the power delivery. Turbo can sometimes be very aggressive, allowing power values 2.5x above the rated TDP.

AMD and Intel have different definitions for TDP, but broadly speaking they are applied in the same way. The difference comes down to turbo modes, turbo limits, turbo budgets, and how the processors manage that power balance. These topics are 10000-12000 word articles in their own right, and we’ve got a few articles worth reading on the topic.

In simple terms, processor manufacturers only ever guarantee two values which are tied together - when all cores are running at base frequency, the processor should be running at or below the TDP rating. All turbo modes and power modes above that are not covered by warranty. Intel kind of screwed this up with the Tiger Lake launch in September 2020, by refusing to define a TDP rating for its new processors, instead going for a range. Obfuscation like this is a frustrating endeavor for press and end-users alike.

However, for our tests in this review, we measure the power consumption of the processor in a variety of different scenarios. These include full workflows, real-world image-model construction, and others as appropriate. These tests are done as comparative models. We also note the peak power recorded in any of our tests.

Let us start with simply the peak power observed in all our testing, when using our standard ASRock Z490 PG Velocita motherboard. The peak power is the top power value observed across a number of our high-compute tests (rendering, AVX2, compute - whichever is highest). This is a bigger graph with more data than we saw on the front page.

(0-0) Peak Power

The Core i7-10700K, technically the higher performance processor, came in with a 205 W detected peak power draw in our AI Benchmark. The Core i7-10700 by comparison was observed at 215 W during our LINPACK benchmark. Both workloads are heavy on throughput. If we look at some of the other processors for comparison, both of the Core i7 parts use more power than Intel’s 28-core Xeon processors, and match the Xeon W-1200 series parts, although there’s still a way to go to match the Core i9-10900K. AMD’s peak here is 142 W.

Trying to drill down into why our Core i7-10700 was drawing more power at peak, I turned to the per-core loading to see if there was something perhaps more obvious. In this test, we use affinity masks to limit which cores are loaded, and scale through using an AIDA workload. The peaks here are slightly lower than in the graph above due to the different workload, but still show the same discrepancy.
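To illustrate the affinity-mask approach described above, here is a minimal Python sketch. It is not the tooling used for this review: the function names are hypothetical, and the assumption that each core's first logical CPU sits at index `core * logical_per_core` depends on the operating system and should be checked on the target machine.

```python
import os


def affinity_mask(n_cores, logical_per_core=2):
    """Bitmask selecting the first logical CPU of each of the first n_cores cores.

    Assumes logical CPUs are numbered core by core (an assumption, not a rule).
    """
    if n_cores < 1:
        raise ValueError("need at least one core")
    mask = 0
    for core in range(n_cores):
        mask |= 1 << (core * logical_per_core)
    return mask


def cpus_from_mask(mask):
    """Expand a bitmask into the set of logical CPU indices it selects."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}


def pin_current_process(mask):
    """Pin this process to the CPUs in mask, where the OS exposes the call."""
    if hasattr(os, "sched_setaffinity"):  # available on Linux
        os.sched_setaffinity(0, cpus_from_mask(mask))
        return True
    return False


# Scaling from 1 to 8 loaded cores, one mask per step.
for n in range(1, 9):
    print(n, bin(affinity_mask(n)))
```

A per-core power sweep would pin the workload with each mask in turn and record package power at every step.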

Going through the full spectrum of per-core loading, at a quick glance both chips look roughly the same. Technically inside that little green package, the silicon is the same design, and the only difference is going to be what Intel wrote into firmware (what exactly Intel wrote would of course be somewhat contingent on how the processor was binned).

However, if we examine this data in closer detail, we see three distinct stages.

  1. With up to 3 cores loaded, the Core i7-10700K uses more power.
  2. Between 3 cores and 7 cores, the results are identical.
  3. Only at 8 cores does the i7-10700 use more.

For each of these, there is a clear explanation. In the first instance, up to 3 cores, the 10700K draws more power because it is running at higher clocks, up to +300 MHz. This requires more voltage, and power efficiency at that end of the spectrum is quite poor, meaning a lot of power is needed for that last extra bit of frequency.

Then during the 3-7 core loading, because the two processors run at approximately the same frequency, within 100 MHz or so, they perform identically.

It is only at 8-core loading where the 10700 spikes up by an extra 5% or so, despite the 100 MHz deficit. In this instance, I would attribute this to the fact that in the binning process, the Core i7-10700K has to have the better efficiency when all cores are loaded. Technically the i7-10700 would be a lower binned chip, and might not have that quality, which manifests itself when all the cores are loaded (and when all the turbo restrictions are turned off, as with most modern consumer motherboards).

If we look at a workload benchmark that varies in complexity and thread loading, like Agisoft, we see that neither processor actually hits ~150 W, let alone goes north of 200 W. In both cases the processors were mainly at their all-core turbo frequencies (4.6 GHz for the i7-10700, 4.7 GHz for the i7-10700K), briefly jumping to 4.8 GHz or 5.1 GHz respectively for seconds at a time, either in the initial phase or the end phase.

In this graph though, we see that in the initial phase, the Core i7-10700K is ever so slightly higher in power consumption, but when the more heavy area in the second stage comes in, both processors are about equal. Then in the final stage where the power humps up and down, the ‘down’ stages which are more single threaded showcase that the Core i7-10700 uses less power, up to 25% less.

The time difference in this benchmark was under 2%, the equivalent of 40 seconds over 34 minutes.
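The arithmetic behind that figure is simple enough to check directly. A minimal sketch, with `relative_gap` being an illustrative helper name:

```python
def relative_gap(delta_s, total_s):
    """Fraction of the total runtime that the difference represents."""
    return delta_s / total_s


gap = relative_gap(40, 34 * 60)  # 40 seconds over a 34 minute run
print(f"{gap:.2%}")  # 1.96%
```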

But 65 W! Is that figure worthless?

The value ’65 W’ on the Core i7-10700 does seem worthless at this point. If motherboards ignore it, what does that value tell me?

Ultimately, it still confirms what Intel’s warranty to the customer is. If the customer cannot achieve 65 W at base frequency all-core, then Intel has to replace the chip. I’ve never heard of a retail processor being claimed on warranty for the reason of ‘it can't achieve the minimum frequency within the power window’, so make of that what you will.

What that 65 W does do is tell system integrators, especially those that design custom chassis like MSI, ASUS, and GIGABYTE, that they can design a chassis for a 65 W processor and it will work as long as they completely disable turbo. The performance is lower, absolutely, and that just goes to show that for smaller form-factor systems, especially custom systems, the SKU on the box has no bearing on performance, which is why reviews of these units are critical.

I know this answer is deeply unsatisfying. Users have the option of manually adjusting those turbo power limits to be more in line with their build, however those limits will be reset if the BIOS is reset for any reason. As we reported on in our ASUS Z490 ROG motherboard review, ASUS has recently enabled a feature whereby on the first boot of a BIOS reset, the system will ask if the user wants ‘Intel recommended settings’ or ‘ASUS recommended settings’. The Intel side forces a lower turbo, while the ASUS side pushes them higher.

The reason why motherboard manufacturers do these higher values is twofold.

First, Intel’s Turbo power and Turbo time values are *only* recommendations, not requirements.

Second, most consumer motherboards are over-engineered for higher performance. A lot of Intel’s processors would be quite happy with a 4-phase power delivery and a 4-layer motherboard. But motherboard manufacturers add value to their product by using 8 or 12 phase power delivery, with 8 layer motherboards to help with signal integrity, that also allow pushing the memory and frequency higher. Even $80 Biostar H-series motherboards are above Intel’s recommended specifications, because there has to be some form of brand value in the product. Because of these modifications, these motherboards can support higher performance. Given that most users don’t know how to adjust any of the turbo settings, and Intel doesn’t mind if they are adjusted, these vendors change the default values, and you get this sort of situation.

Intel does not care if these turbo values are ignored, only if the frequencies are adhered to. In fact, Intel actively encourages motherboard vendors to exceed these turbo numbers.

If there needs to be a call to action on this, then we can do one of two things.

First, demand from Intel that the recommended turbo power limits are included on the retail box – make these values part of the primary specifications, not hidden away in the secondary documents. I’ve been asking for this for a while, to no avail; Intel has even gone as far as reducing the number of primary specifications for its latest notebook processors, which I have thoroughly disagreed with.

Second, demand from motherboard manufacturers that they are more open about what the default turbo settings are for each processor. This is a bit tougher, because a motherboard manufacturer can change these numbers on a whim from BIOS version to BIOS version. To be honest, the way ASUS has done it is quite interesting, and I would be interested to hear how non-enthusiast system users interpret these options.

To see the difference between ASUS’ settings, here are some values from when we tested the motherboard using the AIDA benchmark:

The difference obviously becomes clear above 4 cores loaded.

Some users might question how reviewers test, when there are shenanigans like these going on. The sad truth is perhaps depressing: Intel’s processors have worked this way for over a decade, and both Intel and motherboard manufacturers have used the fact that the quad-core processors never seriously went above the TDP values in the first place to hide this fact. Over that decade, neither Intel nor the motherboard vendors have ever decided to correct the reviewer base, or their customers. No serious effort has ever been put into dictating that there is a very explicit difference with the manufacturer recommended performance vs motherboard vendor adjusted turbo performance. Part of it would be explained with turbo values on the box, but the essence gets lost when this sort of data starts being hidden.

For anyone looking for a min-max of this effect, then here is the Core i7-10700K in the ASUS motherboard (not our usual motherboard) experiencing a full rendering POV-Ray workload in both 'Intel Suggested' and 'Motherboard Vendor Default' mode, where the latter gives the system infinite turbo time. Note that this is a different motherboard to our normal testing.

As the system loads up at the 3 minute mark, both settings settle at 217 W during an all-core turbo load. After 40 seconds*, the turbo budget expires under Intel's suggested turbo settings, and the system drops down to 125 W, the TDP/PL1 of this processor. From there, rather than running at the 3.8 GHz base, the system runs at 4.0 GHz, due to some extra frequency headroom within the 125 W window. By contrast, the motherboard vendor defaults continue at 4.7 GHz and 217 W forever.

*Note that Intel's turbo time is more of a time multiplied by the power draw of a power virus. So if a workload has less power draw than Intel's test virus, then the time that turbo is enabled is longer. In this case, 28 seconds * 250 W is the likely budget, and as we were running at 217 W, we get an extra 12 seconds or so. Intel uses an Exponentially Weighted Moving Average, so on top of all this, it's not just a simple linear equation deciding what to do.
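To give a feel for why a moving average makes the turbo window depend on the workload, here is a deliberately simplified Python sketch. It is not Intel's algorithm: it assumes turbo ends once an exponentially weighted average of package power climbs to PL1, and the time constant, idle power and step size are illustrative guesses. It should not be expected to reproduce the 40 second figure observed above.

```python
import math


def turbo_seconds(load_w, pl1_w=125.0, tau_s=28.0, idle_w=10.0, dt=0.01, max_s=600.0):
    """Seconds until an EWMA of package power reaches PL1 under a constant load.

    Simplified model with assumed parameters, not Intel's implementation.
    """
    if load_w <= pl1_w:
        return math.inf  # the average can never reach PL1, so turbo never expires
    alpha = 1.0 - math.exp(-dt / tau_s)  # per-step weight for time constant tau_s
    avg = idle_w  # assume the chip was idling before the load started
    t = 0.0
    while avg < pl1_w and t < max_s:
        avg += alpha * (load_w - avg)  # standard EWMA update
        t += dt
    return t


for watts in (250.0, 217.0, 150.0):
    print(f"{watts:.0f} W load: turbo lasts about {turbo_seconds(watts):.1f} s")
```

In this model a lighter load takes longer to pull the average up to PL1, so turbo lasts longer, and the relationship between load and duration is logarithmic rather than linear.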

Intel’s stance here is quite clear: Intel is fully supportive of motherboard vendors changing these values. In the past I’ve had discussions with Intel Fellows on the matter, and they simply suggested that each CPU we review should be tested in several motherboards. Unfortunately no commercial reviewer has that sort of bandwidth, showing perhaps a disconnect in how many resources Intel’s engineers think we have (they have whole performance divisions for this).

Most users end up with the out-of-the-box experience, i.e. the peak power and the flat line, unless they buy a small form factor pre-built system. We’ve always tested processors with the typical out of the box experience with a high-end motherboard, which means that we will always get that infinite turbo performance characteristic on the processors that support it. We’ve never felt the need to limit the turbo values, as ultimately if we were to report significantly lower performance metrics than what most users see in their own systems, there would be a lot of explaining to do. Personally I try to benchmark a processor in a system as tweak-free as possible (the only tweak being to ensure that the fans are up on high for sufficient cooling). Our CPU benchmarking strategy is such that a full generation is tested in the same high-end enthusiast motherboard, minimizing differences within a CPU family. The best way to compare motherboard-to-motherboard would be to look at our motherboard reviews, and if you have a specific model in mind, please lodge a request, either via email or Twitter.

There is a slight difference to all of this, with some of the more commercial and enterprise motherboards, like the Q series or W series (and sometimes the H series). Because of the markets these go into, depending on the motherboard company, they will often adhere to Intel’s recommendations strictly. This is why users should beware of CPU reviews that compare two models in the same family on different motherboards, especially if one is in an H-series and another in a Z-series. For more enterprise products, such as Xeon W, Intel has locked availability to specific motherboards for those parts, so we use those instead. We have an upcoming review of a pair of W-1200 Comet Lake series processors in the pipeline which shows this, or you can read our motherboard review here which has some of this data already.

I would add a personal comment about memory support as well. Intel’s specifications for these processors are DDR4-2933, but motherboard vendors obviously create boards to support DDR4-3600+. A large part of the tech audience complains when DDR4-2933 memory is used, saying that any sane user can buy higher speed memory for the same price, and use that for extra free performance. We test at DDR4-2933 with JEDEC settings with Comet Lake, as the difference here is that DDR4-2933 is where Intel’s warranty comes into the equation. The turbo values that manufacturers set on the motherboards are still in warranty (I’ve asked Intel, they confirmed that their values for turbo are solely ‘recommendations'), whereas running beyond DDR4-2933 would be outside warranty. It would also be overclocking the memory, and if we're overclocking the memory, then why aren't we also overclocking the CPU? (That was a rhetorical question, the obvious answer is that we're not.) There have been plenty of comments about this over the years, enough for me to go make a video explaining it. Apologies if you disagree, but this is a hill I'm willing to die on. Please watch the video to learn more.

If you’re of the opinion that this is all a mess, you’re not the only one. Congratulations on peeking through a small window of the tech reviewer life! There’s a visitor’s book down below.



CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on if it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate to how quick an access between two cores can happen.

When we first reviewed the 10-core Comet Lake processors, we noticed that a core (or two) seemed to take slightly longer to ping/pong than the others. These two parts are both derived from the 10-core silicon but with two cores disabled, and we still see a pattern of some cores having additional latency. The ring on the 8-core parts still acts like a 10-core ring, but it all depends on which cores were disabled.

Frequency Ramping

Both AMD and Intel over the past few years have introduced features to their processors that shorten the time it takes a CPU to move from idle into a high powered state. This means that users can get peak performance quicker, but the biggest knock-on effect is on battery life in mobile devices, especially if a system can turbo up quickly and turbo down quickly, ensuring that it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One of the issues though with this technology is that sometimes the adjustments in frequency can be so fast, software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
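The sampling problem described above can be shown with a toy example. The sketch below uses a made-up frequency trace (a 300 microsecond boost inside an otherwise idle period) and polls it at two different rates. The trace, the numbers and the function names are all hypothetical.

```python
def freq_mhz(t_us):
    """Hypothetical trace: a 300 us boost to 4800 MHz inside an 800 MHz idle period."""
    return 4800 if 1000 <= t_us < 1300 else 800


def observed_peak(period_us, duration_us=5000, offset_us=500):
    """Highest frequency seen when polling the trace every period_us microseconds."""
    return max(freq_mhz(t) for t in range(offset_us, duration_us, period_us))


print("polling every 1 ms:", observed_peak(1000), "MHz")
print("polling every 10 us:", observed_peak(10), "MHz")
```

With millisecond polling, every sample lands outside the boost window and the probe only ever reports the idle clock. With 10 microsecond polling the boost is caught.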

We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.

Both processors ramp from idle to full turbo in about six milliseconds, well within a single frame of standard gaming.



CPU Tests: Office and Science

Our previous set of ‘office’ benchmarks have often been a mix of science and synthetics, so this time we wanted to keep our office section purely on real world performance.

Agisoft Photoscan 1.3.3: link

The concept of Photoscan is about translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the final 3D model in both spatial accuracy and texturing accuracy. The algorithm has four stages, with some parts of the stages being single-threaded and others multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.

For the update to version 1.3.3, the Agisoft software now supports command line operation. Agisoft provided us with a set of new images for this version of the test, and a python script to run it. We’ve modified the script slightly by changing some quality settings for the sake of the benchmark suite length, as well as adjusting how the final timing data is recorded. The python script dumps the results file in the format of our choosing. For our test we obtain the time for each stage of the benchmark, as well as the overall time.

(1-1) Agisoft Photoscan 1.3, Complex Test

The 10700K takes a small lead.

Application Opening: GIMP 2.10.18

First up is a test using a monstrous multi-layered xcf file to load GIMP. While the file is only a single ‘image’, it has so many high-quality layers embedded it was taking north of 15 seconds to open and to gain control on the mid-range notebook I was using at the time.

What we test here is the first run - normally on the first time a user loads the GIMP package from a fresh install, the system has to configure a few dozen files that remain optimized on subsequent opening. For our test we delete those configured optimized files in order to force a ‘fresh load’ each time the software is run. As it turns out, GIMP does optimizations for every CPU thread in the system, which requires that higher thread-count processors take a lot longer to run.

We measure the time taken from calling the software to be opened, and until the software hands itself back over to the OS for user control. The test is repeated for a minimum of ten minutes or at least 15 loops, whichever comes first, with the first three results discarded.
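As a rough sketch of the loop logic described above (not our actual harness; the function and parameter names are hypothetical), the stop conditions and the discarded warm-up runs could be expressed like this:

```python
import time
from statistics import mean


def timed_loops(launch, budget_s=600.0, max_loops=15, discard=3,
                clock=time.perf_counter):
    """Time `launch()` repeatedly and average the runs after the warm-up.

    Stops as soon as either the time budget is spent or `max_loops` runs
    have completed, whichever comes first. `launch` is a stand-in for
    "open the application and wait until it hands control back".
    """
    samples = []
    start = clock()
    while len(samples) < max_loops and clock() - start < budget_s:
        t0 = clock()
        launch()
        samples.append(clock() - t0)

    kept = samples[discard:]          # drop the first few runs
    return mean(kept) if kept else None
```

The `clock` argument is only there so the logic can be checked with a fake timer; in normal use the default wall-clock timer is enough.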

(1-2) AppTimer: GIMP 2.10.18

The 10700K takes a small lead.

Science

In this version of our test suite, all the science focused tests that aren’t ‘simulation’ work are now in our science section. This includes Brownian Motion, calculating digits of Pi, molecular dynamics, and for the first time, we’re trialing an artificial intelligence benchmark, both inference and training, that works under Windows using python and TensorFlow.  Where possible these benchmarks have been optimized with the latest in vector instructions, except for the AI test – we were told that while it uses Intel’s Math Kernel Libraries, they’re optimized more for Linux than for Windows, and so it gives an interesting result when unoptimized software is used.

3D Particle Movement v2.1: Non-AVX and AVX2/AVX512

This is the latest version of this benchmark designed to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. This involves randomly moving particles in a 3D space using a set of algorithms that define random movement. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in.

The initial version of v2.1 is a custom C++ binary of my own code, and flags are in place to allow for multiple loops of the code with a custom benchmark length. By default this version runs six times and outputs the average score to the console, which we capture with a redirection operator that writes to file.

For v2.1, we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere. According to Jim Keller, there are only a couple dozen or so people who understand how to extract the best performance out of a CPU, and this guy is one of them. To keep things honest, AMD also has a copy of the code, but has not proposed any changes.

The 3DPM test is set to output millions of movements per second, rather than time to complete a fixed number of movements.
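To illustrate what a ‘movements per second’ metric means, here is a minimal Python sketch. It is not the 3DPM code and does not use the algorithms from the thesis: it assumes a single, simple movement rule (a unit step in a uniformly random direction), and every name in it is hypothetical.

```python
import math
import random
import time


def random_unit_step(rng):
    """Return a unit-length step in a uniformly random 3D direction."""
    z = rng.uniform(-1.0, 1.0)              # uniform height on the sphere
    phi = rng.uniform(0.0, 2.0 * math.pi)   # uniform angle around the axis
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def move_particles(n_particles, n_steps, seed=0):
    """Move every particle n_steps times; return positions and Mmoves/s."""
    rng = random.Random(seed)
    particles = [[0.0, 0.0, 0.0] for _ in range(n_particles)]

    start = time.perf_counter()
    for p in particles:
        for _ in range(n_steps):
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero

    rate = n_particles * n_steps / elapsed / 1e6      # millions of moves/s
    return particles, rate
```

Reporting a rate rather than a time means the score keeps scaling with faster hardware instead of bunching up near zero seconds.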

(2-1) 3D Particle Movement v2.1 (non-AVX)(2-2) 3D Particle Movement v2.1 (Peak AVX)

The 10700K takes a small lead (this is going to be a recurring theme).

y-Cruncher 0.78.9506: www.numberworld.org/y-cruncher

If you ask anyone what sort of computer holds the world record for calculating the most digits of pi, I can guarantee that a good portion of those answers might point to some colossus super computer built into a mountain by a super-villain. Fortunately nothing could be further from the truth – the computer with the record is a quad socket Ivy Bridge server with 300 TB of storage. The software that was run to get that was y-cruncher.

Built by Alex Yee over the last part of a decade and some more, y-Cruncher is the software of choice for calculating billions and trillions of digits of the most popular mathematical constants. The software has held the world record for Pi since August 2010, and has broken the record a total of 7 times since. It also holds records for e, the Golden Ratio, and others. According to Alex, the program runs around 500,000 lines of code, and he has multiple binaries each optimized for different families of processors, such as Zen, Ice Lake, Skylake, all the way back to Nehalem, using the latest SSE/AVX2/AVX512 instructions where they fit in, and then further optimized for how each core is built.

For our purposes, we’re calculating Pi, as it is more compute bound than memory bound. In single thread mode we calculate 250 million digits, while in multithreaded mode we go for 2.5 billion digits. That 2.5 billion digit value requires ~12 GB of DRAM, and so is limited to systems with at least 16 GB.

(2-3) yCruncher 0.78.9506 ST (250m Pi)(2-4) yCruncher 0.78.9506 MT (2.5b Pi)

NAMD 2.13 (ApoA1): Molecular Dynamics

One of the popular science fields is modeling the dynamics of proteins. By looking at how the energy of active sites within a large protein structure changes over time, scientists behind the research can calculate required activation energies for potential interactions. This becomes very important in drug discovery. Molecular dynamics also plays a large role in protein folding, and in understanding what happens when proteins misfold, and what can be done to prevent it. Two of the most popular molecular dynamics packages in use today are NAMD and GROMACS.

NAMD, or Nanoscale Molecular Dynamics, has already been used in extensive Coronavirus research on the Frontera supercomputer. Typical simulations using the package are measured in how many nanoseconds per day can be calculated with the given hardware, and the ApoA1 protein (92,224 atoms) has been the standard model for molecular dynamics simulation.

Luckily the compute can home in on a typical ‘nanoseconds-per-day’ rate after only 60 seconds of simulation, however we stretch that out to 10 minutes to take a more sustained value, as by that time most turbo limits should be surpassed. The simulation itself works with 2 femtosecond timesteps. We use version 2.13 as this was the recommended version at the time of integrating this benchmark into our suite. The latest nightly builds we’re aware of have started to enable support for AVX-512, however for consistency in our benchmark suite, we are staying with 2.13. Other software that we test with has AVX-512 acceleration.
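For readers unfamiliar with the unit, the arithmetic behind ‘nanoseconds per day’ is simple. The sketch below assumes only the 2 femtosecond timestep mentioned above; the function name and the example step rate are hypothetical, not measured values.

```python
SECONDS_PER_DAY = 86_400


def ns_per_day(steps_per_second, timestep_fs=2.0):
    """Convert a simulation step rate into simulated nanoseconds per day."""
    ns_per_step = timestep_fs * 1e-6   # 1 femtosecond = 1e-6 nanoseconds
    return steps_per_second * ns_per_step * SECONDS_PER_DAY


# Hypothetical example: a machine sustaining 10 timesteps per second
# simulates about 1.7 ns of protein motion per day of wall-clock time.
print(f"{ns_per_day(10):.3f} ns/day")
```

A sustained rate matters here because a step rate measured during the first 60 seconds may still include turbo headroom that disappears over a ten-minute run.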

(2-5) NAMD ApoA1 Simulation

AI Benchmark 0.1.2 using TensorFlow: Link

Finding an appropriate artificial intelligence benchmark for Windows has been a holy grail of mine for quite a while. The problem is that AI is such a fast moving, fast paced world that whatever I compute this quarter will no longer be relevant in the next, and one of the key metrics in this benchmarking suite is being able to keep data over a long period of time. We’ve had AI benchmarks on smartphones for a while, given that smartphones are a better target for AI workloads, but it also makes some sense that everything on PC is geared towards Linux as well.

Thankfully however, the good folks over at ETH Zurich in Switzerland have converted their smartphone AI benchmark into something that’s useable in Windows. It uses TensorFlow, and for our benchmark purposes we’ve locked our testing down to TensorFlow 2.10, AI Benchmark 0.1.2, while using Python 3.7.6.

The benchmark runs through 19 different networks including MobileNet-V2, ResNet-V2, VGG-19 Super-Res, NVIDIA-SPADE, PSPNet, DeepLab, Pixel-RNN, and GNMT-Translation. All the tests probe both the inference and the training at various input sizes and batch sizes, except the translation that only does inference. It measures the time taken to do a given amount of work, and spits out a value at the end.

There is one big caveat for all of this, however. Speaking with the folks over at ETH, they use Intel’s Math Kernel Libraries (MKL) for Windows, and they’re seeing some incredible drawbacks. I was told that MKL for Windows doesn’t play well with multiple threads, and as a result any Windows results are going to perform a lot worse than Linux results. On top of that, after a given number of threads (~16), MKL kind of gives up and performance drops off quite substantially.

So why test it at all? Firstly, because we need an AI benchmark, and a bad one is still better than not having one at all. Secondly, if MKL on Windows is the problem, then by publicizing the test, it might just put a boot somewhere for MKL to get fixed. To that end, we’ll stay with the benchmark as long as it remains feasible.

(2-6) AI Benchmark 0.1.2 Total



CPU Tests: Simulation

Simulation and Science have a lot of overlap in the benchmarking world, however for this distinction we’re separating into two segments mostly based on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output – in our Simulation section, these act more like synthetics but at some level are still trying to simulate a given environment.

DigiCortex v1.35: link

DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.

The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.

The software originally shipped with a benchmark that recorded the first few cycles and output a result. So while on fast multi-threaded processors this made the benchmark last less than a few seconds, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in ‘off mode’, giving a high result in the first few cycles as none of the nodes are currently active. We found that the performance settles down into a steady state after a while (when the model is actively in use), so we asked the author to allow for a ‘warm-up’ phase and for the benchmark to be the average over a second sample time.

For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles for the test – on a modern processor this takes 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

For users wondering why the 5800X wins, it seems that DigiCortex prefers single chiplet designs, and the more cores the better. On the Intel side, the 10700 pulls a slight lead.

Dwarf Fortress 0.44.12: Link

Another long standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006 and still being regularly updated today, aiming for a Steam launch sometime in the future.

Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.

For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:

  • Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
  • Medium, a 129x129 world with 550 years, 10 civilizations and 4 megabeasts
  • Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts

DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test for as many times possible in 10 minutes, the medium test for as many times in 30 minutes, and the large test for as many times in an hour.

(3-2a) Dwarf Fortress 0.44.12 World Gen 65x65, 250 Yr(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr(3-2c) Dwarf Fortress 0.44.12 World Gen 257x257, 550 Yr

Dolphin v5.0 Emulation: Link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.

(3-3) Dolphin 5.0 Render Test



CPU Tests: Rendering

Rendering tests, compared to others, are often a little more simple to digest and automate. All the tests put out some sort of score or time, usually in an obtainable way that makes it fairly easy to extract. These tests are some of the most strenuous in our list, due to the highly threaded nature of rendering and ray-tracing, and can draw a lot of power. If a system is not properly configured to deal with the thermal requirements of the processor, the rendering benchmarks are where it would show most easily as the frequency drops over a sustained period of time. Most benchmarks in this case are re-run several times, and the key to this is having an appropriate idle/wait time between benchmarks to allow for temperatures to normalize from the last test.

Blender 2.83 LTS: Link

One of the popular tools for rendering is Blender, with it being a public open source project that anyone in the animation industry can get involved in. This extends to conferences, use in films and VR, with a dedicated Blender Institute, and everything you might expect from a professional software package (except perhaps a professional grade support package). With it being open-source, studios can customize it in as many ways as they need to get the results they require. It ends up being a big optimization target for both Intel and AMD in this regard.

For benchmarking purposes, we fell back to rendering a single frame from a detailed project. Most reviews, as we have done in the past, focus on one of the classic Blender renders, known as BMW_27. It can take anywhere from a few minutes to almost an hour on a regular system. However now that Blender has moved onto a Long Term Support model (LTS) with the latest 2.83 release, we decided to go for something different.

We use this scene, called PartyTug at 6AM by Ian Hubert, which is the official image of Blender 2.83. It is 44.3 MB in size, and uses some of the more modern compute properties of Blender. As it is more complex than the BMW scene, but uses different aspects of the compute model, time to process is roughly similar to before. We loop the scene for at least 10 minutes, taking the average time of the completions taken. Blender offers a command-line tool for batch commands, and we redirect the output into a text file.

(4-1) Blender 2.83 Custom Render Test

The 10700K takes a small lead.

Corona 1.3: Link

Corona is billed as a popular high-performance photorealistic rendering engine for 3ds Max, with development for Cinema 4D support as well. In order to promote the software, the developers produced a downloadable benchmark on the 1.3 version of the software, with a ray-traced scene involving a military vehicle and a lot of foliage. The software does multiple passes, calculating the scene, geometry, preconditioning and rendering, with performance measured in the time to finish the benchmark (the official metric used on their website) or in rays per second (the metric we use to offer a more linear scale).

The standard benchmark provided by Corona is interface driven: the scene is calculated and displayed in front of the user, with the ability to upload the result to their online database. We got in contact with the developers, who provided us with a non-interface version that allowed for command-line entry and retrieval of the results very easily.  We loop around the benchmark five times, waiting 60 seconds between each, and taking an overall average. The time to run this benchmark can be around 10 minutes on a Core i9, up to over an hour on a quad-core 2014 AMD processor or dual-core Pentium.

(4-2) Corona 1.3 Benchmark

The 10700K takes a small lead.

Crysis CPU-Only Gameplay

One of the most oft used memes in computer gaming is ‘Can It Run Crysis?’. The original 2007 game, built in CryEngine by Crytek, was heralded as a computationally complex title for the hardware at the time and several years after, suggesting that a user needed graphics hardware from the future in order to run it. Fast forward over a decade, and the game runs fairly easily on modern GPUs.

But can we also apply the same concept to pure CPU rendering? Can a CPU, on its own, render Crysis? Since 64 core processors entered the market, one can dream. So we built a benchmark to see whether the hardware can.

For this test, we’re running Crysis’ own GPU benchmark, but in CPU render mode. This is a 2000 frame test, with medium and low settings.

(4-3a) Crysis CPU Render at 320x200 Low(4-3b) Crysis CPU Render at 1080p Low

Almost playable.

POV-Ray 3.7.1: Link

A long time benchmark staple, POV-Ray is another rendering program that is well known to load up every single thread in a system, regardless of cache and memory levels. After a long period of POV-Ray 3.7 being the latest official release, when AMD launched Ryzen the POV-Ray codebase suddenly saw a range of activity from both AMD and Intel, knowing that the software (with the built-in benchmark) would be an optimization tool for the hardware.

We had to stick a flag in the sand when it came to selecting the version that was fair to both AMD and Intel, and still relevant to end-users. Version 3.7.1 fixes a significant bug in the early 2017 code, a write-after-read pattern that both the Intel and AMD manuals advise against, leading to a nice performance boost.

The benchmark can take over 20 minutes on a slow system with few cores, or around a minute or two on a fast system, or seconds with a dual high-core count EPYC. Because POV-Ray draws a large amount of power and current, it is important to make sure the cooling is sufficient here and the system stays in its high-power state. Using a motherboard with a poor power-delivery and low airflow could create an issue that won’t be obvious in some CPU positioning if the power limit only causes a 100 MHz drop as it changes P-states.

(4-4) POV-Ray 3.7.1

V-Ray: Link

We have a couple of renderers and ray tracers in our suite already, however V-Ray’s benchmark was requested often enough for us to roll it into our suite. Built by ChaosGroup, V-Ray is a 3D rendering package compatible with a number of popular commercial imaging applications, such as 3ds Max, Maya, Unreal, Cinema 4D, and Blender.

We run the standard standalone benchmark application, but in an automated fashion to pull out the result in the form of kilosamples/second. We run the test six times and take an average of the valid results.

(4-5) V-Ray Renderer

Cinebench R20: Link

Another common staple of a benchmark suite is Cinebench. Based on Cinema4D, Cinebench is a purpose built benchmark machine that renders a scene with both single and multi-threaded options. The scene is identical in both cases. The R20 version means that it targets Cinema 4D R20, a slightly older version of the software which is currently on version R21. Cinebench R20 was launched given that the R15 version had been out a long time, and despite the difference between the benchmark and the latest version of the software on which it is based, Cinebench results are often quoted a lot in marketing materials.

Results for Cinebench R20 are not comparable to R15 or older, because both the scene being used and the code path have changed. The results are output as a score from the software, which is inversely proportional to the time taken. Using the benchmark flags for single CPU and multi-CPU workloads, we run the software from the command line which opens the test, runs it, and dumps the result into the console which is redirected to a text file. The test is repeated for a minimum of 10 minutes for both ST and MT, and then the runs averaged.

(4-6a) CineBench R20 Single Thread(4-6b) CineBench R20 Multi-Thread

We are still in the process of rolling out CineBench R23 (you can see the results in our benchmark database here), but had not tested it on all the CPUs in this review at this time. It will be added to future reviews.



CPU Tests: Encoding

One of the interesting elements on modern processors is encoding performance. This covers two main areas: encryption/decryption for secure data transfer, and video transcoding from one video format to another.

In the encrypt/decrypt scenario, how data is transferred and by what mechanism is pertinent to on-the-fly encryption of sensitive data - a process by which more modern devices are leaning to for software security.

Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who are wanting to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.

HandBrake 1.3.2: Link

Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file size, trade quality for file size, or do all of the above, and can increase encoding rates to help accelerate decoding rates. Alongside Google's favorite codecs, VP9 and AV1, there are others that are prominent: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H.265), which aims to provide the same quality as H264 but at a lower file size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content. There are other codecs coming to market designed for specific use cases all the time.

Handbrake is a favored tool for transcoding, with the later versions using copious amounts of newer APIs to take advantage of co-processors, like GPUs. It is available on Windows via an interface or can be accessed through the command-line, with the latter making our testing easier, with a redirection operator for the console output.

We take the compiled version of this 16-minute YouTube video about Russian CPUs at 1080p30 h264 and convert into three different files: (1) 480p30 ‘Discord’, (2) 720p30 ‘YouTube’, and (3) 4K60 HEVC.

(5-1a) Handbrake 1.3.2, 1080p30 H264 to 480p Discord(5-1b) Handbrake 1.3.2, 1080p30 H264 to 720p YouTube(5-1c) Handbrake 1.3.2, 1080p30 H264 to 4K60 HEVC

7-Zip 1900: Link

The first compression benchmark tool we use is the open-source 7-zip, which typically offers good scaling across multiple cores. 7-zip is the compression tool most cited by readers as one they would rather see benchmarks on, and the program includes a built-in benchmark tool for both compression and decompression.

The tool can either be run from inside the software or through the command line. We take the latter route as it is easier to automate, obtain results, and put through our process. The command line flags available offer an option for repeated runs, and the output provides the average automatically through the console. We direct this output into a text file and regex the required values for compression, decompression, and a combined score.
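The exact regular expressions we use are not reproduced here, so the following is only a sketch of the idea. It assumes the summary lines printed by `7z b` begin with `Avr:` and `Tot:` and that the last column in each group is the rating; the sample text and its numbers are invented for illustration and are not results from this review.

```python
import re

# Invented stand-in for the tail of a 7-Zip benchmark log (assumed layout).
SAMPLE = """\
Avr:             686   6371  43721  |              790   6357  50233
Tot:             738   6364  46977
"""

AVR = re.compile(
    r"^Avr:\s+(\d+)\s+(\d+)\s+(\d+)\s+\|\s+(\d+)\s+(\d+)\s+(\d+)", re.M)
TOT = re.compile(r"^Tot:\s+(\d+)\s+(\d+)\s+(\d+)", re.M)


def parse_7zip_summary(text):
    """Pull compression, decompression and combined ratings from a log."""
    avr = AVR.search(text)
    tot = TOT.search(text)
    if avr is None or tot is None:
        raise ValueError("summary lines not found")
    return {
        "compress": int(avr.group(3)),     # last column before the bar
        "decompress": int(avr.group(6)),   # last column after the bar
        "combined": int(tot.group(3)),
    }


print(parse_7zip_summary(SAMPLE))
```

Anchoring each pattern to the start of a line keeps the match from drifting onto the per-dictionary rows that precede the summary.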

(5-2c) 7-Zip 1900 Combined Score

AES Encoding

Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.

(5-3) AES Encoding

WinRAR 5.90: Link

For the 2020 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack:

  • 33 video files, each 30 seconds, in 1.37 GB,
  • 2834 smaller website files in 370 folders in 150 MB,
  • 100 Beat Saber music tracks and input files, for 451 MB

This is a mixture of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test for 20 minutes and take the average of the last five runs when the benchmark is in a steady state.

For automation, we use AHK’s internal timing tools from initiating the workload until the window closes signifying the end. This means the results are contained within AHK, with an average of the last 5 results being easy enough to calculate.
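Our automation lives in AHK, but the averaging step is easy to show in a few lines of Python. This is a sketch with a hypothetical function name and made-up timings, only meant to show why the early, cache-cold runs drop out of the reported number.

```python
from statistics import mean


def steady_state_average(run_times, tail=5):
    """Average only the last `tail` runs, once caching has settled."""
    if len(run_times) < tail:
        raise ValueError(f"need at least {tail} runs, got {len(run_times)}")
    return mean(run_times[-tail:])


# Made-up timings in seconds: the first runs are slower while the files
# are still being pulled into the DRAM cache.
times = [30.0, 22.0, 20.0, 20.0, 20.0, 20.0, 20.0]
print(steady_state_average(times))
```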

(5-4) WinRAR 5.90 Test, 3477 files, 1.96 GB



CPU Tests: Legacy and Web

In order to gather data to compare with older benchmarks, we are still keeping a number of tests under our ‘legacy’ section. This includes all the former major versions of CineBench (R15, R11.5, R10) as well as x264 HD 3.0 and the first very naïve version of 3DPM (v1). We won’t be transferring the data over from the old testing into Bench, otherwise it would be populated with 200 CPUs with only one data point, so it will fill up as we test more CPUs like the others.

The other section here is our web tests.

Web Tests: Kraken, Octane, and Speedometer

Benchmarking using web tools is always a bit difficult. Browsers change almost daily, and the way the web is used changes even quicker. While there is some scope for advanced computational based benchmarks, most users care about responsiveness, which requires a strong back-end to work quickly to provide on the front-end. The benchmarks we chose for our web tests are essentially industry standards – at least once upon a time.

It should be noted that for each test, the browser is closed and re-opened anew with a fresh cache. We use a fixed Chromium version for our tests with the update capabilities removed to ensure consistency.

Mozilla Kraken 1.1

Kraken is a 2010 benchmark from Mozilla and does a series of JavaScript tests. These tests are a little more involved than previous tests, looking at artificial intelligence, audio manipulation, image manipulation, json parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging, and then runs through 10 times giving a timed result.

We loop through the 10-run test four times (so that’s a total of 40 runs), and average the four end-results. The result is given as time to complete the test, and we’re reaching a slow asymptotic limit with regard to the highest IPC processors.

(7-1) Kraken 1.1 Web Test

Google Octane 2.0

Our second test is also JavaScript based, but uses a lot more variation of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulations, compiler latency and code execution.

Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.

(7-2) Google Octane 2.0 Web Test

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is a test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics.

We repeat the benchmark for a dozen loops, taking the average of the last five.

(7-3) Speedometer 2.0 Web Test

Legacy Tests

(6-5a) x264 HD 3.0 Pass 1(6-5b) x264 HD 3.0 Pass 2(6-4a) 3DPM v1 ST(6-4b) 3DPM v1 MT(6-3a) CineBench R15 ST(6-3b) CineBench R15 MT



CPU Tests: Synthetic and SPEC

Most of the people in our industry have a love/hate relationship when it comes to synthetic tests. On the one hand, they’re often good for quick summaries of performance and are easy to use, but most of the time the tests aren’t related to any real software. Synthetic tests are often very good at burrowing down to a specific set of instructions and maximizing the performance out of those. Due to requests from a number of our readers, we have the following synthetic tests.

Linux OpenSSL Speed: SHA256

One of our readers reached out in early 2020 and stated that he was interested in looking at OpenSSL hashing rates in Linux. Luckily OpenSSL in Linux has a function called ‘speed’ that allows the user to determine how fast the system is for any given hashing algorithm, as well as signing and verifying messages.

OpenSSL offers a lot of algorithms to choose from, and based on a quick Twitter poll, we narrowed it down to the following:

  1. rsa2048 sign and rsa2048 verify
  2. sha256 at 8K block size
  3. md5 at 8K block size

For each of these tests, we run them in single thread and multithreaded mode. All the graphs are in our benchmark database, Bench, and we use the sha256 results in published reviews.
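For readers who want to try something similar, the sketch below builds the command lines for the three tests in single-thread and multithreaded form. It is not our script: the helper name is hypothetical, and the `-bytes 8192` option assumes OpenSSL 1.1.1 or later (on other versions, the 8192-byte column of the default output gives the same data point).

```python
import os


def openssl_speed_commands(threads=None):
    """Build `openssl speed` command lines for the 1T and nT runs."""
    n = threads if threads is not None else (os.cpu_count() or 1)
    tests = [
        ["rsa2048"],                      # reports both sign and verify
        ["-bytes", "8192", "sha256"],     # 8K block size
        ["-bytes", "8192", "md5"],        # 8K block size
    ]
    commands = []
    for args in tests:
        commands.append(["openssl", "speed", *args])                    # 1T
        commands.append(["openssl", "speed", "-multi", str(n), *args])  # nT
    return commands


for cmd in openssl_speed_commands():
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` to collect the console output for later parsing.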

(8-3c) Linux OpenSSL Speed sha256 8K Block (1T)(8-4c) Linux OpenSSL Speed sha256 8K Block (nT)

Specifically on the sha256 tests, both AMD and Via pull out a lead due to a dedicated sha256 compute block in each core. Intel is enabling accelerated sha256 via AVX-512 to its processors at a later date.

GeekBench 5: Link

As a common tool for cross-platform testing between mobile, PC, and Mac, GeekBench is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.

I’m including this test due to popular demand, although the results do come across as overly synthetic, and a lot of users often put a lot of weight behind the test due to the fact that it is compiled across different platforms (although with different compilers).

We have both GB5 and GB4 results in our benchmark database. GB5 was introduced to our test suite after already having tested ~25 CPUs, and so the results are a little sporadic by comparison. These spots will be filled in when we retest any of the CPUs.

(8-1c) Geekbench 5 Single Thread(8-1d) Geekbench 5 Multi-Thread

LinX 0.9.5 LINPACK

One of the benchmarks I’ve been after for a while is just something that outputs a very simple GFLOPs FP64 number, or in the case of AI I’d like to get a value for TOPs at a given level of quantization (FP32/FP16/INT8 etc). The most popular tool for doing this on supercomputers is a form of LINPACK, however for consumer systems it’s a case of making sure that the software is optimized for each CPU.

LinX has been a popular interface for LINPACK on Windows for a number of years. However the last official version was 0.6.5, launched in 2015, before the latest Ryzen hardware came into being. HWTips in Korea has been updating LinX and has separated out into two versions, one for Intel and one for AMD, and both have reached version 0.9.5. Unfortunately the AMD version is still a work in progress, as it doesn’t work on Zen 2.

There does exist a program called Linpack Extreme 1.1.3, which claims to be updated to use the latest version of the Intel Math Kernel Libraries. It works great, however the way the interface has been designed means that it can’t be automated for our uses, so we can’t use it.

For LinX 0.9.5, there is also the difficulty of which parameters to put into LINPACK. The two main parameters are problem size and time – choose a problem size too small, and you won’t get peak performance. Choose it too large, and the calculation can go on for hours. To that end, we use the following formulas as a compromise:

  • Memory Use  = Floor(1000 + 20*sqrt(threads)) MB
  • Time = Floor(10+sqrt(threads)) minutes

For a 4 thread system, we use 1040 MB and run for 12 minutes.
For a 128 thread system, we use 1226 MB and run for 21 minutes.
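
The two formulas are easy to check in code. The following is a minimal sketch (the function name `linx_parameters` is ours, purely for illustration) that reproduces the 4-thread and 128-thread examples and adds a 16-thread case for comparison.

```python
import math


def linx_parameters(threads):
    """Return (memory_mb, minutes) from the two formulas above."""
    root = math.sqrt(threads)
    memory_mb = math.floor(1000 + 20 * root)
    minutes = math.floor(10 + root)
    return memory_mb, minutes


for threads in (4, 16, 128):
    memory_mb, minutes = linx_parameters(threads)
    print(f"{threads} threads: {memory_mb} MB for {minutes} minutes")
```

Because both terms grow with the square root of the thread count, a 32x increase in threads (4 to 128) raises the run time by less than 2x.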

(8-5) LinX 0.9.5 LINPACK

 

CPU Tests: SPEC

SPEC2017 and SPEC2006 are series of standardized tests used to probe the overall performance between different systems, different architectures, different microarchitectures, and setups. The code has to be compiled, and then the results can be submitted to an online database for comparison. The suites cover a range of integer and floating point workloads, and can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.

We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to a WSL fixed stack size, but for like-for-like testing is good enough. SPEC2006 is deprecated in favor of 2017, but remains an interesting comparison point in our data. Because our scores aren’t official submissions, as per SPEC guidelines we have to declare them as internal estimates from our part.

For compilers, we use LLVM both for C/C++ and Fortran tests, and for Fortran we’re using the Flang compiler. The rationale of using LLVM over GCC is better cross-platform comparisons to platforms that only have LLVM support, and future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.

clang version 10.0.0
clang version 7.0.1 (ssh://[email protected]/flang-compiler/flang-driver.git
 24bd54da5c41af04838bbe7b68f830840d47fc03)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions. We decided to build our SPEC binaries on AVX2, which makes Haswell the oldest architecture we can test before the binaries fall over. This also means we don’t have AVX-512 binaries, primarily because in order to get the best performance, the AVX-512 intrinsics should be packed by a proper expert, as with our AVX-512 benchmark. All of the major vendors (AMD, Intel, and Arm) support the way in which we are testing SPEC.

To note, the requirements for the SPEC licence state that any benchmark results from SPEC have to be labelled ‘estimated’ until they are verified on the SPEC website as a meaningful representation of the expected performance. This is most often done by the big companies and OEMs to showcase performance to customers, however is quite over the top for what we do as reviewers.

For each of the SPEC targets we are doing, SPEC2006 rate-1, SPEC2017 speed-1, and SPEC2017 speed-N, rather than publish all the separate test data in our reviews, we are going to condense it down into a few interesting data points. The full per-test values are in our benchmark database.

(9-0a) SPEC2006 1T Geomean Total
(9-0b) SPEC2017 1T Geomean Total
(9-0c) SPEC2017 nT Geomean Total

Both of the 8-core Core i7 parts here are handily beaten by AMD's 6-core Ryzen 5 in ST and MT.



Gaming Tests: Chernobylite

Even before recent TV shows like Chernobyl recreated the events surrounding the 1986 Chernobyl nuclear disaster, nuclear fallout and the town of Pripyat were popular settings for a number of games – mostly first person shooters. Chernobylite is an indie title that plays on a science-fiction survival horror experience and uses a 3D-scanned recreation of the real Chernobyl Exclusion Zone. It involves challenging combat, and a mix of free exploration with crafting and non-linear storytelling. While still in early access, it is already picking up plenty of awards.

I picked up Chernobylite while still in early access, and was impressed by its in-game benchmark, showcasing complex building structure with plenty of trees and structures where aliasing becomes important. The in-game benchmark is an on-rails experience through the scenery, covering both indoor and outdoor scenes – it ends up being very CPU limited in the way it is designed. We have taken an offline version of Chernobylite to use in our tests, and we are testing the following settings combinations:

  • 360p Low, 1440p Low, 4K Low, 1080p Max

We do as many runs as fit within 10 minutes per resolution/setting combination, and then take averages.
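
The run-until-the-budget-expires approach can be sketched as a small loop. This is an illustration under stated assumptions rather than the actual harness: `run_for_budget` and `run_once` are hypothetical names, and we assume that a run which starts inside the 10-minute budget is allowed to finish even if it ends after the budget has passed.

```python
import time


def run_for_budget(run_once, budget_seconds=600, clock=time.monotonic):
    """Call run_once() repeatedly until the time budget is used up.

    Returns the list of per-run results. A run that starts inside the
    budget is allowed to finish, so at least one run always completes.
    """
    results = []
    start = clock()
    while True:
        results.append(run_once())
        if clock() - start >= budget_seconds:
            break
    return results


def average(values):
    return sum(values) / len(values)


# Demo with a fake clock so the example finishes instantly: each fake
# run advances time by 420 s and reports a made-up 100.0 FPS.
fake_now = [0.0]


def fake_run():
    fake_now[0] += 420
    return 100.0


runs = run_for_budget(fake_run, clock=lambda: fake_now[0])
print(len(runs), average(runs))
```

Passing the clock in as a parameter is only there to make the loop testable without waiting ten minutes; a real harness would use the default `time.monotonic`.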

Charts (Average FPS): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Civilization 6

Originally penned by Sid Meier and his team, the Civilization series of turn-based strategy games is a cult classic, and many an excuse for an all-nighter trying to get Gandhi to declare war on you due to an integer underflow. Truth be told I never actually played the first version, but I have played every edition from the second to the sixth, including the fourth as voiced by the late Leonard Nimoy, and it is a game that is easy to pick up, but hard to master.

Benchmarking Civilization has always been somewhat of an oxymoron – for a turn-based strategy game, the frame rate is not necessarily the important thing, and in the right mood, something as low as 5 frames per second can be enough. With Civilization 6 however, Firaxis went hardcore on visual fidelity, trying to pull you into the game. As a result, Civilization can be taxing on graphics and CPUs as we crank up the details, especially in DirectX 12.

For this benchmark, we are using the following settings:

  • 480p Low, 1440p Low, 4K Low, 1080p Max

For automation, Firaxis supports the in-game automated benchmark from the command line, which outputs a results file with frame times. We do as many runs as fit within 10 minutes per resolution/setting combination, and then take averages and percentiles.
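
For readers wondering how a list of frame times turns into the two numbers we chart, here is a minimal sketch. The exact percentile definition is an assumption on our part for illustration: we take the frame time at the 95th percentile (nearest-rank method) and convert it to FPS, so 95% of frames were rendered at least that fast. The function names and sample data are made up.

```python
def average_fps(frame_times_ms):
    """Average FPS = frames rendered / total time in seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_fps(frame_times_ms, percent=95):
    """FPS at the given frame-time percentile (nearest-rank method).

    With percent=95, 95% of frames were rendered at least this fast.
    """
    ordered = sorted(frame_times_ms)
    # Integer form of ceil(percent / 100 * n), avoiding float rounding.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return 1000.0 / ordered[rank - 1]


# Made-up data: 18 frames at 10 ms and 2 slow frames at 20 ms.
frame_times_ms = [10.0] * 18 + [20.0] * 2
print(round(average_fps(frame_times_ms), 1))
print(percentile_fps(frame_times_ms))
```

Note that the average is computed from total frames over total time, not as the mean of per-frame FPS values, which would overweight the fast frames.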

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Deus Ex Mankind Divided

Deus Ex is a franchise with a wide level of popularity. Despite the Deus Ex: Mankind Divided (DEMD) version being released in 2016, it has often been heralded as a game that taxes the CPU. It uses the Dawn Engine to create a very complex first-person action game with science-fiction based weapons and interfaces. The game combines first-person, stealth, and role-playing elements, with the game set in Prague, dealing with themes of transhumanism, conspiracy theories, and a cyberpunk future. The game allows the player to select their own path (stealth, gun-toting maniac) and offers multiple solutions to its puzzles.

DEMD has an in-game benchmark, an on-rails look around an environment showcasing some of the game’s most stunning effects, such as lighting, texturing, and others. Even in 2020, it’s still an impressive graphical showcase when everything is turned up to the max. For this title, we are testing the following resolutions:

  • 600p Low, 1440p Low, 4K Low, 1080p Max

The benchmark runs for about 90 seconds. We do as many runs as fit within 10 minutes per resolution/setting combination, and then take averages and percentiles.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Final Fantasy XIV

Despite being one number less than Final Fantasy 15, FF14 is a massively-multiplayer online title, so there are always yearly update packages which give the opportunity for graphical updates too. In 2019, FFXIV launched its Shadowbringers expansion, and an official standalone benchmark was released at the same time for users to understand what level of performance they could expect. Much like the FF15 benchmark we’ve been using for a while, this test is a long 7-minute scene of simulated gameplay within the title. There are a number of interesting graphical features, and it certainly looks more like a 2019 title than a 2010 release, which is when FF14 first came out.

With this being a standalone benchmark, we do not have to worry about updates, and the idea for these sort of tests for end-users is to keep the code base consistent. For our testing suite, we are using the following settings:

  • 768p Minimum, 1440p Minimum, 4K Minimum, 1080p Maximum

As with the other benchmarks, we keep doing runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.

Charts (Average FPS): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Final Fantasy XV

Upon arriving on PC, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console. As a fantasy RPG with a long history, the fruits of Square Enix’s successful partnership with NVIDIA are on display. The game uses the internal Luminous Engine, and as with other Final Fantasy games, pushes the imagination of what we can do with the hardware underneath us. To that end, FFXV was one of the first games to promote the use of ‘video game landscape photography’, due in part to the extensive detail even at long range but also with the integration of NVIDIA’s Ansel software, which allowed for super-resolution imagery and post-processing effects to be applied.

In preparation for the launch of the game, Square Enix opted to release a standalone benchmark. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues. We use the following settings:

  • 720p Standard, 1080p Standard, 4K Standard, 8K Standard

For automation, the title accepts command line inputs for both resolution and settings, and then auto-quits when finished. As with the other benchmarks, we keep doing runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: World of Tanks

Albeit different to most of the other commonly played MMO or massively multiplayer online games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming who are based in Belarus, with the game’s soundtrack being primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points including a free-to-play element as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank based MMO is that it achieved eSports status when it debuted at the World Cyber Games back in 2012.

World of Tanks enCore is a demo application for its new graphics engine penned by the Wargaming development team. Over time the new core engine has been implemented into the full game, upgrading the game’s visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers up insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system. There is technically a Ray Tracing version of the enCore benchmark now available, however because it can’t be deployed standalone without the installer, we decided against using it. If that gets fixed, then we can look into it.

The benchmark tool comes with a number of presets:

  • 768p Minimum, 1080p Standard, 1080p Max, 4K Max (not a preset)

The odd one out is 4K Max, because the benchmark doesn’t automatically have a 4K option – to get this we edit the acceptable resolutions ini file, and then we can select 4K. The benchmark outputs its own results file, with frame times, making it very easy to parse the data needed for average and percentiles.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Borderlands 3

As a big Borderlands fan, having to sit through six months of Epic Store exclusivity before we saw it on Steam felt like a long time to wait. The fourth title of the franchise, if you exclude the Telltale-style games, BL3 expands the universe beyond Pandora and its orbit, with the set of heroes (plus those from previous games) now cruising the galaxy looking for vaults and the treasures within. Popular characters like Tiny Tina, Claptrap, Lilith, Dr. Zed, Zer0, Tannis, and others all make appearances as the game continues its cel-shaded design but with the graphical fidelity turned up. Borderlands 1 gave me my first ever taste of proper in-game second order PhysX, and it’s a high standard that continues to this day.

BL3 works best with online access, so it is filed under our online games section. BL3 is also one of our biggest downloads, requiring 100+ GB. As BL3 supports resolution scaling, we are using the following settings:

  • 360p Very Low, 1440p Very Low, 4K Very Low, 1080p Badass

BL3 has its own in-game benchmark, which recreates a set of on-rails scenes with a variety of activity going on in each, such as shootouts, explosions, and wildlife. The benchmark outputs its own results files, including frame times, which can be parsed for our averages/percentile data.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: F1 2019

The F1 racing games from Codemasters have been popular benchmarks in the tech community, mostly for ease-of-use and because they seem to take advantage of any area of a machine that might be better than another. The 2019 edition of the game features all 21 circuits on the calendar for that year, and includes a range of retro models and DLC focusing on the careers of Alain Prost and Ayrton Senna. Built on the EGO Engine 3.0, the game has been criticized, similarly to most annual sports games, for not offering enough season-to-season graphical fidelity updates to make investing in the latest title worth it. However, the 2019 edition revamps the Career mode, with features such as in-season driver swaps coming into the mix. The quality of the graphics this time around is also superb, even at 4K low or 1080p Ultra.

For our test, we put Alex Albon in the Red Bull in position #20, for a dry two-lap race around Austin. We test at the following settings:

  • 768p Ultra Low, 1440p Ultra Low, 4K Ultra Low, 1080p Ultra

In terms of automation, F1 2019 has an in-game benchmark that can be called from the command line, and the output file has frame times. We repeat each resolution setting for a minimum of 10 minutes, taking the averages and percentiles.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Far Cry 5

The fifth title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration with a lot of configurability.

Unfortunately, the game doesn’t like us changing the resolution in the results file when using certain monitors, resorting to 1080p but keeping the quality settings. But resolution scaling does work, so we decided to fix the resolution at 1080p and use a variety of different scaling factors to give the following:

  • 720p Low, 1440p Low, 4K Low, 1440p Max.
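
To make the scaling approach concrete, the sketch below works out which scaling factor gives each target resolution from the fixed 1080p base. It assumes the factor applies to each axis (so 2.0x on 1080p renders 3840x2160); some games define the factor by pixel count instead, in which case the numbers would differ. The helper name is hypothetical.

```python
BASE_HEIGHT = 1080  # fixed output resolution described above


def scale_factor(target_height, base_height=BASE_HEIGHT):
    """Per-axis scaling factor that renders at target_height lines."""
    return target_height / base_height


targets = {"720p": 720, "1440p": 1440, "4K": 2160}
for name, height in targets.items():
    print(f"{name}: {scale_factor(height):.3f}x")
```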

Far Cry 5 outputs a results file here, but the file is an HTML file, which showcases a graph of the FPS detected. At no point does the HTML file contain the frame times for each frame, but it does show the frames per second, as a value once per second in the graph. The graph in HTML form is a series of (x,y) co-ordinates scaled to the min/max of the graph, rather than the raw (second, FPS) data, and so using regex I carefully tease out the values of the graph, convert them into a (second, FPS) format, and take our values of averages and percentiles that way.
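
The general shape of that extraction can be sketched as follows. To be clear, the markup in `sample` is entirely hypothetical and is not Far Cry 5's real output format; the function names, pixel extents, and axis ranges are likewise assumptions chosen for illustration. The idea is simply to pull every "x,y" pair out with a regular expression and map it linearly back to (second, FPS).

```python
import re

# Matches pairs such as "50,100" or "12.5,-3".
POINT_RE = re.compile(r"(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def extract_points(html):
    """Pull every 'x,y' pair out of the markup as floats."""
    return [(float(x), float(y)) for x, y in POINT_RE.findall(html)]


def rescale(value, src_min, src_max, dst_min, dst_max):
    """Linearly map value from [src_min, src_max] to [dst_min, dst_max]."""
    return dst_min + (value - src_min) * (dst_max - dst_min) / (src_max - src_min)


def graph_to_fps(html, x_px, y_px, seconds, fps):
    """Convert graph coordinates to (second, FPS) pairs.

    x_px / y_px are the (min, max) pixel extents of the plot area;
    seconds / fps are the (min, max) values those extents represent.
    """
    return [
        (rescale(x, *x_px, *seconds), rescale(y, *y_px, *fps))
        for x, y in extract_points(html)
    ]


# Hypothetical markup: three points on a 100x100 plot area.
sample = '<polyline points="0,0 50,100 100,50"/>'
samples = graph_to_fps(
    sample, x_px=(0, 100), y_px=(0, 100), seconds=(0, 2), fps=(60, 120)
)
print(samples)

fps_values = [f for _, f in samples]
print(sum(fps_values) / len(fps_values))
```

If the graph's y axis grows downward, as is common in on-screen coordinate systems, passing `y_px` as (bottom, top) handles the inversion without any other change.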

If anyone from Ubisoft wants to chat about building a benchmark platform that would help not only me but also every other member of the tech press build a benchmark testing platform, and in turn help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).

As with the other gaming tests, we run each resolution/setting combination for a minimum of 10 minutes and take the relevant frame data for averages and percentiles.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Gears Tactics

Remembering the original Gears of War brings back a number of memories – some good, and some involving online gameplay. The latest iteration of the franchise was launched as I was putting this benchmark suite together, and Gears Tactics is a high-fidelity turn-based strategy game with an extensive single player mode. As with a lot of turn-based games, there is ample opportunity to crank up the visual effects, and here the developers have put a lot of effort into creating effects, a number of which seem to be CPU limited.

Gears Tactics has an in-game benchmark, roughly 2.5 minutes of AI gameplay starting from the same position but using a random seed for actions. Much like the racing games, this usually leads to some variation in the run-to-run data, so for this benchmark we are taking the geometric mean of the results. Gears Tactics also stands out for its resolution scaling, supporting 8K, and so we are testing the following settings:

  • 720p Low, 4K Low, 8K Low, 1080p Ultra
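
For reference, the geometric mean mentioned above is the n-th root of the product of n results, which damps the effect of a single outlier run compared with a plain average. A minimal sketch with made-up per-run numbers:

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


run_fps = [80.0, 100.0, 125.0]  # made-up per-run average FPS values
print(round(geometric_mean(run_fps), 2))       # geometric mean
print(round(sum(run_fps) / len(run_fps), 2))   # arithmetic mean, for contrast
```

Python 3.8 and newer also ship `statistics.geometric_mean`, which could be used in place of the hand-written helper.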

For results, the game showcases a mountain of data when the benchmark is finished, such as how much the benchmark was CPU limited and where, however none of that is ever exported into a file we can use. It’s just a screenshot which we have to read manually.

If anyone from the Gears Tactics team wants to chat about building a benchmark platform that would help not only me but also every other member of the tech press build a benchmark testing platform, and in turn help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).

As with the other benchmarks, we keep doing runs until 10 minutes per resolution/setting combination has passed. For this benchmark, we manually read each of the screenshots for each quality/setting/run combination. The benchmark does also give 95th percentiles and frame averages, so we can use both of these data points.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA helping to optimize the title. At this point GTA V is super old, but still super useful as a benchmark – it is a complicated test with many features that modern titles today still struggle with. With rumors of a GTA 6 on the horizon, I hope Rockstar make that benchmark as easy to use as this one is.

GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

We are using the following settings:

  • 720p Low, 1440p Low, 4K Low, 1080p Max

The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data. The benchmark can also be called from the command line, making it very easy to use.

There is one funny caveat with GTA. If the CPU is too slow, or has too few cores, the benchmark loads, but it doesn’t have enough time to put items in the correct position. As a result, for example when running our single core Sandy Bridge system, the jet ends up stuck in the middle of an intersection, causing a traffic jam. Unfortunately this means the benchmark never ends, but it is still amusing.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Red Dead Redemption 2

It’s great to have another Rockstar benchmark in the mix, and the launch of Red Dead Redemption 2 (RDR2) on the PC gives us a chance to do that. Building on the success of the original RDR, the second incarnation came to Steam in December 2019 having been released on consoles first. The PC version takes the open-world cowboy genre into the start of the modern age, with a wide array of impressive graphics and features that are eerily close to reality.

For RDR2, Rockstar kept the same benchmark philosophy as with Grand Theft Auto V, with the benchmark consisting of several cut scenes with different weather and lighting effects, with a final scene focusing on an on-rails environment, only this time with mugging a shop leading to a shootout on horseback before riding over a bridge into the great unknown. Luckily most of the command line options from GTA V are present here, and the game also supports resolution scaling. We have the following tests:

  • 384p Minimum, 1440p Minimum, 8K Minimum, 1080p Max

For that 8K setting, I originally thought I had the settings file at 4K and 1.0x scaling, but it was actually set at 2.0x giving that 8K.  For the sake of it, I decided to keep the 8K settings.

For our results, we run through each resolution and setting configuration for a minimum of 10 minutes, before averaging and parsing the frame time data.

Charts (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Strange Brigade

Strange Brigade is set in 1903 Egypt, and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative centric with a wide variety of different levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.

The game supports both the DirectX 12 and Vulkan APIs and houses its own built-in benchmark as an on-rails experience through the game. For quality, the game offers various options up for customization including textures, anti-aliasing, reflections, draw distance and even allows users to enable or disable motion blur, ambient occlusion and tessellation among others. As both APIs are supported, we test on both, using the following settings:

  • 720p Low, 1440p Low, 4K Low, 1080p Ultra

The automation for Strange Brigade is one of the easiest in our suite – the settings and quality can be changed by pre-prepared .ini files, and the benchmark is called via the command line. The output includes all the frame time data.

Two chart sets (Average FPS and 95th Percentile): Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, Medium Resolution/Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Conclusion: TDP is Not Fit For Purpose

In years gone by, processors were sold with a single frequency and power rating. It was very quickly realized that if a processor could effectively go to sleep, using either lower voltage or lower frequency (or both) then a lot of idle power could be saved. Going the other way, processor designers realized that for temporary short bursts, a core could run at a higher frequency before it reached a thermal limit. Also, using a multi-core processor meant that either the power budget could be shared across all the cores, or it could be focused in one.

Both AMD and Intel have noticed this over time, and both companies have different attitudes on how they report numbers relating to ‘base frequency’ and related power as well as the bursty ‘turbo frequency’ and related power. Out of those four metrics, the only one Intel doesn’t provide is turbo power, because from their perspective it is system dependent.

(0-0) Peak Power

Intel lets motherboard manufacturers determine how long a system can turbo for, and what that budget is. Intel encourages motherboard manufacturers to over-engineer the motherboards, not only for overclocking, but for non-overclockable CPUs to get the best performance for longer. This really messes up what the ‘default out-of-the-box performance’ should be if different motherboards give different values. The trend lately is that enthusiast motherboards enable an unlimited turbo budget, and the user building their system just has to deal with it.

This means that users who buy the Core i7-10700 in this review, despite the 65 W rating on the box, will have to cater for a system that will not only peak around 215 W, but sustain that 215 W during any extended high-performance load, such as rendering or compute. We really wish Intel would put this 215 W value on the box to help end-users determine their cooling, as without sufficient guidance, users could be hitting thermal limits without even knowing why. At this point, 'Intel Recommended Values' for turbo time and budget mean nothing outside of Intel's own OEM partners building commercial systems.

Core i7-10700 vs Core i7-10700K Performance

In the review we highlighted that these two processors have a peak turbo frequency difference of 300 MHz and an all-core turbo frequency difference of 100 MHz. The fact that one is rated at 65 W and the other is rated at 125 W is inconsequential here, given that most end-user motherboards will simply enable turbo all the time. This means the performance in most of our tests between the two is practically identical, and commensurate with a 100-300 MHz frequency difference.

In practically all of our tests, the Core i7-10700K is ahead by a super slim margin. At $387 for the 10700K compared to $335 for the 10700, the performance difference is not enough to warrant the $52 price difference between the two. Performance per dollar sides mostly with the Core i7-10700, although users getting the i7-10700K will likely look towards overclocking their processor to get the most out of it – that ultimately is what to pay for.
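
A quick back-of-the-envelope calculation shows why the price gap is hard to justify on frequency alone. The prices are the ones quoted above; the absolute turbo frequencies (5.1 GHz vs 4.8 GHz peak, 4.7 GHz vs 4.6 GHz all-core) are assumed values, chosen to be consistent with the 300 MHz and 100 MHz deltas, and should be checked against Intel's specifications.

```python
def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return 100.0 * (a - b) / b


price_k, price_non_k = 387, 335          # list prices quoted above, USD
turbo_k, turbo_non_k = 5.1, 4.8          # assumed peak turbo, GHz
all_core_k, all_core_non_k = 4.7, 4.6    # assumed all-core turbo, GHz

print(f"price premium:   {percent_more(price_k, price_non_k):.1f}%")
print(f"peak turbo gain: {percent_more(turbo_k, turbo_non_k):.1f}%")
print(f"all-core gain:   {percent_more(all_core_k, all_core_non_k):.1f}%")
```

Under these assumptions the 10700K asks for a price premium in the mid-teens of percent in exchange for a single-digit frequency uplift, which matches the super slim margins in our results.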

The other comparison point is with the Ryzen 5 5600X, which has two fewer cores but costs $299. In practically every test, the increased IPC of the Ryzen over Intel means that it sits level with the Core i7 processors, while AMD is cheaper on list price and draws much less power (AMD will peak around 76 W, compared to 215 W). AM4 motherboards are also abundant, while corresponding Intel motherboards are still expensive. The problem here however is that demand for AMD's product lines is so high right now that finding one in stock might be difficult, and it probably won’t be at its recommended price.

Users in this price bracket have a tough choice – the more efficient AMD processor that might be in stock, compared to the Intel processor that will be in stock but will likely require more cooling.
