Original Link: https://www.anandtech.com/show/16495/intel-rocket-lake-14nm-review-11900k-11700k-11600k



Today is the official launch of Intel’s 11th Generation Core processor family, given the internal name ‘Rocket Lake’. Rocket Lake showcases new performance gains for Intel in the desktop space, with a raw clock-for-clock performance uplift in a number of key workloads.

In order to accomplish this, Intel has retrofitted its 10nm CPU and GPU designs back to 14nm, because only 14nm can achieve the frequency required. In exchange, the new processors run hot, cost more for Intel to produce, and have two fewer cores at the high end, but customers also get PCIe 4.0 on Intel’s mainstream desktop platform for the first time.

In our review today, we will be going over Intel’s new hardware, why it exists, and how it performs, focusing specifically on Intel’s new flagship, the Core i9-11900K, which has eight cores and can boost up to 5.3 GHz.

Intel’s Rocket Lake: Core i9, Core i7, and Core i5

The new Intel 11th Gen Core desktop processor family will start with Core i5, with six cores and twelve threads, through to Core i7 and Core i9, both with eight cores and sixteen threads. All processors will support DDR4-3200 natively, and offer 20 PCIe 4.0 lanes in supported motherboards – these lanes will enable graphics and storage direct from the processor, typically in an x16/x4 or x8/x8/x4 combination.

Both the Core i9 and Core i7 this time around have the same core count - normally the Core i9 would offer an obvious difference, such as more cores, but for this generation the difference is more subtle: Core i9 will offer higher frequencies and Thermal Velocity Boost (TVB). The Core i9-K and i9-KF will also feature Intel’s new Adaptive Boost Technology (ABT). We’ll go over Intel’s Turbo nomenclature later in the article.

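Intel 11th Gen Core Rocket Lake: Core i9

| AnandTech | Cores / Threads | Base Freq (MHz) | 1T Peak (MHz) | nT Turbo (MHz) | TDP (W) | IGP (UHD) | Price (1ku) |
|---|---|---|---|---|---|---|---|
| i9-11900K | 8 / 16 | 3500 | 5300 | 4700 | 125 | 750 | $539 |
| i9-11900KF | 8 / 16 | 3500 | 5300 | 4700 | 125 | - | $513 |
| i9-11900 | 8 / 16 | 2500 | 5200 | 4600 | 65 | 750 | $439 |
| i9-11900F | 8 / 16 | 2500 | 5200 | 4600 | 65 | - | $422 |
| i9-11900T | 8 / 16 | 1500 | 4900 | 3700 | 35 | 750 | $439 |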

At the top of the stack is the Core i9-11900K. Intel has set the 1000-unit pricing of the Core i9-11900K at $539. Note that Intel does this 1k unit pricing for OEMs, and the final retail price is often $10-$25 higher, but in the case of the Core i9-11900K, users are currently looking at a $615 price point at Newegg. This is well above AMD’s Ryzen 7 5800X at $449 SEP (MSRP), which is also an 8-core processor, and beyond even the Ryzen 9 5900X at $549 SEP. Intel is stating that along with better gaming performance, this processor also offers next-generation integrated graphics, support for new AI instructions, and enhanced media support for the price differential.

The Core i9-11900K is the highlight processor of today’s review, and it has a base frequency of 3.5 GHz, alongside a peak turbo of 5.3 GHz in Thermal Velocity Boost mode, 5.2 GHz otherwise on the favored core, or 5.1 GHz on non-favored cores. The all-core frequency is 4.8 GHz in TVB turbo mode, or 4.7 GHz otherwise, or it can ‘float’ the turbo up to 5.1 GHz when ABT is enabled, however ABT is disabled by default.
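The turbo hierarchy for the Core i9-11900K can be condensed into a small lookup. The following is a minimal Python sketch that uses only the frequencies quoted in this article; the function name, its parameters, and the reduction to "lightly threaded" versus "all-core" cases are assumptions made for illustration, not Intel's actual boost algorithm.

```python
def i9_11900k_peak_mhz(all_core: bool,
                       favored_core: bool = True,
                       tvb_active: bool = False,
                       abt_enabled: bool = False) -> int:
    """Peak turbo in MHz for the Core i9-11900K, per the figures in the text.

    Hypothetical helper: real turbo behaviour also depends on temperature,
    power limits, and current, none of which are modelled here.
    """
    if all_core:
        if abt_enabled:
            # ABT lets the all-core turbo 'float' up to this ceiling;
            # it is an upper bound, not a guaranteed frequency.
            return 5100
        return 4800 if tvb_active else 4700
    if not favored_core:
        return 5100
    return 5300 if tvb_active else 5200


print(i9_11900k_peak_mhz(all_core=False, tvb_active=True))  # single favored core with TVB
print(i9_11900k_peak_mhz(all_core=True))                    # default all-core turbo
```

Because ABT is disabled by default, the sketch leaves `abt_enabled` off unless it is requested explicitly.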

The only processor not getting TVB in the Core i9 family is the i9-11900T, which is the 35 W member of the family. This processor has 35 W on the box because its base frequency is 1.5 GHz, although it will turbo up to 4.9 GHz single core and 3.7 GHz all-core. These T processors typically end up in OEM systems and mini-PCs which are more likely to strictly follow Intel’s turbo recommendations.

All Core i9 processors will support DDR4-3200, and the specification is to enable a 1:1 frequency mode with the memory controller at this speed.

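Intel 11th Gen Core Rocket Lake: Core i7

| AnandTech | Cores / Threads | Base Freq (MHz) | 1T Peak (MHz) | nT Turbo (MHz) | TDP (W) | IGP (UHD) | Price (1ku) |
|---|---|---|---|---|---|---|---|
| i7-11700K | 8 / 16 | 3600 | 5000 | 4600 | 125 | 750 | $399 |
| i7-11700KF | 8 / 16 | 3600 | 5000 | 4600 | 125 | - | $374 |
| i7-11700 | 8 / 16 | 2500 | 4900 | 4400 | 65 | 750 | $323 |
| i7-11700F | 8 / 16 | 2500 | 4900 | 4400 | 65 | - | $298 |
| i7-11700T | 8 / 16 | 1400 | 4600 | 3600 | 35 | 750 | $323 |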

The Core i7 family includes the Core i7-11700K, which we have already reviewed with our retail sample, and tested on the latest microcode to date. This processor offers eight cores, sixteen threads, with a single core turbo of 5.0 GHz on the favored core, 4.9 GHz otherwise, and 4.6 GHz all-core turbo. The rated TDP is 125 W, although we saw 160 W during a regular load, 225 W peaks with an AVX2 rendering load, and 292 W peak power with an AVX-512 compute load.
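To put those power figures in context, it helps to express each one as a multiple of the rated TDP. This is a small Python sketch using only the numbers quoted above; the variable and function names are our own.

```python
RATED_TDP_W = 125  # rated TDP of the Core i7-11700K

# Power draw we observed, in watts, as quoted in the text.
observed_w = {
    "regular load": 160,
    "AVX2 rendering peak": 225,
    "AVX-512 compute peak": 292,
}


def multiple_of_tdp(power_w: float, tdp_w: float = RATED_TDP_W) -> float:
    """Return the measured power as a multiple of the rated TDP."""
    return power_w / tdp_w


for workload, watts in observed_w.items():
    print(f"{workload}: {watts} W is {multiple_of_tdp(watts):.2f}x the rated TDP")
```

The AVX-512 peak works out to more than double the number on the box, which is why cooling matters so much for these parts.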

On the topic of memory support, the Core i7 family does support DDR4-3200, however Intel’s specifications for Rocket Lake are that any non-Core i9 processor should run at a 2:1 ratio of DRAM to memory controller by default, rather than 1:1, effectively lowering memory performance. This creates some segmentation between Core i9 and the rest, as for the rest of the processors the fastest supported 1:1 memory ratio is DDR4-2933. Despite this technical specification, we can confirm in our testing of our Core i7-11700K that all the motherboards we have used so far actually default to 1:1 at DDR4-3200. It would appear that motherboard manufacturers are confident enough in their memory designs to ignore Intel’s specifications on this.

On pricing, the Intel Core i7-11700K is $399, which is important in two ways.

First, it is $140 cheaper than the Core i9-K, and it only loses a few hundred MHz. That leaves the Core i9 high and dry on day one. Unless there’s something special in that chip we haven’t been told about that we have to discover come retail day on March 30th, that’s a vast pricing difference for a small performance difference.

Second is the comparative AMD processor, the Ryzen 7 5800X, which has 8 cores and has a $449 SEP. If both processors were found at these prices, then the comparison is a good one – the Ryzen 7 5800X in our testing scored +8% in CPU tests and +1% in gaming tests (1080p Max). The Ryzen is very much the more power-efficient processor, however the Intel has integrated graphics (an argument that disappears with KF at $374). It will be interesting to see what recommendations people come to with that pricing.

Intel 11th Gen Core Rocket Lake: Core i5

| AnandTech | Cores / Threads | Base Freq (MHz) | 1T Peak (MHz) | nT Turbo (MHz) | TDP (W) | IGP (UHD) | Price (1ku) |
|---|---|---|---|---|---|---|---|
| i5-11600K | 6 / 12 | 3900 | 4900 | 4600 | 125 | 750 | $262 |
| i5-11600KF | 6 / 12 | 3900 | 4900 | 4600 | 125 | - | $237 |
| i5-11600 | 6 / 12 | 2800 | 4800 | 4300 | 65 | 750 | $213 |
| i5-11600T | 6 / 12 | 1700 | 4100 | 3500 | 35 | 750 | $213 |
| i5-11500 | 6 / 12 | 2700 | 4600 | 4200 | 65 | 750 | $192 |
| i5-11500T | 6 / 12 | 1500 | 3900 | 3400 | 35 | 750 | $192 |
| i5-11400 | 6 / 12 | 2600 | 4400 | 4200 | 65 | 730 | $182 |
| i5-11400F | 6 / 12 | 2600 | 4400 | 4200 | 65 | - | $157 |
| i5-11400T | 6 / 12 | 1300 | 3700 | 3300 | 35 | 730 | $182 |

The Core i5 spreads out a lot with more offerings, from $157 for the Core i5-11400F, up to $262 for the Core i5-11600K. All these processors have six cores and twelve threads, all have the traditional Intel Turbo 2.0, and all support DDR4-3200 (2:1) or DDR4-2933 (1:1).

Another difference within these parts is that the Core i5-11400 and Core i5-11400T have UHD Graphics 730, not 750, which means using a 24 EU configuration rather than the full 32 EUs.

Intel’s Competition: Intel vs Intel vs AMD

With both the Core i9 and the Core i7 being eight cores and sixteen threads, the natural competitor to both would be either (a) Intel’s previous generation of processors or (b) AMD’s Ryzen 7 5800X, which is starting to come back into the market with sufficient stock that it can be purchased at its suggested retail price.

Rocket Lake Competition

| AnandTech | Core i7 10700K | Core i9 10900K | Core i7 11700K | Core i9 11900K | Ryzen 7 5800X | Ryzen 9 5900X |
|---|---|---|---|---|---|---|
| uArch | Comet Lake | Comet Lake | Cypress Cove | Cypress Cove | Zen 3 | Zen 3 |
| Cores | 8 C / 16 T | 10 C / 20 T | 8 C / 16 T | 8 C / 16 T | 8 C / 16 T | 12 C / 24 T |
| Base Freq (MHz) | 3800 | 3700 | 3600 | 3500 | 3800 | 3700 |
| Turbo Freq (MHz) | 5100 | 5200 | 5000 | 5300 | 4800 | 4800 |
| All-Core (MHz) | 4700 | 4900 | 4600 | 4800 | ~4550 | ~4350 |
| TDP | 125 W | 125 W | 125 W | 125 W | 105 W | 105 W |
| IGP / EUs | Gen 9, 24 | Gen 9, 24 | Xe-LP, 32 | Xe-LP, 32 | - | - |
| L3 Cache | 16 MB | 20 MB | 16 MB | 16 MB | 32 MB | 64 MB |
| DDR4 | 2 x 2933 | 2 x 2933 | 2 x 3200 | 2 x 3200 | 2 x 3200 | 2 x 3200 |
| PCIe | 3.0 x16 | 3.0 x16 | 4.0 x20 | 4.0 x20 | 4.0 x24 | 4.0 x24 |
| MSRP | $387 | $499 | $399 | $539 | $449 | $549 |
| Retail | $322 | $470 | $419 | $614 | $449 | $549 |

As we saw in our Core i7-11700K review, at $399/$419, the Ryzen 7 5800X at $449 is actually a good comparison point. In high-end gaming both processors performed the same, the AMD processor was ahead by an average of 8% on CPU workloads, and the AMD processor came across as a lot more efficient and easier to cool, while the Intel processor scored a big lead in AVX-512 workloads. At the time of our review, we noted that stock of AMD’s Ryzen 5000 processors would be a large part of the choice between the two processors, given that stock was low and highly volatile. Since then, as noted in our latest CPU Guide, stock of the AMD CPUs has been coming back to normal, so the choice would come down to exact pricing differences.

If we focus on the Core i9-11900K in this comparison, given the small differences between it and the Core i7, you would also have to pit it against the AMD Ryzen 7 5800X. However, at its $539 tray price and $615 Newegg price, it really has to go against the 12-core Ryzen 9 5900X, which has 50% more cores (12 against 8), although the Core i9 has a chance to at least draw level on single thread performance.
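One rough way to frame the pricing argument is cost per core at retail. The sketch below uses the 'Retail' row and core counts from the Rocket Lake Competition table above; the dictionary layout and function name are our own, and price per core is only a crude proxy that ignores per-core performance, power, and platform cost.

```python
# (cores, retail price in USD) taken from the competition table in this article.
retail = {
    "Core i7-11700K": (8, 419),
    "Core i9-11900K": (8, 614),
    "Ryzen 7 5800X": (8, 449),
    "Ryzen 9 5900X": (12, 549),
}


def price_per_core(cpu: str) -> float:
    """Retail dollars per physical core for one of the listed CPUs."""
    cores, price = retail[cpu]
    return price / cores


# Print the list from cheapest to most expensive per core.
for cpu in sorted(retail, key=price_per_core):
    print(f"{cpu}: ${price_per_core(cpu):.2f} per core")
```

On these numbers the Core i9-11900K is the most expensive of the four per core, and the Ryzen 9 5900X the cheapest.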

Test Setup and #CPUOverload Benchmarks

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also run at JEDEC subtimings where possible. Reasons are explained here.

Test Setup

| Platform | CPUs | Motherboard | BIOS | Cooler | Memory |
|---|---|---|---|---|---|
| Intel Rocket Lake | Core i9-11900K, Core i7-11700K, Core i5-11600K | ASUS Maximus XIII Hero | 0610 / 0703** | TRUE Copper + SST* | ADATA 4x32 GB DDR4-3200 |
| Intel Comet Lake | Core i9-10900K, Core i7-10700K | ASRock Z490 PG Velocita | P1.50 | TRUE Copper + SST* | ADATA 4x32 GB DDR4-2933 |
| Intel Coffee Refresh | Core i9-9900KS, Core i9-9900K | MSI MPG Z390 Gaming Edge AC | AB0 | TRUE Copper + SST* | ADATA 4x32 GB DDR4-2666 |
| Intel Coffee Lake | Core i7-8700K | MSI MPG Z390 Gaming Edge AC | AB0 | TRUE Copper + SST* | ADATA 4x32 GB DDR4-2666 |
| AMD AM4 | Ryzen 9 5900X, Ryzen 7 5800X, Ryzen 7 4750G | GIGABYTE X570I Aorus Pro | F31L | Noctua NHU-12S SE-AM4 | ADATA 2x32 GB DDR4-3200 |

GPU: Sapphire RX 460 2GB (CPU Tests), NVIDIA RTX 2080 Ti FE (Gaming Tests)
PSU: Corsair AX860i
SSD: Crucial MX500 2TB

*TRUE Copper used with Silverstone SST-FHP141-VF 173 CFM fans. Nice and loud.
**0703 was applied for stability support

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers for CPU and Motherboard Reviews

  • Sapphire RX 460 Nitro
  • NVIDIA RTX 2080 Ti
  • Crucial SSDs
  • Corsair PSUs
  • G.Skill DDR4
  • ADATA DDR4
  • Silverstone Coolers
  • Noctua Coolers

A big thanks to ADATA for the AD4U3200716G22-SGN modules for this review. They're currently the backbone of our AMD testing.

Users interested in the details of our current CPU benchmark suite can refer to our #CPUOverload article which covers the topics of benchmark automation as well as what our suite runs and why. We also benchmark much more data than is shown in a typical review, all of which you can see in our benchmark database. We call it ‘Bench’, and there’s also a link on the top of the website in case you need it for processor comparison in the future.

Table Of Contents

  1. Rocket Lake Product List
  2. Why Rocket Lake Exists: Retrofitting 10nm to 14nm
  3. Motherboards and Overclocking Support
  4. New Turbo Features: Adaptive Boost Technology
  5. Power Consumption and Stability
  6. CPU Microbenchmarks
  7. CPU Testing
  8. Gaming Testing
  9. Conclusion
 


A Rocket Lake Retrofit: 10nm onto 14nm

The new generation Rocket Lake processor family is the combination of two different backported technologies. Intel took the Sunny Cove core from its 10nm Ice Lake processor and re-built it on 14nm, now calling it Cypress Cove. Intel also took the Xe graphics from 10nm Tiger Lake and re-built those on 14nm, but these are still called Xe graphics, albeit labelled UHD 750.

We can see that the new design is an amalgam of new technologies, by comparing Rocket Lake to Comet Lake, Ice Lake, and Tiger Lake:

Microarchitecture Comparison

| AnandTech | Comet Lake | Rocket Lake | Ice Lake | Tiger Lake | Ryzen 5000 |
|---|---|---|---|---|---|
| Form Factor | Desktop | Desktop | Laptop | Laptop | Desktop |
| Max Cores | 10 | 8 | 4 | 4 | 16 |
| TDP | 125 W | 125 W | 28 W | 35 W | 105 W |
| uArch | Comet | Cypress | Sunny | Willow | Zen 3 |
| IGP | Gen 9 | Xe-LP | Gen 11 | Xe | - |
| IGP Cores | 24 | 32 | 64 | 96 | - |
| L1-D | 32 KB/c | 48 KB/c | 48 KB/c | 48 KB/c | 32 KB/c |
| L2 Cache | 256 KB/c | 512 KB/c | 512 KB/c | 1280 KB/c | 512 KB/c |
| L3 Cache | 20 MB | 16 MB | 8 MB | 12 MB | 64 MB |
| PCIe | 3.0 x16 | 4.0 x20 | 3.0 x8 | 4.0 x4 | 4.0 x24 |
| DDR4 | 2 x 2933 | 2 x 3200 | 2 x 3200 | 2 x 3200 | 2 x 3200 |
| LPDDR4X | - | - | 4 x 3733 | 4 x 4266 | - |

There are obviously some differences between the notebook and desktop parts, most noticeably that the new platform at the high-end has only eight cores, two fewer than Comet Lake.

Additional improvements over Comet Lake include AVX512 units, support for 20 PCIe 4.0 lanes, and faster memory. With the new chipsets, Intel has already disclosed that the Rocket Lake platform will have native USB 3.2 Gen 2x2 (20 Gbps), and with the Z590 motherboards, a double bandwidth link from CPU to the chipset, moving from DMI x4 to DMI x8, effectively a PCIe 3.0 x8 link.
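For a sense of scale, the link widths mentioned here can be converted into approximate bandwidth. The Python sketch below treats DMI 3.0 as equivalent to PCIe 3.0 signaling, as described above. The per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b encoding come from the PCIe specifications rather than from this article, and the function name is our own. The results are theoretical one-direction maximums before protocol overhead, not measured throughput.

```python
# Raw signaling rate per lane in GT/s, by PCIe generation (from the PCIe specs).
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_gbytes_per_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    gbits_per_lane = GT_PER_S[generation] * ENCODING_EFFICIENCY
    return gbits_per_lane / 8 * lanes


print(f"DMI x4 (PCIe 3.0 x4):  {link_gbytes_per_s(3, 4):.2f} GB/s")
print(f"DMI x8 (PCIe 3.0 x8):  {link_gbytes_per_s(3, 8):.2f} GB/s")
print(f"CPU PCIe 4.0 x20:      {link_gbytes_per_s(4, 20):.2f} GB/s")
```

Doubling the DMI link from x4 to x8 takes the CPU-to-chipset ceiling from just under 4 GB/s to just under 8 GB/s in each direction.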

Rocket Lake on 14nm: The Best of a Bad Situation

The delays around the viability of Intel’s 10nm manufacturing have been well documented. To date, the company has launched several products on its 10nm process for notebooks, such as Cannon Lake, Ice Lake, Jasper Lake, Elkhart Lake, and Tiger Lake. There have been other non-consumer products, such as Agilex FPGAs and Snow Ridge 5G SoCs, and Intel has confirmed that its 10nm server products ‘Ice Lake Xeon Scalable’, are currently in volume production for an early Q2 launch on April 6th.

The one product line missing from that list is the desktop and enthusiast segments that typically use socketed processors paired with discrete graphics. Intel has always committed to launching desktop processors on its 10nm process, however we are yet to see the results of their efforts. The issues Intel is having with 10nm have not been fully disclosed at this time, with Intel instead happy to promote some of the improvements made, such as its new SuperFin technology, which is in Tiger Lake and the next-generation server platform beyond Ice Lake Xeon Scalable (for those keeping track, that would be Sapphire Rapids). The 10nm improvements so far have enabled Intel to launch notebook processors and server processors, both of which have lower power-per-core than a typical desktop offering.

As 10nm has not been able to meet the standards required for desktop-level performance, rather than leave a potential three-year gap in the desktop product family, Intel has been in a holding pattern, releasing slightly upgraded versions of Skylake on slightly improved variants of 14nm. The first two members of the Skylake family, Skylake and Kaby Lake, were released as expected. While waiting for 10nm, we saw Intel release Coffee Lake, Coffee Lake Refresh, and Comet Lake. Each of these afforded minor updates in frequency, core count, or power, but very little in the way of fundamental microarchitectural improvement. The goal all along was to move to 10nm with the same architecture as the mobile Ice Lake processors, but that wasn’t feasible because manufacturing limitations restricted how well the processors scaled to desktop-level power.

  • Skylake, Core 6th Gen in August 2015
  • Kaby Lake, Core 7th Gen in January 2017 (+17mo)
  • Coffee Lake, Core 8th Gen in October 2017 (+9mo)
  • Coffee Lake Refresh, Core 9th Gen in October 2018 (+12mo)
  • Comet Lake, Core 10th Gen in April 2020 (+18mo)
  • Rocket Lake, Core 11th Gen in March 2021 (+11mo)
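The month gaps in the list above can be reproduced with a few lines of Python. This is a minimal sketch; since only the launch month and year are given, the day of the month is set to 1 as a placeholder, and the helper name is our own.

```python
from datetime import date

# Launch month and year for each generation, as listed above.
# The day (1) is a placeholder; only month and year are used.
launches = [
    ("Skylake", date(2015, 8, 1)),
    ("Kaby Lake", date(2017, 1, 1)),
    ("Coffee Lake", date(2017, 10, 1)),
    ("Coffee Lake Refresh", date(2018, 10, 1)),
    ("Comet Lake", date(2020, 4, 1)),
    ("Rocket Lake", date(2021, 3, 1)),
]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from one launch month to the next."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


gaps = [
    months_between(launches[i][1], launches[i + 1][1])
    for i in range(len(launches) - 1)
]

for (name, _), gap in zip(launches[1:], gaps):
    print(f"{name}: +{gap}mo")

print(f"Skylake to Rocket Lake: {sum(gaps)} months on 14nm")
```

Summed up, that is 67 months between the first Skylake desktop parts and Rocket Lake, all on variants of 14nm.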

With each generation, Intel traditionally has either upgraded the process node technology, or updated the microarchitecture – a process that Intel called Tick-Tock. Originally Intel was set to perform a normal ‘Tick’ after Kaby Lake, and have Cannon Lake with the same effective Skylake microarchitecture move to 10nm. Cannon Lake ended up only as a laptop processor with no working graphics, shipping in a small number of notebooks in China, as it was a hot mess (as shown in our review). As a result, Intel refocused its 10nm for notebook processors hoping that advances would also be applicable to desktop, but the company had to release minor upgrades on desktop from Coffee Lake onwards to keep the product line going.

This meant that at some level Intel knew that it would have to combine both a new architecture and a new process node jump into one product cycle. At some point however, Intel realized that the intercept point with having a new microarchitecture and the jump for the desktop to 10nm was very blurry, and somewhat intangible, and at a time when its main competitor was starting to make noise about a new product that could reach parity in single core performance. In order to keep these important product lines going, drastic measures would have to be taken.

After many meetings with many biscuits, we presume, the decision was made that Intel would take the core microarchitecture design from 10nm Ice Lake, which couldn’t reach high enough frequencies under desktop power, and repackage that design for the more dependable 14nm node which could reach the required absolute performance numbers. This is known as a ‘backport’.

Sunny Cove becomes Cypress Cove

 

The new Core 11th Gen processor which we are looking at today has the codename Rocket Lake. That’s the name for the whole processor, which consists of cores, graphics, interconnect, and other different accelerators and IP blocks, each of which also has its own codename, just for the sake of making it easier for the engineers to understand what parts are in use. We use these codenames a lot, and the one to focus on here is the CPU core.

Intel’s 10nm Ice Lake notebook processor family uses Sunny Cove cores in the design. It is these cores that have been backported to 14nm for use in the Rocket Lake processors, and because it is on a different process node and there are some minor design changes, Intel calls them Cypress Cove cores.

The reason behind this is that taking a design for one manufacturing process and redesigning it for a second is no easy task, especially if it’s a regressive step – transistors are bigger, which means logic blocks are bigger, and all the work done with respect to signaling and data paths in the silicon has to be redone. Even with a rework, signal integrity needs to be upgraded for longer distances, or additional path delays and buffers need to be implemented. Any which way you cut it, a 10nm core is bigger when designed for 14nm, consumes more power, and has the potential to be fundamentally slower at execution level.

Intel’s official disclosures to date on the new Cypress Cove cores and Rocket Lake stem from a general briefing back in October, as well as a more product oriented announcement at CES in January. Intel is promoting that the new Cypress Cove core offers ‘up to a +19%’ instruction per clock (IPC) generational improvement over the cores used in Comet Lake, which are higher frequency variants of Skylake from 2015. However, the underlying microarchitecture is promoted as being identical to Ice Lake for mobile processors, such as caches and execution, and overall the new Rocket Lake SoC has a number of other generational improvements new to Intel’s desktop processors.

Eight Cores, Not Ten?

Enabling core designs through this backporting process is more complex than simply photocopying the design into the larger format. With every process node improvement, different density scalers and features are used in that process node that might not be available elsewhere. Undoubtedly the original 10nm Sunny Cove design had these in mind, and so having to re-architect the same floorplan with 14nm requires a lot of extra work. This adds transistors and buffers and ways to manage voltage differences and signal integrity in itself, increasing die size.

Note that Intel has in the past said that its 10nm process node offers a 2.7x transistor density increase moving from 14nm to 10nm. Naturally doing the reverse with a design, going from 10nm to 14nm, hasn’t made the core suddenly 2.7x bigger, mainly because those numbers often refer to the densest transistors, and a high-performance microprocessor core often uses less dense transistors in logic to enable high frequency, with enough inactive silicon (dark silicon) to assist with power and thermals. We are still waiting on official numbers for core sizes, so it will be an interesting comparison between Sunny Cove and Cypress Cove.

Nonetheless, there is a core size increase, and this has to be factored into what silicon is produced. Designing a mass-production silicon layout requires balancing overall die size with expected yields, expected retail costs, required profit margins, and final product performance. Intel could easily make a 20+ core processor with these Cypress Cove cores, however the die size would be too large to be economical, and perhaps the power consumption when all the cores are loaded would necessitate a severe reduction in frequency to keep the power under control. To that end, Intel finalised its design on eight cores.

For die sizes, even with enabling only eight cores, the new Rocket Lake design is substantially bigger than the 10-core variant on Comet Lake.

Intel Consumer Die Size Comparison (All on Intel 14nm)

| AnandTech | uArch | Cores | Die Dimensions | Die Area |
|---|---|---|---|---|
| Core i7-8700K | Coffee Lake | 6 C | 9.2 x 16.7 mm | 153.6 mm2 |
| Core i9-9900K | CFL Refresh | 8 C | 9.2 x 19.6 mm | 180.3 mm2 |
| Core i9-10900K | Comet Lake | 10 C | 9.2 x 22.4 mm | 206.1 mm2 |
| Core i9-11900K | Rocket Lake | 8 C | 11.5 x 24.0 mm | 276.0 mm2 |
| HEDT for Comparison | | | | |
| Core i9-7900X | Skylake-X | 10 C | 14.6 x 22.3 mm | 325.4 mm2 |
| Core i7-6950X | Broadwell-E | 10 C | - | 246.3 mm2 |

So it's worth noting that Intel's new 8 core Rocket Lake processor is actually bigger than the 10 core Broadwell-E processor from 2016. One major difference between those two however is AVX-512, which does add a little die area. Nonetheless, Intel is approaching its HEDT platform die sizes with Rocket Lake, but can't sell these processors for as much as HEDT parts have historically sold for. The Core i7-6950X sold for $1723, while the Core i9-7900X was $999. The bulk of Intel's interest with this silicon is going to be the Core i7-11700K, which is a $420 processor.
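The die size comparison can be made more concrete by computing total area and area per core from the dimensions in the table above. This is a minimal Python sketch with names of our own choosing. Note that 'area per core' here divides the whole die (including graphics, cache, and IO) by the core count, so it is a proxy for silicon cost per core rather than the size of a core itself, which Intel has not disclosed.

```python
# (cores, width mm, height mm) for the mainstream 14nm dies listed in the table.
dies = {
    "Core i7-8700K": (6, 9.2, 16.7),
    "Core i9-9900K": (8, 9.2, 19.6),
    "Core i9-10900K": (10, 9.2, 22.4),
    "Core i9-11900K": (8, 11.5, 24.0),
}


def area_mm2(cpu: str) -> float:
    """Total die area in square millimetres."""
    _, width, height = dies[cpu]
    return width * height


def area_per_core_mm2(cpu: str) -> float:
    """Whole-die area divided by core count (not the area of one core)."""
    cores, _, _ = dies[cpu]
    return area_mm2(cpu) / cores


for cpu in dies:
    print(f"{cpu}: {area_mm2(cpu):.1f} mm2 total, "
          f"{area_per_core_mm2(cpu):.1f} mm2 of die per core")

growth = area_mm2("Core i9-11900K") / area_mm2("Core i9-10900K")
print(f"Rocket Lake die vs Comet Lake die: {growth:.2f}x")
```

By this measure the Rocket Lake die is roughly a third larger than the 10-core Comet Lake die while carrying two fewer cores.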

Backport vs Co-Design

One of the critical elements to Rocket Lake is what it means for Intel going forward. With this project, Intel has taken a core designed for 10 nm and recreated the performance on 14 nm, with additional implications for power and efficiency. Intel has stated that in the future it will have cores designed for multiple process nodes at the same time, and so given Rocket Lake’s efficiency at the high frequencies, doesn’t this mean the experiment has failed?

I say no, because it teaches Intel a lot in how it designs its silicon. The issue with Rocket Lake is that the core was originally designed for 10 nm, and that won’t necessarily happen again.

Future cores from Intel are going to be designed, from the ground-up, for multiple process node technologies. Given Intel’s announcements about developing cores on external Intel manufacturing facilities, as well as licensing out its core designs, this means Intel might have to design a core that works at both Intel and TSMC. The point is that if Intel is going to do this, it will design for both from the start. The core will have been built taking into account the different elements of the process nodes in advance, and likely cater for the intricacies of both.

Rocket Lake, by contrast, was an ‘after the fact’ redesign, with all of its special features built for 10 nm and then retrofitted to 14 nm. Rocket Lake shows it can be done, but the way Intel went about this is unlikely to happen in the future. All future cores that require multiple process nodes, even across multiple foundry partners, are going to be co-designed from day one.

Ultimately, the future of how and when Intel will initiate additional co-design, even given suggested roadmaps, is likely to be in flux based on Intel’s own ability to produce high single-core frequency desktop processors. Cypress Cove, by most measures, is a reflex response to a widening gap in Intel’s desktop roadmap, and takes a core specifically designed for a different process. Intel has likely learned a lot from this process, but in the future we can expect specific cores to be co-designed with both process nodes in mind. This is akin to Intel’s new stance on ‘enabling the right product on the right node at the right time’. A co-designed approach, rather than a post-production realisation that a backport is required, will mean that future core designs that straddle two process nodes are likely to be more similar and optimized on both processes at the same time.



Motherboards

All of these new processors are LGA1200 processors, and as a result they will be enabled in 500-series motherboards. There is also some 400-series support, however it depends on the platform. Here’s the trusty AnandTech Guide for support:

Motherboard Support

| AnandTech | B460, H410 | Z490, Q470, H470 | Z590, B560, H510 |
|---|---|---|---|
| Comet Lake | Yes | Yes | Yes |
| Rocket Lake | No | Yes | Yes |

The reason why Rocket Lake will not work in H410 or B460 motherboards is because these chipsets are built on Intel’s older 22nm process. There is something in the design of those chipsets, likely to be related to signal integrity, which means they cannot be supported, at least at the PCIe 3.0 speeds required. Given previous motherboard firmware, we might see unofficial support later down the line, even if only in PCIe 2.0 mode.
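The support matrix can be expressed as a short lookup. The Python sketch below encodes only the official support listed in our guide above; the set names and the function are our own, and it deliberately ignores any unofficial firmware support that might appear later.

```python
# Chipset groups as they appear in the support guide above.
OLDER_400_SERIES = {"B460", "H410"}         # no Rocket Lake support
NEWER_400_SERIES = {"Z490", "Q470", "H470"}
SERIES_500 = {"Z590", "B560", "H510"}
KNOWN_CHIPSETS = OLDER_400_SERIES | NEWER_400_SERIES | SERIES_500


def is_supported(cpu_family: str, chipset: str) -> bool:
    """Official CPU/chipset support according to the guide in this article."""
    if chipset not in KNOWN_CHIPSETS:
        raise ValueError(f"chipset not covered by the guide: {chipset}")
    if cpu_family == "Comet Lake":
        return True
    if cpu_family == "Rocket Lake":
        return chipset not in OLDER_400_SERIES
    raise ValueError(f"CPU family not covered by the guide: {cpu_family}")


print(is_supported("Rocket Lake", "Z490"))
print(is_supported("Rocket Lake", "B460"))
```

Raising an error for anything outside the guide, rather than guessing, keeps the sketch from implying support we have not verified.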

However, the lead platform for Rocket Lake will be the Z590 platform. The new features boil down to:

  • Double CPU-to-Chipset bandwidth when paired with 11th Gen RKL
  • USB 3.2 Gen 2x2 (20 Gbps) native chipset connectivity

The new H570 and B560 motherboards reintroduce memory overclocking, a feature that was removed from the 400-series budget motherboards.

Here is the slide Intel provided for 500-series, though it is worth mentioning some of the caveats:

In this slide, it states that discrete Wi-Fi, 2.5 gigabit Ethernet, and Thunderbolt 4 are supported on 500 series. These are optional upgrades for the motherboard vendors, so not all motherboards will have them, and in each case they also require additional hardware costs for the motherboard manufacturer, such as an RF module for Wi-Fi, a PHY for Ethernet, or a PHY for Thunderbolt. These could all be added to any other motherboard, AMD or Intel, with discrete controllers which are slightly more expensive – those controllers don’t have to be Intel either. But to be clear, they are not unique to Z590 offerings, nor are they natively offered by default on all systems.

All of the 10th Generation Comet Lake processors will work in all 500-series motherboards, and get all the features, except the double CPU-to-Chipset bandwidth, as that specifically requires Z590 + 11th Gen Core CPU.
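The rule for the CPU-to-chipset link is simple enough to state in code. This is a minimal Python sketch of the behaviour as described in this article (the wider link needs both a Z590 board and an 11th Gen processor); the function name and string labels are our own assumptions.

```python
def dmi_lanes(chipset: str, cpu_family: str) -> int:
    """DMI 3.0 link width between CPU and chipset, as described in the text.

    Hypothetical helper: only the Z590 + Rocket Lake pairing is credited
    with the doubled link here; every other combination falls back to x4.
    """
    if chipset == "Z590" and cpu_family == "Rocket Lake":
        return 8
    return 4


for chipset, cpu in [("Z590", "Rocket Lake"), ("Z590", "Comet Lake"), ("B560", "Rocket Lake")]:
    print(f"{cpu} on {chipset}: DMI x{dmi_lanes(chipset, cpu)}")
```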

It should be noted that anyone already with a 400-series or 500-series motherboard, or those looking to purchase one, will need a BIOS update in order to enable the latest performance enhancements. In our testing, we found that the BIOSes on our boards when they arrived in our offices were quite old (from January), and the latest microcode from Intel should help improve performance and cache latency. Some may be updated to February microcode, which does get most of the way there to peak performance, but the latest should always give the best results.

Intel Z590 and B560

The two main chipsets to focus on for Rocket Lake are Z590 and B560. Z590 motherboards start at an eye-watering $175 and go up to over $1000, whereas B560 motherboards are more palatable, starting from $75 and going up to around $220.

Where the B560 and Z590 differ is in PCIe bifurcation (x16 only on B560), the number of USB ports, and the number of chipset PCIe 3.0 lanes available for M.2 or additional controllers.

Intel 500-Series Chipset

| Feature | B560 | Z590 | Z490 |
|---|---|---|---|
| Socket | LGA1200 | LGA1200 | LGA1200 |
| PCIe Lanes (CPU) | 20 | 20 | 16 |
| PCIe Specification (CPU) | 4.0 | 4.0 | 3.0* |
| PCIe Config | x16, x16/+4 | x16, x8/x8, x8/x8/x4+4 | x16, x8/x8, x8/x8/+4 |
| DMI Lanes (3.0) | x4 | x8 (RKL), x4 (CML) | x4 |
| Chipset PCIe 3.0 Lanes | 12 | 24 | 24 |
| Max USB 3.2 (Gen2/Gen1) | 4/6 | 6/10 | 6/10 |
| USB 3.2 Gen 2x2 (20 Gbps) | Y | Y | ASMedia |
| Total USB | 12 | 14 | 14 |
| Max SATA Ports | 6 | 6 | 6 |
| Memory Channels (Dual) | 2/2 | 2/2 | 2/2 |
| Intel Optane Memory Support | Y | Y | Y |
| Intel Rapid Storage Technology (RST) | Y | Y | Y |
| Integrated WiFi MAC | Wi-Fi 6 | Wi-Fi 6 | Wi-Fi 6 |
| Intel Smart Sound | Y | Y | Y |
| Overclocking Support | *Memory | Y | Y |
| Intel vPro | N | N | N |
| Max HSIO Lanes | ? | ? | 30 |
| ME Firmware | 15 | 15 | 14 |
| TDP (W) | 6 | 6 | 6 |

We’ve gone through all 90+ motherboards from both chipsets, and collated them into two large overviews:

These are all the details on all the motherboards we’ve been able to identify as coming to market. Note that not all will be available in every region, with some being OEM/customer specific and might only be available on the OEM market.

By and large, we have observed several key metrics worth discussing with the new motherboards.

First is the large uptake of 2.5 gigabit Ethernet. It has taken literal years since the first consumer 2.5 GbE solutions came to market with Aquantia, and they were limited to select motherboards at a premium price point. Now we are seeing Intel and Realtek-based 2.5 GbE controllers make their way down to something more affordable. More and more NAS and routers are coming with one or more 2.5 GbE ports as standard, and as more systems get enabled with higher speed for wired connectivity, we should see the market open up a lot more. It won’t improve your internet speed, but it might improve home streaming with the right network configuration.

The other element these boards bring is USB 3.2 Gen 2x2 (20 Gbps). This is the double speed ‘USB 3.2’ standard that was renamed, and now we get this feature native on 500-series chipsets. Previously it was only possible with additional ASMedia controllers; now Intel motherboards can offer it natively, but only if the motherboard vendor enables it. We’re seeing mostly front-panel connections adhere to this standard, but a few motherboards have it available as a Type-C connection on the rear panel.

Also of note is that the B560 motherboards are now enabling memory overclocking again, which was removed in B460. Any 10th Gen or 11th Gen processor in a B560 can have overclocked memory. CPU overclocking is still limited to the Z-series motherboards.

Overclocking Enhancements For Memory: Ratios

On the Overclocking Enhancement side of things, this is perhaps where it gets a bit nuanced. For a while now Intel has been binning its K processors to within an inch of their maximum supported frequencies, and turbo boost techniques like favored core and Thermal Velocity Boost also push the margins on the cores that support it. So what can Intel focus on for overclocking this time round?

With Rocket Lake, Intel is leaning into the memory side of things. These new Rocket Lake processors now support geared ratios between the memory controller and the DRAM data rates. Users can either select a 1:1 ratio or a 2:1 ratio.

Traditionally Intel has natively operated on a 1:1 ratio without ever giving users the option. This meant that in order to push that DDR4-5000 memory, like we did in our review of that premium Corsair kit, it required a processor with a good memory controller that could also support a 5.0 GT/s connection.

With the 2:1 ratio, the memory controller will now operate at half speed, in a more comfortable zone, allowing memory overclockers to go beyond traditional limits. With that DDR4-5000 memory, it means that the memory controller is now only operating at 2.5 GT/s (1250 MHz because DDR4 is measured in transfers per second, and there are two transfers per clock in Double Data Rate DDR memory). This also means that in order to match the internal clocks on DDR4-3200, users will have to start pushing the memory itself to DDR4-6400 to get the memory controller back on a level footing when in that 2:1 ratio. Nevertheless, this feature does allow the memory to be tested to its limits without the bottleneck of the CPU.
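The relationship between the DDR4 data rate, the DRAM clock, and the memory controller clock can be written out directly. The following is a minimal Python sketch of the arithmetic in the paragraph above; the function names and the integer `ratio` parameter (1 for 1:1, 2 for 2:1) are our own conventions.

```python
def dram_clock_mhz(data_rate_mts: float) -> float:
    """DRAM clock in MHz: DDR memory makes two transfers per clock."""
    return data_rate_mts / 2


def controller_clock_mhz(data_rate_mts: float, ratio: int) -> float:
    """Memory controller clock in MHz for a 1:1 (ratio=1) or 2:1 (ratio=2) mode."""
    if ratio not in (1, 2):
        raise ValueError("ratio must be 1 (1:1) or 2 (2:1)")
    return dram_clock_mhz(data_rate_mts) / ratio


print(controller_clock_mhz(5000, ratio=2))  # DDR4-5000 at 2:1
print(controller_clock_mhz(3200, ratio=1))  # DDR4-3200 at 1:1
print(controller_clock_mhz(6400, ratio=2))  # DDR4-6400 at 2:1
```

DDR4-5000 at 2:1 lands the controller at 1250 MHz, while DDR4-3200 at 1:1 and DDR4-6400 at 2:1 both put it at 1600 MHz, which is why the memory has to be pushed that far to get back on level footing.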

By default, all Rocket Lake processors will support DDR4-2933 at a 1:1 ratio in the specifications. Above this will mean a 2:1 ratio, except for the Core i9 family, which allows for a DDR4-3200 1:1 ratio. Despite these specifications, every motherboard we tested puts DDR4-3200 on a 1:1 ratio for all CPUs, so the delineation between the Core i9 and the rest seems arbitrary.

Overclocking Enhancements For Memory: Dual POST

Users that have tried memory overclocking will note that in order to change the memory ratio, it requires a restart. With the new Z590 platform, Intel has devised a system that will let a user select two different memory ratios, and it will enable both at boot time.

Under this mode, once in the operating system, a user can switch between them for different benchmarking modes. What this does is allow extreme overclockers, particularly those going for world records with sub-zero coolants, to boot at the lower memory speed, then run the test at a higher memory speed, then instantly revert back to the slow speed. Currently they have to run at the high speed all the time, which can be liable to instability. It’s more one for the extreme overclockers, but Intel has added it here.

Other Overclocking Enhancements

Other new features in the overclocking toolkit include AVX-512 offsets and voltage guard bands, enabling users to overclock the processors without overclocking AVX-512 and incurring a heavy power consumption penalty. Intel has also put in an option to disable AVX altogether, which means that users who don’t want to worry about AVX-512 draining almost 300 W from an errantly loaded program can disable it directly in the firmware.

Intel is also continuing support for a number of overclock-related features, such as per-core HyperThreading, per-core frequency adjustment, and fine-grained PLL controls. Intel has stated that with Rocket Lake, it has opened up some of the features to enable proper BCLK overclocking again, however we wait to see if there is a good range for overclockers to play with.

All these new features are enabled when a 500-series motherboard is paired with a new Rocket Lake 11th Generation Core processor. Support with Comet Lake will be limited.



Intel’s New Adaptive Boost Technology for Core i9-K/KF

Taken from our news item

To say that Intel’s turbo levels are complicated to understand is somewhat of an understatement. Trying to teach the difference between the turbo levels to those new to measuring processor performance is an art form in and of itself. But here’s our handy guide, taken from our article on the subject.

Adaptive Boost Technology is now the fifth frequency metric Intel uses on its high-end enthusiast grade processors, and another element in Intel’s ever complex ‘Turbo’ family of features. Here’s the list, in case we forget one:

Intel Frequency Levels

  • Base Frequency: The frequency at which the processor is guaranteed to run under warranty conditions with a power consumption no higher than the TDP rating of the processor.
  • Turbo Boost 2.0 (TB2): When in a turbo mode, this is the defined frequency the cores will run at. TB2 varies with how many cores are being used.
  • Turbo Boost Max 3.0 (TBM3, 'Favored Core'): When in a turbo mode, for the best cores on the processor (usually one or two), these will get extra frequency when they are the only cores in use.
  • Thermal Velocity Boost (TVB): When in a turbo mode, if the peak thermal temperature detected on the processor is below a given value (70ºC on desktops), then the whole processor will get a frequency boost of +100 MHz. This follows the TB2 frequency tables depending on core loading.
  • Adaptive Boost Technology (ABT, 'floating turbo'): When in a turbo mode, if 3 or more cores are active, the processor will attempt to provide the best frequency within the power budget, regardless of the TB2 frequency table. The limit of this frequency is given by TB2 in 2-core mode. ABT overrides TVB when 3 or more cores are active.

*Turbo mode is limited by the turbo power level (PL2) and timing (Tau) of the system. Intel offers recommended guidelines for this, but those guidelines can be overridden (and are routinely ignored) by motherboard manufacturers. Most gaming motherboards will implement an effective ‘infinite’ turbo mode. In this mode, the peak power observed will be the PL2 value. It is worth noting that the 70ºC requirement for TVB is also often ignored, and TVB will be applied whatever the temperature.

Intel provided a slide trying to describe the new ABT, however the diagram is a bit of a mess and doesn’t explain it that well. Here’s the handy AnandTech version.

First up is the Core i7-11700K that AnandTech has already reviewed. This processor has TB2, TBM3, but not TVB or ABT.

The official specifications show that when one to four cores are loaded, when in turbo mode, it will boost to 4.9 GHz. If only one or two cores are loaded, the OS will shift the threads onto the favored cores and Turbo Boost Max 3.0 will kick in for 5.0 GHz. More than four core loading will be distributed as above.

On the Core i9-11900, the non-overclocking version, we also get Thermal Velocity Boost which adds another +100 MHz onto every core max turbo, but only if the processor is below 70ºC.

We can see here that the first two cores get both TBM3 (favored core) as well as TVB, which makes those two cores give a bigger jump. In this case, if all eight cores are loaded, the turbo is 4.6 GHz, unless the CPU is under 70ºC, then we get an all-core turbo of 4.7 GHz.

Now move up to the Core i9-11900K or Core i9-11900KF, which are the only two processors with the new floating turbo / Adaptive Boost Technology. Everything beyond two cores changes and TVB no longer applies.

Here we see what looks like a 5.1 GHz all-core turbo, from three cores to eight cores loaded. This is +300 MHz above TVB when all eight cores are loaded. But the reason why I’m calling this a floating turbo is because it is opportunistic.

What this means is that, if all 8 cores are loaded, TB2 means that it will run at 4.7 GHz. If there is power budget and thermal budget, it will attempt 4.8 GHz. If there is more power budget and thermal budget available, it will go to 4.9 GHz, then 5.0 GHz, then 5.1 GHz. The frequency will float as long as it has enough of those budgets to play with, and it will increase/decrease as necessary. This is important as different instructions cause different amounts of power draw and such.
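To make the ‘floating’ behaviour concrete, here is a minimal Python sketch of the stepping logic as described above. It is an illustration only, not Intel’s firmware algorithm: the `budget_ok` callback is a hypothetical stand-in for whatever power and thermal checks the processor really performs, and the default frequencies are simply the Core i9-11900K numbers quoted in this section.

```python
def floating_turbo_mhz(active_cores, budget_ok, tb2_mhz=4700,
                       ceiling_mhz=5100, step_mhz=100):
    """Frequency an ABT-style floating turbo would settle on (sketch).

    budget_ok(freq_mhz) -> bool is a hypothetical callback that reports
    whether power and thermal headroom allow running at freq_mhz.
    tb2_mhz is the regular TB2 frequency for the current core loading.
    """
    if active_cores < 3:
        return tb2_mhz  # ABT only applies with three or more active cores
    freq = tb2_mhz
    # Climb in 100 MHz steps while there is budget, up to the ceiling.
    while freq + step_mhz <= ceiling_mhz and budget_ok(freq + step_mhz):
        freq += step_mhz
    return freq


# Assumed example: headroom runs out above 4.9 GHz with all eight cores loaded.
print(floating_turbo_mhz(8, lambda f: f <= 4900))
```

In a real processor this decision is re-evaluated continuously, so the frequency moves down as well as up when the instruction mix changes.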

If this sounds familiar, you are not wrong. AMD does the same thing, and they call it Precision Boost 2, and it was introduced in April 2018 with Zen+. AMD applies its floating turbo to all of its processors – Intel is currently limiting floating turbo to only the Core i9-K and Core i9-KF in Core 11th Gen Rocket Lake.

One of the things that we noticed with AMD however is that this floating turbo does increase power draw, especially with AVX/AVX2 workloads. Intel is likely going to see similar increases in power draw. What might be a small saving grace here is that Intel’s frequency jumps are still limited to full 100 MHz steps, whereas AMD can do it on the 25 MHz boundary. This means that Intel has to manage larger steps, and will likely only cross that boundary if it knows it can be maintained for a fixed amount of time. It will be interesting to see if Intel gives the user the ability to change those entry/exit points for Adaptive Boost Technology.

There will be some users who are already familiar with Multi-Core Enhancement / Multi-Core Turbo. This is a feature that some motherboard vendors have, and often enable by default, which lets a processor reach an all-core turbo equal to the single core turbo. That is somewhat similar to ABT, but that was more of a fixed frequency, whereas ABT is a floating turbo design. That being said, some motherboard vendors might still have Multi-Core Enhancement as part of their design anyway, bypassing ABT.

Overall, it’s a performance plus. It makes sense for the users that can also manage the thermals. AMD caught a tailwind with the feature when it moved to TSMC’s 7nm. I have a feeling that Intel will have to shift to a new manufacturing node to get the best out of ABT, and then we might see the feature on the more mainstream CPUs, as well as being enabled by default.



Power Consumption: AVX-512 Caution

I won’t rehash the full ongoing issue with how companies report power vs TDP in this review – we’ve covered it a number of times before, but in a quick sentence, Intel uses one published value for sustained performance, and an unpublished ‘recommended’ value for turbo performance, the latter of which is routinely ignored by motherboard manufacturers. Most high-end consumer motherboards ignore the sustained value, often 125 W, and allow the CPU to consume as much as it needs with the real limits being the full power consumption at full turbo, the thermals, or the power delivery limitations.

One of the dimensions of this we don’t often talk about is that the power consumption of a processor is always dependent on the actual instructions running through the core. A core can be ‘100%’ active while sitting around waiting for data from memory or doing simple addition, however a core has multiple ways to run instructions in parallel, with the most complex instructions consuming the most power. This was noticeable in the desktop consumer space when Intel introduced vector extensions, AVX, to its processor design. The subsequent introduction of AVX2 and AVX-512 means that running these instructions draws the most power.

AVX-512 comes with its own discussion, because even going into an ‘AVX-512’ mode causes additional issues. Intel’s introduction of AVX-512 on its server processors showcased that in order to remain stable, the core had to reduce the frequency and increase the voltage while also pausing the core to enter the special AVX-512 power mode. This made AVX-512 suitable only for heavily optimized high-performance server code. But now Intel has enabled AVX-512 across its product line, from notebook to enterprise, with the goal of running AI code faster and enabling new use cases. We’re also a couple of generations on from then, and AVX-512 doesn’t get quite the same hit as it did, but it still requires a lot of power.

For our power benchmarks, we’ve taken several tests that represent a real-world compute workload, a strong AVX2 workload, and a strong AVX512 workload.

Starting with the Agisoft power consumption, we’ve truncated it to the first 1200 seconds as after that the graph looks messy. Here we see the following power ratings in the first stage and second stage:

  • Intel Core i9-11900K (1912 sec): 164 W dropping to 135 W
  • Intel Core i7-11700K (1989 sec): 149 W dropping to 121 W
  • Intel Core i5-11600K (2292 sec): 109 W dropping to 96 W
  • AMD Ryzen 7 5800X (1890 sec): 121 W dropping to 96 W

So in this case, in the heavy second section of the benchmark, the AMD processor is the lowest power (level with the Core i5), and it is the quickest to finish overall. In the more lightly threaded first section, AMD is still saving roughly 25% of the power compared to the big Core i9.
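For readers who want to check the savings figure against the list above, here is a small Python sketch of the arithmetic. The wattages are the ones quoted in this section; the dictionary layout and the helper name are ours.

```python
# (first stage W, second stage W), as listed above.
agisoft_power_w = {
    "Core i9-11900K": (164, 135),
    "Core i7-11700K": (149, 121),
    "Core i5-11600K": (109, 96),
    "Ryzen 7 5800X": (121, 96),
}


def saving_pct(reference_w: float, other_w: float) -> float:
    """Percentage of power saved by `other_w` relative to `reference_w`."""
    return 100.0 * (1.0 - other_w / reference_w)


i9 = agisoft_power_w["Core i9-11900K"]
amd = agisoft_power_w["Ryzen 7 5800X"]
for stage, name in enumerate(("first", "second")):
    print(f"{name} stage: AMD saves {saving_pct(i9[stage], amd[stage]):.1f}% vs the Core i9")
```

On these numbers the saving works out to about 26% in the first stage and about 29% in the second.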

One of the big takeaways from our initial Core i7-11700K review was the power consumption under AVX-512 modes, as well as the high temperatures. Even with the latest microcode updates, both of our Core i9 parts draw lots of power.

The Core i9-11900K in our test peaks up to 296 W, showing temperatures of 104ºC, before coming back down to ~230 W and dropping to 4.5 GHz. The Core i7-11700K is still showing 278 W in our ASUS board, temperatures of 103ºC, and after the initial spike we see 4.4 GHz at the same ~230 W.

The Core i5-11600K, with fewer cores, gets a respite here. Our peak power numbers are around the 206 W range, with the workload not doing an initial spike and staying around 4.6 GHz. Peak temperatures were at the 82ºC mark, which is very manageable. During AVX2, the i5-11600K was only at 150 W.

Moving to another real world workload, here’s what the power consumption looks like over time for Handbrake 1.3.2 converting a H.264 1080p60 file into a HEVC 4K60 file.

This is showing the full test, and we can see that the higher performance Intel processors do get the job done quicker. However, the AMD Ryzen 7 processor is still the lowest power of them all, and finishes the quickest. By our estimates, the AMD processor is twice as efficient as the Core i9 in this test.

Thermal Hotspots

Rocket Lake seems to peak at 104ºC, and here’s where we get into a discussion about thermal hotspots.

There are a number of ways to report CPU temperature. We can either take the instantaneous value of a singular spot of the silicon while it’s currently going through a high-current density event, like compute, or we can consider the CPU as a whole with all of its thermal sensors. While the overall CPU might accept operating temperatures of 105ºC, individual elements of the core might actually reach 125ºC instantaneously. So what is the correct value, and what is safe?

The cooler we’re using on this test is arguably the best air cooling on the market – a 1.8 kilogram full copper ThermalRight Ultra Extreme, paired with a 170 CFM high static pressure fan from Silverstone. This cooler has been used for Intel’s 10-core and 18-core high-end desktop variants over the years, even the ones with AVX-512, and not skipped a beat. Because we’re seeing 104ºC here, are we failing in some way?

Another issue we’re coming across with new processor technology is the ability to effectively cool a processor. I’m not talking about cooling the processor as a whole, but more for those hot spots of intense current density. We are going to get to a point where we can’t remove the thermal energy fast enough, or with this design, we might be there already.

I will point out an interesting fact down this line of thinking though, which might go un-noticed by the rest of the press – Intel has reduced the total vertical height of the new Rocket Lake processors.

The z-height, or total vertical height, of the previous Comet Lake generation was 4.48-4.54 mm. This number was taken from a range of 7 CPUs I had to hand. However, this Rocket Lake processor is over 0.1 mm smaller, at 4.36 mm. The smaller height of the package plus heatspreader could be a small indicator to the required thermal performance, especially if the airgap (filled with solder) between the die and the heatspreader is smaller. If it aids cooling and doesn’t disturb how coolers fit, then great, however at some point in the future we might have to consider different, better, or more efficient ways to remove these thermal hotspots.

Peak Power Comparison

For completeness, here is our peak power consumption graph.

(0-0) Peak Power

Platform Stability: Not Complete

It is worth noting that in our testing we had some issues with platform stability with our Core i9 processor. Personally, across two boards and several BIOS revisions, I would experience BSODs in high memory use cases. Gavin, our motherboard editor, was seeing lockups during game tests with his Core i9 on one motherboard, but it worked perfectly with a second. We’ve heard about issues of other press seeing lockups, with one person going through three motherboards to find stability. Conversations with an OEM showcased they had a number of instability issues running at default settings with their Core i9 processors.

The exact nature of these issues is unknown. One of my systems refused to POST with 4x32 GB of memory, and would only do so with 2x32 GB of memory. Some of our peers that we’ve spoken to have had zero problems with any of their systems. For us, our Core i7 and Core i5 were absolutely fine. I have a second Core i9 processor here which is going through stability tests as this review goes live, and it seems to be working so far, which might point to it being a silicon/BIOS issue, not a memory issue.

Edit: As I was writing this, the second Core i9 crashed and restarted to desktop.

We spoke to Intel about the problem, and they acknowledged our information, stating:

We are aware of these reports and actively trying to reproduce these issues for further debugging.

Some motherboard vendors are only today putting out updated BIOSes for Intel’s new turbo technology, indicating that (as with most launches) there’s a variety of capability out there. Seeing some of the comments from other press in their reviews today, we’re sure this isn’t an isolated incident; however we do expect this issue to be solved.



CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on if it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate to how quick an access between two cores can happen.

All three CPUs exhibit the same behaviour - one core seems to be given high priority, while the rest are not.

Frequency Ramping

Both AMD and Intel over the past few years have introduced features to their processors that speed up the time from when a CPU moves from idle into a high powered state. The effect of this means that users can get peak performance quicker, but the biggest knock-on effect for this is with battery life in mobile devices, especially if a system can turbo up quick and turbo down quick, ensuring that it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One of the issues though with this technology is that sometimes the adjustments in frequency can be so fast, software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.

We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
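To illustrate why the probing interval matters, here is a toy Python simulation. It is not our Frequency Ramp tool: the trace is synthetic, and the burst length, start time, and frequencies are made-up values chosen purely for illustration.

```python
def frequency_at(t_us: int) -> int:
    """Synthetic trace: 800 MHz idle, with an assumed 300 us burst
    to 5300 MHz starting at t = 1200 us."""
    return 5300 if 1200 <= t_us < 1500 else 800


def peak_observed_mhz(interval_us: int, duration_us: int = 5000) -> int:
    """Highest frequency seen when probing once every `interval_us`."""
    return max(frequency_at(t) for t in range(0, duration_us, interval_us))


print("probing every 1 ms :", peak_observed_mhz(1000), "MHz")
print("probing every 10 us:", peak_observed_mhz(10), "MHz")
```

With a 1 ms probe the burst falls between samples and is never seen, while a 10 µs probe catches it.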

From an idle frequency of 800 MHz, it takes ~16 ms for Intel to boost to the top frequency for both the i9 and the i5. The i7 was most of the way there, but took an additional 10 ms or so.



CPU Tests: Office and Science

Our previous set of ‘office’ benchmarks have often been a mix of science and synthetics, so this time we wanted to keep our office section purely on real world performance.

Agisoft Photoscan 1.3.3: link

The concept of Photoscan is about translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the final 3D model in both spatial accuracy and texturing accuracy. The algorithm has four stages, with some parts of the stages being single-threaded and others multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.

For the update to version 1.3.3, the Agisoft software now supports command line operation. Agisoft provided us with a set of new images for this version of the test, and a python script to run it. We’ve modified the script slightly by changing some quality settings for the sake of the benchmark suite length, as well as adjusting how the final timing data is recorded. The python script dumps the results file in the format of our choosing. For our test we obtain the time for each stage of the benchmark, as well as the overall time.

(1-1) Agisoft Photoscan 1.3, Complex Test

For a variable threaded load, the i9-10900K sits above the Rocket Lake parts.

RISC-V Toolchain Compile

Our latest test in our suite is the RISCV Toolchain compile from the Github source. This set of tools enables users to build software for a RISCV platform, however the tools themselves have to be built. For our test, we're running a complete fresh build of the toolchain, including from-scratch linking. This makes the test not a straightforward test of an updated compile on its own, but does form the basis of an ab initio analysis of system performance given its range of single-thread and multi-threaded workload sections. More details can be found here.

(1-4) Compile RISCV Toolchain

One place where Intel is winning in absolute terms is our compile-from-scratch test. We re-ran the numbers on Intel with the latest microcode due to a critical issue. We can see here that AMD's best are its single chiplet designs, but Intel ekes out a small lead.

 

Science

In this version of our test suite, all the science focused tests that aren’t ‘simulation’ work are now in our science section. This includes Brownian Motion, calculating digits of Pi, molecular dynamics, and for the first time, we’re trialing an artificial intelligence benchmark, both inference and training, that works under Windows using python and TensorFlow.  Where possible these benchmarks have been optimized with the latest in vector instructions, except for the AI test – we were told that while it uses Intel’s Math Kernel Libraries, they’re optimized more for Linux than for Windows, and so it gives an interesting result when unoptimized software is used.

3D Particle Movement v2.1: Non-AVX and AVX2/AVX512

This is the latest version of this benchmark designed to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. This involves randomly moving particles in a 3D space using a set of algorithms that define random movement. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in.

The initial version of v2.1 is a custom C++ binary of my own code, and flags are in place to allow for multiple loops of the code with a custom benchmark length. By default this version runs six times and outputs the average score to the console, which we capture with a redirection operator that writes to file.

For v2.1, we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere. According to Jim Keller, there are only a couple dozen or so people who understand how to extract the best performance out of a CPU, and this guy is one of them. To keep things honest, AMD also has a copy of the code, but has not proposed any changes.

The 3DPM test is set to output millions of movements per second, rather than time to complete a fixed number of movements.
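As a rough idea of what a ‘movements per second’ metric looks like, here is a minimal Python sketch. It is not the 3DPM code: the real benchmark is a C++ binary with its own set of movement algorithms, whereas this uses one textbook method for picking a uniform random direction, and all names and default sizes are our own assumptions.

```python
import math
import random
import time


def random_unit_step(rng):
    """Uniform random direction on the unit sphere (one textbook method)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def run_3dpm_sketch(particles=1000, steps=100, seed=0):
    """Move every particle `steps` times; return (millions of moves/s, positions)."""
    rng = random.Random(seed)
    pos = [[0.0, 0.0, 0.0] for _ in range(particles)]
    start = time.perf_counter()
    for _ in range(steps):
        for p in pos:
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return particles * steps / elapsed / 1e6, pos


score, _ = run_3dpm_sketch()
print(f"{score:.2f} million movements per second")
```

Reporting a rate rather than a completion time means a higher number is better, and the run length can be changed without changing the unit of the result.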

(2-1) 3D Particle Movement v2.1 (non-AVX)

(2-2) 3D Particle Movement v2.1 (Peak AVX)

When AVX-512 comes to play, everyone else goes home. Easiest and clearest win for Intel.

y-Cruncher 0.78.9506: www.numberworld.org/y-cruncher

If you ask anyone what sort of computer holds the world record for calculating the most digits of pi, I can guarantee that a good portion of those answers might point to some colossus super computer built into a mountain by a super-villain. Fortunately nothing could be further from the truth – the computer with the record is a quad socket Ivy Bridge server with 300 TB of storage. The software that was run to get that was y-cruncher.

Built by Alex Yee over the last part of a decade and some more, y-Cruncher is the software of choice for calculating billions and trillions of digits of the most popular mathematical constants. The software has held the world record for Pi since August 2010, and has broken the record a total of 7 times since. It also holds records for e, the Golden Ratio, and others. According to Alex, the program runs around 500,000 lines of code, and he has multiple binaries each optimized for different families of processors, such as Zen, Ice Lake, Sky Lake, all the way back to Nehalem, using the latest SSE/AVX2/AVX512 instructions where they fit in, and then further optimized for how each core is built.

For our purposes, we’re calculating Pi, as it is more compute bound than memory bound. In ST mode we calculate 250 million digits, and in MT mode we calculate 2.5 billion digits.

(2-3) yCruncher 0.78.9506 ST (250m Pi)

(2-4) yCruncher 0.78.9506 MT (2.5b Pi)

In ST mode, we are more dominated by the AVX-512 instructions, whereas in MT it becomes a mix of memory as well.

NAMD 2.13 (ApoA1): Molecular Dynamics

One of the popular science fields is modeling the dynamics of proteins. By looking at how the energy of active sites within a large protein structure changes over time, scientists behind the research can calculate required activation energies for potential interactions. This becomes very important in drug discovery. Molecular dynamics also plays a large role in protein folding, and in understanding what happens when proteins misfold, and what can be done to prevent it. Two of the most popular molecular dynamics packages in use today are NAMD and GROMACS.

NAMD, or Nanoscale Molecular Dynamics, has already been used in extensive Coronavirus research on the Frontier supercomputer. Typical simulations using the package are measured in how many nanoseconds per day can be calculated with the given hardware, and the ApoA1 protein (92,224 atoms) has been the standard model for molecular dynamics simulation.

Luckily the compute can home in on a typical ‘nanoseconds-per-day’ rate after only 60 seconds of simulation, however we stretch that out to 10 minutes to take a more sustained value, as by that time most turbo limits should be surpassed. The simulation itself works with 2 femtosecond timesteps. We use version 2.13 as this was the recommended version at the time of integrating this benchmark into our suite. The latest nightly builds we’re aware of have started to enable support for AVX-512, however for consistency in our benchmark suite, we are staying with 2.13. Other software that we test with has AVX-512 acceleration.
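For reference, the conversion from simulation throughput to the ‘nanoseconds-per-day’ metric is simple arithmetic with the 2 femtosecond timestep. The Python sketch below uses our own function name, and the steps-per-second value is a hypothetical input rather than a measured result.

```python
TIMESTEP_FS = 2.0          # femtoseconds of simulated time per step, as in our test
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000      # 1 ns = 1e6 fs


def ns_per_day(steps_per_second: float) -> float:
    """Simulated nanoseconds per wall-clock day for a given step rate."""
    fs_per_day = steps_per_second * TIMESTEP_FS * SECONDS_PER_DAY
    return fs_per_day / FS_PER_NS


# Hypothetical throughput of 10 timesteps per second.
print(f"{ns_per_day(10):.3f} ns/day")
```

At that assumed rate the result is 1.728 ns/day, which shows why even a full day of compute covers only a tiny slice of simulated time.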

(2-5) NAMD ApoA1 Simulation

The Intel parts show some improvement over the previous generations of Intel, however the 10-core Comet Lake still wins ahead of Rocket Lake.

AI Benchmark 0.1.2 using TensorFlow: Link

Finding an appropriate artificial intelligence benchmark for Windows has been a holy grail of mine for quite a while. The problem is that AI is such a fast moving, fast paced world that whatever I compute this quarter will no longer be relevant in the next, and one of the key metrics in this benchmarking suite is being able to keep data over a long period of time. We’ve had AI benchmarks on smartphones for a while, given that smartphones are a better target for AI workloads, but it also makes some sense that everything on PC is geared towards Linux as well.

Thankfully however, the good folks over at ETH Zurich in Switzerland have converted their smartphone AI benchmark into something that’s useable in Windows. It uses TensorFlow, and for our benchmark purposes we’ve locked our testing down to TensorFlow 2.10, AI Benchmark 0.1.2, while using Python 3.7.6.

The benchmark runs through 19 different networks including MobileNet-V2, ResNet-V2, VGG-19 Super-Res, NVIDIA-SPADE, PSPNet, DeepLab, Pixel-RNN, and GNMT-Translation. All the tests probe both the inference and the training at various input sizes and batch sizes, except the translation that only does inference. It measures the time taken to do a given amount of work, and spits out a value at the end.

There is one big caveat for all of this, however. Speaking with the folks over at ETH, they use Intel’s Math Kernel Libraries (MKL) for Windows, and they’re seeing some incredible drawbacks. I was told that MKL for Windows doesn’t play well with multiple threads, and as a result any Windows results are going to perform a lot worse than Linux results. On top of that, after a given number of threads (~16), MKL kind of gives up and performance drops off quite substantially.

So why test it at all? Firstly, because we need an AI benchmark, and a bad one is still better than not having one at all. Secondly, if MKL on Windows is the problem, then by publicizing the test, it might just put a boot somewhere for MKL to get fixed. To that end, we’ll stay with the benchmark as long as it remains feasible.

(2-6) AI Benchmark 0.1.2 Total

Every generation of Intel seems to regress with AI Benchmark, most likely due to MKL issues. I have previously identified the issue for Intel, however I have not heard of any progress to date.



CPU Tests: Simulation

Simulation and Science have a lot of overlap in the benchmarking world, however for this distinction we’re separating into two segments mostly based on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output – in our Simulation section, these act more like synthetics but at some level are still trying to simulate a given environment.

DigiCortex v1.35: link

DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.

The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.

The software originally shipped with a benchmark that recorded the first few cycles and output a result. So while on fast multi-threaded processors this made the benchmark last less than a few seconds, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in ‘off mode’, giving a high result in the first few cycles as none of the nodes are currently active. We found that the performance settles down into a steady state after a while (when the model is actively in use), so we asked the author to allow for a ‘warm-up’ phase and for the benchmark to be the average over a second sample time.

For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles for the test – on a modern processor this takes 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
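The run-and-reject procedure can be sketched in a few lines of Python. This is a minimal sketch rather than our actual harness: the function name is ours, the sample values are invented, and taking the mean of the retained runs is an assumption, since only the rejection of the first three results is specified above.

```python
from statistics import mean


def digicortex_score(results, discard=3, minimum_runs=10):
    """Combine repeated runs into one score (sketch).

    results: real-time multiples, one per run, in the order they were taken.
    The first `discard` runs are rejected; averaging the rest is our assumption.
    """
    if len(results) < minimum_runs:
        raise ValueError(f"need at least {minimum_runs} runs, got {len(results)}")
    return mean(results[discard:])


# Invented sample data: early runs read high before the model settles.
runs = [2.0, 1.5, 1.2, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(digicortex_score(runs))
```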

(3-1) DigiCortex 1.35 (32k Neuron, 1.8B Synapse)

AMD's single chiplet design seems to get a big win here, but DigiCortex can use AVX-512 so RKL gets a healthy boost over the previous generation.

Dwarf Fortress 0.44.12: Link

Another long standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006 and still being regularly updated today, aiming for a Steam launch sometime in the future.

Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.

For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:

  • Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
  • Medium, a 129x129 world with 550 years, 10 civilizations and 4 megabeasts
  • Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts

DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test as many times as possible in 10 minutes, the medium test as many times as possible in 30 minutes, and the large test as many times as possible in an hour.
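A time-budgeted loop of this kind might look like the sketch below. It assumes that a run which starts inside the budget is allowed to finish, which the text does not specify, and `loop_for_budget` and `fake_world_gen` are hypothetical names rather than part of DFMark.

```python
import time

def loop_for_budget(run_once, budget_seconds, clock=time.monotonic):
    """Call run_once() repeatedly until the budget is used up.

    Returns the duration of every run. A run that starts before the
    deadline is allowed to complete (an assumption of this sketch).
    """
    durations = []
    start = clock()
    while clock() - start < budget_seconds:
        t0 = clock()
        run_once()
        durations.append(clock() - t0)
    return durations

# Stand-in workload; a real harness would launch a world generation here.
def fake_world_gen():
    time.sleep(0.01)

times = loop_for_budget(fake_world_gen, budget_seconds=0.1)
print(len(times), "runs, mean", sum(times) / len(times), "s")
```

Passing the clock in as a parameter keeps the loop easy to test with a fake timer.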

(3-2a) Dwarf Fortress 0.44.12 World Gen 65x65, 250 Yr

(3-2b) Dwarf Fortress 0.44.12 World Gen 129x129, 550 Yr

(3-2c) Dwarf Fortress 0.44.12 World Gen 257x257, 550 Yr

Dolphin v5.0 Emulation: Link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.

(3-3) Dolphin 5.0 Render Test

Intel regains the lead in our Dolphin test, with the Core i9 having a sizable advantage over the Core i7.

Factorio v1.1.26: Link

One of the most requested simulation game tests we’ve had recently is that of Factorio, a construction and management title where the user builds endless automated factories of increasing complexity. Factorio falls under the same banner as other simulation games where users can lose hundreds of hours of sleepless nights configuring the minutiae of their production line.

Our new benchmark here takes the v1.1.26 version of the game, a fixed map, and uses the automated benchmark mode to calculate how long it takes to run 1000 updates. This is then repeated for 5 minutes, and the best time to complete is used, reported in updates per second. The benchmark is single threaded and said to be reliant on cache size and memory.
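As a worked example of the conversion from run time to the reported figure, the sketch below takes the fastest of several runs and turns it into updates per second. The function name and the sample timings are hypothetical; only the 1000-update run length comes from the description above.

```python
def updates_per_second(run_times_s, updates_per_run=1000):
    """Convert the fastest run time (in seconds) into updates per second."""
    best = min(run_times_s)
    return updates_per_run / best

# Hypothetical wall-clock times, in seconds, for repeated 1000-update runs.
runs = [8.4, 8.1, 8.0, 8.2]
print(f"{updates_per_second(runs):.1f} UPS")
```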

Details for the benchmark can be found at this link.

(3-4) Factorio v1.1.26 Test

This is still a new test, so as we run through more systems, we will get more data.

CPU Tests: Rendering

Rendering tests, compared to others, are often a little more simple to digest and automate. All the tests put out some sort of score or time, usually in an obtainable way that makes it fairly easy to extract. These tests are some of the most strenuous in our list, due to the highly threaded nature of rendering and ray-tracing, and can draw a lot of power. If a system is not properly configured to deal with the thermal requirements of the processor, the rendering benchmarks are where it would show most easily as the frequency drops over a sustained period of time. Most benchmarks in this case are re-run several times, and the key to this is having an appropriate idle/wait time between benchmarks to allow for temperatures to normalize from the last test.

Blender 2.83 LTS: Link

One of the popular tools for rendering is Blender, with it being a public open source project that anyone in the animation industry can get involved in. This extends to conferences, use in films and VR, with a dedicated Blender Institute, and everything you might expect from a professional software package (except perhaps a professional grade support package). With it being open-source, studios can customize it in as many ways as they need to get the results they require. It ends up being a big optimization target for both Intel and AMD in this regard.

For benchmarking purposes, we fall back to rendering a single frame from a detailed project. Most reviews, as we have done in the past, focus on one of the classic Blender renders, known as BMW_27. It can take anywhere from a few minutes to almost an hour on a regular system. However, now that Blender has moved onto a Long Term Support model (LTS) with the latest 2.83 release, we decided to go for something different.

We use this scene, called PartyTug at 6AM by Ian Hubert, which is the official image of Blender 2.83. It is 44.3 MB in size, and uses some of the more modern compute properties of Blender. As it is more complex than the BMW scene, but uses different aspects of the compute model, time to process is roughly similar to before. We loop the scene for at least 10 minutes, taking the average time of the completed renders. Blender offers a command-line tool for batch commands, and we redirect the output into a text file.

(4-1) Blender 2.83 Custom Render Test

At 8 cores, Intel gets a lead over AMD, but 10 cores in Comet Lake is better than Rocket Lake.

Corona 1.3: Link

Corona is billed as a popular high-performance photorealistic rendering engine for 3ds Max, with development for Cinema 4D support as well. In order to promote the software, the developers produced a downloadable benchmark on the 1.3 version of the software, with a ray-traced scene involving a military vehicle and a lot of foliage. The software does multiple passes, calculating the scene, geometry, preconditioning and rendering, with performance measured in the time to finish the benchmark (the official metric used on their website) or in rays per second (the metric we use to offer a more linear scale).

The standard benchmark provided by Corona is interface driven: the scene is calculated and displayed in front of the user, with the ability to upload the result to their online database. We got in contact with the developers, who provided us with a non-interface version that allowed for command-line entry and retrieval of the results very easily. We loop around the benchmark five times, waiting 60 seconds between each, and taking an overall average. The time to run this benchmark can be around 10 minutes on a Core i9, up to over an hour on a quad-core 2014 AMD processor or dual-core Pentium.
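The run, wait, repeat pattern can be sketched as below. This is only an illustration of the idle time between runs: `run_with_cooldown` is a hypothetical helper, and the scores are made-up stand-ins for what the command-line benchmark would return.

```python
import time
from statistics import mean

def run_with_cooldown(run_once, repeats=5, cooldown_s=60.0, sleep=time.sleep):
    """Run the benchmark `repeats` times, idling between runs, and average the scores."""
    scores = []
    for i in range(repeats):
        scores.append(run_once())
        if i < repeats - 1:  # no need to idle after the final run
            sleep(cooldown_s)
    return mean(scores)

# Hypothetical stand-in that returns a rays-per-second score for each run.
fake_scores = iter([4.10e6, 4.05e6, 4.08e6, 4.07e6, 4.06e6])
result = run_with_cooldown(lambda: next(fake_scores), cooldown_s=0.0)
print(f"{result:,.0f} rays/s")
```

The cooldown gives temperatures a chance to normalize, so one run does not throttle the next.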

(4-2) Corona 1.3 Benchmark

Crysis CPU-Only Gameplay

One of the most oft used memes in computer gaming is ‘Can It Run Crysis?’. The original 2007 game, built on CryEngine by Crytek, was heralded as a computationally complex title for the hardware at the time and several years after, suggesting that a user needed graphics hardware from the future in order to run it. Fast forward over a decade, and the game runs fairly easily on modern GPUs.

But can we also apply the same concept to pure CPU rendering? Can a CPU, on its own, render Crysis? Since 64 core processors entered the market, one can dream. So we built a benchmark to see whether the hardware can.

For this test, we’re running Crysis’ own GPU benchmark, but in CPU render mode. This is a 2000 frame test, with medium and low settings.

(4-3b) Crysis CPU Render at 1080p Low

POV-Ray 3.7.1: Link

A long time benchmark staple, POV-Ray is another rendering program that is well known to load up every single thread in a system, regardless of cache and memory levels. After a long period of POV-Ray 3.7 being the latest official release, when AMD launched Ryzen the POV-Ray codebase suddenly saw a range of activity from both AMD and Intel, knowing that the software (with the built-in benchmark) would be an optimization tool for the hardware.

We had to stick a flag in the sand when it came to selecting the version that was fair to both AMD and Intel, and still relevant to end-users. Version 3.7.1 fixes a significant bug in the early 2017 code, a write-after-read pattern that both the Intel and AMD manuals advise against, leading to a nice performance boost.

The benchmark can take over 20 minutes on a slow system with few cores, or around a minute or two on a fast system, or seconds with a dual high-core count EPYC. Because POV-Ray draws a large amount of power and current, it is important to make sure the cooling is sufficient here and the system stays in its high-power state. Using a motherboard with a poor power-delivery and low airflow could create an issue that won’t be obvious in some CPU positioning if the power limit only causes a 100 MHz drop as it changes P-states.

(4-4) POV-Ray 3.7.1

V-Ray: Link

We have a couple of renderers and ray tracers in our suite already, however V-Ray’s benchmark was requested often enough for us to roll it into our suite. Built by ChaosGroup, V-Ray is a 3D rendering package compatible with a number of popular commercial imaging applications, such as 3ds Max, Maya, Unreal, Cinema 4D, and Blender.

We run the standard standalone benchmark application, but in an automated fashion to pull out the result in the form of kilosamples/second. We run the test six times and take an average of the valid results.

(4-5) V-Ray Renderer

Cinebench R20: Link

Another common staple of a benchmark suite is Cinebench. Based on Cinema4D, Cinebench is a purpose built benchmark machine that renders a scene with both single and multi-threaded options. The scene is identical in both cases. The R20 version means that it targets Cinema 4D R20, a slightly older version of the software which is currently on version R21. Cinebench R20 was launched given that the R15 version had been out a long time, and despite the difference between the benchmark and the latest version of the software on which it is based, Cinebench results are often quoted a lot in marketing materials.

Results for Cinebench R20 are not comparable to R15 or older, because the scene being used is different and the code path has been updated. The results are output as a score from the software, which is inversely proportional to the time taken. Using the benchmark flags for single CPU and multi-CPU workloads, we run the software from the command line which opens the test, runs it, and dumps the result into the console which is redirected to a text file. The test is repeated for a minimum of 10 minutes for both ST and MT, and then the runs averaged.

(4-6a) CineBench R20 Single Thread

(4-6b) CineBench R20 Multi-Thread

The improvement in Cinebench R20 is a good measure over previous generations of Intel. However mobile Tiger Lake scores 593 at 28 W, still ahead of the 11700K.



CPU Tests: Encoding

One of the interesting elements on modern processors is encoding performance. This covers two main areas: encryption/decryption for secure data transfer, and video transcoding from one video format to another.

In the encrypt/decrypt scenario, how data is transferred and by what mechanism is pertinent to on-the-fly encryption of sensitive data - a process that more modern devices are leaning on for software security.

Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who are wanting to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.

HandBrake 1.3.2: Link

Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file size, trade quality for file size, or do all of the above to increase encoding rates and help accelerate decoding rates. Alongside Google's favorite codecs, VP9 and AV1, there are others that are prominent: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H.265), which aims to provide the same quality as H264 but at a lower file size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content. There are other codecs coming to market designed for specific use cases all the time.

Handbrake is a favored tool for transcoding, with the later versions using copious amounts of newer APIs to take advantage of co-processors, like GPUs. It is available on Windows via an interface or can be accessed through the command-line, with the latter making our testing easier, with a redirection operator for the console output.

We take the compiled version of this 16-minute YouTube video about Russian CPUs at 1080p30 h264 and convert into three different files: (1) 480p30 ‘Discord’, (2) 720p30 ‘YouTube’, and (3) 4K60 HEVC.

(5-1a) Handbrake 1.3.2, 1080p30 H264 to 480p Discord

(5-1b) Handbrake 1.3.2, 1080p30 H264 to 720p YouTube

(5-1c) Handbrake 1.3.2, 1080p30 H264 to 4K60 HEVC

Up to the final 4K60 HEVC, in CPU-only mode, the Intel CPU puts up some good gen-on-gen numbers.

7-Zip 1900: Link

The first compression benchmark tool we use is the open-source 7-zip, which typically offers good scaling across multiple cores. 7-zip is the compression tool most cited by readers as one they would rather see benchmarks on, and the program includes a built-in benchmark tool for both compression and decompression.

The tool can either be run from inside the software or through the command line. We take the latter route as it is easier to automate, obtain results, and put through our process. The command line flags available offer an option for repeated runs, and the output provides the average automatically through the console. We direct this output into a text file and regex the required values for compression, decompression, and a combined score.

(5-2c) 7-Zip 1900 Combined Score

AES Encoding

Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.

(5-3) AES Encoding

WinRAR 5.90: Link

For the 2020 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack:

  • 33 video files, each 30 seconds, in 1.37 GB,
  • 2834 smaller website files in 370 folders in 150 MB,
  • 100 Beat Saber music tracks and input files, for 451 MB

This is a mixture of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test for 20 minutes and take the average of the last five runs when the benchmark is in a steady state.

For automation, we use AHK’s internal timing tools from initiating the workload until the window closes signifying the end. This means the results are contained within AHK, with an average of the last 5 results being easy enough to calculate.

(5-4) WinRAR 5.90 Test, 3477 files, 1.96 GB

 

CPU Tests: Synthetic

Most of the people in our industry have a love/hate relationship when it comes to synthetic tests. On the one hand, they’re often good for quick summaries of performance and are easy to use, but most of the time the tests aren’t related to any real software. Synthetic tests are often very good at burrowing down to a specific set of instructions and maximizing the performance out of those. Due to requests from a number of our readers, we have the following synthetic tests.

Linux OpenSSL Speed: SHA256

One of our readers reached out in early 2020 and stated that he was interested in looking at OpenSSL hashing rates in Linux. Luckily OpenSSL in Linux has a function called ‘speed’ that allows the user to determine how fast the system is for any given hashing algorithm, as well as signing and verifying messages.

OpenSSL offers a lot of algorithms to choose from, and based on a quick Twitter poll, we narrowed it down to the following:

  1. rsa2048 sign and rsa2048 verify
  2. sha256 at 8K block size
  3. md5 at 8K block size

For each of these tests, we run them in single thread and multithreaded mode. All the graphs are in our benchmark database, Bench, and we use the sha256 results in published reviews.
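A sketch of how these runs could be driven is shown below. The review does not state the exact invocation, so the flag choices here are assumptions: `-multi` for the multithreaded runs and `-bytes` to pin the 8K block size (the latter is only present in recent OpenSSL releases). The helper `openssl_speed_cmd` is hypothetical and only builds the command lines; it does not execute them.

```python
import os

def openssl_speed_cmd(algorithm, threads=1, block_bytes=None):
    """Build an `openssl speed` command line as a list of arguments."""
    cmd = ["openssl", "speed"]
    if threads > 1:
        cmd += ["-multi", str(threads)]       # run several benchmarks in parallel
    if block_bytes is not None:
        cmd += ["-bytes", str(block_bytes)]   # assumed way to pin the block size
    cmd.append(algorithm)
    return cmd

all_threads = os.cpu_count() or 1
# (algorithm, block size) pairs; RSA signing has no block size to set.
tests = [("rsa2048", None), ("sha256", 8192), ("md5", 8192)]
for algorithm, block in tests:
    for threads in (1, all_threads):
        print(" ".join(openssl_speed_cmd(algorithm, threads, block)))
```

Each list could then be handed to `subprocess.run` with the output captured for parsing.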

(8-3c) Linux OpenSSL Speed sha256 8K Block (1T)

(8-4c) Linux OpenSSL Speed sha256 8K Block (nT)

Intel comes back into the game in our OpenSSL sha256 test as the AVX512 helps accelerate SHA instructions. It still isn't enough to overcome the dedicated sha256 units inside AMD.

CPU Tests: Legacy and Web

In order to gather data to compare with older benchmarks, we are still keeping a number of tests under our ‘legacy’ section. This includes all the former major versions of CineBench (R15, R11.5, R10) as well as x264 HD 3.0 and the first very naïve version of 3DPM v2.1. We won’t be transferring the data over from the old testing into Bench, otherwise it would be populated with 200 CPUs with only one data point, so it will fill up as we test more CPUs like the others.

The other section here is our web tests.

Web Tests: Kraken, Octane, and Speedometer

Benchmarking using web tools is always a bit difficult. Browsers change almost daily, and the way the web is used changes even quicker. While there is some scope for advanced computational based benchmarks, most users care about responsiveness, which requires a strong back-end to work quickly to provide on the front-end. The benchmarks we chose for our web tests are essentially industry standards – at least once upon a time.

It should be noted that for each test, the browser is closed and re-opened anew with a fresh cache. We use a fixed Chromium version for our tests with the update capabilities removed to ensure consistency.

Mozilla Kraken 1.1

Kraken is a 2010 benchmark from Mozilla and does a series of JavaScript tests. These tests are a little more involved than previous tests, looking at artificial intelligence, audio manipulation, image manipulation, json parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging, and then runs through 10 times giving a timed result.

We loop through the 10-run test four times (so that’s a total of 40 runs), and average the four end-results. The result is given as time to complete the test, and we’re reaching a slow asymptotic limit with regard to the highest IPC processors.

(7-1) Kraken 1.1 Web Test

Google Octane 2.0

Our second test is also JavaScript based, but uses a lot more variation of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulations, compiler latency and code execution.

Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.

(7-2) Google Octane 2.0 Web Test

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is a test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark's internal metrics.

We repeat over the benchmark for a dozen loops, taking the average of the last five.

(7-3) Speedometer 2.0 Web Test

Legacy Tests

(6-3a) CineBench R15 ST

(6-3b) CineBench R15 MT

(6-5a) x264 HD 3.0 Pass 1

(6-5b) x264 HD 3.0 Pass 2



Gaming Tests: Deus Ex Mankind Divided

Deus Ex is a franchise with a wide level of popularity. Despite the Deus Ex: Mankind Divided (DEMD) version being released in 2016, it has often been heralded as a game that taxes the CPU. It uses the Dawn Engine to create a very complex first-person action game with science-fiction based weapons and interfaces. The game combines first-person, stealth, and role-playing elements, with the game set in Prague, dealing with themes of transhumanism, conspiracy theories, and a cyberpunk future. The game allows the player to select their own path (stealth, gun-toting maniac) and offers multiple solutions to its puzzles.

DEMD has an in-game benchmark, an on-rails look around an environment showcasing some of the game’s most stunning effects, such as lighting, texturing, and others. Even in 2020, it’s still an impressive graphical showcase when everything is jumped up to the max. For this title, we are testing the following resolutions:

  • 600p Low, 1440p Low, 4K Low, 1080p Max

The benchmark runs for about 90 seconds. We do as many runs as possible within 10 minutes per resolution/setting combination, and then take averages and percentiles.

Charts: Average FPS and 95th Percentile at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Final Fantasy XIV

Despite being one number less than Final Fantasy 15, because FF14 is a massively-multiplayer online title, there are always yearly update packages which give the opportunity for graphical updates too. In 2019, FFXIV launched its Shadowbringers expansion, and an official standalone benchmark was released at the same time for users to understand what level of performance they could expect. Much like the FF15 benchmark we’ve been using for a while, this test is a long 7-minute scene of simulated gameplay within the title. There are a number of interesting graphical features, and it certainly looks more like a 2019 title than a 2010 release, which is when FF14 first came out.

With this being a standalone benchmark, we do not have to worry about updates, and the idea for this sort of test for end-users is to keep the code base consistent. For our testing suite, we are using the following settings:

  • 768p Minimum, 1440p Minimum, 4K Minimum, 1080p Maximum

As with the other benchmarks, we do as many runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.

Charts: Average FPS at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality

As the resolution increases, the 11900K seemed to get a better average frame rate, but with the quality increased, it falls back down again, coming behind the older Intel CPUs.




Gaming Tests: Final Fantasy XV

Upon arriving on PC, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console. As a fantasy RPG with a long history, the fruits of Square-Enix’s successful partnership with NVIDIA are on display. The game uses the internal Luminous Engine, and as with other Final Fantasy games, pushes the imagination of what we can do with the hardware underneath us. To that end, FFXV was one of the first games to promote the use of ‘video game landscape photography’, due in part to the extensive detail even at long range but also with the integration of NVIDIA’s Ansel software, that allowed for super-resolution imagery and post-processing effects to be applied.

In preparation for the launch of the game, Square Enix opted to release a standalone benchmark. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues. We use the following settings:

  • 720p Standard, 1080p Standard, 4K Standard, 8K Standard

For automation, the title accepts command line inputs for both resolution and settings, and then auto-quits when finished. As with the other benchmarks, we do as many runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.

Charts: Average FPS and 95th Percentile at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality




Gaming Tests: World of Tanks

Albeit different to most of the other commonly played MMO or massively multiplayer online games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming who are based in Belarus, with the game’s soundtrack being primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points including a free-to-play element as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank based MMO is that it achieved esports status when it debuted at the World Cyber Games back in 2012.

World of Tanks enCore is a demo application for its new graphics engine penned by the Wargaming development team. Over time the new core engine has been implemented into the full game, upgrading the game's visuals with key elements such as improved water, flora, shadows, lighting as well as other objects such as buildings. The World of Tanks enCore demo app not only offers up insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system. There is technically a Ray Tracing version of the enCore benchmark now available, however because it can’t be deployed standalone without the installer, we decided against using it. If that gets fixed, then we can look into it.

The benchmark tool comes with a number of presets:

  • 768p Minimum, 1080p Standard, 1080p Max, 4K Max (not a preset)

The odd one out is the 4K Max preset, because the benchmark doesn’t automatically have a 4K option – to get this we edit the acceptable resolutions ini file, and then we can select 4K. The benchmark outputs its own results file, with frame times, making it very easy to parse the data needed for average and percentiles.
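To show how a list of frame times turns into the two reported figures, here is a minimal sketch. It assumes the percentile figure is the frame rate implied by the 95th percentile frame time (nearest-rank method), which the text does not spell out, and the function name and sample data are hypothetical.

```python
def frame_stats(frame_times_ms, percentile=95):
    """Return (average FPS, percentile FPS) from per-frame times in milliseconds."""
    if not frame_times_ms:
        raise ValueError("no frame times supplied")
    n = len(frame_times_ms)
    # Average FPS is total frames divided by total elapsed time.
    avg_fps = n / (sum(frame_times_ms) / 1000.0)
    # Nearest-rank percentile of frame time, using integer ceiling division.
    ordered = sorted(frame_times_ms)
    rank = max(1, (percentile * n + 99) // 100)
    pct_fps = 1000.0 / ordered[rank - 1]
    return avg_fps, pct_fps

# Hypothetical capture: mostly 10 ms frames with two slower ones.
frames = [10.0] * 18 + [20.0, 25.0]
avg, p95 = frame_stats(frames)
print(f"average {avg:.1f} FPS, 95th percentile {p95:.1f} FPS")
```

Averaging over total time, rather than averaging per-frame FPS values, stops short frames from being over-weighted.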

Charts: Average FPS and 95th Percentile at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality

WoT is a fun test to see 700 FPS+ numbers with the best CPUs. However the differences between the CPUs end up being minor.




Gaming Tests: Borderlands 3

As a big Borderlands fan, having to sit and wait six months for the Epic Store exclusive to expire before we saw it on Steam felt like a long time to wait. The fourth title of the franchise, if you exclude the Telltale-style games, BL3 expands the universe beyond Pandora and its orbit, with the set of heroes (plus those from previous games) now cruising the galaxy looking for vaults and the treasures within. Popular characters like Tiny Tina, Claptrap, Lilith, Dr. Zed, Zer0, Tannis, and others all make appearances as the game continues its cel-shaded design but with the graphical fidelity turned up. Borderlands 1 gave me my first ever taste of proper in-game second order PhysX, and it’s a high standard that continues to this day.

BL3 works best with online access, so it is filed under our online games section. BL3 is also one of our biggest downloads, requiring 100+ GB. As BL3 supports resolution scaling, we are using the following settings:

  • 360p Very Low, 1440p Very Low, 4K Very Low, 1080p Badass

BL3 has its own in-game benchmark, which recreates a set of on-rails scenes with a variety of activity going on in each, such as shootouts, explosions, and wildlife. The benchmark outputs its own results files, including frame times, which can be parsed for our averages/percentile data.

Charts: Average FPS and 95th Percentile at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality




Gaming Tests: F1 2019

The F1 racing games from Codemasters have been popular benchmarks in the tech community, mostly for ease-of-use and that they seem to take advantage of any area of a machine that might be better than another. The 2019 edition of the game features all 21 circuits on the calendar for that year, and includes a range of retro models and DLC focusing on the careers of Alain Prost and Ayrton Senna. Built on the EGO Engine 3.0, the game has been criticized similarly to most annual sports games, for not offering enough season-to-season graphical fidelity updates to make investing in the latest title worth it, however the 2019 edition revamps the Career mode, with features such as in-season driver swaps coming into the mix. The quality of the graphics this time around is also superb, even at 4K low or 1080p Ultra.

For our test, we put Alex Albon in the Red Bull in position #20, for a dry two-lap race around Austin. We test at the following settings:

  • 768p Ultra Low, 1440p Ultra Low, 4K Ultra Low, 1080p Ultra

In terms of automation, F1 2019 has an in-game benchmark that can be called from the command line, and the output file has frame times. We repeat each resolution setting for a minimum of 10 minutes, taking the averages and percentiles.

Charts: Average FPS and 95th Percentile at Low Resolution / Low Quality, Medium Resolution / Low Quality, High Resolution / Low Quality, and Medium Resolution / Max Quality

The Ego engine is usually a good bet where cores, IPC, and frequency matter.




Gaming Tests: Far Cry 5

The fifth title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration with a lot of configurability.

Unfortunately, the game doesn’t like us changing the resolution in the results file when using certain monitors, resorting to 1080p but keeping the quality settings. But resolution scaling does work, so we decided to fix the resolution at 1080p and use a variety of different scaling factors to give the following:

  • 720p Low, 1440p Low, 4K Low, 1440p Max.

Far Cry 5 outputs a results file here, but that file is an HTML file, which showcases a graph of the FPS detected. At no point does the HTML file contain the frame times for each frame, but it does show the frames per second, as a value once per second in the graph. The graph in HTML form is a series of (x,y) co-ordinates scaled to the min/max of the graph, rather than the raw (second, FPS) data, and so using regex I carefully tease out the values of the graph, convert them into a (second, FPS) format, and take our values of averages and percentiles that way.
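To give a feel for the kind of conversion involved, here is a hedged Python sketch. The real layout of the Far Cry 5 results file is not reproduced here, so everything about the markup is an assumption: the sketch pretends the graph is stored as a `points="x,y x,y ..."` attribute, that the plot area is `width` by `height` units with y growing downward, and that the axis ranges (`duration_s`, `fps_min`, `fps_max`) are known. All names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches one "x,y" pair of (possibly decimal) numbers.
POINT_RE = re.compile(r"(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def graph_points_to_fps(
    html: str,
    width: float,
    height: float,
    duration_s: float,
    fps_min: float,
    fps_max: float,
) -> List[Tuple[float, float]]:
    """Convert scaled graph co-ordinates back into (second, FPS) samples."""
    # Assumed markup: the first points="..." attribute holds the graph data.
    match = re.search(r'points="([^"]*)"', html)
    if match is None:
        raise ValueError("no points attribute found")

    samples = []
    for x_text, y_text in POINT_RE.findall(match.group(1)):
        x, y = float(x_text), float(y_text)
        # Undo the horizontal scaling: x spans 0..width over the run duration.
        second = x / width * duration_s
        # Undo the vertical scaling: y = 0 is the top of the graph (fps_max).
        fps = fps_max - (y / height) * (fps_max - fps_min)
        samples.append((second, fps))
    return samples
```

For example, with a 100x100 plot covering 2 seconds and 0 to 120 FPS, the point `50,50` maps back to (1.0 s, 60.0 FPS).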

If anyone from Ubisoft wants to chat about building a benchmark platform that would not only help me but also every other member of the tech press build our benchmark testing platform to help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).

As with the other gaming tests, we run each resolution/setting combination for a minimum of 10 minutes and take the relevant frame data for averages and percentiles.

[Charts: Average FPS and 95th Percentile results at Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, and Medium Resolution/Max Quality]

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Gears Tactics

Remembering the original Gears of War brings back a number of memories – some good, and some involving online gameplay. The latest iteration of the franchise was launched as I was putting this benchmark suite together, and Gears Tactics is a high-fidelity turn-based strategy game with an extensive single player mode. As with a lot of turn-based games, there is ample opportunity to crank up the visual effects, and here the developers have put a lot of effort into creating effects, a number of which seem to be CPU limited.

Gears Tactics has an in-game benchmark, roughly 2.5 minutes of AI gameplay starting from the same position but using a random seed for actions. Much like the racing games, this usually leads to some variation in the run-to-run data, so for this benchmark we are taking the geometric mean of the results. One of the biggest features of Gears Tactics is its resolution scaling, which supports up to 8K, and so we are testing the following settings:

  • 720p Low, 4K Low, 8K Low, 1080p Ultra

For results, the game showcases a mountain of data when the benchmark is finished, such as how much the benchmark was CPU limited and where, however none of that is ever exported into a file we can use. It’s just a screenshot which we have to read manually.

If anyone from the Gears Tactics team wants to chat about building a benchmark platform that would not only help me but also every other member of the tech press build our benchmark testing platform to help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).

As with the other benchmarks, we do as many runs as possible until 10 minutes per resolution/setting combination have passed. For this benchmark, we manually read each of the screenshots for each quality/setting/run combination. The benchmark does also give 95th percentiles and frame averages, so we can use both of these data points.
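For readers unfamiliar with it, the geometric mean used to combine the run-to-run results can be sketched in a few lines of Python. This is a generic illustration with made-up values, not our actual tooling; the function name is hypothetical.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # exp(mean(log(v))) equals the n-th root of the product,
    # but avoids overflow when multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Compared to an arithmetic mean, the geometric mean is less affected by a single unusually high run, which is useful when a random seed makes individual runs vary.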

[Charts: Average FPS and 95th Percentile results at Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, and Medium Resolution/Max Quality]

Gears is the one test where, at our 1080p Maximum settings, it shines ahead of the pack. At high resolution and low quality, however, all five CPUs are essentially equal, and it still sits behind AMD's Ryzen APU.

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: GTA 5

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA on hand to help optimize the title. At this point GTA V is super old, but still super useful as a benchmark – it is a complicated test with many features that modern titles today still struggle with. With rumors of a GTA 6 on the horizon, I hope Rockstar make that benchmark as easy to use as this one is.

GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

We are using the following settings:

  • 720p Low, 1440p Low, 4K Low, 1080p Max

The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data. The benchmark can also be called from the command line, making it very easy to use.

There is one funny caveat with GTA. If the CPU is too slow, or has too few cores, the benchmark loads, but it doesn’t have enough time to put items in the correct position. As a result, for example when running our single core Sandy Bridge system, the jet ends up stuck at the middle of an intersection causing a traffic jam. Unfortunately this means the benchmark never ends, but it is still amusing.

[Charts: Average FPS and 95th Percentile results at Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, and Medium Resolution/Max Quality]

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Red Dead Redemption 2

It’s great to have another Rockstar benchmark in the mix, and the launch of Red Dead Redemption 2 (RDR2) on the PC gives us a chance to do that. Building on the success of the original RDR, the second incarnation came to Steam in December 2019 having been released on consoles first. The PC version takes the open-world cowboy genre into the start of the modern age, with a wide array of impressive graphics and features that are eerily close to reality.

For RDR2, Rockstar kept the same benchmark philosophy as with Grand Theft Auto V: the benchmark consists of several cut scenes with different weather and lighting effects, followed by a final on-rails scene, only this time with the mugging of a shop leading to a shootout on horseback before riding over a bridge into the great unknown. Luckily most of the command line options from GTA V are present here, and the game also supports resolution scaling. We have the following tests:

  • 384p Minimum, 1440p Minimum, 8K Minimum, 1080p Max

For that 8K setting, I originally thought I had the settings file at 4K and 1.0x scaling, but it was actually set at 2.0x, giving that 8K. For the sake of it, I decided to keep the 8K settings.
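The arithmetic behind that accident is simple, assuming the scaling factor is applied to each axis (which is what 4K at 2.0x giving 8K implies). A small Python sketch, with a hypothetical helper name:

```python
from typing import Tuple


def scaled_resolution(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Render resolution when a per-axis scaling factor is applied."""
    return round(width * scale), round(height * scale)


def pixel_ratio(scale: float) -> float:
    """How many times more pixels are rendered at a given per-axis scale."""
    return scale * scale
```

Under that assumption, 3840x2160 at 2.0x renders at 7680x4320, which is four times the pixel count of the intended 4K test.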

For our results, we run through each resolution and setting configuration for a minimum of 10 minutes, before averaging and parsing the frame time data.
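The 'minimum of 10 minutes' rule used across these gaming tests boils down to repeating the benchmark until enough time has elapsed, always letting the final run finish. A minimal Python sketch follows; `run_once` is a hypothetical callable standing in for whatever launches the game's benchmark and returns its result, and the injectable `clock` is an assumption made so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_for_minimum(
    run_once: Callable[[], T],
    minimum_seconds: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Repeat run_once until at least minimum_seconds have elapsed."""
    results: List[T] = []
    start = clock()
    while True:
        # Each run completes in full; the time check only happens between runs.
        results.append(run_once())
        if clock() - start >= minimum_seconds:
            break
    return results
```

Because the check sits after each run, a benchmark that takes 2.5 minutes would be run four times before the 10-minute mark is reached.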

[Charts: Average FPS and 95th Percentile results at Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, and Medium Resolution/Max Quality]

All of our benchmark results can also be found in our benchmark engine, Bench.



Gaming Tests: Strange Brigade

Strange Brigade is set in Egypt in 1903, and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative-centric, with a wide variety of different levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.

The game houses its own built-in benchmark as an on-rails experience through the game. For quality, the game offers various options up for customization including textures, anti-aliasing, reflections, draw distance and even allows users to enable or disable motion blur, ambient occlusion and tessellation among others. Strange Brigade supports both the DirectX 12 and Vulkan APIs, and so we test on both.

  • 720p Low, 1440p Low, 4K Low, 1080p Ultra

The automation for Strange Brigade is one of the easiest in our suite – the settings and quality can be changed by pre-prepared .ini files, and the benchmark is called via the command line. The output includes all the frame time data.

[Charts: Average FPS and 95th Percentile results at Low Resolution/Low Quality, Medium Resolution/Low Quality, High Resolution/Low Quality, and Medium Resolution/Max Quality]

All of our benchmark results can also be found in our benchmark engine, Bench.



Conclusion

For anyone buying a new system today, the market is a little bleak. Anyone wanting a new GPU has to actively pay attention to stock levels, or drive to a local store when a delivery arrives. The casual buyers then either look to pre-built systems (which are also flying off the shelves), or just hang on to what they have for another year.

But there is another way. I find that users fall into two camps.

The first camp is the ‘upgrade everything at once’ attitude. These users sell their old systems and buy almost everything anew. Depending on budget and savings, this is probably a good/average system, and it means you get a good run of what’s available at that time. It’s a multi-year upgrade cycle where you might get something good for that generation, and hopefully everything is balanced.

The other camp is the ‘upgrade one piece at a time’ attitude. This means that if it’s time to upgrade a storage drive, or a memory kit, or a GPU, or a CPU, you get the best you can afford at that time. So you might end up with an older CPU but a top end GPU, good storage, good power supply, and then next time around, it’s all about CPU and motherboard upgrades. This attitude has the potential for more bottlenecks, but it means you often get the best of a generation, and each piece holds its resale value more.

In a time where we have limited GPUs available, I can very much see users going all out on the CPU/memory side of the equation, perhaps spending a bit extra on the CPU, while they wait for the graphics market to come back into play. After all, who really wants to pay $1300 for an RTX 3070 right now?

Performance and Analysis

The conclusions from our Core i7-11700K review are very much applicable here. Intel’s Rocket Lake as a backported processor design has worked, but has critical issues with efficiency and peak power draw. Compared to the previous generation, clock-for-clock performance gains are 16-22% for math workloads and 6-18% for other workloads, however the loss of two cores really does restrict how much of a halo product it can be in light of what AMD is offering.

Rocket Lake makes good on offering PCIe 4.0 and enabling new features like Gear ratios for the memory controller, as well as pushing for more support for 2.5 gigabit Ethernet, however it becomes a tough sell. At the time we reviewed the Core i7-11700K, we didn’t know the pricing, and it was looking like AMD’s stock levels were pretty bad, subsequently making Intel the default choice. Since then, Intel's pricing hasn't turned out too bad for its performance compared to AMD (except for the Core i9), however AMD’s stock is a lot more bountiful.

For anyone looking at the financials for Intel, the new processor is 25% bigger than before, but not being sold for as big a margin as you might expect. In some discussions in the industry, it looks like retailers are getting roughly 20%/80% stock for Core i9 to Core i7, indicating that Intel is going to be very focused on that Core i7 market around $400-$450. In that space, AMD and Intel both have well-performing products, however AMD gets an overall small lead and is much more efficient.

However, with the GPU market being so terrible, users could jump an extra $100 and get 50% more AMD cores. When AMD is in stock, Intel’s Rocket Lake is more about the platform than the processor. If I said that the Rocket Lake LGA1200 platform had no upgrade potential for users buying in today, an obvious response might be that neither does AM4, and you’d be correct. However, for any user buying a Core i7-11700K on LGA1200 today, compared to a Ryzen 7 5800X customer on AM4, the latter still has the opportunity to go to 16 cores if needed. Rocket Lake comes across with a lot of dead-ends in that regard, especially as the next generation is meant to be on a new socket, and with supposedly new memory.

Rocket Lake: Failed Experiment, or Good Attempt?

For Intel, Rocket Lake is a dual-purpose design. On the one hand, it provides Intel with something to put into its desktop processor roadmap while the manufacturing side of the business is still getting sorted. On the other hand, it gives Intel a good marker in the sand for what it means to backport a processor.

Rocket Lake, in the context of backporting, has been a ‘good attempt’ – good enough to at least launch into the market. It does offer performance gains in several key areas, and does bring AVX-512 to the consumer market, albeit at the expense of power. However in a lot of use cases that people are enabling today, which aren’t AVX-512 enabled, there’s more performance to be had with older processors, or the competition. Rocket Lake also gets you PCIe 4.0, however users might feel that is a small add-in when AMD has PCIe 4.0, lower power, and better general performance for the same price.

Intel’s future is going to be full of processor cores built for multiple process nodes. What makes Rocket Lake different is that when the core was designed for 10nm, it was solely designed for 10nm, and no thought was ever given to a 14nm version. The results in this review show that this sort of backporting doesn’t really work, not to the same level of die size, performance, and profit margin needed to move forward. It was a laudable experiment, but in the future, Intel will need to co-design with multiple process nodes in mind.
