Original Link: https://www.anandtech.com/show/16341/intel-core-i9-10850k-review-the-real-intel-flagship
Intel Core i9-10850K Review: The Real Intel Flagship
by Dr. Ian Cutress on January 4, 2021 9:00 AM EST - Posted in
- CPUs
- Intel
- Core
- Z490
- 10th Gen Core
- Comet Lake
- LGA1200
- i9-10850K
When a company like Intel creates a CPU design, the manufacturing process brings about variation in the quality of the product. Some cores will only reach a certain frequency, while others have surprisingly good voltage characteristics. Two goals of processor design are to minimize this variance and to shift the quality peak higher, all while controlling how much of the silicon is actually usable. This is part of the magic of ‘binning’, the process of filtering the silicon into different ‘bins’ for applicability to a given product. It is through this process that the Core i9-10850K exists, albeit reluctantly.
Intel’s Core i9-10850K: Doing The Heavy Lifting
The Core i9-10850K is the entry-level member of Intel’s Core i9 lineup, offering 10 unlocked cores with hyperthreading and a peak turbo of 5.2 GHz under Thermal Velocity Boost. The added bonus is that it’s more widely available in the market than the Core i9-10900K.
Intel 10th Gen Comet Lake Core i9 and Core i7 (all frequencies in GHz)

| AnandTech | Cores | Base Freq | TB2 2C | TB2 nT | TB3 2C | TVB 2C | TVB nT | TDP (W) | IGP | MSRP 1ku |
|-----------|-------|-----------|--------|--------|--------|--------|--------|---------|-----|----------|
| **Core i9** | | | | | | | | | | |
| i9-10900K | 10C/20T | 3.7 | 5.1 | 4.8 | 5.2 | 5.3 | 4.9 | 125 | 630 | $488 |
| i9-10900KF | 10C/20T | 3.7 | 5.1 | 4.8 | 5.2 | 5.3 | 4.9 | 125 | - | $472 |
| i9-10900 | 10C/20T | 2.8 | 5.0 | 4.5 | 5.1 | 5.2 | 4.6 | 65 | 630 | $439 |
| i9-10900F | 10C/20T | 2.8 | 5.0 | 4.5 | 5.1 | 5.2 | 4.6 | 65 | - | $422 |
| i9-10900T | 10C/20T | 1.9 | 4.5 | 3.7 | 4.6 | - | - | 35 | 630 | $439 |
| i9-10850K | 10C/20T | 3.6 | 5.0 | 4.7 | 5.1 | 5.2 | 4.8 | 125 | 630 | $453 |
| **Core i7** | | | | | | | | | | |
| i7-10700K | 8C/16T | 3.8 | 5.0 | 4.7 | 5.1 | - | - | 125 | 630 | $374 |
| i7-10700KF | 8C/16T | 3.8 | 5.0 | 4.7 | 5.1 | - | - | 125 | - | $349 |
| i7-10700 | 8C/16T | 2.9 | 4.7 | 4.6 | 4.8 | - | - | 65 | 630 | $323 |
| i7-10700F | 8C/16T | 2.9 | 4.7 | 4.6 | 4.8 | - | - | 65 | - | $298 |
| i7-10700T | 8C/16T | 2.0 | 4.4 | 3.7 | 4.5 | - | - | 35 | 630 | $325 |
As with the other 10th Gen Intel Core i9 processors, this one supports two channels of DDR4-2933, uses the LGA1200 socket on Intel 400-series motherboards, and has sixteen lanes of PCIe 3.0 for add-in hardware. Intel likes to point out that there are another 24 PCIe 3.0 lanes through the chipset, however these are limited by the DMI/PCIe 3.0 x4 uplink to the processor.
With the Core i9-10850K, users are essentially getting the top-tier Core i9-10900K, but at 100 MHz lower across the board. The peak turbo is 5.2 GHz rather than 5.3 GHz, the base frequency is 3.6 GHz rather than 3.7 GHz, and both processors are set at a 125 W TDP. Giving up 100 MHz also saves $35 from the bulk pricing of the processor, but because of the lack of availability of the 10900K, we’ve seen the difference between the two vary from $50 to $200 in recent months. This all feeds into the main story of what is going on here.
Despite being part of Intel’s 10th Generation Core family, the Core i9-10850K was released after the launch of the more public and vocal members, such as the 10900K and 10700K. The email about the 10850K dropped into our inbox on July 27th, two months after the official launch of the rest of the family, and while Intel didn’t have samples ready at that point (to be honest, the announcement took us by surprise), requests were lodged and ours arrived a few weeks later.
While not completely surreptitious, Intel pushed this processor onto the market without much fanfare.
The Secret Art of Binning
Binning is a fancy word for quality management and filtering - when the silicon is manufactured, some of it is better quality than the rest, and by testing that quality, each product can be filtered into the ‘bin’ where it is best suited. The practice of binning is not new in the industry by any stretch of the imagination; depending on the manufacturing process, quality can vary wildly, and binning enables a semiconductor company to make the most of its fixed wafer costs. If a given wafer provides 50 processors, and only 10 meet the ideal quality level but another 35 meet a lower quality level, then the yield is 45 out of 50 rather than just 10, enabling less waste and arguably better value for the end customer. Normally when a company talks about yield, they are talking about this 45-out-of-50 number.
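To make the arithmetic concrete, here is a minimal Python sketch of the idea - all distributions and bin thresholds are invented for illustration, not Intel’s actual criteria:

```python
import random

def bin_wafer(n_dies=50, seed=1):
    """Toy model: each die's max stable frequency is drawn from a normal
    distribution (numbers invented), then it falls into the best bin it meets."""
    random.seed(seed)
    bins = {"flagship (>=5.3)": 0, "second tier (>=5.2)": 0,
            "mainstream (>=5.0)": 0, "scrap": 0}
    for _ in range(n_dies):
        fmax = random.gauss(5.1, 0.1)            # invented process spread, GHz
        if   fmax >= 5.3: bins["flagship (>=5.3)"] += 1
        elif fmax >= 5.2: bins["second tier (>=5.2)"] += 1
        elif fmax >= 5.0: bins["mainstream (>=5.0)"] += 1
        else:             bins["scrap"] += 1
    sellable = n_dies - bins["scrap"]
    return bins, sellable / n_dies               # 'yield' counts every sellable bin

print(bin_wafer())
```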
This binning manifests through two forms of variability: variability in the process and variability in the design. Manufacturing is vastly complex, and the methods and ordering of tasks in the lithography process, as well as the speed of production, can affect this variability (it comes down to a lot of R&D). Design variability is somewhat different, as it requires engineers to build mechanisms into a processor design to minimize quality variability, and this might come at the expense of power or die area.
Intel is in the somewhat unique position of being able to manipulate both of these sources of variability, as it both designs and manufactures its chips; for companies that rely on foundry manufacturing, it’s a one-sided affair, and the companies that pay the most to TSMC (like Apple) get key details on the other side of the equation.
The end goal is to minimize variability (make each processor off the line reach the same target), but also to move that variability peak nearer to a more desirable goal, such as performance or power. The whole process is a conveyor belt of 10,000 levers and switches, where each one can affect the performance of a dozen others, and so finding the best configuration in a sea of options can be very difficult.
Example shmoo plots of Samsung's 5nm Test SoC, showing simple pass/fail at voltage and frequency
Simple pass/fail metrics are often graphed in a shmoo plot, like the one above. Beyond a pass/fail metric, companies like Intel also have to determine what percentage of the processors on a given wafer or batch meet those requirements. Because the variability in the quality of silicon can be so wide, where the processor company defines its product binning targets is very important. A company like Intel needs to decide how many processors of each type it will need, what its customers will need, how that will change over time, and what it can do to maximize sales dollars per square millimeter of silicon. In a situation where customers might want a cheaper product, Intel could take higher quality silicon and label it as a lower quality product, so it doesn’t sit on a lot of unsold processors. Conversely, if customers are demanding a higher quality product than manufacturing can deliver, then it can become an issue. It also matters when it comes to marketing.
A shmoo plot of Intel's Centrino CPU, with the letters potentially indicating what % of processors pass/fail at those points.
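For readers who want to play with the concept, a shmoo plot is straightforward to generate: sweep voltage and frequency, record pass/fail at each grid point, and color the grid. A minimal Python sketch, with an invented pass criterion standing in for real silicon test data:

```python
import numpy as np
import matplotlib.pyplot as plt

volts = np.linspace(0.7, 1.3, 25)          # supply voltage sweep (V)
freqs = np.linspace(2.0, 5.5, 29)          # clock sweep (GHz)

# Invented stand-in for 'did the stress test pass at (V, f)?' -
# real data would come from running a test at every grid point.
V, F = np.meshgrid(volts, freqs)
passed = F <= 2.0 + 5.0 * (V - 0.7)

plt.pcolormesh(V, F, passed, cmap="RdYlGn", shading="auto")
plt.xlabel("Voltage (V)")
plt.ylabel("Frequency (GHz)")
plt.title("Shmoo plot: green = pass, red = fail")
plt.show()
```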
A semiconductor company like Intel can make life easier for itself by choosing binning targets and metrics that are less aggressive. But it’s really the high-end halo products that matter when it comes to promoting the best of the best. Intel has a history of very aggressive binning, to the point where its quality requirements for the top product are so strict that only a handful of processors out of every million produced will ever meet that level.
Over the past two decades Intel has made super not-so-secret ‘Everest’ or ‘BlackOps’ processor models. These are technically off-roadmap processors not for general sale, because of the very strict quality requirements. These units are selected for the super-high frequencies possible, usually with a complete disregard for power or cooling requirements. One of the first good examples of this was the special-order-only dual-core Xeon X5698, rated for 4.4 GHz in Q1 2011, based on Intel’s 32nm Westmere platform and built solely for high-frequency stock market traders who needed the lowest latency whatever the cost. A microsecond for these traders can be worth millions of dollars, so throwing $20k+ at each of the fastest processors available is chump change (that includes OEM markup). These were 1000 MHz faster than any of Intel’s regularly binned processors for the open market.
Sandy Bridge also had an Everest model, while for Ivy Bridge it was known as BlackOps, offering 6 cores at 4.6 GHz all-core and a massive 250 W TDP. These were again destined for Wall Street, but came with no product identifier and, as far as we can tell, no warranty except for dead-on-arrival (DOA). These were so close to the edge of Intel’s manufacturing capabilities that if you wanted one, you had to accept that it might not work beyond a couple of months. Again, for these traders, we’re talking fractions of a percent of cost, so that was of almost zero concern.
The only reason we know about these is because over time some have filtered into the hands of enthusiasts and collectors. More recently, Intel’s latest high-frequency trading processor was a bit of a doozy. We overheard (and confirmed) at an event that Intel was planning to launch an auction-only OEM-only high-performance processor where it couldn’t guarantee stock nor would it offer any warranty.
This was a 14-core Core i9-9990XE processor, with all cores running at 5.0 GHz all the time, built for Intel’s high-end desktop platform (quad channel memory, 48 PCIe lanes). We actually got one in for review through our industry contacts, in a special single socket server unit with custom cooling. The unit had a peak power loading somewhere north of 600 W, well above Intel’s 28-core high-end Xeon products. Single threaded performance was crazy.
Unfortunately the Core i9-9990XE didn’t do so well commercially. The idea was that the OEMs that ended up ‘winning’ these CPUs at Intel’s auctions would build systems for high-frequency traders and sell them on that way. Instead, most of the companies that ‘won’ the processors ended up selling them as CPU-only models after they couldn’t drum up sufficient interest. Intel has since stopped making these CPUs available, except to the one company that is still selling to HFT customers.
Since then, Intel has launched its consumer grade 8-core Core i9-9900KS, which also had all eight cores at 5.0 GHz, albeit with only dual channel memory and 16 PCIe lanes. This was still a very limited edition product, and was only on sale for four months before being discontinued. Intel also gave away 400 special binned versions of its 10900K as part of an influencer giveaway, centered around its new thermoelectric/liquid all-in-one cooler designed to bring temperatures below ambient during idle operation. These are not a special numbered version you can buy, however.
When Intel (or anyone else) designs a processor, there are a number of tools at the company’s disposal to help manage the efficacy and variability of the manufacturing, and to help guide its products into given quality bins. Should Intel focus on making a very specific HFT processor, there are 1000 design decisions that would make no sense for a consumer product, so you can see the complexity of trying to extract the best performance out of the silicon when the bulk of the sales for that design are going to be for a more mainstream product. For an example of a processor built for extreme tasks, I point towards IBM’s z-series hardware, which starts at 12 cores running at 5.2 GHz, where four processors in a server share a 960 MB L4 cache chip. It’s a crazy (but fun) rabbit hole of processor design to go down.
Binning: Core i9-10850K
So why spend so much time talking about special one-in-a-million processors? The point is that when Intel decides where to draw the line on silicon quality, where that line sits for its commercial flagship is very important. It needs to be drawn at a place where Intel has a competitive product, but can also produce enough to satisfy the needs of the market. There’s no point drawing the consumer flagship line at the one-in-a-million level if you can only sell 10 a month. If reviewers lead with that one-in-a-million review on launch day but no-one can ever buy it, then as a brand you’ve misdirected your customer base. At least with a BlackOps processor most users understand they will never see one, but the top Core i9 product has to be widely available - it needs to sit somewhere in the 10,000-in-a-million range at least (one might imagine).
Intel's 10-core Flagship Comet Lake Family: 10900K, 10850K, W-1290P
Which brings me to the sole reason why the Core i9-10850K exists, and why it was launched two months after the main product line, headed by the Core i9-10900K. Simply put, Intel drew the quality line for its top-bin processor too aggressively, to the point where the company could not meet demand. We’ve never had an x50K processor in previous generations because the line has never been drawn this aggressively before - it has only ever been drawn further away (such as with Devil’s Canyon on Haswell). This is the first instance in recent memory where a vendor has had to introduce a processor with more relaxed requirements because it couldn’t build enough of the top-end chip.
In the first six months after the launch of the Core i9-10900K, stock was scarce in popular markets, and non-existent in secondary markets, with only a very slow trickle through. Availability has somewhat normalized now, but users who could not purchase the 10900K had two options: (1) wait until they could get one, or (2) buy something else, perhaps from a competitor.
In order to bridge that gap, the Core i9-10850K came onto the market, with a binning line ever so slightly more relaxed than the 10900K’s, so that more silicon could achieve the targets. We’ve seen in the SKU list that this appears to be a 100 MHz drop across the board, but as we’ll see in this review, the voltage also seems to be higher, so it’s burning more power to reach a lower frequency.
The Core i9-10850K has been widely available since launch. In our monthly CPU Guides, I’ve seen sufficient quantities of 10850K units with prices hovering around Intel’s expected pricing (sometimes cheaper as OEM-only tray versions with one-year seller-only warranties), especially at a time when stock of the 10900K was so limited that sellers were charging +$200 premiums for the stock they did have. At that price, users were looking at 16-core AMD models instead, especially in a high-performance market where multi-core performance is still a key metric.
If Intel had drawn the line at the 10850K level in the first instance, the concept of binning wouldn’t be a discussion at this time, and there would have been sufficient stock on the shelves since launch. The question at the time was all about the message Intel wanted to project with its top-tier 10-core overclockable model, and the 5.3 GHz mark provided that message, even if the quantity of available processors was limited. There’s no point having a BlackOps-style processor as the flagship if no-one can buy it. This is why the title of this review calls the Core i9-10850K the real flagship - it’s the one that has actually been available to purchase.
This Review
Intel provided a Core i9-10850K retail processor for review, and as a result we’ve put it through our benchmark suite. Over the next few pages we will cover CPU performance, gaming performance, and some of our microbenchmarks for power and frequency ramping response. The Core i9-10850K was launched before the Ryzen 5000 processors, however as both are available today, both sets of numbers are included.
Test Setup
As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer’s maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, that faster memory is available at a similar price, or that JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (XMP or otherwise), as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry customers who might want to shave a cent or two off the cost, or stay within the margins set by the manufacturer.
Test Setup

| Platform | CPUs | Motherboard | BIOS | Cooler | Memory |
|----------|------|-------------|------|--------|--------|
| Intel LGA1200 | Core i9-10900K, Core i9-10850K, Core i7-10700K | ASRock Z490 PG Velocita | P1.50 | TRUE Copper | Corsair DomRGB 4x8 GB DDR4-2933 |
| AMD AM4 | Ryzen 9 5900X, Ryzen 7 5800X, Ryzen 5 5600X | MSI MEG X570 Godlike | 1.B3 T13 | Noctua NH-U12S SE-AM4 | ADATA 2x32 GB DDR4-3200 |

GPU: Sapphire RX 460 2GB (CPU tests), NVIDIA RTX 2080 Ti FE (gaming tests)
PSU: Corsair AX860i, Corsair AX1200i, Silverstone SST-ST1000-P
SSD: Crucial MX500 2TB
Additional cooling provided by SST-FHP141-VF 173 CFM fans
Many thanks to...
We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.
Hardware Providers for CPU and Motherboard Reviews: Sapphire (RX 460 Nitro), NVIDIA (RTX 2080 Ti), Crucial (SSDs), Corsair (PSUs), G.Skill (DDR4), ADATA (DDR4), Silverstone (coolers), and Noctua (coolers).
A big thanks to ADATA for the AD4U3200716G22-SGN modules for this review. They're currently the backbone of our AMD testing.
Users interested in the details of our current CPU benchmark suite can refer to our #CPUOverload article which covers the topics of benchmark automation as well as what our suite runs and why. We also benchmark much more data than is shown in a typical review, all of which you can see in our benchmark database. We call it ‘Bench’, and there’s also a link on the top of the website in case you need it for processor comparison in the future.
Power Consumption
The nature of reporting processor power consumption has become, in part, a dystopian nightmare. Historically the peak power consumption of a processor, as purchased, is given by its Thermal Design Power (TDP, or PL1). For many markets, such as embedded processors, that value of TDP still signifies the peak power consumption. For the processors we test at AnandTech, either desktop, notebook, or enterprise, this is not always the case.
Modern high performance processors implement a feature called Turbo. This allows, usually for a limited time, a processor to go beyond its rated frequency. Exactly how far the processor goes depends on a few factors, such as the Turbo Power Limit (PL2), whether the peak frequency is hard coded, the thermals, and the power delivery. Turbo can sometimes be very aggressive, allowing power values 2.5x above the rated TDP.
AMD and Intel have different definitions for TDP, but broadly speaking they are applied in the same way. The differences come down to turbo modes, turbo limits, turbo budgets, and how the processors manage that power balance. These topics are 10000-12000 word articles in their own right, and we’ve got a few articles worth reading on the topic:
- Why Intel Processors Draw More Power Than Expected: TDP and Turbo Explained
- Talking TDP, Turbo and Overclocking: An Interview with Intel Fellow Guy Therien
- Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics
- Intel’s TDP Shenanigans Hurts Everyone
In simple terms, processor manufacturers only ever guarantee two values which are tied together - when all cores are running at base frequency, the processor should be running at or below the TDP rating. All turbo modes and power modes above that are not covered by warranty. Intel kind of screwed this up with the Tiger Lake launch in September 2020, by refusing to define a TDP rating for its new processors, instead going for a range. Obfuscation like this is a frustrating endeavor for press and end-users alike.
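To illustrate the mechanism the linked articles describe for Intel’s desktop chips: the processor may draw up to PL2 while an exponentially weighted moving average of its power remains below PL1, with the time constant tau setting how long that budget lasts. Below is a deliberately simplified Python model of that behavior, with invented workload numbers:

```python
PL1, PL2, TAU = 125.0, 250.0, 56.0   # watts, watts, seconds (typical Z490 defaults)

def simulate(demand_w, dt=0.1, t_end=120.0):
    """Exponentially weighted moving average governs the turbo budget:
    while the average stays below PL1, the chip may draw up to PL2."""
    t, avg, trace = 0.0, 0.0, []
    while t < t_end:
        power = min(demand_w, PL2 if avg < PL1 else PL1)
        avg += (power - avg) * (dt / TAU)      # EWMA update
        trace.append((t, power))
        t += dt
    return trace

# A 200 W all-core load: the chip holds ~200 W until the average
# crosses PL1 (just under a minute here), then settles back to 125 W.
for t, p in simulate(200.0)[::100]:
    print(f"{t:6.1f} s  {p:6.1f} W")
```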
However, for our tests in this review, we measure the power consumption of the processor in a variety of different scenarios. These include full workflows, real-world image-model construction, and others as appropriate. These tests are done as comparative models. We also note the peak power recorded in any of our tests.
Here I’m plotting the 10900K against the 10850K as we load the threads with AIDA’s stress test. Peak values are being reported.
On the front page, I stated that one of the metrics on which those quality lines are drawn, aside from frequency, is power and voltage response. Moving the binning needle by 100 MHz is relatively easy, but power is a more difficult beast to control. Our tests show that for any fully threaded workload, despite running at a lower frequency than the 10900K, our 10850K actually uses more power. At the extreme, this is 15-20 W more, or up to 2 W per core, showcasing just how strict the metrics on the 10900K had to be (and perhaps why Intel has had difficulty manufacturing enough of them). However, one could argue that it was Intel’s decision to draw the line that aggressively.
In more lightly threaded workloads, the 10850K actually seems to use less power, which might indicate that current density is the prime factor in this binning.
For a real workload, we’re using our Agisoft Photoscan benchmark. This test has a number of different areas that involve single thread, multi-thread, or memory limited algorithms.
At first glance, it looks as if the Core i9-10850K consumes more power at any loading, but it is worth noting the power levels in the 80-100% region of the test, when we dip below 50 W. This is when we’re likely using 1 or 2 threads, and the power of the Core i9-10900K is much higher as a percentage here, likely because of the 5300 MHz setting.
These results caused me to look more closely at the data underneath. In terms of power per core, when testing POV-Ray at full load, the difference is about a watt per core or just under. What surprised me more was the frequency response, as well as the core loading temperature.
Starting with the 10900K:
In the initial loading, we get 5300 MHz and temperatures up in the 85-90ºC bracket. It’s worth noting that at these temperatures the CPU shouldn’t be in Thermal Velocity Boost, which should have a hard ceiling of 70ºC, but most modern motherboards will ignore that ‘Intel recommendation’. Also, when we look at watts per core, the 10900K draws 26 W on a single core just to reach 5300 MHz, so it is no wonder it drops down to 15-19 W per core very quickly.
The processor comes down to 5000 MHz at 3 cores loaded, sitting at 81ºC. Then as we go beyond three cores, the frequency dips only slightly, while the temperature of the whole package increases steadily, up to a quite toasty 98ºC. This is even with our 2 kg copper cooler, indicating that at this point the limit is thermal transfer inside the silicon itself rather than radiating heat away from the cooler.
When we do the same comparison for the Core i9-10850K however, the results are a bit more alarming.
This graph comes in two phases.
The first phase is the light loading, and because we’re not grasping for 5300 MHz, the temperature doesn’t go into the 90ºC range at light loading like the 10900K does. The frequency profile is a bit more stair-shaped than the 10900K’s, but as we ramp up the cores, even at a lower frequency, the power and the thermals increase. At full loading, with the same cooler and the same benchmarks in the same board, we’re seeing reports of 102ºC all-package temperature. The cooler is warm, but not excessively so, again showcasing that this is more an issue of thermal migration inside the silicon than of cooling capacity.
To a certain degree, silicon is already designed with thermal migration in mind. It’s what we call ‘dark’ silicon: essentially silicon that is disabled or doing nothing, acting as a thermal (or power/electrical) barrier between different parts of the CPU. Modern processors already have copious amounts of dark silicon, and as we move to denser process node technologies, they will require even more. The knock-on effect of this is die size, which could also affect yields for a given defect density.
Despite these thermals, none of our benchmarks (either gaming or high-performance compute) seemed to be out of line with expectations - if anything the 10850K outperforms what we expected. The only gripe is going to be cooling, as we used an open test bed and arguably the best air cooler on the market, and users building into a case will need something similarly substantial, probably of the liquid cooling variety.
CPU Tests: Microbenchmarks
Core-to-Core Latency
As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.
But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on if it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.
If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate measure of how quickly an access between two cores can happen.
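Conceptually the method is a ping-pong: pin two threads to two cores, bounce a shared flag between them, and halve the round-trip time. Our tool does this in native code with atomic operations; the Python sketch below shows only the shape of the technique (it is Linux-only via os.sched_setaffinity, and interpreter overhead inflates the absolute numbers enormously):

```python
import multiprocessing as mp
import os, time

def responder(flag, core):
    os.sched_setaffinity(0, {core})        # pin to the second core (Linux-only)
    while True:
        v = flag.value
        if v == 1:
            flag.value = 2                 # pong
        elif v == -1:
            return                         # shutdown signal

def core_to_core_ns(core_a, core_b, iters=20000):
    flag = mp.Value("i", 0, lock=False)    # shared int in shared memory
    p = mp.Process(target=responder, args=(flag, core_b))
    p.start()
    os.sched_setaffinity(0, {core_a})      # pin ourselves to the first core
    t0 = time.perf_counter_ns()
    for _ in range(iters):
        flag.value = 1                     # ping
        while flag.value != 2:
            pass                           # spin until the pong arrives
    t1 = time.perf_counter_ns()
    flag.value = -1
    p.join()
    return (t1 - t0) / iters / 2           # one-way latency estimate, ns

if __name__ == "__main__":
    print(core_to_core_ns(0, 1))
```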
When we first reviewed the 10-core Comet Lake processors, we noticed that a core (or two) seemed to take slightly longer to ping/pong than the others. We see the same pattern here again with the final core.
Frequency Ramping
Both AMD and Intel have introduced features to their processors over the past few years that speed up the time it takes a CPU to move from idle into a high powered state. The effect of this is that users can get peak performance sooner, but the biggest knock-on effect is battery life in mobile devices, especially if a system can turbo up quickly and turbo down quickly, ensuring that it stays in the lowest and most efficient power state for as long as possible.
Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.
One of the issues though with this technology is that sometimes the adjustments in frequency can be so fast, software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
We wrote an extensive review analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.
We got around the issue by making the frequency probing the workload causing the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
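The key idea is that the probe is also the load: execute a fixed chunk of work back-to-back from a cold start and timestamp each chunk, and the completion rate traces the ramp. Here is a rough Python approximation of that idea - our actual tool is purpose-built and far finer-grained:

```python
import time

def ramp_trace(duration_s=0.2, chunk=10000):
    """Run a fixed arithmetic chunk repeatedly from a cold start and
    timestamp each one; throughput rises as the core ramps to turbo."""
    time.sleep(1.0)                         # let the core fall back to idle
    samples, t_start = [], time.perf_counter_ns()
    while time.perf_counter_ns() - t_start < duration_s * 1e9:
        t0 = time.perf_counter_ns()
        x = 0
        for i in range(chunk):              # the probe is also the workload
            x += i * i
        t1 = time.perf_counter_ns()
        samples.append(((t0 - t_start) / 1e6, chunk / (t1 - t0)))
    return samples                          # (ms offset, relative speed) pairs

for ms, speed in ramp_trace()[::20]:
    print(f"{ms:8.2f} ms  {speed:.3f}")
```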
The Core i9-10850K ramps up extremely quickly from idle to peak turbo, in the region of about 5 milliseconds. This is faster than the 16 ms we typically observe.
CPU Tests: Office and Science
Our previous set of ‘office’ benchmarks have often been a mix of science and synthetics, so this time we wanted to keep our office section purely on real world performance.
Agisoft Photoscan 1.3.3: link
The concept of Photoscan is about translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the final 3D model in both spatial accuracy and texturing accuracy. The algorithm has four stages, with some parts of the stages being single-threaded and others multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.
For the update to version 1.3.3, the Agisoft software now supports command line operation. Agisoft provided us with a set of new images for this version of the test, and a python script to run it. We’ve modified the script slightly by changing some quality settings for the sake of the benchmark suite length, as well as adjusting how the final timing data is recorded. The python script dumps the results file in the format of our choosing. For our test we obtain the time for each stage of the benchmark, as well as the overall time.
Application Opening: GIMP 2.10.18
First up is a test using a monstrous multi-layered xcf file to load GIMP. While the file is only a single ‘image’, it has so many high-quality layers embedded that it was taking north of 15 seconds to open and to gain control on the mid-range notebook I was using at the time.
What we test here is the first run - normally on the first time a user loads the GIMP package from a fresh install, the system has to configure a few dozen files that remain optimized on subsequent openings. For our test we delete those configured optimized files in order to force a ‘fresh load’ each time the software is run. As it turns out, GIMP does optimizations for every CPU thread in the system, which means that higher thread-count processors take a lot longer to run.
We measure the time taken from calling the software to be opened, and until the software hands itself back over to the OS for user control. The test is repeated for a minimum of ten minutes or at least 15 loops, whichever comes first, with the first three results discarded.
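In harness form, the protocol looks roughly like the sketch below. The install path, config location, and headless invocation are assumptions for illustration; the real suite measures until the UI hands control back rather than timing a console quit:

```python
import shutil, subprocess, time
from pathlib import Path

GIMP = r"C:\Program Files\GIMP 2\bin\gimp-console-2.10.exe"      # assumed install path
CONFIG = Path.home() / "AppData" / "Roaming" / "GIMP" / "2.10"   # per-user config cache

def fresh_load_seconds():
    shutil.rmtree(CONFIG, ignore_errors=True)    # force the first-run optimization pass
    t0 = time.perf_counter()
    subprocess.run([GIMP, "-i", "-b", "(gimp-quit 0)"], check=True)
    return time.perf_counter() - t0

runs = [fresh_load_seconds() for _ in range(15)]
result = sum(runs[3:]) / len(runs[3:])           # discard the first three, then average
print(f"{result:.2f} s")
```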
GIMP as a test can be confusing, but it’s best to look at it as a single-threaded test where the number of loops required is a function of the thread count. Fast chips with small core counts do best here.
Science
In this version of our test suite, all the science focused tests that aren’t ‘simulation’ work are now in our science section. This includes Brownian Motion, calculating digits of Pi, molecular dynamics, and for the first time, we’re trialing an artificial intelligence benchmark, both inference and training, that works under Windows using python and TensorFlow. Where possible these benchmarks have been optimized with the latest in vector instructions, except for the AI test – we were told that while it uses Intel’s Math Kernel Libraries, they’re optimized more for Linux than for Windows, and so it gives an interesting result when unoptimized software is used.
3D Particle Movement v2.1: Non-AVX and AVX2/AVX512
This is the latest version of this benchmark designed to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. This involves randomly moving particles in a 3D space using a set of algorithms that define random movement. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in.
The initial version of v2.1 is a custom C++ binary of my own code, and flags are in place to allow for multiple loops of the code with a custom benchmark length. By default this version runs six times and outputs the average score to the console, which we capture with a redirection operator that writes to file.
For v2.1, we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere. According to Jim Keller, there are only a couple dozen or so people who understand how to extract the best performance out of a CPU, and this guy is one of them. To keep things honest, AMD also has a copy of the code, but has not proposed any changes.
The 3DPM test is set to output millions of movements per second, rather than time to complete a fixed number of movements.
y-Cruncher 0.78.9506: www.numberworld.org/y-cruncher
If you ask anyone what sort of computer holds the world record for calculating the most digits of pi, I can guarantee that a good portion of answers will point to some colossal supercomputer built into a mountain by a super-villain. Fortunately nothing could be further from the truth - the computer with the record is a quad socket Ivy Bridge server with 300 TB of storage. The software that was run to get that record was y-cruncher.
Built by Alex Yee over the better part of a decade and then some, y-Cruncher is the software of choice for calculating billions and trillions of digits of the most popular mathematical constants. The software has held the world record for Pi since August 2010, and has broken that record a total of 7 times since. It also holds records for e, the Golden Ratio, and others. According to Alex, the program runs around 500,000 lines of code, and he has multiple binaries each optimized for different families of processors, such as Zen, Ice Lake, Sky Lake, all the way back to Nehalem, using the latest SSE/AVX2/AVX512 instructions where they fit in, and then further optimized for how each core is built.
For our purposes, we’re calculating Pi, as it is more compute bound than memory bound. In single thread mode we calculate 250 million digits, while in multithreaded mode we go for 2.5 billion digits. That 2.5 billion digit value requires ~12 GB of DRAM, and so is limited to systems with at least 16 GB.
NAMD 2.13 (ApoA1): Molecular Dynamics
One of the popular science fields is modeling the dynamics of proteins. By looking at how the energy of active sites within a large protein structure changes over time, scientists can calculate the required activation energies for potential interactions. This becomes very important in drug discovery. Molecular dynamics also plays a large role in protein folding, and in understanding what happens when proteins misfold, and what can be done to prevent it. Two of the most popular molecular dynamics packages in use today are NAMD and GROMACS.
NAMD, or Nanoscale Molecular Dynamics, has already been used in extensive Coronavirus research on the Frontier supercomputer. Typical simulations using the package are measured in how many nanoseconds per day can be calculated with the given hardware, and the ApoA1 protein (92,224 atoms) has been the standard model for molecular dynamics simulation.
Luckily the compute can home in on a typical ‘nanoseconds-per-day’ rate after only 60 seconds of simulation, however we stretch that out to 10 minutes to take a more sustained value, as by that time most turbo limits should be surpassed. The simulation itself works with 2 femtosecond timesteps. We use version 2.13 as this was the recommended version at the time of integrating this benchmark into our suite. The latest nightly builds we’re aware of have started to enable support for AVX-512, however for consistency in our benchmark suite we are staying with 2.13. Other software that we test with does have AVX-512 acceleration.
AI Benchmark 0.1.2 using TensorFlow: Link
Finding an appropriate artificial intelligence benchmark for Windows has been a holy grail of mine for quite a while. The problem is that AI is such a fast moving, fast paced world that whatever I compute this quarter will no longer be relevant in the next, and one of the key metrics in this benchmarking suite is being able to keep data over a long period of time. We’ve had AI benchmarks on smartphones for a while, given that smartphones are a better target for AI workloads, but on the PC side almost everything is geared towards Linux.
Thankfully however, the good folks over at ETH Zurich in Switzerland have converted their smartphone AI benchmark into something that’s useable in Windows. It uses TensorFlow, and for our benchmark purposes we’ve locked our testing down to TensorFlow 2.1.0 and AI Benchmark 0.1.2, while using Python 3.7.6.
The benchmark runs through 19 different networks including MobileNet-V2, ResNet-V2, VGG-19 Super-Res, NVIDIA-SPADE, PSPNet, DeepLab, Pixel-RNN, and GNMT-Translation. All the tests probe both the inference and the training at various input sizes and batch sizes, except the translation that only does inference. It measures the time taken to do a given amount of work, and spits out a value at the end.
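Running the benchmark is pleasantly simple; based on the package’s published usage, and with the version pins noted above, it boils down to:

```python
# pip install ai-benchmark==0.1.2 tensorflow==2.1.0   (pinned versions, per the text)
from ai_benchmark import AIBenchmark

benchmark = AIBenchmark()
results = benchmark.run()      # runs inference + training across the 19 networks
print(results.ai_score)        # inference_score / training_score are also exposed
```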
There is one big caveat for all of this, however. Speaking with the folks over at ETH, they use Intel’s Math Kernel Libraries (MKL) for Windows, and they’re seeing some incredible drawbacks. I was told that MKL for Windows doesn’t play well with multiple threads, and as a result any Windows results are going to perform a lot worse than Linux results. On top of that, after a given number of threads (~16), MKL kind of gives up and performance drops off quite substantially.
So why test it at all? Firstly, because we need an AI benchmark, and a bad one is still better than not having one at all. Secondly, if MKL on Windows is the problem, then by publicizing the test, it might just put a boot somewhere for MKL to get fixed. To that end, we’ll stay with the benchmark as long as it remains feasible.
As mentioned, AI-Benchmark is not really optimized at all for Windows, with some issues in the threading due to MKL. AMD seems to perform better here, but it's an interesting mix between chiplets and cores vs frequency.
CPU Tests: Simulation
Simulation and Science have a lot of overlap in the benchmarking world, however for this distinction we’re separating into two segments mostly based on the utility of the resulting data. The benchmarks that fall under Science have a distinct use for the data they output – in our Simulation section, these act more like synthetics but at some level are still trying to simulate a given environment.
DigiCortex v1.35: link
DigiCortex is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation, similar to a small slug.
The results are given as a fraction of real-time simulation, so anything above a value of one means the system is suitable for real-time work. The benchmark offers a ‘no firing synapse’ mode, which in essence detects DRAM and bus speed, however we take the firing mode, which adds CPU work with every firing.
The software originally shipped with a benchmark that recorded the first few cycles and output a result. So while on fast multi-threaded processors the benchmark lasted less than a few seconds, slow dual-core processors could be running for almost an hour. There is also the issue of DigiCortex starting with a base neuron/synapse map in ‘off mode’, giving a high result in the first few cycles as none of the nodes are currently active. We found that the performance settles down into a steady state after a while (when the model is actively in use), so we asked the author to allow for a ‘warm-up’ phase, and for the benchmark to report the average over a second sample time.
For our test, we give the benchmark 20000 cycles to warm up and then take the data over the next 10000 cycles for the test - on a modern processor these take 30 seconds and 150 seconds respectively. This is then repeated a minimum of 10 times, with the first three results rejected. Results are shown as a multiple of real-time calculation.
The wide variation on AMD suggests the test prefers high-core-count single-chiplet processors. Intel takes a back seat here, as it is also using slower memory.
Dwarf Fortress 0.44.12: Link
Another long standing request for our benchmark suite has been Dwarf Fortress, a popular management/roguelike indie video game, first launched in 2006 and still being regularly updated today, aiming for a Steam launch sometime in the future.
Emulating the ASCII interfaces of old, this title is a rather complex beast, which can generate environments subject to millennia of rule, famous faces, peasants, and key historical figures and events. The further you get into the game, depending on the size of the world, the slower it becomes as it has to simulate more famous people, more world events, and the natural way that humanoid creatures take over an environment. Like some kind of virus.
For our test we’re using DFMark. DFMark is a benchmark built by vorsgren on the Bay12Forums that gives two different modes built on DFHack: world generation and embark. These tests can be configured, but range anywhere from 3 minutes to several hours. After analyzing the test, we ended up going for three different world generation sizes:
- Small, a 65x65 world with 250 years, 10 civilizations and 4 megabeasts
- Medium, a 127x127 world with 550 years, 10 civilizations and 4 megabeasts
- Large, a 257x257 world with 550 years, 40 civilizations and 10 megabeasts
DFMark outputs the time to run any given test, so this is what we use for the output. We loop the small test for as many runs as possible in 10 minutes, the medium test for as many as possible in 30 minutes, and the large test for as many as possible in an hour, as sketched below.
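That ‘as many runs as fit in a time budget’ pattern is the same for all three sizes. A hedged sketch of the harness logic - the DFMark executable name and flags here are placeholders, not the tool’s real interface:

```python
import subprocess, time

def loop_for(cmd, budget_s):
    """Repeat a benchmark command until the time budget is exhausted."""
    runs = []
    start = time.perf_counter()
    while time.perf_counter() - start < budget_s:
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)
        runs.append(time.perf_counter() - t0)
    return runs

small  = loop_for(["dfmark.exe", "--worldgen", "small"], 10 * 60)   # hypothetical flags
medium = loop_for(["dfmark.exe", "--worldgen", "medium"], 30 * 60)
large  = loop_for(["dfmark.exe", "--worldgen", "large"], 60 * 60)
```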
Dolphin v5.0 Emulation: Link
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in seconds, where the Wii itself scores 1051 seconds.
CPU Tests: Rendering
Rendering tests, compared to others, are often a little simpler to digest and automate. All the tests put out some sort of score or time, usually in an obtainable way that makes it fairly easy to extract. These tests are some of the most strenuous in our list, due to the highly threaded nature of rendering and ray-tracing, and can draw a lot of power. If a system is not properly configured to deal with the thermal requirements of the processor, the rendering benchmarks are where it will show most easily as the frequency drops over a sustained period of time. Most benchmarks in this case are re-run several times, and the key to this is having an appropriate idle/wait time between benchmarks to allow temperatures to normalize from the last test.
Blender 2.83 LTS: Link
One of the popular tools for rendering is Blender, a public open source project that anyone in the animation industry can get involved in. This extends to conferences, use in films and VR, a dedicated Blender Institute, and everything you might expect from a professional software package (except perhaps a professional grade support package). With it being open source, studios can customize it in as many ways as they need to get the results they require. It ends up being a big optimization target for both Intel and AMD in this regard.
For benchmarking purposes, we fell back to rendering a single frame from a detailed project. Most reviews, as we have done in the past, focus on one of the classic Blender renders, known as BMW_27. It can take anywhere from a few minutes to almost an hour on a regular system. However, now that Blender has moved onto a Long Term Support (LTS) model with the latest 2.83 release, we decided to go for something different.
We use this scene, called PartyTug at 6AM by Ian Hubert, which is the official image of Blender 2.83. It is 44.3 MB in size, and uses some of the more modern compute properties of Blender. As it is more complex than the BMW scene, but uses different aspects of the compute model, time to process is roughly similar to before. We loop the scene for at least 10 minutes, taking the average time of the completions taken. Blender offers a command-line tool for batch commands, and we redirect the output into a text file.
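Blender’s background mode makes this easy to automate. A minimal sketch using the standard headless flags (the scene filename is assumed for illustration):

```python
import subprocess

# -b renders headless, -f 1 renders frame 1 of the scene; output goes to a log
with open("blender_log.txt", "w") as log:
    subprocess.run(
        ["blender", "-b", "partytug_6am.blend", "-f", "1"],
        stdout=log, stderr=subprocess.STDOUT, check=True,
    )
```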
Corona 1.3: Link
Corona is billed as a popular high-performance photorealistic rendering engine for 3ds Max, with development for Cinema 4D support as well. In order to promote the software, the developers produced a downloadable benchmark on the 1.3 version of the software, with a ray-traced scene involving a military vehicle and a lot of foliage. The software does multiple passes, calculating the scene, geometry, preconditioning and rendering, with performance measured in the time to finish the benchmark (the official metric used on their website) or in rays per second (the metric we use to offer a more linear scale).
The standard benchmark provided by Corona is interface driven: the scene is calculated and displayed in front of the user, with the ability to upload the result to their online database. We got in contact with the developers, who provided us with a non-interface version that allowed for command-line entry and retrieval of the results very easily. We loop around the benchmark five times, waiting 60 seconds between each, and taking an overall average. The time to run this benchmark can be around 10 minutes on a Core i9, up to over an hour on a quad-core 2014 AMD processor or dual-core Pentium.
Crysis CPU-Only Gameplay
One of the most oft used memes in computer gaming is ‘Can It Run Crysis?’. The original 2007 game, built on Crytek’s CryEngine, was heralded as a computationally complex title for the hardware at the time and for several years afterwards, suggesting that a user needed graphics hardware from the future in order to run it. Fast forward over a decade, and the game runs fairly easily on modern GPUs.
But can we also apply the same concept to pure CPU rendering? Can a CPU, on its own, render Crysis? Since 64 core processors entered the market, one can dream. So we built a benchmark to see whether the hardware can.
For this test, we’re running Crysis’ own GPU benchmark, but in CPU render mode. This is a 2000 frame test, with medium and low settings.
POV-Ray 3.7.1: Link
A long time benchmark staple, POV-Ray is another rendering program that is well known to load up every single thread in a system, regardless of cache and memory levels. After a long period of POV-Ray 3.7 being the latest official release, when AMD launched Ryzen the POV-Ray codebase suddenly saw a range of activity from both AMD and Intel, knowing that the software (with the built-in benchmark) would be an optimization tool for the hardware.
We had to stick a flag in the sand when it came to selecting a version that was fair to both AMD and Intel, and still relevant to end-users. Version 3.7.1 fixes a significant bug in the early 2017 code relating to write-after-read, a pattern advised against in both Intel and AMD optimization manuals, leading to a nice performance boost.
The benchmark can take over 20 minutes on a slow system with few cores, or around a minute or two on a fast system, or seconds with a dual high-core count EPYC. Because POV-Ray draws a large amount of power and current, it is important to make sure the cooling is sufficient here and the system stays in its high-power state. Using a motherboard with a poor power-delivery and low airflow could create an issue that won’t be obvious in some CPU positioning if the power limit only causes a 100 MHz drop as it changes P-states.
V-Ray: Link
We have a couple of renderers and ray tracers in our suite already, however V-Ray’s benchmark was requested often enough for us to roll it into our suite. Built by ChaosGroup, V-Ray is a 3D rendering package compatible with a number of popular commercial imaging applications, such as 3ds Max, Maya, Unreal, Cinema 4D, and Blender.
We run the standard standalone benchmark application, but in an automated fashion to pull out the result in the form of kilosamples/second. We run the test six times and take an average of the valid results.
Cinebench R20: Link
Another common staple of a benchmark suite is Cinebench. Based on Cinema4D, Cinebench is a purpose-built benchmark that renders a scene with both single and multi-threaded options. The scene is identical in both cases. The R20 version means that it targets Cinema 4D R20, a slightly older version of the software which is currently on version R21. Cinebench R20 was launched given that the R15 version had been out a long time, and despite the difference between the benchmark and the latest version of the software on which it is based, Cinebench results are often quoted a lot in marketing materials.
Results for Cinebench R20 are not comparable to R15 or older, because both the scene being used and the code path have been updated. The results are output as a score from the software, which scales inversely with the time taken. Using the benchmark flags for single CPU and multi-CPU workloads, we run the software from the command line, which opens the test, runs it, and dumps the result into the console, which is redirected to a text file. The test is repeated for a minimum of 10 minutes for both ST and MT, and then the runs averaged.
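For reference, a harness around the commonly documented command-line flags might look like the sketch below; the flag names follow Maxon’s published command-line options for R15/R20, though treat the exact strings as something to verify against your copy:

```python
import subprocess

def cinebench(flag, log_name):
    """Run Cinebench R20 headless and capture the console score output."""
    with open(log_name, "w") as log:
        subprocess.run(["Cinebench.exe", flag], stdout=log,
                       stderr=subprocess.STDOUT, check=True)

cinebench("g_CinebenchCpu1Test=true", "cb_st.txt")   # single-threaded run
cinebench("g_CinebenchCpuXTest=true", "cb_mt.txt")   # all-thread run
```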
CPU Tests: Encoding
One of the interesting elements on modern processors is encoding performance. This covers two main areas: encryption/decryption for secure data transfer, and video transcoding from one video format to another.
In the encrypt/decrypt scenario, how data is transferred and by what mechanism is pertinent to on-the-fly encryption of sensitive data - a process by which more modern devices are leaning to for software security.
Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who are wanting to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.
HandBrake 1.32: Link
Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file-size, trade quality for file-size, or all of the above, and may increase the encoding cost in order to accelerate decoding. Alongside Google’s favorite codecs, VP9 and AV1, there are others that are prominent: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H.265) aims to provide the same quality as H264 but at a lower file-size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content. There are other codecs coming to market designed for specific use cases all the time.
Handbrake is a favored tool for transcoding, with the later versions using copious amounts of newer APIs to take advantage of co-processors, like GPUs. It is available on Windows via an interface or can be accessed through the command-line, with the latter making our testing easier, with a redirection operator for the console output.
We take a compiled version of this 16-minute YouTube video about Russian CPUs at 1080p30 h264 and convert it into three different files: (1) 480p30 ‘Discord’, (2) 720p30 ‘YouTube’, and (3) 4K60 HEVC.
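HandBrake’s command-line build accepts the same presets as the GUI, which makes the three conversions easy to script. A sketch - the source filename and preset names are illustrative rather than our exact configuration:

```python
import subprocess

jobs = [
    ("Fast 480p30",  "out_480p30.mp4"),    # the 'Discord'-style target
    ("Fast 720p30",  "out_720p30.mp4"),    # the 'YouTube'-style target
    ("HEVC 2160p60", "out_4k60_hevc.mp4"), # illustrative preset name
]
for preset, outfile in jobs:
    # -i input, -o output, -Z selects a named preset
    subprocess.run(["HandBrakeCLI", "-i", "source_1080p30.mp4",
                    "-o", outfile, "-Z", preset], check=True)
```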
7-Zip 1900: Link
The first compression benchmark tool we use is the open-source 7-zip, which typically offers good scaling across multiple cores. 7-zip is the compression tool most cited by readers as one they would rather see benchmarks on, and the program includes a built-in benchmark tool for both compression and decompression.
The tool can either be run from inside the software or through the command line. We take the latter route as it is easier to automate, obtain results, and put through our process. The command line flags available offer an option for repeated runs, and the output provides the average automatically through the console. We direct this output into a text file and regex the required values for compression, decompression, and a combined score.
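For the curious, invoking and scraping the built-in benchmark is a few lines; the ‘Avr:’ summary row carries the average speeds and ratings, though the exact column layout varies by 7-Zip version:

```python
import re, subprocess

out = subprocess.run(["7z", "b", "3"],                 # three benchmark passes
                     capture_output=True, text=True).stdout
avr = next(l for l in out.splitlines() if l.strip().startswith("Avr:"))
numbers = re.findall(r"\d+", avr)                      # compression/decompression figures
print(avr, numbers)
```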
AES Encoding
Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.
WinRAR 5.90: Link
For the 2020 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode, as we did with 7-Zip, here we take a set of files representative of a generic stack:
- 33 video files, each 30 seconds, in 1.37 GB,
- 2834 smaller website files in 370 folders in 150 MB,
- 100 Beat Saber music tracks and input files, for 451 MB
This is a mixture of compressible and incompressible formats. The results shown are the time taken to encode the files. Due to DRAM caching, we run the test for 20 minutes and take the average of the last five runs, when the benchmark is in a steady state.
For automation, we use AHK’s internal timing tools from initiating the workload until the window closes signifying the end. This means the results are contained within AHK, with an average of the last 5 results being easy enough to calculate.
CPU Tests: Legacy and Web
In order to gather data to compare with older benchmarks, we are still keeping a number of tests under our ‘legacy’ section. This includes all the former major versions of CineBench (R15, R11.5, R10) as well as x264 HD 3.0 and the first very naïve version of 3DPM v2.1. We won’t be transferring the data over from the old testing into Bench, otherwise it would be populated with 200 CPUs with only one data point each, so it will fill up as we test more CPUs, like the others.
The other section here is our web tests.
Web Tests: Kraken, Octane, and Speedometer
Benchmarking using web tools is always a bit difficult. Browsers change almost daily, and the way the web is used changes even quicker. While there is some scope for advanced computational based benchmarks, most users care about responsiveness, which requires a strong back-end to work quickly to provide on the front-end. The benchmarks we chose for our web tests are essentially industry standards – at least once upon a time.
It should be noted that for each test, the browser is closed and re-opened anew with a fresh cache. We use a fixed Chromium version for our tests, with the update capabilities removed to ensure consistency.
Mozilla Kraken 1.1
Kraken is a 2010 benchmark from Mozilla and does a series of JavaScript tests. These tests are a little more involved than previous tests, looking at artificial intelligence, audio manipulation, image manipulation, json parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging, and then runs through 10 times giving a timed result.
We loop through the 10-run test four times (so that’s a total of 40 runs), and average the four end-results. The result is given as time to complete the test, and we’re reaching a slow asymptotic limit with regard to the highest IPC processors.
Google Octane 2.0
Our second test is also JavaScript based, but uses a lot more variation of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulations, compiler latency and code execution.
Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.
Speedometer 2: JavaScript Frameworks
Our newest web test is Speedometer 2, which is a test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.
Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics.
We repeat the benchmark for a dozen loops, taking the average of the last five.
Legacy Tests
CPU Tests: Synthetic
Most of the people in our industry have a love/hate relationship when it comes to synthetic tests. On the one hand, they’re often good for quick summaries of performance and are easy to use, but most of the time the tests aren’t related to any real software. Synthetic tests are often very good at drilling down to a specific set of instructions and maximizing the performance out of those. Due to requests from a number of our readers, we have the following synthetic tests.
Linux OpenSSL Speed: SHA256
One of our readers reached out in early 2020 and stated that he was interested in looking at OpenSSL hashing rates in Linux. Luckily OpenSSL in Linux has a function called ‘speed’ that allows the user to determine how fast the system is for any given hashing algorithm, as well as signing and verifying messages.
OpenSSL offers a lot of algorithms to choose from, and based on a quick Twitter poll, we narrowed it down to the following:
- rsa2048 sign and rsa2048 verify
- sha256 at 8K block size
- md5 at 8K block size
For each of these tests, we run them in single thread and multithreaded mode. All the graphs are in our benchmark database, Bench, and we use the sha256 results in published reviews.
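For reference, scripting those runs can look something like the following sketch; -multi is a standard ‘speed’ option and reasonably recent OpenSSL builds accept -bytes for a custom block size, though this is an illustration rather than our exact harness:

import subprocess

def openssl_speed(algo, threads=1, block=8192):
    cmd = ["openssl", "speed", "-bytes", str(block)]   # -bytes needs a reasonably recent OpenSSL
    if threads > 1:
        cmd += ["-multi", str(threads)]                # one worker per thread
    cmd.append(algo)
    return subprocess.run(cmd, capture_output=True, text=True).stdout

print(openssl_speed("sha256"))               # single-threaded, 8K blocks
print(openssl_speed("sha256", threads=20))   # all threads on a 10C/20T chip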
GeekBench 5: Link
As a common tool for cross-platform testing between mobile, PC, and Mac, GeekBench is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.
I’m including this test due to popular demand, although the results do come across as overly synthetic, and many users put a lot of weight behind the test due to the fact that it is compiled across different platforms (although with different compilers).
We have both GB5 and GB4 results in our benchmark database. GB5 was introduced to our test suite after already having tested ~25 CPUs, and so the results are a little sporadic by comparison. These spots will be filled in when we retest any of the CPUs.
CPU Tests: SPEC
SPEC2017 and SPEC2006 are series of standardized tests used to probe the overall performance between different systems, architectures, microarchitectures, and setups. The code has to be compiled, and then the results can be submitted to an online database for comparison. It covers a range of integer and floating point workloads, and can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.
We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to a WSL fixed stack size, but for like-for-like testing it is good enough. SPEC2006 is deprecated in favor of 2017, but remains an interesting comparison point in our data. Because our scores aren’t official submissions, as per SPEC guidelines we have to declare them as internal estimates on our part.
For compilers, we use LLVM for both the C/C++ and Fortran tests, with the Flang compiler covering Fortran. The rationale for using LLVM over GCC is better cross-platform comparisons to platforms that only have LLVM support, plus future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.
clang version 10.0.0
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 24bd54da5c41af04838bbe7b68f830840d47fc03)
-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2
Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions. We decided to build our SPEC binaries on AVX2, which sets Haswell as the limit on how old we can go before the testing falls over. This also means we don’t have AVX-512 binaries, primarily because in order to get the best performance, the AVX-512 intrinsics should be packed by a proper expert, as with our AVX-512 benchmark. All of the major vendors – AMD, Intel, and Arm – support the way in which we are testing SPEC.
To note, the requirements of the SPEC licence state that any benchmark results from SPEC have to be labelled ‘estimated’ until they are verified on the SPEC website as a meaningful representation of the expected performance. This is most often done by the big companies and OEMs to showcase performance to customers, however it is quite over the top for what we do as reviewers.
For each of the SPEC targets we are doing (SPEC2006 rate-1, SPEC2017 speed-1, and SPEC2017 speed-N), rather than publish all the separate test data in our reviews, we are going to condense it down into a few interesting data points. The full per-test values are in our benchmark database.
Gaming Tests: Chernobylite
Despite the advent of recent TV shows like Chernobyl, recreating the situation revolving around the 1986 Chernobyl nuclear disaster, the concept of nuclear fallout and the town of Pripyat have been popular settings for a number of games – mostly first person shooters. Chernobylite is an indie title that plays on a science-fiction survival horror experience and uses a 3D-scanned recreation of the real Chernobyl Exclusion Zone. It involves challenging combat, a mix of free exploration with crafting, and non-linear storytelling. While still in early access, it is already picking up plenty of awards.
I picked up Chernobylite while still in early access, and was impressed by its in-game benchmark, showcasing complex building structures with plenty of trees, where aliasing becomes important. The in-game benchmark is an on-rails experience through the scenery, covering both indoor and outdoor scenes – it ends up being very CPU limited in the way it is designed. We have taken an offline version of Chernobylite to use in our tests, and we are testing the following settings combinations:
- 360p Low, 1440p Low, 4K Low, 1080p Max
We do as many runs within 10 minutes per resolution/setting combination, and then take averages.
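The outer loop for this is simple enough to sketch; the benchmark executable and results parser below are hypothetical placeholders for whatever a given title provides:

import statistics, subprocess, time

def parse_latest_result():
    # placeholder: in reality this parses the title's results file for the run's average FPS
    return 60.0

def run_for_ten_minutes(cmd):
    # launch the in-game benchmark back-to-back until 10 minutes have elapsed
    scores = []
    deadline = time.time() + 10 * 60
    while time.time() < deadline:
        subprocess.run(cmd, check=True)       # one full benchmark pass
        scores.append(parse_latest_result())
    return scores

avg_fps = statistics.mean(run_for_ten_minutes(["game.exe", "-benchmark"]))  # hypothetical executable
print(f"Average of all runs: {avg_fps:.1f} FPS")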
[Graphs: Average FPS across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Civilization 6
Originally penned by Sid Meier and his team, the Civilization series of turn-based strategy games are a cult classic, and many an excuse for an all-nighter trying to get Gandhi to declare war on you due to an integer underflow. Truth be told I never actually played the first version, but I have played every edition from the second to the sixth, including the fourth as voiced by the late Leonard Nimoy, and it is a game that is easy to pick up, but hard to master.
Benchmarking Civilization has always been somewhat of an oxymoron – for a turn based strategy game, the frame rate is not necessarily the important thing here, and even in the right mood, something as low as 5 frames per second can be enough. With Civilization 6 however, Firaxis went hardcore on visual fidelity, trying to pull you into the game. As a result, Civilization can be taxing on graphics and CPUs as we crank up the details, especially in DirectX 12.
For this benchmark, we are using the following settings:
- 480p Low, 1440p Low, 4K Low, 1080p Max
For automation, Firaxis supports the in-game automated benchmark from the command line, and outputs a results file with frame times. We do as many runs within 10 minutes per resolution/setting combination, and then take averages and percentiles.
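For clarity, this is roughly how a frame-time dump becomes the two numbers we publish; the exact percentile convention shown is our own shorthand, and the frame times are dummy data:

def summarize(frame_times_ms):
    # average FPS is frames rendered divided by total time, not a mean of per-frame FPS
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # take the frame time that 95% of frames beat, and report it as an FPS figure
    slow = sorted(frame_times_ms)[int(0.95 * len(frame_times_ms))]
    return avg_fps, 1000.0 / slow

avg, p95 = summarize([16.7, 16.9, 15.8, 33.4, 16.5] * 200)   # dummy frame times in ms
print(f"Average: {avg:.1f} FPS, 95th percentile: {p95:.1f} FPS")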
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Deus Ex Mankind Divided
Deus Ex is a franchise with a wide level of popularity. Despite the Deus Ex: Mankind Divided (DEMD) version being released in 2016, it has often been heralded as a game that taxes the CPU. It uses the Dawn Engine to create a very complex first-person action game with science-fiction based weapons and interfaces. The game combines first-person, stealth, and role-playing elements, with the game set in Prague, dealing with themes of transhumanism, conspiracy theories, and a cyberpunk future. The game allows the player to select their own path (stealth, gun-toting maniac) and offers multiple solutions to its puzzles.
DEMD has an in-game benchmark, an on-rails look around an environment showcasing some of the game’s most stunning effects, such as lighting, texturing, and others. Even in 2020, it’s still an impressive graphical showcase when everything is jumped up to the max. For this title, we are testing the following resolutions:
- 600p Low, 1440p Low, 4K Low, 1080p Max
The benchmark runs for about 90 seconds. We do as many runs within 10 minutes per resolution/setting combination, and then take averages and percentiles.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Final Fantasy XIV
Despite being one number less than Final Fantasy 15, FF14 is a massively-multiplayer online title, which means there are always yearly update packages that give the opportunity for graphical updates too. In 2019, FFXIV launched its Shadowbringers expansion, and an official standalone benchmark was released at the same time for users to understand what level of performance they could expect. Much like the FF15 benchmark we’ve been using for a while, this test is a long 7-minute scene of simulated gameplay within the title. There are a number of interesting graphical features, and it certainly looks more like a 2019 title than a 2010 release, which is when FF14 first came out.
With this being a standalone benchmark, we do not have to worry about updates, and the idea for these sorts of tests for end-users is to keep the code base consistent. For our testing suite, we are using the following settings:
- 768p Minimum, 1440p Minimum, 4K Minimum, 1080p Maximum
As with the other benchmarks, we do as many runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.
All of our benchmark results can also be found in our benchmark engine, Bench.
[Graphs: Average FPS across the four resolution/quality combinations.]
Gaming Tests: Final Fantasy XV
Upon arriving on PC, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console. As a fantasy RPG with a long history, the fruits of Square-Enix’s successful partnership with NVIDIA are on display. The game uses the internal Luminous Engine, and as with other Final Fantasy games, pushes the imagination of what we can do with the hardware underneath us. To that end, FFXV was one of the first games to promote the use of ‘video game landscape photography’, due in part to the extensive detail even at long range, but also to the integration of NVIDIA’s Ansel software, which allowed for super-resolution imagery and post-processing effects to be applied.
In preparation for the launch of the game, Square Enix opted to release a standalone benchmark. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues. We use the following settings:
- 720p Standard, 1080p Standard, 4K Standard, 8K Standard
For automation, the title accepts command line inputs for both resolution and settings, and then auto-quits when finished. As with the other benchmarks, we do as many runs until 10 minutes per resolution/setting combination has passed, and then take averages. Realistically, because of the length of this test, this equates to two runs per setting.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: World of Tanks
Albeit different to most of the other commonly played MMOs, or massively multiplayer online games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military-based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming, who are based in Belarus, with the game’s soundtrack primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points, including a free-to-play element, as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank-based MMO is that it achieved eSports status when it debuted at the World Cyber Games back in 2012.
World of Tanks enCore is a demo application for its new graphics engine, penned by the Wargaming development team. Over time the new core engine has been implemented into the full game, upgrading the game’s visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system. There is technically a ray tracing version of the enCore benchmark now available, however because it can’t be deployed standalone without the installer, we decided against using it. If that gets fixed, then we can look into it.
The benchmark tool comes with a number of presets:
- 768p Minimum, 1080p Standard, 1080p Max, 4K Max (not a preset)
The odd one out is the 4K Max preset, because the benchmark doesn’t automatically have a 4K option – to get this we edit the acceptable resolutions ini file, and then we can select 4K. The benchmark outputs its own results file, with frame times, making it very easy to parse the data needed for average and percentiles.
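As a sketch of that edit – noting that the file name and format here are hypothetical, as the real enCore install may differ:

from pathlib import Path

ini = Path(r"C:\WoT_enCore\resolutions.ini")   # hypothetical file name and location
text = ini.read_text()
if "3840x2160" not in text:                     # append a 4K entry if it isn't there
    ini.write_text(text + "\n3840x2160\n")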
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Borderlands 3
As a big Borderlands fan, having to sit and wait six months for the EPIC Store exclusive to expire before we saw it on Steam felt like a long time to wait. The fourth title of the franchise, if you exclude the TellTale-style games, BL3 expands the universe beyond Pandora and its orbit, with the set of heroes (plus those from previous games) now cruising the galaxy looking for vaults and the treasures within. Popular characters like Tiny Tina, Claptrap, Lilith, Dr. Zed, Zer0, Tannis, and others all make appearances as the game continues its cel-shaded design, but with the graphical fidelity turned up. Borderlands 1 gave me my first ever taste of proper in-game second-order PhysX, and it’s a high standard that continues to this day.
BL3 works best with online access, so it is filed under our online games section. BL3 is also one of our biggest downloads, requiring 100+ GB. As BL3 supports resolution scaling, we are using the following settings:
- 360p Very Low, 1440p Very Low, 4K Very Low, 1080p Badass
BL3 has its own in-game benchmark, which recreates a set of on-rails scenes with a variety of activity going on in each, such as shootouts, explosions, and wildlife. The benchmark outputs its own results files, including frame times, which can be parsed for our averages/percentile data.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: F1 2019
The F1 racing games from Codemasters have been popular benchmarks in the tech community, mostly for their ease of use and the way they seem to take advantage of any area of a machine that might be better than another. The 2019 edition of the game features all 21 circuits on the calendar for that year, and includes a range of retro models and DLC focusing on the careers of Alain Prost and Ayrton Senna. Built on the EGO Engine 3.0, the game has been criticized, similarly to most annual sports games, for not offering enough season-to-season graphical fidelity updates to make investing in the latest title worth it, however the 2019 edition revamps the Career mode, with features such as in-season driver swaps coming into the mix. The quality of the graphics this time around is also superb, even at 4K low or 1080p Ultra.
For our test, we put Alex Albon in the Red Bull in position #20, for a dry two-lap race around Austin. We test at the following settings:
- 768p Ultra Low, 1440p Ultra Low, 4K Ultra Low, 1080p Ultra
In terms of automation, F1 2019 has an in-game benchmark that can be called from the command line, and the output file has frame times. We repeat each resolution setting for a minimum of 10 minutes, taking the averages and percentiles.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Far Cry 5
The fifth title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration with a lot of configurability.
Unfortunately, the game doesn’t like us changing the resolution in the settings file when using certain monitors, reverting to 1080p but keeping the quality settings. But resolution scaling does work, so we decided to fix the resolution at 1080p and use a variety of different scaling factors to give the following:
- 720p Low, 1440p Low, 4K Low, 1440p Max.
Far Cry 5 outputs a results file here, but the file is an HTML file, which showcases a graph of the FPS detected. At no point does the HTML file contain the frame times for each frame, but it does show the frames per second, as one value per second in the graph. The graph in HTML form is a series of (x,y) co-ordinates scaled to the min/max of the graph, rather than the raw (second, FPS) data, and so using regex I carefully tease out the values of the graph, convert them into a (second, FPS) format, and take our values of averages and percentiles that way.
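A condensed sketch of that unscaling step is below; the regex, pixel extents, and axis ranges are all hypothetical, since the real file’s markup differs:

import re

html = open("FarCry5_results.html").read()            # hypothetical file name
pts = [(float(x), float(y))
       for x, y in re.findall(r"point\((\d+\.?\d*),(\d+\.?\d*)\)", html)]

X_MAX, Y_MAX = 600.0, 400.0      # graph pixel extents (hypothetical)
T_MAX, FPS_MAX = 90.0, 200.0     # axis ranges read off the graph labels

# pixel x -> seconds, pixel y -> FPS; the y axis is inverted in screen space
series = [(x / X_MAX * T_MAX, (1 - y / Y_MAX) * FPS_MAX) for x, y in pts]
fps = [f for _, f in series]
print(f"Average: {sum(fps) / len(fps):.1f} FPS")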
If anyone from Ubisoft wants to chat about building a benchmark platform that would not only help me but also every other member of the tech press build our benchmark testing platform to help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).
As with the other gaming tests, we run each resolution/setting combination for a minimum of 10 minutes and take the relevant frame data for averages and percentiles.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Gears Tactics
Remembering the original Gears of War brings back a number of memories – some good, and some involving online gameplay. The latest iteration of the franchise was launched as I was putting this benchmark suite together, and Gears Tactics is a high-fidelity turn-based strategy game with an extensive single player mode. As with a lot of turn-based games, there is ample opportunity to crank up the visual effects, and here the developers have put a lot of effort into creating effects, a number of which seem to be CPU limited.
Gears Tactics has an in-game benchmark, roughly 2.5 minutes of AI gameplay starting from the same position but using a random seed for actions. Much like the racing games, this usually leads to some variation in the run-to-run data, so for this benchmark we are taking the geometric mean of the results (see the sketch after the settings list below). One of the bigger draws of Gears Tactics is its resolution scaling, supporting up to 8K, and so we are testing the following settings:
- 720p Low, 4K Low, 8K Low, 1080p Ultra
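As promised above, the geometric mean itself is a one-liner; the per-run FPS values here are dummy data for illustration:

from math import prod

def geomean(values):
    # the geometric mean damps single-run outliers better than the arithmetic mean
    return prod(values) ** (1.0 / len(values))

runs = [68.2, 71.5, 66.9, 70.1]    # hypothetical per-run average FPS
print(f"Geometric mean: {geomean(runs):.1f} FPS")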
For results, the game showcases a mountain of data when the benchmark is finished, such as how much the benchmark was CPU limited and where, however none of that is ever exported into a file we can use. It’s just a screenshot which we have to read manually.
If anyone from the Gears Tactics team wants to chat about building a benchmark platform that would not only help me but also every other member of the tech press build our benchmark testing platform to help our readers decide what is the best hardware to use on your games, please reach out to [email protected]. Some of the suggestions I want to give you will take less than half a day and it’s easily free advertising to use the benchmark over the next couple of years (or more).
As with the other benchmarks, we do as many runs until 10 minutes per resolution/setting combination has passed. For this benchmark, we manually read each of the screenshots for each quality/setting/run combination. The benchmark does also give 95th percentiles and frame averages, so we can use both of these data points.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Grand Theft Auto V
The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA on hand to help optimize the title. At this point GTA V is super old, but still super useful as a benchmark – it is a complicated test with many features that modern titles today still struggle with. With rumors of a GTA 6 on the horizon, I hope Rockstar makes that benchmark as easy to use as this one is.
GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
We are using the following settings:
- 720p Low, 1440p Low, 4K Low, 1080p Max
The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data. The benchmark can also be called from the command line, making it very easy to use.
There is one funny caveat with GTA. If the CPU is too slow, or has too few cores, the benchmark loads, but it doesn’t have enough time to put items in the correct position. As a result, for example when running our single core Sandy Bridge system, the jet ends up stuck in the middle of an intersection, causing a traffic jam. Unfortunately this means the benchmark never ends, but it is still amusing.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Red Dead Redemption 2
It’s great to have another Rockstar benchmark in the mix, and the launch of Red Dead Redemption 2 (RDR2) on the PC gives us a chance to do that. Building on the success of the original RDR, the second incarnation came to Steam in December 2019 having been released on consoles first. The PC version takes the open-world cowboy genre into the start of the modern age, with a wide array of impressive graphics and features that are eerily close to reality.
For RDR2, Rockstar kept the same benchmark philosophy as with Grand Theft Auto V, with the benchmark consisting of several cut scenes with different weather and lighting effects, and a final scene focusing on an on-rails environment, only this time with a shop robbery leading to a shootout on horseback before riding over a bridge into the great unknown. Luckily most of the command line options from GTA V are present here, and the game also supports resolution scaling. We have the following tests:
- 384p Minimum, 1440p Minimum, 8K Minimum, 1080p Max
For that 8K setting, I originally thought I had the settings file at 4K and 1.0x scaling, but it was actually set at 2.0x, giving that 8K – 3840x2160 scaled 2.0x on each axis renders at 7680x4320. For the sake of it, I decided to keep the 8K settings.
For our results, we run through each resolution and setting configuration for a minimum of 10 minutes, before averaging and parsing the frame time data.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Gaming Tests: Strange Brigade
Strange Brigade is set in 1903 Egypt, and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative-centric with a wide variety of different levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.
The game supports both the DirectX 12 and Vulkan APIs, and houses its own built-in benchmark as an on-rails experience through the game. For quality, the game offers various options up for customization, including textures, anti-aliasing, reflections, and draw distance, and even allows users to enable or disable motion blur, ambient occlusion, and tessellation, among others. Because Strange Brigade supports both Vulkan and DX12, we test on both.
- 720p Low, 1440p Low, 4K Low, 1080p Ultra
The automation for Strange Brigade is one of the easiest in our suite – the settings and quality can be changed by pre-prepared .ini files, and the benchmark is called via the command line. The output includes all the frame time data.
[Graphs: Average FPS and 95th Percentile across the four resolution/quality combinations, for each of the two APIs.]
All of our benchmark results can also be found in our benchmark engine, Bench.
Conclusion: Bin to Win
Ever since we taught fancy rocks to think, despite the decades of research and billions of dollars that go into creating grand pyramid-scale structures at the nanoscale, it still is very much an imperfect process. The tiny nibs of shiny silicon that come out, even if they are made with the same design masks, will vary in peak performance, power, and potential.
There are levers and switches that both the designer and the manufacturer can use to adjust and move the variability of the processor quality to a more favorable outcome. Before these become products for end users, the processor maker has to decide where it draws the lines in production variability. Those lines have to take into account how many of a given processor will be produced, at what power, at what frequency, and at the end of it, cost and expected longevity.
Those lines in production variability give companies like Intel an opportunity to build a product stack that focuses on different markets. A forgiving processor bin, for example, might fall within a line that 98% of the silicon can hit. As a company gets more aggressive with its design and yield, we start looking at drawing lines where only 10000-in-a-million (1%) hit that target, or even fewer than that.
The current line of Comet Lake processors features two silicon designs. There’s a 10-core variant, which supplies all the retail parts offered at 6 cores (Core i5-K), 8 cores (Core i7), and 10 cores (Core i9). The other is a 6-core variant for everything else Core i3 and below, as well as some of the 6 core parts. Out of that 10-core variant, Intel sells 12-19 mainstream Core processors, 7 Xeon W-1200 processors, and an unknown number of embedded products.
Out of these, how many of the top Core i9-K parts does Intel expect to sell, and where does the line need to be drawn in the current design variability to meet that target with the expected wafer production? In other words, where is the line drawn for something like Intel’s consumer flagship processor?
At this point someone like Intel has two choices.
It could draw the line at this exact intersection, regardless of performance. The performance and power figures would fall where they are, which in turn would affect the marketing strategy. The issue here is that the marketing strategy, in and of itself, would directly affect how many of that product Intel tends to sell. Past sales performance is no guarantee of future success, and so this has to be managed.
The other choice is to draw the line at a more aggressive point, where Intel knows it won’t be able to meet demand, but it will be able to leverage the increased-performance processor in its marketing strategy and keep its premium product feeling premium. The problem here is if that line is drawn too aggressively – even if the launch day performance figures look good, interest in the product will likely diminish if people can’t get hold of it, even with the higher performance level. This is especially important if system integrators that build machines directly for end-users can’t offer the flagship processor in their best systems.
Ultimately, this is what I think happened to Intel with the Core i9-10900K. The silicon quality level required to manufacture the hardware was strict enough to provide a higher performance product, but too strict to be able to manufacture a sufficient quantity to meet demand, especially for system integrators that rely on a steady source of good performance products. For all the plaudits Intel has received for eking out the 14nm process, the line for the 10900K was drawn too far, and the company wasn’t able to meet its own goals.
Thus entered the Core i9-10850K. A slightly less aggressive product, offered at a cheaper price, and because of the less aggressive bin, available in sufficient quantities to keep system builders and end-users happy if the 10900K was not in stock locally. In order to keep up the volume of expected high-end Core i9 system sales, Intel had to re-bin to win.
Core i9-10850K Performance
Going through our benchmark tests, the performance differential between the Core i9-10850K and the Core i9-10900K is almost zero, so there’s nothing much that’s going to separate our conclusion of either chip.
In our CPU benchmark tests, the Core i9-10850K either matched the higher clocked part or was ever so slightly behind, often within error margins, but sometimes within 1-2%. In our CPU gaming tests, it was more of a mixed bag, with the 10900K taking the advantage in CPU-heavy tests, but the 10850K also getting a slight lead now and again.
The point at which the two processors mostly differ is on power and thermals. The Core i9-10850K is a less strict bin of the silicon, and this shows up clearly in a couple of metrics. Once you get over the fact that both processors are going north of 250 W at full load, our Core i9-10850K was drawing 15-20 W more peak power, which is 6-8% higher, despite being 100 MHz slower. This manifested even more so in the processor thermals, where we were easily going north of 100ºC on this newer processor.
It’s easy to get freaked out by a triple digit number, especially given that I was testing on an open test bed with a chunky copper cooler. At this stage it’s more about thermal gradients inside the processor and how easily the thermal energy can move – so while users will still need something sufficient to shift that extra thermal energy, it isn’t as bad as it sounds. We saw no obvious example of the 10850K hitting thermal throttling in our testing. It might mean that home users will want to do their annual PC dust removal and checkups a bit more often, though.
Which to buy is a tough question. In my mind, if your heart is set on one of these two processors, at an MSRP difference of $35, I’d get the 10900K, just for the slightly better performing silicon, even if the performance isn’t going to be that different. But the stock levels are so varied for the 10900K that the difference in price, depending on location, has been $200+, making it less than viable.
The other alternative is to look at AMD, assuming there are AMD Ryzen 5000 processors in stock as well. At ~$450, the direct competitor is the Ryzen 7 5800X, which has eight cores and a 4.7 GHz turbo. With the Ryzen 7 5800X, there’s no worrying about excessive power or thermals, which in and of itself is perhaps peace of mind.
On performance against AMD, the 5800X wins on single threaded loads and encoding by 15-20%, while the 10850K wins on multithreaded rendering workloads like Blender by up to 10%. For 1080p maximum gaming with our RTX 2080 Ti, the differences in the most modern titles are minor at best, even in the 5% lows. Certain titles will lean up to 5-8% in one direction (FF14 to AMD, F1 2019 to Intel, Civ6 to AMD). If I had to choose between the Intel and AMD, I’d have to recommend the AMD, but I'm the sort of person who watches temperatures when running a heavy workload.