Original Link: https://www.anandtech.com/show/14979/the-intel-core-i9-9900ks-review



Intel likes 5.0 GHz processors. The one area where it claims a clear advantage over AMD is in its ability to drive the frequency of its popular 14nm process. Earlier this week, we reviewed the Core i9-9990XE, a rare auction-only CPU with 14 cores at 5.0 GHz, built for the high-end desktop and high-frequency trading markets. Today we are looking at its smaller sibling, the Core i9-9900KS, built in numbers for the consumer market: eight cores at 5.0 GHz. But you’ll have to be quick, as Intel isn’t keeping this one around forever.

The Battle of the Bits

Every time a new processor comes to market, several questions get asked: how many cores, how fast, how much power? We’ve come through generations of promises of many GHz and many cores for little power, but right now we have an intense battle on our hands. The red team is taking advantage of a paradigm shift in computing with an advanced process node to offer many cores at a high power efficiency as well as at a good frequency. In the other corner is team blue, which has just equipped its arsenal by taking advantage of its most aggressive binning of 14nm yet, with the highest frequency processor for the consumer market, enabled across all eight cores and to hell with the power. Intel’s argument here is fairly simple:

Do you want good all-around, or do you want the one with the fastest raw speed?

The Intel Core i9-9900KS is born from this battle. In essence it looks like an overclocked Core i9-9900K, although by that logic everything is an overclocked version of something else. For Intel to give a piece of silicon off the manufacturing line the name Core i9-9900KS rather than Core i9-9900K requires additional binning and validation, to the extent that it has taken several months from announcement for Intel to be satisfied that it has enough chips to meet demand while also meeting its warranty standards.

At the time Intel launched its 9th Generation Core desktop processors, like the Core i9-9900K, I perhaps would not have expected them to launch something like the Core i9-9900KS. It’s a big step up in the binning, and I’d be surprised if Intel gets one chip per wafer that hits this designation. Intel announced the Core i9-9900KS after AMD had launched its Zen 2 Ryzen 3000 family, offering 12 cores with an all core turbo around 4.2 GHz and a +10% IPC advantage over Intel’s Skylake microarchitecture (and derivatives) for a lower price per core. In essence, Intel’s consumer flagship, the Core i9-9900K, had a rival that was pretty close to it in performance and offered several more cores.

Intel is pushing the Core i9-9900KS as the ultimate consumer processor. With eight cores all running at 5.0 GHz, it is promising fast response and clock rates without any slowdown. Intel has many marketing arguments as to why the KS is the best processor on the market, especially when it comes to gaming: having a 5.0 GHz frequency keeps it top of the pile for gaming where frequency matters (low resolution), and many games don’t scale beyond four cores, let alone eight, and so the extra cores on the competition don’t really help here. It will be interesting to see where the 9900KS comes out in standard workload tests however, where cores can matter.

Intel’s 9th Generation Core Processors

The Intel Core i9-9900KS now sits atop Intel’s consumer product portfolio. The processor uses the same 8-core die as the 9900K, unlocked and with UHD 630 integrated graphics, but all eight cores can turbo to 5.0 GHz. The length of the turbo will be motherboard dependent, however.

Intel 9th Gen Core 8-Core Desktop CPUs

| AnandTech | Cores | Base Freq | All-Core Turbo | Single Core Turbo Freq | IGP | DDR4 | TDP | Price (1ku) |
|---|---|---|---|---|---|---|---|---|
| i9-9900KS | 8 / 16 | 4.0 GHz | 5.0 GHz | 5.0 GHz | UHD 630 | 2666 | 127 W | $513 |
| i9-9900K | 8 / 16 | 3.6 GHz | 4.7 GHz | 5.0 GHz | UHD 630 | 2666 | 95 W | $488 |
| i9-9900KF | 8 / 16 | 3.6 GHz | 4.7 GHz | 5.0 GHz | - | 2666 | 95 W | $488 |
| i7-9700K | 8 / 8 | 3.6 GHz | 4.6 GHz | 4.9 GHz | UHD 630 | 2666 | 95 W | $374 |
| i7-9700KF | 8 / 8 | 3.6 GHz | 4.6 GHz | 4.9 GHz | - | 2666 | 95 W | $374 |

The Core i9-9900KS has a tray price of $513 (when purchased in 1000-unit bulk), which means we’re likely to see an on-shelf price of $529-$549, depending on whether it gets packaged in the dodecahedral box that our review sample came in.

Compared to the Core i9-9900K or Core i9-9900KF, the Core i9-9900KS extends its 5.0 GHz turbo all the way from two active cores to eight active cores. There is still no Turbo Boost Max 3.0 here, which means that all cores are guaranteed to hit this 5.0 GHz number. The TDP is 127 W, which is the maximum power consumption of the processor at its base frequency, 4.0 GHz. Above 4.0 GHz Intel does not state what sort of power to expect. We test this later in the review.

Competition

At present, Intel is competing on two fronts with the Core i9-9900KS. On the one side, it already has the Core i9-9900K, which, if a user gets a good enough sample, can be overclocked to emulate the 9900KS. Intel does not offer warranty on an overclocked CPU, so that is something to take into account. That said, the warranty on the Core i9-9900KS is only a limited 1 year warranty, rather than the standard 3 years it offers on the majority of its other parts, which perhaps indicates the lengths it went to for binning these processors.

From AMD, the current 12-core Ryzen 9 3900X that is already in the market has become a popular processor for users moving to 7nm and PCIe 4.0. It offers more PCIe lanes from the CPU to take advantage of PCIe storage and such, and there is a wealth of motherboards on the market that can take advantage of this processor. It also has an MSRP around the same price, at $499, although it is often sold for much more due to limited availability.

AMD also has the 16-core Ryzen 9 3950X coming around the corner, promising slightly more performance than the 3900X, and aside from the $749 MSRP, it’s going to be an unknown on availability until it gets released in November.

The Competition

| Intel i9-9900KS | Intel i9-9900K | AnandTech | AMD 2920X | AMD 3950X | AMD 3900X | AMD 3800X |
|---|---|---|---|---|---|---|
| 8 | 8 | Cores | 12 | 16 | 12 | 8 |
| 16 | 16 | Threads | 24 | 32 | 24 | 16 |
| 4.0 | 3.6 | Base | 3.5 | 3.5 | 3.8 | 3.9 |
| 8 x 5.0 | 2 x 5.0 | Turbo | 4.3 | 4.7 | 4.6 | 4.5 |
| 2 x 2666 | 2 x 2666 | DDR4 | 4 x 2933 | 2 x 3200 | 2 x 3200 | 2 x 3200 |
| 3.0 x16 | 3.0 x16 | PCIe | 3.0 x64 | 4.0 x24 | 4.0 x24 | 4.0 x24 |
| 127 W | 95 W | TDP | 180 W | 105 W | 105 W | 105 W |
| $513 | $486 | Price | $649 | $749 | $499 | $399 |

It’s worth noting here that while Intel has committed to delivering ‘10nm class’ processors on the desktop in the future, it currently has made zero mention of exactly when this is going to happen. Offering a limited edition all-core 5.0 GHz part like the Core i9-9900KS into the market is a brave thing indeed – it will have to provide something similar or better when it gets around to producing 10nm processors for this market. We saw this once before, when Intel launched Devil’s Canyon: super binned parts that ultimately ended up being faster than those that followed on an optimized process, because the binning aspect ended up being a large factor. Intel either has extreme confidence in its 10nm process for the desktop family, or doesn’t know what to expect.

This Review

In our review, we’re going to cover the usual benchmarking scenarios for a processor like this, as well as examine Intel’s relationship with turbo and how much a motherboard manufacturer can affect the performance.



Test Bed and Setup

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer. Where possible, we will extend our testing to include faster memory modules either at the same time as the review or at a later date.

Test Setup

| Component | Hardware |
|---|---|
| Intel 9th Gen | Intel Core i9-9900KS |
| Motherboard | MSI Z390 Gaming Edge AC (A.60 BIOS) |
| CPU Cooler | TRUE Copper |
| DRAM | Corsair Vengeance 2x8 GB DDR4-2666 |
| GPU | Sapphire RX 460 2GB (CPU Tests) |
| | MSI GTX 1080 Gaming 8G (Gaming Tests) |
| PSU | Corsair AX860i |
| SSD | Crucial MX200 1TB |

Many thanks to...

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers

  • Sapphire RX 460 Nitro
  • MSI GTX 1080 Gaming X OC
  • Crucial MX200 + MX500 SSDs
  • Corsair AX860i + AX1200i PSUs
  • G.Skill RipjawsV, SniperX, FlareX
  • Crucial Ballistix DDR4
  • Silverstone Coolers
  • Silverstone Fans


Going for Power

How to Manage 5.0 GHz Turbo

Intel lists the Core i9-9900KS processor as having a 127W TDP. As we’ve discussed at length [1,2] regarding what TDP means, as well as interviewing Intel Fellows about it, this means that the Core i9-9900KS is rated to require a cooling power of 127W when running at its base frequency, 4.0 GHz. Above this frequency, for example at its turbo frequency of 5.0 GHz, we are likely to see higher than 127W.

As I said at the start of this review, the length of time that the processor will spend at 5.0 GHz will be motherboard dependent. This is true: Intel does not strictly define how long turbo should be enabled on any processor. It allows the motherboard manufacturer to ‘over-engineer’ the motherboard in order to help push the power behind the turbo higher and enable turbo for longer. The specific values that matter here are called PL2 (Power Limit 2, or peak turbo power limit), and Tau (a time for turbo).

Each Intel processor has a ‘bucket’ of extra turbo energy. As the processor draws more power above its TDP (also called PL1), the bucket is drained to provide this energy. When the bucket is empty, the processor has to come back down to the PL1 power value, and eventually when the processor is less active below PL1, the bucket will refill. How big this bucket is depends on the value of PL1, PL2, and Tau. The bigger the bucket, the longer an Intel processor can hold its turbo frequency. Typically Tau isn’t so much a time for turbo as a scalar that determines how big that bucket should be.


An example graph showing the effect of implementing Turbo on power/frequency

Motherboard manufacturers can set PL1, PL2, and Tau as they wish – they have to engineer the motherboard in order to cope with high numbers, but it means that every motherboard can have different long turbo performance. Intel even suggests testing processors on high-end and low-end motherboards to see the difference. Users can also manually adjust PL1, PL2, and Tau, based on the cooling they are providing.

For the Core i9-9900KS, Intel has given the PL1 value on the box, of 127 W. PL2 it says should at least be 1.25x the value of PL1, which is 159 W. Tau should be at least 28 seconds. This means, with a given workload (typically 95% equivalent of a power virus), the CPU should turbo up to 159 W for 28 seconds before coming back down to 127 W. A very important thing to note is that if the CPU needs more than 159 W to hit the 5.0 GHz turbo frequency, it will reduce the frequency until it hits 159 W. This might mean 4.8 GHz, or lower.
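As a rough illustration of the bucket analogy with these guideline numbers, here is a minimal Python sketch. It assumes the bucket holds (PL2 - PL1) x Tau joules and drains by however much the chip draws above PL1. This is a simplification for intuition, not Intel's actual algorithm, and the function name simulate_turbo is made up for this example. The 172 W demand is the draw this review measures at a sustained 5.0 GHz.

```python
def simulate_turbo(demand_w, pl1_w, pl2_w, tau_s, duration_s, dt_s=1.0):
    """Return the package power (W) at each time step under a toy bucket model.

    Assumption (not Intel's real algorithm): the turbo bucket holds
    (PL2 - PL1) * Tau joules, starts full, drains when power exceeds PL1,
    and refills when power is below PL1.
    """
    capacity_j = (pl2_w - pl1_w) * tau_s
    bucket_j = capacity_j
    trace = []
    for _ in range(int(duration_s / dt_s)):
        # With energy left in the bucket the chip may draw up to PL2,
        # otherwise it is held to PL1.
        ceiling_w = pl2_w if bucket_j > 0 else pl1_w
        power_w = min(demand_w, ceiling_w)
        # Drawing above PL1 drains the bucket; drawing below PL1 refills it.
        bucket_j -= (power_w - pl1_w) * dt_s
        bucket_j = max(0.0, min(capacity_j, bucket_j))
        trace.append(power_w)
    return trace


# Intel's guideline: PL2 of at least 1.25x PL1, Tau of 28 seconds.
pl2 = round(1.25 * 127)
guideline = simulate_turbo(172, 127, pl2, 28, 600)
turbo_seconds = sum(1 for p in guideline if p > 127)
print(pl2, turbo_seconds)

# A board that sets PL1 = PL2 = 255 W never needs the bucket at all.
unlimited = simulate_turbo(172, 255, 255, 28, 600)
print(min(unlimited), max(unlimited))
```

Under these assumptions the guideline run holds 159 W for 28 seconds and then settles at 127 W for the rest of the 10 minutes, while the PL1 = PL2 = 255 W case simply draws its full 172 W throughout. Real silicon also drops frequency when 159 W is not enough for 5.0 GHz, which this sketch does not model.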

Despite giving us these numbers for PL1, PL2, and Tau, Intel also stated to us that they recommend that motherboard manufacturers determine the best values based on the hardware capabilities. The values of 127 W, 159 W, and 28 seconds are merely guidelines – most motherboards should be able to go beyond this, and Intel encourages its partners to adjust these values by default as required.

We tested Intel’s guidelines with a 10 minute run of Cinebench R20.

Here we can see that at idle, the CPU sits at 5.0 GHz. But immediately when the workload comes on, it has to reduce the average CPU frequency because it goes straight up to the 159W limit. Simply put, 159W isn’t enough to hit 5.0 GHz. We see the temperature slowly rise to 92C, but because the power isn’t enough the frequency keeps fluctuating.

By the end of the first Cinebench R20 section, it seems that the majority of it occurred during the turbo period. This means that this run scored almost the same as a pure 5.0 GHz run. However the subsequent runs were not as performant.

Because the turbo budget had been used up, the processor had to sit at 127 W, its PL1 value. At this power, the processor kept bouncing between 4.6 GHz and 4.7 GHz to find the balance. The temperatures in this mode kept stable, nearer 80C, but the performance of Cinebench R20 dropped around 8-10% because the CPU was now limited by its PL1/TDP value, as per Intel’s base configuration recommendation.

Going Beyond

Because motherboard manufacturers can do what they want with these values, we looked at what the motherboard we tested, the MSI Z390 Gaming Edge AC, does by default. MSI has set the BIOS for the Core i9-9900KS with a simple equation: PL1 = PL2 = 255 W. When PL1 and PL2 are equal to each other, Tau doesn’t matter. What this setting does is state that MSI will allow the processor to consume as much power as it needs, up to 255 W. If it can hit 5.0 GHz before this value (hint, it does), then the user can turbo at 5.0 GHz forever. The only way that this processor will reduce in frequency is either at idle or due to thermal issues.

Here’s the same run but done with MSI’s own settings:

The processor stayed at a constant 5.0 GHz through the whole run. The CPU pulled around 172W on average during the test, fluctuating a little bit based on exactly which 1s and 0s were going through. The CPU temperature is obviously higher, as we used the same cooling setup as before, and peaked at 92C, but the system was fully stable the entire time.

Here was our system setup – a 2kg TRUE Copper air cooler powered by an average fan running at full speed in an open test bed.

But what this means is that users are going to have to be wary of exactly what settings the motherboard manufacturers are using. For those of you reading this review on the day it goes live, you’ll likely see more than a dozen other reviews testing this chip – each one is likely using a different motherboard, and each one might be using different PL2 and Tau values. What you’ve got here are the two extremes: Intel’s recommendation and MSI’s ‘going to the max’. Be prepared for a range of results. Where time has permitted, we’ve tested both extremes.



CPU Performance: System Tests

Our System Test section focuses significantly on real-world testing, user experience, with a slight nod to throughput. In this section we cover application loading time, image processing, simple scientific physics, emulation, neural simulation, optimized compute, and 3D model development, with a combination of readily available and custom software. For some of these tests, the bigger suites such as PCMark do cover them (we publish those values in our office section), although multiple perspectives are always beneficial. In all our tests we will explain in-depth what is being tested, and how we are testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Application Load: GIMP 2.10.4

One of the most important aspects of user experience and workflow is how fast a system responds. A good test of this is to see how long it takes for an application to load. Most applications these days, when on an SSD, load fairly instantly, however some office tools require asset pre-loading before being available. Most operating systems employ caching as well, so when certain software is loaded repeatedly (web browser, office tools), it can be initialized much quicker.

In our last suite, we tested how long it took to load a large PDF in Adobe Acrobat. Unfortunately this test was a nightmare to program for, and didn’t transfer over to Win10 RS3 easily. In the meantime we discovered an application that can automate this test, and we put it up against GIMP, a popular free open-source photo editing tool, and the major alternative to Adobe Photoshop. We set it to load a large 50MB design template, and perform the load 10 times with 10 seconds in-between each. Because caching often makes the first 3-5 results slower than the rest, and the time to cache can be inconsistent, we take the average of the last five results to show CPU processing on cached loading.
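The averaging step is simple enough to sketch in Python. The function name and the ten timings below are invented for illustration and are not measured results.

```python
from statistics import mean


def cached_load_time(times_s, keep_last=5):
    """Average the final keep_last timings, discarding the cache-warming runs."""
    if len(times_s) < keep_last:
        raise ValueError("need at least keep_last timings")
    return mean(times_s[-keep_last:])


# Hypothetical load times in seconds for ten consecutive launches.
runs = [9.0, 7.5, 6.0, 5.2, 5.1, 5.0, 5.0, 5.0, 5.0, 5.0]
print(cached_load_time(runs))
```

Dropping the early runs means the reported number reflects CPU work on a warm cache rather than storage or first-launch effects.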

AppTimer: GIMP 2.10.4

The 9900KS hits the top of all the consumer processors in our app loading test.

FCAT: Image Processing

The FCAT software was developed to help detect microstuttering, dropped frames, and runt frames in graphics benchmarks when two accelerators were paired together to render a scene. Due to game engines and graphics drivers, not all GPU combinations performed ideally, which led to this software fixing colors to each rendered frame and dynamic raw recording of the data using a video capture device.

The FCAT software takes that recorded video, which in our case is 90 seconds of a 1440p run of Rise of the Tomb Raider, and processes that color data into frame time data so the system can plot an ‘observed’ frame rate, and correlate that to the power consumption of the accelerators. This test, by virtue of how quickly it was put together, is single threaded. We run the process and report the time to completion.
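To make the last step concrete, here is a small Python sketch of turning frame arrival times into frame times and an observed frame rate. It starts from a hypothetical list of timestamps; the real FCAT tooling derives that information from the colored overlay in the captured video, which is not reproduced here.

```python
def frame_times_ms(arrival_times_s):
    """Time between consecutive frames, in milliseconds."""
    return [(later - earlier) * 1000.0
            for earlier, later in zip(arrival_times_s, arrival_times_s[1:])]


def observed_fps(arrival_times_s):
    """Average frame rate over the whole capture."""
    span_s = arrival_times_s[-1] - arrival_times_s[0]
    return (len(arrival_times_s) - 1) / span_s


# Hypothetical capture: two quick frames followed by one slow frame.
arrivals = [0.0, 0.25, 0.5, 1.0]
print(frame_times_ms(arrivals))
print(observed_fps(arrivals))
```

Because each frame depends on the one before it, a straightforward implementation of this kind of pass runs on a single thread, which is why the test rewards single-core frequency.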

FCAT Processing ROTR 1440p GTX980Ti Data

For some reason our default 9900KS run didn't seem to perform properly, but the 9900KS at Intel guidelines did, within the margin of error of the 9900K which also does turbo at 5.0 GHz.

3D Particle Movement v2.1: Brownian Motion

Our 3DPM test is a custom built benchmark designed to simulate six different particle movement algorithms of points in a 3D space. The algorithms were developed as part of my PhD, and while they ultimately perform best on a GPU, they provide a good idea of how instruction streams are interpreted by different microarchitectures.

A key part of the algorithms is the random number generation – we use relatively fast generation which ends up implementing dependency chains in the code. The upgrade over the naïve first version of this code solved for false sharing in the caches, a major bottleneck. We are also looking at AVX2 and AVX512 versions of this benchmark for future reviews.

For this test, we run a stock particle set over the six algorithms for 20 seconds apiece, with 10 second pauses, and report the total rate of particle movement, in millions of operations (movements) per second. We have a non-AVX version and an AVX version, with the latter implementing AVX512 and AVX2 where possible.
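For readers who want a feel for what a 'movement' is, the following Python sketch moves particles by unit-length steps in random 3D directions and reports a rate in millions of movements per second. It is an illustrative stand-in, not one of the six 3DPM algorithms, and it runs a fixed number of steps rather than a fixed 20 seconds. All names are invented for the example.

```python
import math
import random
import time


def random_unit_step(rng):
    """Pick a direction uniformly on the unit sphere and return (dx, dy, dz)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def run_particles(n_particles, n_steps, seed=0):
    """Move every particle n_steps times; return positions and the rate in Mmoves/s."""
    rng = random.Random(seed)
    positions = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    start = time.perf_counter()
    for _ in range(n_steps):
        for p in positions:
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    elapsed_s = time.perf_counter() - start
    moves = n_particles * n_steps
    return positions, moves / elapsed_s / 1e6


positions, rate = run_particles(n_particles=1000, n_steps=100)
print(f"{rate:.2f} million movements per second")
```

Note how each step depends on the random number generator's previous state, which is the kind of dependency chain the description above refers to.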

3DPM v2.1 can be downloaded from our server: 3DPMv2.1.rar (13.0 MB)

3D Particle Movement v2.1

Without AVX acceleration, the Core i9-9900KS hardware manages to push ahead of the 9900K due to the extra frequency, and even above the 10-core 7900X. Because these are non-AVX instructions, they aren't pushing the CPU as hard as it can be, so we're not really draining the turbo bucket in our 159W PL2 test.

3D Particle Movement v2.1 (with AVX)

On the other hand, our AVX2 accelerated test is also showing both PL2 settings performing about equal. This test does involve a 10-second delay between each of its six subtests, which allows some turbo budget to be regained. Couple that with the 30 second delay between individual runs, it would appear that there's enough turbo budget for the whole run.

Dolphin 5.0: Console Emulation

One of the most requested tests in our suite is console emulation. Being able to pick up a game from an older system and run it as expected depends on the overhead of the emulator: it takes a significantly more powerful x86 system to be able to accurately emulate an older non-x86 console, especially if code for that console was made to abuse certain physical bugs in the hardware.

For our test, we use the popular Dolphin emulation software, and run a compute project through it to determine how close to a standard console system our processors can emulate. In this test, a Nintendo Wii would take around 1050 seconds.

The latest version of Dolphin can be downloaded from https://dolphin-emu.org/

Dolphin 5.0 Render Test

Dolphin loves single threaded performance, so we see the 9900 series at the top here.

DigiCortex 1.20: Sea Slug Brain Simulation

This benchmark was originally designed for simulation and visualization of neuron and synapse activity, as is commonly found in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron / 1.8B synapse simulation, equivalent to a Sea Slug.

Example of a 2.1B neuron simulation

We report the results as the ability to simulate the data as a fraction of real-time, so anything above a ‘one’ is suitable for real-time work. Out of the two modes, a ‘non-firing’ mode which is DRAM heavy and a ‘firing’ mode which has CPU work, we choose the latter. Despite this, the benchmark is still affected by DRAM speed a fair amount.

DigiCortex can be downloaded from http://www.digicortex.net/

DigiCortex 1.20 (32k Neuron, 1.8B Synapse)

Interestingly enough, the big outlier in this benchmark series is here with DigiCortex. I'm not sure what's going on here: not only is the result low (due to DDR4-2666 compared to AMD's higher supported speeds), it is also lower than the 9900K.

y-Cruncher v0.7.6: Microarchitecture Optimized Compute

I’ve known about y-Cruncher for a while, as a tool to help compute various mathematical constants, but it wasn’t until I began talking with its developer, Alex Yee, a researcher from NWU and now software optimization developer, that I realized that he has optimized the software like crazy to get the best performance. Naturally, any simulation that can take 20+ days can benefit from a 1% performance increase! Alex started y-cruncher as a high-school project, but it is now at a state where Alex is keeping it up to date to take advantage of the latest instruction sets before they are even made available in hardware.

For our test we run y-cruncher v0.7.6 through all the different optimized variants of the binary, single threaded and multi-threaded, including the AVX-512 optimized binaries. The test is to calculate 250m digits of Pi, and we use the single threaded and multi-threaded versions of this test.
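As a toy illustration of what 'calculating digits of Pi' involves, here is a short Python sketch using Machin's formula with plain integer arithmetic. This is a swapped-in textbook method chosen for brevity; y-cruncher's own algorithms and optimizations are far more sophisticated, and this sketch is only practical for a few thousand digits, nowhere near 250 million.

```python
def arctan_inverse(x, one):
    """Return arctan(1/x) scaled by `one`, using the Taylor series with integers."""
    power = one // x          # (1/x)^1, scaled
    total = power
    x_squared = x * x
    divisor = 3
    sign = -1
    while power:
        power //= x_squared   # next odd power of 1/x
        total += sign * (power // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(n_digits):
    """Return Pi as a digit string: '3' followed by n_digits decimal places."""
    guard = 10                # extra digits to absorb rounding in the series
    one = 10 ** (n_digits + guard)
    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inverse(5, one) - arctan_inverse(239, one))
    return str(pi_scaled // 10 ** guard)


print(pi_digits(50))
```

Even in this simple form the work is dominated by big-number multiplication and division, which is where wide vector instructions and careful tuning pay off in a production tool.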

Users can download y-cruncher from Alex’s website: http://www.numberworld.org/y-cruncher/

y-Cruncher 0.7.6 Single Thread, 250m Digits

y-Cruncher can use AVX512 on the HEDT chips, which is why they are faster than the 9900KS, but all the 9900 series are performing similarly at 5.0 GHz single threaded here.

Agisoft Photoscan 1.3.3: 2D Image to 3D Model Conversion

One of the ISVs that we have worked with for a number of years is Agisoft, who develop software called PhotoScan that transforms a number of 2D images into a 3D model. This is an important tool in model development and archiving, and relies on a number of single threaded and multi-threaded algorithms to go from one side of the computation to the other.

In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, but one that is still more stringent than our 2017 test. We report the total time to complete the process.

Agisoft’s Photoscan website can be found here: http://www.agisoft.com/

Agisoft Photoscan 1.3.3, Complex Test

Agisoft is a more variable workload, so there will be bits here and there where both 9900KS settings can fully go to 5.0 GHz turbo and recover budget. The 12-core AMD chip is ahead, and both 9900KS settings are almost equal. They are both ahead of the normal 9900K by just over 10%.



CPU Performance: Rendering Tests

Rendering is often a key target for processor workloads, lending itself to a professional environment. It comes in different formats as well, from 3D rendering through rasterization, such as games, or by ray tracing, and invokes the ability of the software to manage meshes, textures, collisions, aliasing, physics (in animations), and discarding unnecessary work. Most renderers offer CPU code paths, while a few use GPUs and select environments use FPGAs or dedicated ASICs. For big studios however, CPUs are still the hardware of choice.

All of our benchmark results can also be found in our benchmark engine, Bench.

Corona 1.3: Performance Render

An advanced performance based renderer for software such as 3ds Max and Cinema 4D, the Corona benchmark renders a generated scene as a standard under its 1.3 software version. Normally the GUI implementation of the benchmark shows the scene being built, and allows the user to upload the result as a ‘time to complete’.

We got in contact with the developer who gave us a command line version of the benchmark that does a direct output of results. Rather than reporting time, we report the average number of rays per second across six runs, as the performance scaling of a result per unit time is typically visually easier to understand.

The Corona benchmark website can be found at https://corona-renderer.com/benchmark

Corona 1.3 Benchmark

Interestingly both 9900KS settings performed slightly worse than the 9900K here, which you wouldn't expect given the all-core turbo being higher. It would appear that something else is the bottleneck in this test.

Blender 2.79b: 3D Creation Suite

A high profile rendering tool, Blender is open-source allowing for massive amounts of configurability, and is used by a number of high-profile animation studios worldwide. The organization recently released a Blender benchmark package, a couple of weeks after we had narrowed our Blender test for our new suite, however their test can take over an hour. For our results, we run one of the sub-tests in that suite through the command line - a standard ‘bmw27’ scene in CPU only mode, and measure the time to complete the render.

Blender can be downloaded at https://www.blender.org/download/

Blender 2.79b bmw27_cpu Benchmark

All the 9900 parts and settings perform roughly the same with one another, however the PL2 255W setting on the 9900KS does allow it to get a small ~5% advantage over the standard 9900K.

LuxMark v3.1: LuxRender via Different Code Paths

As stated at the top, there are many different ways to process rendering data: CPU, GPU, Accelerator, and others. On top of that, there are many frameworks and APIs in which to program, depending on how the software will be used. LuxMark, a benchmark developed using the LuxRender engine, offers several different scenes and APIs.

In our test, we run the simple ‘Ball’ scene on both the C++ and OpenCL code paths, but in CPU mode. This scene starts with a rough render and slowly improves the quality over two minutes, giving a final result in what is essentially an average ‘kilorays per second’.

LuxMark v3.1 C++

Both 9900KS settings perform equally well here, and show a sizeable jump over the standard 9900K.

POV-Ray 3.7.1: Ray Tracing

The Persistence of Vision ray tracing engine is another well-known benchmarking tool, which was in a state of relative hibernation until AMD released its Zen processors, at which point both Intel and AMD were suddenly submitting code to the main branch of the open source project. For our test, we use the built-in benchmark for all-cores, called from the command line.

POV-Ray can be downloaded from http://www.povray.org/

POV-Ray 3.7.1 Benchmark

One of the biggest differences between the two power settings is in POV-Ray, with a marked frequency difference. In fact, the 159W setting on the 9900KS puts it below our standard settings for the 9900K, which likely had a big default turbo budget on the board it was on at the time.



CPU Performance: Encoding Tests

With the rise of streaming, vlogs, and video content as a whole, encoding and transcoding tests are becoming ever more important. Not only are more home users and gamers needing to convert video files into something more manageable, for streaming or archival purposes, but the servers that manage the output also handle data and log files with compression and decompression. Our encoding tasks are focused around these important scenarios, with input from the community for the best implementation of real-world testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Handbrake 1.1.0: Streaming and Archival Video Transcoding

A popular open source tool, Handbrake is the anything-to-anything video conversion software that a number of people use as a reference point. The danger is always on version numbers and optimization, for example the latest versions of the software can take advantage of AVX-512 and OpenCL to accelerate certain types of transcoding and algorithms. The version we use here is a pure CPU play, with common transcoding variations.

We have split Handbrake up into several tests, using a Logitech C920 1080p60 native webcam recording (essentially a streamer recording), and convert them into two types of streaming formats and one for archival. The output settings used are:

  • 720p60 at 6000 kbps constant bit rate, fast setting, high profile
  • 1080p60 at 3500 kbps constant bit rate, faster setting, main profile
  • 1080p60 HEVC at 3500 kbps variable bit rate, fast setting, main profile
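For anyone wanting to approximate these three outputs at home, the Python sketch below builds HandBrakeCLI argument lists for them. The mapping from the settings above to command-line flags is my assumption rather than the exact configuration used in this review, the input filename is a placeholder, and the -b flag sets a target average bitrate, so it does not by itself distinguish the constant and variable bit rate modes listed above. The commands are only printed, not executed.

```python
# (label, encoder, output height, bitrate in kbps, preset, profile)
SETTINGS = [
    ("720p60 x264 6000 kbps Fast", "x264", 720, 6000, "fast", "high"),
    ("1080p60 x264 3500 kbps Faster", "x264", 1080, 3500, "faster", "main"),
    ("1080p60 HEVC 3500 kbps Fast", "x265", 1080, 3500, "fast", "main"),
]


def handbrake_args(source, output, encoder, height, bitrate_kbps, preset, profile):
    """Build a HandBrakeCLI argument list for one transcode (assumed flag mapping)."""
    return [
        "HandBrakeCLI",
        "-i", source,
        "-o", output,
        "-e", encoder,
        "-b", str(bitrate_kbps),
        "--encoder-preset", preset,
        "--encoder-profile", profile,
        "-l", str(height),
        "-r", "60", "--cfr",   # hold the output at a constant 60 fps
    ]


# "webcam_1080p60.mp4" is a placeholder name for the source recording.
commands = [
    handbrake_args("webcam_1080p60.mp4", f"out_{index}.mp4", *setting[1:])
    for index, setting in enumerate(SETTINGS)
]
for command in commands:
    print(" ".join(command))
```

Timing each command from start to finish, for example with time.perf_counter() around a subprocess.run call, would give a comparable 'time to transcode' style measurement.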

Handbrake 1.1.0 - 720p60 x264 6000 kbps Fast
Handbrake 1.1.0 - 1080p60 x264 3500 kbps Faster
Handbrake 1.1.0 - 1080p60 HEVC 3500 kbps Fast

The 9900KS performed worse than our 9900K in our Handbrake tests, and we're not entirely sure why. It might be related to the regression we saw with DigiCortex.

7-zip v1805: Popular Open-Source Encoding Engine

Out of our compression/decompression tool tests, 7-zip is the most requested and comes with a built-in benchmark. For our test suite, we’ve pulled the latest version of the software and we run the benchmark from the command line, reporting the compression, decompression, and a combined score.

It is noted in this benchmark that the latest multi-die processors have very bi-modal performance between compression and decompression, performing well in one and badly in the other. There are also discussions around how the Windows Scheduler is implementing every thread. As we get more results, it will be interesting to see how this plays out.

Please note, if you plan to share out the Compression graph, please include the Decompression one. Otherwise you’re only presenting half a picture.

7-Zip 1805 Compression
7-Zip 1805 Decompression
7-Zip 1805 Combined

Both the 9900KS settings perform identically here, however the Compression test shows a performance regression compared to the standard 9900K. It does make me wonder if there are additional differences between the two chips (such as an internal clock).

WinRAR 5.60b3: Archiving Tool

My compression tool of choice is often WinRAR, having been one of the first tools a number of my generation used over two decades ago. The interface has not changed much, although the integration with Windows right click commands is always a plus. It has no in-built test, so we run a compression over a set directory containing over thirty 60-second video files and 2000 small web-based files at a normal compression rate.

WinRAR is variable threaded but also susceptible to caching, so in our test we run it 10 times and take the average of the last five, leaving the test purely for raw CPU compute performance.

WinRAR 5.60b3

AES Encryption: File Security

A number of platforms, particularly mobile devices, are now offering encryption by default with file systems in order to protect the contents. Windows based devices have these options as well, often applied by BitLocker or third-party software. In our AES encryption test, we used the discontinued TrueCrypt for its built-in benchmark, which tests several encryption algorithms directly in memory.

The data we take for this test is the combined AES encrypt/decrypt performance, measured in gigabytes per second. The software does use AES instructions on processors that offer hardware acceleration, but it does not use AVX-512.

AES Encoding



CPU Performance: Web and Legacy Tests

While more the focus of low-end and small form factor systems, web-based benchmarks are notoriously difficult to standardize. Modern web browsers are frequently updated, with no recourse to disable those updates, and as such there is difficulty in keeping a common platform. The fast paced nature of browser development means that version numbers (and performance) can change from week to week. Despite this, web tests are often a good measure of user experience: much of today's office work revolves around web applications, particularly email and office apps, but also interfaces and development environments. Our web tests include some of the industry standard tests, as well as a few popular but older tests.

We have also included our legacy benchmarks in this section, representing a stack of older code for popular benchmarks.

All of our benchmark results can also be found in our benchmark engine, Bench.

WebXPRT 3: Modern Real-World Web Tasks, including AI

The company behind the XPRT test suites, Principled Technologies, has recently released the latest web-test, and rather than attach a year to the name have just called it ‘3’. This latest test (as we started the suite) has built upon and developed the ethos of previous tests: user interaction, office compute, graph generation, list sorting, HTML5, image manipulation, and even goes as far as some AI testing.

For our benchmark, we run the standard test which goes through the benchmark list seven times and provides a final result. We run this standard test four times, and take an average.

Users can access the WebXPRT test at http://principledtechnologies.com/benchmarkxprt/webxprt/

WebXPRT 3 (2018)

WebXPRT 2015: HTML5 and Javascript Web UX Testing

The older version of WebXPRT is the 2015 edition, which focuses on a slightly different set of web technologies and frameworks that are in use today. This is still a relevant test, especially for users interacting with not-the-latest web applications in the market, of which there are a lot. Web framework development is often very quick but with high turnover: frameworks are quickly developed, built upon, and used, and then developers move on to the next one. Adjusting an application to a new framework is a difficult, arduous task, especially with rapid development cycles. This leaves a lot of applications as ‘fixed-in-time’, and relevant to user experience for many years.

Similar to WebXPRT 3, the main benchmark is a sectional run repeated seven times, with a final score. We repeat the whole thing four times, and average those final scores.

WebXPRT15

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is an accrued test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics. We report this final score.

Speedometer 2

Google Octane 2.0: Core Web Compute

A popular web test for several years, but now no longer being updated, is Octane, developed by Google. Version 2.0 of the test performs the best part of two-dozen compute related tasks, such as regular expressions, cryptography, ray tracing, emulation, and Navier-Stokes physics calculations.

The test gives each sub-test a score and produces a geometric mean of the set as a final result. We run the full benchmark four times, and average the final results.
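As a rough illustration of that scoring scheme, the sketch below takes the geometric mean of a set of sub-test scores and then averages the final result over several runs. The function names and every number here are hypothetical, chosen only to show the arithmetic; this is not Octane's own code.

```python
from statistics import geometric_mean, mean

def final_score(subtest_scores):
    """Geometric mean of the sub-test scores for one benchmark run."""
    return geometric_mean(subtest_scores)

def reported_score(runs):
    """Arithmetic mean of the per-run final scores."""
    return mean([final_score(scores) for scores in runs])

# Hypothetical sub-test scores for four runs (illustrative values only).
example_runs = [
    [32000, 41000, 28000, 60000],
    [31500, 40800, 28400, 61000],
    [32200, 41100, 27900, 59500],
    [31900, 40900, 28100, 60200],
]
print(round(reported_score(example_runs)))
```

A geometric mean keeps one unusually high sub-test from dominating the result, since each sub-test contributes by ratio rather than by absolute size.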

Google Octane 2.0

Mozilla Kraken 1.1: Core Web Compute

Even older than Octane is Kraken, this time developed by Mozilla. This is an older test that does similar computational mechanics, such as audio processing or image filtering. Kraken seems to produce a highly variable result depending on the browser version, as it is a test that is keenly optimized for.

The main benchmark runs through each of the sub-tests ten times and produces an average time to completion for each loop, given in milliseconds. We run the full benchmark four times and take an average of the time taken.

Mozilla Kraken 1.1

3DPM v1: Naïve Code Variant of 3DPM v2.1

The first legacy test in the suite is the first version of our 3DPM benchmark. This is the ultimate naïve version of the code, as if it was written by a scientist with no knowledge of how computer hardware, compilers, or optimization works (which in fact, it was at the start). This represents a large body of scientific simulation out in the wild, where getting the answer is more important than it being fast (getting a result in 4 days is acceptable if it’s correct, rather than sending someone away for a year to learn to code and getting the result in 5 minutes).

In this version, the only real optimization was in the compiler flags (-O2, -fp:fast), compiling it in release mode, and enabling OpenMP in the main compute loops. The loops were not configured for function size, and one of the key slowdowns is false sharing in the cache. It also has long dependency chains based on the random number generation, which leads to relatively poor performance on specific compute microarchitectures.

3DPM v1 can be downloaded with our 3DPM v2 code here: 3DPMv2.1.rar (13.0 MB)

3DPM v1 Single Threaded

3DPM v1 Multi-Threaded

x264 HD 3.0: Older Transcode Test

This transcoding test is super old, and was used by Anand back in the day of Pentium 4 and Athlon II processors. Here a standardized 720p video is transcoded with a two-pass conversion, with the benchmark showing the frames-per-second of each pass. This benchmark is single-threaded, and between some micro-architectures we seem to actually hit an instructions-per-clock wall.

x264 HD 3.0 Pass 1

x264 HD 3.0 Pass 2



Gaming: World of Tanks enCore

Albeit different to most of the other commonly played MMO or massively multiplayer online games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming who are based in Belarus, with the game’s soundtrack being primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points including a free-to-play element as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank based MMO is that it achieved eSports status when it debuted at the World Cyber Games back in 2012.

World of Tanks enCore is a demo application for a new and unreleased graphics engine penned by the Wargaming development team. Over time the new core engine will be implemented into the full game, upgrading the game’s visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers up insight into the impending game engine changes, but also allows users to check system performance to see if the new engine runs optimally on their system.


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: Final Fantasy XV

Upon arriving on PC in 2018, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console, the fruit of Square Enix's successful partnership with NVIDIA, with hardly any hint of the troubles during Final Fantasy XV's original production and development.

In preparation for the launch, Square Enix opted to release a standalone benchmark that they have since updated. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues.

Square Enix has patched the benchmark with custom graphics settings and bugfixes to be much more accurate in profiling in-game performance and graphical options. For our testing, we run the standard benchmark with a FRAPS overlay, taking a six-minute recording of the test.


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: Shadow of War

Next up is Middle-earth: Shadow of War, the sequel to Shadow of Mordor. Developed by Monolith, whose last hit was arguably F.E.A.R., Shadow of Mordor returned the studio to the spotlight with an innovative NPC rival generation and interaction system called the Nemesis System, along with a storyline based on J.R.R. Tolkien's legendarium, all running on a highly modified version of the engine that originally powered F.E.A.R. in 2005.

Using the new LithTech Firebird engine, Shadow of War improves on the detail and complexity, and with free add-on high-resolution texture packs, offers itself as a good example of getting the most graphics out of an engine that may not be bleeding edge.


[Results charts: Average FPS at the IGP, Low, Medium, and High settings]



Gaming: Ashes Classic (DX12)

Seen as the holy child of DirectX12, Ashes of the Singularity (AoTS, or just Ashes) has been the first title to actively explore as many of the DirectX12 features as it possibly can. Stardock, the developer behind the Nitrous engine which powers the game, has ensured that the real-time strategy title takes advantage of multiple cores and multiple graphics cards, in as many configurations as possible.

As a real-time strategy title, Ashes is all about responsiveness during both wide open shots but also concentrated battles. With DirectX12 at the helm, the ability to implement more draw calls per second allows the engine to work with substantial unit depth and effects that other RTS titles had to rely on combined draw calls to achieve, making some combined unit structures ultimately very rigid.

Stardock clearly understands the importance of an in-game benchmark, ensuring that such a tool was available and capable from day one. With all the additional DX12 features in use, being able to characterize how they affected the title was important for the developer. The in-game benchmark performs a four-minute fixed-seed battle environment with a variety of shots, and outputs a vast amount of data to analyze.

For our benchmark, we run Ashes Classic: an older version of the game before the Escalation update. The reason for this is that this is easier to automate, without a splash screen, but still has a strong visual fidelity to test.

Ashes has dropdown options for MSAA, Light Quality, Object Quality, Shading Samples, Shadow Quality, Textures, and separate options for the terrain. There are several presets, from Very Low to Extreme: we run our benchmarks at the above settings, and take the frame-time output for our average and percentile numbers.
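For readers who want to see how a frame-time log turns into the two numbers we chart, here is a minimal sketch. It assumes the log is a plain list of per-frame times in milliseconds and uses a nearest-rank percentile; the function name `frame_stats`, the percentile method, and the sample data are all assumptions for illustration, not a description of our exact tooling.

```python
def frame_stats(frame_times_ms, percentile=95):
    """Return (average FPS, percentile FPS) from per-frame times in ms.

    Average FPS is frames rendered divided by total time. The percentile
    figure converts the frame time at the given nearest-rank percentile
    into FPS, so it describes the slow end of the run.
    """
    if not frame_times_ms:
        raise ValueError("no frame times supplied")
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds

    ordered = sorted(frame_times_ms)
    # Nearest-rank percentile using integer ceiling division (1-based rank).
    rank = (percentile * len(ordered) + 99) // 100
    percentile_fps = 1000.0 / ordered[rank - 1]
    return average_fps, percentile_fps

# Hypothetical log: 90 frames at 10 ms and 10 slower frames at 20 ms.
sample = [10.0] * 90 + [20.0] * 10
avg, p95 = frame_stats(sample)
print(f"average: {avg:.1f} FPS, 95th percentile: {p95:.1f} FPS")
```

Reporting both figures matters because a run can hold a high average while the slowest frames still produce visible stutter.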


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: Strange Brigade (DX12, Vulkan)

Strange Brigade is set in Egypt in 1903 and follows a story very similar to that of the Mummy film franchise. This third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative-centric, with a wide variety of levels and many puzzles that need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarity and brutality.

The game supports both the DirectX 12 and Vulkan APIs and houses its own built-in benchmark which offers various options up for customization including textures, anti-aliasing, reflections, draw distance and even allows users to enable or disable motion blur, ambient occlusion and tessellation among others. AMD has boasted previously that Strange Brigade is part of its Vulkan API implementation offering scalability for AMD multi-graphics card configurations.


[Results charts, two sets: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: Grand Theft Auto V

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark. The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data.

There are no presets for the graphics options on GTA, allowing the user to adjust options such as population density and distance scaling on sliders, but others such as texture/shadow/shader/water quality from Low to Very High. Other options include MSAA, soft shadows, post effects, shadow resolution and extended draw distance options. There is a handy option at the top which shows how much video memory the options are expected to consume, with obvious repercussions if a user requests more video memory than is present on the card (although there’s no obvious indication if you have a low end GPU with lots of GPU memory, like an R7 240 4GB).


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: Far Cry 5

The latest title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration.

Far Cry 5 does support Vega-centric features with Rapid Packed Math and Shader Intrinsics. Far Cry 5 also supports HDR (HDR10, scRGB, and FreeSync 2). We use the in-game benchmark for our data, and report the average/minimum frame rates.


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Gaming: F1 2018

Aside from keeping up-to-date on the Formula One world, F1 2017 added HDR support, which F1 2018 has maintained; otherwise, we should see any newer versions of Codemasters' EGO engine find their way into F1. Graphically demanding in its own right, F1 2018 keeps a useful racing-type graphics workload in our benchmarks.

We use the in-game benchmark, set to run on the Montreal track in the wet, driving as Lewis Hamilton from last place on the grid. Data is taken over a one-lap race.


[Results charts: Average FPS and 95th Percentile at the IGP, Low, Medium, and High settings]



Conclusion

The Intel Core i9-9900KS is Intel’s first consumer level all-core 5.0 GHz processor. Technically Intel has launched an all-core 5.0 GHz processor before: earlier this year the Core i9-9990XE was launched into the high-frequency trading market, which had 14 cores at 5.0 GHz, but that part is an auction only part for select business partners. What the Core i9-9900KS does is bring the same principle down to a more consumer friendly core count and a more consumer friendly price point. The tray price is set at $513, although it’s likely to be sold for much more than that.

Playing To Power

One of the key elements I wanted to test in this review is how the chip responds to Turbo. As we’ve discussed at length, and confirmed by Intel: the guidelines for the Turbo settings are not set in stone. Intel actively encourages its motherboard partners to increase these settings if the motherboards are over-engineered to be able to do so. This means that a high-end motherboard should be able to give a longer turbo than a cheap board.

A longer turbo might not mean much. When the turbo budget has run out, the system will limit the chip to the TDP setting in the BIOS (which should be the one on the box), and will try and maximise the frequency for the power limit. On a lot of chips, this means you still have a very high frequency, nowhere near the base frequency. But the power limit does have benefits such as acting as a thermal control at least.

In our test, we used MSI’s Z390 Gaming Edge AC. It’s a mid-upper motherboard, but it set our Core i9-9900KS to have a TDP and turbo power limit of 255 W. Intel’s ‘guidelines’ state a TDP of 127 W and a turbo power limit of 159 W. When comparing the two, there are some distinct advantages for the 255 W setting, such as 10%+ performance on rendering, but the 159 W setting does afford 10°C lower temperatures in those heavy workloads. Ultimately, as the name TDP = Thermal Design Power implies, it all comes down to your ability to cool the chip.
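To make the turbo budget idea concrete, here is a deliberately simplified sketch. It assumes the chip may draw up to the turbo limit while a moving average of package power stays below the TDP, and is clamped to the TDP once that average catches up. The moving-average form, the 28-second time constant, and the 200 W sustained load are all assumptions made for illustration; only the 127 W, 159 W, and 255 W figures come from our testing, and real hardware behaves in more complicated ways.

```python
def simulate_package_power(pl1_w, pl2_w, tau_s, demand_w, duration_s, dt_s=1.0):
    """Simplified turbo model returning package power for each time step.

    While the moving average of power is below pl1_w (the TDP), the chip
    may draw up to pl2_w (the turbo limit). Once the average reaches
    pl1_w, power is clamped to pl1_w.
    """
    average_w = 0.0          # assume the chip starts from idle
    alpha = dt_s / tau_s     # weight of each new sample in the moving average
    trace = []
    for _ in range(int(duration_s / dt_s)):
        limit_w = pl2_w if average_w < pl1_w else pl1_w
        power_w = min(demand_w, limit_w)
        average_w += alpha * (power_w - average_w)
        trace.append(power_w)
    return trace

DEMAND_W = 200.0  # hypothetical sustained all-core load, not a measured value
TAU_S = 28.0      # assumed time constant, not taken from this review

guideline = simulate_package_power(127.0, 159.0, TAU_S, DEMAND_W, 120)
board = simulate_package_power(255.0, 255.0, TAU_S, DEMAND_W, 120)

turbo_seconds = sum(1 for p in guideline if p > 127.0)
print(f"guideline limits: {turbo_seconds} s above TDP, then {guideline[-1]:.0f} W")
print(f"255 W limits: {board[-1]:.0f} W sustained")
```

Under these assumptions the guideline settings allow a burst at 159 W before settling at 127 W, while the 255 W setting never throttles the hypothetical load, which is the trade-off between sustained performance and temperature discussed above.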

For gaming, the turbo budget didn’t seem to matter at all, except in a few tests at super low resolution and settings.

One question that does remain, however, is which set of results we should keep. The 255W results are what we get out of the box, and the 159W results are only 'Intel guidelines that Intel expects none of the board manufacturers to keep to'. Ideally we keep both, but that's a mess in its own right.

Planning Against The Competition

There’s no getting around giving Intel kudos for binning enough processors to commercially sell an all-core 5.0 GHz chip. In our benchmarks, we see it steaming ahead of any other consumer grade processor when it comes to single core performance. Users are likely to be able to push that single (or dual) core turbo a bit higher as well, although the power limits should be monitored.

It should be noted that in most cases, the Core i9-9900KS either matched or beat the previous king of Intel’s consumer desktop line, the Core i9-9900K. There were a few select instances, namely benchmarks like Handbrake, DigiCortex, F1 2018, and 7-zip, where we did see performance regressions that we weren’t expecting. We’re going to have to go back to Intel to see exactly what these are. But they seem confined to very specific workloads.

Overall, the Core i9-9900KS is Intel’s best ever consumer processor.

In ST performance metrics, it wins. In variable threaded metrics, it either wins or does really well. In MT performance metrics, it depends on how strong AMD’s 12-core hardware really is, and how multithreaded the calculation really is. As Intel slowly adds AVX-512 to its consumer line, as it is doing with Ice Lake, the MT competition is going to be really interesting.

Available For A Short Time Only

While the Core i9-9990XE is a 14-core 5.0 GHz chip, it is an OEM only part sold by Intel at auction only, whereas the Core i9-9900KS should experience wider availability at retail, albeit for a limited time.

Our colleagues at Tom’s Hardware reported that Intel stated in a promotional video that the processor would only be available during the holiday season of 2019 – or at least that the stock level would not be replenished after the holiday season. When we approached Intel asking for confirmation, we were told:

This special edition processor will be available for a limited time only. It can be found at retailers worldwide. We are not disclosing unit quantity information. However, the Core i9-9900KS will have very limited availability.

There is no doubt that there will be some CPUs available into 2020, however it would appear that Intel is only making one main batch of hardware, and once it has gone, it has gone. This might make the $513 tray price that Intel is putting on the part a bit of a misnomer, as retailers might take advantage of this. This will take the shine off the Core i9-9900KS a little, as at $529 or so it would easily be recommended over a Core i9-9900K. If it goes to $599 or $649 because of its limited release, then it becomes less of an interesting buy.

Ultimately the Core i9-9900KS is going to end up in the hands of enthusiasts who want nothing more than the best, but don’t want to jump to the high-end desktop platform. Despite the Intel chipsets for consumers, it’s still a shame that these processors only have 16 PCIe 3.0 lanes, given the desire for direct attached PCIe storage in this market.
