Original Link: https://www.anandtech.com/show/14980/the-intel-core-i9-9990xe-review



Within a few weeks, Intel is set to launch its most daring consumer desktop processor yet: the Core i9-9900KS, which offers eight cores all running at 5.0 GHz. There’s going to be a lot of buzz about this processor, but what people don’t know is that Intel already has an all 5.0 GHz processor, and it actually has 14 cores: the Core i9-9990XE. This ultra-rare thing isn’t sold to consumers – Intel only sells it to select partners, and even then it is only sold via an auction, once per quarter, with no warranty from Intel. How much would you pay for one? Well we got one to test.

Build It, And They Will Come

The Core i9-9990XE is the pinnacle of Intel’s 14nm process, binned to such a degree that Intel can neither guarantee how many it can produce nor support it in any way. Unlike mass market processors, there is no product support on this thing, and no such thing as ‘EOL’ – once a system integrator wins it at auction it’s a sunk cost to that integrator. The idea is to sell it on for a premium, before the boss wants it for his own personal system. I mean, who wouldn’t want 14 cores at 5.0 GHz?

This CPU is part of the high-end desktop family of processors, and runs in select X299 motherboards. It’s a Core i9, rather than a Xeon, which means only four memory channels and no ECC support. It does technically support overclocking, although your mileage may vary. This here is a processor for only one market, and it’s a market willing to spend big bucks to get any sort of millisecond latency advantage: high-frequency trading.

At the first auction, we initially knew of three companies that took part. The closed auction was somewhat of a mystery to those wanting to bid: they knew what the hardware was, but not how many Intel were going to offer. Out of the three companies we spoke to, one sat by and didn’t bid, the second got three processors, and a third got the rest. How many that was, we’re not sure – just like how much value these companies put in these parts. As I mentioned at the start: how much would you pay for a 14-core 5.0 GHz all-core processor?

High-Frequency Trading systems are no stranger to esoteric arrangements. I once heard a story of companies spending tens of millions to build line-of-sight microwave transmitter towers just to shave 3 milliseconds off the latency. All the big financial traders have their servers located as close to the exchange as possible, because the speed of light through an optical cable still isn’t fast enough. These companies not only pay through the nose for the hardware, but also pay experts and specialists to tune those systems for low latency. That means tweaking the memory, overclocking the processor, and even implementing chillers to get the fastest possible system that is still fully stable.

So how much would these people pay for a pre-binned 14-core 5.0 GHz processor? Some of them might already be running higher than that, as a standard Core i9-9980XE off the shelf, if you buy enough of them and bin them, could potentially run at this speed. In the end, we got an answer from CaseKing, the recipient of most of these Core i9-9990XE processors: $2800. In fact, since that initial price, it has actually gone up to $2850. Compared to the Core i9-9980XE ($1979), or the newly announced Core i9-10980XE ($979), then yes, traders will easily spend $1000-$2000 more for the lowest latency x86 CPU on the market.

Intel's HEDT CPUs
AnandTech     Cores / Threads  Base (GHz)  All-Core (GHz)  Turbo 2.0 (GHz)  Turbo 3.0 (GHz)  TDP     PCIe 3.0  MSRP
Cascade Lake-X
i9-10980XE    18 / 36          3.0         3.8             4.6              4.8              165 W   48        $979
i9-10940X     14 / 28          3.3         4.1             4.6              4.6              165 W   48        $784
i9-10920X     12 / 24          3.5         4.2             4.6              4.8              165 W   48        $689
i9-10900X     10 / 20          3.7         4.3             4.5              4.7              165 W   48        $590
Skylake-X
i9-9990XE     14 / 28          4.0         5.0             5.0              5.0              255 W   44        $auction
i9-9980XE     18 / 36          3.0         3.8             4.4              4.5              165 W   44        $1979
i9-9960X      16 / 32          3.1         -               4.4              4.5              165 W   44        $1684
i9-9940X      14 / 28          3.3         -               4.4              4.5              165 W   44        $1387
i9-9920X      12 / 24          3.5         -               4.4              4.5              165 W   44        $1189
i9-9900X      10 / 20          3.5         -               4.4              4.5              165 W   44        $989
Coffee Lake Refresh
i9-9900KS     8 / 16           4.0         5.0             5.0              -                127 W?  16        $513

So where do we come in? We have a sample. Technically we have a whole system, from International Computer Concepts, or ICC. ICC is a server specialist – we first met them at Supercomputing 2015 showing off a crazy tower system with 8 different servers inside, but they work closely with Intel to provide specific solutions for different vertical markets: oil and gas, medical, high performance computing, and very importantly, financial. They will sell a system overclocked to the gills.

Unfortunately, due to some proprietary technology, we can’t show you the inside of the server they sent us. It’s a standard 1U design, with an ASUS X299 motherboard inside and 32GB of customized memory. It uses an all-copper custom liquid cooled system that is absolutely overkill for most hardware, but does enough to keep this Core i9-9990XE under control. A 1U system is only 1.75 inches tall (4.45cm), so housing this monstrous beast means the cooling has to be top class, and ICC doesn’t skimp. As a result, it is also loud: there’s no way you’re having a 1U like this in the same room as you are working. More detail inside the review.

On top of the standard out-of-the-box specifications, ICC has done further tweaks to the BIOS to ensure the lowest latency and stability. Again, we’re not able to show you what these are, but we were told not to update the BIOS as part of our testing. The 1U server does have space for two graphics cards, two M.2 drives, four SATA drives, and does come with a 1200W power supply. We do have some measurements inside the review for the power as well.

Don’t Drop It

On the face of it, the Core i9-9990XE is a standard LGA2066 chip. It uses Intel’s regular 18-core ‘HCC’ Skylake silicon, however it’s geared towards the ‘consumer’ platform, which is part of Intel’s product segmentation strategy. It doesn’t have ECC support, and so is limited to 128 GB of standard DDR4 memory, although you can bet that any HFT system that uses this part will run high speed memory. The chip has 44 PCIe 3.0 lanes, in line with other LGA2066 consumer parts, and because it isn’t a Xeon, does not support RAS features or vPro for management.

One of the issues with this chip is that at this price, typically we have professional users that require in-band management features and other security elements to make sure their expensive hardware remains secure and affords appropriate manageability. By designating this part a Core i9, rather than something like a Xeon W, Intel takes those offerings off the table: OEMs that purchase and resell the part to end-users have to explain to end-users that this rare chip comes with these limitations.

At this point we do not know how many chips Intel intends to put into the market. Intel is having an auction every quarter with whatever chips make the grade, assuming that any OEMs want to actually buy them for their customers. We could be talking sub-100 units per year, which is a little odd given that Intel doesn’t need to bin these to the same strict longevity standard as other chips as it doesn’t provide a warranty. Because of all this ‘product / not a product’, the Core i9-9990XE doesn’t get its own page in Intel’s processor database, and it will never be given a strict ‘end-of-life’ program as it doesn’t fall under the standard product order/shipping regime. All the long-term support falls at the hands of the company or OEM that buys them.

The Chip and the Competition

Strictly speaking, this Core i9-9990XE is a 14-core processor with a base frequency of 4.0 GHz and a thermal design power at that frequency of 255W. The turbo frequency for this processor is 5.0 GHz on all cores. But this creates a little bit of an issue for an ‘all core 5.0 GHz turbo’ classification.

As stated in our interviews with Intel Fellows about how turbo response should be presented, we explained that how long a system has turbo enabled is dependent on the instructions being used but also by the motherboard manufacturer. Turbo is defined by a higher level power limit (PL2) and a turbo budget time (Tau) which is indicative of a percentage of a power virus. Normally Intel ‘suggests’ a turbo power of 25% higher than TDP (so for 255W, that is 319W), and anywhere from 8-200 seconds of turbo depending on the platform.
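To make the arithmetic concrete, here is a minimal sketch of the suggested PL2 calculation, assuming nothing more than the 25% uplift over TDP quoted above. The function name and the `uplift` parameter are our own labels for illustration, not anything Intel defines.

```python
def suggested_pl2(tdp_watts: float, uplift: float = 0.25) -> float:
    """Return the suggested turbo power limit (PL2) for a given TDP.

    The 25% default is the 'suggested' uplift mentioned in the text;
    motherboard vendors are free to configure something else.
    """
    return tdp_watts * (1.0 + uplift)


# The Core i9-9990XE has a 255 W TDP.
pl2 = suggested_pl2(255)
print(f"Suggested PL2 for a 255 W TDP: {pl2:.2f} W (about {round(pl2)} W)")
```

For 255 W this gives 318.75 W, which rounds to the 319 W figure. A system with turbo power and time set to unlimited simply ignores this suggestion.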

For the 1U server we were given for testing, ICC has enabled turbo for an unlimited power for an unlimited time (technically up to 4096 seconds I believe), as they want to enable this CPU to hit 5.0 GHz on all cores all the time. Doing this, as mentioned above, requires some very effective cooling. It becomes doubly complicated for ICC, given that they want to do this in a 1U, and so have developed some proprietary cooling technology to enable this.


This is as much as I can legally show you about the cooling

Technically this chip supports Turbo Max 3.0, whereby Intel designates the best performing cores for even higher turbo frequencies. In our case, out of the 14 cores, Core 10 was considered the best. Inside Windows, the ACPI interface will detect key software (or software defined by the active window) and try to run it on these cores with an extra frequency bump (+100 MHz or so). For our system, while the TBM3 and ACPI interface did lock software to specific cores, we saw no increase in frequency, due to the way the system has been set up. One of the other key areas for ICC’s customers is not just low latency, but consistently low latency. In order not to disturb that consistency, TBM 3.0 has no effect on the processor frequencies for our testing.

The other features of the chip are the quad-channel memory support of DDR4-2666 in single rank mode. ICC supplied our system with custom memory modules and appropriate heatsinks, with the system running at DDR4-3600 CL16. This chip also has 44 PCIe 3.0 lanes, in line with other 9th series Intel HEDT processors.
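As a rough guide to what the memory overclock buys, the sketch below computes theoretical peak bandwidth from channel count and transfer rate. It assumes the standard 64-bit (8-byte) DDR4 channel and uses nominal transfer rates, so real-world throughput will be lower. The function name is ours.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    channels       -- number of populated memory channels
    mega_transfers -- nominal data rate in MT/s (e.g. 2666 for DDR4-2666)
    bus_bytes      -- bytes per transfer per channel (8 for a 64-bit DDR4 channel)
    """
    return channels * mega_transfers * bus_bytes / 1000


rated = peak_bandwidth_gbs(4, 2666)   # Intel's supported speed
as_run = peak_bandwidth_gbs(4, 3600)  # ICC's configuration in our system
print(f"DDR4-2666 x4: {rated:.1f} GB/s, DDR4-3600 x4: {as_run:.1f} GB/s")
```

On paper, moving from DDR4-2666 to DDR4-3600 raises the quad-channel peak from roughly 85 GB/s to roughly 115 GB/s.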

Competition for the Core i9-9990XE comes from several sides.

One CPU on the books is the upcoming Core i9-9900KS, an eight-core processor that also promotes all eight cores at 5.0 GHz. This chip uses the consumer grade mainstream silicon, and thus only has two memory channels and 16 PCIe 3.0 lanes. This CPU is going to be launched in a couple of days (October 30th), with a $513 MSRP.

Another CPU is the new Cascade Lake-X 18-core flagship, the Core i9-10980XE, for $979. This is the latest high-end desktop processor, with (we assume) the latest security updates from Intel as well as a boost in some of the frequencies from the Core i9-9980XE. Ultimately this has four more cores than the 9990XE, but lower frequencies, and is cheaper. The user that is lucky enough to get a good sample could perhaps overclock it to match the 9990XE. The Core i9-10980XE also has four more PCIe 3.0 lanes and the same number of memory channels.

From AMD’s side, the upcoming 16-core Ryzen 9 3950X in November is one angle. Being on 7nm it is certainly more energy efficient, and the Zen 2 microarchitecture has a higher IPC than the Intel part, but the CPU won’t be able to reach the same frequencies. It is also aimed at consumers, with 24 PCIe 4.0 lanes and two memory channels. At an MSRP of $749, it will certainly cost a lot less, however.

We can also look towards AMD’s launch of the next generation of Threadripper, also based on Zen 2 and 7nm. At this point, aside from AMD announcing that they are coming in November and starting with a 24-core CPU, we don’t have many details. It is expected to have four memory channels, 64 PCIe lanes, and might come in around 4.0 GHz. It will still have the issue of not clocking as high as the Intel part, and price/power is an unknown at this point.

AMD has however launched its Zen 2 server hardware, the EPYC 7002 series. Rather than looking at a high frequency 14-core part, users might consider a 32-core CPU here, with eight memory channels, a high IPC, and 128 PCIe 4.0 lanes. Again, the deficit is going to be in the frequency, which is something that HFT traders desire. The EPYC 7542 retails for around $3400, so in the right server if an HFT trader needs to scale out, this could be an option.

Comparing the i9-9990XE
AnandTech       Xeon W-3175X  Core i9-9990XE  Core i9-9900KS  Ryzen 9 3950X  TR2 2990WX  EPYC 7542
Vendor          Intel         Intel           Intel           AMD            AMD         AMD
Cores           28            14              8               16             32          32
Threads         56            28              16              32             64          64
Base (GHz)      3.1           4.0             4.0             3.5            3.0         2.9
All-Core (GHz)  -             5.0             5.0             -              -           -
Turbo (GHz)     4.5           5.0             5.0             4.7            4.2         3.4
TDP             255 W         255 W           127 W?          105 W          250 W       225 W
DDR4            6 x 2666      4 x 2666        2 x 2666        2 x 3200       4 x 2933    8 x 3200
PCIe            48            44              16              24             64          128
MSRP            $2999         $auction        $513            $749           $1799       $3400

For any comparison you make, there’s no denying that the Core i9-9990XE pushes the boundaries for Intel’s binning on its 14nm process. This is why it has no MSRP, and why Intel can’t predict how many it will be able to manufacture in any given quarter. For whatever the OEMs end up paying for it at auction, the fact that CaseKing has it for sale (with 1 year OEM warranty) for 2849 Euro, means that it sits well above any other Intel high-end desktop processor, and with good reason.

Our Testbed

It should be noted that Intel’s recent updates regarding Spectre, Meltdown, and ZombieLoad may have an effect on performance. Based on data we’ve seen at Intel, the mitigations hurt the newest hardware the least (compared to say, Broadwell). The system provided by ICC does not have firmware mitigations in place, however we did use an OS version that had some of the software implemented fixes. ICC was clear that some of its customers, while concerned about these issues, just want the fastest system possible based on the way they use these systems.

As a result, our results here are not directly comparable with those in our previous reviews. Because of the custom BIOS being used, with the overclock options locked down, the benchmark data will not necessarily mirror an ‘off-the-shelf’ installation, but will mirror a pre-built system, which is ultimately what these chips are aimed at. We’re therefore putting an asterisk by our results, to indicate that the environment for this chip was different.

CPU: Intel Core i9-9990XE, 14 Cores, 4.0 GHz Base, 5.0 GHz Turbo, 255W TDP, $Auction
DRAM: 4x8 GB Custom ICC Modules, DDR4-3600 CL16
Motherboard: ASUS X299
GPU: Sapphire Radeon RX460 2GB
Cooling: ICC Proprietary Liquid Cooling
Power Supply: Dual 1200W 1U Redundant Supplies
Storage: Micron MX500 1TB SSD
Chassis: 1U Rack Server

In our reviews, we normally take an open-air testbed with powerful cooling, a powerful motherboard, DRAM at manufacturer supported frequencies, and the latest public BIOS for that motherboard.

For our benchmarks, we ran our standard CPU suite. Due to the 1U arrangement, and where this chip is focused, we did not install a large GPU for gaming tests. Users looking at this system wanting to pair it with a large CUDA card for financial simulation will likely have a field day, but for gaming, that is best left to the Core i9-9900KS when it comes out.

Also, while this CPU is overclockable, the motherboard supplied had a locked BIOS on overclocking: ICC has configured it for performance and stability, and we were unable to even open the appropriate menus in the BIOS to perform overclocking.

If there is a sufficient request from readers, we’ll look into taking the chip and running it in a different motherboard for gaming and overclocking performance. I’ll have to see if my best cooling solution will be sufficient.

Pages In This Review

  1. Analysis and Competition
  2. The Core i9-9990XE: Compilation Champion
  3. CPU Performance: Rendering Tests
  4. CPU Performance: Encoding Tests
  5. CPU Performance: System Tests
  6. CPU Performance: Office Tests
  7. CPU Performance: Web and Legacy Tests
  8. Power Consumption and Thermals
  9. Conclusions and Final Words


Chromium Compile: Windows VC++ Compile of Chrome

A large number of AnandTech readers are software engineers, looking at how the hardware they use performs. While compiling a Linux kernel is ‘standard’ for the reviewers who often compile, our test is a little more varied – we are using the Windows instructions to compile Chrome, specifically a Chrome 56 build from March 2017, as that was when we built the test. Google quite handily gives instructions on how to compile with Windows, along with a 400k file download for the repo. This is by far one of our most popular benchmarks, and is a good measure of core performance, multithreading performance, and also memory accesses.

In our test, using Google’s instructions, we use the MSVC compiler and ninja developer tools to manage the compile. As you may expect, the benchmark is variably threaded, with a mix of DRAM requirements that benefit from faster caches. Data procured in our test is the time taken for the compile, which we convert into compiles per day. The benchmark takes anywhere from an hour on a fast single high-end desktop processor to several hours on the slowest offerings.

Compile Chromium (Rate)

Prior to this test, the two CPUs battling it out for supremacy were the 16-core Ryzen Threadripper 2950X, and the 8-core i9-9900K. By adding six more cores, a lot more frequency, and two more memory channels, the Core i9-9990XE plows through this test very easily, performing the compile in 42 minutes and 10 seconds. It is the only processor to break the 50 minute mark, let alone the 45 minute mark.
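For anyone wanting to map a compile time onto the rate shown in the graph, the conversion to compiles per day is simple division. This is a minimal sketch of that conversion as we describe it, with a function name of our own choosing.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def compiles_per_day(minutes: int, seconds: int = 0) -> float:
    """Convert a single compile time into a compiles-per-day rate."""
    compile_seconds = minutes * 60 + seconds
    if compile_seconds <= 0:
        raise ValueError("compile time must be positive")
    return SECONDS_PER_DAY / compile_seconds


# The Core i9-9990XE result: 42 minutes and 10 seconds.
print(f"42m10s -> {compiles_per_day(42, 10):.2f} compiles per day")
# The two thresholds mentioned in the text, for reference.
print(f"50m00s -> {compiles_per_day(50):.2f} compiles per day")
print(f"45m00s -> {compiles_per_day(45):.2f} compiles per day")
```

A 42:10 compile works out to just over 34 compiles per day, against 28.8 for a 50 minute compile and 32 for a 45 minute one.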



CPU Performance: Rendering Tests

Rendering is often a key target for processor workloads, lending itself to a professional environment. It comes in different formats as well, from 3D rendering through rasterization, such as games, or by ray tracing, and invokes the ability of the software to manage meshes, textures, collisions, aliasing, physics (in animations), and discarding unnecessary work. Most renderers offer CPU code paths, while a few use GPUs and select environments use FPGAs or dedicated ASICs. For big studios however, CPUs are still the hardware of choice.

All of our benchmark results can also be found in our benchmark engine, Bench.

Blender 2.79b: 3D Creation Suite

A high profile rendering tool, Blender is open-source allowing for massive amounts of configurability, and is used by a number of high-profile animation studios worldwide. The organization recently released a Blender benchmark package, a couple of weeks after we had narrowed our Blender test for our new suite, however their test can take over an hour. For our results, we run one of the sub-tests in that suite through the command line - a standard ‘bmw27’ scene in CPU only mode, and measure the time to complete the render.

Blender can be downloaded at https://www.blender.org/download/

Blender 2.79b bmw27_cpu Benchmark

Blender can take advantage of more cores, and while the frequency of the 9990XE helps compared to the 7940X, it isn't enough to overtake 18-core hardware.

LuxMark v3.1: LuxRender via Different Code Paths

As stated at the top, there are many different ways to process rendering data: CPU, GPU, Accelerator, and others. On top of that, there are many frameworks and APIs in which to program, depending on how the software will be used. LuxMark, a benchmark developed using the LuxRender engine, offers several different scenes and APIs.


Taken from the Linux Version of LuxMark

In our test, we run the simple ‘Ball’ scene on both the C++ and OpenCL code paths, but in CPU mode. This scene starts with a rough render and slowly improves the quality over two minutes, giving a final result in what is essentially an average ‘kilorays per second’.

LuxMark v3.1 C++

We see a slight regression in performance here compared to the 7940X, which is interesting. I wonder if that 2.4 GHz fixed mesh is a limiting factor.

POV-Ray 3.7.1: Ray Tracing

The Persistence of Vision ray tracing engine is another well-known benchmarking tool, which was in a state of relative hibernation until AMD released its Zen processors, at which point both Intel and AMD were suddenly submitting code to the main branch of the open source project. For our test, we use the built-in benchmark for all-cores, called from the command line.

POV-Ray can be downloaded from http://www.povray.org/

POV-Ray 3.7.1 Benchmark



CPU Performance: Encoding Tests

With the rise of streaming, vlogs, and video content as a whole, encoding and transcoding tests are becoming ever more important. Not only are more home users and gamers needing to convert video files into something more manageable, for streaming or archival purposes, but the servers that manage the output also manage data and log files with compression and decompression. Our encoding tasks are focused around these important scenarios, with input from the community for the best implementation of real-world testing.

Handbrake 1.1.0: Streaming and Archival Video Transcoding

A popular open source tool, Handbrake is the anything-to-anything video conversion software that a number of people use as a reference point. The danger is always on version numbers and optimization, for example the latest versions of the software can take advantage of AVX-512 and OpenCL to accelerate certain types of transcoding and algorithms. The version we use here is a pure CPU play, with common transcoding variations.

We have split Handbrake up into several tests, using a Logitech C920 1080p60 native webcam recording (essentially a streamer recording), and convert them into two types of streaming formats and one for archival. The output settings used are:

  • 720p60 at 6000 kbps constant bit rate, fast setting, high profile
  • 1080p60 at 3500 kbps constant bit rate, faster setting, main profile
  • 1080p60 HEVC at 3500 kbps variable bit rate, fast setting, main profile

Handbrake 1.1.0 - 720p60 x264 6000 kbps Fast
Handbrake 1.1.0 - 1080p60 x264 3500 kbps Faster
Handbrake 1.1.0 - 1080p60 HEVC 3500 kbps Fast

Our encoding tests require a good balance of cores and frequency, and the 5.0 GHz 14-core hardware easily pulls ahead of the 7940X, and shows that having 28 cores isn't always a good thing.

7-zip v1805: Popular Open-Source Encoding Engine

Out of our compression/decompression tool tests, 7-zip is the most requested and comes with a built-in benchmark. For our test suite, we’ve pulled the latest version of the software and we run the benchmark from the command line, reporting the compression, decompression, and a combined score.

It is noted in this benchmark that the latest multi-die processors have very bi-modal performance between compression and decompression, performing well in one and badly in the other. There are also discussions around how the Windows Scheduler is implementing every thread. As we get more results, it will be interesting to see how this plays out.

Please note, if you plan to share out the Compression graph, please include the Decompression one. Otherwise you’re only presenting half a picture.

7-Zip 1805 Compression
7-Zip 1805 Decompression
7-Zip 1805 Combined

This is where having 28-cores helps, as the extra frequency can't beat some extra cores.

WinRAR 5.60b3: Archiving Tool

My compression tool of choice is often WinRAR, having been one of the first tools a number of my generation used over two decades ago. The interface has not changed much, although the integration with Windows right click commands is always a plus. It has no in-built test, so we run a compression over a set directory containing over thirty 60-second video files and 2000 small web-based files at a normal compression rate.

WinRAR is variable threaded but also susceptible to caching, so in our test we run it 10 times and take the average of the last five, leaving the test purely for raw CPU compute performance.
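The 'run ten, average the last five' approach is easy to reproduce. Below is a minimal sketch of that averaging step. The timings in the example are made-up placeholder values for illustration, not measured results, and the function name is ours.

```python
from statistics import mean


def warm_average(run_times, keep_last=5):
    """Average only the final runs, once caching effects have settled."""
    if len(run_times) < keep_last:
        raise ValueError("not enough runs to average")
    return mean(run_times[-keep_last:])


# Hypothetical timings in seconds for ten back-to-back runs.
# The early runs are slower because nothing is cached yet.
runs = [31.0, 28.5, 27.0, 26.5, 26.0, 25.0, 25.5, 24.5, 25.0, 25.0]
print(f"Reported result: {warm_average(runs):.1f} s")
```

Discarding the first five runs throws away the cold-cache results, so the reported figure reflects CPU compute rather than storage or caching behaviour.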

WinRAR 5.60b3

WinRAR is one of our variable threaded tests, so here a mix of cores and frequency helps. Interestingly enough, the 9990XE, despite its higher frequency, is slightly slower than the 7940X - this might be a function of the test getting too fast, or the fact that the extra power needed to drive up the cores to peak frequency might be causing additional delays with all the small files.

AES Encryption: File Security

A number of platforms, particularly mobile devices, are now offering encryption by default with file systems in order to protect the contents. Windows based devices have these options as well, often applied by BitLocker or third-party software. In our AES encryption test, we used the discontinued TrueCrypt for its built-in benchmark, which tests several encryption algorithms directly in memory.

The data we take for this test is the combined AES encrypt/decrypt performance, measured in gigabytes per second. The software does use AES instructions on processors that offer hardware acceleration, however not AVX-512.

AES Encoding



CPU Performance: System Tests

Our System Test section focuses significantly on real-world testing, user experience, with a slight nod to throughput. In this section we cover application loading time, image processing, simple scientific physics, emulation, neural simulation, optimized compute, and 3D model development, with a combination of readily available and custom software. For some of these tests, the bigger suites such as PCMark do cover them (we publish those values in our office section), although multiple perspectives is always beneficial. In all our tests we will explain in-depth what is being tested, and how we are testing.

Application Load: GIMP 2.10.4

One of the most important aspects about user experience and workflow is how fast a system responds. A good test of this is to see how long it takes for an application to load. Most applications these days, when on an SSD, load fairly instantly, however some office tools require asset pre-loading before being available. Most operating systems employ caching as well, so when certain software is loaded repeatedly (web browser, office tools), it can be initialized much quicker.

In our last suite, we tested how long it took to load a large PDF in Adobe Acrobat. Unfortunately this test was a nightmare to program for, and didn’t transfer over to Win10 RS3 easily. In the meantime we discovered an application that can automate this test, and we put it up against GIMP, a popular free open-source photo editing tool, and the major alternative to Adobe Photoshop. We set it to load a large 50MB design template, and perform the load 10 times with 10 seconds in-between each. Due to caching, the first 3-5 results are often slower than the rest, and time to cache can be inconsistent, so we take the average of the last five results to show CPU processing on cached loading.

AppTimer: GIMP 2.10.4

Application loading is a walk in the park for the Core i9-9990XE. 

FCAT: Image Processing

The FCAT software was developed to help detect microstuttering, dropped frames, and runt frames in graphics benchmarks when two accelerators were paired together to render a scene. Due to game engines and graphics drivers, not all GPU combinations performed ideally, which led to this software fixing colors to each rendered frame and dynamic raw recording of the data using a video capture device.

The FCAT software takes that recorded video, which in our case is 90 seconds of a 1440p run of Rise of the Tomb Raider, and processes that color data into frame time data so the system can plot an ‘observed’ frame rate, and correlate that to the power consumption of the accelerators. This test, by virtue of how quickly it was put together, is single threaded. We run the process and report the time to completion.

FCAT Processing ROTR 1440p GTX980Ti Data

FCAT is getting fairly unified across all the processors, with only a few percent separating all the Intel parts.

3D Particle Movement v2.1: Brownian Motion

Our 3DPM test is a custom built benchmark designed to simulate six different particle movement algorithms of points in a 3D space. The algorithms were developed as part of my PhD, and while they ultimately perform best on a GPU, they provide a good idea of how instruction streams are interpreted by different microarchitectures.

A key part of the algorithms is the random number generation – we use relatively fast generation which ends up implementing dependency chains in the code. The upgrade over the naïve first version of this code solved for false sharing in the caches, a major bottleneck. We are also looking at AVX2 and AVX512 versions of this benchmark for future reviews.

For this test, we run a stock particle set over the six algorithms for 20 seconds apiece, with 10 second pauses, and report the total rate of particle movement, in millions of operations (movements) per second. We have a non-AVX version and an AVX version, with the latter implementing AVX512 and AVX2 where possible.
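To give a flavour of what a particle movement test measures, here is a minimal sketch of one generic approach: every particle takes a unit-length step in a uniformly random direction, and the result is reported in millions of movements per second. This is an illustrative stand-in written for this page, not one of the six algorithms in 3DPM, and every name in it is our own.

```python
import math
import random
import time


def random_unit_step(rng):
    """Pick a direction uniformly distributed over the unit sphere."""
    z = rng.uniform(-1.0, 1.0)             # uniform height on the sphere
    phi = rng.uniform(0.0, 2.0 * math.pi)  # uniform angle around the z axis
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def move_particles(num_particles, num_steps, seed=0):
    """Move every particle num_steps times; return positions and rate."""
    rng = random.Random(seed)
    positions = [[0.0, 0.0, 0.0] for _ in range(num_particles)]
    start = time.perf_counter()
    for _ in range(num_steps):
        for p in positions:
            # Each draw depends on the generator state left by the previous
            # draw, so the random numbers form a serial dependency chain.
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    rate = num_particles * num_steps / elapsed / 1e6  # millions of moves/s
    return positions, rate


positions, rate = move_particles(num_particles=1000, num_steps=10)
print(f"{rate:.2f} million movements per second")
```

Interpreted Python will be orders of magnitude slower than the compiled benchmark, so the number it prints is only useful for seeing how the metric is formed.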

3DPM v2.1 can be downloaded from our server: 3DPMv2.1.rar (13.0 MB)

3D Particle Movement v2.1

When we run our 3DPM test in a standard mode, the 9990XE again sees a slight regression compared to the 7940X, perhaps indicating that the mesh environment needs some extra MHz.

3D Particle Movement v2.1 (with AVX)

When adding AVX512 into the mix, the 9990XE rises up as with all the other Intel HEDT CPUs, but still can only match the slower 7940X despite having the same number of cores. At this point we're more core limited than frequency limited, indicating that there are some pipeline stalls in this test.

Dolphin 5.0: Console Emulation

One of the popular requested tests in our suite is to do with console emulation. Being able to pick up a game from an older system and run it as expected depends on the overhead of the emulator: it takes a significantly more powerful x86 system to be able to accurately emulate an older non-x86 console, especially if code for that console was made to abuse certain physical bugs in the hardware.

For our test, we use the popular Dolphin emulation software, and run a compute project through it to determine how close to a standard console system our processors can emulate. In this test, a Nintendo Wii would take around 1050 seconds.

The latest version of Dolphin can be downloaded from https://dolphin-emu.org/

Dolphin 5.0 Render Test

Dolphin is a heavily single threaded test, so we see the highest frequency from Intel and AMD at the top here.

DigiCortex 1.20: Sea Slug Brain Simulation

This benchmark was originally designed for simulation and visualization of neuron and synapse activity, as is commonly found in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron / 1.8B synapse simulation, equivalent to a Sea Slug.


Example of a 2.1B neuron simulation

We report the results as the ability to simulate the data as a fraction of real-time, so anything above a ‘one’ is suitable for real-time work. Out of the two modes, a ‘non-firing’ mode which is DRAM heavy and a ‘firing’ mode which has CPU work, we choose the latter. Despite this, the benchmark is still affected by DRAM speed a fair amount.
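The metric itself is a simple ratio. The sketch below shows how a fraction-of-real-time figure can be formed from simulated time and wall-clock time. The example numbers are hypothetical and the function name is ours, as DigiCortex reports this value directly.

```python
def realtime_fraction(simulated_seconds: float, wall_clock_seconds: float) -> float:
    """Ratio of simulated time to the wall-clock time taken to simulate it."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return simulated_seconds / wall_clock_seconds


# Hypothetical example: 10 seconds of brain activity simulated in 8 seconds.
score = realtime_fraction(10.0, 8.0)
verdict = "suitable" if score > 1.0 else "not suitable"
print(f"{score:.2f}x real-time, {verdict} for real-time work")
```

A score of 1.25 means the simulation runs 25% faster than the activity it models. Anything below 1.0 means the system falls behind.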

DigiCortex can be downloaded from http://www.digicortex.net/

DigiCortex 1.20 (32k Neuron, 1.8B Synapse)

DigiCortex likes memory frequency and internal speeds more than raw core frequency, and again the 9990XE doesn't perform too well here.

y-Cruncher v0.7.6: Microarchitecture Optimized Compute

I’ve known about y-Cruncher for a while, as a tool to help compute various mathematical constants, but it wasn’t until I began talking with its developer, Alex Yee, a researcher from NWU and now software optimization developer, that I realized that he has optimized the software like crazy to get the best performance. Naturally, any simulation that can take 20+ days can benefit from a 1% performance increase! Alex started y-cruncher as a high-school project, but it is now at a state where Alex is keeping it up to date to take advantage of the latest instruction sets before they are even made available in hardware.

For our test we run y-cruncher v0.7.6 through all the different optimized variants of the binary, single threaded and multi-threaded, including the AVX-512 optimized binaries. The test is to calculate 250m digits of Pi, and we use the single threaded and multi-threaded versions of this test.

Users can download y-cruncher from Alex’s website: http://www.numberworld.org/y-cruncher/

y-Cruncher 0.7.6 Single Thread, 250m Digits

y-Cruncher 0.7.6 Multi-Thread, 250m Digits

y-Cruncher is an AVX-512 accelerated test, and with the high frequency it gets the top score in our ST test. 

Agisoft Photoscan 1.3.3: 2D Image to 3D Model Conversion

One of the ISVs that we have worked with for a number of years is Agisoft, who develop software called PhotoScan that transforms a number of 2D images into a 3D model. This is an important tool in model development and archiving, and relies on a number of single threaded and multi-threaded algorithms to go from one side of the computation to the other.

In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, though one that is still more stringent than our 2017 test. We report the total time to complete the process.

Agisoft’s Photoscan website can be found here: http://www.agisoft.com/

Agisoft Photoscan 1.3.3, Complex Test

Agisoft is a variable threaded workload, and it seems the Core i9-9990XE has the best combination of cores and threads.



CPU Performance: Office Tests

The Office test suite is built around industry standard tests that cover office workflows, system meetings, and some synthetics, but we also bundle compiler performance in with this section. For users that have to evaluate hardware in general, these are usually the benchmarks that most consider.

All of our benchmark results can also be found in our benchmark engine, Bench.

3DMark Physics: In-Game Physics Compute

Alongside PCMark is 3DMark, Futuremark’s (UL’s) gaming test suite. Each gaming test consists of one or two GPU heavy scenes, along with a physics test that is indicative of when the test was written and the platform it is aimed at. The main overriding tests, in order of complexity, are Ice Storm, Cloud Gate, Sky Diver, Fire Strike, and Time Spy.

Some of the subtests offer variants, such as Ice Storm Unlimited, which is aimed at mobile platforms with an off-screen rendering, or Fire Strike Ultra which is aimed at high-end 4K systems with lots of the added features turned on. Time Spy also currently has an AVX-512 mode (which we may be using in the future).

3DMark Physics - Ice Storm Unlimited

3DMark Physics - Cloud Gate

3DMark Physics - Sky Diver

In simpler titles like Ice Storm, having that high frequency causes the 9990XE to be the best physics calculator for this test that we have.

GeekBench4: Synthetics

A common tool for cross-platform testing between mobile, PC, and Mac, GeekBench 4 is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.

I’m including this test due to popular demand, although the results do come across as overly synthetic, and a lot of users often put a lot of weight behind the test due to the fact that it is compiled across different platforms (although with different compilers).

We record the main subtest scores (Crypto, Integer, Floating Point, Memory) in our benchmark database, but for the review we post the overall single and multi-threaded results.

Geekbench 4 - ST Overall

Geekbench 4 - MT Overall



CPU Performance: Web and Legacy Tests

While more the focus of low-end and small form factor systems, web-based benchmarks are notoriously difficult to standardize. Modern web browsers are frequently updated, with no recourse to disable those updates, and as such there is difficulty in keeping a common platform. The fast paced nature of browser development means that version numbers (and performance) can change from week to week. Despite this, web tests are often a good measure of user experience: a lot of office work today revolves around web applications, particularly email and office apps, but also interfaces and development environments. Our web tests include some of the industry standard tests, as well as a few popular but older tests.

We have also included our legacy benchmarks in this section, representing a stack of older code for popular benchmarks.

All of our benchmark results can also be found in our benchmark engine, Bench.

WebXPRT 3: Modern Real-World Web Tasks, including AI

The company behind the XPRT test suites, Principled Technologies, has recently released the latest web-test, and rather than attach a year to the name have just called it ‘3’. This latest test (as we started the suite) has built upon and developed the ethos of previous tests: user interaction, office compute, graph generation, list sorting, HTML5, image manipulation, and even goes as far as some AI testing.

For our benchmark, we run the standard test which goes through the benchmark list seven times and provides a final result. We run this standard test four times, and take an average.

Users can access the WebXPRT test at http://principledtechnologies.com/benchmarkxprt/webxprt/

WebXPRT 3 (2018)

WebXPRT 2015: HTML5 and Javascript Web UX Testing

The older version of WebXPRT is the 2015 edition, which focuses on a slightly different set of web technologies and frameworks that are in use today. This is still a relevant test, especially for users interacting with not-the-latest web applications in the market, of which there are a lot. Web framework development is often very quick but with high turnover, meaning that frameworks are quickly developed, built-upon, used, and then developers move on to the next, and adjusting an application to a new framework is a difficult arduous task, especially with rapid development cycles. This leaves a lot of applications as ‘fixed-in-time’, and relevant to user experience for many years.

Similar to WebXPRT3, the main benchmark is a sectional run repeated seven times, with a final score. We repeat the whole thing four times, and average those final scores.

WebXPRT15

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is an accrued test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark’s internal metrics. We report this final score.

Speedometer 2

Google Octane 2.0: Core Web Compute

A popular web test for several years, but now no longer being updated, is Octane, developed by Google. Version 2.0 of the test performs the best part of two-dozen compute related tasks, such as regular expressions, cryptography, ray tracing, emulation, and Navier-Stokes physics calculations.

The test gives each sub-test a score and produces a geometric mean of the set as a final result. We run the full benchmark four times, and average the final results.
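The scoring described above can be sketched in a few lines: a geometric mean over the sub-test scores of each run, followed by a plain average over the repeated runs. This is a minimal sketch, not Octane's own code; the function names and the sample scores are hypothetical.

```python
import math


def geometric_mean(scores):
    """Return the geometric mean of a list of positive sub-test scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    # Averaging the logarithms avoids overflow when many large scores are multiplied.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def averaged_final_result(runs):
    """Average the per-run geometric means across repeated full runs."""
    finals = [geometric_mean(run) for run in runs]
    return sum(finals) / len(finals)


# Hypothetical sub-test scores for four full runs, not measured data.
example_runs = [
    [100.0, 400.0],
    [100.0, 400.0],
    [50.0, 200.0],
    [200.0, 800.0],
]
print(averaged_final_result(example_runs))
```

A geometric mean keeps one unusually high sub-test score from dominating the final number, which is a common reason for benchmarks to prefer it over an arithmetic mean.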

Google Octane 2.0

Mozilla Kraken 1.1: Core Web Compute

Even older than Octane is Kraken, this time developed by Mozilla. This is an older test that does similar computational mechanics, such as audio processing or image filtering. Kraken seems to produce a highly variable result depending on the browser version, as it is a test that is keenly optimized for.

The main benchmark runs through each of the sub-tests ten times and produces an average time to completion for each loop, given in milliseconds. We run the full benchmark four times and take an average of the time taken.

Mozilla Kraken 1.1

3DPM v1: Naïve Code Variant of 3DPM v2.1

The first legacy test in the suite is the first version of our 3DPM benchmark. This is the ultimate naïve version of the code, as if it was written by a scientist with no knowledge of how computer hardware, compilers, or optimization works (which in fact, it was at the start). This represents a large body of scientific simulation out in the wild, where getting the answer is more important than it being fast (getting a result in 4 days is acceptable if it’s correct, rather than sending someone away for a year to learn to code and getting the result in 5 minutes).

In this version, the only real optimization was in the compiler flags (-O2, -fp:fast), compiling it in release mode, and enabling OpenMP in the main compute loops. The loops were not configured for function size, and one of the key slowdowns is false sharing in the cache. It also has long dependency chains based on the random number generation, which leads to relatively poor performance on specific compute microarchitectures.

3DPM v1 can be downloaded with our 3DPM v2 code here: 3DPMv2.1.rar (13.0 MB)

3DPM v1 Single Threaded

3DPM v1 Multi-Threaded

x264 HD 3.0: Older Transcode Test

This transcoding test is super old, and was used by Anand back in the day of Pentium 4 and Athlon II processors. Here a standardized 720p video is transcoded with a two-pass conversion, with the benchmark showing the frames-per-second of each pass. This benchmark is single-threaded, and between some micro-architectures we seem to actually hit an instructions-per-clock wall.

x264 HD 3.0 Pass 1

x264 HD 3.0 Pass 2



Power Consumption, Frequencies, and Thermals

Across several articles we have covered why TDP numbers on the box are useless for most users: the loose definition of Intel’s TDP is that it represents the cooling required for the processor to run at the base frequency. ‘Cooling Required’ is a term referring to the power dissipation of a cooler, which isn’t strictly speaking the same as the CPU power consumption (because of losses), but close enough for our definitions here.

For the Core i9-9990XE, that means that when all 14 cores are running in a normal configuration at 4.0 GHz, with no turbo initiated, the CPU is guaranteed to be running at 255W or less. However, in our case, ICC has pushed the processor up to its turbo speed, 5.0 GHz, for an effective ‘infinite’ time. This means we never see 4.0 GHz, and only ever see 5.0 GHz.

In our testing, ICC did at least have some form of ‘Turbo’ enabled, which meant that the chip could run in idle states. At idle, the system would run at 1.2 GHz, but still at the same 1.29 volts that the chip was set to. This led to a full-system idle power of 266W and a chip temperature of 24C in a 20C ambient room. Unfortunately we could not measure the chip power directly due to some quirks of how Intel manages the power readouts in software. We were able to detect the mesh frequency at idle, which was 900 MHz.

When running a fully multithreaded test, such as Cinebench R20, the fact that every core hit 5.0 GHz was easy to detect. With the advent of features such as Speed Shift, Intel aims to get the CPU from idle to 5.0 GHz as quickly as possible. During a sustained CB20 run, which is possible through the command line, we were able to observe a peak power consumption of the system at 600W, which indicates that at 5.0 GHz this CPU is pulling an extra 334 W over idle. This power naturally goes mostly to the cores, but some will be for the mesh and some will be lost in the power delivery. At full speed, the mesh will rise up to 2.4 GHz.
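The arithmetic behind that 334 W figure is simply peak system power minus idle system power. The sketch below reproduces it and then applies a power delivery efficiency to show how the CPU-side share might be estimated. The 90% efficiency value is purely hypothetical and was not measured in this review.

```python
IDLE_SYSTEM_W = 266.0  # full-system idle power quoted in the text
PEAK_SYSTEM_W = 600.0  # peak full-system power during the sustained run


def load_delta_watts(idle_w, peak_w):
    """Extra power drawn at the wall under load compared with idle."""
    if peak_w < idle_w:
        raise ValueError("peak power should not be below idle power")
    return peak_w - idle_w


def cpu_side_estimate(delta_w, delivery_efficiency):
    """Scale the delta by an assumed power delivery efficiency (0 to 1)."""
    if not 0.0 < delivery_efficiency <= 1.0:
        raise ValueError("delivery_efficiency must be in (0, 1]")
    return delta_w * delivery_efficiency


delta = load_delta_watts(IDLE_SYSTEM_W, PEAK_SYSTEM_W)
ASSUMED_EFFICIENCY = 0.90  # hypothetical value, for illustration only
print(delta, cpu_side_estimate(delta, ASSUMED_EFFICIENCY))
```

Because the idle figure already includes the cooling running at full speed, the delta is a reasonable first approximation of what the CPU and its power delivery add under load.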

Naturally, fitting this into a 1U system requires the substantial cooling we described at the beginning. As this cooling is running at full speed even when idle, it doesn’t affect the power consumption when we ramp up the workload. But tying into the temperature, the internal sensors indicated an 81C peak temperature, while still at 1.290 volts. For a 14-core 5.0 GHz CPU, that’s pretty amazing.

For the audible testing, this thing is loud. With ICC’s proprietary liquid cooling solution, in such a small 1.75-inch form factor, in order to take care of those 350-400W that the CPU could draw, nothing short of some fast flow and high powered fans would suffice. This system runs the cooling at full speed both in idle and at full load, which in this instance measured a massive 78 decibels at only 1 ft (30cm) from a closed system. The fact that this is in a 1U form factor should give you an indication that it should be in a rack in a datacenter somewhere, and not in the office. I am not so lucky, and I was only able to perform testing on the system when everyone in my family and next door was out during the day.

We did some testing with AVX-512 tests. The CPU in this instance only hits 3.8 GHz when at full speed, indicating a -12 offset. It would appear that Intel, while pushing the single core frequency through binning, didn’t so much take into account AVX-512, or at least hoped that it would also be as efficient. In this mode we saw the same power consumption at a system level of around 600W, however the CPU thermals did rise slightly to 82C.
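The -12 offset follows from the frequency drop if each multiplier step is worth 100 MHz. That base clock is an assumption on our part (it is the usual default on this platform, but it was not stated for this system), and the function name below is hypothetical.

```python
def multiplier_offset(all_core_ghz, avx_ghz, bclk_mhz=100.0):
    """Return the AVX offset in multiplier bins as a negative integer.

    Assumption: one bin equals bclk_mhz, taken here as 100 MHz.
    """
    if bclk_mhz <= 0:
        raise ValueError("bclk_mhz must be positive")
    delta_mhz = (all_core_ghz - avx_ghz) * 1000.0
    # round() absorbs the small floating point error in the subtraction.
    return -round(delta_mhz / bclk_mhz)


# 5.0 GHz all-core versus 3.8 GHz under AVX-512, as observed in the text.
print(multiplier_offset(5.0, 3.8))
```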

Due to the limitations of the motherboard in the system, which was locked down by the system provider, we were not able to attempt additional overclocking. That being said, I’m sure that the OEM partners and system integrators would prefer it if end users did not perform additional overclocking, lest this MSRP-less ‘no guarantee of any more chips’ processor actually bites the dust.



Intel Core i9-9990XE Conclusion

Intel never really announced the Core i9-9990XE into the market. We broke the story this year at CES in January after several sources involved in that initial auction confirmed that it was taking place: a 14-core 5.0 GHz CPU, in an unknown quantity, would be available for select system integrators and OEM partners to bid on. There is no warranty from Intel, so these integrators were taking a risk, and could ultimately bid too high for a chip that might not sell.

In the end, that initial auction fell to (at least) three companies, of which two ended up with the CPUs. We very quickly found out that CaseKing snapped up most of them, and the company eventually ended up putting them for direct sale (with a 1 year warranty) on their connected websites for €2999 (now €2849) as well as offering several of their high profile water cooled extreme overclocked systems with the chip inside. We also saw Puget Systems with at least one, and another company was ICC, an Intel partner that focuses on a number of markets including the financial market. It was ICC who built a 1U system for this chip and sampled the system for us to review.

The system was provided with custom proprietary liquid cooling, which we’re not able to show you. The thing is a beast, however, and can appropriately cool up to 400W of CPU in a 1.75-inch form factor. It’s also loud, registering 78 decibels whether the system is at idle or running a full workload. Given that it is a 1U server, this would suggest that a datacenter is the best place for it. I have no doubt that it could be transferred into a tower, although much like the 28-core Xeon W-3175X we tested in January, it requires a substantial cooling setup to be tamed.

In performance, the tweaked system from ICC was built for low latency financial trading. It was only paired with 32 GB of DDR4, but running at DDR4-3600 with tuned subtimings. We added in our standard testing SSD and GPU, although due to the complexity of the system build we weren’t able to run games on this thing. But for raw ST performance, the Core i9-9990XE puts all the other high-end desktop chips to shame, as it should do. Everything from Intel on a Core chip gets obliterated, and against the Xeon W-3175X which has 28 cores, the Xeon does go ahead just on the multithreaded stuff but this Core i9-9990XE kills it when frequency is the limiting factor. This shows up in our compile test, where the right balance of cores and frequency is needed: the Core i9-9990XE set a new world record in our benchmark. There are some caveats: the mesh frequency does seem to be a little bit of a hold back in some tests, and frequency going in and out of turbo modes can cause additional delays in tests.

Against AMD counterparts, that 5.0 GHz frequency carves through anything like butter. Where AMD has to play is on its 32-core Threadripper CPUs, and even then it’s a tradeoff: 14 cores at 5.0 GHz against 32 cores at ~3.4 GHz means that the 2990WX has a lead only if it’s a raw compute problem, but put in any memory limited scenario, or add in AVX2/AVX512, and the Core i9-9990XE is going to win.
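To put rough numbers on that tradeoff, the sketch below multiplies core count by clock speed. This "core-GHz" figure is a deliberately naive metric and is our assumption for illustration: it ignores IPC, memory bandwidth, NUMA effects, and vector width, which is exactly why the lead only holds for raw compute problems.

```python
def aggregate_core_ghz(cores, ghz):
    """Naive throughput proxy: cores multiplied by clock speed in GHz."""
    if cores <= 0 or ghz <= 0:
        raise ValueError("cores and ghz must be positive")
    return cores * ghz


# Figures from the text: 14 cores at 5.0 GHz against 32 cores at ~3.4 GHz.
i9_9990xe = aggregate_core_ghz(14, 5.0)
tr_2990wx = aggregate_core_ghz(32, 3.4)
print(i9_9990xe, tr_2990wx, tr_2990wx / i9_9990xe)
```

On this crude measure the 2990WX has roughly one and a half times the aggregate clock budget, but each individual thread on the 9990XE runs almost 50% faster.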

We obviously haven’t talked price. The W-3175X is a similar $3000 to the i9-9990XE and has ECC support and six memory channels, but doesn’t have that single thread frequency. The 2990WX is a NUMA design that works well in focused applications rather than the i9-9990XE which works well in almost every scenario, but the 2990WX is 30-40% cheaper.
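The "30-40% cheaper" figure can be checked with a one-line calculation. The sketch below uses the $2999 and $1799 list prices quoted in this review; since the i9-9990XE has no MSRP, treating roughly $3000 as its price is an assumption, and the function name is hypothetical.

```python
def percent_cheaper(reference_price, other_price):
    """How much cheaper other_price is than reference_price, in percent."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (reference_price - other_price) / reference_price * 100.0


# $2999 as the ~$3000 reference point, $1799 for the 2990WX.
saving = percent_cheaper(2999.0, 1799.0)
print(round(saving, 1))
```

The upper end of the range matches the list prices; the lower end would apply if the 9990XE is sold on for less than $3000.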

Comparing the i9-9990XE

Vendor          Intel           Intel           Intel           AMD             AMD             AMD
AnandTech       Xeon W-3175X    Core i9-9990XE  Core i9-9900KS  Ryzen 9 3950X   TR 2 2990WX     EPYC 7542
Cores           28              14              8               16              32              32
Threads         56              28              16              32              64              64
Base (GHz)      3.1             4.0             4.0             3.5             3.0             2.9
All-Core (GHz)  -               5.0             5.0             -               -               -
Turbo (GHz)     4.5             5.0             5.0             4.7             4.2             3.4
TDP             255 W           255 W           127 W?          105 W           250 W           225 W
DDR4            6 x 2666        4 x 2666        2 x 2666        2 x 3200        4 x 2933        8 x 3200
PCIe lanes      48              44              16              24              64              128
MSRP            $2999           $auction        $513            $749            $1799           $3400

Then around the corner we have Intel’s 8-core 5.0 GHz processor, the Core i9-9900KS. This is a consumer level processor, with only two memory channels and 16 PCIe 3.0 lanes, but is set to be $513 when launched in a couple of days (October 30th). Users interested in an all-core 5.0 GHz processor out of the box (i.e., not overclocked) are likely to find that the 9900KS acts as a good starting point, with the 9990XE as the step up when things like memory bandwidth start becoming an issue.

On the topic of sustainability, no-one is going to be able to deploy the Core i9-9990XE en masse: Intel only has a few chips that meet the specifications, and these are auctioned to system integrators. So if a customer wants a specific number, they will have to work with a system integrator with a set budget for that auction in mind, and even then, there’s no guarantee that Intel will have that many chips available (or that someone won’t outbid you). There’s also no warranty on the parts from the perspective of the system integrator, so that adds additional cost. Companies looking at one of these systems might have to consider them as one-offs for their deployment, whereas by comparison, we expect there to be more Core i9-9900KS processors in the wild for companies to buy direct from retailers.

Ultimately, the Core i9-9990XE is a curio. It’s a hell of a curio, that’s for sure. It is like one of the house robots on Robot Wars (UK) or BattleBots (US): something completely outside the rules of normal sportsmanship that is big enough to beat you to a pulp, and it’s very rare that you would even own one, at least not before it owns you.

 
