
Original Link: https://www.anandtech.com/show/15483/amd-threadripper-3990x-review
The 64 Core Threadripper 3990X CPU Review: In The Midst Of Chaos, AMD Seeks Opportunity
by Dr. Ian Cutress & Gavin Bonshor on February 7, 2020 9:00 AM EST
The recent renaissance of AMD as the performance choice in the high-end x86 market has been great for consumers, enabling a second offering at the top-end of the market. Where Intel offers 28 cores, AMD offers 24 and 32 core parts for the high-end desktop, and to rub salt into the wound, there is now a 64 core offering. This CPU isn’t cheap: the Ryzen Threadripper 3990X costs $3990 at retail, more than any other high-end desktop processor in history, but with it AMD aims to provide the best single socket consumer processor money can buy. We put it through its paces, and while it does obliterate the competition, there are a few issues with having this many cores in a single system.
I Want Performance, What Are My Options
The new AMD Ryzen Threadripper 3990X is a 64 core, 128 thread processor designed for the high-end desktop market. The CPU is a variant of AMD’s Enterprise EPYC processor line, offering more frequency and a higher power budget, but fewer memory channels, fewer PCIe lanes, and a lower supported memory capacity. The 3990X sits at the cusp between consumer and enterprise based on its features and cost, and it’s ultimately going to compete against both. On paper, users who don’t necessarily need all of the 64 core EPYC features might turn to the 3990X, whereas consumers who need more than 32 cores are going to look here as well. We’re going to test against both.
The TR3990X is part of the Threadripper 3000 family, and will partner its 32 core and 24 core brethren in being paired with new TRX40 motherboards. Despite the same socket as the previous generation Threadrippers, AMD broke motherboard compatibility this time around in order to support PCIe 4.0 from the CPU to the chipset, allowing for higher bandwidth configurations for extra controllers. We’ve covered all 12 of the TRX40 motherboards on the market in our motherboard and chipset overview, with a lot of models focusing on 3x PCIe 4.0 x16 support, multi-gigabit Ethernet onboard, Wi-Fi 6, and one even adding in Thunderbolt 3.
ASUS ROG Zenith II Alpha Motherboard, Built for 3990X
All the Threadripper 3000 family CPUs support a total of 64 PCIe 4.0 lanes from the CPU, and another 24 from the chipset (however each of these use eight lanes to communicate with each other). There are four memory channels, supporting up to DDR4-3200 memory, and each CPU has a rated TDP of 280 W. We’ve tested the 3970X and 3960X when those CPUs were launched – you can read the review here.
AMD Zen 2 Socketed CPUs
AnandTech | Cores/Threads | Base/Turbo (GHz) | L3 | DRAM 1DPC | PCIe Lanes | TDP | SRP |
Third Generation Threadripper
TR 3990X | 64 / 128 | 2.9 / 4.3 | 256 MB | 4x3200 | 64 | 280 W | $3990 |
TR 3970X | 32 / 64 | 3.7 / 4.5 | 128 MB | 4x3200 | 64 | 280 W | $1999 |
TR 3960X | 24 / 48 | 3.8 / 4.5 | 128 MB | 4x3200 | 64 | 280 W | $1399 |
Ryzen 3000
Ryzen 9 3950X | 16 / 32 | 3.5 / 4.7 | 64 MB | 2x3200 | 24 | 105 W | $749 |
The new CPU, the 3990X, comes at the hefty price of $1 per 'X' (because it's called the 3990X and costs $3990, get it?). With 64 cores it has a rated base frequency of 2.9 GHz, and a turbo of 4.3 GHz. In our testing, we saw the single core frequency go as high as 4.35 GHz, above the rated turbo, and the all-core turbo around 3.45 GHz.
Who is This CPU Aimed At?
Not everyone needs 64 cores, and AMD has been very clear about this in their messaging. Even though the 3990X is part of AMD’s high-end desktop line, because it’s breaking new ground in core count and price, it sort of goes beyond the high-end, essentially eclipsing the prosumer/server market. This means users (and companies) that can amortize and justify the cost of the hardware as it enables them to complete projects (and therefore contracts) faster. For a user that needs to create something, rather than doing 25 prototypes a week, doing 100 per week makes their workflow a lot more complete, and it’s this sort of user AMD is going after.
Render farms that run on CPU are going to be a key example. AMD has already promoted the fact that several animation and VFX studios that produce effects in blockbuster films have been running engineering samples of the 64-core Threadripper processors for titles already in the market. Then there are video game production houses and architects that want to rapidly prototype demo models and shorten the time to create each prototype – something that might not be possible on a GPU (and isn’t AVX-512 accelerated).
The 3990X with 64 cores is $3990, double the cost of the 3970X with its 32 cores at $1999. Doubling the cores is an obvious step up, however there isn’t an increase in memory bandwidth or PCIe lanes, so users need to be sure that the CPU is the bottleneck of their workload.
AMD TR3
TR3 3990X | AnandTech | TR3 3970X |
$3990 | SEP | $1999 |
64 / 128 | Cores/Threads | 32 / 64 |
2.9 GHz | Base Frequency | 3.7 GHz |
3.45 GHz | All-Core Freq (As Tested) | 3.81 GHz |
4.3 GHz | Single-Core Frequency | 4.5 GHz |
64 | PCIe 4.0 Lanes | 64 |
8 x DDR4-3200 | DDR4 Support | 8 x DDR4-3200 |
256 GB / 512 GB | Max DDR4 Capacity | 256 GB / 512 GB |
280 W | TDP | 280 W |
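As a rough sanity check on the pricing, the sketch below divides list price by core count for the Zen 2 parts listed in this review. The dictionary layout and the helper name `price_per_core` are our own, purely for illustration; the prices and core counts are the ones quoted in the tables above.

```python
# List prices (USD) and core counts as quoted in the tables above.
PARTS = {
    "TR 3990X": (3990, 64),
    "TR 3970X": (1999, 32),
    "TR 3960X": (1399, 24),
    "Ryzen 9 3950X": (749, 16),
}


def price_per_core(price_usd: float, cores: int) -> float:
    """Return list price divided by core count, in dollars per core."""
    return price_usd / cores


for name, (price_usd, cores) in PARTS.items():
    print(f"{name}: ${price_per_core(price_usd, cores):.2f} per core")
```

On these list prices the 64-core and 32-core Threadrippers come out within roughly 13 cents of each other per core, so the 3990X premium pays for core count in a single socket rather than a per-core markup.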
If we put the 3990X against the EPYC 7702P, the 64-core single socket offering on the enterprise side, then the 3990X has a higher thermal window (280W vs 200W) to enable higher frequencies (2.9/4.3 vs 2.0/3.35) and is cheaper ($3990 vs $4425), but it only has half the memory channels (only 4 compared to 8), half the PCIe lanes (only 64 compared to 128), and no registered memory support. The question here is whether the workload the user is looking at requires more memory/PCIe for the EPYC, or more raw CPU performance for the Threadripper.
Then there’s the competition against the Intel processors. In the high-end desktop market, Intel has nothing to compete, with the maximum product at 18 cores. It does offer a 28-core workstation part, the W-3175X, which is unlocked, with a TDP of 255W, six memory channels, 44 PCIe 3.0 lanes, at a high cost of $2999. Then there’s the server CPUs – if we want parity to the 64 cores of the 3990X, we either need to use a single Xeon Platinum 9282 with 56 cores, which isn’t available without a big contract and it has an unknown price ($25k+?), or dual Xeon Platinum 8280s, with two lots of 28 cores, at a tray price of $20018.
64-core Battle
1 x TR3 3990X | AnandTech | 2 x Xeon 8280 |
$3990 | Price | $20018 |
64 / 128 | Cores/Threads | 56 / 112 |
2.9 GHz | Base Frequency | 2.7 GHz |
3.45 GHz | All-Core Freq | 3.30 GHz |
4.3 GHz | Single-Core Freq | 4.0 GHz |
4.0 x64 | PCIe Lanes | 3.0 x96 |
8 x DDR4-3200 | DDR4 Support | 12 x DDR4-2933 |
256 GB / 512 GB | Max DDR4 Capacity | 1536 GB |
280 W | TDP | 410 W |
We’re testing against the dual 8280s and the W-3175X as well. Please note our 2x8280 results are from an older review, so they have not been run on some of our newer benchmarks.
This Review
In this review, we want to cover the Threadripper 3990X in terms of frequency, temperature, power, and performance. There’s a big caveat we have to discuss in terms of operating system choice, which we’ll go into in the next few pages. But our main comparison points are dependent on whether you are a consumer looking at a faster desktop, or an enterprise user looking at an alternative server replacement. We’ll cover both angles here.
Frequency, Temperature, and Power
A lot of questions will be asked about the frequency, temperature, and power of this chip: splitting 280 W across all the cores might result in a low all-core frequency or require a very high current draw, and there have been recent reports of AMD CPUs not meeting their rated turbo frequencies. We wanted to put our data right here in the front half of the review to address this straight away.
We kept this test simple – we used our new NAMD benchmark, a molecular dynamics compute solver, which is an example workload for a system with this many cores. It’s a heavy all-core load that continually cycles around the ApoA1 test, simulating as many picoseconds of molecular movement as possible. We ran a frequency and thermal logger, left the system idle for 30 seconds to reach an idle steady state, and then fired up the benchmark until a steady state was reached.
For the frequencies we saw an ‘idle’ of ~3600 MHz, which then spiked to 4167 MHz when the test began, and averaged 3463 MHz across all cores over the first 6 minutes or so of the test. We saw a frequency low point of 2935 MHz, however in this context it’s the average that matters.
For thermals on the same benchmark, using our Thermaltake Riing 360 closed loop liquid cooler, we saw 35ºC reported on the CPU at idle, which rose to 64ºC after 90 seconds or so, and a steady state after five minutes at 68ºC. This is an ideal scenario, due to the system being on an open test bed, but the thing to note here is that despite the high overall power of the CPU, the power per core is not that high.
This is our usual test suite for per-core power, however I’ve condensed it horizontally as having all 64 cores is a bit much. At the low loads, we’re seeing the first few cores take 8-10 W of power each, for 4.35 GHz, however at the other end of the scale, the cores are barely touching 3.0 W each, for 3.45 GHz. At this end of the spectrum, we’re definitely seeing AMD’s Zen 2 cores perform at a very efficient point, and that’s even without all 280 W, given that around 80-90 W is required for the chipset and inter-chip Infinity Fabric: all 64 cores, running at almost 3.5 GHz, for around 200 W. From this data, we need at least 20 cores active in order to hit the full 280 W of the processor.
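The power budget above can be checked with some simple arithmetic. The sketch below multiplies the measured full-load per-core power by the core count and adds the rough 80-90 W non-core estimate; the variable and function names are ours, and the figures are the rounded values quoted in the text rather than raw logger data.

```python
TDP_W = 280.0                      # rated package power
CORES = 64
FULL_LOAD_W_PER_CORE = 3.0         # rounded per-core power at full load
UNCORE_ESTIMATE_W = (80.0, 90.0)   # rough range for chipset link and Infinity Fabric


def core_power_total(cores: int, watts_per_core: float) -> float:
    """Total power drawn by the cores alone, in watts."""
    return cores * watts_per_core


cores_w = core_power_total(CORES, FULL_LOAD_W_PER_CORE)
# Package estimate = cores + non-core, for each end of the non-core range.
package_estimates_w = [cores_w + uncore_w for uncore_w in UNCORE_ESTIMATE_W]

print(f"cores only: {cores_w:.0f} W")
print(f"package estimate: {package_estimates_w[0]:.0f}-{package_estimates_w[1]:.0f} W "
      f"against a {TDP_W:.0f} W TDP")
```

With these rounded inputs the cores account for a little under 200 W, and the package estimate brackets the 280 W rating.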
We can compare these values to other AMD Threadripper processors, as well as the high-end Ryzens:
AMD Power/Frequency Comparison
AnandTech | Cores | CPU TDP | 1-Core Power | 1-Core Freq (MHz) | Full Load Power/Core | Full Load Freq (MHz) |
3990X | 64 | 280 W | 10.4 W | 4350 | 3.0 W | 3450 |
3970X | 32 | 280 W | 13.0 W | 4310 | 7.0 W | 3810 |
3960X | 24 | 280 W | 13.5 W | 4400 | 8.6 W | 3950 |
3950X | 16 | 105 W | 18.3 W | 4450 | 7.1 W | 3885 |
The 3990X exhibits a much lower power-per-core value than any of the other CPUs, which means a lower per-core frequency, but it isn’t all that far off at all: less than half the power for only 400 MHz less. This is where the real efficiency of these CPUs comes into play.
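To put numbers on that comparison, here is a minimal sketch using the full-load columns of the table above. The `FULL_LOAD` dictionary and the `mhz_per_watt` helper are illustrative names of our own; MHz per watt is a crude yardstick, not a measure of delivered performance.

```python
# name: (cores, full-load watts per core, full-load MHz), copied from the table above
FULL_LOAD = {
    "3990X": (64, 3.0, 3450),
    "3970X": (32, 7.0, 3810),
    "3960X": (24, 8.6, 3950),
    "3950X": (16, 7.1, 3885),
}


def mhz_per_watt(name: str) -> float:
    """Full-load frequency divided by full-load per-core power."""
    _, watts_per_core, mhz = FULL_LOAD[name]
    return mhz / watts_per_core


# 3990X versus 3970X: ratio of per-core power, and the frequency given up.
power_ratio = FULL_LOAD["3990X"][1] / FULL_LOAD["3970X"][1]
freq_gap_mhz = FULL_LOAD["3970X"][2] - FULL_LOAD["3990X"][2]

print(f"3990X uses {power_ratio:.0%} of the 3970X per-core power, "
      f"for {freq_gap_mhz} MHz less")
for name in FULL_LOAD:
    print(f"{name}: {mhz_per_watt(name):.0f} MHz per watt per core")
```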
The Windows and Multithreading Problem (A Must Read)
Unfortunately, not everything is just as straightforward as installing Windows 10 and going off on a 128 thread adventure. Most home users that have Windows typically have versions of Windows 10 Home or Windows 10 Pro, which are both fairly ubiquitous even among workstation users. The problem that these operating systems have rears its ugly head when we go above 64 threads. Now to be clear, Microsoft never expected home (or even most workstations) systems to go above this amount, and to a certain extent they are correct.
Whenever Windows detects more than 64 threads in a system, it separates those threads into processor groups. The way this is done is very rudimentary: of the enumerated cores and threads, the first 64 go into the first group, the second 64 go into the next group, and so on. This is most easily observed by going into Task Manager and trying to set the affinity of a particular program:
With our 64 core processor, when simultaneous multithreading is enabled, we get a system with 128 threads. This is split into two groups, as shown above.
When the system is in this mode, it becomes very tricky for most software to operate properly. When a program is launched, it will be pushed into one of the processor groups based on load – if one group is busy, the program will be spawned in the other. When the program is running inside the group, unless it is processor group aware, then it can only access other threads in the same group. This means that if a multi-threaded program can use 128 threads, if it isn’t built with processor groups in mind, then it might only spawn with access to 64.
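The behavior described above can be modeled in a few lines. This is a simplified sketch of the splitting rule as we have described it (consecutive blocks of 64 logical processors), not the actual Windows scheduler; `assign_groups` and `visible_threads` are hypothetical helper names.

```python
from __future__ import annotations

GROUP_SIZE = 64  # maximum logical processors in one Windows processor group


def assign_groups(logical_processors: int, group_size: int = GROUP_SIZE) -> list[range]:
    """Split enumerated logical processors into consecutive groups."""
    return [
        range(start, min(start + group_size, logical_processors))
        for start in range(0, logical_processors, group_size)
    ]


def visible_threads(logical_processors: int, group_aware: bool, home_group: int = 0) -> int:
    """How many logical processors a program can schedule work on."""
    groups = assign_groups(logical_processors)
    if group_aware:
        # Group-aware software can spread its threads over every group.
        return logical_processors
    # Otherwise the program stays inside the group it was launched in.
    return len(groups[home_group])


for label, threads in (("SMT on", 128), ("SMT off", 64)):
    print(f"{label}: {len(assign_groups(threads))} group(s), "
          f"non-aware program sees {visible_threads(threads, group_aware=False)} threads")
```

In this model a program that is not group aware is capped at 64 threads on a 128-thread system, while with 64 threads there is only one group and nothing is hidden.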
If this sounds somewhat familiar, then you may have heard of NUMA, or non-uniform memory access. This occurs when the CPU cores in the system might have different latencies to main memory, such as within a dual socket system: it can be quick for a core to access the memory directly attached to its own CPU, but it can be a lot slower if a core needs to access memory attached to the other physical CPU. Processor groups are one way around this, to stop threads jumping from CPU to CPU. The only issue here is that despite having 128 threads on the 3990X, it’s all one CPU!
In Windows 10 Pro, this becomes a problem. We can look directly at Task Manager:
Here we see all 64 cores and 128 threads being loaded up with an artificial load. The important number here though is the socket count. The system thinks that we have two sockets, just because we have a high number of threads in the system. This is a big pain, and the source of a lot of slowdowns in some benchmarks.
(Interestingly enough, Intel’s latest Xeon Phi chips with 72 lightweight cores and 4-way HT for 288 threads show up as five sockets. How’s that for pain!)
Of course, there is a simple solution to avoid all of this – disable simultaneous multithreading. This means we still have 64 cores but now there’s only one processor group.
We still have most of the performance on the chip (and we’ll see later in the benchmarks). However, some of the performance has been lost – if I wanted 64 threads, I’d save some money and get the 32-core! There seems to be no easy way around this.
But then we remember that there are different versions of Windows 10.
From Wikipedia
Microsoft at retail sells Windows 10 Home, Windows 10 Pro, Windows 10 Pro for Workstations, and we can also find keys for Windows 10 Enterprise for sale. Each of these, aside from the usual feature limitations based on the market, also have limitations on processor counts and sockets. In the diagram above, we can see where it says Windows 10 Home is limited to 64 cores (threads), whereas Pro/Education versions go up to 128, and then Workstation/Enterprise to 256. There’s also Windows Server.
Now the thing is, Workstation and Enterprise are built with multiple processor groups in mind, whereas Pro is not. This comes through scheduler adjustments, which aren’t immediately apparent without digging deeper into the finer elements of the design. We saw significant differences in performance.
In order to see the differences, we did the following comparisons:
- 3990X with 64 C / 128 T (SMT On), Win10 Pro vs Win10 Ent
- Win 10 Pro with 3990X, SMT On vs SMT Off
This isn’t just a case of the effect SMT has on overall performance – the way the scheduler and the OS works to make cores available and distribute work are big factors.
In 3DPM, with standard non-expert code, the difference between SMT on and off is 8.6%, however moving to Enterprise brings half of it back.
When we move to hand-tuned AVX code, the extra threads can be used and per-thread gets a 2x speed increase. Here the Enterprise version again gets a small lead over the Pro.
DigiCortex is a more memory bound benchmark, and we see here that disabling SMT scores a massive gain as it frees up CPU-to-memory communication. Enterprise claws back half that gain while keeping SMT enabled.
Photoscan is a variable threaded test, but having SMT disabled gives the better performance with each thread having more resources on tap. Again, W10 Enterprise splits the difference between SMT off and on.
Our biggest difference was in our new NAMD testing. Here the code is AVX2 accelerated, and the difference to watch out for is that with SMT On, going from W10 Pro to W10 Ent is a massive 8.3x speed up. In regular Pro, we noticed that when spawning 128 threads, they would only sit on 16 actual cores or fewer, with the other cores not being utilized. In SMT-Off mode, we saw more of the cores being used, but the score still seemed to be around the same as a 3950X. It wasn’t until we moved to W10 Enterprise that all the threads were actually being used.
On the opposite end of the scale, Corona can actually take advantage of different processor groups. We see the improvement moving from SMT off to SMT On, and then another small jump moving to Enterprise.
Similarly in our Blender test, having processor groups was no problem, and Enterprise gets a small jump.
POV-Ray benefits from having SMT disabled, regardless of OS version.
Handbrake, by contrast, gets a big uplift on Windows 10 Enterprise (due to AVX acceleration).
What’s The Verdict?
From our multithreaded test data, there can only be two conclusions. One is to disable SMT, as it seems to get performance uplifts in most benchmarks, given that most benchmarks don’t understand what processor groups are. However, if you absolutely have to have SMT enabled, then don’t use normal Windows 10 Pro: use Pro for Workstations (or Enterprise) instead. At the end of the day, this is the catch in using hardware that's skirting the line of being enterprise-grade: it also skirts the line with triggering enterprise software licensing. Thankfully, workstation software that is outright licensed per core is still almost non-existent, unlike the server realm.
Ultimately this puts us in a bit of a quandary for our CPU-to-CPU comparisons on the following pages. Normally we run our CPUs on W10 Pro with SMT enabled, but it’s clear from these benchmarks that in every multithreaded scenario, we won’t get the best result. We may have to look at how we test processors >16 cores in the future, and run them on Windows 10 Enterprise. Over the following pages, we’ll include W10 Pro and W10 Enterprise data for completeness.
AMD 3990X Against Prosumer CPUs
The first set of consumers that will be interested in this processor will be those looking to upgrade into the best consumer/prosumer HEDT package available on the market. The $3990 price is a high barrier to entry, but these users and individuals can likely amortize the cost of the processor over its lifetime. To that end, we’ve selected a number of standard HEDT processors that are near in terms of price/core count, as well as putting in the 8-core 5.0 GHz Core i9-9900KS and the 28-core unlocked Xeon W-3175X.
AMD 3990X Consumer Competition
AnandTech | AMD 3990X | AMD 3970X | Intel 3175X | Intel i9-10980XE | AMD 3950X | Intel 9900KS |
SEP | $3990 | $1999 | $2999 | $979 | $749 | $513 |
Cores/T | 64/128 | 32/64 | 28/56 | 18/36 | 16/32 | 8/16 |
Base Freq (MHz) | 2900 | 3700 | 3100 | 3000 | 3500 | 5000 |
Turbo Freq (MHz) | 4300 | 4500 | 4300 | 4800 | 4700 | 5000 |
PCIe | 4.0 x64 | 4.0 x64 | 3.0 x48 | 3.0 x48 | 4.0 x24 | 3.0 x16 |
DDR | 4x 3200 | 4x 3200 | 6x 2666 | 4x 2933 | 2x 3200 | 2x 2666 |
Max DDR | 512 GB | 512 GB | 512 GB | 256 GB | 128 GB | 128 GB |
TDP | 280 W | 280 W | 255 W | 165 W | 105 W | 127 W |
The 3990X is beyond anything in price at this level, and even at the highest consumer cost systems, $1000 could be the difference between getting two or three GPUs in a system. There have to be big upsides here moving from the 32 core to the 64 core.
Corona is a classic 'more threads means more performance' benchmark, and while the 3990X doesn't quite get perfect scaling over the 32 core, it is almost there.
The 3990X scores new records in our Blender test, with sizeable speed-ups against the other TR3 hardware.
Photoscan is a variable threaded test, and the AMD CPUs still win here, although 24 core up to 64 core all perform within about a minute of each other in this 20 minute test. Intel's best consumer hardware is a few minutes behind.
y-cruncher is an AVX-512 accelerated test, and so Intel's 28-core with AVX-512 wins here. Interestingly, the 128 threads of the 3990X get in the way here: the spawn time of so many threads is likely adding to the overall time.
GIMP is a single threaded test designed around opening the program, and Intel's 5.0 GHz chip is the best here. The 64 core hardware isn't that bad here, although the W10 Enterprise data has the better result.
Without any hand tuned code, between 32 core and 64 core workloads on 3DPM, there's actually a slight deficit on 64 core.
But when we crank in the hand tuned code, the AVX-512 CPUs storm ahead by a considerable margin.
We covered DigiCortex on the last page, but it seems that the different thread groups on W10 Pro are holding the 3990X back a lot. With SMT disabled, we score nearer 3x here.
Luxmark is an AVX2 accelerated program, and having more cores here helps. But we see little gain from 32C to 64C.
As we saw on the last page, POV-Ray preferred having SMT off for the 3990X, otherwise there's no benefit over the 32-core CPU.
AES gets a slight bump over the 32 core, however not as much as the 2x price difference would have you believe.
As we saw on the previous page, W10 Enterprise causes our Handbrake test to go way up, but on W10 Pro the 3990X loses ground to the 3950X.
And how about a simple game test - we know 64 cores is overkill for games, so here's a CPU bound test. There's not a lot in it between the 3990X and the 3970X, but Intel's high frequency CPUs are the best here.
Verdict
There are a lot of situations where the jump from AMD's 32-core $1999 CPU, the 3970X, up to the 64-core $3990 CPU only gives the smallest tangible gain. That doesn't bode well. The benchmarks that do get the biggest gains however can get near perfect scaling, making the 3990X a fantastic upgrade. However those tests are few and far between. If these were the options, the smart money is on the 3970X, unless you can be absolutely clear that the software you run can benefit from the extra cores.
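One way to reason about why so few tests scale is Amdahl's law, which we did not use in our testing but which illustrates the point. The sketch below assumes a workload with a fixed parallel fraction and ignores the clock speed difference between the two chips, memory bandwidth, and the processor group effects described earlier; the fractions shown are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gain_from_doubling(parallel_fraction: float, cores: int = 32) -> float:
    """Speedup of 2*cores relative to cores for the same workload."""
    return amdahl_speedup(parallel_fraction, 2 * cores) / amdahl_speedup(parallel_fraction, cores)


# Hypothetical parallel fractions, chosen only to show the shape of the curve.
for p in (0.50, 0.95, 0.99, 1.00):
    print(f"parallel fraction {p:.2f}: 32 -> 64 cores gives {gain_from_doubling(p):.2f}x")
```

Only a workload that is almost entirely parallel gets close to 2x from doubling the cores, which is the bar the 3990X has to clear to match its 2x price over the 3970X.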
AMD 3990X Against $20k Enterprise CPUs
For those looking at a server replacement CPU, AMD’s big discussion point here is that getting 64 cores on Intel hardware is relatively hard. The closest way to get there is with a dual socket system, featuring two of Intel’s 28-core CPUs at a hefty $10k apiece. AMD’s argument is that users can consolidate down to a single socket, but also have better memory support, PCIe 4.0, and no cross-memory domain issues.
AMD 3990X Enterprise Competition
AnandTech | AMD 3990X | AMD 7702P | Intel 2x8280 |
SEP | $3990 | $4425 | $20018 |
Cores/Threads | 64 / 128 | 64 / 128 | 56 / 112 |
Base Frequency (MHz) | 2900 | 2000 | 2700 |
Turbo Frequency (MHz) | 4300 | 3350 | 4000 |
PCIe | 4.0 x64 | 4.0 x128 | 3.0 x96 |
DDR4 Frequency | 4x 3200 | 8x 3200 | 12x 2933 |
Max DDR4 Capacity | 512 GB | 2 TB | 3 TB |
TDP | 280 W | 200 W | 410 W |
Unfortunately I was unable to get ahold of our Rome CPUs from Johan in time for this review, however I do have data from several dual Intel Xeon setups that I did a few months ago, including the $20k system.
This time with Corona the competition is hot on the heels of AMD's 64-core CPUs, but even $20k of hardware can't match it.
The non-AVX version of 3DPM puts the Zen 2 hardware out front, with everything else waiting in the wings.
When we add in the AVX-512 hand tuned code, the situation flips: Intel's 56 cores get almost 2.5x the score of AMD, despite having fewer cores.
Blender doesn't seem to like the additional access latency from the 2P systems.
For AES encoding, as the benchmark takes place from memory, it appears that none of Intel's CPUs can match AMD here.
For the 7-zip combined test, there's little difference between AMD's 32-core and 64-core, but there are sizable jumps above Intel hardware.
Verdict
In our tests here (more in our benchmark database), AMD's 3990X would get the crown over Intel's dual socket offerings. The only thing really keeping me back from giving it is the same reason there was hesitation on the previous page: it doesn't do enough to differentiate itself from AMD's own 32-core CPU. Where AMD does win is in that 'money is less of an issue' scenario, where using a single socket 64 core CPU can help consolidate systems, save power, and save money. Intel's CPUs have a TDP of 205W each (more if you decide to use the turbo, which we did here), which totals 410W, while AMD maxed out at 280W in our tests. Technically Intel's 2P has access to more PCIe lanes, but AMD's PCIe lanes are PCIe 4.0, not PCIe 3.0, and with the right switch can drive many more devices than Intel's (if you're saving $16k, then a switch is peanuts).
We acknowledge that our tests here aren't in any way a comprehensive test of server level workloads, but for the user base that AMD is aiming for, we'd take the 64 core (or even the 32 core) in most circumstances over two Intel 28 core CPUs, and spend the extra money on memory, storage, or a couple of big fat GPUs.
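The consolidation argument comes down to a few subtractions, sketched below with the prices and TDPs quoted in this section. TDP is only a stand-in for real power draw, and the electricity price and 24/7 duty cycle are hypothetical placeholders to be replaced with your own figures; all names are ours.

```python
DUAL_XEON_PRICE_USD = 20018   # tray price for two Xeon Platinum 8280s
TR_3990X_PRICE_USD = 3990
XEON_8280_TDP_W = 205
TR_3990X_TDP_W = 280

hardware_saving_usd = DUAL_XEON_PRICE_USD - TR_3990X_PRICE_USD  # CPUs only
dual_xeon_tdp_w = 2 * XEON_8280_TDP_W
tdp_gap_w = dual_xeon_tdp_w - TR_3990X_TDP_W


def annual_energy_cost(power_w: float, usd_per_kwh: float,
                       hours_per_year: float = 24 * 365) -> float:
    """Cost of running a constant load for a year, in dollars."""
    return power_w / 1000.0 * hours_per_year * usd_per_kwh


ASSUMED_USD_PER_KWH = 0.15  # hypothetical electricity price

print(f"CPU cost saving: ${hardware_saving_usd}")
print(f"TDP gap: {tdp_gap_w} W ({dual_xeon_tdp_w} W vs {TR_3990X_TDP_W} W)")
print(f"Yearly cost of that gap at 24/7 load: "
      f"${annual_energy_cost(tdp_gap_w, ASSUMED_USD_PER_KWH):.2f}")
```

Even under a constant load the power difference is small next to the up-front CPU saving, so the purchase price dominates this comparison.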
Conclusions
The art to building a good CPU is balance: you want something that is fast for individual streams of instructions and data, but also fast for multiple streams. You need something that is also power efficient, high yielding, and can be put together quite easily, with software out there already able to take advantage of what you have made.
“Opportunities multiply as they are seized.”
AMD has succeeded at a time when its competitor has struggled. As AMD launched its Zen 2 hardware across its Ryzen and EPYC product lines, built on TSMC’s 7nm, Dr. Lisa Su the CEO stated in interviews to AnandTech that:
‘We've executed our roadmap from the previous five years and we’re extending it into the next 5 years, all while assuming our competition will be competitive and even beating their public targets.’
At a time when Intel is struggling with its 10nm manufacturing process, AMD is targeting where Intel should have been if it had executed to time. The fact that Intel has suffered issues has benefited AMD, with its latest Ryzen and EPYC CPUs taking high praise. The follow on from these has been Threadripper, and the first two Zen 2 based Threadripper CPUs were quite good. I even used the word ‘bloodbath’ in the review, it was that impressive compared to what Intel had to offer.
Read our Initial Threadripper 3000 Series Review Here
With this third Threadripper 3000 processor, the 3990X, AMD is hoping to capitalize on its successes. The concept here is relatively simple: more of the same. Double the high-performance Zen 2 cores, at only slightly lower frequencies per core, for the same power – if a user has the right workload, then it’s the ideal processor.
And therein lies the crux of this CPU: what is the right workload?
“Know yourself and you will win all battles”
One of the continual talking points about new CPUs is if the ecosystem is ready for them, especially with AMD pushing core counts ever higher. There’s no point having a million cores if everything is written for a few cores – not everyone runs a thousand copies of the same workload at the same time. Unfortunately this is what happened here with the 3990X. We’re in a situation where only a few software packages (that we tested) work great with the CPU, but it’s also the operating system that’s behind.
In our reviews, I prefer Windows both for comfort and because a lot of the user base is on Windows. We typically use Windows 10 Pro, but because this CPU has 128 total threads, the regular version of Windows 10 Pro has issues – we had to move to Windows 10 Enterprise in order to see a difference. The alternative was to disable simultaneous multithreading, taking us back to one thread per core, which actually worked really well for a lot of tests, but also left some performance on the table. We suggest that 3990X users who typically have Windows 10 Pro do one of these two things: either disable SMT or use Win10 Pro for Workstations/Enterprise. This issue is down to how Windows tracks processor groups, a holdover from multi-socket platforms that shouldn’t apply here; but because it’s hard coded into the OS whenever there are more than 64 threads, it’s a pain.
Then there’s also the workload issue: we saw a number of tests, like Corona, Blender, and even NAMD, work great, which points to rendering and scientific compute benefiting from such a high core count processor. However other programs, such as 7-zip, LuxMark, Photoscan, and others did not see much (if any) of an improvement in performance compared to AMD’s own 32-core CPU.
I’ve heard a lot of silicon engineers say that adding cores helps, but adding frequency helps everything. The question then becomes whether you target workloads that can scale out (more cores) best, or whether scaling up (more frequency) is a better solution. We either end up with target CPUs for one or the other, or a combination CPU that tries to do both.
“[He] who wishes to fight must first count the cost”
In this review we evaluated two directions for AMD’s 64-core 3990X. The first was at the consumer/prosumer level, looking up to improve on their high-end desktop system. The second was at the enterprise level, looking down to see if that single 64-core CPU is actually worth it compared to a dual socket system. The conclusion might shock you. (It might not.)
For the first stage, the consumer/prosumer level, our conclusion is that the usefulness of the 3990X is limited. Aside from a few select instances (as mentioned, Corona, Blender, NAMD) the 32-core Threadripper for half the price performed on par or within margin. For this market, saving that $2000 between the 64-core and the 32-core can easily net another RTX 2080 Ti for GPU acceleration, and this would probably be the preferred option. Unless you run those specific workloads (or ones like them), go for the 32 core and spend the money elsewhere. Aside from the core count there is little to differentiate the two parts.
For the second stage, the enterprise level, it becomes a no-brainer to consolidate a dual socket system into a single AMD CPU – the initial outlay cost is substantially lower, and the long term power costs also come into play. This is what the enterprise likes to combine into ‘Total Cost of Ownership’, or TCO. The TCO and performance advantage of AMD here is plain to see in the benchmarks and the pricing. The situation gets a little muddier when we decide which AMD CPU to choose: typically a server market wants RDIMM memory, which is only supported on the EPYC processors. The difference between the 64-core EPYC 7702P and Threadripper 3990X is minor in terms of cost (under $500), and each CPU has its benefits: EPYC gets more PCIe lanes (128 vs 64) and more memory (8 channel RDIMM vs 4 channel UDIMM), while Threadripper gets better frequencies (2900/4300 vs 2000/3350) for a higher TDP (280W vs 200W). From a server perspective, if you need more IO or more memory, get the EPYC, otherwise Threadripper merits consideration.
“Do many calculations [to] lead to victory”
In the end, the situation for the 3990X is not as clear as it was with the 3970X. It’s a good chip, but it’s not the best chip for everything. I will tell you what it is good at though: ever seen Cinebench R20 complete in 16 seconds? Here you go:
A final thought. The AMD TR 3990X is amusingly priced at $3990. It’s a great marketing idea, and gets people talking. I’m proud to say that this price was my idea – AMD originally had it for something different. I don’t often influence change in the industry in such an obvious way, but this one was fun.
True story: the $3990 price tag on the 3990X is @IanCutress's doing. https://t.co/7CpuwubS6L
— Ryan Smith (@RyanSmithAT) January 6, 2020