Original Link: https://www.anandtech.com/show/13116/the-intel-xeon-w-review-w-2195-w-2155-w-2123-w-2104-and-w-2102-tested
The Intel Xeon W Review: W-2195, W-2155, W-2123, W-2104 and W-2102 Tested
by Ian Cutress & Joe Shields on July 30, 2018 1:00 PM EST- Posted in
- CPUs
- Intel
- Xeon
- Workstation
- ECC
- Skylake-SP
- Skylake-X
- Xeon-W
- Xeon Scalable
Anyone looking at a high-end Intel system has three choices: Core i9, Xeon W, or the larger socket Xeon Scalable. The first two both use the LGA2066 socket and have essentially identical core/frequency configurations, but are in effect different platforms, with motherboards locked to each. The benefits of Xeon W and Xeon Scalable lie in support for ECC memory, vPro management features, and, on some processors, different cache variants.
In a previous generation, Intel had workstation counterparts in line with its high-end desktop line. Both of the products used the same socket, which made it easier for consumers, and the same single socket motherboard that held a Core i7 could also run a range of Xeons: the big E7 series, the dual-socket focused E5-2600 series, and the workstation-focused E5-1600 series. The benefits of these workstation chips were primarily for ECC memory, management features, and OEM support.
Cycle forward to today, and due to socket bifurcation, none of the server focused processors will fit into a modern HEDT platform. To that end Intel created the Xeon W family, born from the E5-1600 line, which matches up with the consumer line but with the usual ECC/OEM add-ons. Intel also cut consumer chipset support, pushing Xeon W out of consumer hands and purely into the OEM/system market due to the lack of server chipset based motherboards at retail. Despite the doom and gloom, Supermicro recently sampled us their server chipset X11SRA and a handful of Xeon W processors for review.
The Xeon W Line-Up
Announced back in August 2017, the Xeon W launch was somewhat unexpected: we had reason to believe that Intel would introduce components for the consumer high-end desktop socket with ECC, however what form that would take was unknown, especially with processors up to 18 cores being released on the consumer side. Intel would ultimately have to draw parity with the Xeon W line, potentially causing a shift in its single socket market on the server side as well.
In the end, Intel released eight new Xeon W processors for the market, along with two off-roadmap processors for particular OEMs, and a single version for Apple. The configurations are essentially identical to the consumer HEDT line, using the Enterprise-focused Skylake-SP cores with new AVX512 instructions, a new mesh interconnect, and a rearranged cache structure focusing on L2 data. We have covered the changes compared to the standard Skylake-S core in detail in our initial Skylake-X and Skylake-SP reviews.
What Intel has done with the Xeon W processors compared to the consumer HEDT line is focus more on the lower core count models: in the full line-up there are four quad-core models, some with hyperthreading, but there are also two six-core parts, a single eight-core part, and a single ten-core part. One could interpret this SKU differentiation as Intel not focusing as much on the high-end with the Xeon W line: where the consumer line has products at 18/16/14/12/10 cores, the Xeon W only has 18/14/10/8/6 as the top five models.
Intel Xeon-W Processors (LGA2066)
Model | Cores/Threads | Base Freq. | Turbo 2.0 | L3 (MB) | L3/core (MB) | DDR4 ECC | TDP | Price |
Xeon W-2195 | 18/36 | 2.3 GHz | 4.3 GHz | 24.75 | 1.375 | 2666 | 140 W | $2553 |
Xeon W-2175 | 14/28 | 2.5 GHz | 4.3 GHz | 19.25 | 1.375 | 2666 | 140 W | $1947 |
Xeon W-2155 | 10/20 | 3.3 GHz | 4.5 GHz | 13.75 | 1.375 | 2666 | 140 W | $1440 |
Xeon W-2145 | 8/16 | 3.7 GHz | 4.5 GHz | 11.00 | 1.375 | 2666 | 140 W | $1113 |
Xeon W-2135 | 6/12 | 3.7 GHz | 4.5 GHz | 8.25 | 1.375 | 2666 | 140 W | $835 |
Xeon W-2133 | 6/12 | 3.6 GHz | 3.9 GHz | 8.25 | 1.375 | 2666 | 140 W | $617 |
Xeon W-2125 | 4/8 | 4.0 GHz | 4.5 GHz | 8.25 | 2.063 | 2666 | 120 W | $444 |
Xeon W-2123 | 4/8 | 3.6 GHz | 3.9 GHz | 8.25 | 2.063 | 2666 | 120 W | $294 |
Xeon W-2104* | 4/4 | 3.2 GHz | - | 8.25 | 2.063 | 2400 | 120 W | $255 |
Xeon W-2102* | 4/4 | 2.9 GHz | - | 8.25 | 2.063 | 2400 | 120 W | $202 |
*Off Roadmap | ||||||||
Apple Only SKUs | ||||||||
Xeon W-2191B | 18/36 | 2.3 GHz | 4.3 GHz | 24.75 | 1.375 | 2666 | ? | - |
Xeon W-2170B | 14/28 | 2.5 GHz | 4.3 GHz | 19.25 | 1.375 | 2666 | ? | - |
Xeon W-2150B | 10/20 | 3.0 GHz | 4.5 GHz | 13.75 | 1.375 | 2666 | ? | - |
Xeon W-2140B | 8/16 | 3.2 GHz | 4.2 GHz | 11.00 | 1.375 | 2666 | ? | - |
One of the other changes is in AVX512 compatibility. With the Xeon Scalable processors, each core has the equivalent of two AVX512 FMA ports to maximize bandwidth, except the off-roadmap SKUs that have one. The consumer product line also has two, although Intel initially said certain parts only had one. These Xeon W parts will also have two AVX512 FMA ports each, allowing hand-tuned code to use AVX512 to its fullest. Xeon W also has ECC memory, which is usually one of the main reasons to buy the processors.
Each of the CPUs can support up to 512GB of DDR4-2666 ECC RDIMMs (DDR4-2400 on the W-2104 and W-2102) in a quad channel configuration. With two modules per channel, that works out to eight modules of up to 64GB apiece. This is up from the 128 GB UDIMM support in the consumer space, but lower than the 768GB RDIMM support for Xeon Scalable (which has six memory channels).
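The capacity figures above follow from simple multiplication. As a minimal sketch, assuming two DIMM slots per memory channel on every platform and module sizes inferred from the quoted totals (both are assumptions for illustration, not figures from a specification sheet):

```python
# Maximum memory = channels x DIMMs per channel x largest module size.
# ASSUMPTION: two DIMMs per channel on each platform; the module sizes are
# inferred from the totals quoted in the text, not taken from a spec sheet.


def max_memory_gb(channels: int, dimms_per_channel: int, module_gb: int) -> int:
    """Return the maximum installable memory in GB."""
    return channels * dimms_per_channel * module_gb


platforms = {
    "Core i9 (UDIMM)": max_memory_gb(channels=4, dimms_per_channel=2, module_gb=16),
    "Xeon W (RDIMM)": max_memory_gb(channels=4, dimms_per_channel=2, module_gb=64),
    "Xeon Scalable (RDIMM)": max_memory_gb(channels=6, dimms_per_channel=2, module_gb=64),
}

for name, capacity in platforms.items():
    print(f"{name}: {capacity} GB")
```

Under these assumptions the sketch reproduces the 128GB, 512GB, and 768GB totals, which shows that the Xeon Scalable advantage comes purely from the two extra channels rather than from larger modules.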
A small note about the ‘different’ processors in the stack. The W-2102 and W-2104 are the low-end quad-core processors without hyperthreading or Turbo, but these are classified as ‘off roadmap’. These processors are not for sale to all OEMs, like the others, and typically are built for specific OEMs that have contracts with specific customers in mind. As a result, pricing lists will not show these parts, and to be honest, Intel does not really like talking about them as promoting them has no inherent value. Of course from our perspective, we like examining every member of the stack, regardless of how widely available it is.
The other set of different processors are the Apple-only parts. These are only found in the Late 2017 model of the updated iMac Pro. Take for example the Xeon W-2150B - this 10-core processor is almost identical to the 10-core Xeon W-2155, but has a lower base frequency of 3.0 GHz (compared to 3.3 GHz). The lower base frequency should reduce the TDP of the processor, however these processors rarely run at base frequency and almost always in a turbo state, where TDP is undefined, making it difficult to place this processor. It is most likely a part that is binned well for voltage, frequency, and power. Again, this is another part that isn’t available to everyone (but if someone has one, we’d love to test it).
All these processors will require a motherboard that uses the C422 chipset. These chipsets are almost identical to the X299 chipsets used in the consumer platform, but are firmware locked to Xeon W processors only, with support for ECC. Because of the split between the consumer and workstation platforms, there are very few C422 motherboards in the open market for custom builders – most OEMs (Dell, Supermicro) build their own internal motherboards for pre-built systems specifically for their own customers, and optimized for their intended outcome (performance, price, etc.).
Per Core Turbo Data
Intel's per-core turbo data for these workstation parts are split up into three sections, due to the instruction sets they have. On the 'hardest' instructions, Intel uses special turbo values for AVX-512, as due to the way these instructions are processed, more heat is generated on chip. The chip has to balance frequency and power draw, so the AVX-512 data comes in at a lower frequency in order to keep the turbo in check.
The first thing to notice with this data is that for most CPUs, when the whole CPU is using AVX-512 instructions, the frequency will drop below the base frequency. For chips like the Xeon W-2123 and W-2133, even single core loading of AVX-512 will drop the frequency below the base frequency. Intel's base frequency does two things: first, it tells you the frequency at which TDP is applicable, and second it is the guaranteed minimum frequency for regular non-AVX instructions.
Behind AVX-512 is AVX2, which is still somewhat of a strain on the processor beyond regular instructions, but not as much. Where AVX-512 requires dedicated die area for support of the vector units, AVX2 is built into the back-end of the standard core design.
For AVX2, the W-2133 and W-2123 still end up below the base frequency of the processor. But for the big ones, like the W-2195, the full 18-core loading of AVX2 is 500 MHz faster than AVX-512. This is just an indication that users that are fine-tuning code should think about how much of the AVX-512 unit they can keep fed - the AVX-512 unit despite the 500 MHz difference is expected to be faster no doubt, but a half-fed AVX-512 might get trumped by a full AVX2.
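To make the 'half-fed AVX-512' point concrete, here is a rough back-of-the-envelope model. It assumes two FMA ports per core for both instruction sets, single-precision data, and a placeholder all-core AVX-512 frequency of 2.4 GHz. That frequency is hypothetical; only the 500 MHz gap comes from the text.

```python
# Peak single-precision GFLOPS per core:
#   lanes per vector x FMA ports x 2 ops (multiply + add) x frequency in GHz


def peak_gflops_per_core(vector_bits: int, freq_ghz: float, fma_ports: int = 2) -> float:
    """Theoretical peak for one core, ignoring memory and scheduling limits."""
    lanes = vector_bits // 32  # number of 32-bit floats per vector register
    return lanes * fma_ports * 2 * freq_ghz


AVX512_FREQ_GHZ = 2.4                   # HYPOTHETICAL placeholder frequency
AVX2_FREQ_GHZ = AVX512_FREQ_GHZ + 0.5   # the 500 MHz gap quoted in the text

avx512_peak = peak_gflops_per_core(512, AVX512_FREQ_GHZ)
avx2_peak = peak_gflops_per_core(256, AVX2_FREQ_GHZ)

# Fraction of AVX-512 peak that code must sustain to match fully fed AVX2.
break_even = avx2_peak / avx512_peak

print(f"AVX-512 peak: {avx512_peak:.1f} GFLOPS/core")
print(f"AVX2 peak:    {avx2_peak:.1f} GFLOPS/core")
print(f"AVX-512 must stay above {break_even:.0%} of peak to win")
```

With these placeholder numbers, AVX-512 code has to keep its vector units busy for more than roughly 60% of peak before it overtakes a fully fed AVX2 path. The threshold is always above 50% because the wider unit runs at the lower clock.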
For the regular instructions, turbo goes a bit like this:
For a number of users, the key metrics here are the all-core turbos, with the 18-core part having an all-core turbo of 3.2 GHz. Interestingly the W-2155 and W-2145 sit well here: for any code that can't reliably go beyond 12-14 threads, having the higher frequency but lower core count part might actually perform better. We saw a bit of this in our review, with the variable threaded loads executing somewhat better on the W-2155 than the W-2195.
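The trade-off between core count and frequency can be sketched with a simple Amdahl-style model. The single-core turbo values and the 3.2 GHz all-core turbo of the W-2195 come from the article; the 4.0 GHz all-core turbo for the W-2155 and the 10% serial fraction are hypothetical placeholders. The model also ignores hyperthreading and the intermediate turbo bins a chip uses when only some cores are loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    single_core_ghz: float
    all_core_ghz: float


def relative_runtime(cpu: Cpu, serial_fraction: float, max_threads: int) -> float:
    """Amdahl-style estimate in arbitrary units (lower is better).

    The serial part runs on one core at single-core turbo; the parallel part
    is spread over the usable cores, all running at the all-core turbo.
    """
    usable_cores = min(cpu.cores, max_threads)
    serial_time = serial_fraction / cpu.single_core_ghz
    parallel_time = (1.0 - serial_fraction) / (usable_cores * cpu.all_core_ghz)
    return serial_time + parallel_time


W2195 = Cpu("W-2195", cores=18, single_core_ghz=4.3, all_core_ghz=3.2)
# The all-core turbo below is a HYPOTHETICAL placeholder for illustration.
W2155 = Cpu("W-2155", cores=10, single_core_ghz=4.5, all_core_ghz=4.0)

SERIAL_FRACTION = 0.10  # hypothetical workload: 10% of the work cannot be threaded

for threads in (8, 12, 14, 18):
    t_2195 = relative_runtime(W2195, SERIAL_FRACTION, threads)
    t_2155 = relative_runtime(W2155, SERIAL_FRACTION, threads)
    faster = W2155.name if t_2155 < t_2195 else W2195.name
    print(f"{threads:>2} threads: {faster} is faster")
```

With these inputs the ten-core part stays ahead while the workload cannot scale past the low teens of threads, and the 18-core part only pulls away once the extra cores can actually be used. Different placeholder values will move the crossover point.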
Then and Now: Defining a Workstation
By splitting the motherboard support for workstation grade processors, Intel has (whether on purpose or not) redefined what it means to have an Intel workstation. In previous generations, a certain market of users would happily invest into an E5-2640 style processor and place it into a single socket consumer motherboard, taking advantage of a potentially better-binned processor, and on some motherboards that qualified it, ECC memory. Depending on the location and time, in some instances this method was cheaper than going for the similar grade consumer processor. Due to the motherboard support, these systems were certainly more widespread compared to today. In fact, some users are currently looking to eBay and investing in older 8-core and 10-core processors because they are extremely affordable.
In 2018, for the Intel workstation enthusiast, the situation is complicated and confusing.
If a workstation user looks at consumer-grade hardware, they can get the cores and the motherboard, but lose the ECC memory and co-processor compatibility commensurate with a professional system: some motherboard hardware may not be qualified with Quadro, Tesla, FirePro, Xilinx, Altera, etc., because those motherboards aren’t built for that market.
If a workstation user looks at professional-grade hardware, it becomes a case of either struggling to self-build based on availability, or paying through the nose for an OEM system that might have some horrendous markup. We spoke to one OEM in years past, who said that the prices on the website were almost fictitious – most of their sales in this area come from large-scale corporate contracts which offer discounts based on volume. The single home-brew workstation user was not their target market, unless they wanted to pay the high prices.
Similar SKU Comparison
Features | Skylake-X (i9-7980XE) | Xeon-W (Xeon W-2195) | Skylake-SP (Xeon 6140/M) |
Platform | X299 | C422 | C620 |
Socket | LGA2066 | LGA2066 | LGA3647 |
Cores/Threads | 18 / 36 | 18 / 36 | 18 / 36 |
Top Base/Turbo (GHz) | 2.6 / 4.2 | 2.3 / 4.3 | 2.3 / 3.7 |
GPU PCIe 3.0 | 44 | 48 | 48 |
DRAM / DDR4 | 128GB UDIMM Quad-Channel | 512GB RDIMM+LRDIMM Quad Channel | 768GB / 1536GB RDIMM+LRDIMM Six Channel |
512-bit FMAs | 2 | 2 | 2 |
Max Sockets | 1 | 1 | 4 |
TDP | 165W | 140W | 140W |
Price | $1999 | $2553 | $2451 / $5448 |
Ultimately if a user is going above and beyond for an OEM system, it might be worth looking into Xeon Scalable processors, especially if multiple sockets in a single system are required. This increases the expense significantly, however. The benefits of building a consumer-based workstation, if ECC memory is not needed, also come down to clock speed and AVX-512 support.
The alternative is to look at AMD’s workstation offering, Threadripper, which is cheaper, offers similar core counts, ECC memory support (depending on the motherboard), and more PCIe lanes, but does not have AVX-512 and can suffer from a non-uniform memory architecture for software that requires a lot of core-to-core and core-to-memory communication. The multi-socket option here is EPYC, which gets more cores and more system memory, but no increase in PCIe lanes due to the way the platform shares resources.
Intel vs AMD Comparison
Features | Xeon-W (Xeon W-2145) | AMD Ryzen TR 1900X | AMD EPYC 7401P |
Platform | C422 | X399 | - |
Socket | LGA2066 | TR4 | SP3r2 |
Cores/Threads | 8 / 16 | 8 / 16 | 24 / 48 |
Base/Turbo (GHz) | 3.7 / 4.5 | 3.8 / 4.0 | 2.0 / 3.0 |
GPU PCIe 3.0 | 48 | 60 | 124 |
DRAM / DDR4 | 512GB RDIMM+LRDIMM Four Channel | 128GB UDIMM Four Channel | 2 TB RDIMM+LRDIMM Eight Channel |
AVX-512 | 2 FMA | - | - |
L3 Cache | 11 MB | 32 MB | 64 MB |
TDP | 140W | 180W | 170W |
Price | $1113 | $549 | $1075 |
A lot of purchasing decisions will be skewed specifically for the workflow in mind, which is one of the reasons why we have so many benchmarks in play for our reviews – there is no ‘one benchmark fits all’ scenario, and we are now in a situation where there are multiple options to choose from depending on the size of the wallet.
This Review
For our analysis today, we were able to secure five of the Xeon W processors: the top-end 18-core W-2195, the mid-range ten-core W-2155, the more budget quad-core W-2123, and the two off-roadmap processors in the W-2104 and W-2102.
We have put these processors through our current generation testing suite, with the Spectre and Meltdown patches applied. The main targets for comparison are Intel’s Skylake-X high-end desktop platform, Intel’s Skylake-S consumer platform, and AMD’s Ryzen and Threadripper platforms.
The motherboard used in our review is the Supermicro X11SRA, one of the more 'available' Xeon W motherboards on the market.
You can read our review of the motherboard here.
We must also say thank you to Kingston for sampling us some DDR4-2666 C19 RDIMM Memory for this review.
Xeon W processors support RDIMM ECC memory and our motherboard here would not accept UDIMMs, so Kingston kindly supplied the memory needed. The KSM26RS8/8HAI modules were faultless in our testing.
Pages In This Review
- Overview of Xeon W
- Test Setup and Power Consumption
- CPU Benchmarking: Office Tests
- CPU Benchmarking: System Tests
- CPU Benchmarking: Rendering Tests
- CPU Benchmarking: Encoding Tests
- CPU Benchmarking: Web Tests
- CPU Benchmarking: Legacy Tests
- Spectre vs Meltdown: SYSMark
- Conclusions: Is Intel Serious About Xeon W?
Test Bed and Setup
As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer. Where possible, we will extend our testing to include faster memory modules either at the same time as the review or a later date.
Test Setup
Processors | Xeon | W-2195 | W-2155 | W-2123 | W-2104 | W-2102 |
| Cores | 18C/36T | 10C/20T | 4C/8T | 4C/4T | 4C/4T |
| Base | 2.3 GHz | 3.3 GHz | 3.6 GHz | 3.2 GHz | 2.9 GHz |
| Turbo | 4.3 GHz | 4.5 GHz | 3.9 GHz | - | - |
| Price | $2553 | $1440 | $294 | $255 | $202 |
Motherboard | Supermicro X11SRA (BIOS v1.10a), Spectre / Meltdown Patches Applied |
Cooling | Corsair H115i |
Power Supply | Corsair HX750 |
Memory | Kingston 4x8GB DDR4 2666 CL19-19-19-443 RDIMM (KSM26RS8/8HAI) |
Memory Settings | DDR4 2666 CL16-18-18-35 2T |
Video Cards | ASUS Strix GTX 980 |
Hard Drive | Crucial MX300 1TB |
Optical Drive | TSST TS-H653G |
Case | Open Test Bed |
Operating System | Windows 10 Pro 64-bit |
Power Consumption
For our power consumption testing, we place the system under a heavy Prime95 load and then take the power consumption reading from the internal CPU sensor. This is the sensor that determines power resources on the system, as well as how fan speeds and throttling should be adjusted. On most platforms we get a breakdown of chip-wide power compared to core power and memory controller power, however the Skylake-SP platform has all that removed and we can only get full-chip power. We also test on a per-core level.
At full load, all of our Xeon W chips are underneath TDP, despite running at full core turbo. The Xeon W-2102 and Xeon W-2104 really show that these quad-core parts are certainly well below the 120W rating. When we look at the higher core count Xeon W parts, we see that the power consumption is well below the corresponding Core i9 processors - the Xeon W-2195 for example is only 123W, compared to 162W for the Core i9 version.
If we look at the chips when only a single thread is loaded, then the Xeon W chips spread out a bit more. The AMD parts fare better here compared to the W-2195, but the W-2195 is still below the Core i9 parts.
Many thanks to...
We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this testbed specifically but is used in other testing.
Thank you to ASUS for providing us with GTX 980 Strix GPUs. At the time of release, the STRIX brand from ASUS was aimed at silent running, or to use the marketing term: '0dB Silent Gaming'. This enables the card to disable the fans when the GPU is dealing with low loads well within temperature specifications. These cards equip the GTX 980 silicon with ASUS' Direct CU II cooler and 10-phase digital VRMs, aimed at high-efficiency conversion. Along with the card, ASUS bundles GPU Tweak software for overclocking and streaming assistance.
The GTX 980 uses NVIDIA's GM204 silicon die, built upon their Maxwell architecture. This die has 5.2 billion transistors for a die size of 398 mm², built on TSMC's 28nm process. A GTX 980 uses the full GM204 core, with 2048 CUDA Cores and 64 ROPs with a 256-bit memory bus to GDDR5. The official power rating for the GTX 980 is 165W.
The ASUS GTX 980 Strix 4GB (or the full name of STRIX-GTX980-DC2OC-4GD5) runs a reasonable overclock over a reference GTX 980 card, with frequencies in the range of 1178-1279 MHz. The memory runs at stock, in this case, 7010 MHz. Video outputs include three DisplayPort connectors, one HDMI 2.0 connector, and a DVI-I.
Further Reading: AnandTech's NVIDIA GTX 980 Review
Thank you to Crucial for providing us with MX300 SSDs. Crucial stepped up to the plate as our benchmark list grows larger with newer benchmarks and titles, and the 1TB MX300 units are strong performers. Based on Marvell's 88SS1074 controller and using Micron's 384Gbit 32-layer 3D TLC NAND, these are 7mm high, 2.5-inch drives rated for 92K random read IOPS and 530/510 MB/s sequential read and write speeds.
The 1TB models we are using here support TCG Opal 2.0 and IEEE-1667 (eDrive) encryption and have a 360TB rated endurance with a three-year warranty.
Further Reading: AnandTech's Crucial MX300 (750 GB) Review
Thank you to Corsair for providing us with Vengeance LPX DDR4 Memory, HX750 Power Supply, and H115i CPU Cooler.
Corsair kindly sent a 4x8GB DDR4 2666 set of their Vengeance LPX low profile, high-performance memory for our stock testing. The heatsink is made of pure aluminum to help remove heat from the sticks and has an eight-layer PCB. The heatsink is a low profile design to help fit in spaces where there may not be room for a tall heat spreader; think a SFF case or using a large heatsink. Timings on this specific set come in at 16-18-18-35. The Vengeance LPX line supports XMP 2.0 profiles for easily setting the speed and timings. It also comes with a limited lifetime warranty.
Powering the test system is Corsair's HX750 Power Supply. This HX750 is a dual mode unit able to switch from a single 12V rail (62.5A/750W) to five rails (40A max ea.) and is also fully modular. It has a typical selection of connectors, including dual EPS 4+4 pin, four PCIe connectors and a whopping 16 SATA power leads, as well as four 4-pin Molex connectors.
The 135mm fluid dynamic bearing fan remains off until it is 40% loaded offering complete silence in light workloads. The HX750 comes with a ten-year warranty.
In order to cool these high-TDP HEDT CPUs, Corsair sent over its latest and largest AIO in the H115i. This closed-loop system uses a 280mm radiator with 2x140mm SP140L PWM controlled fans. The pump/block combination mounts to all modern CPU sockets. Users are also able to integrate this cooler into the Corsair link software via USB for more control and options.
Benchmarking Performance: CPU Office Tests
The office programs we use for benchmarking aren't specific programs per se, but industry standard tests that hold weight with professionals. The goal of these tests is to use an array of software and techniques that a typical office user might encounter, such as video conferencing, document editing, architectural modelling, and so on.
All of our benchmark results can also be found in our benchmark engine, Bench.
Chromium Compile (v56)
Our new compilation test uses Windows 10 Pro, VS Community 2015.3 with the Win10 SDK to compile a nightly build of Chromium. We've fixed the test for a build in late March 2017, and we run a fresh full compile in our test. Compilation is the typical example given of a variable threaded workload - some of the compile and linking is linear, whereas other parts are multithreaded.
Our popular Chrome Compile test gives a good showing for the Intel CPUs, however the higher-powered Core i9 processors perform a lot better here - up to 50% in fact. Part of this is down to memory; the DDR4-2666 C19 memory is slower than the DDR4-2666 C16 used in our Core i9 reviews. However, there might also be a case for power draw - the BIOS defaults for the Core i9 processors allow for a lot more power consumption, which the Xeon W processors might not be able to tap in to. It is worth noting that the W-2155 wins against the W-2195, showing that in this test frequency matters as much as cores.
SYSmark 2014 SE
SYSmark is developed by Bapco, a consortium of industry CPU companies. The goal of SYSmark is to take stripped down versions of popular software, such as Photoshop and Onenote, and measure how long it takes to process certain tasks within that software. The end result is a score for each of the three segments (Office, Media, Data) as well as an overall score. Here a reference system (Core i3-6100, 4GB DDR3, 256GB SSD, Integrated HD 530 graphics) is used to provide a baseline score of 1000 in each test.
A note on context for these numbers: AMD left Bapco in the last two years due to differences of opinion on how the benchmarking suites were chosen. AMD believed the tests were angled towards Intel processors and had optimizations that showed bigger differences than what AMD felt was present. The following benchmarks are provided as data, with the conflict of opinion between the two companies on the validity of the benchmark provided as context.
PCMark 10
PCMark 10 is the latest all-in-one office-related performance tool that combines a number of tests for low-to-mid office workloads, including some gaming, but focusing on aspects like document manipulation, response, and video conferencing.
In the Physics score, the W-2195 takes a commanding lead, however the W-2155 is not far behind, offering a better performance per dollar metric. Both are outclassed by the Threadripper 1950X in this test, however. In fact, the only test where Xeon W truly wins is in the Creation test.
GeekBench4
GB4 is a popular tool in benchmarking, with most users liking its cross-platform functionality. Due to requests, we are including the data in our reviews. Our benchmark database has a more detailed breakdown of the sub-sections in the test.
GeekBench 4 is still a newer benchmark in our test suite, hence the lack of comparative results.
PCMark8
Despite originally coming out in 2008/2009, Futuremark has maintained PCMark8 to remain relevant in 2017. On the scale of complicated tasks, PCMark focuses more on the low-to-mid range of professional workloads, making it a good indicator for what people consider 'office' work. We run the benchmark from the commandline in 'conventional' mode, meaning C++ over OpenCL, to remove the graphics card from the equation and focus purely on the CPU. PCMark8 offers Home, Work and Creative workloads, with some software tests shared and others unique to each benchmark set.
Benchmarking Performance: CPU System Tests
Our first set of tests is our general system tests. This set of tests is meant to emulate more of what people usually do on a system, like opening large files or processing small stacks of data. This is a bit different to our office testing, which uses more industry standard benchmarks, and a few of the benchmarks here are relatively new and different.
All of our benchmark results can also be found in our benchmark engine, Bench.
FCAT Processing
One of the more interesting workloads that has crossed our desks in recent quarters is FCAT - the tool we use to measure stuttering in gaming due to dropped or runt frames. The FCAT process requires enabling a color-based overlay onto a game, recording the gameplay, and then parsing the video file through the analysis software. The software is mostly single-threaded, however because the video is basically in a raw format, the file size is large and requires moving a lot of data around. For our test, we take a 90-second clip of the Rise of the Tomb Raider benchmark running on a GTX 980 Ti at 1440p, which comes in around 21 GB, and measure the time it takes to process through the visual analysis tool.
FCAT likes single threaded performance, which puts the high frequency parts with faster memory near the top.
Dolphin Benchmark
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that ray traces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.
Dolphin is also purely ST frequency driven; however, in a surprise twist, our Xeon W-2155 beats the Core i7-8086K in this test, although within the margin of error.
3D Movement Algorithm Test v2.1
This is the latest version of the self-penned 3DPM benchmark. The goal of 3DPM is to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and decreasing the amount of double->float->double recasts the compiler was adding in. It affords a ~25% speed-up over v2.0, which means new data.
3DPM likes fast cache and frequency, and the W-2195 is almost fighting with the Core i9-7980XE here, and is let down slightly by its slow memory. The 1950X is still top dog.
DigiCortex v1.20
Despite being a couple of years old, the DigiCortex software is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation. The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.
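As a rough illustration of how such a real-time fraction can be read, the sketch below treats the score as simulated time divided by wall-clock time. This interpretation and the sample numbers are assumptions for illustration, not DigiCortex's documented formula.

```python
def realtime_ratio(simulated_seconds: float, wall_clock_seconds: float) -> float:
    """Simulated time advanced per second of wall-clock time (ASSUMED definition)."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return simulated_seconds / wall_clock_seconds


def is_realtime_capable(ratio: float) -> bool:
    """Per the text, anything above a value of one is suitable for real-time work."""
    return ratio > 1.0


# HYPOTHETICAL sample runs: (simulated seconds, wall-clock seconds)
sample_runs = {"fast system": (10.0, 8.0), "slow system": (10.0, 16.0)}

for label, (simulated, wall) in sample_runs.items():
    ratio = realtime_ratio(simulated, wall)
    print(f"{label}: {ratio:.2f}x real-time, capable={is_realtime_capable(ratio)}")
```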
DigiCortex is a memory focused benchmark, but can also take advantage of AVX2 and sometimes AVX512, hence why the W-2195 is sat at the top. That being said, it is above the i9-7980XE, despite the latter having dual AVX512 ports.
Agisoft Photoscan 1.3.3
Photoscan stays in our benchmark suite from the previous version, however now we are running on Windows 10 so features such as Speed Shift on the latest processors come into play. The concept of Photoscan is translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the model. The algorithm has four stages, some single threaded and some multi-threaded, along with some cache/memory dependency in there as well. For some of the more variable threaded workload, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.
Agisoft is a mixture of workloads, although the big multithreaded bit in the middle tends to dominate. Both the W-2195 and W-2155 score the same time, with a cluster of results around it. The Core i9-7960X sits on top though, with a seemingly better mix of cores and threads.
Benchmarking Performance: CPU Rendering Tests
Rendering tests are a long-time favorite of reviewers and benchmarkers, as the code used by rendering packages is usually highly optimized to squeeze every little bit of performance out. Sometimes rendering programs end up being heavily memory dependent as well - when you have that many threads flying about with a ton of data, having low latency memory can be key to everything. Here we take a few of the usual rendering packages under Windows 10, as well as a few new interesting benchmarks.
All of our benchmark results can also be found in our benchmark engine, Bench.
Corona 1.3
Corona is a standalone package designed to assist software like 3ds Max and Maya with photorealism via ray tracing. It's simple - shoot rays, get pixels. OK, it's more complicated than that, but the benchmark renders a fixed scene six times and offers results in terms of time and rays per second. The official benchmark tables list user submitted results in terms of time, however I feel rays per second is a better metric (in general, scores where higher is better seem to be easier to explain anyway). Corona likes to pile on the threads, so the results end up being very staggered based on thread count.
Corona is very multi-threaded, so we expect most of the chips to push their legs on this one. The difference between the W-2195 and the Core i9-7980XE is much more as we expect for a fully MT test, with the W-2155 trading blows with the TR 1920X and the lower quad-core SKUs bringing up the rear.
Blender 2.78
For a renderer that has been around for what seems like ages, Blender is still a highly popular tool. We managed to wrap up a standard workload into the February 5 nightly build of Blender and measure the time it takes to render the first frame of the scene. Being one of the bigger open source tools out there, it means both AMD and Intel work actively to help improve the codebase, for better or for worse on their own/each other's microarchitecture.
Blender is a very threaded test, but not completely, as we can see with the W-2195 still trailing even the Core i9-7960X. The W-2104 is pushing against the Core i5-6600K, despite the lower frequency, due to the quad-channel memory in play.
LuxMark v3.1
As a synthetic, LuxMark might come across as somewhat arbitrary as a renderer, given that it's mainly used to test GPUs, but it does offer both an OpenCL and a standard C++ mode. In this instance, aside from seeing the comparison in each coding mode for cores and IPC, we also get to see the difference in performance moving from a C++ based code-stack to an OpenCL one with a CPU as the main host.
POV-Ray 3.7.1b4
Another regular benchmark in most suites, POV-Ray is another ray-tracer but has been around for many years. It just so happens that during the run up to AMD's Ryzen launch, the code base started to get active again with developers making changes to the code and pushing out updates. Our version and benchmarking started just before that was happening, but given time we will see where the POV-Ray code ends up and adjust in due course.
Cinebench R15
The latest version of CineBench has also become one of those 'used everywhere' benchmarks, particularly as an indicator of single thread performance. High IPC and high frequency give performance in ST, whereas having good scaling and many cores is where the MT test wins out.
Cinebench is a 'classic' benchmark, despite being four generations behind the Cinema4D software at this point. The W-2195 goes toe-to-toe in the multithreaded test against the TR 1950X, but easily wins against it in the single threaded test. The W-2195 also beats the i9-7980XE in ST, but loses in MT.
Benchmarking Performance: CPU Encoding Tests
One of the interesting elements on modern processors is encoding performance. This includes encryption/decryption, as well as video transcoding from one video format to another. In the encrypt/decrypt scenario, this remains pertinent to on-the-fly encryption of sensitive data - a process that more modern devices are leaning on for software security. Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who want to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.
7-Zip 9.2
One of the freeware compression tools that offers good scaling performance between processors is 7-Zip. It runs under an open-source licence, is fast, and is an easy-to-use tool for power users. We run the benchmark mode via the command line for four loops and take the output score.
AMD's prowess in decompression means that it takes the top spot, however overall the W-2195 and the i9-7980XE are competing for top spot.
WinRAR 5.40
For the 2017 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack (33 video files in 1.37 GB, 2834 smaller website files in 370 folders in 150 MB) of compressible and incompressible formats. The results shown are the time taken to encode the files. Due to DRAM caching, we run the test 10 times and take the average of the last five runs when the benchmark is in a steady state.
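The steady-state averaging described above can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual test harness: the function name `steady_state_average` and the timings in `example_runs` are hypothetical and chosen purely to show the warm-up effect.

```python
from statistics import mean


def steady_state_average(run_times, keep_last=5):
    """Average only the final `keep_last` runs, discarding the warm-up runs."""
    if len(run_times) < keep_last:
        raise ValueError("need at least `keep_last` runs to average")
    return mean(run_times[-keep_last:])


# Hypothetical timings in seconds for ten runs (not measured data).
# The first few runs are slower because the files are not yet cached in DRAM.
example_runs = [41.8, 37.2, 35.9, 35.1, 35.0, 34.9, 35.1, 35.0, 34.8, 35.2]

steady = steady_state_average(example_runs)
print(f"Steady-state time: {steady:.1f} s")
```

Discarding the first five runs keeps the cold-cache outliers from inflating the reported time.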
WinRAR likes cores and memory, and it seems that even the W-2155 can win against the Core i9-7980XE in this test. Despite the quad channel memory for the Xeon W quad core parts, the low frequency means they are bringing up the rear. The W-2123 hits mid-pack, actually beating the Threadripper 1950X in this test.
AES Encoding
Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.
HandBrake v1.0.2 H264 and HEVC
As mentioned above, video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file-size, trade quality for file-size, or do all of the above, and can increase encoding rates to help accelerate decoding rates. Alongside Google's favorite codec, VP9, there are two others that are taking hold: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H265), which aims to provide the same quality as H264 but at a lower file-size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content.
Handbrake is a favored tool for transcoding, and so our test regime takes care of three areas.
Low Quality/Resolution H264: Here we transcode a 640x266 H264 rip of a 2 hour film, and change the encoding from Main profile to High profile, using the very-fast preset.
High Quality/Resolution H264: A similar test, but this time we take a ten-minute double 4K (3840x4320) file running at 60 Hz and transcode from Main to High, using the very-fast preset.
HEVC Test: Using the same video in HQ, we change the resolution and codec of the original video from 4K60 in H264 into 4K60 HEVC.
Benchmarking Performance: CPU Web Tests
One of the issues when running web-based tests is the nature of modern browsers to automatically install updates. This means any sustained period of benchmarking will invariably fall foul of the 'it's updated beyond the state of comparison' rule, especially when browsers will update if you give them half a second to think about it. Despite this, we were able to find a series of commands to create an un-updatable version of Chrome 56 for our 2017 test suite. While this means we might not be on the bleeding edge of the latest browser, it makes the scores between CPUs comparable.
*Due to some issues in our web testing, only the following tests had scores that were comparable.
Google Octane 2.0
Along with Mozilla, as Google is a major browser developer, having peak JS performance is typically a critical asset when comparing against the other OS developers. In the same way that SunSpider is a very early JS benchmark, and Kraken is a bit newer, Octane aims to be more relevant to real workloads, especially in power constrained devices such as smartphones and tablets.
WebXPRT 2015
While the previous three benchmarks do calculations in the background and represent a score, WebXPRT is designed to be a better interpretation of visual workloads that a professional user might have, such as browser based applications, graphing, image editing, sort/analysis, scientific analysis and financial tools.
Benchmarking Performance: CPU Legacy Tests
Our legacy tests represent benchmarks that were once at the height of their time. Some of these are industry standard synthetics, and we have data going back over 10 years. All of the data here has been rerun on Windows 10, and we plan to go back several generations of components to see how performance has evolved.
3D Particle Movement v1
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores. This is the original version, written in the style of a typical non-computer science student coding up an algorithm for their theoretical problem, and comes without any non-obvious optimizations not already performed by the compiler, such as avoiding false sharing.
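To give a flavor of the kind of workload involved, here is a minimal Python sketch of a random-walk particle mover. It is not the 3DPM source: the function names, the choice of a unit-length step in a uniformly random direction, and the particle and step counts are all assumptions made for illustration.

```python
import math
import random


def random_unit_step(rng):
    """Return a unit-length step in a uniformly random 3D direction."""
    # Picking z uniformly in [-1, 1] and an angle in [0, 2*pi) gives a
    # uniform distribution of directions over the sphere.
    z = rng.uniform(-1.0, 1.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(theta), r * math.sin(theta), z)


def move_particles(num_particles, num_steps, seed=0):
    """Move every particle by `num_steps` random unit steps from the origin."""
    rng = random.Random(seed)
    positions = [[0.0, 0.0, 0.0] for _ in range(num_particles)]
    for _ in range(num_steps):
        for p in positions:
            dx, dy, dz = random_unit_step(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    return positions


# Hypothetical problem size, chosen only to keep the example quick.
final_positions = move_particles(num_particles=100, num_steps=50)
```

The inner loop is almost entirely floating point math with no dependence between particles, which is why a workload of this shape rewards frequency and IPC in one thread and core count when threaded.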
CineBench 11.5 and 10
Cinebench is a widely known benchmarking tool for measuring performance relative to MAXON's animation software Cinema 4D. Cinebench has been optimized over a decade and focuses on purely CPU horsepower, meaning if there is a discrepancy in pure throughput characteristics, Cinebench is likely to show that discrepancy. Arguably other software doesn't make use of all the tools available, so the real world relevance might purely be academic, but given our large database of data for Cinebench it seems difficult to ignore a small five minute test. We run the modern version 15 in this test, as well as the older 11.5 and 10 due to our back data.
x264 HD 3.0
Similarly, the x264 HD 3.0 package we use here is also kept for historical regression data. The latest version is 5.0.1, and encodes a 1080p video clip into a high quality x264 file. Version 3.0 only performs the same test on a 720p file, and in most circumstances the software performance hits its limit on high end processors, but still works well for mainstream and low-end. Also, this version only takes a few minutes, whereas the latest can take over 90 minutes to run.
Testing Spectre and Meltdown: SYSMark
As we were performing this testing, the issue of Spectre and Meltdown reared its ugly head. After 40 hours of testing, we realised that the motherboard was not BIOS patched for the latest issues, and we reached out to get the latest update, and had to retest all over again.
It was around this time that Intel also reached out to us to give us the results of their own performance testing relating to the patches. The long and short of the discussions about Intel's results was that the patches affected systems with older hardware the most, and systems that had fast storage (SSD vs HDD) also took the brunt of the performance hit.
For our testing, we took the SYSMark benchmark and did a before and after comparison. We confirmed the patches were applied by using the Inspectre tool before running in patched mode. You can read our analysis of the Spectre and Meltdown issues in the following articles:
- Meltdown & Spectre: Analyzing Performance Impacts on Intel's NUC7i7BNH
- Intel Publishes Spectre & Meltdown Hardware Plans: Fixed Gear Later This Year
- Intel CEO Addresses the Industry on Meltdown and Spectre Issues in Open Letter
- Intel Forms Product Assurance and Security Group amid Meltdown and Spectre Fallout
- Understanding Meltdown & Spectre: What To Know About New Exploits That Affect Virtually All CPUs
SYSMark 2014 SE
For the overall score, every processor lost some performance:
The biggest overall loser in real terms was the W-2155, which mixes single core performance with many threads. This is interesting - the processor with the most threads, the W-2195, did not have such a percentage dip. This might be related to how each of these processors is laid out differently: the W-2195 uses Intel's HCC 18-core die, whereas the W-2155 uses the LCC 10-core die. The HCC die has extra core-to-core latency because of the larger floorplan, which might hide some of the deficiencies here.
If we compare the percentage decrease across all of the SYSMark sub-tests:
We can see that the biggest decreases are seen in the Response sub-test, which contributes a lot to the overall score decreases. The response sub-test uses a fair amount of storage, which we know is likely to be the biggest loser from the patches. However, our overall decreases in performance range from 2.0% on the small slow core to 5.6% on the 10-core and back down to 3.5% on the largest 18-core part. The hardest hit tests were down 12%.
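For readers who want to reproduce this kind of before/after comparison on their own scores, the percentage decrease is a one-line calculation. The sketch below uses hypothetical score values (they are not our SYSMark results) and a function name of our own choosing.

```python
def percent_decrease(before, after):
    """Percentage drop from the unpatched score to the patched score."""
    if before <= 0:
        raise ValueError("the unpatched score must be positive")
    return (before - after) / before * 100.0


# Hypothetical scores for illustration only (not measured data).
unpatched_score = 1000.0
patched_score = 944.0

drop = percent_decrease(unpatched_score, patched_score)
print(f"Performance decrease: {drop:.1f}%")
```

Because SYSMark scores are higher-is-better, a positive result here means the patched system is slower.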
Conclusion: Is Intel Serious About Xeon W?
In this review, we have covered the performance of three of the more popular Xeon W processors, as well as two off-roadmap parts, and discussed how Intel’s decision to bifurcate the way its workstation and consumer processors work has put more questions on the table for prospective buyers.
This ultimately comes down to the question: Is Intel Serious About Xeon W? If we ask Intel about this, of course the answer to them is yes – they want to have target markets and have a product portfolio that they feel will fit with that user base. However I am not so sure.
Xeon W was launched a lot later than both the Xeon Scalable platform and the equivalent Skylake-X consumer platform. The messaging behind Xeon W is unclear to a large degree, with only a limited amount of PR invested into it, unlike Xeon Scalable or Skylake-X. The decision to split the market between consumer and workstation, despite having a common socket, has minimized the accessibility of the workstation platform: fewer discussions are being had about the hardware, because there’s little room for a truly mix-and-match scenario as with previous generations. At no point in Intel’s messaging were we offered review samples for example, which is usually an indication that the product line is not one that the product managers are looking to promote. Only Intel’s latest Xeon E designs, released 10 months after the first equivalent consumer parts, beat Xeon W in terms of how un-exciting it can be to try to discuss a platform. Intel does not want to sample Xeon E, either.
So will Intel lose workstation market share to AMD? If I am being so pallid, what are the financial ramifications for this market? AMD’s Threadripper looks like an appealing platform for workstation users for sure, but AMD is not without its own issues. Intel is the incumbent, and has embedded itself with a large number of OEMs and end-users for years, making it difficult for AMD to break into that market. AMD’s chiplet design will take a few generations to get used to, so users might stick with ‘what they know’, regardless of any cost/benefit analysis. There is also the discussion of ECC support on Threadripper, for which the messaging has been somewhat unclear: technically it should support up to ECC LRDIMMs, however it does depend a lot on whether the motherboard vendor has qualified their product for RDIMMs or LRDIMMs. Most of them have not, complicating the issue. If AMD wants to tackle this space, it needs an ASUS or a GIGABYTE to build a ‘workstation focused’ motherboard, with confirmed ECC and co-processor support. GIGABYTE’s Designare line and MSI’s upcoming X399 Creation might be aimed at this, but it really does require a razor-sharp message to get through.
All this confusion means that while AMD can be competitive in most tests, Intel is expected to remain the market leader for the foreseeable future.
I’m Sold on Xeon W: Tell Me About Performance
If our benchmarks are anything to go by, there is a lot of parity in performance between Intel’s Xeon W and Intel’s Skylake-X product lines. Xeon W takes a hit in memory workloads, because of the memory support: ECC RDIMMs are typically run at base JEDEC sub-timings, and so our DDR4-2666 memory was run at 19-19-19, compared to the 16-16-17 on the consumer platform which is more typical.
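To put those sub-timings in context, the first number in each set is the CAS latency in clock cycles, and it can be converted to nanoseconds with the standard relation latency = CL × 2000 / data rate (in MT/s). The short Python sketch below applies that relation; the function name is our own, and it treats DDR4-2666 as a nominal 2666 MT/s.

```python
def cas_latency_ns(cas_cycles, data_rate_mts):
    """Convert CAS latency in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock period in
    nanoseconds is 2000 / data_rate_mts.
    """
    return cas_cycles * 2000.0 / data_rate_mts


DATA_RATE = 2666  # nominal DDR4-2666 data rate in MT/s

xeon_w_ns = cas_latency_ns(19, DATA_RATE)    # ECC RDIMM at JEDEC timings
consumer_ns = cas_latency_ns(16, DATA_RATE)  # typical consumer kit

print(f"CL19: {xeon_w_ns:.2f} ns, CL16: {consumer_ns:.2f} ns")
```

That works out to roughly 14.3 ns against 12.0 ns for the first access, which is consistent with Xeon W falling behind in latency-sensitive memory workloads.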
Our Xeon W results are skewed a bit towards the low-end processors, mostly because three of the five units we managed to acquire were quad-core processors. At this level, Intel’s now EOL Kaby Lake-X processors fared better, or the consumer Coffee Lake-S look like the better option, unless the user needs ECC or more PCIe lanes than the consumer products provide. The obvious counterpoint here is that if a user needs ECC, and is happy with 64 GB maximum memory support, then Intel’s own Xeon E is also an option, however we have not tested those parts yet (if any OEM can sample them to us, please let us know).
On the high-end, we do see the W-2195 sit behind the Core i9-7980XE in almost all benchmarks, which also means that for embarrassingly parallel workloads, it sits behind the Threadripper 1950X as well. Even so, the single threaded performance of the Xeon W, despite the lack of Turbo Boost 3.0, gives it a significant advantage over AMD in single threaded workloads.
For users worrying about Spectre and Meltdown patches affecting performance, in our SYSMark tests we saw a 2-6% decrease over all the tests, with the hardest hit tests seeing a 12% decrease due to the correlation with storage.
Why Buy Xeon W?
The obvious reasons to buy Xeon W processors are just tick boxes: ECC memory, PCIe lanes, co-processor verification. If these are needed, the number of options for the rest of the system (particularly the motherboard) becomes slim, especially when factoring in price and total cost of ownership. A lot of the workstation market works on development cycles and high-throughput compute: the faster the compute, the quicker the prototyping. The fastest processors for a lot of that work, if CPU bound, are the consumer Core i9 or Threadripper; however, if the above boxes need to be ticked, then Xeon W would be needed. Or Xeon Scalable, depending on budget.
A small side note to end: If anyone has access to any of the Apple-only Xeons (like the W-2150B) and would kindly let us borrow it for a review, please let me know over email.