Original Link: https://www.anandtech.com/show/9532/the-intel-broadwell-xeon-e3-v4-review-95w-65w-35w-1285-1285l-1265
The Intel Broadwell Xeon E3 v4 Review: 95W, 65W and 35W with eDRAM
by Ian Cutress on August 26, 2015 9:00 AM EST
Our Broadwell coverage on the desktop has included reviews of the two consumer processors and a breakdown of IPC gains from generation to generation. One issue surrounding Broadwell on consumer platforms was that the top quad-core model was rated at one third less power in comparison to previous Intel quad core processors. Specifically, Broadwell is 65W against 84-95W in past generations. This puts Broadwell’s out-of-the-box peak performance at a TDP (and frequency) disadvantage. However, in a somewhat under-the-radar launch, Intel also released a series of Broadwell Xeons under the E3-12xx v4 line. We sourced three socketed models, the E3-1285 v4 at 95W, the E3-1285L v4 at 65W and the E3-1265L v4 at 35W, to get a better scope of Broadwell's scaling across different power requirements.
Broadwell Xeon Overview
In almost every sense of the word, the launch of Broadwell in a socketed format has been fairly muted. For low-power mobile platforms at 4W and 15W, Broadwell was promoted heavily and the architecture has had many design wins; but for the desktop only two socketed consumer parts were launched. To that end, Intel performed only post-launch sampling for review websites, resulting in many users learning about the performance much later after the official launch (AnandTech had you covered on day one!). Even now, several weeks later, the i7-5775C and i5-5675C are both hard to source in several regions. Complicating matters is that Intel’s platform after Broadwell, Skylake, was launched soon after in early August with a bigger focus on gaming and end-user experiences, as well as the announcement of the Skylake Xeon family integrating into dedicated mobile processors. This has the effect of resigning Broadwell on the desktop to obscurity, whether intentionally or not (cue the conspiracy theorists).
The crumb of comfort in Broadwell is its use of 128MB of eDRAM. This acts as a fully associative last level victim cache (or L4) for the processor, and speeds up certain workloads that are memory dependent and subject to L3 cache misses when data has been previously evicted. The eDRAM connects over a narrow double-pumped serial interface capable of delivering 50GB/s in each direction (100GB/s aggregate). Access latency after a miss in the L3 cache is 30-32ns, nicely in between an L3 and a main memory access. The major benefit in our testing was to the integrated graphics, giving Intel the best integrated graphics on a socketed platform where money is no object. Some reviews also saw that the eDRAM helped in discrete graphics gaming as well, although the effect was small and highly game dependent (but this raises other issues regarding higher performance on lower power/frequency processors with more on-package memory). The main downside of the eDRAM, however, is that it is CPU resident, not visible from the system agent to the DRAM, and thus only accessible to CPU/GPU workloads rather than accelerating data over the system IO.
What went further under the radar was the Intel Broadwell platform Xeons for the business and server market. We reported on the launch, but there was seemingly nothing front facing about the marketing of these processors, suggesting Intel might be keeping them as a pure business-to-business product. All bar one of these processors support the eDRAM as well as several Xeon-specific features. Back with Haswell, Intel launched a single soldered (BGA) Xeon with eDRAM. By having three socketed variants (LGA) for Broadwell at ‘launch’, Intel satisfies business customers that want to upgrade from Haswell E3 v3 Xeons and also gives business and server environments the use of that eDRAM. One of the cited uses for it is in-memory databases, giving fewer cache misses by having a larger chunk of faster memory closer to the processing cores.
All the Broadwell Xeons are quad core with hyperthreading, and all bar one have Iris Pro P6300 (the professional version of Iris Pro 6200) on 48 EUs/GT3e. The exception is a single soldered part, which has the eDRAM disabled. (Note that the E3-1284L v3 is listed by CPU-World but not currently listed at ark.intel.com.) Aside from this, the models differ solely on the basis of processor frequency, graphics frequency and thermal design limits. It is interesting to note the differences between the E3-1285 v4 and the E3-1285L v4. Sitting at 95W and 65W respectively, that 30W difference in TDP is represented by only a 100 MHz difference in the base frequency. This is relatively odd, and suggests that the 65W part, the E3-1285L v4, is a better off-the-wafer part with preferred frequency/voltage characteristics, yet it also costs roughly $110, or ~20%, less. This plays a significant part in our testing.
The eDRAM stands out compared to previous Xeons, although at the expense of 2MB of L3 cache compared to previous high end quad core models (or i7 equivalents). Some microprocessor analysts have said that the loss of 2MB of L3 is not that important when backed up by 128MB of a fast L4 type of cache, on the basis that the bandwidth of this L4 is 50GB/s and up before you hit main memory.
Why eDRAM?
In a recent external podcast, David Kanter mentioned that for a multiple increase in cache size (e.g. 3x), cache misses decrease on average by a factor equal to the square root of that multiple (e.g. √3 ≈ 1.73, or roughly 42% fewer misses). So the move from 8MB of last level cache to 128MB + 6MB, despite the minor increase in latency moving out to eDRAM, is an effective 16x increase, reducing cache misses by a factor of four. This means, to quote, ‘if you have eight cache misses per thousand, you are now down to around two’. I take this to mean a regular user workload, but in a higher throughput environment it could mean the difference between 2% and 0.5% cache misses out to main memory. Because the move out to main memory is such a latency and bandwidth penalty compared to an on-package transfer between the CPU and L3/L4, even a small decrease in cache misses has performance potential when used in the right context. Anand quoted Intel’s Tom Piazza back in our Haswell eDRAM review about the size of the eDRAM, and it was stated that 32MB should be enough, but it was doubled and then doubled again just to make sure, in the spirit of ‘go big or go home’. This has knock-on performance effects.
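The square-root rule quoted above can be checked with a few lines of arithmetic. This is only a sketch of the rule of thumb as described, not a model of any real workload; the function names are made up for illustration, and the 8MB and 6MB + 128MB capacities are the ones given in the text.

```python
import math


def miss_reduction_factor(old_cache_mb, new_cache_mb):
    """Square-root rule of thumb: misses fall by sqrt(capacity ratio)."""
    return math.sqrt(new_cache_mb / old_cache_mb)


def misses_after_upgrade(misses_per_thousand, old_cache_mb, new_cache_mb):
    """Scale a miss rate by the rule-of-thumb reduction factor."""
    return misses_per_thousand / miss_reduction_factor(old_cache_mb, new_cache_mb)


# 8MB of L3 (Haswell) versus 6MB of L3 plus 128MB of eDRAM (Broadwell).
factor = miss_reduction_factor(8.0, 6.0 + 128.0)
remaining = misses_after_upgrade(8.0, 8.0, 6.0 + 128.0)
print(f"capacity ratio: {(6.0 + 128.0) / 8.0:.2f}x")
print(f"miss reduction factor: {factor:.2f}")
print(f"8 misses per thousand become about {remaining:.1f}")
```

The exact capacity ratio is 16.75x rather than 16x, which is why the factor lands slightly above four and the quoted 'around two' misses per thousand holds.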
Users upgrading to Broadwell Xeons from Haswell (or those purchasing new systems outright) will get this eDRAM benefit at a lower cost than previous Xeons: the E3-1285L v3 from the Haswell architecture was launched at a price of $774, compared with the E3-1285L v4 at $445. For less money, the Broadwell processor comes with eDRAM and substantially better integrated graphics, all within the same thermal design. At 95W, the price moves from $662 to $556, a much smaller difference. This suggests that on Haswell the lower power model was harder to produce, whereas with Broadwell that burden shifts to frequency.
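To put the two price moves side by side, here is a small sketch of the arithmetic using only the launch prices quoted above. The helper name is hypothetical.

```python
def price_drop(old_usd, new_usd):
    """Return the absolute and percentage price reduction."""
    diff = old_usd - new_usd
    return diff, 100.0 * diff / old_usd


# Launch prices quoted in the text, in US dollars.
low_power = price_drop(774, 445)   # E3-1285L v3 -> E3-1285L v4 (65W)
high_power = price_drop(662, 556)  # 95W comparison, v3 -> v4

print(f"65W: ${low_power[0]} cheaper ({low_power[1]:.1f}%)")
print(f"95W: ${high_power[0]} cheaper ({high_power[1]:.1f}%)")
```

The 65W part drops by well over a third, while the 95W comparison drops by roughly a sixth.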
Graphics Virtualization and Upgrades
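One of the benefits of the Broadwell Xeons with eDRAM lies in Intel's graphics virtualization technology (GVT). This affords three modes of operation: GVT-d (direct assignment of the GPU to a single VM), GVT-s (sharing through a virtual graphics driver) and GVT-g (mediated sharing between multiple VMs using the native graphics driver).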
These virtualization techniques allow data centers to essentially apply an accelerant to each VM depending on the benefits of the GPU to each workload. With it being directly included in the CPU, no additional hardware is needed. Obviously this makes more sense when each virtual machine requires infrequent access to the integrated graphics, but for everything else, Intel is set to launch its Valley Vista platform, which will place three of these CPUs onto an add-in PCIe card.
Valley Vista
At IDF San Francisco this year, an announcement passed almost everyone by. Intel described an add-in card coming in Q4 2015 that features three Broadwell-H E3 Xeon processors on a single PCB, each with Iris Pro graphics.
Valley Vista is designed to allow for high density, workload specific work, in particular AVC transcoding. Aside from the slide above, there have been no real details as to how this card will work: whether there is a PCIe switch for communication, whether it runs in a virtualized layer, how the card is powered, or whether each of the processors on the card will have a fixed amount of DRAM associated with it. So far Supermicro has announced in a press release that one of their Xeon Phi platforms is suitable for the cards when they launch later this year. What we do know, however, is that Broadwell is not fully HEVC accelerated, so the utility in Valley Vista is most likely to be with AVC encode/decode.
Chipsets
As with previous socket drop-ins on the professional line, Intel is promoting the use of its C226 chipset - for our testing, we used an equivalent Z97 platform which worked as well.
This provides an as-is scenario, with sixteen lanes of PCIe 3.0, two channels of DDR3/L-1600 memory with ECC support, a DMI 2.0 x4 link equivalent to 4 GB/s, and up to six native USB 3.0 and SATA 6 Gbps ports, depending on the high-speed IO configuration used in the chipset in conjunction with its eight PCIe lanes.
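For a sense of scale against the eDRAM's 50GB/s per direction, the theoretical peaks of the interfaces listed above can be worked out as follows. This is a rough sketch: the per-lane transfer rates and line encodings (8 GT/s with 128b/130b for PCIe 3.0, 5 GT/s with 8b/10b for DMI 2.0) and the 64-bit DDR3 channel width are not stated in this review and come from the published interface specifications. The function names are illustrative only.

```python
def serial_link_gbs(gt_per_s, lanes, payload_bits, total_bits):
    """Peak GB/s in one direction for a serial link with line encoding."""
    return gt_per_s * lanes * payload_bits / total_bits / 8.0


def dram_gbs(mt_per_s, channels, bus_bytes=8):
    """Peak GB/s for DRAM channels with a 64-bit (8-byte) data bus each."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


# Assumed signalling parameters (from interface specs, not from the review).
pcie3_x16 = serial_link_gbs(8.0, 16, 128, 130)  # PCIe 3.0, 128b/130b
dmi2_x4 = serial_link_gbs(5.0, 4, 8, 10)        # DMI 2.0, 8b/10b
ddr3_1600_dual = dram_gbs(1600, 2)              # two DDR3-1600 channels

print(f"PCIe 3.0 x16: {pcie3_x16:.2f} GB/s per direction")
print(f"DMI 2.0 x4: {dmi2_x4:.1f} GB/s per direction, {2 * dmi2_x4:.1f} GB/s both ways")
print(f"DDR3-1600 dual channel: {ddr3_1600_dual:.1f} GB/s")
```

Under these assumptions the 4 GB/s DMI figure corresponds to both directions combined, and dual-channel DDR3-1600 peaks at about half of the eDRAM's quoted per-direction bandwidth.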
This Review
There isn’t much else to say here: we have covered Broadwell on the desktop, and the differences are spelled out for end users despite the current lack of direct availability in certain markets. These are Xeon processors, so there is no overclocking here, but the main comparison to make is between the E3-1285 v4 at 95W and the E3-1276 v3 at 84W. The v3 has some extra frequency (it peaks at 4 GHz) and extra L3 cache, but the v4 has eDRAM.
Compared to Johan’s in depth server reviews, the focus for the testing in this piece is primarily on workstation environments. Because we did not get a 95W ‘consumer’ based Broadwell for comparison, gaming tests were also performed. Unfortunately the Linux based server tests we typically use were not performed due to a spectacular failing of our Ubuntu LiveCD with these processors, even though it worked with the non-Xeon counterparts. We’re still trying to figure this one out, but we suspect it is a driver related issue. While in no way similar, in its stead we have SPECviewperf 12 on Windows with a discrete GPU (its typical use case) as an additional angle of comparison.
A side note to those who have recently asked: we are in the process of looking into appropriate repeatable compilation benchmarks and VM environment comparisons. Ideally we are aiming to finalize a series of tests that can be one-click batched and processed within a reasonable testing timeframe. These will not be ready until mid-September at the earliest due to other commitments, but when available we will try to run a number of past systems to acquire appropriate comparative data. To add comments, suggestions or preferences on the tests, please email [email protected].
Test Setup
Processor | Intel Xeon E3-1285 v4 (95W), Intel Xeon E3-1285L v4 (65W), Intel Xeon E3-1265L v4 (35W)
Motherboards | MSI Z97A Gaming 6
Cooling | Cooler Master Nepton 140XL
Power Supply | OCZ 1250W Gold ZX Series
Memory | G.Skill RipjawsZ 4x4 GB DDR3-1866 9-11-11 Kit
Video Cards | ASUS GTX 980 Strix 4GB, MSI GTX 770 Lightning 2GB (1150/1202 Boost), ASUS R7 240 2GB
Hard Drive | Crucial MX200 1TB
Optical Drive | LG GH22NS50
Case | Open Test Bed
Operating System | Windows 7 64-bit SP1
The dynamics of CPU Turbo modes, both Intel and AMD, can cause concern in environments with a variable threaded workload. There is also the added issue of keeping behavior consistent between motherboards, depending on how the motherboard manufacturer wants to add in their own boosting technologies over the ones that Intel would prefer they used. In order to remain consistent, we implement an OS-level unique high performance mode on all the CPUs we test, which should override any motherboard manufacturer performance mode.
All of our benchmark results can also be found in our benchmark engine, Bench.
Many thanks to...
We must thank the following companies for kindly providing hardware for our test bed:
Thank you to AMD for providing us with the R9 290X 4GB GPUs.
Thank you to ASUS for providing us with GTX 980 Strix GPUs and the R7 240 DDR3 GPU.
Thank you to ASRock and ASUS for providing us with some IO testing kit.
Thank you to Cooler Master for providing us with Nepton 140XL CLCs.
Thank you to Corsair for providing us with an AX1200i PSU.
Thank you to Crucial for providing us with MX200 SSDs.
Thank you to G.Skill and Corsair for providing us with memory.
Thank you to MSI for providing us with the GTX 770 Lightning GPUs.
Thank you to OCZ for providing us with PSUs.
Thank you to Rosewill for providing us with PSUs and RK-9100 keyboards.
Load Delta Power Consumption
Power consumption was tested on the system in a single GTX 770 configuration with a wall meter connected to the OCZ 1250W power supply. This power supply is Gold rated, and as I am in the UK on a 230-240 V supply, this leads to ~75% efficiency above 50W and 90%+ efficiency at 250W, suitable for both idle and multi-GPU loading. This method of power reading allows us to compare how the UEFI and the board manage power delivery to components under load, using the power delta from idle to CPU loading. All results include typical PSU losses due to efficiency.
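As a rough illustration of the load delta method, the calculation is a simple subtraction of two wall readings, with an optional efficiency adjustment to estimate what reaches the components. The wall readings below are hypothetical placeholders rather than measured results, and applying one flat efficiency figure to a reading is a simplifying assumption.

```python
def load_delta_watts(idle_wall_w, load_wall_w):
    """Power delta at the wall between idle and CPU load."""
    return load_wall_w - idle_wall_w


def dc_side_estimate(wall_w, efficiency):
    """Estimate power delivered by the PSU given a flat efficiency (0 to 1)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return wall_w * efficiency


# Hypothetical wall meter readings in watts, for illustration only.
idle_wall = 60.0
load_wall = 130.0

delta = load_delta_watts(idle_wall, load_wall)
print(f"load delta at the wall: {delta:.0f} W")
print(f"load reading after PSU losses at 75% efficiency: {dc_side_estimate(load_wall, 0.75):.1f} W")
```

Because the reported numbers include PSU losses, the delta at the wall will overstate what the processor itself draws, which is worth bearing in mind when comparing against TDP.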
Power numbers are typically difficult to gauge as they depend on the stock voltage of the processor and how aggressive the motherboard wants to be in order to ensure stability. If I were thinking from the point of view of the motherboard manufacturer, they are more likely to overvolt a Xeon processor to ensure that stability rather than deal with any unstable platforms. As a result, we get an odd scenario where the 35W processor is almost hitting double the power consumption at load, and the 65W is also above its mark, but the 95W is below. To put an angle on this, the 110W we see on the i7-6700K was in one motherboard, but in another we have seen 76W as well as 84W. Without having access to the BIOS DVFS tables for each processor, it is difficult to tell when we have mismatched data such as this.
Professional Performance: Windows
Agisoft Photoscan – 2D to 3D Image Manipulation
Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and different phases of the model reconstruction require either fast memory, fast IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert it into a medium quality model. This benchmark typically takes around 15-20 minutes on a high end PC on the CPU alone, with GPUs reducing the time.
The eDRAM here saves nearly two minutes compared with the v3.
Cinebench R15
Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.
We've seen that Broadwell can organise threads slightly better than Haswell, along with its IPC increases and ability to manage more data in its buffers. As a result, while single thread is pretty much par for the course between the v3 and v4, the multithreaded result puts the v4 ahead of the v3.
HandBrake v0.9.9
For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container. Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.
With our HandBrake tests, historically low quality encodes with small frames require a purely faster processor, whereas large high quality frames need more memory accesses. This is why the E3 v3 at 84W and E3 v4 at 35W come out near similar - the eDRAM of the v4 helps push a little ahead here. That being said, the improvements in Skylake show what perhaps the future v5 Xeons might be capable of.
Hybrid x265
Hybrid is a new benchmark, where we take a 4K 1500 frame video and convert it into an x265 format without audio. Results are given in frames per second.
SPECviewperf 12 on a GTX 980
By popular demand, we have introduced SPECviewperf 12 into our testing regimen from August 2015. SPEC is the well-known purveyor of industry standard benchmarks, often probing both fundamental architectural behavior of processors and controllers, as well as comparing performance with well understood industry software and automated tools. It is this last point we pick up: SPECviewperf 12 tests the responsiveness of graphics packages in the fields of design, medical, automotive and energy. The benchmarks focus purely on responsiveness and the ability to both display and rotate complex models to aid in design or interpretation, using each package's internal graphics schema (at 1080p). We run this set with a discrete graphics card, similar to the workstation environments in which they would be used. As this is a new benchmark, we are still filling the system with data.
At a certain point it seems that most tests are graphics card bound, however a few show up that having the fastest processor makes a difference. Differences from the Haswell platforms score +5% at best, although a bigger difference can be seen going further back in CPU generations. At this point with a discrete graphics card, SPECviewperf's tests are more akin to our gaming tests when it comes to responsiveness.
Office Performance
Dolphin Benchmark
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.
Dolphin prefers single threaded speed and IPC, so the extra frequency of the v3 wins out here. The disparity between the 65W/95W v4 processors and the 35W processor is most obvious here.
WinRAR 5.0.1
Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.
WinRAR is our classic 'eDRAM works here!' benchmark, clearly showing how Broadwell benefits, although one might argue that WinRAR is not a typical workload environment. It is also pertinent to note that the 95W v4 doesn't win here in this variable-threaded load.
3D Particle Movement
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores.
Similar to CineBench, in single threaded mode the v3 wins out due to the faster frequency, but in multithreaded mode the advancements in the Broadwell core, namely better thread resource management, put at least the 95W v4 ahead.
FastStone Image Viewer 4.9
FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and results are given in seconds.
Web Benchmarks
On the lower end processors, general usability is a big factor of experience, especially as we move into the HTML5 era of web browsing. For our web benchmarks, we take four well known tests with Chrome 35 as a consistent browser.
For web implementations, both Kraken and Octane see benefits moving up to Broadwell, but it is worth noting that moving to Skylake brings an even bigger benefit. This again comes down to the management of CPU instructions between threads, and the benefits associated with keeping the knowledge of past instructions or information in lower cache levels. It would seem in this regard, if you count these benchmarks as indicative of a real workload, that web-based throughput implementations are limited more by in-flight operations than by any other resource.
Gaming Benchmarks: Low End
To satisfy our curiosity regarding high power and low power eDRAM based Xeons in gaming, we ran our regular suite through each processor. On this page are our integrated graphics results, along with a cheaper graphics solution in the R7 240 DDR3.
Alien: Isolation
If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.
For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.
Total War: Attila
The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.
For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.
Grand Theft Auto V
The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
For our test we have scripted a version of the in-game benchmark, relying only on the final part which combines a flight scene along with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames under 60 FPS (16.6ms).
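The two GTA V metrics described above can be derived from a log of per-frame render times. The following is a minimal sketch assuming the benchmark exposes frame times in milliseconds; the sample values and function names are invented for illustration and are not taken from our results.

```python
def average_fps(frame_times_ms):
    """Average frame rate: total frames divided by total time in seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percent_under_fps(frame_times_ms, fps_threshold=60.0):
    """Percentage of frames slower than the threshold frame rate."""
    limit_ms = 1000.0 / fps_threshold  # 60 FPS is about 16.6ms per frame
    slow = sum(1 for t in frame_times_ms if t > limit_ms)
    return 100.0 * slow / len(frame_times_ms)


# Invented frame times in milliseconds, for illustration only.
sample = [10.0, 12.5, 20.0, 16.0, 25.0, 14.0, 18.0, 12.0]

print(f"average: {average_fps(sample):.1f} FPS")
print(f"frames under 60 FPS: {percent_under_fps(sample):.1f}%")
```

Reporting both numbers matters because a run can average above 60 FPS while still spending a sizeable share of its frames below that mark, as this sample does.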
GRID: Autosport
No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.
GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.
Middle-Earth: Shadow of Mordor
The final title in our testing is another battle of system performance with the open world action-adventure title, Shadow of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.
For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also redone at 3840x2160, also with Ultra settings, and we also test two cards at 4K where possible.
Gaming Benchmarks: Mid-Range
To satisfy our curiosity regarding high power and low power eDRAM based Xeons in gaming, we ran our regular suite through each processor. On this page are our results with a mid-range card, the R9 285, and an ex-high end card, the GTX 770 (a GTX 680 rebadge).
Gaming Benchmarks: High End
To satisfy our curiosity regarding high power and low power eDRAM based Xeons in gaming, we ran our regular suite through each processor. On this page are our results with the top models at their respective release dates – the GTX 980 and R9 290X. To answer some questions regarding our use of GTX 980s rather than GTX 980 Tis, the simple answer is that for long term platform testing, we need a consistent graphics setup which changes every couple of years. This is coupled with the difficulty of sourcing several cards at once from our contacts that have available budget to do so. At the time of this current cycle, the GTX 980 was NVIDIA’s top model and ASUS stepped up to the plate with a set of 980 Strix cards. Similarly, AMD provided directly two of MSI’s R9 290X 4GB models. When it comes time to update the cycle (and/or games), we try and test the new graphics on as many CPUs as possible. But this does take a substantial amount of time to set up each platform (X99, Z170, Z97, Z77, X79, X58, FM2+, AM3) and run the gauntlet of i7/i5/i3/FX/A10/A8 processors on each. That’s not to say it’s not fun, but it is a comparatively large time investment hence the perceived long generational delay (in terms of graphics) between GPU-on-CPU updates.
Alien: Isolation
If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, ratcheting up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.
For low end graphics, we test at 720p with Ultra settings, whereas for mid and high range graphics we bump this up to 1080p, taking the average frame rate as our marker with a scripted version of the built-in benchmark.
Total War: Attila
The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.
For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.
Grand Theft Auto V
The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
For our test we have scripted a version of the in-game benchmark, relying only on the final part, which combines a flight scene with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames rendered slower than 60 FPS (a frame time above 16.6 ms).
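As a rough illustration of how these two GTA V metrics can be derived from a frame time log, here is a minimal Python sketch. The function name `summarize_frame_times`, the input format (a list of per-frame times in milliseconds) and the sample values are assumptions made for this example. They are not our benchmark script or measured data.

```python
def summarize_frame_times(frame_times_ms, threshold_fps=60.0):
    """Return (average FPS, percent of frames slower than threshold_fps).

    frame_times_ms is a hypothetical list of per-frame render times in
    milliseconds, such as a frame time logger might produce.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")

    # Average frame rate = frames rendered / total elapsed time.
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds

    # A frame is "under 60 FPS" when it takes longer than 1000/60 ms.
    threshold_ms = 1000.0 / threshold_fps
    slow_frames = sum(1 for t in frame_times_ms if t > threshold_ms)
    percent_under = 100.0 * slow_frames / len(frame_times_ms)

    return average_fps, percent_under


# Made-up frame times for illustration only, not measured results.
sample_ms = [10.0, 10.0, 20.0, 40.0]
avg_fps, pct_under_60 = summarize_frame_times(sample_ms)
print(f"average: {avg_fps:.1f} FPS, under 60 FPS: {pct_under_60:.1f}% of frames")
```

Note that the average is taken over total elapsed time rather than as a mean of per-frame FPS values, so a handful of very slow frames pulls it down in proportion to how long they actually took.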
GRID: Autosport
No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.
GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.
Middle-Earth: Shadow of Mordor
The final title in our testing is another battle of system performance with the open world action-adventure title, Shadow of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.
For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is redone at 3840x2160, again with Ultra settings, and we also test two cards at 4K where possible.
Intel Broadwell Xeon E3 v4 Conclusions
If you skipped to the end without looking at the benchmark data, we’re going to throw a few graphs in here for good measure.
We said at the start of this review that one of the key comparisons to examine was the E3-1285 v4 at 95W against the E3-1276 v3 at 84W, due to their similar (if not ideally matched) thermal design power ratings. The differentiator here is that the E3 v3 has some extra frequency, but the E3 v4 range has eDRAM.
At least, we thought this should be the battle to be had, but it is clear from the results that something else is more interesting. Comparing the E3-1285 v4 at 95W to its lower power variant, the E3-1285L v4 at 65W, we see that the low power variant scores better on almost all benchmarks.
The only difference between these two processors aside from the 30W of TDP should be the 100 MHz gap in favor of the 95W part. We said at the beginning that this 100 MHz does not adequately explain 30W in the grand scheme of things, so the lower powered model must also have a substantially better voltage/frequency profile. This, as it happens, has some knock-on effects.
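To put a rough number on why 100 MHz cannot account for 30W, here is a back-of-the-envelope Python sketch using the textbook dynamic power relation P ∝ C·V²·f. The base clocks of 3.5 GHz and 3.4 GHz are assumed for illustration, as is the simplification of treating TDP as if it were actual power draw, which it is not. The helper name `implied_voltage_ratio` is also made up for this example.

```python
def implied_voltage_ratio(power_ratio, frequency_ratio):
    """Voltage ratio implied by P ~ C * V^2 * f at equal capacitance.

    Rearranging P1/P2 = (V1/V2)^2 * (f1/f2) gives
    V1/V2 = sqrt((P1/P2) / (f1/f2)).
    """
    return (power_ratio / frequency_ratio) ** 0.5


# TDP ratings from the review: 95W part vs 65W part.
tdp_ratio = 95.0 / 65.0

# Assumed base clocks (GHz), chosen to match the 100 MHz gap.
freq_ratio = 3.5 / 3.4

voltage_ratio = implied_voltage_ratio(tdp_ratio, freq_ratio)

print(f"TDP ratio:             {tdp_ratio:.3f}")
print(f"frequency ratio:       {freq_ratio:.3f}")
print(f"implied voltage ratio: {voltage_ratio:.3f}")
```

Under these assumptions the frequency gap is about 3% while the TDP gap is about 46%, so the bulk of the difference would have to come from the voltage term. That is consistent with the lower power part sitting on a better voltage/frequency curve, although it is only a sketch and not a measurement.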
In a couple of CPU tests, the extra frequency wins. This boils down to only 3DPM and Sunspider, both tests that arguably are neither extensively pressing the processor nor exhaustive in their capabilities. But the lower power model, by virtue of the better binning, is able to keep its higher frequency turbo mode available for 100% of the time in our testing, ultimately giving a higher frequency and completing work quicker. This is despite the base frequency of the E3-1285 v4 being higher, and it points to a variable turbo frequency profile based on power draw. To cap it off, the E3-1285L v4 is also $111 cheaper. So when the two processors are put side-by-side, the decision is obvious. We would choose the E3-1285L v4 every time.
This means the title fight should be between the 65W E3-1285L v4 and the E3-1276 v3 at 84W. Here it gets a little more edgy: the v4 here is technically 100 MHz above the i7-5775C which we looked at in our last Broadwell review, and the larger TDP difference makes this more of a performance-per-watt efficiency discussion than the 95W vs 84W comparison was.
In DRAM heavy scenarios such as WinRAR, which requires a large amount of cache to retain dictionary compression tables, the benefits of the eDRAM are easy to see. Benchmarks on the integrated graphics also win out due to the Iris Pro P6300. For discrete graphics, the Broadwell parts certainly win over the v3 for efficiency at this point, with results between the two being almost identical. But the big one to note here is Photoscan in pure CPU mode, where Broadwell takes a minor lead. Photoscan uses a set of fifty two-dimensional photos with no depth information to create a three-dimensional image over several stages, so managing that data around the memory subsystem becomes a handful when there are 40,000 data points per picture in flight. This benchmark was suggested to us by an archivist at a national library who uses it to recreate models of the artifacts in their storage for external examination.
For all the other CPU tests, a dichotomy appears. The higher frequency v3 wins for compute driven performance, but data driven metrics (and efficiency) are the realm of Broadwell, Xeon or otherwise, as long as there is frequency to match.
As mentioned on the first page of the review, on the suggestion of a number of our readers and based on these interesting results, we are looking into other avenues which are also data driven. Previously our Chrome compilation benchmark was a featured set piece in our testing, but it has fallen away and an equivalent needs to be reintroduced. As a result, we are speaking to some users and looking into a series of tests of this nature that afford a repeatable and consistent point of analysis, ideally in an automated context and encompassing a variety of projects and languages. At present there is no time frame for introduction, but September affords some time to focus on the project and then test a number of processors on it. Thoughts and suggestions should be forwarded to [email protected].