Original Link: https://www.anandtech.com/show/8679/intel-haswellep-xeon-12-core-review-e5-2650l-v3-and-e5-2690-v3



As part of our Haswell-EP coverage, the next two processors on our test beds are both 12 core variants. The E5-2650L V3 is a surprising monster, giving 12 Haswell cores at 1.8 GHz with 2.5 GHz turbo for only 65W, while the E5-2690 V3 extends the power budget to 135W for all 12 cores at a 2.6 GHz base frequency.

With Haswell-EP, the landscape of the server and workstation CPU has changed ever so slightly. In previous generations, Intel balanced core counts with frequency at the same power level. This makes a lot of sense, as you cannot add cores or frequency without drawing more power. What makes Haswell-EP feel different is a slight change in strategy as to how the new core designs are binned according to their characteristics.

As we explained in our Haswell-EP overview and the review of our 10c samples, all Intel Haswell-EP CPUs are derived from three main die designs. These die designs all look fairly similar, varying in the number of columns or cores, which allows Intel to keep the range of designs simple while also letting it fuse off bad cores and keep yields high.

One of the other factors in high yields is binning - the process of separating the dies into those that require less voltage for the same frequency, those that need more voltage, or those with cores that are disabled. This gives CPU manufacturers room to adjust both voltage and frequency to hit a desired power point. With Haswell-EP, Intel is being very adventurous in its goals - the same die design underpins both the CPUs in today's review, but the two dies have different voltage/frequency characteristics within their power targets.

Statistically, the further away from the average yield point a die is, the more money Intel can charge for it because there will be fewer of them in that bin. It is not always that simple, as pricing also depends on demand and market positioning. As discussed in the 10c review, Intel's main competition is with itself, so it has to be able to convince old customers to upgrade. Being more aggressive with die binning allows customers to optimize and configure the design they want.

This is why I say that the general expectation that raising the core count results in a lower frequency has shifted with Haswell-EP. By being aggressive with their binning strategy, Intel can approach customers and say 'how many cores / what frequency / what power window do you need? Choose two and we can tell you what we can do on the third'. This is why processors like the E5-2650L V3 exist: a high core count (12 cores, 24 threads), reasonable frequency part (1.8 GHz base, 2.5 GHz turbo) that has a TDP of only 65W. It sounds rather crazy.

The Market

In the broadest sense, Intel tackles several markets with the entire Haswell-EP line. At the low end we get quad core parts with lots of cache and DDR4 support aimed at memory bandwidth constrained applications that do not rely on core count. Moving on up we have workstation related SKUs, extended support units, off-roadmap designs for specific customers, a lot of SKUs that won't ever reach the shelves as individual parts, parts focused on 2P compute, and others for networking, infrastructure and compute-dependent storage.

Our two CPUs today, while both 12C/24T parts, are aimed at different markets as well.

The E5-2690 V3 sits in a stack of CPUs that share a 2.3-2.6 GHz base frequency but vary in their core counts. The reason why a customer moves up and down this stack is more about cost vs. performance of the part itself, with a moderate rise in power consumption as cores are added. The E5-2690 V3 has a TDP of 135W, and this kind of performance stack has been very typical since Intel's Xeon class went multicore.

The E5-2650L V3, by virtue of its 'L' designation, sits in the low power segment. The combination of high performance and low power seems like an oxymoron, however it is targeted at code that can take advantage of more cores over more frequency. Some programs do not scale with additional frequency due to other bottlenecks in the system, such as memory accesses, but can take advantage of more cores. The E5-2650L V3 does this in 65W TDP, less than half of the E5-2690 V3.

Intel Xeon E5 2600 v3 SKU Comparison

Xeon E5         Cores/Threads   TDP     Clock Speed (GHz), Base - Turbo   Price

High Performance (35-45MB LLC)
2699 v3         18/36           145W    2.3-3.6                           $4115
2698 v3         16/32           135W    2.3-3.6                           $3226
2697 v3         14/28           145W    2.6-3.6                           $2702
2695 v3         14/28           120W    2.3-3.3                           $2424

"Advanced" (20-30MB LLC)
2690 v3         12/24           135W    2.6-3.5                           $2090
2685 v3         12/24           120W    2.6-3.5                           $2090
2680 v3         12/24           120W    2.5-3.3                           $1745
2660 v3         10/20           105W    2.6-3.3                           $1445
2658 v3 (E)     12/24           105W    2.2-2.9                           $1832
2650 v3         10/20           105W    2.3-3.0                           $1167

Midrange (15-25MB LLC)
2640 v3         8/16            90W     2.6-3.4                           $939
2630 v3         8/16            85W     2.4-3.2                           $667
2620 v3         6/12            85W     2.4-3.2                           $422

Frequency optimized (10-20MB LLC)
2687W v3        10/20           160W    3.1-3.5                           $2141
2667 v3         8/16            135W    3.2-3.6                           $2057
2643 v3         6/12            135W    3.4-3.7                           $1552
2637 v3         4/8             135W    3.5-3.7                           $996

Budget (15MB LLC)
2609 v3         6/6             85W     1.9                               $306
2603 v3         6/6             85W     1.6                               $213

Power Optimized (20-30MB LLC)
2650L v3        12/24           65W     1.8-2.5                           $1329
2648L v3 (E)    12/24           75W     1.8-2.5                           $1544
2630L v3        8/16            55W     1.8-2.9                           $612

In terms of upfront cost, the E5-2690 V3 follows the pattern of rising cost compared to the other 2.3-2.6 GHz parts and sits around the $2090 mark. The E5-2650L V3 however comes across as a bit expensive to begin with (~$1329), but the idea here is that for a 2P system limited to 135W, you can either have a single E5-2690 V3 with 12 cores at 2.6 GHz for $2090, or two E5-2650L V3 CPUs with 24 cores at 1.8 GHz for $2660. If you don't mind the ~30% price difference and have the software to take advantage, it is almost a no-brainer comparison.
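To make the arithmetic behind this comparison explicit, here is a minimal sketch that totals cores, TDP and list price for the two configurations, using only the figures from the SKU table. The `configuration` helper and the "core-GHz" figure are illustrative assumptions, not anything Intel or our test suite uses; the latter is a crude paper metric that ignores turbo behavior and 2P scaling.

```python
# List prices, TDPs, core counts and base clocks as given in the SKU table above.
SKUS = {
    "E5-2690 v3": {"cores": 12, "base_ghz": 2.6, "tdp_w": 135, "price_usd": 2090},
    "E5-2650L v3": {"cores": 12, "base_ghz": 1.8, "tdp_w": 65, "price_usd": 1329},
}


def configuration(name, sockets):
    """Sum cores, TDP and list price over the populated sockets."""
    sku = SKUS[name]
    cores = sku["cores"] * sockets
    return {
        "cores": cores,
        "tdp_w": sku["tdp_w"] * sockets,
        "price_usd": sku["price_usd"] * sockets,
        # Crude paper metric (assumption): cores x base clock, ignoring turbo,
        # memory bottlenecks and any 2P scaling losses.
        "core_ghz": cores * sku["base_ghz"],
    }


single = configuration("E5-2690 v3", 1)
dual = configuration("E5-2650L v3", 2)

# Fractional price premium of the dual low-power setup over the single 2690 v3.
premium = (dual["price_usd"] - single["price_usd"]) / single["price_usd"]

for label, cfg in (("1x E5-2690 v3", single), ("2x E5-2650L v3", dual)):
    print(f"{label}: {cfg['cores']} cores, {cfg['tdp_w']} W, "
          f"${cfg['price_usd']}, {cfg['core_ghz']:.1f} core-GHz")
print(f"Price premium for the dual setup: {premium:.1%}")
```

On list prices the premium works out to a little over 27%, which the text rounds to ~30%. The dual setup lands at 130W combined TDP against 135W for the single chip.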

There is a distinct rider on this however, as it comes down to the software in use. As we have discovered over the last couple of years, most off the shelf software does not scale to more sockets or, even worse, loses performance due to memory mismanagement across the two CPUs. So on paper, if the software used ticks all the boxes, a dual E5-2650L V3 system sounds like a good deal.

The frequency response of the 12-core lineup for E5 26xx v3 CPUs puts the two L CPUs at the same frequency but a 10W difference in power consumption. The E5-2690 v3 heads the top of the list, with the 2685 v3 and 2670 v3 taking the biggest drop between single core and multi-core performance. The 2670 v3 spreads the drop over more cores in use, finally reaching its multi-core frequency when half the CPU is loaded, rather than 5/12.

Test Setup

For our testing, it is worth noting that our CPU samples arrived at different times and due to the testing setup at the time, certain benchmarks were unable to be run due to updates required. We were also able to source a second E5-2650L V3 and a 2P motherboard, allowing the comparison between the two CPUs on their own and a 130W combination.

Test Setup
Processor           Intel Xeon E5-2690 V3 (135W), 12C/24T: 2.6 GHz (3.5 GHz Turbo)
                    Intel Xeon E5-2650L v3 (65W), 12C/24T: 1.8 GHz (2.5 GHz Turbo)
Motherboards        ASUS X99-Deluxe
                    ASRock X99 Extreme6
                    GIGABYTE MD60-SC0
Cooling             Cooler Master Nepton 140XL
                    Corsair H80i
                    Thermalright TRUE Copper
Power Supply        OCZ 1250W Gold ZX Series
                    Corsair AX1200i Platinum PSU
Memory              ADATA XPG Z1 DDR4-2400 8x8 GB 1.2V
                    Corsair DDR4-2133 C15 4x8 GB 1.2V
                    G.Skill Ripjaws 4 DDR4-2133 C15 4x8 GB 1.2V
Memory Settings     JEDEC @ 2133
Video Cards         AMD R7 240 DDR3
Video Drivers       AMD Catalyst 13.11
Hard Drive          OCZ Vertex 3 256GB
Optical Drive       LG GH22NS50
Case                Open Test Bed
Operating System    Windows 7 64-bit SP1

Many thanks to...

We must thank the following companies for kindly providing hardware for our test bed:

Thank you to OCZ for providing us with PSUs and SSDs.
Thank you to G.Skill and ADATA for providing us with memory.
Thank you to Corsair for providing us with memory, an AX1200i PSU and a Corsair H80i CLC.
Thank you to MSI for providing us with the NVIDIA GTX 770 Lightning GPUs.
Thank you to Rosewill for providing us with PSUs and RK-9100 keyboards.
Thank you to ASRock for providing us with some IO testing kit.
Thank you to Cooler Master for providing us with Nepton 140XL CLCs.

Load Delta Power Consumption

Power consumption was tested on the system with a wall meter connected to the power supply. This power supply is Gold rated, and as I am in the UK on a 230-240 V supply, leads to ~75% efficiency > 50W, and 90%+ efficiency at 250W, suitable for both idle and multi-GPU loading. This method of power reading allows us to compare the power management of the UEFI and the board to supply components with power under load, and includes typical PSU losses due to efficiency.

We take the power difference between idle and load as our tested value, giving an indication of the power increase from the CPU when placed under stress. Unfortunately due to the timing of our testing, we were unable to check the power difference of the E5-2690 v3.

Power Consumption Delta: Idle to AVX

Overclocking on a Xeon?

Similar to our last Xeon review, multiplier overclocking on the E5-2600 V3 series is disabled. Nevertheless, with the right motherboard users can adjust the BCLK to add a few percent more performance over the stock frequency. Our E5-2600 V2 testing yielded a 110 MHz BCLK that was AVX stable for a +10% boost, but the 10-core E5-2600 V3 CPU we tested only gave 104 MHz. Today we are testing the E5-2650L V3, which should arguably have at least the power headroom if the stock voltage is set a little high by Intel.

Similarly to the E5-2687W V3, the moment we hit a frequency the system does not like, it fails to POST completely rather than giving a BSOD. However, we did manage a 5% boost in base frequency, giving a 4.4% rise in POV-Ray scores. The load voltage remained constant, and despite the frequency increase the power readings all stayed within a 9W window of stock usage.
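Since the multiplier is locked, core frequency moves in step with the base clock. The sketch below shows that relationship for the E5-2650L V3, assuming a stock BCLK of 100 MHz, fixed multipliers of 18x (base) and 25x (turbo) implied by the 1.8 GHz and 2.5 GHz ratings, and that the 5% boost corresponds to a 105 MHz BCLK. The `effective_ghz` helper is a hypothetical name used for illustration.

```python
STOCK_BCLK_MHZ = 100.0  # assumed stock base clock
BASE_MULT = 18          # implied by the 1.8 GHz base rating at 100 MHz
TURBO_MULT = 25         # implied by the 2.5 GHz turbo rating at 100 MHz


def effective_ghz(multiplier, bclk_mhz=STOCK_BCLK_MHZ):
    """Core frequency in GHz for a locked multiplier and a given BCLK."""
    return multiplier * bclk_mhz / 1000.0


oc_bclk_mhz = STOCK_BCLK_MHZ * 1.05  # the 5% boost, assumed to be 105 MHz

oc_base = effective_ghz(BASE_MULT, oc_bclk_mhz)
oc_turbo = effective_ghz(TURBO_MULT, oc_bclk_mhz)

# Share of the frequency gain that showed up in POV-Ray (4.4% from 5%).
scaling = 4.4 / 5.0

print(f"Base:  {effective_ghz(BASE_MULT):.2f} GHz -> {oc_base:.2f} GHz")
print(f"Turbo: {effective_ghz(TURBO_MULT):.2f} GHz -> {oc_turbo:.3f} GHz")
print(f"POV-Ray gain relative to frequency gain: {scaling:.0%}")
```

Under these assumptions the base clock moves from 1.8 GHz to 1.89 GHz, and roughly 88% of the frequency increase carried through to the POV-Ray score.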



Linux Performance

Built around several freely available benchmarks for Linux, Linux-Bench is a project spearheaded by Patrick at ServeTheHome to streamline about a dozen of these tests in a single neat package run via a set of three commands using an Ubuntu 14.04 LiveCD. These tests include fluid dynamics used by NASA, ray-tracing, OpenSSL, molecular modeling, and a scalable data structure server for web deployments. We run Linux-Bench and have chosen to report a select few of the tests that rely on CPU and DRAM speed.

Unfortunately due to the different time windows we had these CPUs and the time of introduction of Linux-Bench into the normal testing suite, we only have results for the 2P and 1P E5-2650L V3 configurations.

C-Ray

C-Ray is a simple ray-tracing program that focuses almost exclusively on processor performance rather than DRAM access. The test in Linux-Bench renders a heavy complex scene offering a large scalable scenario.

Linux-Bench c-ray 1.1 (Hard)

p7zip Compression and Decompression

7-Zip is a common open source compression and decompression tool for data transfer. 

Linux-Bench 7-Zip Compression

Linux-Bench 7-Zip Decompression

NAMD, Scalable Molecular Dynamics

Developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes for extreme parallelization up to and beyond 200,000 cores. The reference paper detailing NAMD has over 4000 citations, and our testing runs a small simulation where the calculation steps per unit time is the output vector.

Linux-Bench NAMD Molecular Dynamics

NPB, Fluid Dynamics

Aside from LINPACK, there are many other ways to benchmark supercomputers in terms of how effective they are for various types of mathematical processes. The NAS Parallel Benchmarks (NPB) are a set of small programs originally designed for NASA to test their supercomputers in terms of fluid dynamics simulations, useful for airflow reactions and design.

Linux-Bench NPB Fluid Dynamics

OpenSSL Sign/Verify

OpenSSL is the platform that secures the majority of the websites we visit, and being able to issue/verify this security is paramount to performance. We test the rates at which OpenSSL certificates are signed and verified.

Linux-Bench OpenSSL Sign

Linux-Bench OpenSSL Verification

Redis

Many online applications rely on key-value caches and data structure servers to operate. Redis is an open-source, scalable web technology with a strong developer base, but it also relies heavily on memory bandwidth as well as CPU performance.

Linux-Bench Redis Memory-Key Store, 1x

Linux-Bench Redis Memory-Key Store, 10x

Linux-Bench Redis Memory-Key Store, 100x



CPU Benchmarks

The dynamics of CPU Turbo modes, both Intel and AMD, can cause concern in environments with a variable threaded workload. There is also the added issue of motherboard consistency, depending on how the motherboard manufacturer wants to add in their own boosting technologies over the ones that Intel would prefer they used. In order to remain consistent, we implement an OS-level unique high performance mode on all the CPUs we test, which should override any motherboard manufacturer performance mode.

HandBrake v0.9.9

For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container. Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 2x4K

With the low single core speed of the 2650L v3, processing smaller frames can be rather slow. When the frames are larger and reside in the computational parts of the cores for longer, a better speedup is achieved. However, the 2690 v3 still wins out.

Agisoft Photoscan – 2D to 3D Image Manipulation

Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and different phases of the model reconstruction require either fast memory, fast IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert it into a medium quality model. This benchmark typically takes around 15-20 minutes on a high end PC on the CPU alone, with GPUs reducing the time.

Agisoft PhotoScan Benchmark - Total Time

Photoscan is part single thread and part multithread, and the combination of the two puts the 2690 v3 well ahead of the 2650L v3 x2 despite double the cores.

Dolphin Benchmark

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

Dolphin is purely single threaded, and the 2690 v3 sits on par with the 5820K.

WinRAR 5.0.1

WinRAR 5.01, 2867 files, 1.52 GB

PCMark8 v2 OpenCL

A new addition to our CPU testing suite is PCMark8 v2, where we test the Work 2.0 suite in OpenCL mode.

PCMark8 v2 Work 2.0 OpenCL with R7 240 DDR3

Hybrid x265

Hybrid is a new benchmark, where we take a 4K 1500 frame video and convert it into an x265 format without audio. Results are given in frames per second.

Hybrid x265, 4K Video

For x265 conversion, more cores rule the roost and we see the 2650L v3 x2 setup surpass the 2690 v3, although the 0.17 FPS difference might not be worth the ~$600 outlay depending on how much x265 encoding you do.

Cinebench R15

Cinebench R15 - Single Threaded

Cinebench R15 - Multi-Threaded

3D Particle Movement

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores.

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

FastStone Image Viewer 4.9

FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and results are given in seconds.

FastStone Image Viewer 4.9

Web Benchmarks

General usability is a big factor of experience, especially as we move into the HTML5 era of web browsing. For our web benchmarks, we take four well known tests with Chrome 35 as a consistent browser.

Sunspider 1.0.2

Sunspider 1.0.2

Mozilla Kraken 1.1

Kraken 1.1

WebXPRT

WebXPRT

Google Octane v2

Google Octane v2



Gaming Benchmarks

While the last thought on the minds of most Xeon users is related to gaming, we frequently get requests to test gaming performance on Xeons.  As a result we strap the Xeon to a regular consumer level motherboard that can support them and add in one or two GPUs to see how they perform and if more cores makes a difference over the drop in frequency. Unfortunately due to the orientation of the PCIe slots on the 2P board, we were unable to test the dual E5-2650L v3 configuration.

F1 2013

First up is F1 2013 by Codemasters. I am a big Formula 1 fan in my spare time, and nothing makes me happier than carving up the field in a Caterham, waving to the Red Bulls as I drive by (because I play on easy and take shortcuts). F1 2013 uses the EGO Engine, and like other Codemasters games ends up being very playable on old hardware quite easily. In order to beef up the benchmark a bit, we devised the following scenario for the benchmark mode: one lap of Spa-Francorchamps in the heavy wet, the benchmark follows Jenson Button in the McLaren who starts on the grid in 22nd place, with the field made up of 11 Williams cars, 5 Marussia and 5 Caterham in that order. This puts emphasis on the CPU to handle the AI in the wet, and allows for a good amount of overtaking during the automated benchmark. We test at 1920x1080 on Ultra graphical settings.

F1 2013 SLI, Average FPS


Bioshock Infinite

Bioshock Infinite was Zero Punctuation’s Game of the Year for 2013, uses the Unreal Engine 3, and is designed to scale with both cores and graphical prowess. We test the benchmark using the Adrenaline benchmark tool and the Xtreme (1920x1080, Maximum) performance setting, noting down the average frame rates and the minimum frame rates.

Bioshock Infinite SLI, Average FPS


Tomb Raider

The next benchmark in our test is Tomb Raider. Tomb Raider is an AMD optimized game, lauded for its use of TressFX creating dynamic hair to increase the immersion in game. Tomb Raider uses a modified version of the Crystal Engine, and enjoys raw horsepower. We test the benchmark using the Adrenaline benchmark tool and the Xtreme (1920x1080, Maximum) performance setting, noting down the average frame rates and the minimum frame rates.

Tomb Raider SLI, Average FPS


Sleeping Dogs

Sleeping Dogs is a benchmarking wet dream – a highly complex benchmark that can bring the toughest setup and high resolutions down into single figures. Having an extreme SSAO setting can do that, but at the right settings Sleeping Dogs is highly playable and enjoyable. We run the basic benchmark program laid out in the Adrenaline benchmark tool, and the Xtreme (1920x1080, Maximum) performance setting, noting down the average frame rates and the minimum frame rates.

Sleeping Dogs SLI, Average FPS


Battlefield 4

The EA/DICE series that has taken countless hours of my life away is back for another iteration, using the Frostbite 3 engine. AMD is also piling its resources into BF4 with the new Mantle API for developers, designed to cut the time required for the CPU to dispatch commands to the graphical sub-system. For our test we use the in-game benchmarking tools and record the frame time for the first ~70 seconds of the Tashgar single player mission, which is an on-rails sequence that generates and renders objects and textures. We test at 1920x1080 at Ultra settings.

Battlefield 4 SLI, Average FPS




Intel Haswell-EP 12 Cores Conclusion

At the beginning of the review we outlined what seemed like a shift in Intel's binning strategy when it comes to Haswell-EP. Given the lack of competition in most areas of the server market (there are areas where other cores make more sense), Intel could do one of two things. The first option, stagnation, allows competitors to catch up while at the same time allowing Intel to perhaps improve on designs with little gain and little cost to research and development. The issue with stagnation, when you are your only competition, is that customers ultimately have no reason to upgrade, which reduces revenue. The other option is diversification, and this is what we think is happening with the wide range of models available with Haswell-EP.

After discussions with its big customers, Intel now has several dozen CPU models from only three general dies. Most of these models are part of a road map and designed to improve yields, while others are for specific customers only or have longer life support. With the die product volume Intel has in its fabs, it has the opportunity to be very aggressive with its binning implementation, resulting in several high core, high frequency or low power models. In previous generations, customers could only pick one of those three, but with Haswell-EP most customers can home in on at least two preferences.

Today we examined two of Intel's 12 core options: the 'typical' stack model in the E5-2690 V3, offering a 2.6 GHz base frequency at 135W for $2090, and the low power optimized E5-2650L V3 which offers a 1.8 GHz base frequency, 2.5 GHz turbo, at 65W for $1330.

As expected, the E5-2690 V3 performed better in all of our benchmarks by virtue of the frequency advantage during both single core and multicore operation. In terms of performance for price, or performance per power, the E5-2650L V3 wins out here. At only 65% of the cost, and just under 50% of the power, the cost per computation per watt falls firmly on the side of the E5-2650L V3. For compute limited throughput, however, the E5-2650L V3 still ends up slower in absolute terms and is one of the weakest Xeons we have tested due to the frequency difference. One of the important factors missed by the performance-per-watt argument is how many units of work are completed per unit time, and what that work costs.
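The trade-off can be put in break-even terms. This is a rough sketch, assuming TDP is an acceptable stand-in for actual power draw and list price for actual cost; `compare` is a hypothetical helper, and the ratio of base frequencies is used only as a placeholder for relative performance, not as a measured result.

```python
# List price and TDP for the two parts, as quoted in this review.
PRICE_USD = {"E5-2690 v3": 2090, "E5-2650L v3": 1329}
TDP_W = {"E5-2690 v3": 135, "E5-2650L v3": 65}

# The E5-2650L v3 expressed as a fraction of the E5-2690 v3.
cost_ratio = PRICE_USD["E5-2650L v3"] / PRICE_USD["E5-2690 v3"]
power_ratio = TDP_W["E5-2650L v3"] / TDP_W["E5-2690 v3"]


def compare(relative_perf):
    """Compare the 2650L v3 to the 2690 v3 for a given performance ratio.

    relative_perf is (2650L v3 score) / (2690 v3 score), higher is better.
    Values above 1.0 in the result favor the 2650L v3, except for
    time_to_finish, where above 1.0 means the job takes longer.
    """
    return {
        "perf_per_dollar": relative_perf / cost_ratio,
        "perf_per_watt": relative_perf / power_ratio,
        "time_to_finish": 1.0 / relative_perf,
    }


# Placeholder input (assumption): scale performance with base frequency only.
placeholder = compare(1.8 / 2.6)

print(f"Cost ratio:  {cost_ratio:.1%}")
print(f"Power ratio: {power_ratio:.1%}")
for key, value in placeholder.items():
    print(f"{key}: {value:.2f}x")
```

Read this way, the E5-2650L V3 comes out ahead on performance per watt whenever it delivers more than about 48% of the E5-2690 V3's score, and on performance per dollar above about 64%, yet any ratio below 100% still means each job takes longer to finish.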

As part of this review we were able to source two E5-2650L V3 CPUs with a dual CPU motherboard and offer a more competitive analysis at 130-135W total TDP against the E5-2690 V3. In these results, anything that was single threaded still fell on the side of the E5-2690 V3 due to its single core frequency advantage. Despite our predictions, relatively few software packages that we came across were won by the dual E5-2650L V3 system, making the E5-2690 V3 the CPU of choice. This was especially true for software that could use multiple cores but could not take advantage of a 2P arrangement. The best example of this is video conversion at low quality which relies on a lot of crosstalk between encoded frames and memory accesses that might not be NUMA aware.  The only reason in this regard to choose a 2x E5-2650L V3 over a 1x E5-2690 V3 would be for the double memory support, or for the few benchmarks where the 2P configuration won (Hybrid x265, Cinebench R15). The single CPU option is cheaper, easier to manage and works better across the board. 

Our next installment of Haswell-EP coverage involves two 14 core models. The 14 core design is a little odd, being outside of the regular scope normally considered for core counts (12/16) and representing a semi-irregular arrangement of cores. Ultimately it relies on the 18-core die being cut down. As always, our CPU review data from past and present can be found in the CPU section of our results comparison database, Bench.
