Original Link: https://www.anandtech.com/show/10337/the-intel-broadwell-e-review-core-i7-6950x-6900k-6850k-and-6800k-tested-up-to-10-cores
The Intel Broadwell-E Review: Core i7-6950X, i7-6900K, i7-6850K and i7-6800K Tested
by Ian Cutress on May 31, 2016 2:01 AM EST
Posted in: CPUs, Intel, Enterprise, Prosumer, X99, 14nm, Broadwell-E, HEDT
What would you do with more CPU cores? This is a question I see posted from an Intel employee on a yearly basis, and it actually is a difficult question to answer depending on your computing background. A gamer might not need more than four or six, and a number of workstation use cases are now GPU accelerated. Anyone who never runs a pure compute workload might not need more than four or six cores either.
But what about virtual machines, complex encoding, or non-linear functional compute? How many cores are too many? Intel has recently released the Broadwell-EP based Xeon E5-2600 v4 processors, running up to 22 cores, and the smaller silicon die used for the 10-core parts has today filtered down to the prosumer and high-end desktop (HEDT) markets in four different parts, making up the Core i7 6800 and 6900 series. For today's review we'll be taking a look at all four.
Broadwell-E: The Information
Intel’s HEDT platform cadence has been slowly falling further and further out of phase with their latest CPU microarchitectures. In 2015 we spoke extensively about the successor to Broadwell, Skylake, coming to the consumer and mobile platforms. Now it is 2016, and the HEDT discussion brings us back to Broadwell in the form of Broadwell-E. This out of step cadence occurs for several reasons, namely that the HEDT market is more an extension of the bottom of the enterprise/server market than an extension of the reach of the mainstream market. In the enterprise market, customers request stability and updates to appear at regular intervals with sufficient longevity of each platform.
The enterprise market, as Johan reviewed a few weeks ago, uses the name ‘Broadwell-EP’ and comes in three silicon floor plans depending on how many cores are in the final product. Broadwell-E, the HEDT set of processors, takes the smallest 10-core design and splits this into four SKUs to be used in consumer grade X99 motherboards. Most motherboard manufacturers will be releasing their second generation X99 motherboards for this launch, and some have done so already. We have an accompanying piece with this review going over the feature sets of as many new boards as we could find.
But here are the four new SKUs: the 10-core i7-6950X, the 8-core i7-6900K, the 6-core i7-6850K and the 6-core i7-6800K:
Intel i7 HEDT Lineup |
Model | 6950X | 6900K | 6850K | 6800K | 5960X | 4960X | 3960X |
Cores | 10 | 8 | 6 | 6 | 8 | 6 | 6 |
Threads | 20 | 16 | 12 | 12 | 16 | 12 | 12 |
Base CPU Freq | 3.0 GHz | 3.2 GHz | 3.6 GHz | 3.4 GHz | 3.0 GHz | 3.6 GHz | 3.3 GHz |
Turbo CPU Freq | 3.5 GHz | 3.7 GHz | 3.8 GHz | 3.6 GHz | 3.5 GHz | 4.0 GHz | 3.9 GHz |
TDP | 140W | 140W | 140W | 140W | 140W | 130W | 130W |
Memory Freq. | DDR4-2400 | DDR4-2400 | DDR4-2400 | DDR4-2400 | DDR4-2133 | DDR3-1866 | DDR3-1600 |
L3 Cache | 25MB | 20MB | 15MB | 15MB | 20MB | 15MB | 15MB |
PCIe Lanes | 40 | 40 | 40 | 28 | 40 | 40 | 40 |
Arch. | Broadwell-E | Broadwell-E | Broadwell-E | Broadwell-E | HSW-E | IVB-E | SNB-E |
Price | $1723 | $1089 | $617 | $434 | $999 | $990 | $990 |
There’s a lot of information here to dissect, so let’s start with the one that will catch the most attention: the price.
(1) Price
In order to further separate the high-end desktop platform from their mainstream platforms, Intel has adjusted the pricing of the new Broadwell-E CPUs relative to the previous generations.
The top line 10-core CPU, the i7-6950X, comes in at $1723 at 1k tray pricing, with consumer pricing expected to be nearer $1749 or $1799 depending on stock levels and availability. This is a marked increase over previous top model Extreme Edition processors, which Intel has offered at $999 ($1049 retail). The reasons for the change are somewhat unclear: one could argue that it’s a larger die and costs more to make, but this is the first 14nm HEDT part and should be smaller than an equivalent Haswell-E design. The main reason that springs to mind is simply market and price segmentation – Intel will state that enthusiasts have been asking for more in the overclockable high-end consumer space, so here it is (and here’s the price).
The 10-core is a full $634 more than the 8-core i7-6900K, meaning a 58% increase in price for only 25% increase in cores. We all know that once you reach the high-end, the price/performance curve goes off in a silly direction but if you want to keep the cream of the crop, there is an extra charge.
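The arithmetic behind that comparison can be checked with a short sketch. The prices and core counts come from the lineup table; the helper name `pct_increase` is only for illustration.

```python
# 1k tray prices and core counts, taken from the lineup table.
PARTS = {
    "i7-6950X": {"cores": 10, "price": 1723},
    "i7-6900K": {"cores": 8, "price": 1089},
}


def pct_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


top, second = PARTS["i7-6950X"], PARTS["i7-6900K"]

price_gap = top["price"] - second["price"]
price_pct = pct_increase(top["price"], second["price"])
core_pct = pct_increase(top["cores"], second["cores"])

print(f"Price gap:      ${price_gap}")
print(f"Price increase: {price_pct:.0f}%")
print(f"Core increase:  {core_pct:.0f}%")
```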
The i7-6950X also gets a unique retail box compared to the other processors, in a sleek black with gold lettering. This combination of colors tends to go down well with whoever loves gold, perhaps indicating that Intel is looking at a new kind of premium customer.
Moving on to the 8-core i7-6900K, this part comes in above the previous top price: $1089 compared to $999. This makes it a rough upgrade for any i7-5960X user when the benefits are limited to the upgraded microarchitecture. As would be expected, while it has fewer cores in play than the i7-6950X, it is still unlocked, so the base and turbo frequencies can be overclocked by users who run these processors above stock speeds. But for users who already have a Haswell-E system based on the i7-5960X, spending another $1000 gets you no extra cores and the same chipset.
Ultimately one could argue that Intel is looking more to Nehalem/Westmere and Sandy Bridge-E users with Broadwell-E, as nearly every Intel presentation usually goes to quote current rates of 3-to-5 year-old systems that need upgrading. There are a few other features that users of Haswell-E systems might be interested in however, which we will cover.
The i7-6850K and i7-6800K are priced at $617 and $434 respectively, marking an increase in the price of getting onto the HEDT ladder compared to previous generations. These are six core processors, like the i7-5930K and i7-5820K from the previous generation, with similar limitations. The issue Intel has here is that the i7-5820K has been regularly selling close to or below its MSRP, making it an easy purchase for users that want a HEDT system (if only for the memory support). Making the base i7-6800K a $434 part rather than a ~$320 part means that a minimum outlay for a system (i.e. motherboard + CPU) is closer to $600 than $400, and less attractive to the top segment of mainstream users.
(1b) Xeon Price Competition
I would point out here that Intel is somewhat shooting itself in the foot with the pricing on the i7-6950X. The recently released Xeon Broadwell-EP processor list includes the Xeon E5-2640 v4: a 10-core 2.4 GHz/3.4 GHz part that runs at 90W and is priced at $939, which compares favorably to the i7-6950X and its 10 cores at 3.0 GHz/3.5 GHz. And because it’s a Xeon E5 processor, with the right motherboard a user can put two into the same machine for 20 cores/40 threads for only $1878, or only about $155 more than the 10-core i7-6950X. There are other costs going into the dual processor market, but it opens up a wide margin.
Or how about finding a Xeon at a similar price? Try the Xeon E5-2680 v4 for $1745, which has 14-cores with hyperthreading for 28 threads, runs at 120W TDP and a 2.4 GHz to 3.3 GHz peak frequency. While the frequency is slightly down, that’s a 40% increase in cores for $22, gaining ECC support and a couple of extra hypervisor features in exchange for limiting DRAM to DDR4-2133 and losing overclocking.
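To put the Xeon alternatives side by side, here is a minimal sketch that compares CPU-only cost per core using the prices quoted in this section. It deliberately ignores motherboard, memory and clock speed differences, and the `Option` class is just an illustrative container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    cores_per_cpu: int
    price_per_cpu: int  # USD, list pricing quoted in the text
    sockets: int = 1

    @property
    def total_cores(self) -> int:
        return self.cores_per_cpu * self.sockets

    @property
    def total_price(self) -> int:
        return self.price_per_cpu * self.sockets

    @property
    def price_per_core(self) -> float:
        return self.total_price / self.total_cores


OPTIONS = [
    Option("Core i7-6950X", 10, 1723),
    Option("2x Xeon E5-2640 v4", 10, 939, sockets=2),
    Option("Xeon E5-2680 v4", 14, 1745),
]

for o in OPTIONS:
    print(f"{o.name:<20} {o.total_cores:>2} cores  "
          f"${o.total_price:>4}  ${o.price_per_core:6.2f}/core")
```

On CPU cost alone the dual-socket option comes out cheapest per core, but that says nothing about per-thread speed, where the higher clocked i7 parts have the edge.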
The one downside to going the Xeon route is Xeons are only sold as OEM chips with a limited warranty. Intel only wants to sell these through vendors, so very few models ever make it into the retail market (and not always the ones you want).
(2) PCIe Lanes
When Intel introduced Haswell-E, it experimented with a new type of product separation: it also varied the number of CPU-host PCIe lanes among the SKUs. This practice continues in Broadwell-E, in an almost identical fashion. The lowest end CPU has 28 PCIe 3.0 lanes, capable of three-way GPU setups (and no 2x16 setups), while the other processors have a full 40 PCIe 3.0 lanes, allowing for four-way GPU setups or different combinations therein.
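A rough way to see why 28 lanes rules out a 2x16 arrangement is to add up the lanes each slot would need. The sketch below only checks the total against the CPU's lane budget; which slot widths a given board actually exposes depends on the motherboard, and the configurations listed are illustrative.

```python
from typing import Dict, Sequence


def fits(lane_budget: int, slot_widths: Sequence[int]) -> bool:
    """True if the requested slot widths fit within the CPU's PCIe lanes."""
    return sum(slot_widths) <= lane_budget


# Illustrative GPU arrangements (lanes per slot).
CONFIGS: Dict[str, Sequence[int]] = {
    "2-way x16/x16": (16, 16),
    "2-way x16/x8": (16, 8),
    "3-way x8/x8/x8": (8, 8, 8),
    "3-way x16/x16/x8": (16, 16, 8),
    "4-way x16/x8/x8/x8": (16, 8, 8, 8),
}

for budget in (28, 40):
    print(f"{budget}-lane CPU:")
    for name, widths in CONFIGS.items():
        verdict = "ok" if fits(budget, widths) else "not enough lanes"
        print(f"  {name:<20} needs {sum(widths):>2} -> {verdict}")
```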
Typically the number of users investing in 3-way or 4-way graphics arrangements that implement SLI or Crossfire is very small, with many developers not bothering to optimize for multi-GPU beyond two graphics cards, so it could easily be a moot point for most users. That being said, fewer PCIe lanes means that some slots will have to operate at half bandwidth as a user expands, or users interested in many PCIe devices (GPUs with storage, RAID, and/or 10 gigabit Ethernet) might run out of lanes quicker. But therein lies the product segmentation – if a user needs more PCIe lanes, then they need to spend a further $183 for the next processor up.
Back in our Haswell-E review, we did test the effect of having 28 PCIe lanes compared to 40 PCIe lanes on SLI and Crossfire graphics arrangements (or having PCIe 3.0 x16/x8 compared to PCIe 3.0 x16/x16). We found a sub-1% differential for dual-GPU gaming at the time. Due to timing we haven’t repeated the tests with Broadwell-E, and accept that it might matter more with DX12, but ideally we need more games on the market that support the different DX12 multi-GPU modes.
(3) Official Memory Support Increased to DDR4-2400
Within a single socket generation, we typically do not see a change in memory support for successive generations of Intel processors. The ‘official memory support’ of a processor defines the base JEDEC frequency and is the sole guaranteed frequency for the processor that meets the qualified error rates. The reality is that most processors will support faster memory, hence companies like Corsair, G.Skill, Kingston and others offer DDR4-3000 memory kits or faster, for those with bigger wallets. The reason why a CPU manufacturer does not qualify memory at that speed comes down to a number of factors, but as we mentioned, seeing a change within a socket is relatively rare.
One of the new Broadwell-E focused memory modules, a G.Skill TridentZ 16GB DDR4-3200 stick
In this case, the LGA2011-3 socket supports Haswell-E and Broadwell-E processors. The official memory speed supported for Haswell-E is DDR4-2133, and in our CPU/motherboard tests this is the speed we have been using. For Broadwell-E, this moves up to DDR4-2400, and we’ve sourced DDR4-2400 kits specifically for our testing. Ultimately the performance gain from increasing the memory speed by this small of an amount is minimal for most operations, but specific use cases that require fast memory (WinRAR compression, RAMDisk/RAMCaches, in-memory virtual machines) will sometimes see a benefit.
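For a sense of scale, the theoretical peak bandwidth at each official speed can be estimated as transfers per second times bus width times channel count. The sketch below assumes the quad-channel configuration of the X99 platform and a 64-bit (8 byte) data bus per channel; real-world throughput will be lower than these peak figures.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 4, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes quad-channel memory with a 64-bit bus per channel.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


haswell_e = peak_bandwidth_gbs(2133)    # DDR4-2133
broadwell_e = peak_bandwidth_gbs(2400)  # DDR4-2400
gain_pct = (broadwell_e / haswell_e - 1) * 100

print(f"DDR4-2133: {haswell_e:.1f} GB/s")
print(f"DDR4-2400: {broadwell_e:.1f} GB/s")
print(f"Increase:  {gain_pct:.1f}%")
```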
Who is Broadwell-E Aimed At?
Even just looking at the specification sheet, for anyone currently on a modern HEDT system it would be hard to see the value of investing in Broadwell-E unless peak performance is required at any cost. Intel's promotional pitch at launch, and through the length of the platform, will be aimed at users still on Nehalem/Westmere (or even Sandy Bridge-E/Ivy Bridge-E) systems who want to move up from 4/6 cores and start using the features of the latest X99 platform.
Because of the pricing, it’s clear that the complete cost for a 10-core consumer machine, including memory, storage and graphics, will be a minimum of $2300 for a system with a basic GPU, or nearer $3000 for a high-end gaming platform. Meanwhile we may see Haswell-E in the channel for sale at the lower cost for at least a little while longer, in which case it may be seen as the more price attractive purchase at this time. Aside from the main specifications, Intel will be hoping that some of the new features will also entice potential purchasers.
Turbo Boost Max 3.0 (TBM3), aka Turbo Boost Max or Turbo Boost 3.0
When Intel released the enterprise focused Broadwell-EP Xeon CPUs, there were a few features added to the platform over the previous Haswell-EP generation. One of these has come through to the consumer parts, though in a slightly different form.
For Broadwell-EP, one of the new features was the ability to have each core adjust the frequency independently depending on AVX or non-AVX workloads. Previously when an AVX load was detected, all the cores would reduce in frequency, but beginning with BDW-EP now they act separately. Intel has taken this enterprise feature and expanded it a little into a feature they're calling ‘Turbo Boost Max 3.0’.
Turbo Boost 2.0 is what Intel calls its maximum Turbo or ‘peak’ frequency. So in the case of the i7-6950X, the base frequency is 3.0 GHz and the Turbo Boost 2.0 frequency is 3.5 GHz. The CPU will use that frequency when light workloads are in play and decrease the frequency of the cores as the load increases in order to keep the power consumption more consistent. Turbo Boost 2.0 frequencies are advertised alongside the CPU on the box - TBM3 will be slightly different and not advertised.
TBM3, in a nutshell, will boost the frequency of a single CPU core when a single-threaded program is being used.
It requires a driver, similar to Skylake’s Speed Shift feature (which is not in Broadwell-E), which should be distributed in new X99 motherboard driver packages, but will also be rolled out in Windows 10 in due course. It also comes with a user interface, which might make it easier to explain:
Each of the cores in the processor can be individually accessed by the OS with the new driver, and the cores will be rated based on their performance and efficiency as they come out of Intel. In the image above, Core 9 is rated the best, with Core 0 at the bottom. This means that for TBM3, the driver will primarily use Core 9.
When enabled, TBM3 will activate in two modes: either the foreground application, or from a priority list. For the foreground selection, when the software detects a single threaded workload in play, it will attempt to pin the software to the best core (similar to changing the affinity in task manager to one core), and then boost the frequency. In priority mode, the application will look for any application on the left-hand panel (which has to be added manually). If an application with higher priority is present, then the software will unpin the current software and take the higher priority one and pin that instead.
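The selection logic described here can be sketched as a small simulation. This is not Intel's driver code: the `App` class, the application names and the `pick_app_to_pin` function are hypothetical, and the 90% threshold is the default utilization setting from the TBM3 options menu. The real software would then set the chosen program's affinity to the best-rated core and raise that core's multiplier.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

BEST_CORE = 9  # best-rated core in the article's example


@dataclass(frozen=True)
class App:
    name: str
    busiest_thread_pct: float  # utilization of the app's busiest thread
    foreground: bool = False


def pick_app_to_pin(
    apps: Sequence[App],
    mode: str,
    priority_list: Sequence[str] = (),
    threshold_pct: float = 90.0,
) -> Optional[App]:
    """Return the app that would be pinned to the best core, if any."""
    busy = [a for a in apps if a.busiest_thread_pct >= threshold_pct]
    if mode == "foreground":
        # Only the foreground application is a candidate.
        return next((a for a in busy if a.foreground), None)
    if mode == "priority":
        # Walk the user-supplied list, highest priority first.
        for name in priority_list:
            match = next((a for a in busy if a.name == name), None)
            if match is not None:
                return match
        return None
    raise ValueError(f"unknown mode: {mode!r}")


apps = [
    App("encoder.exe", 95.0),
    App("game.exe", 97.0, foreground=True),
    App("browser.exe", 20.0),
]

for mode, plist in (("foreground", ()), ("priority", ("encoder.exe", "game.exe"))):
    chosen = pick_app_to_pin(apps, mode, plist)
    target = chosen.name if chosen else "nothing"
    print(f"{mode:<10} mode pins {target} to core {BEST_CORE}")
```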
When pinned, the software will boost the frequency of that core only. The only question now is how much is the boost, and what is the effect on performance? Unfortunately, both of those questions have bad answers.
Intel refuses to state the effect of TBM3, saying that ‘each CPU is different and could boost by different amounts’. Now, you might think that makes sense. However…
Turbo Boost Max 3.0 has to be supported by the motherboard manufacturer in the BIOS. The TBM3 settings have to be set in the BIOS, which means that the usefulness of such a feature is actually down to the motherboard manufacturers. But they know how to do it right, right? Well, here’s where it can get worse.
On the MSI motherboard we used for most of our testing, Turbo Boost Max 3.0 was disabled by default in the BIOS. We asked about this, and they said it was a conscious decision made by management a couple of weeks prior. This makes TBM3 useless for most users who never even touch the BIOS. That sounds good, right?
Well, the BIOS also sets how much the CPU can boost by. So ultimately it doesn’t matter how much the CPU might like to boost in frequency, the system will only boost by the amount it says so in the BIOS, which is set by the motherboard manufacturer. In the case of the MSI BIOS, it was set to ‘Auto’. In my case, ‘Auto’ meant a boost of zero, despite the MSI BIOS ‘suggesting’ 4000 MHz. I had to manually set Core 9 to a 40x multiplier. Then it worked.
All in all, TBM3 was only enabled after I changed two settings and specifically set the correct core in the BIOS. For me, this isn’t a global feature if that is the case. That’s not to mention how Multi-Core Turbo also comes into the mix, which still works with Turbo Boost 2.0 speeds by default. Based on what we've seen, it would seem that TBM3 isn’t being readily embraced at this time.
It should be noted that we also had one of the new ASUS motherboards in for testing, however time was too limited before leaving for Computex to verify if this is the case on the ASUS motherboard as well. ASUS has told me that they have/will have a software package that enables TBM3 to be applied to multiple cores at once, whereas the Intel software will only accelerate a single program. It should be interesting to test.
The Reviewer’s Problem With Turbo Boost Max 3.0
In the options menu for TBM3, there are two primary options to take note of. The first is the utilization threshold, which is the % at which the software will take control of the single threaded application and pin it to a core. By default, this is set at 90%.
The other option is where a dilemma will be faced. It is the evaluation interval, or the period of time between checks that the software makes in order to accelerate a program. The version of the software we had started with a value of 10 seconds. That means it matters whether the software's check lands one second or nine seconds into a benchmark run, as this can affect the score. The answer here would be to make the evaluation interval very small, but the software only has a one-second resolution. So benchmarks that run for only a few seconds (anyone benchmarking wPrime or SuperPi, for example) might either fail to be accelerated at all if the evaluation interval is left at default, or be accelerated for only part of the run when it is set at one second.
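A simple model shows how much the evaluation interval matters for short benchmarks. The sketch assumes the boost is applied instantly at the first check after the benchmark starts, that the utilization threshold is already met at that check, and that the start time is uniformly random within an interval. None of these are confirmed behaviours of the Intel software, so treat the numbers as illustrative.

```python
def accelerated_fraction(duration_s: float, wait_s: float) -> float:
    """Fraction of a run that is boosted if the first check comes
    `wait_s` seconds after the benchmark starts."""
    return max(0.0, duration_s - wait_s) / duration_s


def expected_fraction(duration_s: float, interval_s: float, steps: int = 1000) -> float:
    """Average boosted fraction over start times spread uniformly
    across one evaluation interval (midpoint sampling)."""
    waits = [(i + 0.5) * interval_s / steps for i in range(steps)]
    return sum(accelerated_fraction(duration_s, w) for w in waits) / steps


benchmark_s = 5.0  # a short single-threaded run, e.g. a few seconds long

for interval in (10.0, 1.0):
    avg = expected_fraction(benchmark_s, interval)
    print(f"interval {interval:>4.0f}s: on average {avg:.0%} of the run is boosted")

# Best and worst case at the default 10 second interval.
print(f"check after 1s: {accelerated_fraction(benchmark_s, 1.0):.0%} boosted")
print(f"check after 9s: {accelerated_fraction(benchmark_s, 9.0):.0%} boosted")
```

Under these assumptions the same benchmark can land anywhere between fully unboosted and mostly boosted at the default setting, which is the run-to-run variation a reviewer has to watch for.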
As you can imagine, if a reviewer does not know if TBM3 is enabled or not, there may be some odd benchmark results that seem different to what you might expect. It should be noted that because of the BIOS issue and the potential for motherboard manufacturers to do something different with every product, we ran our benchmarks with TBM3 disabled, and readers should check to see if reviewers specify how TBM3 is being used when data is published.
Package Differences: It’s Thin
When the Skylake mainstream platform was launched, it was noted that the processor packages and substrates were thinner compared to the previous generation. It would appear that Intel is using the same packaging technology for Broadwell-E as well.
On the left is the Haswell-E based Core i7-5960X, and on the right is the Broadwell-E based Core i7-6950X. Both of these platforms use a FIVR, the Fully Integrated Voltage Regulator, which Intel equipped on this microarchitecture in order to increase power efficiency. Usually the presence of the FIVR would require additional layers for power management in the package, but it would seem that Intel has been optimizing this to a certain extent. Each individual layer is certainly thinner, but it is likely that Intel has also reduced the number of layers, though my eyes cannot discern the resolution needed to see exactly how many are in each CPU (and I don’t have a microscope on hand to test).
A couple of questions will crop up from readers regarding the thinner package. Firstly, on the potential for bending the package, especially in regards to a minor story on Skylake where a couple of CPUs were found to have bent when under extreme cooler force. As far as Intel is aware, Broadwell-E should not have a problem for a number of reasons, but mostly related to the dual latch socket design and socket cooler implementation. Intel’s HEDT platforms, from Sandy Bridge-E on, have been rated as requiring 30-40% more pressure per square inch than the mainstream platforms. As a result the sockets have been designed with this in mind, ensuring the pressure of the latch and cooler stays on the heatspreader.
The other question that would come to my mind is the heatspreader itself. Intel has stated that they are not doing anything new with regards to the thermal interface material here compared to previous designs, and it is clear that the heatspreader itself is taller to compensate for the z-height difference in the processor PCB.
If we compare the ‘wing’ arrangement between the Haswell-E and Broadwell-E processors, Intel has made the layout somewhat more robust by adding more contact area between the heatspreader and the PCB, especially in the corners and sides. One would assume this is to aid the thinner PCB, although without proper stress testing tools I can’t verify that claim.
The Market
At this point in time, Intel is primarily competing with itself. Because the enterprise market requires consistency, the HEDT platform is constrained to that three year, two product cycle, which maintains enough consistency in socket compatibility to keep the enterprise partners happy. When Intel has 95%+ of the HEDT and x86 enterprise market, rather than increasing market share to generate revenue, Intel has to convince users on older systems that their new products are worth the investment. That’s an easy sell in the enterprise market, as time is money and total cost of ownership for a system is typically well documented for cyclical updates.
For HEDT, making that case to prosumers can be difficult. It depends on budgets and how applications are developing, especially when a number of popular professional software packages are (where possible) trying to leverage PCIe accelerators. There will always be a strong market for CPU performance, and there will always be a market for HEDT, depending on the price. But at some point the HEDT and Xeon markets do collide, and the two main factors on this are price and availability.
As mentioned earlier, the newly introduced Broadwell-E Core i7 parts collide in price with a number of Broadwell-EP Xeon parts, which could suggest that Intel wants to push potential prosumers (especially the professional ones) more into systems made by enterprise and workstation partners. These systems are typically sold with appropriate support, and the two platforms differ by a few features. The question then becomes who is buying HEDT: a number of users reading this will be gamers, and will not be interested in workstation sellers.
It’s a strange balance that Intel is trying to strike. Everyone wants more – whether they need it or not is a different conversation – but most enthusiasts say they want more. Intel states that as a company, it supports the gamers and the enthusiasts who want to push their consumer platforms to the fullest, and something like Broadwell-E does that. However a prohibitive price might reduce the potential number of next generation enthusiasts who want to play at the high-end.
X99 Refresh Motherboards
Throughout this month many of the regular motherboard manufacturers have either released, announced, or teased newer "refresh" motherboards using the LGA2011-3 socket and the X99 chipset. We’ve got a base roundup of all the new motherboards coming out of Computex planned, especially as new models are being announced and shown at the show. A couple of these landed on our desk for Broadwell-E testing, such as the MSI X99A Gaming Pro Carbon:
The Carbon is a relatively new brand for MSI’s motherboard range, typically on the high-end models, and this one aims for a deep black aesthetic that is enhanced through the additional LED lighting.
We also have in the ASUS X99-E-10G WS motherboard, ASUS’ high-end workstation and prosumer based motherboard that also integrates an Intel X550-T2 10 gigabit Ethernet chip offering two 10GBase-T ports. We’ve seen this before on the ASRock X99 WS-E/10G, which used the X540-T2, and required eight PCIe 3.0 lanes from the CPU to provide enough bandwidth. We were only able to test the ASUS 10G board for a couple of days before leaving for Computex, and will have a preview up shortly.
ASRock also sent us their X99X Killer, although the courier tried to deliver on a day where I spent 30 minutes gathering stuff for the Computex trip. Go figure. It’ll be ready to test when I get back!
This Review
As with every CPU launch, there are a number of different directions to take our review. In our review of the launch of the consumer Broadwell parts, the i7-5775C and the i5-5675C, we examined the generational update over previous architectures, and thus won’t repeat those tests here. We have had almost every high-end desktop CPU since Sandy Bridge-E in-house at some point, although only the latest have been through our most recent benchmark suite. Due to timing, we were able to test all four of the new Broadwell-E processors and retest the three Haswell-E processors, however we have a more limited dataset for comparison to Ivy Bridge-E, Sandy Bridge-E and Nehalem/Westmere. It will be interesting to see how the CPU performance for the HEDT has adjusted over the last five generations.
The other angle is the recent release of Intel’s Skylake mainstream focused processors, such as the i7-6700K and the i5-6600K, which feature a higher single core frequency but fewer cores and fewer memory channels, or the mainstream enthusiast focused Devil’s Canyon processors released back in July 2014. These have been tested on our latest range of benchmarks, and should make it clear where the latest mainstream-to-HEDT crossover should be.
Test Setup
Test Setup | |
Processor | Intel Core i7-6950X (10C/20T, 3.0-3.5 GHz); Intel Core i7-6900K (8C/16T, 3.2-3.7 GHz); Intel Core i7-6850K (6C/12T, 3.6-3.8 GHz); Intel Core i7-6800K (6C/12T, 3.4-3.6 GHz, 28 PCIe 3.0) |
Motherboards | MSI X99A Gaming Pro Carbon |
Cooling | Cooler Master Nepton 140XL |
Power Supply | OCZ 1250W Gold ZX Series; Corsair AX1200i Platinum PSU |
Memory | G.Skill RipjawsX DDR4-2400 C15 4x16GB 1.2V |
Memory Settings | JEDEC @ 2400 |
Video Cards | ASUS GTX 980 Strix 4GB; MSI R9 290X Gaming 4G; MSI GTX 770 Lightning 2GB; MSI R9 285 Gaming 2G; ASUS R7 240 2GB |
Hard Drive | Crucial MX200 1TB |
Optical Drive | LG GH22NS50 |
Case | Open Test Bed |
Operating System | Windows 7 64-bit SP1 |
Many thanks to...
We must thank the following companies for kindly providing hardware for our test bed:
Thank you to AMD for providing us with the R9 290X 4GB GPUs.
Thank you to ASUS for providing us with GTX 980 Strix GPUs and the R7 240 DDR3 GPU.
Thank you to ASRock and ASUS for providing us with some IO testing kit.
Thank you to Cooler Master for providing us with Nepton 140XL CLCs.
Thank you to Corsair for providing us with an AX1200i PSU.
Thank you to Crucial for providing us with MX200 SSDs.
Thank you to G.Skill and Corsair for providing us with memory.
Thank you to MSI for providing us with the GTX 770 Lightning GPUs.
Thank you to OCZ for providing us with PSUs.
Thank you to Rosewill for providing us with PSUs and RK-9100 keyboards.
Office Performance: Extreme Editions
The dynamics of CPU Turbo modes, both Intel and AMD, can cause concern in environments with a variable threaded workload. There is also the added issue of consistency between motherboards, depending on how the motherboard manufacturer wants to add in their own boosting technologies over the ones that Intel would prefer they used. In order to remain consistent, we implement an OS-level unique high-performance mode on all the CPUs we test which should override any motherboard manufacturer performance mode.
All of our benchmark results can also be found in our benchmark engine, Bench.
Dolphin Benchmark: link
Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.
WinRAR 5.0.1: link
Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.
3D Particle Movement
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores.
Agisoft Photoscan – 2D to 3D Image Manipulation: link
Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and different phases of the model reconstruction require either fast memory, fast IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert it into a medium quality model. This benchmark typically takes around 15-20 minutes on a high end PC on the CPU alone, with GPUs reducing the time.
HandBrake v0.9.9: link
For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container. Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.
Office Performance
Dolphin Benchmark: link
WinRAR 5.0.1: link
3D Particle Movement
Web Benchmarks
On the lower end processors, general usability is a big factor of experience, especially as we move into the HTML5 era of web browsing. As browsing moves into a multithreaded arena and web applications get more advanced, it is all the more important to have an appropriate level of performance.
Mozilla Kraken 1.1
Google Octane v2
Professional Performance: Windows
Agisoft Photoscan – 2D to 3D Image Manipulation: link
Cinebench R15
Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.
HandBrake v0.9.9: link
For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container. Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.
Hybrid x265
Hybrid is a new benchmark, where we take a 4K 1500 frame video and convert it into an x265 format without audio. Results are given in frames per second.
Linux Performance
Built around several freely available benchmarks for Linux, Linux-Bench is a project spearheaded by Patrick at ServeTheHome to streamline about a dozen of these tests in a single neat package run via a set of three commands using an Ubuntu 11.04 LiveCD. These tests include fluid dynamics used by NASA, ray-tracing, OpenSSL, molecular modeling, and a scalable data structure server for web deployments. We run Linux-Bench and have chosen to report a select few of the tests that rely on CPU and DRAM speed.
C-Ray: link
C-Ray is a simple ray-tracing program that focuses almost exclusively on processor performance rather than DRAM access. The test in Linux-Bench renders a heavy complex scene offering a large scalable scenario.
NAMD, Scalable Molecular Dynamics: link
Developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes for extreme parallelization up to and beyond 200,000 cores. The reference paper detailing NAMD has over 4000 citations, and our testing runs a small simulation where the calculation steps per unit time is the output vector.
Redis: link
Many of the online applications rely on key-value caches and data structure servers to operate. Redis is an open-source, scalable web technology with a strong developer base, but also relies heavily on memory bandwidth as well as CPU performance.
Alien: Isolation
If first person survival mixed with horror is your sort of thing, then Alien: Isolation, based off of the Alien franchise, should be an interesting title. Developed by The Creative Assembly and released in October 2014, Alien: Isolation has won numerous awards from Game Of The Year to several top 10s/25s and Best Horror titles, racking up over a million sales by February 2015. Alien: Isolation uses a custom built engine which includes dynamic sound effects and should be fully multi-core enabled.
Total War: Attila
The Total War franchise moves on to Attila, another The Creative Assembly development, and is a stand-alone strategy title set in 395AD where the main story line lets the gamer take control of the leader of the Huns in order to conquer parts of the world. Graphically the game can render hundreds/thousands of units on screen at once, all with their individual actions and can put some of the big cards to task.
For low end graphics, we test at 720p with performance settings, recording the average frame rate. With mid and high range graphics, we test at 1080p with the quality setting. In both circumstances, unlimited video memory is enabled and the in-game scripted benchmark is used.
Grand Theft Auto V
The highly anticipated iteration of the Grand Theft Auto franchise finally hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.
For our test we have scripted a version of the in-game benchmark, relying only on the final part which combines a flight scene along with an in-city drive-by followed by a tanker explosion. For low end systems we test at 720p on the lowest settings, whereas mid and high-end graphics play at 1080p with very high settings across the board. We record both the average frame rate and the percentage of frames under 60 FPS (16.6ms).
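The two metrics recorded here can be derived from a list of per-frame render times. The sketch below assumes the average frame rate is total frames divided by total time, and that a frame counts as "under 60 FPS" when it takes longer than 1000/60 ms; the review does not state the exact method, so treat both as assumptions. The function name and sample data are hypothetical.

```python
def frame_stats(frame_times_ms, threshold_fps=60.0):
    """Return (average FPS, percentage of frames slower than threshold_fps)."""
    if not frame_times_ms:
        raise ValueError("no frames recorded")
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds
    threshold_ms = 1000.0 / threshold_fps  # about 16.67 ms for 60 FPS
    slow = sum(1 for t in frame_times_ms if t > threshold_ms)
    return average_fps, 100.0 * slow / len(frame_times_ms)


if __name__ == "__main__":
    # Hypothetical frame times in milliseconds, not data from this review.
    avg, pct_slow = frame_stats([10.0, 10.0, 20.0, 40.0])
    print(f"average {avg:.1f} FPS, {pct_slow:.0f}% of frames under 60 FPS")
```

Reporting the percentage alongside the average matters because a handful of long frames can barely move the average while still being visible as stutter.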
GRID: Autosport
No graphics tests are complete without some input from Codemasters and the EGO engine, which means for this round of testing we point towards GRID: Autosport, the next iteration in the GRID and racing genre. As with our previous racing testing, each update to the engine aims to add in effects, reflections, detail and realism, with Codemasters making ‘authenticity’ a main focal point for this version.
GRID’s benchmark mode is very flexible, and as a result we created a test race using a shortened version of the Red Bull Ring with twelve cars doing two laps. The car in focus starts last and is quite fast, but usually finishes second or third. For low end graphics we test at 1080p medium settings, whereas mid and high end graphics get the full 1080p maximum. Both the average and minimum frame rates are recorded.
Middle-Earth: Shadow of Mordor
The final title in our testing is another battle of system performance with the open world action-adventure title, Shadow of Mordor. Produced by Monolith using the LithTech Jupiter EX engine and numerous detail add-ons, SoM goes for detail and complexity to a large extent, despite having to be cut down from the original plans. The main story itself was written by the same writer as Red Dead Redemption, and it received Zero Punctuation’s Game of The Year in 2014.
For testing purposes, SoM gives a dynamic screen resolution setting, allowing us to render at high resolutions that are then scaled down to the monitor. As a result, we get several tests using the in-game benchmark. For low end graphics we examine at 720p with low settings, whereas mid and high end graphics get 1080p Ultra. The top graphics test is also redone at 3840x2160, also with Ultra settings, and we also test two cards at 4K where possible.
Load Delta Power Consumption
Power consumption was tested on the system while in a single MSI GTX 770 Lightning configuration with a wall meter connected to the OCZ 1250W power supply. This power supply is Gold rated, and as I am in the UK on a 230-240 V supply, this leads to ~75% efficiency above 50W and 90%+ efficiency at 250W, suitable for both idle and multi-GPU loading. This method of power reading allows us to compare both the power management of the BIOS and the board's ability to supply components with power under load, and includes typical PSU losses due to efficiency.
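As a rough sketch of what a "load delta" figure means, the snippet below subtracts an idle wall reading from a load wall reading, and separately estimates the delta on the component side by applying a PSU efficiency to each reading. The function names and the sample readings are hypothetical; the figures in this review are wall readings that include PSU losses, so the second function is only an illustrative estimate.

```python
def wall_delta_watts(idle_wall_w, load_wall_w):
    """Load delta as read at the wall: load reading minus idle reading."""
    return load_wall_w - idle_wall_w


def dc_delta_watts(idle_wall_w, load_wall_w, idle_efficiency, load_efficiency):
    """Estimate the delta delivered to components after PSU losses.

    Efficiencies are fractions in (0, 1] and differ between idle and load,
    so each wall reading is scaled by its own efficiency before subtracting.
    """
    for eff in (idle_efficiency, load_efficiency):
        if not 0.0 < eff <= 1.0:
            raise ValueError("efficiency must be in (0, 1]")
    return load_wall_w * load_efficiency - idle_wall_w * idle_efficiency


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this review.
    print(wall_delta_watts(70.0, 250.0))
    print(dc_delta_watts(70.0, 250.0, 0.75, 0.90))
```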
Each of the Broadwell-E SKUs is rated at 140W, however they vary between 6 cores and 10 cores and run at different frequencies. Normally one would assume that the core/frequency ratio would be adjusted to match TDP, but ultimately using more cores can consume more power. We see a distinct increase in power consumption moving up the product stack.
Prime95 Core Loading
For this review, we also looked into peak delta power draw when varying the number of cores using Prime95’s mode for peak power consumption. Prime95 identifies cores with multiple threads and adjusts its loading/pinning accordingly.
Broadwell-E Overclocking
Methodology
Our standard overclocking methodology is as follows. We select the automatic overclock options and test for stability with PovRay and OCCT to simulate high-end workloads. These stability tests aim to catch any immediate issues with memory or CPU errors.
For manual overclocks, based on the information gathered from previous testing, we start off at a nominal voltage and CPU multiplier, and the multiplier is increased until the stability tests are failed. The CPU voltage is increased gradually until the stability tests are passed, and the process repeated until the motherboard reduces the multiplier automatically (due to safety protocols) or the CPU temperature reaches a stupidly high level (100ºC+). Our test bed is not in a case, which should push overclocks higher with fresher (cooler) air.
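The manual procedure described above can be sketched as a simple search loop. This is only an illustration: `is_stable` and `temperature_c` are hypothetical callables standing in for running PovRay/OCCT and reading the CPU temperature, and the voltage step and voltage ceiling are assumptions. The review itself only specifies the 100ºC+ temperature limit and the motherboard's own automatic multiplier reduction as stopping points.

```python
def find_overclock(is_stable, temperature_c, multiplier, voltage,
                   voltage_step=0.025, max_voltage=1.40, max_temp_c=100.0):
    """Raise the multiplier; when a step is unstable, add voltage until it passes.

    Returns the last (multiplier, voltage) pair that passed the stability
    test within the limits, or None if even the starting point never passed.
    voltage_step and max_voltage are assumed values for this sketch.
    """
    best = None
    while True:
        # Add voltage until this multiplier passes or the ceiling is hit.
        while not is_stable(multiplier, voltage):
            voltage = round(voltage + voltage_step, 3)
            if voltage > max_voltage:
                return best
        # Stop once the stable setting runs too hot.
        if temperature_c(multiplier, voltage) >= max_temp_c:
            return best
        best = (multiplier, voltage)
        multiplier += 1


if __name__ == "__main__":
    # Toy model of a CPU, purely hypothetical.
    def stable(m, v):
        return v >= 1.0 + 0.05 * (m - 38) - 1e-9

    def temp(m, v):
        return 50 + 2 * (m - 38) + 150 * (v - 1.0)

    print(find_overclock(stable, temp, multiplier=38, voltage=1.0))
```

In practice each call to the stability check is a lengthy benchmark run, which is why the search moves one multiplier step at a time rather than trying every combination.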
Overclock Results
Due to time constraints we were only able to overclock the i7-6950X using the MSI X99A Gaming Carbon motherboard. MSI has improved its overclocking options as of late on the Z170 platform to make it easier to use, but our BIOS did not have those most recent updates, particularly for load line calibration. However, our sample hit 4.1 GHz at 1.30 volts before the OCCT load temperatures were prohibitive to move up any further. We saw similar things when testing the mainstream Broadwell parts with Iris Pro, which shows that this sort of overclocking performance might be indicative of the silicon itself.
That being said, speaking with our contacts at various motherboard manufacturers, we're told that 4.1 GHz is a reasonably average processor result for Broadwell-E. Some processors will hit 4.3 GHz on air at around the same voltage, whereas others need up to 1.4 volts, and thus results will depend on the cooling setup used or the thermal characteristics of the silicon. I have also been told that AVX is a different story: for any peak frequency attained normally, AVX overclock stable frequencies will be around 200-300 MHz lower.
Catching Up: How Intel Can Re-Align Consumer and HEDT
Earlier in this piece I stated three reasons why the enterprise market has an out of step cadence with the latest CPU microarchitecture: product stability, regular releases, and platform longevity.
To get stability, using Intel’s tried and tested core makes sense, rather than the latest and greatest. The longevity of each enterprise platform is such that each socket and chipset generation must last for two CPU cycles, which allows a potential upgrade path and also means that customers aren’t ripping out their installations every 12-18 months and replacing them with fresh ones in order to beat the competition. Also, sitting behind the mainstream platform at a slightly slower refresh rate allows the release of enterprise CPUs to compensate for any process delay on the latest architecture.
But at this point, we are now a generation and a year behind the mainstream and latest microarchitecture. There are features in the latest mainstream Skylake CPUs, such as Speed Shift (the ability to react to high priority frequency requests up to 20x faster to save power and improve user experience), that are not in the enterprise and HEDT products. If the out-of-step and slower cadence continues, we could be two generations behind fairly easily. However, Intel has (inadvertently) developed a get-out-of-jail free card here.
Earlier in the year we reported that Intel is changing its processor development strategy due to a combination of factors including the slowing of Moore’s Law and the difficulty of developing smaller lithography nodes. Intel was on their tick-tock strategy for around a decade, alternating between smaller nodes and new microarchitecture designs to give performance increases every cycle (or half-cycle). Tick-tock was well received and provided Intel and its investors with a steady expectation and revenue stream when each new product was delivered and met expectations. When Intel hit several bumps with 14nm, tick-tock became an extended 'tiiiick-toock', slowly lengthening out the time between updates. Then this year Intel said that, for the CPU product line based on the Core microarchitecture family at least, it would move to ‘Process-Architecture-Optimization’, or a three-stage cycle for 14nm (the current node) and 10nm (the next node).
On the mainstream product segment, this means that the 14nm family, originally featuring Broadwell (tick) and Skylake (tock), will become Broadwell (process), Skylake (architecture) and Kaby Lake (optimization). The level of ‘optimization’ that Kaby Lake will provide is unknown at this point, but what used to be a 24-month cycle can now become a 36-month cycle very easily.
But it is not immediately obvious what this means to the enterprise segment. One would naturally expect the segment to follow the PAO implementation, albeit slower. Here’s Intel’s potential trick for the future: depending on the level of ‘optimization’ in the final stage of the cycle, the enterprise segment has the potential to just bypass and ignore it, keeping the cycle length the same and giving Intel an opportunity to realign the microarchitectures. The net product would be 36 month cycles, spanning 3 product generations at the consumer level and 2 product generations at the enterprise/HEDT level.
That being said, it’s a little bit of conjecture. We have spoken to some senior members of Intel about this, and it was acknowledged that it could be a potential strategy, however as expected nothing like this would be confirmed in a casual conversation even if it was decided at a senior level. It will be interesting to see, when the enterprise market rolls around to Skylake-E and Skylake-EP based cores and beyond, whether Kaby Lake-E will be a ‘thing’ or not.
Broadwell-E Conclusion
Intel’s latest Broadwell-E platform is the next iteration of their high-end desktop strategy, which involves bringing the low-to-mid range professional processors into the consumer market and adding a few features (such as overclocking), but removing others (ECC). For this launch, Intel introduced four processors, ranging from six cores to ten cores and varying in price from $434 to $1723.
At AnandTech we have tested Intel’s Broadwell cores before, both in our Broadwell desktop processor review of the Core i7-5775C and the professional level Broadwell-EP Xeon E5-2600 v4 processor review. We noted a 3-5% increase in clock-per-clock performance compared to the previous generation ‘Haswell’ parts at the time. This review tests all the new Broadwell-E parts for direct comparison to the Haswell parts.
Performance
The move from Haswell-E to Broadwell-E is a change from 22nm to 14nm process technology but the microarchitecture is mostly the same, barring minor adjustments. These adjustments include an improved memory controller (now qualified on DDR4-2400), a faster divider, slightly improved branch prediction, a slightly larger scheduler, and a reduction in AVX multiply latency from 5 cycles to 3 cycles.
Due to this, the performance of the new Broadwell-E parts is somewhat predictable. Adding more cores and adjusting for frequency is a good marker, as is adjusting for the new memory speed. That means a move from the i7-5960X to the i7-6950X gives two more cores at the same frequency, or about 25% more performance. The downside of this upgrade is the price: the i7-5960X was launched at $999/$1049, whereas the new i7-6950X is $1723. That’s a big price increase by any standard.
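The arithmetic behind this estimate is simple enough to sketch. The snippet below assumes throughput scales perfectly with cores multiplied by frequency, which is a best case for well-threaded workloads and not a measured result. The function names are hypothetical, and the 3.0 GHz value is only a placeholder, since any equal frequency on both sides gives the same ratio.

```python
def scaling_estimate(old_cores, old_ghz, new_cores, new_ghz):
    """First-order throughput ratio, assuming perfect cores x frequency scaling."""
    return (new_cores * new_ghz) / (old_cores * old_ghz)


def percent_change(old, new):
    """Percentage change going from old to new."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    SAME_GHZ = 3.0  # placeholder: both parts are treated as equal frequency
    uplift = 100.0 * (scaling_estimate(8, SAME_GHZ, 10, SAME_GHZ) - 1.0)
    print(f"estimated multithreaded uplift: {uplift:.0f}%")
    # i7-5960X launch prices ($999/$1049) against the i7-6950X ($1723).
    print(f"price increase vs $1049: {percent_change(1049, 1723):.0f}%")
    print(f"price increase vs $999: {percent_change(999, 1723):.0f}%")
```

Under these assumptions the core count buys about 25% more throughput, while the price rises by roughly 64% to 72% depending on which i7-5960X price is used as the baseline.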
Turbo Boost Max 3.0: A Troubled Implementation
For Broadwell-E, Intel introduced a new technology called Turbo Boost Max 3.0. With an appropriate driver, BIOS, BIOS settings, and software, this allows the system to pin a single threaded program to the best performing single core at a higher-than-listed frequency. It sounds as if it has potential, but the implementation means that very few users will ever see it.
Firstly, the driver/software implementation is perhaps easily overcome when the driver gets pushed through Windows 10 updates, similar to Speed Shift on Skylake processors which is now fully active. The part where it breaks down is in the BIOS and BIOS settings requirements. Ultimately the BIOS controls which P-states are in play (when the OS selects them), but the BIOS settings can override anything the processor might want by default. Because TBM3 involves an increase in frequency, this requires a number of settings in the BIOS to be enabled. But, because each processor is different, motherboard manufacturers are most likely going to run these options at a very conservative value so none of their users have a bad experience. In the end, whether it's used is going to depend on whether the motherboard manufacturers enable it in the first place. In the motherboard we tested, we were told that it was a management decision to have it disabled by default. Because most users never touch the BIOS, especially in the prosumer/professional markets, it will most likely never be used in this case.
We didn’t get time to run a full benchmark suite with TBM 3.0 enabled, and will most likely follow up to see where in our tests it can make the most difference.
Market
The pricing will be prohibitive to most. Many enthusiasts who have played in the HEDT space for a number of years are used to the $999/$1049 price point for the most expensive processor, even when the number of cores has increased. However, this time Intel has decided to increase the top chip's cost by almost 70%. This complicates the question of which product is best for prosumers looking to upgrade.
For $1723, if a user wants to invest in the i7-6950X but does not want the overclocking, they can instead invest in the 14-core E5-2680 v4 for $1745, giving 40% more cores at a lower power with a slight decrease in frequency, or get double the cores in a 2P system using the E5-2640 v4 processor: a 10-core 2.4 GHz/3.4 GHz part, running at 90W, for $939. Two of these run $1878, which is slightly more, but having double the cores available might be the more important thing here. However, because these CPUs are not often found at retail, users may have to approach a system builder/integrator in order to source them.
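One way to compare these options is price per core. The sketch below uses the list prices and core counts from the paragraph above, with the i7-6950X at the $1723 price quoted in the conclusion. It deliberately ignores frequency differences, overclocking, and the extra cost of a 2P motherboard, so it is a rough guide rather than a buying recommendation; the function names are hypothetical.

```python
# (list price in USD, total cores) for each option discussed above
OPTIONS = {
    "Core i7-6950X": (1723, 10),
    "Xeon E5-2680 v4": (1745, 14),
    "2x Xeon E5-2640 v4": (2 * 939, 20),
}


def price_per_core(price_usd, cores):
    """List price divided by core count."""
    return price_usd / cores


def rank_by_price_per_core(options):
    """Return (name, USD per core) pairs, cheapest per core first."""
    rows = [(name, price_per_core(p, c)) for name, (p, c) in options.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, usd in rank_by_price_per_core(OPTIONS):
        print(f"{name}: ${usd:.2f} per core")
```

On this measure the dual E5-2640 v4 configuration is the cheapest per core and the i7-6950X the most expensive, which is the trade-off a buyer pays for overclocking support and higher clocks on a single socket.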
One would assume that Intel is interested in retaining the long term HEDT hold-outs still on Nehalem, Westmere and Sandy Bridge-E processors. These prices (and the overclocking performance) might make these users feel that they should hold on another generation, or invest in Haswell-E. That being said, the low-end Broadwell-E pricing is higher than that of the low-end Haswell-E, which will extend the pricing gap between the mainstream and the high-end desktop platform.