  • iwod - Thursday, March 31, 2016 - link

    Maximum memory still 768GB?
    What happened to the 5.1 GHz Xeon E5?
  • Ian Cutress - Thursday, March 31, 2016 - link

    I never saw anyone with a confirmed source for that, which makes me think it's a fake rumor. I'll happily be proved wrong, but nothing like a 5.1 GHz part was announced today.
  • Brutalizer - Saturday, April 2, 2016 - link

    It would have been interesting to benchmark against the best CPU today, the SPARC M7. For instance:

    -SAP: two M7 CPUs score 169,000 SAPS vs. 109,000 SAPS for two of these Broadwell-EP CPUs

    -Hadoop, sort 10TB of data: one SPARC M7 server with four CPUs finishes the sort in 4,260 seconds, whereas a cluster of 32 PCs equipped with dual E5-2680 v2s finishes in 1,054 seconds, i.e. 64 Intel Xeon CPUs vs. four SPARC M7 CPUs.

    -TPC-C: one SPARC M7 server with one CPU gets 5,000,000 tpm, whereas one server with two E5-2699 v3 CPUs gets 3,600,000 tpm

    -Memory bandwidth, STREAM triad: one SPARC M7 reaches 145 GB/sec, whereas two of these Broadwell-EP CPUs reach 119 GB/sec (the triad kernel itself is sketched below)

    -etc. All these benchmarks can be found here, along with another 25-ish benchmarks where the SPARC M7 is 2-3x faster than the E5-2699 v3 or POWER8 (all the way up to 11x faster):
    https://blogs.oracle.com/BestPerf/entry/20151025_s...
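
    For reference, the STREAM triad kernel being cited is tiny. A minimal C sketch of the idea (simplified from the official benchmark; the array size and OpenMP setup here are placeholder choices of mine, not Oracle's or Intel's configuration):

        /* triad.c - STREAM-triad-style bandwidth probe (sketch only).
         * Build: gcc -O3 -fopenmp triad.c -o triad */
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        #define N 80000000L   /* large enough to blow out every cache level */

        int main(void) {
            double *a = malloc(N * sizeof *a);
            double *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c);
            const double q = 3.0;

            for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

            double t = omp_get_wtime();
            #pragma omp parallel for
            for (long i = 0; i < N; i++)
                a[i] = b[i] + q * c[i];   /* the triad: 2 loads + 1 store */
            t = omp_get_wtime() - t;

            /* 3 arrays * N elements * 8 bytes moved per iteration */
            printf("Triad: %.1f GB/s\n", 3.0 * N * sizeof(double) / t / 1e9);
            free(a); free(b); free(c);
            return 0;
        }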
  • Brutalizer - Saturday, April 2, 2016 - link

    BTW, all these SPARC M7 benchmarks are almost unaffected if encryption is turned on, maybe 2-5% slower. Whereas if you turn on encryption for x86 and POWER8, expect performance to halve or worse. Just check the benchmarks at the link above, and you will see that the SPARC M7 results are almost unaffected, encrypted or not.
  • JohanAnandtech - Saturday, April 2, 2016 - link

    "if you turn on encryption for x86 and POWER8, expect performance to halve or even less". And this is based upon what measurement? from my measurements, both x86 and POWER8 loose like 1-3% when AES encryption is on. RSA might be a bit worse (2-10%), but asymetric encryption is mostly used to open connections.
  • Brutalizer - Wednesday, April 6, 2016 - link

    If we talk about how encryption affects performance, let's look at the benchmark below. Never mind that the x86 is slower than the SPARC M7; let us instead look at how encryption affects the CPUs. What performance hit does encryption cause?
    https://blogs.oracle.com/BestPerf/entry/20160315_t...

    -For x86, we see that utilization of the two E5-2699 v3 CPUs goes from 40% without crypto up to 80% with crypto. This leaves the x86 server with very little headroom to do anything other than execute one query. At the same time, the x86 server took 25-30% longer to process the query. This shows that encryption has a huge impact on x86. You cannot do useful work with two x86 CPUs except execute a query; if you need to do additional work, get four x86 Xeons instead.

    -If we look at how the SPARC M7 is affected by encryption, we see that CPU utilization goes from 30% up to 40%. So you have a lot of headroom to do additional work while processing the query. At the same time, the SPARC CPU took 2% longer to process the query.

    It is not really interesting that this single SPARC M7 CPU is 30% faster than two E5-2699 v3s in absolute numbers. No, we are looking at how much performance is affected when we turn on encryption. In the case of x86, the CPUs get twice the load, so they are almost fully loaded, just by turning on encryption, while also taking longer to process the work. Ergo, you cannot do any additional work on x86 with crypto. With SPARC, it ends up at 40% CPU utilization, so you can do additional work, and processing time barely increases (2%). This proves that x86 encryption halves performance or worse.

    For your own AES encryption benchmark, you should also check how much CPU utilization goes up. If the CPU gets fully loaded, you cannot do any useful work except handle encryption, so you need an additional CPU to do the actual work.
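
    A quick way to watch that on Linux is to sample /proc/stat while the benchmark runs. A minimal sketch (my own illustration, not the tooling used in the Oracle tests):

        /* cpu_util.c - print aggregate CPU utilization once per second.
         * Build: gcc -O2 cpu_util.c -o cpu_util */
        #include <stdio.h>
        #include <unistd.h>

        static void read_stat(long long *idle, long long *total) {
            long long v[8] = {0};
            FILE *f = fopen("/proc/stat", "r");
            fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld %lld",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
            fclose(f);
            *idle = v[3] + v[4];              /* idle + iowait */
            *total = 0;
            for (int i = 0; i < 8; i++) *total += v[i];
        }

        int main(void) {
            for (;;) {
                long long i0, t0, i1, t1;
                read_stat(&i0, &t0);
                sleep(1);
                read_stat(&i1, &t1);
                printf("util: %.1f%%\n",
                       100.0 * (1.0 - (double)(i1 - i0) / (double)(t1 - t0)));
            }
        }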
  • JohanAnandtech - Saturday, April 2, 2016 - link

    Two M7 machines start at 90k, while a dual Xeon is around 20k. And most of those Oracle benchmarks are very intellectually dishonest: complicated configurations to get the best out of the M7 machines, midrange older x86 configurations (10-core E5 v2, really???)
  • Brutalizer - Wednesday, April 6, 2016 - link

    The "dishonest" benchmarks from Oracle, are often (always?) using what is published. If for instance, IBM only has one published benchmark, then Oracle has no other choice than use it, right? Of course when there are faster IBM benchmarks out there, Oracle use that. Same with x86. In all these 25ish cases we see that SPARC M7 is 2-3x faster, all the way up to 11x faster. The benhcmarks vary very much, raw compute power, databases, deep learning, SAP, etc etc
  • Phil_Oracle - Thursday, May 12, 2016 - link

    I disagree, Johan! You don't appear to know much about the new SPARC M7 systems, and I suggest you do a full evaluation before making such remarks. A SPARC T7-1 with 32 cores has a list price of about $39K and outperforms a 2-socket 36-core E5-2699 v3 anywhere from 38% (OLTP HammerDB) to over 8x (OLTP with in-memory analytics). A similarly configured *enterprise*-class 2-socket 36-core E5-2699 v3 from HPE or Cisco lists for $25K+, so in terms of price/performance, the SPARC T7-1 beats the 2-socket E5-2699 v3. And if you take into account SW that's licensed per core, the SPARC M7 is 60% to 2.6x faster per core, dramatically lowering licensing costs. The new E5-2699 v4, providing ~20% more cores at roughly the same price, gets closer, but with performance per core not changing much in E5 v4, the SPARC M7 still has a huge lead. And the difference is that while the E5 v3/v4 chips don't scale beyond 2 sockets, you can get a SPARC M7 system with up to 16 sockets at almost the same price/performance as the 1-socket system.
  • adamod - Friday, June 3, 2016 - link

    BUT CAN IT PLAY CRYSIS?????
  • PowerOfFacts - Thursday, June 23, 2016 - link

    And now Oracle marketing speaks. Their HammerDB results are bogus. Oracle continues to cite socket results when the majority of the world has moved on to per-core results. They cite the results from a 32-core HammerDB run, then compare it to a 1-chip (1/2 of 1 socket) POWER8, because Phil has a hard-on for how "HE" believes IBM has packaged the processor, and similarly chooses an Intel configuration to ensure "THEY" get the result they want. Phil & Oracle (appear to) always speak with forked tongue.
  • patrickjp93 - Sunday, April 3, 2016 - link

    "Best" only at specific scale-up workloads. There's a reason Sparc is not particularly popular for clusters and supercomputing (and it's NOT software compatibility). It sucks at a lot of workloads when compared to x86. As for the SAP benchmarks, that's to be expected since x86 doesn't yet support transactional memories. That changes with Skylake Purley though.
  • Brutalizer - Wednesday, April 6, 2016 - link

    In these 25-ish benchmarks, the SPARC M7 is 2-3x faster on all kinds of workloads, not just some specific scale-up workloads. The reason the SPARC M7 is not popular for clusters (supercomputers are clusters) is not low raw compute performance; it is cost and wattage. The M7 is much more expensive than x86 and draws much more power, I guess somewhere around 250 watts. M7s go into big enterprise servers, some with water cooling, etc., whereas clusters have many cheap nodes with no water cooling.

    Clusters can use x86 because the highest-wattage x86 CPU uses 140 watts or so, not more. So it is feasible to use 140-watt CPUs in clusters, but not 250-watt CPUs; they draw too much power.

    For instance, the IBM Blue Gene supercomputer that held spot no. 5 on the Top500 for a couple of years used 850 MHz PowerPC CPUs when everyone else used 2.4 GHz x86 or so. The 850 MHz CPU doesn't use a lot of power, and that is why it was used in Blue Gene, not because it was faster (it wasn't). A large supercomputer can draw 10 megawatts, and that costs a lot; power is a huge issue in supercomputers. The SPARC M7 draws too much power to be useful in a large cluster, and costs too much.

    If we talk about raw compute power for the SPARC M7, it reaches 1200 in SPECint2006, whereas the E5-2699 v3 reaches 715. Not really 2-3x faster, but still much faster.
    In SPECfp2006, the M7 reaches 832, whereas the E5-2699 v3 reaches 474.
    https://blogs.oracle.com/BestPerf/entry/201510_spe...

    So, as you can see yourself, the SPARC M7 is faster on scale-up business workloads (it was designed for that type of workload), faster on raw compute power, and faster on everything in between. Just look at the wide diversity among these 25-ish benchmarks.
  • Brutalizer - Wednesday, April 6, 2016 - link

    BTW, do you really expect a 150-watt x86 CPU to outperform a 250-watt SPARC M7 CPU? Have you seen benchmarks comparing a 250-watt graphics card to a 150-watt one? Which GPU do you think is faster? Do you expect the 150-watt GPU to outperform the 250-watt one?

    The SPARC M7 has 50% more cores, twice the CPU cache, twice the GHz, twice the wattage, twice the RAM bandwidth, and twice the number of transistors (10 billion) - and you are surprised it is 2-3x faster than x86?

    BTW, the SPARC M7 also has stronger cores than x86. If you look at all these benchmarks, typically one M7 with 32 cores is faster than two E5-2699 v3s with 2x18 = 36 cores. This must mean that one SPARC M7 core packs more punch than an E5-2699 v3 core, because 32 SPARC cores beat 36 x86 cores in all these benchmarks.
  • adamod - Friday, June 3, 2016 - link

    I know this is an old post, but I am confused (this isn't something I have learned much about yet) and I am hoping you can help. If the SPARC has 2-3x the performance at 250W vs. 140W, wouldn't that make it MORE efficient? And if you need two 2699s to compare to one SPARC M7, wouldn't that be 280W, more than the M7's 250W? I realize there are other factors here, but this doesn't make sense to me. Also, yes, there are graphics cards with lower wattage and better performance. I am an AMD fan, but Nvidia has had some faster cards with better performance in the past. I have an R9 280X, a mid-grade card rated at I believe 225W, which is kind of crazy when it can get beaten by 17W Nvidia cards.
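
    Working through the arithmetic with the wattages claimed upthread (250W for the M7, 140W per Xeon; these are the thread's claims, not measured numbers):

        /* perf_per_watt.c - the efficiency question, worked through.
         * Build: gcc perf_per_watt.c -o ppw */
        #include <stdio.h>

        int main(void) {
            /* Assume, per the thread, one M7 does the same work as
             * two E5-2699 v3s. */
            double m7_perf   = 1.0, m7_watts   = 250.0;
            double xeon_perf = 1.0, xeon_watts = 2 * 140.0;

            printf("M7:   %.4f perf/W\n", m7_perf / m7_watts);     /* 0.0040 */
            printf("Xeon: %.4f perf/W\n", xeon_perf / xeon_watts); /* 0.0036 */
            /* Same work from 250W vs. 280W: under these claims the M7 is
             * indeed the more efficient configuration. */
            return 0;
        }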
  • tqth - Sunday, April 3, 2016 - link

    The SPARC and POWER servers are for people with unlimited pockets, where compactness and reliability are worth the premium. If you have to ask how much it costs, you probably can't afford it.
    Xeons are commodity hardware where you can purchase the best bang for your buck.
    They are not aiming at the same market. Most software wouldn't even run on both systems.
    Besides, benchmarks are worthless unless the performance of the specific software is tested. And that's rare.
  • PowerOfFacts - Thursday, June 23, 2016 - link

    Depends on which Xeon processors you are referring to. The latest Broadwell EP & EX chips can cost over $7K each, well on par with if not exceeding POWER8 chips, and definitely more than OpenPOWER chips. Times are changing. Intel has milked its clients for a long time, feeding them the marketing line of open, commodity & low cost. They are no longer open, buying up the ecosystem and integrating it into the silicon; what exactly does commodity mean anyway; and as far as low cost goes... as I just said, pretty salty.
  • yuhong - Thursday, March 31, 2016 - link

    64GB LR-DIMMs will probably not come out at reasonable prices until 8Gbit DDR4 is more mainstream.
  • iwod - Thursday, March 31, 2016 - link

    I thought Samsung announced a 128GB DIMM with some type of 3D / TSV RAM.
  • Casper42 - Thursday, March 31, 2016 - link

    Not shipping just yet though.
    Should be sometime this year.
  • Casper42 - Thursday, March 31, 2016 - link

    HPE just dropped the 64GB LRDIMMs a week or two back.
    They are now exactly 2x the 32GB LRDIMM as far as List Price goes.
    LRDIMMs across the board are 31% more expensive than RDIMMs.
  • wishgranter - Tuesday, April 5, 2016 - link

    http://www.techpowerup.com/221459/samsung-starts-m...
  • wishgranter - Tuesday, April 5, 2016 - link

    While introducing a wide array of 10nm-class DDR4 modules with capacities ranging from 4GB for notebook PCs to 128GB for enterprise servers, Samsung will be extending its 20nm DRAM line-up with its new 10nm-class DRAM portfolio throughout the year.
  • nathanddrews - Thursday, March 31, 2016 - link

    Perf/W is obviously a very exciting metric for server farmers, and it's generally exciting from a basic technology perspective, but the absolute performance isn't amazing. Anyway, it's not like I'll be buying one. LOL
  • asendra - Thursday, March 31, 2016 - link

    This interests me insofar as these would be the updated processors in a supposedly-coming-this-year Mac Pro refresh. Not that I would personally fork over that much cash, but I'm interested to see how much of a jump they will make.

    But things seem rather bleak. No wonder they decided to wait 3 years for a refresh.
  • MrSpadge - Thursday, March 31, 2016 - link

    Not sure which years you're counting in, but for the majority of us it has been 1.5 years from 09/2014 to today.
    https://en.wikipedia.org/wiki/Haswell_%28microarch...
  • asendra - Thursday, March 31, 2016 - link

    Apple didn't update the Mac Pros with Haswell-EP. They are still using Ivy Bridge.
  • tipoo - Thursday, March 31, 2016 - link

    Wonder what they'll do on the GPU side though. Too early for next-generation 14nm FF GPUs from anyone, even if Nvidia were a choice given the OpenCL politics. Another GCN 1.0 part in 2016 would be... a bag of hurt.

    Still waiting on the high-end 15" rMBP to have something better than GCN 1.0... The performance, shockingly, hasn't come all that far from even my Iris Pro model. Maybe double, which is something, but I'd want more than that to upgrade from integrated...
  • extide - Thursday, March 31, 2016 - link

    Nah, if they refresh it late this year, like in August or something, then 14/16nm FF GPUs will be available.

    At worst we would get GCN 1.2, but yeah, it would suck to see 28nm GPUs put in there...
  • mdriftmeyer - Thursday, March 31, 2016 - link

    On what planet do you not grasp FinFET 14nm end of June from AMD?
  • Kevin G - Thursday, March 31, 2016 - link

    Much like how Apple skipped Haswell-EP, they also skipped a generation of cards from AMD and nVidia. So even if Apple doesn't wait for new GPUs, there is certainly an update due on the GPU side.

    The more interesting possibility would be if Apple were to go with Xeon D in the Mac Pro instead of Broadwell-EP. Apple would need a big PLX chip considering the number of lanes they'd want to use, but it is possible.
  • bill.rookard - Thursday, March 31, 2016 - link

    Another issue is that they're not under any pressure from any competition to really innovate. I don't even remember the last time I read anything about Opteron servers... let alone something about any NEW Opterons.
  • ComputerGuy2006 - Thursday, March 31, 2016 - link

    A sign of things to come for Broadwell-E?

    Seems like a tricky situation, because Skylake-E will come with a new platform in 2017, while Broadwell-E doesn't have the fastest IPC and there are crazy rumors that it might cost $1500 (lol Intel). We also have Zen later this year, which might give good performance at a good cost/perf ratio.
  • extide - Thursday, March 31, 2016 - link

    Yeah, so Intel only gives us the LCC part for the -E platform, so we will see the 10-core SKU at the top. It will either be $1000 or $1500... so yeah, not sure how that will end up. Although there will be 8- and 6-core options that should be pretty affordable.

    Hopefully they do an 8-core part with 28 lanes for under $500, as THAT would be a great deal!
  • dragonsqrrl - Sunday, April 3, 2016 - link

    I'm hoping the 8-core SKU is around $600, the position the x930K traditionally occupies. What makes me a little worried is that there will be 4 SKUs instead of 3 this time (one 10-core, one 8-core, and two 6-core), and I'm not sure there's enough room under the $600 price point for two 6-core processors.
  • jasonelmore - Thursday, March 31, 2016 - link

    Can it run Star Citizen?
  • theduckofdeath - Thursday, March 31, 2016 - link

    A question we'll never get an answer to? :D
  • JohanAnandtech - Friday, April 1, 2016 - link

    It probably runs mostly on Xeons. Well, the back end that is :-)
  • extide - Thursday, March 31, 2016 - link

    BOOM, 454mm^2 on the world's best process. The "other" 14/16nm processes use bigger geometry than Intel's 14nm process.

    Now we just need those other guys to catch up so we can see 450+ mm^2 GPUs!
  • Kevin G - Thursday, March 31, 2016 - link

    Intel still has plenty of room to increase die size. The largest chip they've produced was the Tukwila Itanium 2 at 699 mm^2. Granted, that was a 65 nm design, but Haswell-EX is a juggernaut at 662 mm^2 on Intel's more recent 22 nm process. Seems reasonable that SkyLake-EX could go to 32 cores, as Intel has >200 mm^2 of rectal limit left.

    As for GPUs, they're also huge. nVidia's GM200 is 601 mm^2 and AMD's Fiji is 'only' 596 mm^2, both on a 28 nm process. TSMC's 20 nm process was skipped, so even using the looser 16 nm FinFET, GPUs will see a significant shrink compared to those high-end chips.
  • patrickjp93 - Friday, April 1, 2016 - link

    Knights Landing: 730 mm^2, also on the 14nm process
  • extide - Friday, April 1, 2016 - link

    Is it really that big..? Wow, I knew it was big, but didn't know it was that big. Got a source on that?
  • Kevin G - Friday, April 8, 2016 - link

    I'll second the request for a source. I knew it'd be big, but that big?
  • extide - Friday, April 1, 2016 - link

    I know you meant Reticle, but that was a pretty funny typo, heh.
  • Kevin G - Friday, April 8, 2016 - link

    Autocorrect has gotten the best of me yet again.
  • extide - Friday, April 1, 2016 - link

    And, I know how big GM200 and Fiji are, but I am talking about big GPUs on 14/16nm. All signs currently point to <300mm^2 for the first round of 14/16nm GPUs.
  • lorribot - Thursday, March 31, 2016 - link

    Given the way Microsoft and others now license by the core, in large non-splittable packages (Windows Server 2016 Datacenter comes in blocks of 16 cores, so a dual-socket server with 44 cores would need 48 core licenses), the increasing core count has limited appeal over smaller numbers of faster cores when looking at virtualised environments.
    Those still in the physical world will still have to pay per core, but may have to buy 4 standard Windows licenses.
    When it comes to doing your testing, it should reflect these costs and compare total bang per buck when dealing with performance.
    Red Hat still licenses per socket, but don't be surprised if they go per core too.
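
    The rounding described above works out like this (an illustration of the stated 16-core-block model only; actual Microsoft terms add per-socket minimums):

        /* license_blocks.c - cores licensed in blocks of 16.
         * Build: gcc license_blocks.c -o lic */
        #include <stdio.h>

        static int licensed_cores(int physical_cores, int block) {
            return (physical_cores + block - 1) / block * block; /* ceil to block */
        }

        int main(void) {
            printf("44 cores -> %d licensed\n", licensed_cores(44, 16)); /* 48 */
            printf("36 cores -> %d licensed\n", licensed_cores(36, 16)); /* 48 */
            printf("32 cores -> %d licensed\n", licensed_cores(32, 16)); /* 32 */
            return 0;
        }

    Note that under this model a dual-socket 36-core box pays for the same 48 cores as the 44-core one, which is the bang-per-buck point.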
  • JohanAnandtech - Friday, April 1, 2016 - link

    Back in 2008, I had a salesperson explain Microsoft's license models to me in our lab. From that point on, we invested most of our time and resources in Linux server software. :-D
  • extide - Friday, April 1, 2016 - link

    Enterprise Linux isn't free either, ya know.
  • rahvin - Friday, April 1, 2016 - link

    Support isn't free on the FOSS side, but the software is. Red Hat is never going to charge more per core for support; that's ridiculous and would result in rivals stealing their support contracts. If licensing costs are so bad that you are dumping hardware, you really should be looking at moving services to Linux and virtualizing the Windows servers so you can limit the core count and provide more horsepower.

    Anyone putting Microsoft on bare hardware these days is nuts, although the consolation is that they get to pay MS's exorbitant tax on software. Linux should be the core component of any IT service, with virtualized servers where you need proprietary server software.
  • SkipPerk - Friday, April 8, 2016 - link

    "Anyone putting Microsoft on bare hardware these days is nuts"

    This brother is speakin the truth!
  • warreo - Thursday, March 31, 2016 - link

    Can someone clarify this line for me?

    "The average performance increase versus the Xeon E5-2690 is 3%, and the Broadwell cores get a boost of no less than 19%."

    Does that mean the IPC increase is 19% for Broadwell, offset by a ~16% decline in clock speed, to get the 3% average performance increase? But that doesn't make sense to me, as 3.8 GHz (E5-2690) to 3.6 GHz (E5-2699 v4) is only a 5% decline in max clock speed.
  • ShieTar - Thursday, March 31, 2016 - link

    I understood it as "the -Ofast setting boosts Broadwell by 19%", so with the -O2 setting it was actually 16% slower than the 2690.

    And I think the AT theory, based on the original measurements, is that the 3.6 GHz boost isn't even held for a significant amount of time, so that Broadwell in reality comes with an even worse decline in clock speed.
  • warreo - Thursday, March 31, 2016 - link

    Your interpretation makes much more sense than mine, but still doesn't quite add up. The improvement from using -Ofast vs. -O2 is 13% on average, and the lowest improvement is 4% on xalancbmk, well below the "no less than 19%" quoted by Johan.

    Perhaps the rest of the disparity comes from normalizing for sustained clock speeds, as you suspect? Johan, is that correct?
  • Ryan Smith - Thursday, March 31, 2016 - link

    I've reworded that passage to make it clearer. But ShieTar's interpretation was basically correct.

    "Switching from -O2 to -Ofast improves Broadwell-EP's absolute performance by over 19%. Meanwhile the relative performance advantage versus the Xeon E5-2690 averages 3%. "
  • JohanAnandtech - Thursday, March 31, 2016 - link

    That means that -Ofast has a much bigger effect on Broadwell. I mean by that that -Ofast is 19% faster than -O2 on Broadwell, while it is 3% faster on Sandy Bridge. I assume that the older the architecture, the better the compiler is able to optimize for it without special tricks.
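
    A concrete example of the kind of thing -Ofast unlocks: it implies -ffast-math, which lets GCC reassociate floating-point reductions and vectorize them, something -O2/-O3 won't do because IEEE semantics forbid it. A minimal illustration (my own, not one of the SPEC kernels):

        /* fast_math_demo.c - why -Ofast can beat -O2 on FP code.
         * gcc -O2    fast_math_demo.c -o demo   (scalar: strict FP order)
         * gcc -Ofast fast_math_demo.c -o demo   (partial sums + SIMD)    */
        #include <stdio.h>
        #include <stdlib.h>

        #define N (1 << 24)

        int main(void) {
            double *a = malloc(N * sizeof *a);
            for (long i = 0; i < N; i++)
                a[i] = (double)i * 1e-6;

            /* A serial dependency chain: IEEE rules forbid reordering the
             * adds, so without -ffast-math this loop stays scalar. */
            double sum = 0.0;
            for (long i = 0; i < N; i++)
                sum += a[i];

            printf("sum = %f\n", sum);
            free(a);
            return 0;
        }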
  • warreo - Friday, April 1, 2016 - link

    Thanks for the clarification. Loved the review, great work Johan!
  • Pinn - Thursday, March 31, 2016 - link

    I'm still happy I went with the 6-core X99 part over the 8-core. Massive core counts are nice to see available, but I don't see the true value. Looks like you have to do the same rough math to see if the clock speed reduction is worth the core count.
  • Oxford Guy - Tuesday, April 5, 2016 - link

    Why would there be "true value" for six and not for eight?
  • Pinn - Wednesday, April 6, 2016 - link

    Single threaded workloads.
  • jhh - Thursday, March 31, 2016 - link

    The article says TSX-NI is supported on the E5, but if one looks at Intel ARK, it says it's not. Do the processors say they support TSX-NI? Or is this another one of those things which will be left for the E7?
  • JohanAnandtech - Friday, April 1, 2016 - link

    Intel's official slides say: "supports TSX". All SKUs, no exceptions.
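
    For reference, TSX's RTM side is exposed through compiler intrinsics. A minimal sketch (requires a TSX-enabled CPU; a production version must also read the fallback lock inside the transaction and abort if it is held):

        /* tsx_demo.c - begin/commit a hardware transaction, with a lock
         * fallback for aborts. Build: gcc -mrtm tsx_demo.c -lpthread */
        #include <immintrin.h>
        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
        static long counter;

        static void increment(void) {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                counter++;            /* runs transactionally */
                _xend();
            } else {
                /* aborted (conflict, capacity, ...): take the real lock */
                pthread_mutex_lock(&fallback);
                counter++;
                pthread_mutex_unlock(&fallback);
            }
        }

        int main(void) {
            increment();
            printf("%ld\n", counter);
            return 0;
        }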
  • Oxford Guy - Thursday, March 31, 2016 - link

    Bigger, badder, still obsolete cores.
  • patrickjp93 - Friday, April 1, 2016 - link

    Obsolete? Troll.
  • Oxford Guy - Tuesday, April 5, 2016 - link

    Unlike you, propagandist, I know what Skylake is.
  • benzosaurus - Thursday, March 31, 2016 - link

    "You can replace a dual Xeon 5680 with one Xeon E5-2699 v4 and almost double your performance while halving the CPU power consumption."

    I mean, you can, but you can buy 4 X5680s for a quarter of the price of a single E5-2699 v4. It takes a lot of power savings to make that worthwhile. The pricing in the server market has always seemed weirdly non-linear to me.
  • warreo - Friday, April 1, 2016 - link

    Presumably it's not just about TCO. Space is at a premium in a datacenter, so being able to fit more performance per square foot also warrants a higher price, just like notebook parts have historically been more expensive than their desktop equivalents.
  • ShieTar - Friday, April 1, 2016 - link

    But you don't get four 1366 systems for the price of one 2011-3 system. Depending on your memory, storage and interconnect needs, even two full systems based on the Xeon 5680 may cost you more than one system based on the E5-2699 v4. One less Infiniband adapter can easily save you $500 in hardware.

    And you are not only halving the CPU power consumption, but also the power consumption of the rest of the system that you no longer use; so instead of 140W you are probably saving at least 200W per system, which can already add up to more than $1k in electricity and cooling bills for a 24/7 machine running for 3 years.

    And last, but by no means least, fewer parts mean less space, less chance of failure, less maintenance effort. If you happily waste a few hours here or there maintaining your own workstation, you don't do the math, but if you have to pay somebody to do it, salaries matter quickly. With the MTBF of an entire server rarely being much higher than 40,000 hours, and recovery/repair easily taking a person-day of work, each system generates about 1.7 hours of work per year. The cost of work (it's more than salaries, of course) probably comes to $100/hour for a skilled technical administrator, thus producing another $500 of added operational cost over 3 years.

    And of course, space matters as well. If your data center is full, it can be more cost-effective to replace the old CPUs with new, expensive ones rather than build a new facility to fill with more old systems.

    If you add it all up, I doubt you can buy a system with a Xeon 5680 and operate it for 3 years for anything below $20,000. So going from two $20,000 systems to a single $24,000 system (because of an extra $4,000 for the big CPU) should save you a lot of money in the long run.
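
    Those estimates check out on the back of an envelope (the rates below are assumptions, not quotes):

        /* tco_sketch.c - rough check of the savings estimated above.
         * Build: gcc tco_sketch.c -o tco */
        #include <stdio.h>

        int main(void) {
            const double watts_saved  = 200.0;    /* per retired system    */
            const double years        = 3.0;
            const double kwh_price    = 0.20;     /* electricity + cooling */
            const double mtbf_hours   = 40000.0;
            const double repair_hours = 8.0;      /* ~ one person-day      */
            const double admin_rate   = 100.0;    /* $/h, fully loaded     */

            double hours       = years * 365.0 * 24.0;
            double energy_cost = watts_saved / 1000.0 * hours * kwh_price;
            double failures    = hours / mtbf_hours;
            double admin_cost  = failures * repair_hours * admin_rate;

            printf("energy saved over 3y: $%.0f\n", energy_cost); /* ~$1051 */
            printf("repair cost avoided:  $%.0f\n", admin_cost);  /* ~$526  */
            return 0;
        }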
  • JohanAnandtech - Friday, April 1, 2016 - link

    Where do you get your pricing info? I cannot imagine that server vendors still sell X5680s.
  • extide - Friday, April 1, 2016 - link

    Yeah, if you go used. But no enterprise sysadmin worth his salt is ever going to put used gear that is out of warranty and out of support into production.
  • ltcommanderdata - Friday, April 1, 2016 - link

    Does anyone know the Windows support situation for Broadwell-EP for workstation use? Microsoft said Broadwell is the last fully supported processor for Windows 7/8.1, with Skylake getting transitional support and Kaby Lake not supported at all. So how does Broadwell-EP fit in? Is it lumped in with Broadwell and fully supported, or will it be treated like Skylake, with temporary support until 2018 and only critical security updates after that? And following on, will Skylake-EP see any Windows 7/8.1 support at all, or will it not be supported, since it'll presumably be released after Kaby Lake?
  • extide - Friday, April 1, 2016 - link

    When MS says they are not supporting Skylake on Windows 7, that DOES NOT MEAN it won't work. It just means they are not going to add any specific support for that processor to the older OSes; they are not adding in Speed Shift support, essentially.

    For some reason the press has not made this very clear, and many people are freaking out thinking there will be a hard break where stuff will straight up not work. That is not the case.

    Broadwell has no new OS-level features over Haswell (unlike Skylake with Speed Shift), so there is nothing special about Broadwell to the OS. As the poster above mentions, they are all x86 CPUs and will all still work with x86 OSes.

    The difference here is between "fully supported" and compatible. Skylake and even Kaby Lake will be compatible with Windows 7/8/8.1.
  • aryonoco - Friday, April 1, 2016 - link

    Johan, this is yet again by far the best enterprise CPU benchmark available anywhere on the net.

    Thank you for your detailed, scientific and well-documented work. Work like this is not easy; I can only imagine how many man-hours (weeks?) compiling this article must have taken. I just want you to know that it's hugely appreciated.
  • JohanAnandtech - Friday, April 1, 2016 - link

    Great to read this after weeks of hard work! :-D
  • fsdjmellisse - Friday, April 1, 2016 - link

    Hello, I want to buy an E5-2630L v4.
    Can anyone point me to a website where I can buy it?

    Best regards
  • HrD - Friday, April 1, 2016 - link

    I'm confused by the following:

    "The following compiler switches were used on icc:

    -fast -openmp -parallel

    The results are expressed in GB per second. The following compiler switches were used on icc:

    -O3 –fopenmp –static"

    Shouldn't one of these refer to icc and the other to gcc?
  • JohanAnandtech - Friday, April 1, 2016 - link

    Pretty sure I did not mix them up. "-fast" does not work on gcc, nor does -fopenmp work on icc.
  • patrickjp93 - Friday, April 1, 2016 - link

    Um, wrong and wrong. -Ofast works with GCC 4.9 and later for sure. And -fopenmp is a valid ICC flag post-ICC 13.
  • JohanAnandtech - Saturday, April 2, 2016 - link

    "-fast" is a typical icc flag. (I did not write -"Ofast" that works on gcc 4.8 too)
  • extide - Friday, April 1, 2016 - link

    Johan, if you read the comment, you can see that you mention icc for BOTH.
  • JohanAnandtech - Saturday, April 2, 2016 - link

    Ok, thanks, time to sleep a little longer. I have fixed the error.
  • xrror - Friday, April 1, 2016 - link

    It's depressing to see the mobile-first design philosophy really gutting the last bastion of x86 performance.

    I mean, I get it: a 22 (20) core Xeon wouldn't even exist without the aggressive power management tech needed to keep it from melting or needing exotic cooling. But it's still depressing to see ALL of the arch improvements immediately negated by lowered clock speeds, or worse, "turbo speeds" you will never actually see once the machine is running production loads.

    The engineering behind these big-core-count chips is always very impressive, though. Also, did Intel ever say how they "fixed" TSX?
  • FunBunny2 - Friday, April 1, 2016 - link

    "It's depressing to see the mobile-first design philosophy really gutting into the last bastion of x86 performance."

    welcome to the world of laissez faire capitalism: do what makes the most money today, irregardless of future consequences. used to be, Intel could rely on M$ making the next versions of Windoze and Office impossible to run on existing Pentiums, thus driving sales of the next Pentium (a whole machine, at that). these days it's up to gamers and data centres. not taking any bets on which turns out to be in the driver's seat.
  • xrror - Friday, April 1, 2016 - link

    Well, considering that "computer gaming" has degraded to whatever the kids are running on their smartphones or the parents' tablet, I'm not hopeful for any new resurgence in demand for high-performance PCs in the mass market.

    So the future consequence for Intel prioritizing power efficiency over performance, or possibly developing a separate fabrication tech for performance, is... likely not very much. So there really is no "future consequence" for Intel. Sure, they could go out and actually try to make a 10 GHz 9nm part possible, but nobody in 2020 would buy it because... it probably would go into whatever iDevice they care about. And the HPC market, I dunno. Maybe if it datamines marketing data faster or can microtrade on the stock market faster or something. Meh.

    The general public really doesn't care about performance anymore (honestly, they may never have), only how portable a device is and whether it's good enough to run their stuff on the go.

    The high-end market like these multi-core Xeons, though, is strange, because you'd think this is where Intel would go all in, but I guess when your only competitors are IBM POWER and (currently non-competitive) AMD, I dunno...

    I mean, it's sad that even Intel has to justify its R&D expenses to shareholders, which is stupid because Intel's R&D is one of its biggest strengths. But such as it is. Apr 1 rant over ;)
  • abufrejoval - Friday, April 1, 2016 - link

    Johan, you keep bemoaning the fact that a lack of competition seems to stop "real progress", and I wonder where you expect that progress to happen.

    More specifically, you seem to desire more GHz, and I can understand that desire, which may originate from that crazy 40 MHz to 4 GHz rush we all experienced in the decade starting in the mid-nineties.

    I understand the emotion, but I wonder how it fits the scientific mind I see everywhere else in your work, because 8, 16 or 32 GHz is simply not going to happen, competition or not.

    Sure, 8 GHz is possible; you can even purchase 5 GHz off the shelf. But it simply doesn't deliver in terms of oomph/$, and web scale, which is all about value/€, is the main driver of server evolution today.

    We'll still see radical speedups where it counts, but it will have to be via special-purpose function blocks either on SoCs, or by adding a couple of extra instructions, or by doing something as radical as Micron's Automata Processor.

    But general-purpose von Neumann hit the gigahertz wall years ago, and nothing can change that except a different model of compute.

    I liked the reference to Andreas Stiller, but I'm not sure everybody here has a subscription to c't like I have since the early 1990s. There could also be the tiny issue that not everyone outside Belgium is quadrilingual.

    Make no mistake: I love your work! It's a pleasure to read for form, style and content!
  • The Von Matrices - Saturday, April 2, 2016 - link

    Any indication of the QPI speed of these chips? Did Intel increase it from the 9.6 GT/s in Haswell-EP?
  • Ian Cutress - Saturday, April 2, 2016 - link

    Most of the high end are 9.6 GT/s. https://twitter.com/IanCutress/status/715582714099...
  • watersb - Saturday, April 2, 2016 - link

    Johan, this is fantastic work. Thanks very much.

    Any way to address RAS features?
  • isrv - Saturday, April 2, 2016 - link

    Well, I'm completely disappointed.
    Web servers want higher clock speeds.
    Single-threaded load (like PHP) becomes even slower on these E5 v4s due to the drop in GHz.
    The best CPUs for that are still the E3-1290 v2, E3-1281 v3 (and 1286 v3), E3-1280 v5, E5-1630 v3, E5-1620 v2, and the only 6-core, the E5-1660 v2.
    All of those are 3.7 GHz (pointless to look at turbo speeds since we're under constant 24/7 load).

    I was hoping for at least one 3.8 GHz part, or even higher.

    So no changes here: the E5-1660 v2 is still the fastest web-server CPU,
    or the E5-1630 v3 if you sacrifice 2 cores for slightly faster memory.
  • patrickjp93 - Sunday, April 3, 2016 - link

    For those 4-8 core chips, the turbo boost is maintainable under 24/7 workloads if your cooling is sufficient. You seem to know far less about this environment than you let on. And who the hell still uses single-threaded PHP? And you're not taking into account better caching algorithms and other architectural improvements that make the 200 MHz slower v4 run faster than your v2.
  • isrv - Sunday, April 3, 2016 - link

    I will believe that only after a one-to-one comparison of the E5-1630 v3 vs. any E5 v4, composing a WordPress front page for example.
    So far, those are only words about better caching etc...
  • simplyfabio - Monday, April 4, 2016 - link

    Could I ask one thing here? For a 3D workstation, both for rendering and graphics/CAD (like Illustrator, Photoshop, AutoCAD, 3ds Max), would it be better to have more cores, like the E5-2690 (considering the turbo clock speed for each number of active cores), or higher frequency, like the 1680? Thanks a lot to everyone; I can't find a good review covering this side of these CPUs...
  • grantdesrosiers - Monday, April 4, 2016 - link

    Not sure if anyone has pointed it out yet, but I think there is an error on the "Multi-Threaded Integer Performance" page, first graph. The 2695 v4 says 22 cores; I believe it should be 18.
  • SanX - Monday, April 4, 2016 - link

    Poor Moore's Law for workstations... a 10-20% gain per 2-year generation.

    Think about it: there is no reason to upgrade for the next *** 5-10 generations ***, or the next 10-20 years (!!!), when the processors will be only e-fold (2.71x) faster.
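
    The arithmetic holds: at a growth rate r per generation, an e-fold improvement takes ln(e)/ln(1+r) generations. A quick check (assuming 2 years per generation, as above):

        /* efold.c - generations and years until performance grows e-fold.
         * Build: gcc efold.c -lm -o efold */
        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double rates[] = { 0.10, 0.20 };   /* 10% and 20% per generation */
            for (int i = 0; i < 2; i++) {
                double gens = 1.0 / log(1.0 + rates[i]);   /* ln(e) = 1 */
                printf("%2.0f%%/gen: %4.1f generations, ~%2.0f years\n",
                       100 * rates[i], gens, 2 * gens);
            }
            return 0;   /* 10%: ~10.5 gens / ~21 y; 20%: ~5.5 gens / ~11 y */
        }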
  • dragonsqrrl - Monday, April 4, 2016 - link

    The problem is your first assumption is already false.
  • Khenglish - Monday, April 4, 2016 - link

    I can't understand why the 4-core-and-under turbo speeds are so low on the v4 2699. A Broadwell with 55MB of cache being outperformed by a stock-clocked Sandy Bridge is ridiculous. Why would this CPU not clock up to at least 4.2 GHz with a 4-core workload, and say 4.4 GHz for a 1-core workload? Hell, it costs over $4000 and has a massive TDP. You'd think Intel could take a minute to make the low-core-count speeds not terribly low.

    My workstation in my lab has a 1650 v3. My workloads peak between 4-8 cores. There is not a single CPU in the v4 lineup that would be an upgrade over the 1650 v3, despite the major power savings of 14nm and the cache size increase, due to Intel's inability to set reasonable 8-core-and-under frequencies.
  • Romulous - Monday, April 4, 2016 - link

    People who are serious about recompiling the same software often would probably use ccache and maybe even distcc. So your Linux kernel compile test is really only there to show potential CPU performance.
  • LHL2500 - Tuesday, April 5, 2016 - link

    "It finds a home in the same LGA 2011-3 socket."
    Not according to Intel's website.
    http://ark.intel.com/compare/91754,81908
    In this comparison between a v3 and a v4 version of a E5-2680, the socket support for the two chips are different. The older version using the the FCLGA2011-3 and the newer version using FCLGA2011.
    So who is right? Anandtech or Intel?
    And it not just this chip. It's all the v4s.
    While I hope it's a typo on Intel's behalf, for now it doesn't look like the v4s are direct upgrades to the v3s. You will apparently need new motherboards.
  • xrror - Tuesday, April 5, 2016 - link

    That... is a bit disconcerting. I also like how "VID Voltage Range" for the v4 parts is simply listed as "0" ...
  • SeanJ76 - Tuesday, April 5, 2016 - link

    My school had 3rd-generation Xeons in its workstations, and they were slow as f***!! The consumer i7 4790K/6700K would run laps around these Xeon crap CPUs!
  • xrror - Tuesday, April 5, 2016 - link

    Even at 3.3 GHz, though, they shouldn't be that slow. I'm taking a guess: if this was a student lab, and they bothered to specifically order Xeon (or Opteron, back in the day) workstations, I'm guessing this was a CAD/CAM lab or something, running a boatload of expensive licensed software (Autodesk, SolidWorks, etc.), and some of that stuff is horrible about thrashing the hard drive, constantly.

    And I doubt your school could spring the cash for SSDs in them (because a workstation SKU means you pay dearly for OEM workstation-'certified' drives).

    This is all guesses though. And not trying to defend it; it does suck when you have what should be a sweet machine choking for whatever reason, and you're there trying to get your assignments done and you just want to smash the screen cause it just chhhuuuuuuggggsss... ;p
  • SkipPerk - Friday, April 8, 2016 - link

    I have seen this many times, even in the for-profit sector. I once saw a compute cluster that was choking on a server with slow storage. They had a 10 Gb network and fast Xeon machines running on flash, but the primary storage was too slow. When they got a proper SAN, it was an order-of-magnitude improvement.

    Back in the day storage was often the bottleneck, and it still comes up today.
  • someonesomewherelse - Thursday, September 1, 2016 - link

    We ran everything in virtual machines, with the disk images not stored locally... and the LANs in the classrooms were 100 Mbit; I don't know about the connection from the classroom to the server holding the images. How's that for slow?

    I would have loved it if our stuff was as 'slow' as yours. The WiFi in the classrooms was very fast too, especially since I doubt anyone bothered to turn off their torrents. Well, it's completely understandable: you are going to watch the new episode of your favorite show once you are back home, and not everyone had (well, has; most people can get it now) FTTH with at least a 100 Mbit line. Ideally symmetrical, but some ISPs are too greedy with upload speeds, so 300/50 is cheaper than 100/100... and good luck getting 1000/1000 on a residential package (the hardware isn't the problem, since you can get 1000/1000 with a commercial (aka overpriced) package using the same hardware; basically I would just need to sign a new contract and enjoy the faster line in 1 business day or less). Well, at least there are no bandwidth caps (if I didn't read foreign boards, caps on non-mobile connections would be something I'd think no ISP could impose without losing all customers), and we have no DMCA or anything similar, and AFAIK no plans for one either; if they tried to pass such a law, there would be enough support for a referendum, which would pass by a huge majority. Even better, the methods used to catch people downloading torrents are illegal anyway, so any evidence obtained with them or derived from them is inadmissible, and just by presenting it you would be admitting to several crimes which the police and prosecution are obliged to investigate/prosecute... copyright infringement, however, is a civil matter.
  • donwilde1 - Tuesday, April 5, 2016 - link

    One of the more interesting Intel features, in my opinion, is that Broadwell carries an on-board encryption engine with its own interpreter, similar to a small-memory embedded JVM. This enables full Trusted Boot capability, which I view as a necessity in today's hackable world. Would you consider a follow-on article on this? The project was a clean-room development called BeiHai, done in China.
  • JamesAnthony - Wednesday, April 6, 2016 - link

    From what I can tell looking over the benchmarks, there is not much of an increase in core-vs-core performance going from the V1 CPUs to the V4 CPUs. If you look at the benchmarks and consider that you are comparing 16 cores to 44 cores, the 44-core setup is not 2.75x faster.

    So while your overall speed goes up, the work accomplished per core is not increasing at the same rate.

    Why does this matter? Well, thanks to software licensing costs, adding cores gets very expensive quickly. So if your software costs (which can easily exceed the hardware costs) go up with each core you add, but the work done does not, you quickly wind up with a negative cost/performance ratio.

    For quite a few people, the E5-2667 v2 with 8 cores at 3.5 GHz (4 GHz turbo) comes out as about the best value for the software licensing cost.

    So while Intel puts out processors that can do more work overall than the previous ones, the move to per-core software licensing is making it a negative value proposition. This is why people keep wanting higher-clocked, lower-core-count processors, yet we seem to have been stuck around 3.5 GHz for many years.
  • SkipPerk - Friday, April 8, 2016 - link

    Although you are right for workstations, so much of the demand is for generic virtualized machines. Many buyers are fine with 2 GHz and as many cores as they can get. They load as little RAM as the spec requires and throw out the cheapest single-core, dual-thread, 2 GB RAM VM they can. This is how call centers work, not to mention many low-level office jobs. They do not care about performance because this is more than enough.

    I have had specialty applications where prosumer 6-core or 8-core CPUs were the better deal (usually liquid-cooled and overclocked), but not many buyers are licensing insanely expensive analytical software by the core.
  • SeanJ76 - Sunday, April 10, 2016 - link

    @Xeon chips!! TOTAL GARBAGE!
  • legolasyiu - Wednesday, April 20, 2016 - link

    The ASUS workstation/server boards with the v4 CPUs are very stable, and they offer a 10% OC. I am very interested in how the processor does with those boards.
  • Bulat Ziganshin - Saturday, May 7, 2016 - link

    >This increases AES (symmetric) encryption performance by 20-25%

    PCLMULQDQ implements part of a Galois-field multiplication, and Broadwell actually improved only the GCM part of the AES-GCM algorithm. Neither AES itself nor other popular symmetric encryption algorithms became faster.
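
    For the curious, the instruction in question does a 64x64 -> 128-bit carry-less multiply, the building block of GHASH's GF(2^128) multiplication. A minimal sketch of its use via intrinsics (illustration only, not a full GHASH):

        /* clmul_demo.c - one PCLMULQDQ carry-less multiply.
         * Build: gcc -mpclmul -msse4.1 clmul_demo.c -o clmul */
        #include <immintrin.h>
        #include <stdint.h>
        #include <stdio.h>

        int main(void) {
            __m128i a = _mm_set_epi64x(0, 0x87);   /* polynomials over GF(2) */
            __m128i b = _mm_set_epi64x(0, 0xFD);
            /* imm8 0x00: multiply the low 64-bit halves of a and b */
            __m128i r = _mm_clmulepi64_si128(a, b, 0x00);

            uint64_t lo = (uint64_t)_mm_cvtsi128_si64(r);
            uint64_t hi = (uint64_t)_mm_extract_epi64(r, 1);
            printf("0x87 clmul 0xFD = 0x%016llx%016llx\n",
                   (unsigned long long)hi, (unsigned long long)lo);
            return 0;
        }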
  • oceanwave1000 - Monday, May 9, 2016 - link

    The article mentioned that the Broadwell-EP E5 v4 family has 3 die configurations. I caught the 306mm2 and 454mm2 ones. Did anyone catch the third?

    Thanks.
  • petar_b - Saturday, August 27, 2016 - link

    Thanks Phil_Oracle, Brutalizer and Anand for this discussion. I have learned a lot from reading the different opinions. I work with IBM and Oracle software products, and in my small experience, Xeons are pathetic compared to POWER or SPARC. The same operation that the corporate server handles quickly takes 10x more time on a Xeon at home, and I have double the memory of the corporate server, yet it doesn't help.
  • someonesomewherelse - Thursday, September 1, 2016 - link

    BTW, how locked down are these Xeons and their motherboards with regard to overclocking? Assuming you could provide enough power and cooling, could you reach a decent overclock? Obviously nobody is going to do that for mission-critical servers/workstations, but if I had too much money, could I get a quad- or octa-socket system with as many cores as possible and at least try to overclock them?
