26 Comments
BrokenCrayons - Monday, June 13, 2016 - link
Only the best CPU charts mention world domination as a possible usage scenario.
fanofanand - Monday, June 13, 2016 - link
That's because Anandtech is always looking to the future. :)
valinor89 - Monday, June 13, 2016 - link
I, For One, Welcome Our New Robot Overlords
creed3020 - Monday, June 13, 2016 - link
Great catch, I missed that lulz... :)
d9 - Tuesday, June 21, 2016 - link
Not so long ago, the first generation of humans were given just enough intelligence to be fruitful and multiply. Intel needs to breed thinking machines that don't need to rent in Silicon Valley.
pdf - Monday, June 13, 2016 - link
The painful part of Intel's current line-up is that you can either have integrated graphics or a decent quantity of PCIe lanes, but not both. And the lanes provided by the chipset all get cannibalised for peripherals. It's very awkward.
patrickjp93 - Monday, June 13, 2016 - link
That changes with the Skylake Purley platform, so don't sweat that too much. It's not like anyone else is providing that sort of product any time soon either.
highlnder69 - Monday, June 13, 2016 - link
Wow, 24 cores with only using 7.2 transistors. Couldn't even imagine what Intel could produce if they bumped that up to 8.2 or even 9.2 transistors...
bill.rookard - Monday, June 13, 2016 - link
Haha yes, I saw that myself. Pretty impressive engineering. I wonder if it's a new quantum computer? :)
SarahKerrigan - Monday, June 13, 2016 - link
BDW-EX has eight memory channels (as the slide image says), not four as the table says.
ZeDestructor - Monday, June 13, 2016 - link
I think that's a typo, since it's a more or less drop-in upgrade to the Brickland platform that only has 4-channel memory. Besides, ark also lists it as having 4 channels.
Ian Cutress - Monday, June 13, 2016 - link
It's technically four, but by using memory expanders it effectively splits each memory channel into two, allowing for 3DPC.
http://www.anandtech.com/show/9193/the-xeon-e78800...
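(A quick back-of-envelope sketch of that slot arithmetic, assuming the usual Broadwell-EX arrangement of four memory links per socket, each fanned out into two DDR4 channels by a buffer, at 3 DPC; the exact slot count is board-dependent.)

    # Slot math per socket (assumed layout, not from the article):
    # 4 memory links per socket, each buffer splitting a link into
    # 2 DDR4 channels, with 3 DIMMs per channel (3DPC).
    links_per_socket = 4
    channels_per_link = 2
    dimms_per_channel = 3
    slots_per_socket = links_per_socket * channels_per_link * dimms_per_channel
    print(slots_per_socket)  # 24 DIMM slots per socket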
ZeDestructor - Tuesday, June 14, 2016 - link
Ahh, very nice. How I wish I had the cash and power to have a Brickland machine for my homeserver... would do wonders for a silly ZFS host...
Eden-K121D - Monday, June 13, 2016 - link
Well I heard a rumour about a Zen Naples server processor having 32 cores, 8-channel DDR4 memory and 128 PCIe Gen 3 lanes.
Meteor2 - Monday, June 13, 2016 - link
So... What are 8S servers used for? VM farms? When is it effective to buy one of these rather than use several smaller, cheaper servers?
FunBunny2 - Monday, June 13, 2016 - link
-- When is it effective to buy one of these rather than use several smaller, cheaper servers?
Any embarrassingly parallel problem. OLTP systems are the archetype.
mdw9604 - Sunday, June 19, 2016 - link
I am not embarrassed that my problems are parallel. It's the perpendicular ones that I tend to cover up.
Kevin G - Monday, June 13, 2016 - link
Several smaller, cheaper servers introduce networking overhead and, in most cases, centralized storage. A single system image with equivalent processing power ends up being faster due to the removal of this overhead, sometimes by a surprising amount.
The other thing is that these systems support a lot of memory per socket: 24 TB using the largest DIMMs available today in an eight-socket configuration. Many production datasets can fit into that amount. Intel is offering quad-core chips with full support for this capacity, which is interesting from a licensing cost standpoint.
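(Roughly how that 24 TB figure falls out, assuming the 24 DIMM slots per socket sketched earlier and 128 GB LRDIMMs, the largest parts broadly available at the time.)

    # Capacity estimate (assumed: 24 slots per socket, 128 GB DIMMs).
    sockets = 8
    slots_per_socket = 24
    dimm_gb = 128
    total_tb = sockets * slots_per_socket * dimm_gb / 1024
    print(total_tb)  # 24.0 TB across the full 8S system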
Meteor2 - Tuesday, June 14, 2016 - link
Big in-memory databases are interesting, though I understand it takes something like 15 minutes to load them into memory. Plus NVMeoF is blurring local and remote memory.
I guess the problem for these machines though is there aren't many embarrassingly parallel problems out there. We run HPC workloads where I am, and they're best suited to just so many E5s on a very fast network. The jobs are many times too big to fit on one of these.
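(A rough sanity check on that load-time figure, assuming a hypothetical fully populated 24 TB system fed at around 30 GB/s of aggregate storage bandwidth; real numbers will vary a lot with the storage setup.)

    # Load-time estimate (assumed: 24 TB dataset, ~30 GB/s aggregate reads).
    dataset_gb = 24 * 1024
    read_gb_per_s = 30
    minutes = dataset_gb / read_gb_per_s / 60
    print(round(minutes, 1))  # ~13.7 minutes, in the same ballpark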
mapesdhs - Tuesday, June 14, 2016 - link
On the contrary, there are many relevant workloads, from GIS to medical and defense imaging. Just look into the history of the customer base at SGI, those who bought their Origin systems, etc. Hence the existence of the modern UV series (256 sockets atm). Customers were already dealing with multi-GB datasets 20 years ago, and SGI was the first to design something that could load such files in mere seconds (Group Station for Defense Imaging). I'm not sure about the modern UV systems, but the bisection bandwidth of the last-gen Origin was 512 GB/sec (or it might be 1 TB/sec if they made the usual 2X larger system for selected customers) and the tech has moved on a lot since then, with new features such as hardware MPI offload, etc.
But yes, other loads don't scale well; it all depends on the task. Hence the existence of cluster products as well, and of course the ability to partition a UV into multiple subunits, each of which can be optimised to match the task scalability, while also allowing fast intercommunication between them, as well as shared access to data, etc. Meanwhile, work goes on to improve scalability methods, e.g. an admin at the Cosmos centre told me they're working hard to improve the scaling of various cosmological simulation codes to exploit up to 512 CPUs. In other fields, however, GPU acceleration has taken over, but often that needs big data access as well. It's a mixed bag as usual.
Speaking of which, Ian, re the usual Intel partners, you forgot to mention SGI. There's no doubt they'll be using these new CPUs in their UV range.
One thing I don't get concerning the E7-8893 v4: if it only has 4 cores, why aren't the max Turbo levels much higher? Indeed, the base clock could surely be a lot higher as well.
Topinio - Tuesday, June 14, 2016 - link
I'm also surprised at the E7-8893 v4, wondering if the clocks are actually high -- this article is completely missing the AVX clocks (base, 1C max, and all-core max) as well as the all-core max turbo non-AVX clocks, for all these CPUs.
Not just this article, of course -- the recent ones (E3-15xx v5, E5-x6xx v4) are missing them too. It's pretty annoying; without knowing these, one can't know how many GFLOPS of potential performance are available.
Might even have missed breaking the 1 TFLOP barrier, if the all-core AVX turbo clock on the E7-8890 v4 is 2.7 GHz ...
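(The arithmetic behind that remark, taking the 2.7 GHz all-core AVX turbo as an assumption and Broadwell's two 256-bit FMA units per core, i.e. 16 double-precision FLOPs per core per cycle.)

    # Peak DP throughput estimate (2.7 GHz AVX all-core turbo is assumed).
    cores = 24             # E7-8890 v4
    avx_ghz = 2.7          # assumed all-core AVX turbo clock
    flops_per_cycle = 16   # 2 x 256-bit FMA units = 8 doubles x 2 ops
    gflops = cores * avx_ghz * flops_per_cycle
    print(gflops)  # 1036.8 GFLOPS, i.e. just over 1 TFLOP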
HideOut - Tuesday, June 14, 2016 - link
This could be an instance where Intel uses the lower binned chips as a sellable product, so it had to keep clock speed down in order to pass validation and whatnot.
Kevin G - Tuesday, June 14, 2016 - link
The presented data appears to be from ARK, which doesn't list AVX clock speeds. The good news is that this is less of an issue overall, as apparently the AVX clock only affects the particular core running AVX code, not the entire chip. While I haven't seen the max clock permitted for AVX code on these E7 v4's, if they are anything like the E5 v4's then the max clock will be the same between AVX and non-AVX code. So while the system can clock down from running AVX code, Intel's implementation this time around is much less of an issue for traditional server scenarios (HPC is another matter).
Kevin G - Tuesday, June 14, 2016 - link
If you're running an embarrassingly parallel job that can't fit into the memory of these E7 systems, then yes, it would be more cost-effective to go with more nodes and E5 chips. There's no way around the networking overhead in this scenario, so you might as well use it to your advantage to lower costs.
Seekmore - Wednesday, July 6, 2016 - link
Intel's newly launched lineup of 4800 v4 and 8800 v4 server processors from the Broadwell-EX family is covered in detailed technical terms at http://www.comparecpus.com/en/cpus-from-intel-xeon... . They have given a rating for the different specifications, which is really cool to check out. These boosters seem really promising for this rollout; let's see if they really work out as they are supposed to.
Tema726 - Thursday, May 11, 2017 - link
Hello! Is there any reference on the die size? "[...] 24 physical cores at 456.12mm2 [...]" -- where is it coming from, and is there anything about the length and width of the die? I could not find it in Intel's datasheets, unfortunately.