I've been under the impression that Ivy Bridge-EX would still be using memory buffers, similar in concept to FB-DIMMs and the memory buffers used by Nehalem/Westmere-EX. Since much of the device signal portion is abstracted from the main die, using multiple memory technologies would be possible. I thought Ivy Bridge-EX's buffers would start off supporting DDR3 with DDR4 buffers appearing next year, possibly as a mid-cycle refresh before Haswell-EX appears.
SuperMicro and HP have made 8P behemoths based on Westmere-EX systems. I had some time testing SuperMicro's 4P, although I wouldn't mind looking at the new 8P if they make one.
IBM had the X3850 X5 that would scale up to 8 sockets. It was different as it was spread out between two chassis with external QPI links joining them.
IBM also exploited the external QPI links to offer raw memory expanders without the need for more CPU sockets.
Max configuration is a UV 2000 with 256 CPUs from the XEON E5 series (max 2048 cores, 4096 threads), 64TB RAM in 4 racks. Note this is NOT a cluster, it's a single combined system using shared memory NUMA. No doubt SGI will adopt the E7 line with an arch update if necessary. The aim is to eventually scale to the mid tens of thousands of CPUs single-image (1/4 million cores or so).
8 processors is nothing. :D I have a "low end" 36-processor Onyx 3800 that's 14 years old...
SGI developed their own QPI glue logic to enable scaling beyond 4 sockets with those chips.
It does have a massive global address space which is nice from a programming standpoint, though to get there SGI had to go through several weird hoops. Reading up on the interconnect documentation, it is a novel tiered structure. The 64 TB of RAM limit is imposed by the Xeon E5's physical address space. Adding another tier allows for a non-coherent memory address space up to 8 PB. A 64 TB region does appear to be fully cache coherent in that node and the socket limit for a node is 256.
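For anyone wanting to check the arithmetic behind those limits: the 64 TB figure matches a 46-bit physical address, which is the width the Xeon E5 generation exposes, and 8 PB would correspond to 53 address bits. The sketch below assumes binary units (TiB/PiB) and the helper names are made up for illustration only.

```python
# Rough address-space arithmetic; binary units are an assumption here.
TIB = 1 << 40
PIB = 1 << 50

def addressable_bytes(address_bits):
    """Bytes reachable with a flat physical address of the given width."""
    return 1 << address_bits

def bits_needed(size_bytes):
    """Smallest address width that covers size_bytes (size must be > 0)."""
    return (size_bytes - 1).bit_length()

# 46-bit physical addressing, as on the Xeon E5 parts discussed above.
print(addressable_bytes(46) // TIB, "TiB")

# Address width implied by the 8 PB non-coherent space.
print(bits_needed(8 * PIB), "bits")
```

So going from the coherent 64 TB region to the 8 PB tier is a matter of 7 extra address bits, i.e. up to 128 such regions.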
MPI clustering techniques are used to scale at some point and SGI's interconnect chips provide some MPI acceleration to reduce CPU overhead and increase throughput. Neat stuff.
@mapesdhs, Please stop spreading this misconception. The SGI UV2000 server with 256 sockets IS a cluster. Yes, it runs single image Linux over all nodes, but it is still a cluster.
First of all, there are (at least) two kinds of scalability. Scale out, which is a cluster - you just add another node and you have a more powerful cluster. They are very large, with 100s of cpus or even 1000s of cpus. All supercomputers are of this type, and they run embarrassingly parallel workloads, number crunching HPC stuff. Typically, they run a small loop on each cpu, very cache intensive, doing some calculation over and over again. These servers are all HPC clusters. SGI UV2000 is of this type. Latency to far off cpus is very bad.
Scale up - which is a single fat huge server. They weigh 1000kg or so, and have up to 32 sockets, or even 64 sockets. They don't run parallel workloads, no. Typically they run Enterprise workloads, such as large databases. These workloads are branch intensive, and jump wildly in the code, so the code will not fit into the cache. These are of the SMP server type, running SMP workloads. SMP servers are not a cluster, they are a single fat server. Sure, they can use NUMA techniques, etc - but the latency to another cpu is very low (because they only have 32/64 cpus which are not far away), so in effect they are like a true SMP server. There are not many hops to reach another cpu. SGI is not of this SMP server type. Examples of this type are IBM P795 (32 sockets), Oracle M6-32, Fujitsu M10-4S (64 sockets), HP Integrity (64 sockets). They all run Unix OS: Solaris, IBM AIX, HP-UX. They all cost many millions of USD. Very very expensive, if you want 32 socket servers. For instance, the IBM P595 32 socket server used for the old TPC-C record cost 35 million USD. One single frigging server cost 35 million. With 32 sockets. They are VERY expensive. A cluster is cheap, just add some pcs and a fast switch.
Sure there are clustered databases running on clusters, but it is not the same thing as a SMP server. A HPC cluster can not replace a SMP server, as HPC servers can not handle branch intensive code - the worst case latency is so bad that performance would grind to a halt if HPC clusters tried Enterprise workloads.
In the x86 area, the largest SMP servers are 8 sockets servers, for instance Oracle M4800. Which is just a x86 pc sporting eight of these Ivy Bridge-EX cpus. There are no 32 socket x86 servers, no 64 sockets. But there are 256 sockets and above (i.e. clusters). So there is a huge gap between 8 sockets, the next is 256 sockets (SGI UV2000). Anything larger than 64 sockets, is a cluster.
For instance, the ScaleMP Linux server sporting 8192/16384 cores and gobs of TB of RAM, very similar to this SGI UV2000 cluster, is also a cluster. It uses a software hypervisor, that tricks the Linux kernel into believing it runs on a SMP server, instead of a HPC cluster: http://www.theregister.co.uk/2011/09/20/scalemp_su... "...Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a SMP shared memory system, ScaleMP cooked up a special software hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes....vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space. vSMP has its limits....The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit."
Even SGI confesses their large Linux Altix and UV2000 servers, are clusters: http://www.realworldtech.com/sgi-interview/6/ "The success of Altix systems in the high performance computing market are a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future,...However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time"
All large Linux servers with 1000s of cores are clusters - and they are all used for HPC number crunching workloads. None are used for SMP workloads. The largest Linux SMP servers are 8 socket servers. Anything larger than that is a Linux cluster. So, Linux scales up to 8 sockets in SMP servers. And on HPC clusters, Linux scales well up to 1000s of sockets. On SMP servers, Linux does not scale well. People have tried to compile Linux for the big Unix servers, for instance the "Big Tux" server, which is the HP Integrity 64 socket Unix server - with terrible results. The cpu utilization was 40% or so, which means more than every other cpu was idle - under full load. Linux's limit is somewhere around 8 sockets, it does not scale further.
That is the reason Linux does not venture into Enterprise arena. Enterprise which is very lucrative, needs huge 32 socket SMP servers, to run huge databases. And they shell out millions of USD on a single 32 socket server. If Linux could venture into that arena, Linux would. But there are no such big Linux SMP servers on the market. If you know of any, please link. I have never seen a Linux SMP server beyond 8-sockets. The Big Tux server, is a HP-UX server, so it is not a Linux server. It is a Linux experiment with bad performance and results.
So, these large Linux servers are all clusters, as evidenced by the fact that they are all running HPC workloads. None are running SMP workloads. Please post a link, if you know of a counter example (you will not find any counter examples, trust me).
Here is a counter example: http://www.sgi.com/pdfs/4192.pdf It describes the ASIC used in the SGI UV2000 and how it links everything together. In particular, it explains how the system differs from a cluster. The main points are as follows:
* Global memory space - every byte of memory is addressable directly from any CPU core.
* Cache coherent for systems up to 64 TB
* One instance of an operating system across the entire system without the need of a hypervisor (this is different from ScaleMP which has to have a hypervisor running on each node)
I also would not cite an SGI interview from 2004 regarding technology introduced in 2012. A lot has changed in 8 years.
Similarly the "Big Tux" experiment used older Itanium chips that still used a FSB. They have since gone to the same QPI bus as modern Xeons. Scaling to higher socket counts is better on the Itanium side as it has more QPI links. Of course this is a kinda moot point as all enterprise Linux distributions have dropped Itanium support years ago.
Oracle is working on adding SPARC support to its Oracle Linux distribution. This would be another source for a large coherent system capable of running a single Linux image. No other enterprise Linux distribution will be officially supported on Oracle's hardware.
Where is the counter example? I am asking for an example of a Linux server with more than 8 sockets that runs SMP workloads, namely Enterprise stuff, such as big databases. The SGI server you link to is a cluster. It says so in your link: they talk a lot about "MPI", which is a library for doing HPC calculations on clusters. MPI is never used on SMP servers, it would be catastrophic to develop an Oracle or DB2 database using clustered techniques such as MPI. http://en.wikipedia.org/wiki/Message_Passing_Inter... "MPI remains the dominant model used in High-Performance Computing today...MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run such programs."
So, there are no large SMP Linux servers. Sure, you can compile Linux to the IBM AIX P795 Unix server, but that would be nasty. The P795 is very very expensive 10s of millions of USD, and because Linux does not scale beyond 8 sockets on SMP servers, the performance would be bad too. It would be a bad idea to buy a very expensive Unix server, and install Linux instead.
Regarding the Oracle SPARC servers. Larry Ellison said officially when he bought Sun, that Linux is for low-end and Solaris for high end. Oracle is not offering any big Linux servers. All 32 socket servers are running Solaris.
Have you never thought of why the well researched and mature Unix vendors have for decades stuck with 32/64 socket servers? They have had 32 socket Unix servers for decades, but not larger than that. Why not? Whereas the buggy Linux has 8 socket servers or 10.000s of core servers, but nothing in between. There is no vendor manufacturing 32 socket Linux servers. You need to recompile Linux for 32 socket Unix servers, with bad performance results. The answer is that Linux scales badly on SMP servers, 8 sockets being the maximum. And all larger Linux servers are clusters, such as SGI UV2000 or the ScaleMP servers. Everybody wants to go into the Enterprise segment, which is very lucrative, but until someone builds 16 socket Linux servers, optimized for Linux, the Enterprise segment belongs to Unix and IBM Mainframes.
BTW, Oracle is developing a 96 socket SPARC server, designed to run huge databases, i.e. SMP workloads. You can't use MPI for Enterprise workloads, MPI is used for clustered HPC number crunching. Also, in 2015, Oracle will release a 16.384 threaded SPARC server with 64 TB of RAM. Both of them running Solaris of course.
Look at the picture at the bottom, here you see that for 32 sockets, each SPARC cpu can reach any other in at most 2-3 hops, which is very good. In effect, it is a SMP server, although it uses NUMA techniques. http://www.theregister.co.uk/2013/08/28/oracle_spa...
You need to define precisely what an actual SMP is. I would argue that the main attributes are a global memory address space, cache coherency between all sockets, and only one instance of an OS/hypervisor being necessary to run across all sockets.
Also you apparently didn't read your link to the Register very well. To quote it "This is no different than the NUMAlink 6 interconnect from Silicon Graphics...". Note that SGI UV2000 uses NUMALink6 and according to your reference is an SMP machine. So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not.
As for MPI, it is useful on larger SMP due to its ability to take advantage of memory locality in a NUMA system. It is simply desirable to run a calculation on a core that resides closest to where the variables are stored in memory. It reduces the number of links data has to move over before it is processed, thus improving efficiency. This idea applies to both large scale NUMA where the links are intersocket as well as clusters where the links are high speed networking interfaces. Using MPI provides a common interface for the programmer regardless of whether the code is running on a massively parallel SMP machine or a cluster made of hundreds of independent nodes.
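To make the locality point a bit more concrete, here is a rough sketch of the message-passing style: each "rank" works only on the partition it owns and sends back one small result, so the bulk data never crosses a link. Note this is plain Python threads and a queue standing in for ranks and messages; it is not MPI and not any vendor's API, and the function names are invented for the example.

```python
import queue
import threading

def worker(rank, local_data, outbox):
    # Each "rank" touches only its own partition (the data that would sit
    # in memory close to its core) and sends back a single small message.
    partial = sum(x * x for x in local_data)
    outbox.put((rank, partial))

def sum_of_squares(data, ranks):
    # Hand every rank its own slice up front; after that no rank reads
    # another rank's data.
    chunks = [data[r::ranks] for r in range(ranks)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, chunks[r], outbox))
        for r in range(ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine the per-rank messages, similar in spirit to a reduce step.
    total = 0
    for _ in range(ranks):
        _rank, partial = outbox.get()
        total += partial
    return total

print(sum_of_squares(list(range(1000)), ranks=4))
```

The same structure works whether the "link" between ranks is an intersocket hop or a network interface, which is the reason the programming model carries over between the two.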
As for the IBM p795, you don't have to do any of the compiling, IBM has precompiled Redhat and Suse binaries ready to go. That goes outside of the point though, regardless of price, it is a large SMP server that can run Linux in an enterprise environment with full support. It meets your criteria for something you said did not exist. As for your thoughts on Linux not scaling past 32 sockets for business applications, IBM does list world records for SPECjbb2005 using a p795 and Linux: http://www-03.ibm.com/systems/power/hardware/bench...
"...So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not..."
The definition of SMP is not in the architecture or how the server is built or which cpu it uses. The definition of SMP, is if it can be used for SMP workloads, simple as that. And as SGI and ScaleMP - both selling large linux servers with 10.000s of cores say in my links: these SGI and ScaleMP servers are only used for HPC, and never used for SMP workloads. Read my links.
Simple as that, they say it explicitly "not for SMP workloads, only for HPC workloads".
I don't care if a cluster can replace a SMP server running SMP workloads - then that cluster is good for SMP workloads. But fact is, no cluster can run SMP workloads, they can only run HPC workloads.
If you have a counterexample of SGI or ScaleMP running SMP workloads, please post them here. That would be the first time a cluster can replace a SMP server.
That does not address the architectural similarities between the SGI UV2000 and the SPARC M6 for what defines a big SMP. Rather you're attempting to use the intentionally vague definition of running SMP by merely running SMP style software. I fully reject that definition as a single socket desktop computer with enough RAM can run that software with no issue. Sure, it'll be slower than these big multisocket machines and the results may be questionable as it has no real RAS features but it would work. I also reject the idea that clusters cannot run what you define as SMP workloads - enterprise scale applications are designed to run on clusters for the simple reason of redundancy. For example large databases run in at least pairs to cover possible hardware failure and/or the need to service a machine (and depending on the DB, both instances can be active but it is unwise to go beyond 50% capacity per machine). Furthermore, these clusters have a remote replication to another data center in case of a local catastrophe. That'd be three or more instances in a cluster.
Thus I stand by my definition of what an SMP machine is: global memory space, cache coherency across multiple sockets and only one OS/hypervisor necessary across the entire system.
You also have ignored the IBM p795 Linux benchmarks for SPECjbb2005 which fall into your SMP workload category. The p795 should fit anyone's definition of an SMP machine.
As for reading your links, I obviously have as I'm pulling quotes out of them that contradict your claims ( "This is no different than the NUMAlink 6 interconnect from Silicon Graphics, which implements a shared memory space using Xeon E5 chips..." http://www.theregister.co.uk/2013/08/28/oracle_spa... ) or have noticed that they're horrendously out of date from 2004 ( http://www.realworldtech.com/sgi-interview/6/ ).
My rationale is simple: a "cluster" by definition does not have the low latency required to function as a shared memory, single combined system. The UV 2000 does, hence it's not a cluster. I know people who write scalable code for 512+ cores, and that's just on the older Origin systems which are not as fast. There's a lot of effort going into increasing code scalability, especially since SGI intends to increase the max cores over 250K.
If you want to regard the UV 2000 as a cluster, feel free, but it's not, because it functions in a manner which a conventional cluster simply can't: shared memory, low latency RAM, highly scalable I/O. Clusters use networking technologies, Infiniband, etc., to pass data around; the UV can have a single OS instance run the entire system. Its use of NUMALink6 to route data around the system isn't sufficient reason to call it a cluster, because NUMA isn't a networking tech.
Based on the scalability of target problems, one can partition UV systems into multiple portions which can communicate, but they still benefit from the high I/O available across the system.
It's not a cluster, and no amount of posting oodles of paragraphs will change that fact.
Ian.
PS. Kevin, thanks for your followup comments! I think at the time this article was current, I just couldn't be bothered to read Brutalizer's post. :D
Clicking through to the CPU-World page lists that CPU as not having a Turbo mode. Specifications are still unconfirmed at this point - as mentioned in the piece Intel often does a balancing act of cores/MHz and will never release a max-core model with max-frequency.
My original source for the information was a PCWorld article, until I was forwarded the Intel information direct. I have used information from CPU-World as well, who have used a different source.
I really expect that the E7-2800/4800/8800 v2 family (Ivy Bridge-EX) will have Turbo Boost. We just don't know what the specs will be from the various leaked sources. It is also supposed to have triple the memory density of Westmere-EX, plus PCI-E 3.0 support.
Very frustrating that my company (architectural visualisation) could absolutely make use of these chips in our render-farm, yet our margins mean we will never be able to afford them. Intel's pricing for anything with more than 6 cores is just depressing. I guess we're just unfortunate to be in a no-man's-land market segment that gets use from multi-core CPUs but doesn't generate enough revenue to feast at the high table.
I suspect NSA is more interested in CPUs that can handle many more threads (on the order of thousands) than Intel CPUs or even SPARC CPUs can. Cray used to make such a CPU, and may still.
Near as I can find, it's been years (or a decade+) since Cray built a machine with a Cray cpu; it's been AMD and Intel. Lots o chips in the cabinet. The interconnects have been Cray's special sauce.
Yeah, possibly Nvidia Tesla gpu chips as well in their mix since crypto cracking needs plenty of fpu power. These 15core monsters with 4.5 billion transistors certainly are rather power efficient at 150w TDP.
Cray - through its YarcData subsidiary - sells machines based on the custom ThreadStorm processor, which is a single-core 500MHz 3-issue VLIW design with 128-thread fine-grained multithreading, based on the 1990's Tera MTA machine. I assume that is what Ktracho was referring to.
The 16-core Opterons are cheaper and consume way less power, although they are not as powerful per core as the Intel part. But it seems Intel is responding to the Opteron 16 core release as AMD's pricing is much more reasonable for servers. But I am more concerned with scaling upwards in terms of core count. One can see Intel NEEDS a huge L3 cache to keep their cores fed while AMD uses a larger L2 cache exclusive to each processor and a small L3 cache. When AMD uses HSA for server chips it would be interesting to see what they put in as non-cpu compute cores. Maybe a giant quad-pumped fpu unit cluster that does 4ops per cycle and crunches DP fp32 faster than ever before.
Ivy Bridge EX is simply in a different league compared to anything AMD has to offer today.
So, no, Intel is not responding to AMD with Ivy Bridge EX, as AMD has close to zero market share in this segment and has nothing to offer.
Don't get me wrong, I'd very much love AMD to become competitive again, as Intel became a de-facto monopoly and basically segments the market with eFuses which control features worth four digit dollar numbers.
It is a sad state for the IT industry as a whole, but if we talk about performance and RAS features, Ivy Bridge EX has no competing product in AMD's product offering - it is literally 2 or 3 generations ahead.
I foresee VMWare and Microsoft further revising their licensing strategies. One license of Enterprise Server for each physical CPU isn't going to cut it when that means 15 cores and 30 threads.
It has turbo, and it goes to insane speeds (for a high-end server CPU), this is what the 165W TDP is for :-) Article has a typo.
And, unfortunately, you won't be able to upgrade your Mac Pro with Ivy Bridge EX, as the socket is different (EX is using LGA 2011-1, EP used in Mac Pro is LGA 2011). If you need more CPU performance than a single Intel Xeon 2697 v2 has to offer you have the following options:
- Replace your Mac Pro with 2-socket or 4-socket PC workstation system, and put 2 Xeon 2697 v2 CPUs or 4 Xeon 4657L v2 (latter brings 48 cores / 96 threads to the game)
- Replace your Mac Pro with up to 8 socket Ivy Bridge EX system, but be prepared to pay a price of a small flat for such a system... But if money is no object :)
Unfortunately, both options are quite a bit bigger than the new Mac Pro, but I'd be much more comfortable with a proper big server case for something that has several hundred Watts of TDP.
I'm using dual-socket ASUS Z9PE D8 WS with dual Xeon 2697 v2 for now, but for what I'm doing I'd be looking into 4P expansion...
55 Comments
extide - Tuesday, February 11, 2014 - link
I would assume the 15-core die looks a lot like the 12-core die, except it is 3 columns of 5, instead of 3 columns of 4.
Ian Cutress - Tuesday, February 11, 2014 - link
This is confirmed - 3 blocks of 5.
extide - Wednesday, February 12, 2014 - link
Sweet, heh, called it, (before the post was updated) :) But anyways that is a HUGE die! I wonder how big it is and how many transistors?
psyq321 - Wednesday, February 12, 2014 - link
12-core die (used in Xeon E5, too) is most likely a 15-core. There are several hints that lead to this conclusion: same package size for 15-core E7 and 12-core E5 models and a few config registers that even in E5 generations could address 15 cores.
churchgeek - Tuesday, February 11, 2014 - link
Wow - interesting that the new 8890v2 cpu is going back to socket LGA2011.
B3an - Tuesday, February 11, 2014 - link
Does this mean it could be possible we have X99 chipset motherboards supporting this CPU later this year? I'm sick of 6 and 8 cores, it's not enough for me but I'm not spending tons on typical Xeon motherboards.
Gigaplex - Tuesday, February 11, 2014 - link
Yet you're willing to spend $5000 or more on a CPU?
inighthawki - Tuesday, February 11, 2014 - link
Quite an impressive power user if you need a CPU designed for a heavy load server performing multitasking tasks such as handling hundreds or thousands of simultaneous user requests and/or managing dozens of VMs.
Exactly what are you doing that 8 cores is "not enough?"
ShieTar - Thursday, February 13, 2014 - link
And what is the magical Non-Xeon-8core he is using for it?
inighthawki - Thursday, February 13, 2014 - link
He stated "I'm sick of 6 and 8 cores" so it was more of a response to his statement as opposed to a particular processor. Ask him what he's using. :)
IUU - Sunday, February 16, 2014 - link
Everyone is sick of the 6 and 8 cores, or the 4 cores as mainstream. They seem to have been on the market for an eternity. You can't exactly blame Intel fully for this as really multithreaded programs are still a scarcity. You can't exactly blame programmers either because their compensation is ridiculous and their job inhumanly demanding.
About this "power user" thing; it is dumb to self hallucinate that there is the average joe as opposed to "high performance" computing, the "big data" or "science apps". Remember that people in these fields readily salivate for gamers' gpus and be frank to recognize that supercomputing would have much less progress if it weren't for the poor average joe that you seem to scorn.
The fact that 12-15 cores are available for "businesses" but not for the average joe is a travesty that stinks of decadence, waiting for the next paradigm of innovation.
twtech - Thursday, February 20, 2014 - link
This is a bit of a chicken and egg problem, both for the hardware, and for the fact that the threading support in C++ is still relatively poor - it doesn't really have much in the way of features designed to help deal with the data-access control problem that is a big issue in multithreaded programs (and C++ is still the language of choice for performance-critical apps, the likes of which would have a potential benefit from better usage of threading in the first place).
sanaris - Monday, February 17, 2014 - link
Virtualization and cloud - nah.
Production - yes.
I need as much power as possible inside a single motherboard to run quantum predictions.
Jaybus - Wednesday, April 8, 2015 - link
Then go for more PCIe lanes for GPUs and OpenCL, rather than more SMP cores.
psyq321 - Wednesday, February 12, 2014 - link
It is not the same LGA 2011 socket as Xeon E5 / Core i7 49xx unfortunately.
This is due to a different memory controller, so the pin layout is different and, thus, not compatible.
Kevin G - Tuesday, February 11, 2014 - link
I'm rather surprised if they go with the same socket 2011 as SandyBridge-E/Ivy Bridge-E. One of the hallmarks of the EX line has been its good socket scalability by having 4 QPI links. This enables 8 socket systems with minimal hops between sockets. Similarly the EX has used memory buffers to enable more bandwidth and higher memory capacities than the EP line.
If the included picture hasn't been modified by Photoshop, then we're looking at a different physical key for the socket.
Ian Cutress - Tuesday, February 11, 2014 - link
To quote Intel's paper at ISSCC: "The processor supports two 2011-land, 40-mil pitch organic flip-chip LGA package options", which I would interpret as LGA-2011.
Kevin G - Tuesday, February 11, 2014 - link
Yes, it has 2011 pads on the package. The question is rather whether they are configured the same as on Sandy Bridge-E/Ivy Bridge-E. In other words, does it have two QPI links, four DDR3 channels and 40 PCI-e lanes as the main IO features?
Also worth pointing out: rumors have Haswell-E using a 2011 pad package as well, but supporting DDR4 memory. This variant has been tagged as 2011v3. (Presumably Ivy Bridge-EX would be 2011v2.)
Kevin G - Tuesday, February 11, 2014 - link
Actually, looking at the block diagram, something is really weird. What is VSME and why are there seven of the links? Could these be a new serial memory bus technology? I'd fathom it'd be useful for a spare memory channel in a RAID5-like array (see the Alpha EV7). However, everything else points to two independent memory controllers, which complicates such a function.
Furthermore, the block diagram indicates 3 QPI links, which would allow this chip to scale to 8 sockets. The current Sandy Bridge-E/Ivy Bridge-E only go up to quad socket.
I wonder if Intel has done something rather crazy: enabled this die to be used in both the original LGA 2011 and a new EX variant. All the 12 core Ivy Bridge-E's were rumored to be using this die.
psyq321 - Wednesday, February 12, 2014 - link
It is probably the same die shared between the HCC EP and EX lines, but with different modules enabled in the EX line (extra QPI, etc.).
It could also be that Intel originally planned a 15 core Ivy Bridge EP but decided to keep the 15 core SKU for the EX generation only.
In any case, we are now stuck with 3 different LGA2011 configurations:
- LGA2011 - for Sandy Bridge EP / Ivy Bridge EP
- LGA2011-1 - for Ivy Bridge EX and Haswell EX
- LGA2011-3 - for Haswell EP (and maybe Broadwell EP)
The last two can support DDR4, but the memory configuration in the EX line is totally different, and goes through a scalable memory buffer.
Way to keep things simple :)
Kevin G - Wednesday, February 12, 2014 - link
I've been under the impression that Ivy Bridge-EX would still be using memory buffers, similar in concept to FB-DIMMs and the memory buffers used by Nehalem/Westmere-EX. Since much of the device signal portion is abstracted from the main die, using multiple memory technologies would be possible. I thought Ivy Bridge-EX's buffers would start off supporting DDR3 with DDR4 buffers appearing next year, possibly as a mid-cycle refresh before Haswell-EX appears.
pixelstuff - Tuesday, February 11, 2014 - link
"suited for 4P/8P systems"
What is an example of an 8P system? Who makes such a thing?
Ian Cutress - Tuesday, February 11, 2014 - link
SuperMicro and HP have made 8P behemoths based on Westmere-EX. I had some time testing SuperMicro's 4P, although I wouldn't mind looking at the new 8P if they make one.
Kevin G - Tuesday, February 11, 2014 - link
IBM had the X3850 X5 that would scale up to 8 sockets. It was different in that it was spread out between two chassis, with external QPI links joining them.
IBM also exploited the external QPI links to offer raw memory expanders without the need for more CPU sockets.
darking - Tuesday, February 11, 2014 - link
Fujitsu produces the RX900 too: http://www.fujitsu.com/fts/products/computing/serv...
mapesdhs - Tuesday, February 11, 2014 - link
At the extreme end, see: http://www.sgi.com/products/servers/uv/
Max configuration is a UV 2000 with 256 CPUs from the XEON E5 series
(max 2048 cores, 4096 threads), 64TB RAM in 4 racks. Note this is NOT
a cluster, it's a single combined system using shared memory NUMA. No
doubt SGI will adopt the E7 line with an arch update if necessary. The aim
is to eventually scale to the mid tens of thousands of CPUs single-image
(1/4 million cores or so).
8 processors is nothing. :D I have a "low end" 36-processor Onyx 3800 that's 14 years old...
Ian.
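As a rough sanity check on the figures above, the quoted totals imply 8 cores per socket with two threads per core. The sketch below just redoes that arithmetic; the idea that a future quarter-million-core system would keep the same 8 cores per socket is purely an illustrative assumption, not something SGI has stated.

```python
# Figures quoted above for the maximum UV 2000 configuration.
sockets = 256
total_cores = 2048
total_threads = 4096

cores_per_socket = total_cores // sockets        # cores in each CPU
threads_per_core = total_threads // total_cores  # hardware threads per core

# Assumption (illustrative only): the "1/4 million cores" goal is reached
# with the same number of cores per socket as today.
target_cores = 250_000
sockets_needed = target_cores // cores_per_socket

print(cores_per_socket, threads_per_core, sockets_needed)
```

Under that assumption the socket count lands in the low tens of thousands, which is roughly consistent with the "mid tens of thousands of CPUs" goal.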
Kevin G - Wednesday, February 12, 2014 - link
SGI developed their own QPI glue logic to enable scaling beyond 4 sockets with those chips.
It does have a massive global address space, which is nice from a programming standpoint, though to get there SGI had to jump through several weird hoops. Reading up on the interconnect documentation, it is a novel tiered structure. The 64 TB RAM limit is imposed by the Xeon E5's physical address space. Adding another tier allows for a non-coherent memory address space of up to 8 PB. A 64 TB region does appear to be fully cache coherent within that node, and the socket limit for a node is 256.
MPI clustering techniques are used to scale at some point and SGI's interconnect chips provide some MPI acceleration to reduce CPU overhead and increase throughput. Neat stuff.
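The 64 TB and 8 PB figures line up neatly with powers of two. A minimal sketch of the arithmetic, assuming the Xeon E5 exposes 46 physical address bits (an assumption, not stated in the comment above) and treating TB/PB as binary units:

```python
TIB = 2**40  # one tebibyte in bytes
PIB = 2**50  # one pebibyte in bytes

# Assumption: 46 physical address bits on the Xeon E5.
PHYS_ADDR_BITS = 46
coherent_limit = 2**PHYS_ADDR_BITS          # bytes addressable by one coherent region

# The larger, non-coherent space quoted above.
global_space = 8 * PIB
global_bits = global_space.bit_length() - 1  # exact log2 for a power of two

# How many 64 TB coherent regions fit into the 8 PB space.
regions = global_space // coherent_limit

print(coherent_limit // TIB, global_bits, regions)
```

In other words, the extra tier would only need to supply a handful of additional address bits on top of what the CPU itself can address.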
Brutalizer - Thursday, February 13, 2014 - link
@mapesdhs,
Please stop spreading this misconception. The SGI UV2000 server with 256 sockets IS a cluster. Yes, it runs single image Linux over all nodes, but it is still a cluster.
First of all, there are (at least) two kinds of scalability. Scale out, which is a cluster - you just add another node and you have a more powerful cluster. They are very large, with 100s or even 1000s of cpus. All supercomputers are of this type, and they run embarrassingly parallel workloads, number crunching HPC stuff. Typically, they run a small loop on each cpu, very cache intensive, doing some calculation over and over again. These servers are all HPC clusters. SGI UV2000 is of this type. Latency to far-off cpus is very bad.
Scale up - which is a single fat huge server. They weigh 1000kg or so, and have up to 32 sockets, or even 64 sockets. They don't run parallel workloads, no. Typically they run Enterprise workloads, such as large databases. These workloads are branch intensive and jump wildly around the code, so the code will not fit into the cache. These are of the SMP server type, running SMP workloads. SMP servers are not a cluster, they are a single fat server. Sure, they can use NUMA techniques, etc - but the latency to another cpu is very low (because they only have 32/64 cpus, which are not far away), so in effect they are like a true SMP server. There are not many hops to reach another cpu. SGI is not of this SMP server type. Examples of this type are IBM P795 (32 sockets), Oracle M6-32, Fujitsu M10-4S (64 sockets), HP Integrity (64 sockets). They all run a Unix OS: Solaris, IBM AIX, HP-UX. They all cost many millions of USD. Very very expensive, if you want 32 socket servers. For instance, the IBM P595 32 socket server used for the old TPC-C record cost 35 million USD. One single frigging server cost 35 million. With 32 sockets. They are VERY expensive. A cluster is cheap, just add some PCs and a fast switch.
Sure there are clustered databases running on clusters, but it is not the same thing as a SMP server. A HPC cluster can not replace a SMP server, as HPC servers can not handle branch intensive code - the worst case latency is so bad that performance would grind to a halt if HPC clusters tried Enterprise workloads.
In the x86 area, the largest SMP servers are 8 sockets servers, for instance Oracle M4800. Which is just a x86 pc sporting eight of these Ivy Bridge-EX cpus. There are no 32 socket x86 servers, no 64 sockets. But there are 256 sockets and above (i.e. clusters). So there is a huge gap between 8 sockets, the next is 256 sockets (SGI UV2000). Anything larger than 64 sockets, is a cluster.
For instance, the ScaleMP Linux server sporting 8192/16384 cores and gobs of TB of RAM, very similar to this SGI UV2000 cluster, is also a cluster. It uses a software hypervisor that tricks the Linux kernel into believing it runs on an SMP server instead of an HPC cluster:
http://www.theregister.co.uk/2011/09/20/scalemp_su...
"...Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a SMP shared memory system, ScaleMP cooked up a special software hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes....vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space. vSMP has its limits....The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit."
Even SGI confesses their large Linux Altix and UV2000 servers, are clusters:
http://www.realworldtech.com/sgi-interview/6/
"The success of Altix systems in the high performance computing market are a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future,...However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time"
All large Linux servers with 1000s of cores are clusters - and they are all used for HPC number crunching workloads. None are used for SMP workloads. The largest Linux SMP servers are 8 socket servers. Anything larger than that is a Linux cluster. So, Linux scales up to 8 sockets in SMP servers. And on HPC clusters, Linux scales well up to 1000s of sockets. On SMP servers, Linux does not scale well. People have tried to compile Linux for the big Unix servers, for instance the "Big Tux" server, which is the HP Integrity 64 socket Unix server - with terrible results. The cpu utilization was 40% or so, which means more than every other cpu was idle - under full load. Linux's limit is somewhere around 8 sockets, it does not scale further.
That is the reason Linux does not venture into the Enterprise arena. Enterprise, which is very lucrative, needs huge 32 socket SMP servers to run huge databases. And they shell out millions of USD on a single 32 socket server. If Linux could venture into that arena, Linux would. But there are no such big Linux SMP servers on the market. If you know of any, please link. I have never seen a Linux SMP server beyond 8 sockets. The Big Tux server is an HP-UX server, so it is not a Linux server. It is a Linux experiment with bad performance and results.
So, these large Linux servers are all clusters, as evidenced by the fact that they are all running HPC workloads. None are running SMP workloads. Please post a link if you know of a counterexample (you will not find any, trust me).
Kevin G - Friday, February 14, 2014 - link
Here is a counterexample: http://www.sgi.com/pdfs/4192.pdf
It describes the ASIC used in the SGI UV2000 and how it links everything together. In particular, it explains how the system differs from a cluster. The main points are as follows:
*Global memory space - every byte of memory is addressable directly from any CPU core.
*Cache coherent for systems up to 64 TB
*One instance of an operating system across the entire system without the need for a hypervisor (this is different from ScaleMP, which has to have a hypervisor running on each node)
I also would not cite a SGI interview from 2004 regarding technology introduced in 2012. A lot has changed in 8 years.
Similarly, the "Big Tux" experiment used older Itanium chips that still used a FSB. They have since gone to the same QPI bus as modern Xeons. Scaling to higher socket counts is better on the Itanium side as it has more QPI links. Of course this is kind of a moot point, as all enterprise Linux distributions dropped Itanium support years ago.
Also the IBM p795 can run Linux across all 32 sockets/256 cores/1024 threads if you wanted. IBM's Red Book on the p795: http://www.redbooks.ibm.com/redpapers/pdfs/redp464...
Oracle is working on adding SPARC support to its Oracle Linux distribution. This would be another source for a large coherent system capable of running a single Linux image. No other enterprise Linux distribution will be officially supported on Oracle's hardware.
Brutalizer - Saturday, February 15, 2014 - link
Where is the counterexample? I am asking for an example of a Linux server with more than 8 sockets that runs SMP workloads, namely Enterprise stuff, such as big databases. The SGI server you link to is a cluster. It says so in your link: they talk a lot about "MPI", which is a library for doing HPC calculations on clusters. MPI is never used on SMP servers; it would be catastrophic to develop an Oracle or DB2 database using clustered techniques such as MPI.
http://en.wikipedia.org/wiki/Message_Passing_Inter...
"MPI remains the dominant model used in High-Performance Computing today...MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run such programs."
So, there are no large SMP Linux servers. Sure, you can compile Linux to the IBM AIX P795 Unix server, but that would be nasty. The P795 is very very expensive 10s of millions of USD, and because Linux does not scale beyond 8 sockets on SMP servers, the performance would be bad too. It would be a bad idea to buy a very expensive Unix server, and install Linux instead.
Regarding the Oracle SPARC servers. Larry Ellison said officially when he bought Sun, that Linux is for low-end and Solaris for high end. Oracle is not offering any big Linux servers. All 32 socket servers are running Solaris.
Have you never wondered why the well researched and mature Unix vendors have for decades stuck to 32/64 socket servers? They have had 32 socket Unix servers for decades, but nothing larger than that. Why not? Whereas the buggy Linux has 8 socket servers or servers with 10,000s of cores, but nothing in between. There is no vendor manufacturing 32 socket Linux servers. You need to recompile Linux for 32 socket Unix servers, with bad performance results. The answer is that Linux scales badly on SMP servers, 8 sockets being the maximum. And all larger Linux servers are clusters, such as the SGI UV2000 or the ScaleMP servers. Everybody wants to go into the Enterprise segment, which is very lucrative, but until someone builds 16 socket servers optimized for Linux, the Enterprise segment belongs to Unix and IBM Mainframes.
BTW, Oracle is developing a 96 socket SPARC server, designed to run huge databases, i.e. SMP workloads. You can't use MPI for Enterprise workloads; MPI is used for clustered HPC number crunching. Also, in 2015, Oracle will release a 16,384-thread SPARC server with 64 TB of RAM. Both of them running Solaris, of course.
Look at the picture at the bottom; there you see that for 32 sockets, each SPARC cpu can reach any other in at most 2-3 hops, which is very good. In effect, it is an SMP server, although it uses NUMA techniques.
http://www.theregister.co.uk/2013/08/28/oracle_spa...
Kevin G - Sunday, February 16, 2014 - link
You need to define precisely what an actual SMP is. I would argue that the main attributes are a global memory address space, cache coherency between all sockets, and only one instance of an OS/hypervisor being necessary to run across all sockets.
Also, you apparently didn't read your link to the Register very well. To quote it: "This is no different than the NUMAlink 6 interconnect from Silicon Graphics...". Note that the SGI UV2000 uses NUMAlink 6 and, according to your reference, is an SMP machine. So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not.
As for MPI, it is useful on larger SMP systems due to its ability to take advantage of memory locality in a NUMA system. It is simply desirable to run a calculation on a core that resides closest to where the variables are stored in memory. It reduces the number of links data has to move over before it is processed, thus improving efficiency. This idea applies both to large scale NUMA, where the links are intersocket, and to clusters, where the links are high speed networking interfaces. Using MPI provides a common interface to the programmer regardless of whether the code is running on a massively parallel SMP machine or a cluster made of hundreds of independent nodes.
As for the IBM p795, you don't have to do any of the compiling, IBM has precompiled Redhat and Suse binaries ready to go. That goes outside of the point though, regardless of price, it is a large SMP server that can run Linux in an enterprise environment with full support. It meets your criteria for something you said did not exist. As for your thoughts on Linux not scaling past 32 sockets for business applications, IBM does list world records for SPECjbb2005 using a p795 and Linux: http://www-03.ibm.com/systems/power/hardware/bench...
Brutalizer - Sunday, February 23, 2014 - link
"...So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not..."
The definition of SMP is not in the architecture or how the server is built or which cpu it uses. The definition of SMP is whether it can be used for SMP workloads, simple as that. And as SGI and ScaleMP - both selling large Linux servers with 10,000s of cores - say in my links: these SGI and ScaleMP servers are only used for HPC, and never used for SMP workloads. Read my links.
Simple as that, they say it explicitly "not for SMP workloads, only for HPC workloads".
I don't care: if a cluster can replace an SMP server running SMP workloads, then that cluster is good for SMP workloads. But fact is, no cluster can run SMP workloads, they can only run HPC workloads.
If you have a counterexample of SGI or ScaleMP running SMP workloads, please post them here. That would be the first time a cluster can replace a SMP server.
Kevin G - Monday, February 24, 2014 - link
That does not address the architectural similarities between the SGI UV2000 and the SPARC M6 in terms of what defines a big SMP. Rather, you're attempting to use an intentionally vague definition of SMP: merely running SMP style software. I fully reject that definition, as a single socket desktop computer with enough RAM can run that software with no issue. Sure, it'll be slower than these big multisocket machines and the results may be questionable as it has no real RAS features, but it would work. I also reject the idea that clusters cannot run what you define as SMP workloads - enterprise scale applications are designed to run on clusters for the simple reason of redundancy. For example, large databases run at least in pairs to cover possible hardware failure and/or the need to service a machine (and depending on the DB, both instances can be active, but it is unwise to go beyond 50% capacity per machine). Furthermore, these clusters have remote replication to another data center in case of a local catastrophe. That'd be three or more instances in a cluster.
Thus I stand by my definition of what an SMP machine is: global memory space, cache coherency across multiple sockets and only one OS/hypervisor necessary across the entire system.
There are business class benchmarks for the SGI UV2000. It is number two in the SPECjbb2005 benchmark when configured with 512 cores ( http://www.spec.org/jbb2005/results/res2012q2/jbb2... ). (The Fujitsu M10-4S is 2% faster but it needs double the core count to do so. http://www.spec.org/jbb2005/results/res2013q3/jbb2... )
You also have ignored the IBM p795 Linux benchmarks for SPECjbb2005 which falls into your SMP workload category. The p795 should fit anyone's definition of an SMP machine.
As for reading your links, I obviously have as I'm pulling quotes out of them that contradict your claims ( "This is no different than the NUMAlink 6 interconnect from Silicon Graphics, which implements a shared memory space using Xeon E5 chips..." http://www.theregister.co.uk/2013/08/28/oracle_spa... ) or have noticed that they're horrendously out of date from 2004 ( http://www.realworldtech.com/sgi-interview/6/ ).
mapesdhs - Wednesday, July 16, 2014 - link
Blah blah...
My rationale is simple: a "cluster" by definition does not have the low latency
required to function as a shared memory, single combined system. The UV 2000
does, hence it's not a cluster. I know people who write scalable code for 512+
cores, and that's just on the older Origin systems which are not as fast. There's
a lot of effort going into increasing code scalability, especially since SGI intends
to increase the max cores over 250K.
If you want to regard the UV 2000 as a cluster, feel free, but it's not, because
it functions in a manner which a conventional cluster simply can't: shared
memory, low latency RAM, highly scalable I/O. Clusters use networking
technologies, Infiniband, etc., to pass data around; the UV can have a single
OS instance run the entire system. Its use of NUMALink6 to route data
around the system isn't sufficient reason to call it a cluster, because NUMA
isn't a networking tech.
Based on the scalability of target problems, one can partition UV systems
into multiple portions which can communicate, but they still benefit from
the high I/O available across the system.
It's not a cluster, and no amount of posting oodles of paragraphs will
change that fact.
Ian.
PS. Kevin, thanks for your followup comments! I think at the time this article was
current, I just couldn't be bothered to read Brutalizer's post. :D
olderkid - Wednesday, February 19, 2014 - link
HP DL980
We have a couple running as virtualization boxes. We've also considered making them huge Oracle servers.
nathanddrews - Tuesday, February 11, 2014 - link
Cool!
wishgranter - Tuesday, February 11, 2014 - link
15 cores? A bit weird in the PC industry. Is it an Intel bug in their calculations, or did they just forget the 16th core??? Or is it just because of thermal issues??
iMacmatician - Tuesday, February 11, 2014 - link
The layout on the die is 3x5 so 15 cores makes sense.
Niloc2792 - Tuesday, February 11, 2014 - link
But can it run Crysis?
Conficio - Tuesday, February 11, 2014 - link
Typos?
In the text it says Turbo Frequency = 3.8, while in the table it says 2.8.
Also in the text it says source - CPU-World, while in the Feed at the end it says Original Source PCWorld.
Ian Cutress - Wednesday, February 12, 2014 - link
Clicking through to the CPU-World page lists that CPU as not having a Turbo mode. Specifications are still unconfirmed at this point - as mentioned in the piece, Intel often does a balancing act of cores/MHz and will never release a max-core model with max-frequency.
My original source for the information was a PCWorld article, until I was forwarded the Intel information direct. I have used information from CPU-World as well, who have used a different source.
Ian
GlennAlanBerry - Wednesday, February 12, 2014 - link
I really expect that the E7-2800/4800/8800 v2 family (Ivy Bridge-EX) will have Turbo Boost. We just don't know what the specs will be from the various leaked sources. It is also supposed to have triple the memory density of Westmere-EX, plus PCI-E 3.0 support.
omion - Tuesday, February 11, 2014 - link
Minor error correction (which is funny, given the line...)
double-error-correction and triple-error-correction
should probably be:
double-error-correction and triple-error-detection
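The distinction matters because correction and detection cost different amounts of redundancy. By the standard coding-theory bound, a code that corrects t errors while still detecting up to d errors (with d >= t) needs a minimum Hamming distance of t + d + 1. A small sketch of that bound (the function name is made up for illustration, and nothing here is specific to how Intel implements its ECC):

```python
def min_hamming_distance(correct: int, detect: int) -> int:
    """Minimum code distance needed to correct `correct` bit errors
    while still detecting up to `detect` bit errors (detect >= correct)."""
    if correct < 0 or detect < correct:
        raise ValueError("need 0 <= correct <= detect")
    return correct + detect + 1

sec_ded = min_hamming_distance(1, 2)         # single-correct, double-detect
dec_ted = min_hamming_distance(2, 3)         # double-correct, triple-detect
triple_correct = min_hamming_distance(3, 3)  # full triple-error correction

print(sec_ded, dec_ted, triple_correct)
```

A code that could correct triple errors would already correct double errors, so listing both as "correction" would be redundant, which is why "triple-error-detection" is the reading that makes sense.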
colonelclaw - Wednesday, February 12, 2014 - link
Very frustrating that my company (architectural visualisation) could absolutely make use of these chips in our render-farm, yet our margins mean we will never be able to afford them. Intel's pricing for anything with more than 6 cores is just depressing.
I guess we're just unfortunate to be in a no-man's-land market segment that gets use from multi-core CPUs but doesn't generate enough revenue to feast at the high table.
FunBunny2 - Wednesday, February 12, 2014 - link
Well, I suspect Cray has some experience selling such a machine to NSA.
Ktracho - Wednesday, February 12, 2014 - link
I suspect NSA is more interested in CPUs that can handle many more threads (on the order of thousands) than Intel CPUs or even SPARC CPUs can. Cray used to make such a CPU, and may still.
FunBunny2 - Wednesday, February 12, 2014 - link
Near as I can find, it's been years (or a decade+) since Cray built a machine with a Cray cpu; it's been AMD and Intel. Lots o chips in the cabinet. The interconnects have been Cray's special sauce.
fteoath64 - Thursday, February 13, 2014 - link
Yeah, possibly Nvidia Tesla gpu chips as well in their mix since crypto cracking needs plenty of fpu power. These 15 core monsters with 4.5 billion transistors certainly are rather power efficient at 150W TDP.
SarahKerrigan - Friday, February 14, 2014 - link
Cray - through its YarcData subsidiary - sells machines based on the custom ThreadStorm processor, which is a single-core 500MHz 3-issue VLIW design with 128-thread fine-grained multithreading, based on the 1990s Tera MTA machine. I assume that is what Ktracho was referring to.
fteoath64 - Thursday, February 13, 2014 - link
The 16 core Opterons are cheaper and consume way less power, although they are not as powerful per core as the Intel part. But it seems Intel is responding to the Opteron 16 core release, as AMD's pricing is much more reasonable for servers.
But I am more concerned with scaling upwards in terms of core count. One can see Intel NEEDS a huge L3 cache to keep their cores fed, while AMD uses a larger L2 cache exclusive to each processor and a small L3 cache. When AMD uses HSA for server chips, it would be interesting to see what they put in as non-cpu compute cores. Maybe a giant quad-pumped fpu unit cluster that does 4 ops per cycle and crunches DP fp32 faster than ever before.
psyq321 - Tuesday, February 18, 2014 - link
Ivy Bridge EX is simply in a different league compared to anything AMD has to offer today.
So, no, Intel is not responding to AMD with Ivy Bridge EX, as AMD has close to zero market share in this segment and has nothing to offer.
Don't get me wrong, I'd very much love AMD to become competitive again, as Intel has become a de-facto monopoly and basically segments the market with eFuses which control features worth four digit dollar numbers.
It is a sad state for the IT industry as a whole, but if we talk about performance and RAS features, Ivy Bridge EX has no competing product in AMD's product offering - it is literally 2 or 3 generations ahead.
Einy0 - Thursday, February 13, 2014 - link
I foresee VMware and Microsoft further revising their licensing strategies. One license of Enterprise Server for each physical CPU isn't going to cut it when that means 15 cores and 30 threads.
CalaverasGrande - Friday, February 14, 2014 - link
Looking forward to upgrading my new Mac Pro from 6 to 15 core at some point down the road.
Kinda scared of the TDP, and no turbo?
psyq321 - Tuesday, February 18, 2014 - link
It has turbo, and it goes to insane speeds (for a high-end server CPU); this is what the 165W TDP is for :-) The article has a typo.
And, unfortunately, you won't be able to upgrade your Mac Pro with Ivy Bridge EX, as the socket is different (EX uses LGA 2011-1; the EP used in the Mac Pro is LGA 2011). If you need more CPU performance than a single Intel Xeon 2697 v2 has to offer, you have the following options:
- Replace your Mac Pro with a 2-socket or 4-socket PC workstation system, and put in 2 Xeon 2697 v2 CPUs or 4 Xeon 4657L v2 (the latter brings 48 cores / 96 threads to the game)
- Replace your Mac Pro with an up to 8 socket Ivy Bridge EX system, but be prepared to pay the price of a small flat for such a system... But if money is no object :)
Unfortunately, both options are quite a bit bigger than the new Mac Pro, but I'd be much more comfortable with a proper big server case for something that has several hundred Watts of TDP.
I'm using dual-socket ASUS Z9PE D8 WS with dual Xeon 2697 v2 for now, but for what I'm doing I'd be looking into 4P expansion...