26 Comments
MrSpadge - Tuesday, August 7, 2018 - link
In comparison I really like AMD's approach with Pinnacle Ridge: "keep the power consumption in check and let the CPU work as fast as it can". This nicely avoids those seemingly random steps where the CPU drops by a few hundred MHz due to loading one more core or due to a single AVX-512 instruction.

HStewart - Tuesday, August 7, 2018 - link
You do realize that AVX-512 can be turned off - but if you need it, it is probably going to be significantly faster than AVX2 (256-bit), probably by a factor of more than two.

TheFire - Tuesday, August 7, 2018 - link
This is correct, but the concern is more about people misusing the AVX instructions. If I throw a couple of AVX intrinsics into my code without really getting a big performance uplift, then the frequency will drop and all of a user's applications running on it will slow down. I'm not really sure how realistic this fear is, though; I would hope that the set of people who know how to use AVX intrinsics is almost entirely contained within the set of people who know how to properly optimize code for performance.

mczak - Tuesday, August 7, 2018 - link
It shouldn't be too bad anymore with Skylake-SP, since the cores have individual turbo limits. So a couple of random AVX-512 instructions thrown in somewhere should only limit one core to the AVX-512 limit, whereas all others can still reach the higher AVX2 or base turbo limit. Unless of course the complete workload is doing a couple of AVX-512 instructions every once in a while on all cores...

HStewart - Tuesday, August 7, 2018 - link
If you worry about the slowdown and you don't have an AVX-512 application, then turn it off. AVX-512 is primarily for workstations and servers currently - but we could see it become widespread in the future, especially in consumer CPUs. I am very interested to see what applications use it and how much improvement there is - I believe it will be a significant amount of performance, maybe even more than twice.

People including myself had similar worries when CPUs went from 32-bit to 64-bit - and now AVX is transforming from 256-bit (or dual 128-bit in AMD's case) to 512-bit.
mode_13h - Tuesday, August 7, 2018 - link
Dude, stop telling people to turn it off. You should really have some empirical evidence that turning it off *ever* helps, in any case, before advising this.

mode_13h - Tuesday, August 7, 2018 - link
These instructions have already found their way into common string and math library functions. They're also utilized by image and video codecs, such as libjpeg-turbo (which Firefox uses), not to mention popular image processing programs.

This is good news - you can benefit from them without even having to use specialized scientific or rendering programs.
mode_13h - Tuesday, August 7, 2018 - link
I'm pretty sure a single AVX2 or AVX-512 instruction won't suddenly throw the CPU into the corresponding frequency modes. They're intended for AVX2- or AVX-512-heavy code.

HStewart - Tuesday, August 7, 2018 - link
I would love to see a test with the 18-core part using a test application that can run in both AVX-512 and AVX2 modes, and compare the performance differences:

1. AVX-512
2. AVX2 with AVX-512 disabled
3. AVX2 with AVX-512 enabled
It would also be interesting to find out, with a lower-core version, how many cores running AVX2 it takes to reach the same performance.

Yes, it is fully understandable that a CPU with AVX-512 on will run at a lower frequency, but these are workstation CPUs and there could be applications that take advantage of AVX-512 - I'm curious about the impact.
HStewart - Tuesday, August 7, 2018 - link
Another good test would be to include AMD's AVX2 in the comparison - last I heard, their AVX2 implementation is 2x128-bit instead of 256-bit.

HStewart - Tuesday, August 7, 2018 - link
"For a number of users, the key metrics here are the all-core turbos, with the 18-core part having an all-core turbo of 3.2 GHz. Interestingly the W-2155 and W-2145 sits well here: for any code that can't reliably go beyond 12-14 threads"

This is an interesting statement - if you multiply cores by all-core frequency, then even though the 18-core part is slower per core than the 10-core, if you need more threads the 18-core will beat the 10-core part despite its higher frequency.
One interesting thing about Intel cores: with a lower number of active cores, the active ones will run faster while the others are idle.

But on the general idea of the recent (last year or two) increase in core counts and actual application usage - what kind of applications use a large number of cores, and how often does a high number of cores actually get used on a system?
jospoortvliet - Tuesday, August 7, 2018 - link
>This is an interesting statement - if you multiply cores by all-core frequency, then even though the 18-core part is slower per core than the 10-core, if you need more threads the 18-core will beat the 10-core part despite its higher frequency.
What I find weird is that the higher core counts tend to have lower turbos than the lower core counts at the same number of active cores. So the 18-core is much slower than the 6-core if only 5 or 6 cores are active - 4 GHz vs 4.4! And if you load 10 cores, better buy the 10-core, because the 18-core would be running at only 2.8 GHz instead of the 3.3 of the 10-core!

That is very sad and frustrating - so the 18-core is only really faster if you actually use all of its cores; at lower activity it is slower than the smaller core counts. Why on earth is that? The other cores are off - why slow down the ones that are on?
mode_13h - Tuesday, August 7, 2018 - link
Probably because it's designed for users who care about energy efficiency and run apps that tend to have lots of concurrency.

Also, don't forget that cache coherency isn't free. Just because you're not running anything on those other cores doesn't mean it's as if they don't exist.
Finally, more cores means more mesh nodes between you and the memory controller. So, that's going to burn some power that could otherwise be used by a smaller chip to run at a higher speed.
GreenReaper - Wednesday, August 8, 2018 - link
As the core count and last-level cache increase, the base speed drops drastically. A big part of that is likely about staying within the TDP budget. 140W divided by 18 cores is less than 8W per core. While they can run for a time at 2.9 GHz, eventually they have to throttle to 2.3 GHz.

Indeed, six cores might be more than 120W can handle if you look at the W-2133, if they're all running at 3.9/3.8 GHz. The W-2145 can only guarantee to run eight at 3.7 GHz.
"Off" isn't always _entirely_ off. There is still likely to be leakage, there may still be shared cache attached to the cores that needs to be accessible (high core count also tends to mean more cache), etc. The rings connecting the cores - or for newer HCC processors, the mesh - still needs to run. If nothing else it probably *was* on and will have heated the surrounding area up.
[That additional cache is one of the things which can make the CPU faster in practice despite the fact that it might have to run at a lower clock speed. MHz isn't everything.]
In an optimal world the CPU might know that *some* cores can go faster (e.g. core 5 can hit 4 GHz, but not cores 0-4 or 6-7) and be able to communicate this to the system, which could schedule work appropriately, but I don't think the current representation handles this.
In fact, nowadays CPU speeds can vary so fast that the OS just sends hints about how power-efficient you want the CPU to be on each core, and it sets its aggressiveness about ramping up and the maximum boost accordingly.
diehardmacfan - Tuesday, August 7, 2018 - link
Small correction: Xeon-W is Intel's replacement for their E5 line, Xeon-E is their replacement for the E3 lineup.

HStewart - Tuesday, August 7, 2018 - link
"Small correction: Xeon-W is Intel's replacement for their E5 line, Xeon-E is their replacement for the E3 lineup."

I don't believe that's true - I believe Xeon-W parts are single-socket only, while E5 supports more than one socket - like the following motherboard that supports 4 CPUs:
https://www.supermicro.com/products/motherboard/Xe...
The Scalable line is probably more in line as the E5 replacement.
Xeon-W is single-CPU:
https://www.supermicro.com/products/motherboard/Xe...
and from the source
https://ark.intel.com/products/126707/Intel-Xeon-W...
diehardmacfan - Tuesday, August 7, 2018 - link
It just depends on how you look at it, I guess. If you want to say the number of supported CPUs determines their lines, then yes, you would need to go to Xeon-SP for more than one.

Their tiers went from Xeon E3, E5, E7 to E, W, SP. Xeon-W being on the same socket as their HEDT lineup, like E5 was, would also lend more credence to it.
HStewart - Tuesday, August 7, 2018 - link
Xeons do have increased reliability over the HEDT line - at least in the case of my dual Xeon, designed to run 24/7 and for a longer lifespan. It's been 10 years, and only recently did I stop using it. Dual 5160s, primarily because of an incompatibility with the onboard audio on the Supermicro motherboard.

Ian Cutress - Tuesday, August 7, 2018 - link
People put E5-2687W chips into single-socket systems because they were workstation-focused. Also the E5-2640 has been popular for cheap 8-core Intel systems.

mode_13h - Tuesday, August 7, 2018 - link
@diehardmacfan is right about this, with one footnote.

The W-series replaces the E5-1xxx chips. It's the E5-2xxx and above that support dual-CPU.
kgardas - Tuesday, August 7, 2018 - link
E3-1600/E3-2600 should be E5-1600/E5-2600.

HStewart - Tuesday, August 7, 2018 - link
"E3-1600/E3-2600 should be E5-1600/E5-2600."

I could be wrong, but I believe the Xeon W parts are a single-socket-only configuration, and the E5-26xx series is dual-socket.
Ian Cutress - Tuesday, August 7, 2018 - link
Lots of people put the E5-2600 series into single sockets. You don't always have to use them in dual-socket configs. It was a situation of what was available in the market. Also, the E5-2687W models were found in single sockets.

HStewart - Wednesday, August 8, 2018 - link
Is this a special motherboard, or a special single-socket version? I believe the socket layout is different between the two.

Ian Cutress - Tuesday, August 7, 2018 - link
Yup, changed.

Alsw - Wednesday, August 8, 2018 - link
Perfect, thank you for this. It is now much more interesting for those running CAE software on workstations - such as stress simulation (FEA) and fluid flow (CFD) packages - to look at high-core-count single-CPU systems vs dual-CPU solutions, where you typically see some performance hit for tasks due to them being split across separate CPUs; obviously the cost factor is also significant. Interestingly, both Dell and HP have started to offer the Skylake-X series i9 CPUs in their single-socket workstations as well. Part of this may be that Microsoft now charges an additional fee for any system running a Xeon (Windows Pro for Workstations), and even more for those with more than 4 cores.

However, the additional cost of ECC memory is more of a factor, and this is one of the key advantages of Xeon, particularly for workstation applications with long run times that value the stability - so i9 would only really be for those whose budget can't stretch to a Xeon.