24 Comments
eastcoast_pete - Wednesday, September 8, 2021 - link
You had me at "120 Threads per chip". So, how does that compare in compute capabilities with a 64 core/128 thread EPYC, or a similar thread/core count ARM-based server chip? The slides (from IBM) compare this newest Power chip to the predecessor (Power 9), but that doesn't put their newest one into perspective.
Wereweeb - Wednesday, September 8, 2021 - link
Anandtech has an article analyzing the Power8 architecture that goes deeper into it: https://www.anandtech.com/show/10435/assessing-ibm...
The article is "old" and some things might have changed a bit, but I assume the part about SMT still applies, and you can get the gist of it from this part:
"So we suspect that SMT-8 is only good for very low IPC, "throughput is everything" server applications. In most applications, SMT-8 might increase the latency of individual threads, while offering only a small increase in throughput performance. But the flexibility is enormous: the POWER8 can work with two heavy threads but can also transform itself into a lightweight thread machine gun."
brucethemoose - Wednesday, September 8, 2021 - link
That's how Marvell pitched their SMT8 ARM CPUs IIRC. They were supposedly fantastic in workloads where cores are frequently sitting around, twiddling their thumbs waiting for something from RAM, but could then switch gears if needed.
Marvell discontinued that line though...
I can't help but wonder if an asymmetric core architecture would be a better way to achieve that flexibility? It seems like modern big cores waste so much die area for relatively small gains.
RedGreenBlue - Wednesday, September 8, 2021 - link
Workload comparisons could be all over the map because POWER chips are pretty niche products. They’re used for pretty specific workloads. It might be difficult to find proper comparisons because of the RISC architecture they use. Would still be an interesting article though. Long gone are the days of Itanium chips, except... I’m sure there are still some in use. IBM kind of stands alone now in the segment.
thunng8 - Wednesday, September 8, 2021 - link
No they are not niche. It will perform well in pretty much any workload you throw at it. It is approx 2.5x faster per core in the industry standard SPECint 2017 compared to the high end Intel Xeon processor.
120 Power10 cores score 2170
https://spec.org/cpu2017/results/res2021q3/cpu2017...
Compared to 224 Xeon cores, which score 1620
https://www.spec.org/cpu2017/results/res2021q1/cpu...
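For reference, the arithmetic behind the "approx 2.5x per core" claim can be checked with a quick sketch. It uses only the two scores and core counts quoted in the comment above; the function and variable names are made up for illustration, and dividing a rate score by the core count is a rough normalization, not an official SPEC metric.

```python
# Figures as quoted in the comment above (SPECint 2017 rate results).
power10_score, power10_cores = 2170, 120
xeon_score, xeon_cores = 1620, 224


def per_core(score: float, cores: int) -> float:
    """Rough normalization: rate score divided by core count."""
    return score / cores


p10 = per_core(power10_score, power10_cores)
xeon = per_core(xeon_score, xeon_cores)
ratio = p10 / xeon

print(f"Power10: {p10:.2f} per core")   # 18.08
print(f"Xeon:    {xeon:.2f} per core")  # 7.23
print(f"Ratio:   {ratio:.2f}x")         # 2.50x
```

Note that this says nothing about per-thread or single-thread performance, since the two machines run very different numbers of threads per core.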
name99 - Wednesday, September 8, 2021 - link
These are THROUGHPUT machines not latency machines.
That's great if what you need is throughput, but the single-threaded performance is mostly not exciting.
Your SPEC results are RATE results not SPECSpeed results. They do not show anything about how this machine would perform for latency tasks (answer, not great -- but that's not its target).
https://www.spec.org/cpu2017/Docs/overview.html#Q1...
kgardas - Wednesday, September 8, 2021 - link
Speaking about Power cores does not make sense when P10 has 8 threads per core while Intel has just 2. Your referenced machines have 960 and 448 threads. IBM gets 1700 out of that and Intel/HP just 1570. So not that big a difference. What is important to consider here is that the Intel part is old, very old Cooper Lake, and this core has already been updated by Ice Lake (+19% IPC) and Sapphire Rapids (+15-20%). As Sapphire Rapids will be '22 material, I guess IBM has around half a year to a whole year to dominate the market. Then it will probably be weaker again... (well, speaking about CPU throughput -- the machine/CPU/RAM hierarchy etc. is an engineering marvel of course, no doubt about it...)
thunng8 - Wednesday, September 8, 2021 - link
Ice Lake does not improve the per core throughput.
https://spec.org/cpu2017/results/res2021q3/cpu2017...
I have my doubts about Sapphire Rapids too.
It is not just SPEC. New records for SAP SD (2.7x per core vs. Ice Lake), Red Hat OpenShift (4.1x throughput per core) and others have been announced.
Single thread performance isn’t too bad either. According to rPerf (Power9) from IBM, going from a single thread to 8 threads gets you 2.95x the throughput. If that factor is approx the same for Power10, single thread speed should be approx the same as Xeon.
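The single-thread estimate in the last paragraph can be spelled out as a back-of-the-envelope sketch. It takes the 2.5x per-core figure and the 2.95x SMT8 factor as quoted in this thread; the Xeon's gain from running 2 threads per core is not given anywhere here, so `xeon_smt2_gain` is a purely hypothetical parameter tried at two illustrative values. The function name is made up as well.

```python
def single_thread_ratio(per_core_ratio: float,
                        smt8_gain: float,
                        xeon_smt2_gain: float) -> float:
    """Estimated Power10 single-thread speed relative to a Xeon single thread.

    per_core_ratio: fully threaded Power10 core vs fully threaded Xeon core.
    smt8_gain:      throughput gain from 1 -> 8 threads on the Power core.
    xeon_smt2_gain: ASSUMED throughput gain from 1 -> 2 threads on the Xeon core.
    """
    p10_single = per_core_ratio / smt8_gain   # in units of a loaded Xeon core
    xeon_single = 1.0 / xeon_smt2_gain        # same units
    return p10_single / xeon_single


# 2.5 and 2.95 are the figures quoted in the thread; 1.0 and 1.25 are guesses.
for gain in (1.0, 1.25):
    est = single_thread_ratio(2.5, 2.95, gain)
    print(f"assumed Xeon SMT2 gain {gain:.2f}: Power10 ~ {est:.2f}x Xeon single thread")
```

Under these assumptions the estimate lands somewhere around 0.85x to 1.06x, which is consistent with "approx the same", but it is only as good as the borrowed Power9 factor and the guessed Xeon SMT gain.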
name99 - Wednesday, September 8, 2021 - link
You are telling a guy designing a truck that he has lousy acceleration.
It may be true, but mainly shows that you don't know what's important in trucks, that you think a truck should be a big sports car.
IBM systems (Power and even more so Z) are primarily about MEMORY, IO, and RAS. They start with astonishingly performant (and expensive!) memory and IO systems, and basically fill in enough compute to fully exploit that memory and IO for the target jobs.
Comparing their compute to Intel or AMD or Apple compute is to miss the point -- instead compare their respective memory and IO systems. And if you don't especially have a desire for such a memory/IO system, if you have "normal" code for which caches work well; yay, congratulations, you are not in the target market.
FreckledTrout - Monday, September 13, 2021 - link
That was very on point.
kgardas - Wednesday, September 8, 2021 - link
And btw, here is vSMP Epyc: 8 chips: 2250 -- result nearly 2 years old by now.
https://spec.org/cpu2017/results/res2020q1/cpu2017...
RedGreenBlue - Wednesday, September 8, 2021 - link
SPEC does low-level benchmarks. It is not the same as testing the performance of enterprise software. These are great or better chips, for their purpose, but they aren’t meant to compete directly with Epyc or Xeon. They are meant to avoid being in the same target market by specializing in certain environments where they shine and they are designed with those workloads in mind, not others.
RedGreenBlue - Thursday, September 9, 2021 - link
Also just saying, there’s a reason nobody is out there bragging about how well their POWER chip runs Crysis. The “any workload you throw at it” argument is only ever going to be true if you spend (waste) billions of dollars to build an entire software ecosystem around the hardware. Hence, why Steve Jobs, in talking about their hardware development quoted a famous engineer and said something like, “people really serious about software need to make their own hardware.”
FunBunny2 - Friday, September 10, 2021 - link
"Steve Jobs"
near as I can tell, all Apple has ever done to "make their own hardware" is take an existing ISA, and make the various bits and pieces wider and/or fatter. that doesn't require much imagination or skill.
nubie - Sunday, September 12, 2021 - link
Better let Cosworth and Prodrive know all they do is soup up Fords and Subarus /s. If it was so easy why doesn't everyone do it?
Oxford Guy - Sunday, September 12, 2021 - link
Apple pioneered quite a bit, such as the software floppy controller and the Lisa system. Sometimes the quest to develop innovative things in-house ended in failure, such as the Twiggy minifloppy.
It was a good idea overall but the mistake was not using a hard protective shell like Sony did. Had Apple done that its floppies would have had more than twice the capacity of the Sony microfloppy.
Jobs is also the originator of the NeXT Cube, which was the first system of note to ditch floppies (long before the iMac), replacing them with magneto-optical drives.
Jobs was hardly a salesman of generic/vanilla products. If you want that you can look at Apple during the Performa era when he wasn’t around.
FreckledTrout - Monday, September 13, 2021 - link
While not as much skill as a new ISA and a new design this still is an excellent approach. As time goes by Apple having the OS and hardware will show some serious merits. Some of it is soft things like the hardware and software folks physically speaking to each other.
abufrejoval - Wednesday, September 8, 2021 - link
It seems quite safe to guess that it will suck much more power than ARM or EPYC....
Power savings or exploiting any bit of idling to lower power consumption is very low priority on these, while the ARM chips are really concentrating on that and x86 is a compromise.
Just like the z/Arch chips you better make sure that these chips stay loaded to have them pay for the architecture luxury tax and the juice they take.
milli - Wednesday, September 8, 2021 - link
This is not really meant for compute but rather throughput. That's why IBM is targeting the cloud market. Even if you would compare the two, the IBM chip would only require 15 CPU licenses per chip while the EPYC would require 64..... aaaaaaand all your money is gone to licensing.
Dolda2000 - Thursday, September 9, 2021 - link
Is there any chance that Anandtech might get one in for testing?
Zibi - Thursday, September 9, 2021 - link
I'd say not much, unless someone sponsors licensing and set up costs.
These things are SAP HANA / Oracle monsters.
I'd assume that these are the workloads that should be tested upon.
FunBunny2 - Friday, September 10, 2021 - link
"Is there any chance that Anandtech might get one in for testing?"
they won't even buy a $100 motherboard, and you wonder why they don't get a data center quality machine that goes for, at least, $1,000,000?
FreckledTrout - Monday, September 13, 2021 - link
No but I bet if they really wanted they could borrow some time as a sponsored event. I suspect one of the smaller cloud providers would be more than happy to do it.
MedicineMarijuana - Sunday, April 10, 2022 - link
Yeah!! Thank you so much for sharing this content. I've been looking for weeks