Amandtec - Friday, February 24, 2017 - link
Only an amateur here, but I did read some of your previous articles on this. I thought the big advantage of POWER was that it encouraged third parties to bring ASICs into their server space, while Intel wants to own the whole hardware setup? Surely ASICs are where the whole performance-per-watt game ends - see Bitcoin mining.
ddriver - Friday, February 24, 2017 - link
There is nothing preventing you from building your own accelerators and hooking them up to a free PCIe slot.
JohanAnandtech - Friday, February 24, 2017 - link
True. But you have only one PCIe x8 slot, and AFAIK it is not an OpenCAPI one, nor is it NVME capable.
JohanAnandtech - Friday, February 24, 2017 - link
I meant NVLink, not NVME :-)
SarahKerrigan - Friday, February 24, 2017 - link
The GT75 is a weird system and I've never been clear on why it exists. The Supermicro 1U (S821LC) at least gets you two CPUs for cheap in 1U... I'm not overly surprised by the poor performance here. This is a 130W CPU, not a 170W one as AnandTech says, running at low clocks; 170W is only the turbo power. I'm also guessing it's loud as hell with those fans.

That being said - AnandTech also has a history of interestingly Intel-centric interpretations of results. Remember when Intel's uarch was "a lot more sophisticated" than P8 according to AnandTech based purely on single-threaded 7zip results, and then when ThunderX did well on 7zip, AnandTech said 7z was a meaningless benchmark irrelevant to server workloads? I've also noticed that since gcc's POWER output improved (~4000 MIPS single-thread compression with gcc 6.2 on a 3.3GHz P8), 7zip has conveniently vanished from AnandTech's POWER reviews. The idea of drawing conclusions about microarchitectures from a balanced range of tests seems alien to them.

Looked at through that lens, POWER8 (well, in systems not named GT75 :P) looks decent at some things, less decent at others, but overall pretty good. Database perf/W, per AnandTech's testing, is better than Haswell's. This isn't a bad place for P8 to be, considering it's been shipping since early 2014 and is on the verge of replacement.
Tl;dr - GT75 is a turd but Anandtech sees what they want to see. I have to wonder if Ryzen is going to be reviewed the same way.
JohanAnandtech - Friday, February 24, 2017 - link
7zip is indeed a bad indicator of server performance, but at the time we only had access to a virtual machine on top of a POWER8 with 2 GB of RAM, and we thought it might give us a first glimpse of what the P8 was capable of.

As time progressed, we understood that it is mostly a TLB/latency-sensitive benchmark. So it has no place in a server-oriented article; it is mostly interesting for discussing microarchitecture details.
Ian Cutress - Friday, February 24, 2017 - link
AnandTech has different editors focusing on different areas - Johan on servers, I'm on CPUs, Matt on mobile, etc. Feel free to reach out via email if you have suggestions for us.
Kevin G - Friday, February 24, 2017 - link
This isn't the best showing of POWER8. Those chips need robust cooling to keep their clock speeds up to be competitive, and the 1U form factor places a lot of constraints on the design.

As for the Tyan system itself, I'm kind of surprised that they went with the Centaur buffer that uses DDR3, as IBM has reportedly been shipping buffers that support DDR4 for nearly half a year now. That'd lower power consumption greatly, though likely not enough to be competitive with Intel on a system-level performance/watt metric. Moving to DDR4 would also give the system a massive increase in memory capacity, as 128 GB LR DDR4 DIMMs are shipping with 256 GB LR DIMMs on the horizon. Using 256 GB DIMMs, a system like this would support 8 TB, which is a lot for a 1U server.

Considering the internals of the system I'm not surprised, but a secondary PSU option would have been nice, even if it was external. PSU redundancy is remarkably common, even for 1U systems.
I'd also be worried about IO performance on this system with all the networking and SATA ports hanging off of a single PCIe uplink to the CPU.
I do agree with the conclusion that POWER9 looks to be very promising. The nice thing is that IBM is going to be offering both the SMT4 and SMT8 cores in both types of sockets (essentially four different dies!).
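For what it's worth, the capacity math above is easy to check. A minimal sketch, assuming the 32-DIMM slot count implied by the 8 TB claim (8 TB / 256 GB = 32) rather than a published Tyan spec:

```python
# Memory capacity arithmetic from the comment above.
# The 32-slot count is inferred from 8 TB / 256 GB, not taken from a spec sheet.
DIMM_SLOTS = 32

for dimm_gb in (32, 128, 256):
    total_tb = DIMM_SLOTS * dimm_gb / 1024
    print(f"{dimm_gb:>3} GB DIMMs -> {total_tb:.0f} TB per system")
# 32 GB -> 1 TB (the tested DDR3 config), 256 GB -> 8 TB
```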
ddriver - Saturday, February 25, 2017 - link
Form factor won't change the facts - it is a POWER, it is too expensive, and it is too slow. POWER sounds great on paper, but it doesn't seem to deliver in practice.
Kevin G - Tuesday, February 28, 2017 - link
Form factor does matter when it changes the cooling system, which in turn can invoke thermal throttling.

Even on the x86 side of things, half-width 2U servers are popular as they can use larger fans despite having the same effective density as a full-width 1U server. The increased airflow into the chassis is great for keeping things cool and helps maintain high turbo levels for performance.
Zzzoom - Friday, February 24, 2017 - link
"As important as performance per watt is, several markets – HPC, Analytics, and AI chief among them – consider performance the most important metric. Wattage has to be kept under control, but that is it."What a load of garbage.
JohanAnandtech - Saturday, February 25, 2017 - link
And now maybe some arguments that substantiate your opinion?
SarahKerrigan - Sunday, February 26, 2017 - link
In HPC specifically, power consumption is a major issue. This was the entire root of the success of the Blue Gene line back in the day, and it is why NEC is shifting its supercomputing CPUs to progressively more efficient cores instead of higher-performance cores now (SX-9: 102.4 GF/core; SX-ACE: 64 GF/core). HPC is sensitive to running cost, and power dissipation is a critical factor in that.
Zzzoom - Monday, February 27, 2017 - link
Go read the 7+ years worth of materials from the EE HPC Working Group.
JohanAnandtech - Wednesday, March 1, 2017 - link
In a system with 2-4 GPUs and 512 GB of RAM, the TDP of the CPU is not a dealbreaker. I can agree that some HPC markets are more sensitive to perf/watt, but I have seen a lot of examples where raw performance per dollar was just as important.
Zzzoom - Wednesday, March 1, 2017 - link
POWER8 TDP is 45W-102W higher per socket than the highest-spec Xeon E5. That's 90W-204W higher per node where each node consumes 1500W-2000W, or 6-10% total on a site with a multi-million dollar power bill that went to great lengths to bring down the PUE by a similar amount. So for anyone to pick POWER8, it has to do better on energy to solution through its unique features, or be considerably cheaper (ha!). POWER8's advantage is NVLink, but TSUBAME3 going with Intel+PLX switches on top of NVLink shows that it's not that big of a deal.

Anyway, the efficiency requirements on the CORAL procurements are pretty strict, so scale-out POWER9+Volta will have to shed a lot of weight.
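As a quick sanity check on those percentages - a minimal sketch using only the per-socket deltas and node powers stated above:

```python
# Reproducing the power-delta percentages from the comment above.
sockets_per_node = 2
tdp_delta_w = (45, 102)      # per-socket TDP gap, POWER8 vs top Xeon E5 (as stated)
node_power_w = (1500, 2000)  # total draw of a GPU-dense node (as stated)

low = sockets_per_node * tdp_delta_w[0] / node_power_w[0]
high = sockets_per_node * tdp_delta_w[1] / node_power_w[1]
print(f"extra CPU power is {low:.0%}-{high:.0%} of node power")  # 6%-10%
```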
Zzzoom - Wednesday, March 1, 2017 - link
I forgot about the memory buffers. It's even worse.
mystic-pokemon - Sunday, March 5, 2017 - link
Guys, I know a shit ton about a server Johan listed above. He has a point when he says power consumption only matters so much. In short, when you combine all aspects into a TCO model, the POWER8 server delivers the best TCO value.

We consider all of the following in our TCO model:
a) Cost of ownership of the server
b) Warranty (less than a conventional server, with a different operations model)
c) What it delivers: how many independent threads (SMT8 on POWER8, remember? 192 hardware threads), how much memory bandwidth (230 GB/s), and how much total memory capacity in one server (1 TB with 32 GB DIMMs)
d) For a public cloud use case: how many VMs (with x hardware threads and x memory capacity/bandwidth) can you deliver on one POWER8 server compared to the other servers in the fleet today? Based on the above stats, a lot.
e) Data center floor lease cost (24 of these servers in one rack, so much denser; average the lease over the life of the server: 3 years). This includes all DC services like aggregation, connectivity and such.
f) Cost per kWh in the specific DC (one rack has a nominal power of 750W)

All this combined, POWER has good TCO. It's a massively parallel server, which is where its major advantage comes from. Choose your workload wisely. That's why companies continue to work on it.

I am talking about all this without even factoring in CAPI over PCIe and OpenCAPI. Get it? POWER is going nowhere.
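To make the shape of that argument concrete, here is a minimal back-of-the-envelope sketch of such a TCO-per-VM comparison. Every input below is a hypothetical placeholder (real values are site- and deal-specific), so the numbers illustrate the structure of the model, not actual pricing:

```python
# Hypothetical sketch of the TCO model outlined above.
# All numbers are made-up placeholders; real inputs are site-specific.
def tco_per_vm(server_cost, warranty_cost, vms_per_server,
               rack_lease_per_year, servers_per_rack,
               watts_per_server, cost_per_kwh, years=3):
    """3-year all-in cost of one server, divided by the VMs it can host."""
    lease = rack_lease_per_year * years / servers_per_rack  # floor/lease share
    kwh = watts_per_server / 1000 * 24 * 365 * years        # lifetime energy
    return (server_cost + warranty_cost + lease
            + kwh * cost_per_kwh) / vms_per_server

# A denser, higher-thread-count box can win on TCO despite drawing more power:
p8 = tco_per_vm(12_000, 1_000, 48, 30_000, 24, 750, 0.10)
x86 = tco_per_vm(8_000, 1_500, 24, 30_000, 42, 500, 0.10)
print(f"POWER8: ${p8:,.0f}/VM   x86: ${x86:,.0f}/VM")  # ~$390/VM vs ~$540/VM
```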
Michael Bay - Friday, February 24, 2017 - link
I think at this point in time Intel has more to fear from goddamn ARM than from IBM in the server space.

Okay, maybe AMD as well.
JohanAnandtech - Friday, February 24, 2017 - link
Personally I think OpenPOWER is a viable competitor, but in the right niches (in-memory databases, GPU-accelerated + NVLink HPC). Just don't put that MHz beast in a far too small 1U cage. :-)
Einy0 - Friday, February 24, 2017 - link
Articles like these make me wonder if some of the companies using IBM eServer iSeries (AS/400) as mid-level servers are wasting their money. I was always under the impression that POWER was supposed to be tuned for database-heavy workloads and hence have a massive advantage there. I know the iSeries servers run an OS with DB2 built in and tuned specifically for it, but how much of an advantage does that really equate to?
FunBunny2 - Friday, February 24, 2017 - link
-- I know the iSeries servers run an OS with DB2 built-in and tuned specifically for it but how much of an advantage does that really equate to?

unless IBM has done a complete port recently, AS/400 "integrated database" was built before server versions of DB2 existed. it's/was just a retronym.
kfishy - Friday, February 24, 2017 - link
As ISAs become more and more relevant in the post-Moore's-law world, where you can't solve a computational problem just by throwing ever more transistors at it, I wonder if this opens up an opportunity for POWER to carve out niches left open by Intel's more fixed, general-purpose approach.

At the same time, POWER will have to contend with a nascent but rising and truly open ISA in RISC-V, where companies can simply implement the subsets of the ISA that they need. The next decade in processor architecture is going to be interesting to watch.
FunBunny2 - Friday, February 24, 2017 - link
-- As ISAs become more and more relevant in the post-Moore's-law world, where you can't solve a computational problem just by throwing ever more transistors at it

given that ISA has been reduced to z, ARM, and x86 (not counting POWER, of course), and ARM might not really qualify as equivalent. those ancient enough, or well-read enough, know that up to and during the "IBM and the Seven Dwarfs" era, ISA and even base architecture made for a varied ecosystem. not so much anymore. and I doubt anyone will invent a more efficient adder or multiplier or any other subunit of the real CPU. just look at the die shots of chips over the last couple of decades: the real CPU area of a chip has nearly disappeared. in fact, much (if not most) of the transistor budget for some years has been used for caching, not ISA in hardware. the so-called micro-architecture is just a RISC CPU, and the rest of the chip is those caches and an ever more complicated "decoder". that, and integrating what had previously been other-chip functions. IOW, approaching monopoly control of compute.
I expect the next decade to be more of the same: more cache and more off-chip function brought on chip. actual CPU ISA, not so much.
aryonoco - Saturday, February 25, 2017 - link
Thank you Johan. Great article.

Not all AnandTech articles live up to the standards set in days past, but your articles continue your own excellent standards.
Very much looking forward to the POWER9 chips. Hopefully they have also done the work of porting the toolchain and important software ahead of release this time, so we won't have to wait another 12 months to be able to compile normal Linux programs on it.
Also, 12 fans running at 15,000 rpm in a 1U? What did that sound like?! Wow!
JohanAnandtech - Sunday, February 26, 2017 - link
Thx Aryonoco. Not all of those 12 fans were running at top speed, but imagine the sound of a jumbo jet taking off. It clearly shows how hard it is to cool IBM's best in a 1U: you have to limit the clockspeed to about 2/3 of what it is capable of and double the number of fans.
yuhong - Wednesday, March 1, 2017 - link
"Unfortunately, the latest 8 Gbit based DIMMs are not supported."Micron don't make these chips anymore:
http://media.digikey.com/pdf/PCNs/Micron/PCN_32042...
Interestingly, Crucial is selling 32GB DDR3 quad rank RDIMMs again (but not LR-DIMMs):
http://www.crucial.com/usa/en/ct32g3erslq41339
mystic-pokemon - Sunday, March 5, 2017 - link
For folks who are saying that POWER only looks good on paper: NOT true.

I know a shit ton about one of the servers Johan listed above, and he has a point when he says power consumption only matters so much. The full TCO breakdown is in my comment above; in short, when you combine all aspects into a TCO model, the POWER8 server delivers the best TCO value.
I am talking about all this without even factoring in CAPI over PCIe and OpenCAPI, and with POWER9 all of this is getting even better. Get it? POWER is going nowhere.