"The use of SOI enables a tangible increase of CPU clock rates without a massive increase of power consumption, but a SOI wafer substrate costs more than a bulk substrate"
As far as I understood so far, SOI mainly helps with substrate leakage. If you can get that under control in any other way, you're fine (Intels way). But if SOI allows tangible frequency increases, I wonder why it's not used more widely. The substrate cost is <300$ more than a bare Si wafer, but an entire processed wafer is approaching 10k$ in the leading edge nodes (or will get there at 7 nm). Perentage wise the cost of SOI substrates would dwarf the increased revenue from e.g. Ryzen or Vega being 10 - 20% faster due to higher clock speeds. If it was that simple the other foundries would follow that path, too.
BTW: should "Gate Height" in the table be "standard cell height"?
IBM is surely a vertically integrated company. One one end they do advanced fundamental physical research and on the other end they make database servers and JVMs.
They put many low-level features in their hardware, that their software and upper layers utilize. Apple seems to be going the same way.
>As far as I understood so far, SOI mainly helps with substrate leakage. If you can get that under control in any other way, you're fine (Intels way).
I was curious about this as well. You would think SOI would be less useful in a FinFET system since the channel is barely in contact with the substrate due to the Fin, but apparently there is still some scope for further leakage reduction:
If some prior IBM CPUs are an indication, stupidly high TDPs. Mainframes have been water cooled long before any of the cool kids started kludging things up with heater cores from a junk yard to overclock early pentiums even higher.
-- Mainframes have been water cooled long before any of the cool kids started kludging things up with heater cores from a junk yard to overclock early pentiums even higher.
Um. IBM mainframes were liquid cooled at least until 1990. they've been air cooled since.
No turbo, that speed is 24x7, should see the cooling on these. We have some in our data center they almost look like they are batman themed server racks.
If the pipeline is long enough it can sustain (theoretically*) higher clocks more easily. Intel's Netburst started at 20 pipeline stages (Willamette & Northwood), then reached 31 stages (Prescott, Cedar Mill etc), and there were even plans for successors (Tejas and Jayhawk) with freaking 40 to 50 pipeline stages. These would supposedly reach 7+ Ghz clocks, before they were cancelled and replaced by the Core microarchitecture. *Only theoretically apparently. The first samples of Tejas were clocked at merely 2.8 Ghz and had a 150W TDP, much higher than both the previous Prescott and the first Core CPUs that followed.
You have to realize these z14's are costing half million and up just for the initial cost. They sell some lower end versions but nobody buys those they just get cheaper servers. In reality most people are likely paying 5 million a year to run these once you account for hardware, software, electricity and your maintenance contracts.
-- I’m actually surprised they have enough customers to make this happen.
it's a symbiotic death spiral. the Fortune 500 adopted mainframes and COBOL ~1960 and kept going for decades. re-writing those billions and billions of lines of code in java or PHP isn't feasible, so IBM's cost to continue to produce mainframes may rise (not necessarily, btw given how much of the guts of z machine have shrunk compared to a 370) which gets passed on to your bank, insurance company, etc.
People also run Linux partitions like hundreds of Linux VM's on these. That helps offset the costs of buying a mainframe. I find it a novel idea as you migrate apps from old COBOL to java etc you can use extra compute power to run more VM's.
I second that comment. There's never been an explanation about what "T" has to do with design libraries (not to mention that different "design libraries" could behave quite differently).
A lower number means denser designs are possible, so maybe it's the average number of transistors in ... some standard cell mix? I definitely second the explanation request.
This is all over my head but I did find this while googling:
"The cell height can also be measured by the number of metal tracks that are needed for routing for the cell; in recent nodes we have gone from 12-track (12T), to 9T, to ~7.5T in the latest 14- and 16-nm processes. "
>This is the second time I've seen a comment about #T libraries in a recent process update article. What exactly does it mean?
T = "Track". Rather than just looking at how big each individual fin is in a FinFET transistor, you also have to consider how large each logic cell is (how many fins and metal contacts it will require). Even if the fins stay the same size, if you can make each one more capable, you can make the logic cell smaller by using fewer fins and/or metal wires.
Here's a very simple explainer. Each fin of a finFET can carry a limited amount of current (which is usually not as much as is needed). This means multiple fins are required to create a single transistor. In the past this was as much as 4 fins per transistor (more for some specialty high-current transistors), and a focus of recent design has been to make the fins taller, which allows them to carry less current, thus reducing the number required (most of the time) to three fins, then two, and ultimately one. Each stage of being able to reduce the number of fins means that you can rearrange (and shrink) the various items required to create standard cells (like a single bit of an SRAM).
There's always been some variation in how SRAMs are designed and laid out (in the past people talked about 6T vs 8T designs, T in that case meaning the number of transistors), but this issue of "track size of an SRAM bit" has become a bigger issue since FinFETs because each time you can make the fins taller and reduce the number per transistor, you get an additional density boost.
I assume that IBM's using these high-track-number cells means, essentially, they want lots of fins for each transistor creating higher currents, which in turn allows for faster switching, at the cost of more power. (There is a second set of issues related to how the metal layers per cell are laid out connecting the various pieces together, but the above gives the idea and the main concerns.)
Not sure this is really correct. The std cell track height has nothing to do with SRAM, which has a completely different design/layout. (Although yes, the T in 6T SRAM does stand for transistor.)
The track height is simply a dimensionless measure of the height of the standard cells used in a given library. (Note that there are almost always multiple libraries available for a given process, although with smaller nodes the development of the process and the std cell libraries has become more tightly coupled.) If a given library has cells that are 120 nm tall, and the M1 pitch is 10 nm then you have a 120 nm / 10 nm = 12 Track cell. This measure is used instead of the raw height because it scales with the process node shrinks, so we can talk about a 12 T cell in 10 nm and also 90 nm and still (kind of) mean the same thing. (The absolute performance difference would of course be huge.) In both of these you could route 12 separate metal lines horizontally through the cell. In reality, you get fewer than 12 because power and ground are routed in a solid stripe along the top and bottom of the rows.
Typically from what I've seen, 7 T or less is for the densest (aka cheapest) designs, 7-10 is middle of the road, and 10+ for high performance. I'm not an expert in cell design, so I can't really say why larger cells give you more performance. It definitely gives the cell designer more area to work with. You additional vertical area, and can still use as much width as needed. Because unlike height, width is variable from cell to cell.
Does the SOI cost that much more that AMD choose to use the 14LPP process over the 14HP process to make Zen cores? Just curious as it seems very likely they could have increased frequency due to how efficient the CPU's are.
It's not that much more expensive (see 1st post). But I'm sure the process wasn't ready for Zen & Vega they only start to talk about it now. The regular 14 nm process was licensed from Samsung and freshly implemented when AMD already needed it (at decent yields, of course).
Well that must hold true because AMD have mostly always used SOI, right up to Piledriver.
Not sure entirely how different it is though from the 14LPP process, but it seems like AMD are moving forward with 12nm and then 7nm FinFET. Either it requires too much time or cost to rework Zen, process itself costs too much, a combination of those three, or IBM won't let them use it.
-- IBM Z mainframes are based on specially designed IBM z-series CPUs, which are unique both in terms of microarchitecture, feature set and even physical layout.
there are billions and billions of lines of COBOL which depend on hardware assists implemented decades ago. and there are, likely, millions of lines of assembler running all manner of critical systems.
From the article: "Each IBM z14 SC CPU consists of 6.1 billion transistors, runs at 5.2 GHz and contains 10 cores with dedicated 6 MB L2 per core (2MB L2 for instructions, 4MB L2 for data) and 128 MB shared L3. Meanwhile, the system control (SC) chip consists of 9.7 ..."
Shouldn't that first "Z14 SC CPU" be "Z14 CP CPU"?
Was really confused reading that wondering why the system control chip had 10 cores with no mention of the actual CP setup. Then the next sentence is about the SC again, with different specs.
IBM's decision to go with FinFET on SOI dates back well over five years, and the physics have not changed in that time. If you're willing to take a bit of a deep dive, Terry Hook, who's IBM's SOI-FinFET holes & electrons guru, did a few pieces for Advanced Substrate News back in 2012/2013. I'll paste in links below.
http://bit.ly/2xPCNIW - FinFET on SOI: Potential Becomes Reality http://bit.ly/2yBtuKj - IBM: FinFET Isolation Considerations and Ramifications – Bulk vs. SOI http://bit.ly/2kh4dTA - IBM: Why Fin-on-Oxide (FOx/SOI) Is Well-Positioned to Deliver Optimal FinFET Value
Also, the folks at the SOI Consortium laid it out in very simple terms (re: advantages of doing planar FD-SOI or FinFET-on-SOI) back in a 2012 piece -- you can find that here: http://bit.ly/2fCLTix.
And finally, a note for those worried about SOI wafer costs, the wafers are slightly more expensive, but that cost is immediately amortized because the manufacturing process in the fab uses fewer mask steps when the starting wafer is an SOI wafer. That holds for both FinFETs on SOI and FD-SOI, btw.
We’ve updated our terms. By continuing to use the site and/or by logging into your account, you agree to the Site’s updated Terms of Use and Privacy Policy.
31 Comments
MrSpadge - Friday, September 22, 2017 - link
"The use of SOI enables a tangible increase of CPU clock rates without a massive increase of power consumption, but a SOI wafer substrate costs more than a bulk substrate"As far as I understood so far, SOI mainly helps with substrate leakage. If you can get that under control in any other way, you're fine (Intels way). But if SOI allows tangible frequency increases, I wonder why it's not used more widely. The substrate cost is <300$ more than a bare Si wafer, but an entire processed wafer is approaching 10k$ in the leading edge nodes (or will get there at 7 nm). Perentage wise the cost of SOI substrates would dwarf the increased revenue from e.g. Ryzen or Vega being 10 - 20% faster due to higher clock speeds. If it was that simple the other foundries would follow that path, too.
BTW: should "Gate Height" in the table be "standard cell height"?
LuckyWhale - Friday, September 22, 2017 - link
IBM is surely a vertically integrated company. On one end they do advanced fundamental physics research, and on the other end they make database servers and JVMs.

They put many low-level features in their hardware that their software and upper layers utilize. Apple seems to be going the same way.
saratoga4 - Friday, September 22, 2017 - link
>As far as I understand it, SOI mainly helps with substrate leakage. If you can get that under control in any other way, you're fine (Intel's way).

I was curious about this as well. You would think SOI would be less useful in a FinFET process, since the channel is barely in contact with the substrate due to the fin, but apparently there is still some scope for further leakage reduction:
http://semiengineering.com/finfet-isolation-bulk-v...
SarahKerrigan - Friday, September 22, 2017 - link
This is also the process that's going to be used for Power9, shipping later this year (and somewhat less exotic than z14).

bill.rookard - Friday, September 22, 2017 - link
Um... 5.2 GHz speeds designed for 24/7/365 operations? Intel and AMD could take some notes.

bubblyboo - Friday, September 22, 2017 - link
5.2 GHz and 10 cores. I wonder if that's some sort of turbo speed though, since having max speed running on all cores 24/7 sounds absurd.

DanNeely - Friday, September 22, 2017 - link
If some prior IBM CPUs are any indication, stupidly high TDPs. Mainframes were water cooled long before any of the cool kids started kludging things up with heater cores from a junkyard to overclock early Pentiums even higher.

FunBunny2 - Friday, September 22, 2017 - link
-- Mainframes were water cooled long before any of the cool kids started kludging things up with heater cores from a junkyard to overclock early Pentiums even higher.

Um. IBM mainframes were liquid cooled at least until 1990. They've been air cooled since.
edzieba - Friday, September 22, 2017 - link
The z14 only air-cools the SC (the 'chipset' equivalent). The CPs are all water-cooled: https://www.anandtech.com/show/11750/hot-chips-ibm...

Alexvrb - Saturday, September 23, 2017 - link
Hey now, a lot of people used NEW heater cores, thank you very much.

FreckledTrout - Friday, September 22, 2017 - link
No turbo; that speed is 24x7. You should see the cooling on these. We have some in our data center, and they almost look like Batman-themed server racks.

https://www.ibm.com/developerworks/community/blogs...
Santoval - Friday, September 22, 2017 - link
If the pipeline is long enough it can sustain (theoretically*) higher clocks more easily. Intel's NetBurst started at 20 pipeline stages (Willamette & Northwood), then reached 31 stages (Prescott, Cedar Mill, etc.), and there were even plans for successors (Tejas and Jayhawk) with freaking 40 to 50 pipeline stages. These would supposedly reach 7+ GHz clocks before they were cancelled and replaced by the Core microarchitecture.

*Only theoretically, apparently. The first samples of Tejas were clocked at merely 2.8 GHz and had a 150 W TDP, much higher than both the previous Prescott and the first Core CPUs that followed.
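For intuition on why more stages only help "theoretically", here is a toy model, not taken from the comment or the article: clock period is roughly the total logic delay divided by the stage count, plus a fixed per-stage latch/skew overhead, so frequency saturates once the overhead dominates. The delay numbers below are made-up assumptions, not NetBurst data.

```python
# Toy pipeline model: f_max = 1 / (T_logic / N + T_overhead)
# T_LOGIC_NS: total combinational delay being pipelined (assumed 10 ns)
# T_OVERHEAD_NS: per-stage latch + clock-skew overhead (assumed 50 ps)
# Both values are illustrative assumptions only.
T_LOGIC_NS = 10.0
T_OVERHEAD_NS = 0.05

def max_clock_ghz(stages: int) -> float:
    period_ns = T_LOGIC_NS / stages + T_OVERHEAD_NS
    return 1.0 / period_ns  # 1 / ns = GHz

for stages in (20, 31, 40, 50):
    print(f"{stages:2d} stages -> ~{max_clock_ghz(stages):.1f} GHz (ignoring power!)")
```

Even before power is considered, the fixed per-stage overhead means each extra batch of stages buys less frequency than the last; add the power wall and branch-miss penalties and the 40-50 stage designs stop looking attractive.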
FreckledTrout - Friday, September 22, 2017 - link
You have to realize these z14s cost half a million and up just for the initial purchase. They sell some lower-end versions, but nobody buys those; they just get cheaper servers. In reality most people are likely paying $5 million a year to run these once you account for hardware, software, electricity and maintenance contracts.

Dug - Saturday, September 23, 2017 - link
Compared to what they bring in on a daily basis it might make sense? I'm actually surprised they have enough customers to make this happen.

FunBunny2 - Saturday, September 23, 2017 - link
-- I'm actually surprised they have enough customers to make this happen.

It's a symbiotic death spiral. The Fortune 500 adopted mainframes and COBOL around 1960 and kept going for decades. Re-writing those billions and billions of lines of code in Java or PHP isn't feasible, so IBM's cost to continue producing mainframes may rise (not necessarily, btw, given how much the guts of a z machine have shrunk compared to a 370), which gets passed on to your bank, insurance company, etc.
FreckledTrout - Wednesday, September 27, 2017 - link
People also run Linux partitions, like hundreds of Linux VMs, on these. That helps offset the cost of buying a mainframe. I find it a novel idea that as you migrate apps from old COBOL to Java etc. you can use the extra compute power to run more VMs.

DanNeely - Friday, September 22, 2017 - link
"and uses 12T libraries (vs. 9T and 7.5T for various 14 nodes)"This is the second time I've seen a comment about #T libraries in a recent process update article. What exactly does it mean?
CajunArson - Friday, September 22, 2017 - link
I second that comment. There's never been an explanation about what "T" has to do with design libraries (not to mention that different "design libraries" could behave quite differently).

MrSpadge - Friday, September 22, 2017 - link
A lower number means denser designs are possible, so maybe it's the average number of transistors in ... some standard cell mix? I definitely second the explanation request.

kaeljae - Friday, September 22, 2017 - link
This is all over my head but I did find this while googling:

"The cell height can also be measured by the number of metal tracks that are needed for routing for the cell; in recent nodes we have gone from 12-track (12T), to 9T, to ~7.5T in the latest 14- and 16-nm processes."
http://electroiq.com/chipworks_real_chips_blog/201...
saratoga4 - Friday, September 22, 2017 - link
>This is the second time I've seen a comment about #T libraries in a recent process update article. What exactly does it mean?

T = "Track". Rather than just looking at how big each individual fin is in a FinFET transistor, you also have to consider how large each logic cell is (how many fins and metal contacts it will require). Even if the fins stay the same size, if you can make each one more capable, you can make the logic cell smaller by using fewer fins and/or metal wires.
name99 - Friday, September 22, 2017 - link
Here's a very simple explainer.

Each fin of a FinFET can carry a limited amount of current (which is usually not as much as is needed). This means multiple fins are required to create a single transistor. In the past this was as many as 4 fins per transistor (more for some specialty high-current transistors), and a focus of recent design has been to make the fins taller, which allows each one to carry more current, thus reducing the number required (most of the time) to three fins, then two, and ultimately one.
Each stage of being able to reduce the number of fins means that you can rearrange (and shrink) the various items required to create standard cells (like a single bit of an SRAM).
There's always been some variation in how SRAMs are designed and laid out (in the past people talked about 6T vs 8T designs, T in that case meaning the number of transistors), but this question of "track size" has become a bigger issue since FinFETs, because each time you can make the fins taller and reduce the number per transistor, you get an additional density boost.
I assume that IBM's use of these high-track-number cells means, essentially, that they want lots of fins for each transistor, creating higher currents, which in turn allows for faster switching, at the cost of more power.
(There is a second set of issues related to how the metal layers per cell are laid out connecting the various pieces together, but the above gives the idea and the main concerns.)
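A tiny numerical sketch of that fin argument, using the common first-order approximation that drive current scales with effective channel width, W_eff ≈ n_fins × (2·H_fin + W_fin). The fin dimensions below are illustrative assumptions, not figures for 14HP or any real process.

```python
# First-order FinFET drive estimate: current ~ effective width = n_fins * (2*H_fin + W_fin).
# Fin dimensions are placeholder assumptions chosen only to illustrate the trade-off.

def effective_width_nm(n_fins: int, fin_height_nm: float, fin_width_nm: float) -> float:
    """Total gated perimeter ('effective width') of a multi-fin transistor."""
    return n_fins * (2 * fin_height_nm + fin_width_nm)

# Shorter fins: several are needed to hit a target effective width.
short_fin = effective_width_nm(n_fins=3, fin_height_nm=35, fin_width_nm=8)   # ~234 nm
# Taller fins: two fins deliver roughly the same drive in less cell area.
tall_fin = effective_width_nm(n_fins=2, fin_height_nm=55, fin_width_nm=8)    # ~236 nm

print(f"3 short fins -> W_eff ~ {short_fin:.0f} nm")
print(f"2 tall fins  -> W_eff ~ {tall_fin:.0f} nm (similar drive, smaller footprint)")
```

Getting the same ballpark drive from fewer fins is what lets the standard cell shrink; alternatively, as the comment suggests for IBM, you can keep many fins and spend the area on extra drive current instead.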
CajunArson - Friday, September 22, 2017 - link
Thanks!

evancox10 - Friday, September 22, 2017 - link
Not sure this is really correct. The std cell track height has nothing to do with SRAM, which has a completely different design/layout. (Although yes, the T in 6T SRAM does stand for transistor.)

The track height is simply a dimensionless measure of the height of the standard cells used in a given library. (Note that there are almost always multiple libraries available for a given process, although with smaller nodes the development of the process and the std cell libraries has become more tightly coupled.) If a given library has cells that are 120 nm tall and the M1 pitch is 10 nm, then you have a 120 nm / 10 nm = 12-track cell. This measure is used instead of the raw height because it scales with process node shrinks, so we can talk about a 12T cell in 10 nm and also in 90 nm and still (kind of) mean the same thing. (The absolute performance difference would of course be huge.) In both of these you could route 12 separate metal lines horizontally through the cell. In reality, you get fewer than 12 because power and ground are routed in a solid stripe along the top and bottom of the rows.
Typically, from what I've seen, 7T or less is for the densest (aka cheapest) designs, 7-10T is middle of the road, and 10T+ is for high performance. I'm not an expert in cell design, so I can't really say why larger cells give you more performance. It definitely gives the cell designer more area to work with: you get additional vertical area, and can still use as much width as needed, because unlike height, width is variable from cell to cell.
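To make that definition concrete, here is a throwaway calculation. The 120 nm / 10 nm example and the rough density buckets are the ones from the comment above, used purely for illustration; they are not official library specs for any foundry.

```python
# Track height = standard cell height / minimum metal (M1) pitch.
# Example numbers are taken from the comment above and are illustrative only.

def track_height(cell_height_nm: float, m1_pitch_nm: float) -> float:
    return cell_height_nm / m1_pitch_nm

def rough_flavor(tracks: float) -> str:
    # Rough buckets per the comment: <=7T dense/cheap, 7-10T mainstream, >10T high performance.
    if tracks <= 7:
        return "dense / low cost"
    if tracks <= 10:
        return "middle of the road"
    return "high performance"

for cell_height, pitch in [(120, 10), (90, 12), (75, 10)]:
    t = track_height(cell_height, pitch)
    print(f"{cell_height} nm cell / {pitch} nm M1 pitch -> {t:.1f}T ({rough_flavor(t)})")
```

The same 12T figure could describe a library on 10 nm or on 90 nm; only the absolute heights change, which is the point of quoting it as a dimensionless number.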
FreckledTrout - Friday, September 22, 2017 - link
Does the SOI cost so much more that AMD chose to use the 14LPP process over the 14HP process to make Zen cores? Just curious, as it seems very likely they could have increased frequency, given how efficient the CPUs are.

saratoga4 - Friday, September 22, 2017 - link
The wafers are a lot more expensive, yeah.

MrSpadge - Saturday, September 23, 2017 - link
It's not that much more expensive (see 1st post). But I'm sure the process wasn't ready for Zen & Vega; they're only starting to talk about it now. The regular 14 nm process was licensed from Samsung and freshly implemented by the time AMD needed it (at decent yields, of course).

Brodz - Wednesday, September 27, 2017 - link
Well, that must hold true, because AMD have mostly always used SOI, right up to Piledriver.

Not entirely sure how different it is from the 14LPP process, though, but it seems like AMD are moving forward with 12 nm and then 7 nm FinFET. Either it requires too much time or cost to rework Zen, the process itself costs too much, a combination of those, or IBM won't let them use it.
FunBunny2 - Friday, September 22, 2017 - link
-- IBM Z mainframes are based on specially designed IBM z-series CPUs, which are unique both in terms of microarchitecture, feature set and even physical layout.

There are billions and billions of lines of COBOL which depend on hardware assists implemented decades ago. And there are, likely, millions of lines of assembler running all manner of critical systems.
phoenix_rizzen - Friday, September 22, 2017 - link
From the article:

"Each IBM z14 SC CPU consists of 6.1 billion transistors, runs at 5.2 GHz and contains 10 cores with dedicated 6 MB L2 per core (2MB L2 for instructions, 4MB L2 for data) and 128 MB shared L3. Meanwhile, the system control (SC) chip consists of 9.7 ..."
Shouldn't that first "Z14 SC CPU" be "Z14 CP CPU"?
I was really confused reading that, wondering why the system control chip had 10 cores with no mention of the actual CP setup. Then the next sentence is about the SC again, with different specs.
Adele Hars - Sunday, October 1, 2017 - link
IBM's decision to go with FinFET on SOI dates back well over five years, and the physics have not changed in that time. If you're willing to take a bit of a deep dive, Terry Hook, who's IBM's SOI-FinFET holes & electrons guru, did a few pieces for Advanced Substrate News back in 2012/2013. I'll paste in links below.

http://bit.ly/2xPCNIW - FinFET on SOI: Potential Becomes Reality
http://bit.ly/2yBtuKj - IBM: FinFET Isolation Considerations and Ramifications – Bulk vs. SOI
http://bit.ly/2kh4dTA - IBM: Why Fin-on-Oxide (FOx/SOI) Is Well-Positioned to Deliver Optimal FinFET Value
Also, the folks at the SOI Consortium laid it out in very simple terms (re: advantages of doing planar FD-SOI or FinFET-on-SOI) back in a 2012 piece -- you can find that here: http://bit.ly/2fCLTix.
And finally, a note for those worried about SOI wafer costs: the wafers are slightly more expensive, but that cost is immediately amortized because the manufacturing process in the fab uses fewer mask steps when the starting wafer is an SOI wafer. That holds for both FinFETs on SOI and FD-SOI, btw.
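A back-of-the-envelope sketch of that trade-off. Every number below is a placeholder assumption; the comment gives no actual figures for per-step processing costs or how many steps an SOI start wafer saves.

```python
# Sketch: does a pricier SOI starting wafer pay for itself via skipped process steps?
# All values are placeholder assumptions for illustration; none come from the comment or IBM.
soi_wafer_premium = 300          # $ extra for the SOI starting wafer (assumed)
cost_per_saved_step = 120        # $ of processing cost per avoided mask/litho step (assumed)
steps_saved = 4                  # number of steps avoided thanks to the buried oxide (assumed)

processing_savings = cost_per_saved_step * steps_saved
net_cost_delta = soi_wafer_premium - processing_savings

print(f"Processing savings per wafer: ${processing_savings}")
print(f"Net cost delta per wafer:     {net_cost_delta:+d} USD "
      f"({'SOI cheaper overall' if net_cost_delta < 0 else 'SOI still costs more'})")
```

Whether the delta actually goes negative depends entirely on those assumed per-step costs; the code only restates the comment's qualitative argument in numbers you can swap out.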