30 Comments
yacoub35 - Thursday, August 1, 2019 - link
Would love to know how/why a power outage could cause so much damage. And who was responsible for risk assessment and mitigation at the plant that didn't build in the ability to survive a power outage without that sort of outcome.

euskalzabe - Friday, August 2, 2019 - link
Exactly my thoughts. You can smell the price fixing from miles away. Disgusting.

edzieba - Friday, August 2, 2019 - link
No need to manufacture power outages or stage floods. Fab companies that want to cut production due to oversupply take the much simpler route of making a public announcement (usually as part of an investor call or other financial filing) that they are reducing output.

Yojimbo - Friday, August 2, 2019 - link
That's not price fixing... Reducing production is not price fixing or collusion.

Yojimbo - Friday, August 2, 2019 - link
Price fixing? Do you think the accident didn't happen? The wafers weren't really destroyed? Or you think WD and Toshiba just "took one for the team"? I suppose Airbus and Boeing are similarly price fixing for aircraft by crashing two of Boeing's aircraft.

Sahrin - Friday, August 2, 2019 - link
Yeah, they kind of are. They are both paying huge bribes to right-wing politicians to deregulate them, which is the ultimate cause of the accident.

Manch - Tuesday, August 6, 2019 - link
So it's Trump's fault?! LOL

DanNeely - Friday, August 2, 2019 - link
There was backup power, but the outage lasted longer than the backup did.

The reason the losses were so extensive is that you can't stop a wafer mid-step without ruining it, and it takes a few months to go from a blank silicon wafer to one that's ready to be cut into individual dies. When the outage exceeded the onsite backup runtime, it killed every wafer that was in progress inside a machine at the time, which was roughly half the quarterly output of the plant.
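The arithmetic behind "roughly half the quarterly output" can be sketched as follows. All figures here are hypothetical illustrations chosen only to show the shape of the estimate, not numbers from the article:

```python
# Back-of-envelope sketch of why an outage scraps so much work-in-progress.
# All numbers below are hypothetical assumptions, not figures from the article.

cycle_time_days = 90        # ~3 months from blank wafer to finished wafer
starts_per_day = 1000       # assumed wafer starts for a high-volume fab

# With a 90-day cycle time, the line holds about one quarter's worth of
# output in flight at any moment (Little's law: WIP = rate x cycle time).
wip = starts_per_day * cycle_time_days        # 90,000 wafers in the line

# If roughly half of those sit inside a tool mid-step when power drops,
# the scrap is on the order of half a quarter's output:
scrapped = wip // 2
print(f"~{scrapped:,} wafers scrapped")
```

Note that with a 90-day cycle time, "all WIP" and "one quarter of output" are the same quantity, which is why the two phrasings in the thread agree.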
buxe2quec - Friday, August 2, 2019 - link
I was also coming to the conclusion that a wafer must take weeks to process, but I'm still amazed that that's the case. Weeks to months from the first photolithography step to a finished die? That's way more than I thought. At 96 layers, that's one day per layer!

Kristian Vättö - Friday, August 2, 2019 - link
It takes ~3 months to produce a 3D NAND wafer and there are over 400 steps in the process.

Samus - Friday, August 2, 2019 - link
The same way a 15-minute power outage can cause a nuclear power plant to melt down. Aside from the obvious irony of that statement, the pumps that circulate the coolant are not powered independently of the power generated by the nuclear fuel rods, which is why there are numerous standby generators and supercapacitors.

It's safe to say this plant should have had, or possibly does have, a similar multi-redundant backup solution, but it seems like the entire grid failed, preventing even backup power from reaching the various fabs.
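The decay-heat point argued below can be made quantitative with the Way-Wigner approximation, a standard rough formula for residual thermal power after shutdown. The operating time assumed here (one year at full power) is an illustrative assumption:

```python
def decay_heat_fraction(t_s: float, T_s: float) -> float:
    """Way-Wigner approximation: fraction of full thermal power still
    being produced t_s seconds after shutdown, for a reactor that
    operated for T_s seconds. A rough engineering estimate only."""
    return 0.066 * (t_s ** -0.2 - (t_s + T_s) ** -0.2)

T = 365 * 24 * 3600  # assume one year of operation at full power
for minutes in (1, 15, 60):
    f = decay_heat_fraction(minutes * 60, T)
    print(f"{minutes:>3} min after SCRAM: {f:.1%} of full power")
```

One minute after shutdown the core still produces a few percent of full power, which for a gigawatt-scale plant is tens of megawatts of heat: far too little to sustain the chain reaction, but more than enough to melt fuel if cooling stops entirely. This is why both sides of the argument below are partly right.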
buxe2quec - Friday, August 2, 2019 - link
Definitely not, 15 minutes is nowhere near enough time for a meltdown.

mgs - Friday, August 2, 2019 - link
This is ridiculous. Power plants don't go into meltdown when you cut their power. Even old designs. New designs will just stop the reaction when any problem occurs.

Samus - Friday, August 2, 2019 - link
You're right, historically (and in testing) it has taken up to a few hours for a core to completely melt down.

Nonetheless, PWRs and BWRs become incredibly unstable within a few minutes of the pumps seizing, and in a BWR in particular (the majority of reactors in service in the United States, btw), once the pressure builds from lack of cooling there is catastrophic failure throughout the reactor's components.
It's important to point out that EVERY nuclear power plant disaster has been caused by unstable core pressure from improper cooling, and other than Chernobyl, they were pump failures or power failures (Three Mile Island, Fukushima, etc.).
Hybrid reactors or breeder reactors are the only 'safe designs' for a short-term power failure, but obviously no type of nuclear fuel 'cools off' quickly. Most cores take weeks or months to fully cool down.
thewishy - Friday, August 2, 2019 - link
You need to read up more on nuclear power generation. Fission of heavy elements is not a single-step process; rather, uranium splits into fission products, which decay into other products at a rate set by their half-lives, which decay further, and so on. The top of the chain is stopped quickly in the event of a malfunction, but further decay continues for quite some time, all of it producing heat. It'll be weeks before the reactor reaches a cold-shutdown state, and right after shutdown in particular there is an awful lot of heat that must continue to be removed, otherwise the reactor fuel will melt.

Samus - Saturday, August 3, 2019 - link
thewishy, you're focusing on fission reactors, newer designs. The United States has like 2 of them, and the majority of nuclear reactors around the world are 50+ year old designs. They do not 'shut down' like a breeder reactor.

PeachNCream - Friday, August 2, 2019 - link
A reactor core will SCRAM to kill the chain reaction in the event of a variety of problems (or routinely, as a normal shutdown procedure), but cooling needs to remain functional to cope with decay heat or the core will melt down. Just because all the control rods are in place and the nuclear chain reaction has been shut down does not mean the reactor is in a safe and stable state. However, I do agree that fifteen minutes is far too small a window for a real disaster to fully play out.

edzieba - Friday, August 2, 2019 - link
Fabs like these draw tens to hundreds of MEGAwatts. Any long-term backup power solution for one of them is a power station in its own right.

MananDedhia - Friday, August 2, 2019 - link
You are talking about 128 layers of NAND memory. That means there are 128 photo and etch layers, with deposition layers thrown in for good measure. The number of steps in the process flow, all included, will stretch to well over 1,000 (a conservative estimate). For each step you have a cluster/fleet of tools running the same process to keep up with demand. So with a power outage of this magnitude, you can easily have 10K wafers in the line at various stages of processing (again, a conservative estimate). All these wafers have to be scrapped because the tools have broken their cleanroom/vacuum containment, the air filters are not working, and processes were halted before they finished. These wafers are as good as junk because their contamination levels will be very high. Take all these factors into consideration and it adds up to a big loss.

towner7 - Friday, August 2, 2019 - link
My thoughts exactly. With this supposedly huge loss, a company could have used that amount of money to build a modern power plant for backup. Time to bite the bullet.

Sivar - Saturday, August 3, 2019 - link
Manufacturing semiconductors can take months from start to finish. Anything being processed in equipment that cannot restart (most of it) will either be lost or, depending on the step, need rework (e.g. clean off the partial layer with acid, then reprocess).

The number does seem large to me, but it may be a very high-output fab with a very large amount of WIP.
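The scrap-versus-rework distinction above can be sketched as a simple decision rule. The step names here are hypothetical illustrations, not an actual fab's process list:

```python
# Sketch of the disposition logic for a wafer interrupted mid-process
# (step names are hypothetical; real process flows have hundreds of steps).

REWORKABLE = {"photoresist_coat", "lithography_expose"}  # strip resist, redo
FATAL = {"etch", "deposition", "implant"}                # layer partly formed

def disposition(step: str) -> str:
    """Decide what happens to a wafer interrupted at a given step."""
    if step in REWORKABLE:
        return "rework"   # e.g. strip the partial layer with acid, reprocess
    if step in FATAL:
        return "scrap"    # partially formed layer cannot be recovered
    return "inspect"      # unknown step: hold for engineering review

print(disposition("lithography_expose"))  # rework
print(disposition("etch"))                # scrap
```

The asymmetry is the point: resist can be stripped and reapplied, but a half-etched or half-deposited layer permanently alters the wafer, which is why an outage mid-step is so destructive.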
oRAirwolf - Thursday, August 1, 2019 - link
With so much potential for revenue loss and time required to recover, I am curious what kind of power redundancy they have. I understand this facility probably uses an immense amount of energy, but I assume all of the power redundancy infrastructure necessary to keep the plant running would cost less than $339,000,000.00.

quorm - Friday, August 2, 2019 - link
This was covered pretty extensively in the comments the last time an article about this incident was posted, but at some point it's not feasible to build additional backup power, and they just get insurance instead. Assuming insurance pays out, they won't lose money, and may actually benefit if this event causes NAND prices to increase.

Kristian Vättö - Friday, August 2, 2019 - link
There is no insurance for fab outage.

mgs - Friday, August 2, 2019 - link
No sane insurance company would insure for this.

DanNeely - Friday, August 2, 2019 - link
If you're willing to pay enough to a boutique insurer, you can get a policy for almost anything. After a certain point, though, if the risk is too hard to price, they'll end up asking so much that it's not worth buying a policy versus accepting all the risk yourself. (When in doubt, they err on the side of avoiding a major loss over merely earning their average return.)

Skeptical123 - Saturday, August 3, 2019 - link
I have not read up on this incident heavily, but I find your logic reasonable except for one key point: I find it hard to believe the fab used enough power to make backups lasting longer than a few minutes unjustifiable cost-wise. The biggest expense for backup power is an immediate supply to safeguard against grid fluctuations. Even if the grid has an issue for less than a second, it can wreak havoc with billions of dollars' worth of sensitive manufacturing machines. As such, no company in its right mind would run such a plant without a battery backup in place (batteries and supercapacitors are the only sources fast enough to react in these scenarios). Of course, the issue is that a battery bank to run a plant drawing megawatts is prohibitively expensive for more than a few minutes, which is why battery systems are normally designed to last only long enough to get backup generators up and running. For big diesel generators this can take a few minutes. A quick Google search shows new portable 1-megawatt generators (in semi trailers) in the US cost about $400k, so ~20+ megawatts of stationary backup diesel generation should cost under $10 million, making it one of the cheaper, if not the cheapest, systems in the fab.

voicequal - Monday, August 5, 2019 - link
Generators are dirt cheap compared to the losses incurred here. No one would build a plant without at least enough backup power to perform an orderly shutdown, which I assume could take hours. They almost certainly had generators, but obviously some fault prevented them from operating correctly.

dgingeri - Saturday, August 3, 2019 - link
My bet is they had the infrastructure in place, it got old, and while executives argued about paying for replacements for this part or that part, the power outage occurred and someone finally realized they really needed the parts.

This happens in corporations a LOT. Executives argue and debate about paying for things without realizing how much those things are really needed. Such penny pinchers don't realize the damage they do to themselves.
My current company has executives arguing against replacing our 7-year-old storage. We had 8 drives fail on one unit within one day, causing our biggest customer to lose 7 VMs on one datastore. It took us a week and a half to get the storage back up and the VMs restored from backup. Our customer was extremely ticked off. That encouraged them to actually move that customer onto newer storage, but most of our customers are still on the old stuff, and we have 6-9 failed drives per week on it. We've already had two major customers leave because of problems from failed equipment.
That doesn't even get into the problems created by stressed employees due to a lack of proper staffing, or not paying well enough to attract staff with the skills to do the job.
Asurmen - Saturday, August 3, 2019 - link
Oh the irony! All they needed was 20-30 minutes (preferably more) of battery backup for the factory, and Toshiba itself makes grid-scale battery systems! https://www.scib.jp/en/applications/energy.htm
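A rough sizing sketch in the spirit of the cost estimates in this thread. The fab load and ride-through time are hypothetical assumptions (only the ~$400k/MW generator price comes from a comment above):

```python
# Hypothetical figures for illustration only: a fab drawing tens of
# megawatts needs a large bridge battery even for a short ride-through,
# after which diesel generators would carry the load.

fab_load_mw = 25            # assumed average fab draw
bridge_minutes = 30         # battery rides through until generators start

battery_mwh = fab_load_mw * bridge_minutes / 60   # usable storage needed
print(f"Bridge battery: {battery_mwh} MWh")       # 12.5 MWh

# Generator cost, using the ~$400k per megawatt figure quoted earlier:
generator_cost_usd = fab_load_mw * 400_000
print(f"Generators: ${generator_cost_usd:,}")     # $10,000,000
```

Under these assumptions, the backup system is indeed small change next to a nine-figure wafer loss, which supports the thread's view that the generators likely existed and simply failed to carry the load.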