34 Comments
blanarahul - Thursday, March 26, 2015 - link
I think Intel is preparing for 3D processors.
blanarahul - Thursday, March 26, 2015 - link
I read somewhere that Intel has no plans to go below 10nm on the processor side of things, so they will look into silicon stacking. I guess they are doing stacked floating gate NAND to prepare for that.
JKflipflop98 - Saturday, March 28, 2015 - link
I can assure you we have plans to go below 10nm. You heard wrong.
jjj - Thursday, March 26, 2015 - link
lol how? It's one thing to have simple repetitive structures and go 3D and quite another to have a very complex SoC.
Would be amazing if someone could do 3D GPUs (those at least have thousands of cores) but no clue if even that is in any way doable.
Morawka - Thursday, March 26, 2015 - link
Can you imagine the yields... "4 out of 5 layers turned out good, we're gonna have to make this an i5." They are going to have to invent ways to propagate heat vertically if 3D GPUs/CPUs are gonna become a thing.
ptmmac - Sunday, March 29, 2015 - link
This is purely speculation, but the move to 3D layers may make FPGAs more important for chip design. Layering in hundreds of layers may require more ability to program around the flawed areas of the chip. Having FPGAs integrated into the stack could allow them to reroute around broken or flawed areas on the chip. The downside for FPGAs has been the slower process, until Intel started selling their best process to companies that produce chips that don't compete with x86 chips. FPGAs are also slow to change in comparison to fixed silicon like current Intel CPUs. I would expect integrating them into the processor from the beginning would be no small engineering feat, but compared to continuing to shrink die sizes it may be much easier.
bji - Thursday, March 26, 2015 - link
How do they make the dies for 3D NAND? My understanding of process tech is not strong; I thought that they did lithography on the face of a silicon wafer. How do they create structures deeply embedded within silicon for 3D NAND?
jjj - Thursday, March 26, 2015 - link
Maybe try reading this, it's not about NAND but it will give you an idea of how complicated it all is: http://www.anandtech.com/show/8223/an-introduction... Or just look at this pic from that article: http://images.anandtech.com/doci/8223/550px-Cmos-c...
For NAND it's not ideal for what you are asking, but this article (the Samsung 850 Pro review) would help you understand: http://www.anandtech.com/show/8216/samsung-ssd-850...
jjj - Thursday, March 26, 2015 - link
You can also have a look here: http://www.chipworks.com/en/technical-competitive-...
Vatharian - Thursday, March 26, 2015 - link
I know how complicated current SoC designs are. 8-10 layers of interconnects and parts are the norm. But 32 layers of NAND, which are themselves 3D structures now (3-4 layers of components for each NAND layer, I'm guessing), ups this to 100+ component layers. This makes the actual lithography process longer, and errors stack. Yields HAVE to go down, unless Samsung has a way of laying down a large part or all of the layers at once.
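(A back-of-the-envelope sketch of the "errors stack" point above. The per-step yield and step counts are purely illustrative assumptions, not figures for any real process.)

    # Hypothetical yield model: if every process step must succeed
    # independently, overall yield falls geometrically with step count.
    def stacked_yield(per_step_yield: float, steps: int) -> float:
        return per_step_yield ** steps

    for steps in (10, 32, 100):
        print(f"{steps:>3} steps @ 99.9% each -> {stacked_yield(0.999, steps):.1%}")
    # 10 steps -> ~99.0%, 32 steps -> ~96.9%, 100 steps -> ~90.5%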
zepi - Thursday, March 26, 2015 - link
Multilayer CPUs would allow huge L4 caches, possibly making even DRAM unnecessary...
zepi - Thursday, March 26, 2015 - link
Oh, and just piling memory layers on top of each other and then the CPU on top of that should allow a reasonably easy way to disable bad blocks. And nothing really prevents piling eDRAM/SRAM and whatnot there...
jjj - Thursday, March 26, 2015 - link
Except costs and the huge difference in manufacturing between RAM and CPUs. For a long time some people expected RAM and CPU on the same die and it didn't and won't happen because it's not cost effective. Advanced packaging is another matter, but that's not a monolithic 3D IC.
jjj - Thursday, March 26, 2015 - link
For anyone curious, Qualcomm was talking about monolithic 3D ICs last year, but it feels more like a call to arms than them having a practical solution soon: http://www.eetimes.com/author.asp?doc_id=1322783
http://www.techdesignforums.com/blog/2014/06/05/ka...
earl colby pottinger - Friday, March 27, 2015 - link
Don't forget they can use the stacking to make bigger caches. At present a cache miss is costly in terms of CPU cycles wasted to get the right data. Increasing the cache sizes by 8-32 times their present sizes will make a measurable difference.
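(A rough illustration of why misses dominate average access time. The hit latency, miss penalty, and miss rates below are assumed, illustrative values, not measurements of any particular CPU.)

    # Average memory access time (AMAT): a bigger cache mostly helps
    # by cutting the miss-rate term.
    def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
        return hit_cycles + miss_rate * miss_penalty_cycles

    # Assumed: 4-cycle cache hit, ~200-cycle trip to DRAM on a miss.
    print(amat(4, 0.10, 200))  # 10% miss rate -> 24 cycles on average
    print(amat(4, 0.02, 200))  #  2% miss rate ->  8 cycles on average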
Vlad_Da_Great - Thursday, March 26, 2015 - link
In my view 3D processors are not in the cards, because you have heat leakage on a stacked skyscraper platform. Please note, reads/writes emit little or no heat, but logical calculations and transistor switching do. It appears for the moment that 10nm is the last achievable node for the human race, since many of the silicon structures are not sustainable. "A more significant limitation comes from plasma damage to low-k materials. The extent of damage is typically 20 nm thick,[1] but can also go up to about 100 nm.[2] The damage sensitivity is expected to get worse as the low-k materials become more porous." (Wikipedia) So after 10nm, manufacturing might be impossible for us.
azazel1024 - Friday, March 27, 2015 - link
"With silicon" is the key phrase here. Intel has already talked about using other materials, like indium gallium arsenide, for nodes below 10nm. There are several research universities (and I assume fabs too) that have working sub-10nm lithographies running at research scales. We'll hit below 10nm, it just probably won't be on silicon.
jjj - Friday, March 27, 2015 - link
The hardest problem is design. You take an SoC and what do you do? Even if you put different compute units on different layers and you end up with 3-4 layers, then what? How do you scale it further? If it's a one-time-only gain and you can't scale it further, it has minimal relevance.
NAND went 24 layers, 32, 48 and will keep going. It has many repetitive structures and you can keep scaling. For an SoC you can't do that, and for now there is no need for lots of CPU cores just to add that. For a GPU it would work; at least when adding more layers it's easy to figure out what to put on them: more cores.
In the end even the monolithic 3D ICs that we aim for are just layered (as in, let's say a core is still planar and on one layer, not a cube using multiple layers) and going beyond that is a lot harder and way more interesting.
Folks will find a way to go around all the other problems in the early stages but how to scale beyond a few layers in a way that makes sense ...
azazel1024 - Monday, March 30, 2015 - link
It can possibly make some sense for an SoC. Once you get past the thermal issues, you can reduce latencies within the SoC by layering. Place the SRAM on top (or bottom), the memory controller on the other side, and you've potentially reduced the distance that signals have to travel from the cores/system bus to memory. Sure, we are talking picosecond latency reductions... but it does reduce latency. It also possibly opens things up (if it reduces overall cost) to larger L3 and L2 caches, possibly larger iGPUs and so on.
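(For a sense of scale on those distance savings: a minimal sketch, assuming on-die signals travel at roughly half the speed of light. Real on-chip wires are RC-limited and slower, so treat this as a lower bound on the delay saved.)

    # Rough propagation-delay estimate for a shortened on-die signal path.
    C = 3.0e8  # speed of light in m/s

    def prop_delay_ps(distance_mm: float, fraction_of_c: float = 0.5) -> float:
        return distance_mm * 1e-3 / (fraction_of_c * C) * 1e12

    # Shaving ~5 mm off a path saves on the order of tens of picoseconds.
    print(f"{prop_delay_ps(5):.0f} ps")  # ~33 ps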
willis936 - Thursday, March 26, 2015 - link
Never mind the manufacturing difficulties; you have a physical limitation you can't get around. You can't just double or triple the number of active transistors in a given area, without doubling or tripling the size, without melting the chip. They could do clever and complicated things to use more transistors and stay in the same power budget, but you won't see incredible leaps in performance like the incredible leaps in density we're seeing with storage.
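(The footprint argument in numbers: an illustrative sketch with an assumed ~50 W, ~100 mm^2 die. The values are hypothetical; the point is only that stacking active logic multiplies the heat the same footprint must shed.)

    # Power density at the package footprint when active logic layers are stacked.
    def power_density(total_watts: float, area_mm2: float, layers: int = 1) -> float:
        return total_watts * layers / area_mm2

    for layers in (1, 2, 3):
        print(layers, "layer(s):", power_density(50, 100, layers), "W/mm^2")
    # 1 -> 0.5, 2 -> 1.0, 3 -> 1.5: same area, 2-3x the heat to remove.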
"Toshiba announced that it has begun shipments "Tosh said they are sampling not shipping and Sandisk said "Pilot production will commence as planned in the second half of 2015 in the Yokkaichi joint venture facility, with meaningful commercial production targeted for 2016."
"It's impossible to outright say that one cell structure is better than the other because in the end it all boils down to cost where floating gate design is probably more cost efficient for Intel-Micron given their deep knowledge of its functionality."
Charge trap is supposed to be more cost effective and reliable so your statement seems a bit misleading. Maybe they couldn't figure out charge trap or maybe they decided to not risk failing at it but it seems unlikely that they went this way for better costs.
menting - Thursday, March 26, 2015 - link
More cost effective and more reliable might be true for the long term, but it's possible Micron/Intel wasn't willing to take that risk on their first-gen 3D NAND.
jjj - Thursday, March 26, 2015 - link
Yeah, and I actually mention that, but the statement suggests that floating gate was likely cheaper, and that seems baseless.
Kristian Vättö - Thursday, March 26, 2015 - link
I haven't read anything that would indicate charge trap being more cost effective in general. From a manufacturing standpoint it might be, but once you take all the years of R&D into account the equation is no longer that simple. I didn't say floating gate is generally cheaper, but that it was likely more cost effective for Intel and Micron, which I would say is quite obvious because they did consider other structures too, and I'm sure they also did the financial estimations that led to floating gate being the chosen one.
jjj - Thursday, March 26, 2015 - link
Unless they did it to mitigate risks (Intel is not one to take risks) or this was plan B. You assume cost was the primary factor, but we all know that's not always the case.
Kristian Vättö - Friday, March 27, 2015 - link
From a finance perspective, risk has a monetary value as well. It may result in higher profit, but it may also result in loss. It sounds reasonable that Intel-Micron decided to play it safe and use a well-known cell structure, because going with an alternative structure was likely too risky and hence possibly higher cost as well (when considering the development cost, that is).
MRFS - Thursday, March 26, 2015 - link
Whereupon we repeat once again our ill-fated proposal of a new storage metric: HOW LONG DOES IT TAKE TO READ THE ENTIRE DEVICE JUST ONCE? e.g.
4TB / 200MBps (HDD) = ~5.6 hours, 10TB / 600MBps (SSD) = ~4.6 hours
"There is no oligopoly here," declared the church mouse.
"Just the very same 3D announcements on the very same day," replied the Choir Master. :)
Hulk - Thursday, March 26, 2015 - link
3000 P/E cycles for MLC or TLC?
sharath.naik - Thursday, March 26, 2015 - link
After the experience I had with the Intel 320 SSD, I am not sure about Intel's reliability. They kept saying they had fixed the bricking issue on their SSDs, and it still happens.
toyotabedzrock - Thursday, March 26, 2015 - link
I think you have a minor case of color blindness given your description of the diagram.
rpg1966 - Friday, March 27, 2015 - link
Yes. It's blue and black.
Kristian Vättö - Friday, March 27, 2015 - link
The description was based on a graph that was deleted at Micron's request and replaced by the graph that's now in place. Ryan deleted the graph while I was asleep, but didn't edit the explanation, hence the confusion. I've now edited the paragraph to be in line with the new graph.
P.S. My color vision should be just fine, or at least I've never been diagnosed with color blindness :)
rpg1966 - Friday, March 27, 2015 - link
Given that they share at least one broadly similar feature size, does anyone know how Intel can cram 256Gbit (32GB) on 32 layers of MLC, whereas Samsung "only" manages 16GB over 48 layers?
Kristian Vättö - Friday, March 27, 2015 - link
Larger die size and higher memory array efficiency due to the higher die capacity. Also, note that Samsung's 128Gbit (16GB) V-NAND is a 32-layer part.
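(A quick per-layer comparison using only the capacities and layer counts mentioned in this thread; die size and array efficiency, the factors named in the answer above, are deliberately left out, so this is just the raw bits-per-layer arithmetic.)

    # Gigabits of storage per NAND layer, from the figures quoted above.
    def gbit_per_layer(die_gbit: int, layers: int) -> float:
        return die_gbit / layers

    print(gbit_per_layer(256, 32))  # Intel-Micron 32-layer MLC die: 8.0 Gbit/layer
    print(gbit_per_layer(128, 32))  # Samsung 32-layer 128Gbit V-NAND: 4.0 Gbit/layer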