Shadow7037932 - Friday, August 14, 2015 - link
This is pretty exciting. I wonder if we'll see future SSDs with PCM/XPoint acting as a cache, especially since it's non-volatile, unlike the DRAM that is currently used for caching.
Flunk - Friday, August 14, 2015 - link
I could see something like that being available on the enterprise level some time soon.
Einy0 - Friday, August 14, 2015 - link
Funny, I was just thinking about that a few weeks ago when I read the XPoint article. It only seems logical to see it first on high-end RAID controllers that currently rely on batteries to maintain their DRAM cache in case of power failure. At work, the only time our IBM iSeries server gets powered down is to change out the internal RAID controller batteries. This happens every few years as routine maintenance, and it's always a big deal because we have to schedule downtime for it.
Samus - Friday, August 14, 2015 - link
Those aren't hot-swappable? You can use a RAID utility to flush the RAID cache (with LSI/Areca this is done via the command line), then pull the battery and replace it quickly. This is even more comfortable if you have dual power supplies connected to a UPS.
I've worked on servers with hot-swap memory and it works the same way: you use a utility or flip a switch on the motherboard to put it into memory-swap mode, then get to it. I always do this stuff at night during a maintenance window, but there is no need for real downtime. It's the 21st century.
jwhannell - Saturday, August 15, 2015 - link
Or just put the host into maintenance mode, move all the VMs off, and perform the work at your leisure ;)
boeush - Friday, August 14, 2015 - link
Cache needs even more rewrite endurance than RAM (every cache line hash-maps to more than one RAM address). As stated above, most of these NAND alternatives have "limited endurance (even if it is much higher than flash)".
Billy Tallis - Friday, August 14, 2015 - link
Caches that sit between the CPU and DRAM need to be fast and sustain constant modification, but those aren't the only caches around. Almost all mass storage devices and a lot of other peripherals have caches and buffers for their communication with the host system. An SSD's write cache only needs to have the same total endurance as the drive itself, so if you've got a new memory technology with 1000x the endurance of NAND but DRAM-like speed, then you could put 1GB of that on a 1TB SSD without having to worry about burning out your cache. (Provided that you do wear-leveling for your cache, such as by using it as a circular buffer.)
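A minimal sketch of that circular-buffer approach (names are hypothetical, and the NAND flush callback is assumed to exist elsewhere in the firmware), just to show why the wear spreads evenly:
```python
# Hypothetical sketch: a non-volatile write cache used as a circular buffer,
# so every cache slot sees roughly the same number of program cycles.
class RingWriteCache:
    def __init__(self, num_slots, flush_to_nand):
        self.slots = [None] * num_slots     # each slot holds one cached block
        self.head = 0                       # next slot to (re)write
        self.flush_to_nand = flush_to_nand  # callback that persists a block to NAND

    def write(self, block):
        victim = self.slots[self.head]
        if victim is not None:
            self.flush_to_nand(victim)      # evict the oldest entry before reuse
        self.slots[self.head] = block       # each slot is rewritten once per full pass
        self.head = (self.head + 1) % len(self.slots)

# With all drive writes funneled through a 1GB ring in front of a 1TB SSD,
# each cache cell is rewritten ~1000x more often than a NAND cell, which is
# exactly why ~1000x NAND's per-cell endurance matches the drive's lifetime.
```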
XZerg - Friday, August 14, 2015 - link
It depends on what the cache is for. If it is a cache for, say, a network card, then it would only be utilized when there is network-related traffic, whereas RAM is the "layer" to the CPU and as such would see much higher writes. Your point is valid when comparing the cache to its final destination: it would need more endurance than the destination, e.g., an HDD.
I was admittedly thinking more in terms of node cache in large clustered systems with a shared-memory topology over networked fabric (the access latency of PCM already rules it out as an SRAM or even DRAM replacement).
On flash drives, I'm not aware of SLC write caches currently being a performance bottleneck...
frenchy_2001 - Friday, August 14, 2015 - link
The solution is DRAM plus a smaller battery and a new memory type that the cache can be written into faster when power is lost. SandForce 2XXX controllers had no DRAM cache at all, so I guess it is possible to make it work with either little cache or non-volatile memory as the cache.
ReRAM and PCM offer around 100k write cycles of endurance.
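A quick back-of-envelope check (every figure below is an assumption for illustration, not a spec from the article) of how that 100k-cycle figure interacts with the cache-sizing argument Billy Tallis made above:
```python
# Illustrative endurance arithmetic; all figures are assumptions, not vendor specs.
GB, TB = 10**9, 10**12

cache_size   = 1 * GB       # hypothetical non-volatile write cache
cache_cycles = 100_000      # per-cell endurance figure quoted above for ReRAM/PCM
nand_size    = 1 * TB       # capacity of the drive behind the cache
nand_cycles  = 3_000        # assumed MLC-class NAND P/E cycles

cache_budget = cache_size * cache_cycles   # bytes the cache can absorb: 100 TB
nand_budget  = nand_size * nand_cycles     # bytes the NAND can absorb: 3000 TB

# Per-cell cycles the cache would need in order to outlive the NAND behind it:
required_cycles = nand_budget // cache_size
print(cache_budget / TB, nand_budget / TB, required_cycles)  # 100.0 3000.0 3000000
```
With these assumed numbers, a 1GB cache at 100k cycles absorbs about 100TB before wearing out, so it would need either more capacity or something closer to the ~1000x-NAND endurance mentioned earlier to match the drive's lifetime.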
boeush - Friday, August 14, 2015 - link
My understanding of the DRAM cache on flash drives (not sure about HDDs) is that it's used to hold the page mapping tables for allocation, GC, wear leveling, etc. - i.e., for use by the controller. That's what makes the batteries/supercaps necessary on enterprise drives: if a power outage wipes the cache in the middle of GC, the entire drive is potentially trashed. Whereas losing data mid-flight on its way to storage is not catastrophic, and can't be prevented even with on-drive power backup, since the drive is not entirely in control of what gets written when, and how much at a time (that scenario is better addressed by a UPS).
For the controller, I'd assume it would be important to minimize access latencies to its data structures. The microsecond-scale access times of this PCM do not compare favorably to the nanosecond response times of DRAM (never mind SRAM); substituting PCM for DRAM in this case would therefore have a big negative impact on controller performance. That would particularly matter on high-performance drives...
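For context, a minimal, hypothetical sketch of the kind of logical-to-physical mapping table being described (class and field names invented for illustration; entry sizes are rough assumptions):
```python
# Hypothetical flash translation layer (FTL) mapping table - the controller
# metadata that is normally kept in DRAM so lookups and updates stay fast.
PAGE_SIZE = 4096                          # assumed logical page size in bytes

class MappingTable:
    def __init__(self):
        self.l2p = {}                     # logical page number -> physical page address

    def read(self, logical_page):
        return self.l2p.get(logical_page)          # one table lookup per host read

    def remap(self, logical_page, new_physical_page):
        # Called on every host write and on every GC relocation, which is why
        # the table lives in the fastest memory available to the controller.
        self.l2p[logical_page] = new_physical_page

pages_per_tb = 10**12 // PAGE_SIZE        # ~244 million logical pages per TB
dram_needed = pages_per_tb * 4            # ~1GB, assuming ~4 bytes per map entry
print(pages_per_tb, dram_needed)          # hence the common "1GB DRAM per 1TB flash" rule of thumb
```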
boeush - Friday, August 14, 2015 - link
Crap... The above was meant as a reply to Billy Tallis. Also, please forgive autocorrect-induced spelling snafus (I'm posting this from my Android phone.)
Billy Tallis - Sunday, August 16, 2015 - link
Some SSD controllers have made a point of emphasizing that they don't store user data in the external DRAM. If you don't have battery/capacitor backup, this is a very good idea. It's likely that some controllers have used the DRAM as a write cache in a highly unsafe manner, but I don't know which off the top of my head.
The data structures usually used to track what's in the flash are very analogous to what's used by journaling file systems, so unexpected power failure is highly unlikely to result in full drive corruption. Instead, you'll get rolled back to an earlier snapshot of what was on the drive, which will be mostly recoverable.
The 1-2 microsecond numbers refer to access latency for the whole storage stack. For the direct access of the SSD controller to the PCM, read latencies are only about 80 nanoseconds, which is definitely in DRAM territory. It's the multiple PCIe transactions that add the serious latency.
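To illustrate the journaling analogy above, here is a minimal, hypothetical sketch of checkpoint-plus-log metadata recovery (not any specific controller's scheme; names and record layout are invented):
```python
# Hypothetical journal-style FTL metadata recovery: the controller checkpoints
# its mapping table periodically and logs updates after that. On an unclean
# power loss it replays the durable log entries and simply loses the tail,
# rolling back to a slightly older - but consistent - state of the drive.
def recover_mapping_table(checkpoint, log_entries):
    table = dict(checkpoint)              # last complete snapshot of the L2P map
    for entry in log_entries:             # entries persisted after the checkpoint
        if not entry.get("valid"):        # torn/unwritten tail of the journal
            break                         # stop replay: newest writes are lost, not corrupted
        table[entry["logical_page"]] = entry["physical_page"]
    return table

# Example: the drive comes back with the state as of the last durable log entry.
checkpoint = {0: 100, 1: 101}
log = [{"valid": True, "logical_page": 1, "physical_page": 205},
       {"valid": False}]                  # write interrupted by the power loss
print(recover_mapping_table(checkpoint, log))   # {0: 100, 1: 205}
```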
bug77 - Monday, August 17, 2015 - link
Depends on how you use it. With such low latencies, game makers or even Microsoft may decide to host all their code themselves and only make it accessible to you on demand. Yes, I know this is far-fetched, but it is technically possible. Plus, in the 90s online verification of a game key was far-fetched, and today, in some instances, we can't even play single-player games without an active connection.
"None of the major contenders are suitable for directly replacing DRAM, either due to to limited endurance (even if it is much higher than flash), poor write performance, or vastly insufficient capacity."Wait, if a "NAND replacement" technology has vastly insufficient capacity to replace DRAM then how is it going to have sufficient capacity to replace NAND?
Anyway, even the latest interfaces will be out of date by the time devices start coming out with these next-gen NVM solutions, even if they're initially used just as a very large (and safe) cache to supplement traditional high-capacity NAND. Something like 2TB of NAND with 8GB of PCM/XPoint cache, for example.
boeush - Saturday, August 15, 2015 - link
Interestingly, PCIe4 (with double the PCIe3 bandwidth) is due to deploy by this time next year. Combined with the NVMe protocol, it just might suffice for the next few years...
Peeping Tom - Sunday, August 16, 2015 - link
Sadly they are saying it could take until 2017 before PCIe4 is finalized: http://www.eetimes.com/document.asp?doc_id=1326922
Billy Tallis - Sunday, August 16, 2015 - link
What I had in mind with the "vastly insufficient capacity" comment was stuff like Everspin's ST-MRAM, which checks all the boxes except capacity, where they're just now moving up to 64Mb from the 16Mb chips that cost $30+.
None of the existing interfaces will be particularly well suited to these new memory technologies, but what HGST is doing is probably pretty important. People will want to reap as much benefit as they can from these new memories without requiring a whole new kind of PHY to be added to their CPU die.
SunLord - Monday, August 17, 2015 - link
SunLord - Monday, August 17, 2015 - link
So we could have three non-volatile stages of storage now? Going from volatile memory (i.e., DRAM) to a small, super-fast PCM cache that acts as a buffer for fast, medium-to-large active NAND-based storage arrays, followed by a massive mechanical array for archival/cold-storage data.
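A hedged sketch of that hierarchy as a simple read path (tier names and latency figures are illustrative assumptions, loosely anchored to the numbers discussed earlier in the thread):
```python
# Illustrative storage hierarchy matching the comment above; latencies are
# rough assumed orders of magnitude, not measurements from the article.
TIERS = [
    ("DRAM (volatile)",   100e-9),   # working set / controller metadata
    ("PCM/XPoint cache",  1e-6),     # small, fast, non-volatile buffer
    ("NAND SSD array",    100e-6),   # active bulk storage
    ("HDD archive array", 10e-3),    # cold / archival data
]

def read(block_id, located_in):
    """Walk down the tiers until the block's tier is reached, summing miss costs."""
    latency = 0.0
    for name, access_time in TIERS:
        latency += access_time        # each miss costs one tier's access time
        if name == located_in:
            return latency
    raise KeyError(located_in)

# A block that has aged all the way down to the archive tier pays every miss on the way:
print(read("backup.tar", located_in="HDD archive array"))   # ~0.0101 seconds
```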