Do you have any information on where cell sizes stand these days? Samsung, for example, when they switched to layered NAND, reverted to a more durable and reliable ~40nm cell size from their previous 1Xnm-class planar stuff. And I imagine other manufacturers did something similar. Does this still hold true, with density gains coming only from stacking more layers, or have cell sizes begun to shrink again?
I don't have any good numbers on that handy. Cell horizontal dimensions have probably shrunk a little bit, and the spacing between vertical channels has definitely been reduced to the point that interference between strings is one of the more significant sources of error. I'm not sure what games they've been playing with layer thickness. There probably won't be drastic shrinks to horizontal dimensions because they don't want to have to go back to using multiple patterning like they did for 15nm planar NAND.
Encouraging news. It's still very impressive how fast SSD tech is advancing. High-speed, low-power 4-channel controllers are good news for portables and low prices, while the high-speed x8 controllers look set to saturate PCIe 5.0 before it's even in widespread use.
SSDs are still notably weak on random I/O, though. Optane tried to address this, but it looks almost dead in the consumer market.
Billy, any thoughts on how SSDs will move forward with random I/O, or do you think it's not worth addressing? Personally, I'd happily give up 25% of max sequential for a doubling of random I/O, but I appreciate that may not suit most people.
Random read throughput at high queue depths is growing healthily.
TLC read latency is still creeping downward, despite the natural tendency for tR to grow as NAND strings get longer. They're putting in a lot of effort to optimize latency already.
The only ways to drastically improve tR for NAND flash memory are to use smaller page sizes than the current standard of 16kB, or to store fewer bits per cell. Both techniques are very detrimental to $/GB, but Samsung and Kioxia have both dabbled in small page size 3D SLC as an Optane competitor. What keeps them from experiencing much success in the market is that improving tR has diminishing returns for overall system/application performance. There aren't a lot of applications that run significantly better with 500k IOPS @ 10µs random reads than with 500k IOPS @ 60µs random reads (on a larger and cheaper drive).
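To put rough numbers on that tradeoff, here's a minimal sketch (Python, using only the figures quoted above) of Little's law: sustained IOPS ≈ outstanding requests ÷ per-request latency. It shows why the latency difference only really matters at low queue depths.

```python
def queue_depth_needed(target_iops: float, latency_s: float) -> float:
    """Outstanding requests required to sustain target_iops at a given latency."""
    return target_iops * latency_s

def iops_at(queue_depth: int, latency_s: float) -> float:
    """Sustained IOPS with a fixed number of outstanding requests."""
    return queue_depth / latency_s

# 500k IOPS is reachable either way, just at different queue depths:
print(queue_depth_needed(500_000, 60e-6))  # 30 outstanding requests at 60 µs
print(queue_depth_needed(500_000, 10e-6))  # 5 outstanding requests at 10 µs

# At queue depth 1 (typical of consumer workloads), latency is the whole story:
print(iops_at(1, 60e-6))  # ~16.7k IOPS
print(iops_at(1, 10e-6))  # 100k IOPS
```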
Thank you for the reply. Yup, high queue depth random I/O looks good. I was thinking more of low queue depths, but thank you for explaining the issues.
One way forward could be to designate part of the drive as specially tuned for small files / small page sizes. A hypothetical 50GB-ish area could hold a hell of a lot of small files, wouldn't be missed on a 1TB+ drive, and would only cost a few dollars more. The advantages would be a faster return to sleep, lower power consumption, and yes, marketing numbers.
This might look like the cache debate all over again, but this area would be specifically designated for small, frequently used files, so not a cache, and not quite the tier / SLC holding area already implemented in many drives. The SSD mapping would just point small files towards the special area (and possibly move the least recently used small files back to main storage if more space were required).
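If it helps to picture the idea, here's a toy sketch (Python) of the placement policy being described. The 512KB threshold and 50GB budget come from the comments here; everything else (names, structure) is hypothetical and nothing like a real SSD's FTL.

```python
from collections import OrderedDict

SMALL_FILE_LIMIT = 512 * 1024      # "small" threshold suggested above
FAST_AREA_BUDGET = 50 * 1024**3    # hypothetical 50 GB low-latency region

class SmallFilePlacement:
    def __init__(self):
        self.fast = OrderedDict()  # name -> size, ordered from cold to hot
        self.fast_used = 0

    def place(self, name: str, size: int) -> str:
        """Decide where a newly written file lands: 'fast' or 'main'."""
        if size > SMALL_FILE_LIMIT:
            return "main"
        # Demote least recently used small files until the new one fits.
        while self.fast and self.fast_used + size > FAST_AREA_BUDGET:
            _, victim_size = self.fast.popitem(last=False)
            self.fast_used -= victim_size  # victim migrates to main storage
        self.fast[name] = size
        self.fast_used += size
        return "fast"

    def touch(self, name: str) -> None:
        """Record an access so frequently used files stay in the fast area."""
        if name in self.fast:
            self.fast.move_to_end(name)

policy = SmallFilePlacement()
print(policy.place("notifications.db", 64 * 1024))  # fast
print(policy.place("movie.mkv", 8 * 1024**3))       # main
```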
It does seem like you're describing the pseudo-SLC caching most SSDs do, but if I'm interpreting correctly, you're differentiating it as permanent storage for small, frequently-used files versus just a cache of all frequently-used files. So basically you're talking about tiered storage, where you move the most-accessed files into the highest-speed storage and the rest into a lower storage tier. This is a feature of a lot of SANs, but for local SSDs, you'd need OS support + hardware support to make that happen.
Almost right, thanks. Lots of consumer SSDs do already have tiered storage, where part of the SSD is treated as SLC and the rest as TLC / QLC / PLC (details differ between models).
However, as Billy Tallis said, for large files there is not much scope for gain from improving latency. Suppose a 100MB file takes 0.1 seconds to transfer. Whether the latency is 60 µs or 20 µs makes little difference.
For a 4KB or 128KB file, however, latency makes up a much bigger part of the transfer time. So a special storage area with ultra-low latency for frequently used files under, say, 512KB could really help with low-queue (consumer use) random small-file I/O. To give you an idea of the potential: SSDs can shift large files at around 4GB/sec, but for single-queue small files, that falls to around 50MB/sec.
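For a rough sense of where that ~50MB/sec figure comes from, here's a back-of-the-envelope model (Python). The 4GB/s bandwidth is the figure above; the ~80 µs per-request latency is my own illustrative assumption, not a measurement of any particular drive.

```python
def effective_mb_per_s(size_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Throughput when each request pays full latency before its data transfers."""
    transfer_time = size_bytes / bandwidth_bps
    return size_bytes / (latency_s + transfer_time) / 1e6

BW = 4e9      # ~4 GB/s sequential, from the comment above
LAT = 80e-6   # assumed ~80 µs end-to-end latency per request

print(effective_mb_per_s(4 * 1024, LAT, BW))        # ~51 MB/s for a 4 KB read
print(effective_mb_per_s(128 * 1024, LAT, BW))      # ~1160 MB/s for 128 KB
print(effective_mb_per_s(100 * 1024**2, LAT, BW))   # ~3990 MB/s: latency barely matters
```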
Small files are the kind of thing that portables wake up for (email, notifications, social media, network activity, housekeeping, etc.), so improving this area would get them back to sleep faster. It would also support other areas like AI-related data scanning and database building.
Yes. As SSDs diverge in capabilities, storage tiering (like the FuzeDrive did) will become a necessity. Placing the hottest files in smaller pSLC blocks could really help with QLC reliability and average latency, to say nothing of PLC.
But in applications where peak performance isn't vital, like consumer drives, the tendency is towards driver/hardware/firmware simplification (to reduce costs). Someone will end up using QLC SSDs with ZNS to murder the budget TLC market.
And while ZNS already includes ways of managing distinct random read/write regions, I don't know how tiering could be done without specifically designing for it, and if they didn't already build that into ZNS it will be hard to change.
Enmotus sells a Phison E12 QLC drive with custom firmware that presents an SLC portion and a QLC portion to the host system, rather than using drive-managed SLC caching. Their 2TB model gives you ~1.6TB usable space, the first 128GB of which are SLC. Their FuzeDrive software manages data placement, but also lets you manually pin data to either the fast or slow tier. It's a really interesting approach.
I've had the hardware for a while, but haven't gotten around to properly testing it. The tiering software for the FuzeDrive SSD is Windows-only, which is a nuisance since a lot of my test suite is Linux based. But I did make some changes to the new Linux test suite with an eye toward being able to test the SLC and QLC portions of that drive separately.
I look forward to this. I keep waiting for something significant to happen that will reduce game load times and increase overall performance in my use case. I still run 3x1TB 850 EVOs in RAID 0 on my desktop because I cannot feel any appreciable difference in performance over a great PCIe 3.0 SSD like the SK Hynix Gold P31 I put in my XPS 17 9700. Maybe this new method of caching will help?
Rather than small files, I'd use low-density pages for filesystem journal, metadata, and i-nodes. That would really speed up your random access for all files, and should scale a lot better, since filesystems vary a lot in terms of their filesize distribution.
Even though I have too little knowledge about 3D NAND to understand all of this article, I found it interesting and learned something.
One question: all these presentations at the academic ISSCC that you are reporting on come out of industry, not academia, and I guess out of development rather than industrial research (although there is not always a clear divide). What is the companies' motivation for presenting these things?
There are many labs which could buy the products and tear them apart to learn much of what's being published here. The critical know-how is "how to make it". So they publish some non-critical information for publicity, and probably also for discussion.
Every time I look at QLC specs, I think "why didn't they just stop with TLC as the baseline?" Then all storage could be expected to hit at least those minimum specs. Sort of like the XSX/PS5 SSDs - the SSDs have to meet minimum specs, which then lets developers program to having that level of IO always available. Instead, we have QLC and soon PLC just resetting the bar lower and lower until we're almost back at HDD levels (like some of those Intel 600-series).
I'm really curious what companies see as QLC's consumer future in the next five years.
The consumer numbers just don't add up in any reasonable way and haven't for years now. Cheap TLC SSDs are still reaching great prices at 1 TB ($110) and 2 TB($220): https://pcpartpicker.com/products/internal-hard-dr...
QLC is $100 and $200, a measly 10% discount respectively. How many more years can QLC sustain such poor market positioning before manufacturers call it quits and move to commercial / enterprise drives only? Two years? Four years?
I guess it'll be like SMR: subterfuge, lies, and OEM buy-in. IIRC, all of HP's Spectre laptops still ship with a 1% Optane & 99% QLC drive (the Intel H10) by default.
QLC and PLC are just getting started. At large capacities (e.g. 4TB), QLC performs surprisingly well, with a lot of room to fold part of the drive into an SLC hot cache / tier. At a (very) wild guess, PLC will also start performing well at 8TB+ capacities. See Wereweeb's comment below on future SSD capacities.
New tech often starts out with worse performance than the optimised old tech in the first few years of its life.
I guess 2020 was the last stage of the S-curve in terms of cost reduction. There is still some margin to be squeezed from the current cost, but in terms of NAND I don't see a clear path to a mainstream 2TB SSD at sub-$100 in 2023/24.
What? All these companies plan on making NAND with 500+ layers in the next few years, i.e. four times as much storage capacity in roughly the same area as current 128-layer designs. If all goes well, in 2026 we'll have 32TB M.2 SSDs.
Intel 3D NAND is Floating Gate fabricated with an oxide/poly stack. Oxide/poly is much harder to etch than Charge Trap oxide/nitride stacks and they are pretty much stuck at 48 layers per string. The 3 x 48 = 144 has been known for a while now. What is surprising to me is SK Hynix says they will extend this technology for 2 to 3 more generations which would imply 4 x 48 = 192L, 5 x 48 = 240L and maybe even 6 x 48 = 288L.
I don't suppose anyone at ISSCC was talking about post-NAND storage, were they? There was a brief burst of news a couple of years back when Optane hit and other technologies entered development, but now there's nothing.
The truth is, there's just not much to talk about with the new NVMs. Everything has been demonstrated in labs over and over again. A few of them are shipping in small quantities for niche applications. But they can't compete with DRAM or NAND, so almost no one is willing to sink billions into trying to mass-produce one of them.
Only Intel and Micron had the balls to, but they unfortunately bet on the wrong horse. XPoint has revealed itself to be VERY inferior to DRAM, and it simply can't scale like NAND does. Intel's Optane business is STILL bleeding money thanks to that, and I don't know if it will ever pay for itself.
MRAM is actually shipping in small quantities, and it's competitive with DRAM in performance, plus it can be tuned for a variety of different purposes (including MLC), so it's generally promising. After a decade of big talk and nothing to show for it (except for rumours of military applications), even Nantero's NRAM seems to be an actual thing now, with Fujitsu licensing the technology.
The problem is that none of these can compete with the price per bit of DRAM (or NAND, for XPoint), so there's no incentive to refine them. For the big companies, time and money spent on these technologies means losing market share in the NAND and DRAM mass market.
We just have to wait for DRAM to hit a wall, or for the MRAM startups to mature their technologies enough for the big silicon companies to start acquiring them, jumpstarting the transition. Considering MRAM is still at 1Gb dies, vs DRAM's 32Gb (with plans for 64Gb)... it might take a while.
I just hope someone will pick up CeRAM and give it a good go; it looks very promising.
It’s slightly further up the memory hierarchy, but Apple is doing very interesting things with their CPU and memory. I would definitely keep an eye on that as they’ve only just started exploring the potential there. I have absolutely no idea what their desktop chips / memory subsystems will look like in a few years down the line (unlike Intel / AMD).
You won’t need to wait that long - Apple will be releasing V2 of their M1 systems later this year which should give us an indication of their direction of travel.
I assumed Wereweeb meant scaling as in shrinking the cell size to improve bit density, though I guess that's effectively the same thing as "scaling up" to bigger capacities - just viewed from the opposite direction!
I suspect both are true, too. If they can't make high-capacity dies at relatively small sizes then the product isn't going to sell well, which in turn precludes increasing the quantity of dies in mass-production.
the thing is, scaling to a 'flat' 64 bit address space in NVM, not just virtual memory by whatever name, can only be worth the effort when code (O/S and applications) and hardware throws out all those tiers/caches/buffers/etc. but the industry bought into 'memory hierarchy' too long ago. deciding on a transaction protocol when, theoretically, you only need to write once out in NVM will be no small feat. just consider how much of a current cpu and O/S are devoted to managing all those memory classes.
I would think Intel, someplace in its skunkworks, has a skeletal cpu (FPGA based one expects) and appropriate O/S, also anorexic, to demonstrate this. if done smartly any application that doesn't do its own I/O (aka, not industrial strength RDBMS) won't need anything more than a C compiler and a recompile.
other than google, how many applications need more than 16 exabytes anyway?
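For what it's worth, you can already get a taste of that flat, load/store programming model by memory-mapping storage. Here's a tiny sketch (Python) using an ordinary file as a stand-in for an NVM region; the filename is made up, and on real persistent memory the flush would be a cache-line writeback plus fence rather than an msync.

```python
import mmap
import os

# Map a file (stand-in for a byte-addressable NVM region) into the address space.
fd = os.open("nvm_region.bin", os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, 4096)
region = mmap.mmap(fd, 4096)

region[0:5] = b"hello"   # an ordinary store; no read()/write() I/O path involved
region.flush()           # make it durable (msync here; flush + fence on real NVM)

region.close()
os.close(fd)
```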
You're a bit behind. There's a lot less code dealing with tiers/ caches / buffers than you think. Most applications don't deal with tiers/ caches/ buffers etc. They mostly just create data structures and write to / read from them.
The OS deals with storing these data structures, and even so, that mostly means storing in RAM, and a bit of transferring to / from cold storage (mostly SSD nowadays) as needed.
Inside an SSD, the SSD firmware deals with the minutiae, not the OS. The OS hasn't the faintest idea where anything physically sits on the SSD, and doesn't need to; it's all virtualised storage. The SSD is a black box that presents a virtual front end to the OS.
As for the CPU, the CPU itself deals with the caches, not the OS. All modern CPUs are basically also black boxes with a virtualised front end (usually CISC) that the OS deals with. Instructions sent to the CPU front end are translated internally into micro-ops that differ between various steppings and models of CPUs. That's how you can swap between a wide variety of models and types of CPU without the OS falling over.
You may be thinking that's a lot of black boxes. It makes for flexible software and hardware, where different parts can be swapped for improved versions without the rest of the system needing to know that anything has changed. An even more modern trend is for browser apps, which don't even need to know if they are running on Windows or macOS or Linux or anything else - the entire system is a black box under the browser app.
The power of abstraction. I sometimes wonder whether our universe isn't running under virtualisation on the Creator's computer, but this remark will be controversial, so let me leave it at that.
There are a few proper scientific papers that make the rounds, every now and again, that attempt to posit how we can determine if we're indeed in some sort of Matrix-like simulation, or if the universe is a simulation running inside some sort of hyper-dimensional computer.
Thanks. Can't wait to read that, and will return with my thoughts. But I will say this for now: I have often felt there's something very computer-like to it all. Quantum mechanics' haziness, for one, reminds me of lossy compression. Perhaps that was more economical than storing everything to the uttermost precision. Anyhow, I hope the universe is backed up and there are no power failures, otherwise we're cooked.
> Quantum mechanics' haziness, for one, reminds me of lossy compression.
Beware of taking analogies too far. Quantum mechanics defies intuition. You just need to learn its rules and leave behind your preconceptions and macro thinking. I love how quantum computers are part extreme engineering, part cutting-edge science experiment.
> otherwise we're cooked.
No, if you somehow manage to live long enough, you'll freeze. The universe will ultimately undergo heat death. Which reminds me of another fascinating idea I ran across:
"That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi's paradox"
TL;DR: the aliens are sleeping until the universe cools enough that they can OC their superintelligent CPUs. OC by like 10^30 times. I'd hit the snooze button for that.
> I hope the universe is backed up and there are no power failures, otherwise we're cooked.
Or we could get taken out by an asteroid or a gamma-ray burst. But, a more terrestrial threat (of non-human origin) exists: super-volcanos. They're more common than big asteroid impacts and can have similar climatic effects.
Apologies in advance: my layman physics is quite rusty. And yes, we should be wary of analogies, especially those concerning QM, but it's fun thinking these things, more so when they're whimsical.
I reckon the oddness of QM is due to its having stumbled upon the low-level implementation of the universe, whereas classical physics was a bit like an interface or API. The two don't quite square. Take entanglement: it doesn't seem to make sense intuitively, but what if it were a hint (a side effect, as a programmer would say) of the underlying structure? Perhaps at that level, things are stored in a non-local fashion, or as pure data, but going through a transformation layer (our space-time), we get the illusion of distance. That would explain the instantaneous collapse into opposite spins, though the particles are separated by an arbitrary distance. I also fancy that superposition before collapse is simply because particles don't have that "property" till it *has* to be calculated: perhaps a cost-saving measure by the Designer, along with some useful side effects. (Cf. a C++ class, where not all values are stored; others are calculated as they're requested.)
I'll respond to the other points you noted as I think about them.
That's a pretty big leap. Entanglement is the rare exception, not the rule.
If we're lucky, we'll live to see the day that physics is completely solved. Then, we can start to meaningfully ask "why?" and "what does it mean?" Until then, I just see it as a marvel of human understanding that's beautiful in its weirdness and otherness.
Concerning quantum computers, I need to get a better grip on the topic. I remember when first reading about them in 2016, I felt a bit sceptical, especially concerning that DWAVE machine or whatever it was called. But, need to bring myself up to date on the topic.
"if you somehow manage to live long enough, you'll freeze. The universe will ultimately undergo heat death"
I think one can spend a lifetime meditating on this topic: the end of the universe, heat death, and entropy. I used to think about it a lot but haven't in a while, so it's hazy. (Still have to work out whether Nolan's reverse entropy was nonsense or plausible.) Anyway, there's some elusive link between time, entropy, heat, and the cooling at the end. Tegmark had some nice remarks there, as did Penrose and Rovelli. As for fiction, Asimov's "The Last Question" is well worth reading if you haven't already. It's about a computer that ponders the question: how can the heat death be avoided, and entropy reversed? The ending is golden. I apologise for the poor link but couldn't find a better one:
baencd.freedoors.org/Books/The%20World%20Turned%20Upside%20Down/0743498747__19.htm
"Until then, I just see it as a marvel of human understanding that's beautiful in its weirdness and otherness."
It's mind-boggling how the human mind found out all these things. Even our devices today spring from these discoveries. I doubt whether physics will be solved in our lifetime. Yet, I just wish I could understand "why," and what's out there, beyond our universe. Some will say a futile question but I think about it often.
> Inside a SSD, the SSD firmware deals with the minutiae, not the OS.
There's a move to expose the details of SSDs for the host to manage, rather than the SSD controller having to guess what the host OS/application wants. It's mostly for enterprise applications, though. I forget what name it's under.
> As for the CPU, the CPU firmware deals with the caches, not the OS.
That's not true. Even leaving aside security mitigations, software (i.e. OS/drivers) always had to flush or invalidate caches for memory regions being read/written by devices.
> You may be thinking that's a lot of black boxes. It makes for flexible software and hardware, where different parts can be swapped for improved versions without the rest of the system needing to know that anything has changed.
This is a convenient approach, until either the cost of those abstractions adds up, or they prevent software from doing clever optimizations that can't be anticipated by the caches. For instance, GPUs have at least some of the on-chip memories managed by the software, because it has a better idea of what data it wants fast access to, and for how long. Also, caching has overheads, even when it does exactly what you want.
The trick is to have an API that hides enough details that you get portability between different hardware (and that includes running *well* on them, sometimes referred to as "performance portability"), while still being as easy as possible to use (correctly) and low-overhead enough that it doesn't hamper performance and lead to developers seeking other options.
That's no small order, and it goes some way towards explaining why APIs need to change with the technology (on both the host and device side). Workloads also evolve with the technology, and that exposes new bottlenecks or limits on the scalability of earlier APIs.
I agree, that's the ideal. Many APIs and frameworks were taken a bit too far, where ease, tidiness, and security came before everything else. Over time, or from the word go, their performance was lacking. Though I haven't used it, Microsoft's .NET comes to mind; and Qt, while being fun, tends to encourage one to use features in an inefficient way.
Certainly no expert but I'd say the Win32 API is an example of having the right balance. It forces programmers to work in a roundabout but efficient way, compared to newer frameworks and languages, where the shorter styles tend to come at a cost. Sure, it's old-fashioned and clumsy, but has a strange elegance.
Well, an interesting example was the old argument about immediate mode vs retained mode graphics APIs. Retained mode was intended to be more efficient, but resulted in a much more complicated API. Some folks (I think most notably, John Carmack) decided to compare DX with OpenGL, to see if it was worthwhile, and found that the extra CPU overhead of immediate mode was small and arguably worth the simplicity of using immediate mode.
Yes, if overhead is minimal, any gain in simplicity is worth it. I wonder if graphics programmers find Vulkan and DX12 any harder than OpenGL and DX11. As for the results, they're quite striking: Vulkan, at least, picks up frame rate quite a bit.
Yes! They are both *much* more cumbersome, unless you were really pushing up against (and having to work around) the limitations of their predecessors. So, DX12 and Vulkan both make sense for game *engines*, but not the average 3D app that was just using the APIs, directly.
My experience is really with OpenGL, and it does a heck of a lot of housekeeping, scheduling, and load-balancing for you that Vulkan doesn't. But, I've heard that developers haven't fully embraced DX12, either.
I don't think many have embraced DX12 but Vulkan has certainly been delivering in engines. I saw a video where Doom, on a 2200G, goes from hopeless to fully playable; and if I'm not mistaken, id removed the OpenGL path from Eternal. Also, quite a boost on the Cemu emulator. As for me, I never did any graphics programming really, except for dabbling very slightly in D3D9 in 2008.
Don't get me wrong, there are benefits to be had by getting more direct access to the hardware, for those already willing and able to take on the additional complexity. In fact, if you were already having to work around the limitations of OpenGL or DX11, it could conceivably be *less* work to use the new APIs.
That's a nice saying. I think that's why, even in life, we rely on layers of abstraction. I decide to eat oatmeal today, not [quark structure of oatmeal]. Watching my favourite 1940s film, or is it a sequence of bytes, representing an H.264 bitstream, decoded on silicon, passed electrically through circuits, and lastly, lighting some liquid crystals. It goes on and on. Life would be impossible.
Very interesting read. My thanks to @billytallis. I would like to request a technology primer on ZNS; I can't understand the existing explanations.
Great article. Small feedback: thanks for writing the proper SI/IEC units in the table. However, in the text the space between the number and the unit of measurement is missing in several places, for instance "1.6 to 2.0 Gb/s" and "1.2Gb/s IO speeds", etc. A lot of people are influenced by AnandTech, so if you one day start to write MHZ instead of MHz, that is going to teach a lot of people to write it the improper way. The same goes for writing units without the proper space between the number and the unit.
Considering the 512 Gbit die capacity is known, as is the 8.5 Gbit/mm2 density, why don't you write ~60mm2 as the die size?
Also, checking the densities of TLC vs QLC, it seems QLC offers negligible density improvements. I wonder why they even bother developing QLC; MLC and TLC both quickly offered density improvements over the prior tech, and in turn lowered $ per GB, while QLC doesn't really. Is that, say, 6-12 month gap in density worth enough to bother with, does QLC serve as a nice learning experience for TLC, or is QLC simply seen as having much more potential to grow, so these companies suffer through the early days?
> Since 8 channels running at 1.2Gb/s is already enough for a SSD to saturate a PCIe 4.0 x4 connection
I don't follow, unless you mean 1.2 GB/sec. Because the way I figure it, 8 * 1.2 Gbps is 1.2 GB/sec and x4 PCIe 4.0 is about 8 GB/sec. However, 8 * 1.2 GB/sec = 9.6 GB/sec, so I guess that answers my question?
NAND IO speed is given in bandwidth per pin, similar to GDDR and LPDDR. In this case, we're talking about 8-bit wide interfaces. See http://www.onfi.org/specifications
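To spell out the arithmetic, here's a quick sanity check (Python), assuming the 8-bit-wide NAND channel Billy describes and PCIe 4.0's 128b/130b encoding:

```python
CHANNELS = 8
PER_PIN_GBPS = 1.2     # Gb/s per data pin
BUS_WIDTH_BITS = 8     # each NAND channel is 8 bits wide

channel_gbytes = PER_PIN_GBPS * BUS_WIDTH_BITS / 8   # 1.2 GB/s per channel
nand_total = channel_gbytes * CHANNELS                # 9.6 GB/s of raw NAND bandwidth

pcie4_x4 = 4 * 16 * (128 / 130) / 8                   # ~7.9 GB/s usable on a x4 link

print(nand_total, ">", pcie4_x4)  # the NAND side comfortably outruns the host link
```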
Thanks for clarifying. Unless I missed it in the article, maybe you could mention that in the future to help inform people like me who don't follow NAND developments closely?
Aside from that minor point, thank you very much for demystifying the current state and ongoing developments in the NAND storage industry.
Billy, is there any talk of moving beyond the PCIe interface? It's a major constraint on performance at this point, even PCIe 4.0. Optane could never deliver on its unique capabilities because of the constraints of PCIe. I'm not sure why Intel bothered if they weren't going to support better interfaces to really let it breathe.
CAPI and OpenCAPI are intriguing because they have much lower latency than PCIe. It's something like 20,000 CPU cycles for every PCIe transaction vs. 500 cycles for OpenCAPI: https://en.wikipedia.org/wiki/Coherent_Accelerator...
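Taking those cycle counts at face value (I haven't verified them) and assuming a 3 GHz core clock, the gap in wall-clock terms looks like this:

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core clock

def cycles_to_us(cycles: int) -> float:
    return cycles / CLOCK_HZ * 1e6

print(cycles_to_us(20_000))  # ~6.7 µs per PCIe transaction (figure quoted above)
print(cycles_to_us(500))     # ~0.17 µs for OpenCAPI (figure quoted above)
```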
The I/O would be unleashed if they moved to OpenCAPI or a similar deal like CXL or whatever that one is that Intel is promoting. I assume all this stuff is aimed at enterprises and data centers first, but there should be more coverage and push for it. The end of Moore's Law means we need to look at other aspects of the computing experience, like I/O and storage.
Microsoft's DirectStorage is also intriguing for getting SSDs to send directly to the GPU instead of having the CPU orchestrate everything. And for fast decompression by the GPU. All that will be faster if they move PCs to OpenCAPI.
As I understand it, there's a standard for persistent memory DIMMs and then there's OpenCAPI.
However, modern CPUs can only address a limited amount of physical memory (the physical address width is well short of the full 64 bits), so I'm not sure if that would impose a practical scaling limit for memory-addressable SSDs.
The memory bus also isn't great for higher-latency SSDs, since it stalls out a CPU thread, while using PCIe enables you to post transactions or have the device initiate transfers into host memory.
I am stuck back in the days when TLC's reliability at holding those 3 bits per cell was specified and compared against MLC's superior 2-bit long-term retention. However, I have here a bricked 1 GB Mushkin Reactor, apparently because I didn't need to access it for a year or so... A few years ago, tired of damaged HDDs, I truly thought MLC was a good enough technical answer to my usual mobile storage needs for Virtual Instruments (VIs). In the audio world, many of us would love to have most instruments, musical and otherwise, sampled and available at our hands, and that sometimes means a sampled piano weighing 200 or 300 GB alone. And yes, when we play it, we want no-latency performance, despite the myriad of effects and middleware DSP processing...
///So how, or when, are we going to be offered SSD tech that's accessible to buy and stack nearby (separated by sound types)? A library that should last, say, 10 years or more, that could rest idle for years and also be ready for fully busy 12-hour performances for days or weeks... (a single 4 or 8 TB unit would be too risky in case of loss; better to move around no more than 1 or 2 TB at a time).
Right now I would need two or three of these SSDs already, but all I get offered is TLC or, worse, QLC drives, along with a full in-depth article (this one above) in which I could find no figures or cited specifications about endurance or cell leakage over time.
Am I stuck in the past? Lost? Or do manufacturers simply not consider SSDs for medium/long-term storage, and so those professionals whose work centres on mobile workstations and who need to swap large amounts of data just have to temper their aspirations for the speed, low latency, drop resistance, noiselessness and low weight SSDs already deliver? I feel like I'm missing something.
> Or do manufacturers simply not consider SSDs for medium/long-term storage
Exactly. They are in a race for the consumer market of people who use their device daily or weekly. Intel used to be the only one I saw publishing data retention stats for their SSDs, but I cannot even find detailed specifications for their newer products.
So, this puts you in the expensive data center market, or else you have to look at other storage options.
I would advise keeping your library on optical storage that is rated for stability over the course of decades, and then you just have to copy the instruments you need, before you want to use them.
Thank you for your answer, mode_13h. I was afraid of facing that alternative. So we are not there yet... medium/long-term SSD storage is not yet available.
And also yes: optical storage, specifically M-Disc technology, is the only current long-term solution available. It is rated to last hundreds of years and is available for about US$2 per 25 GB double-sided Blu-ray. Thankfully such a solution exists for archiving whole musical projects.
However, once your project or data source surpasses the 100 GB mark, or when you need to plug in and get performance from that data (e.g. with VIs), particularly in music studios or any of the myriad of mobile duties, then neither optical discs nor risky hard drives are good solutions.
///I keep witnessing horror stories of important data losses related to hard drive failures, so I cannot understand the market numbers that preclude developing better MLC or whatever long-lasting SSD tech is required for such an untapped storage need.
> Medium/long-term SSD storage is not yet available.
No, it's going the other way. Smaller cells mean shorter retention. I recently read pictures off a Compact Flash card from 10 years ago. These days, even a premium SDXC card probably won't hold its contents for more than a year without use.
I don't know how well Optane (3D XPoint) does at longevity. It's not exactly cheap, though.
> I keep witnessing horror stories of important data losses related to hard drive failures
I use RAID-6 and make sure it regularly gets "scrubbed" (i.e. consistency-checked). Of course, I could lose it in a fire, but my high-value data is backed up in the cloud.
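For anyone curious how that scrubbing gets kicked off: on Linux software RAID (md), you write "check" to the array's sync_action node, which is what the usual cron/systemd scrub jobs do. A minimal sketch in Python, assuming an md array named md0 and root privileges (adjust the device name for your system):

```python
from pathlib import Path

ARRAY = "md0"  # adjust to your md device
md = Path(f"/sys/block/{ARRAY}/md")

# Start a consistency check of the whole array.
(md / "sync_action").write_text("check\n")

# Once the check finishes, mismatch_cnt should read 0 on a healthy array.
print((md / "mismatch_cnt").read_text().strip())
```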
> I cannot understand the market numbers that preclude developing better MLC or whatever long-lasting SSD tech is required for such an untapped storage need.
It won't be any form of NAND flash. Besides Optane/3D XPoint, the other promising technology seems to be MRAM.
Sorry, I meant to write NRAM (see Nantero). I've also seen some suggestions that FRAM could be another option, but I'm not sure if its density is competitive.
M-Disc went bankrupt a couple of years ago, and their prices increased afterward (they continued under new ownership).
I don't know their current status, but it doesn't seem like the new owners are doing a good job with production and the website. I think drives that write M-Discs will become harder to find, and the discs will be more and more expensive, or even disappear.
PVG - Friday, February 19, 2021 - link
Do you have any information on where do cell sizes stand, these days?Samsung, as an example, when they switched to layered NAND reverted to a more durable and reliable ~40nm cell size, from their previous 1Xnm class planar stuff. And I imagine other manufacturers did something similar.
Did this still old true and they have just been resorting to stacking more layers to increase density, or have the cell sizes begun to shrink again?
Billy Tallis - Friday, February 19, 2021 - link
I don't have any good numbers on that handy. Cell horizontal dimensions have probably shrunk a little bit, and the spacing between vertical channels has definitely been reduced to the point that interference between strings is one of the more significant sources of error. I'm not sure what games they've been playing with layer thickness. There probably won't be drastic shrinks to horizontal dimensions because they don't want to have to go back to using multiple patterning like they did for 15nm planar NAND.PVG - Friday, February 19, 2021 - link
Makes sense. Thank you!Tomatotech - Friday, February 19, 2021 - link
Encouraging news. It's still very impressive how fast SSD tech is advancing. High-speed low power 4-channel controllers are good news for portables and low prices, while the high-speed x8 controllers look set to saturate PCIe 5.0 before it's even properly widely used.SSDs still notably weak on random I/O though. Optane tried to address this but it looks almost dead in the consumer market.
Billy, any thoughts on how SSDs will move forward with random i/o or do you think it's not worth addressing? Personally I'd happily give up 25% on max sequential for a doubling of random i/o but I appreciate that may not suit most people.
Billy Tallis - Friday, February 19, 2021 - link
Random read throughput at high queue depths is growing healthily.TLC read latency is still creeping downward, despite the natural tendency for tR to grow as NAND strings get longer. They're putting in a lot of effort to optimize latency already.
The only ways to drastically improve tR for NAND flash memory are to use smaller page sizes than the current standard of 16kB, or to store fewer bits per cell. Both techniques are very detrimental to $/GB, but Samsung and Kioxia have both dabbled in small page size 3D SLC as an Optane competitor. What keeps them from experiencing much success in the market is that improving tR has diminishing returns for overall system/application performance. There aren't a lot of applications that run significantly better with 500k IOPS @ 10µs random reads than with 500k IOPS @ 60µs random reads (on a larger and cheaper drive).
Tomatotech - Friday, February 19, 2021 - link
Thank you for the reply. Yup high queue depth random i/o looks good. I was thinking more of low queue depths, but thank you for explaining the issues.One way forward could be to designate part of the drive as specially tuned for small files / small page size? A hypothetical 50GB-ish area could contain a hell of a lot of small files and wouldn't be missed on a 1TB+ drive, and would only cost a few dollars more. The advantages would be faster return to sleep, lower power consumption, and yes, marketing numbers.
This might look like the cache debate all over again, but this would be specifically designated for small frequently used files, so not a cache, and not quite a tier / SLC holding area as already implemented in many drives. The SSD mapping would just point towards the special area for small files (and possibly move the least recently used small files to main storage if more space was required)
romrunning - Friday, February 19, 2021 - link
It does seem like you're describing pseudo-SLC caching most SSDs do, but if I'm interpreting correctly, you're differentiating it as a permanent storage for small, frequently-used files versus just a cache of all frequently-used files. So basically you're talking about tiered storage, where you move the most-accessed files into the highest-speed storage storage and the rest in a lower storage category. This is a feature of a lot of SANs, but for local SSDs, you'd need OS support + hardware support to make that happen.Tomatotech - Sunday, February 21, 2021 - link
Almost right thanks. Lots of consumer SSDs do already have tiered storage, where part of the SSD is treated as SLC and the rest is TLC / QLC / PLC. (Details differ between models).However as Billy Tallis said, for large files there is not much scope for gain from improving latency. Suppose a 100MB file takes 0.1 second to transfer. Whether latency takes 60 ms or 20ms makes little difference.
However for a 4KB or 128KB file, latency makes up a much bigger part of the transfer time. So having a special storage area with ultra low latency for frequently used files under say 512KB could really help with low queue (consumer use) random small file I/O. To give you an idea of the potential, SSDs can shift large files at around 4GB/sec, but for single queue small files, that falls to around 50MB/ sec.
Small files are the kind of thing that portables wake up for - email, notifications, social media, network activity, housekeeping etc, so improving this area would get them back to sleep again faster. As well as supporting other areas like AI-related data scanning and database building.
Wereweeb - Friday, February 19, 2021 - link
Yes. As SSD's diverge in capabilities, storage tiering (Like the FuzeDrive did) will become a necessity. Placing the hottest files in smaller pSLC blocks could really help with QLC reliability and average latency, not to speak of PLC.But in applications where peak performance isn't vital, like consumer drives, the tendency is towards driver/hardware/firmware simplification (To reduce costs). Someone will end up using QLC SSD's with ZNS to murder the budget TLC market.
And while ZNS already adopted ways for managing distinct random reads/writes regions, I don't know how tiering could be done without specifically designing for it, and if they didn't already do it with ZNS it will be hard to change that.
Billy Tallis - Saturday, February 20, 2021 - link
Enmotus sells a Phison E12 QLC drive with custom firmware that presents an SLC portion and a QLC portion to the host system, rather than using drive-managed SLC caching. Their 2TB model gives you ~1.6TB usable space, the first 128GB of which are SLC. Their FuzeDrive software manages data placement, but also lets you manually pin data to either the fast or slow tier. It's a really interesting approach.I've had the hardware for a while, but haven't gotten around to properly testing it. The tiering software for the FuzeDrive SSD is Windows-only, which is a nuisance since a lot of my test suite is Linux based. But I did make some changes to the new Linux test suite with an eye toward being able to test the SLC and QLC portions of that drive separately.
oRAirwolf - Monday, February 22, 2021 - link
I look forward to this. I keep waiting for something significant to happen that will reduce game lead times and increase overall performance in my use case. I still run 3x1TB 850 EVOs in RAID 0 on my desktop because I cannot feel any appreciable difference in performance over a great pcie 3.0 ssd like the SK Hynix Gold P31 I put in my XPS 17 9700. Maybe this new method of caching will help?mode_13h - Monday, February 22, 2021 - link
This would be great for placement of the journal, i-nodes, and filesystem metadata.Is there any reason that the host OS or SSD controller couldn't dynamically decide whether to make a page SLC, MLC, or TLC, or QLC?
mode_13h - Monday, February 22, 2021 - link
Rather than small files, I'd use low-density pages for filesystem journal, metadata, and i-nodes. That would really speed up your random access for all files, and should scale a lot better, since filesystems vary a lot in terms of their filesize distribution.AntonErtl - Friday, February 19, 2021 - link
Even though I have too little knowledge about 3D NAND to understand all of this article, I found it interesting and learned something.One question: All these presentations at the academic ISSCC that you are reporting on come out of industry, not academia, and I guess development, not industry research (although there is not always a clear divide). What is the motivation for the companies for presenting these things?
DanNeely - Friday, February 19, 2021 - link
The people designing the new tech are scientists, for whom publications are a key part of their resume. Also bragging rights.MrSpadge - Friday, February 19, 2021 - link
There are many labs which could buy the products and tear them apart to gain much of what's being published here. The critical know-how if "how to make it". So they publish some uncritical information for publicity and probably also discussion.romrunning - Friday, February 19, 2021 - link
Every time I look at QLC specs, I think "why didn't they just stop with TLC as the baseline?" Then all storage could be expected to hit at least those minimum specs. Sort of like the XSX/PS5 SSDs - the SSDs have to meet minimum specs, which then lets developers program to having that level of IO always available. Instead, we have QLC and soon PLC just resetting the bar lower and lower until we're almost back at HDD levels (like some of those Intel 600-series).ikjadoon - Friday, February 19, 2021 - link
I'm really curious what companies see as QLC's consumer future in the next five years.The consumer numbers just don't add up in any reasonable way and haven't for years now. Cheap TLC SSDs are still reaching great prices at 1 TB ($110) and 2 TB($220): https://pcpartpicker.com/products/internal-hard-dr...
QLC is $100 and $200, a measly 10% discount respectively. How many more years can QLC sustain such poor market positioning before manufacturers call it quits and move to commercial / enterprise drives only? Two years? Four years?
I guess it'll be like SMR: subterfuge, lies, and OEM buy-in. IIRC, all of HP's Spectre laptops still ship with a 1% Optane & 99% QLC drive (the Intel H10) by default.
Tomatotech - Friday, February 19, 2021 - link
QLC and PLC are just getting started. At large capacities eg 4TB QLC performs surprisingly well with a lot of room to fold into a SLC hot cache / tier. At a (very) wild guess PLC will also start performing well at 8TB+ SSDs. See Wereweeb's comment below on future SSD capacities.New tech often starts out with a worse performance than the optimised old tech in the first few years of its life.
Oxford Guy - Friday, February 19, 2021 - link
Diminished returns will magically evaporate.squngy - Sunday, February 21, 2021 - link
Most QLC drives have a big SLC cache.Developers shouldn't need to care about QLC for the most part.
ksec - Friday, February 19, 2021 - link
I guess 2020 was the last stage of the S-Curve in terms of cost reduction. There are still margin could be squeezed from the current cost. But in terms of NAND I dont see a clear path for a general 2TB SSD at sub $100 in 2023 / 24.Wereweeb - Friday, February 19, 2021 - link
What? All these companies plan on making NAND with 500+ layers in the next few years. I.e. four times as much storage capacity in roughly the same area as current 128 layer designs. If all goes well in 2026 we'll have 32TB m.2 SSD's.MrSpadge - Friday, February 19, 2021 - link
But they have to do this by string stacking, which means the cost per bit hardly goes down.scottenj - Friday, February 19, 2021 - link
Intel 3D NAND is Floating Gate fabricated with an oxide/poly stack. Oxide/poly is much harder to etch than Charge Trap oxide/nitride stacks and they are pretty much stuck at 48 layers per string. The 3 x 48 = 144 has been known for a while now. What is surprising to me is SK Hynix says they will extend this technology for 2 to 3 more generations which would imply 4 x 48 = 192L, 5 x 48 = 240L and maybe even 6 x 48 = 288L.Mr Perfect - Friday, February 19, 2021 - link
I don't suppose anyone at ISSCC was talking about post-NAND storage, where they? There was a brief burst of news a couple years back when Optane hit and other technology entered development, but now there's nothing.Wereweeb - Friday, February 19, 2021 - link
The truth is, there's just not much to talk about with the new NVM's. Everything has been demonstrated in labs over and over again. A few of them are shipping in small quantities for niche applications. But they can't compete with DRAM or NAND so almost no one is willing to sink billions to try mass-producing one of them.Only Intel and Micron had the balls to, but they unfortunately betted on the wrong horse. XPoint has revealed itself to be VERY inferior to DRAM, and it simply can't scale like NAND does. Intel's Optane business is STILL bleeding money thanks to that, and I don't know if will ever pay for itself.
MRAM is actually shipping in small quantities, and it's competitive with DRAM in performance, plus can be tuned for a variety of different purposes (including MLC), so it's generally promising. After a decade of big talk and nothing to show for it (Except for rumours of military applications), even Nantero's NRAM seems to be an actual thing now, with Fujitsu licensing the technology.
The problem is, since none of these can compete with the price per bit of DRAM (Or NAND, for XPoint), so there's no incentive to refine them. For the big companies, time and money spent on these technologies means losing market share in the NAND and DRAM mass market.
We just have to wait for DRAM to hit a wall, or for the MRAM startups to mature their technologies enough for the big silicon companies to start adquiring them, jumpstarting the start of the transition. Considering MRAM is still at 1Gb dies, vs DRAM's 32Gb (With plans for 64Gb)... it might take a while.
I just hope someone will pick up CeRAM and give it a good go, it looks very promising.
Tomatotech - Saturday, February 20, 2021 - link
It’s slightly further up the memory hierarchy, but Apple is doing very interesting things with their CPU and memory. I would definitely keep an eye on that as they’ve only just started exploring the potential there. I have absolutely no idea what their desktop chips / memory subsystems will look like in a few years down the line (unlike Intel / AMD).You won’t need to wait that long - Apple will be releasing V2 of their M1 systems later this year which should give us an indication of their direction of travel.
Dark_wizzie - Saturday, February 20, 2021 - link
'XPoint has revealed itself to be VERY inferior to DRAM, and it simply can't scale like NAND does'What does scaling mean in this context? Thanks.
Tomatotech - Saturday, February 20, 2021 - link
NAND is the type of memory SSDs use. ‘Scaling’ in this context could mean either or both of two things:- scaling up to bigger and bigger capacities e.g 4TB, 8TB, 16TB+ SSDs.
- scaling out to mass production and becoming much cheaper per TB due to economies of scale.
XPoint is having problems with achieving one or both of these things.
Spunjji - Monday, February 22, 2021 - link
I assumed Wereweeb meant scaling as in shrinking the cell size to improve bit density, though I guess that's effectively the same thing as "scaling up" to bigger capacities - just viewed from the opposite direction!I suspect both are true, too. If they can't make high-capacity dies at relatively small sizes then the product isn't going to sell well, which in turn precludes increasing the quantity of dies in mass-production.
FunBunny2 - Saturday, February 20, 2021 - link
the thing is, scaling to a 'flat' 64 bit address space in NVM, not just virtual memory by whatever name, can only be worth the effort when code (O/S and applications) and hardware throws out all those tiers/caches/buffers/etc. but the industry bought into 'memory hierarchy' too long ago. deciding on a transaction protocol when, theoretically, you only need to write once out in NVM will be no small feat. just consider how much of a current cpu and O/S are devoted to managing all those memory classes.I would think Intel, someplace in its skunkworks, has a skeletal cpu (FPGA based one expects) and appropriate O/S, also anorexic, to demonstrate this. if done smartly any application that doesn't do its own I/O (aka, not industrial strength RDBMS) won't need anything more than a C compiler and a recompile.
other than google, how many applications need more than 16 exabytes anyway?
Tomatotech - Saturday, February 20, 2021 - link
You're a bit behind. There's a lot less code dealing with tiers/ caches / buffers than you think. Most applications don't deal with tiers/ caches/ buffers etc. They mostly just create data structures and write to / read from them.The OS deals with storing these data structures, and even so, that mostly means storing in RAM, and a bit of transferring to / from cold storage (mostly SSD nowadays) as needed.
Inside a SSD, the SSD firmware deals with the minutiae, not the OS. The OS doesn't have a blind idea where anything physically is on the SSD, and doesn't need to, it's all virtualised storage. The SSD is a black box that presents a virtual front end to the OS.
As for the CPU, the CPU firmware deals with the caches, not the OS. All modern CPUs are basically also black boxes with a virtualised front end (usually CISC) that the OS deals with. Instructions sent to the CPU front end are translated internally into RISC code that differs between various steppings and models of CPUs. That's how you can swap between a wide variety of models and types of CPU without the OS falling over.
You may be thinking that's a lot of black boxes. It makes for flexible software and hardware, where different parts can be swapped for improved versions without the rest of the system needing to know that anything has changed. An even more modern trend is for browser apps, which don't even need to know if they are running on Windows or macOS or Linux or anything else - the entire system is a black box under the browser app.
GeoffreyA - Sunday, February 21, 2021 - link
The power of abstraction. I sometimes wonder whether our universe isn't running under virtualisation on the Creator's computer, but this remark will be controversial, so let me leave it at that.mode_13h - Monday, February 22, 2021 - link
There are a few proper scientific papers that make the rounds, every now and again, that attempt to posit how we can determine if we're indeed in some sort of Matrix-like simulation, or if the universe is a simulation running inside some sort of hyper-dimensional computer.Here's a link for you: https://phys.org/news/2016-11-matrix-style-simulat...
GeoffreyA - Monday, February 22, 2021 - link
Thanks. Can't wait to read that, and will return with my thoughts. But I will say this for now: I have often felt there's something very computer-like to it all. Quantum mechanics' haziness, for one, reminds me of lossy compression. Perhaps that was more economical than storing everything to the uttermost precision. Anyhow, I hope the universe is backed up and there are no power failures, otherwise we're cooked.mode_13h - Monday, February 22, 2021 - link
> Quantum mechanics' haziness, for one, reminds me of lossy compression.Beware of taking analogies too far. Quantum mechanics defies intuition. You just need to learn its rules and leave behind your preconceptions and macro thinking. I love how quantum computers are part extreme engineering, part cutting-edge science experiment:
https://phys.org/news/2021-02-lack-symmetry-qubits...
> otherwise we're cooked.
No, if you somehow manage to live long enough, you'll freeze. The universe will ultimately undergo heat death.
Which reminds me of another fascinating idea I ran across: https://arxiv.org/abs/1705.03394
"That is not dead which can eternal lie: the aestivation hypothesis for resolving Fermi's paradox"
TL;DR: the aliens are sleeping until the universe cools enough that they can OC their superintelligent CPUs. OC by like 10^30 times. I'd hit the snooze button for that.
mode_13h - Monday, February 22, 2021 - link
> I hope the universe is backed up and there are no power failures, otherwise we're cooked.Or we could get taken out by an asteroid or a gamma-ray burst. But, a more terrestrial threat (of non-human origin) exists: super-volcanos. They're more common than big asteroid impacts and can have similar climatic effects.
GeoffreyA - Tuesday, February 23, 2021 - link
Apologies in advance: my layman physics is quite rusty. And yes, we should be wary of analogies, especially those concerning QM, but it's fun thinking these things, more so when they're whimsical.I reckon the oddness of QM is due to its having stumbled upon the low-level implementation of the universe, whereas classical physics was a bit like an interface or API. The two don't quite square. Take entanglement, doesn't seem to make sense intuitively; but what if it were a hint---a side effect, as a programmer would say---of the underlying structure. Perhaps at that level, things are stored in a non-local fashion, or as pure data, but going through a transformation layer (our space-time), we get the illusion of distance. That would explain the instantaneous collapse into opposite spins, though the particles are separated by arbitrary distance. I also fancy that superposition before collapse is simply because particles don't have that "property" till it *has* to be calculated: perhaps a cost saving measure by the Designer, along with some useful side effects. (Cf. a C++ class, where not all values are stored. Others are calculated as they're requested.)
I'll respond to the other points you noted as I think about them.
mode_13h - Tuesday, February 23, 2021 - link
That's a pretty big leap. Entanglement is the rare exception, not the rule.If we're lucky, we'll live to see the day that physics is completely solved. Then, we can start to meaningfully ask "why?" and "what does it mean?" Until then, I just see it as a marvel of human understanding that's beautiful in its weirdness and otherness.
GeoffreyA - Wednesday, February 24, 2021 - link
Concerning quantum computers, I need to get a better grip on the topic. I remember when first reading about them in 2016, I felt a bit sceptical, especially concerning that DWAVE machine or whatever it was called. But, need to bring myself up to date on the topic."if you somehow manage to live long enough, you'll freeze. The universe will ultimately undergo heat death"
I think one can spend a lifetime meditating on this topic, about the end of the universe, heat death, and entropy. I used to think about it a lot but haven't in a while, so it's hazy. (Still have to work out whether Nolan's reverse entropy was nonsense or plausible.) Anyway, there's some elusive link between time, entropy, heat, and the cooling at the end. Tegmark had some nice remarks there, as well as Penrose and Rovelli. As to fiction, Asimov's "Last Question" is well worth reading if you haven't already. It's about a computer that ponders the question: how can the heat death be avoided, and entropy reversed? The ending is golden. I apologise for the poor link but couldn't find a better one:
baencd.freedoors.org/Books/The%20World%20Turned%20Upside%20Down/0743498747__19.htm
GeoffreyA - Wednesday, February 24, 2021 - link
"Until then, I just see it as a marvel of human understanding that's beautiful in its weirdness and otherness."It's mind-boggling how the human mind found out all these things. Even our devices today spring from these discoveries. I doubt whether physics will be solved in our lifetime. Yet, I just wish I could understand "why," and what's out there, beyond our universe. Some will say a futile question but I think about it often.
GeoffreyA - Wednesday, February 24, 2021 - link
https://novels80.com/the-complete-stories/the-last...
mode_13h - Thursday, February 25, 2021 - link
Thanks for the recommendation. I'll check it out, sometime.
mode_13h - Monday, February 22, 2021 - link
> Inside a SSD, the SSD firmware deals with the minutiae, not the OS.
There's a move to expose the details of SSDs for the host to manage, rather than the SSD controller having to guess what the host OS/application wants. It's mostly for enterprise applications, though. I forget what name it's under.
> As for the CPU, the CPU firmware deals with the caches, not the OS.
That's not true. Even leaving aside security mitigations, software (i.e. the OS and drivers) has always had to flush or invalidate caches for memory regions being read or written by devices.
> You may be thinking that's a lot of black boxes. It makes for flexible software and hardware, where different parts can be swapped for improved versions without the rest of the system needing to know that anything has changed.
This is a convenient approach, until either the cost of those abstractions adds up, or they prevent software from doing clever optimizations that the hardware can't anticipate. For instance, GPUs have at least some of their on-chip memories managed by software, because software has a better idea of what data it wants fast access to, and for how long. Also, caching has overheads, even when it does exactly what you want.
Spunjji - Monday, February 22, 2021 - link
That would be ZNS: https://www.anandtech.com/show/15959/nvme-zoned-na...
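For a sense of what "host-managed" means in practice, here is a minimal sketch, assuming a Linux system with a zoned block device (the device path is hypothetical). It uses the kernel's BLKREPORTZONE ioctl to read back the zone layout and write pointers, which a zone-aware application would then use to place its own writes:

```cpp
// Query the first few zones of a zoned block device and print their
// start, length, and write-pointer positions (all in 512-byte sectors).
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv) {
    const char *dev = (argc > 1) ? argv[1] : "/dev/nvme0n2";  // hypothetical zoned namespace
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    const unsigned int nr = 8;  // ask for the first 8 zones
    size_t sz = sizeof(struct blk_zone_report) + nr * sizeof(struct blk_zone);
    auto *rep = static_cast<struct blk_zone_report *>(calloc(1, sz));
    rep->sector = 0;        // start reporting from the beginning of the device
    rep->nr_zones = nr;

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        const struct blk_zone &z = rep->zones[i];
        printf("zone %u: start=%llu len=%llu wp=%llu cond=%u\n", i,
               (unsigned long long)z.start, (unsigned long long)z.len,
               (unsigned long long)z.wp, (unsigned)z.cond);
    }
    free(rep);
    close(fd);
    return 0;
}
```

ZNS itself goes further than reporting (explicit zone open/close/finish/reset, and zone append), but even this read-only view shows how much device geometry the host can now see and schedule around, instead of the controller guessing.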
GeoffreyA - Monday, February 22, 2021 - link
"the cost of those abstractions adds up"Indeed, that's why there's been the trend of late to go closer to the hardware, like DirectX 12, Vulkan, and Metal.
FunBunny2 - Thursday, February 25, 2021 - link
"there's been the trend of late to go closer to the hardware"as a wise guy I used to know (Ph.D. in math stat) said, "infinite granularity yields infinite complexity".
mode_13h - Thursday, February 25, 2021 - link
The trick is to have an API that hides enough details that you get portability between different hardware (and that includes running *well* on it, sometimes referred to as "performance portability"), while still being as easy as possible to use (correctly) and low-overhead enough that it doesn't hamper performance and lead developers to seek other options.
That's no small order, and it goes some way towards explaining why APIs need to change with the technology (both on the host and device side). Also, workloads evolve with the technology, and that exposes new bottlenecks or limits on the scalability of earlier APIs.
GeoffreyA - Saturday, February 27, 2021 - link
I agree, that's the ideal. Many APIs and frameworks were taken a bit too far, where ease, tidiness, and security came before everything else. Over time, or from the word go, their performance was lacking. Though I haven't used it, Microsoft's .NET comes to mind; and Qt, while being fun, tends to encourage one to use features in an inefficient way.
I'm certainly no expert, but I'd say the Win32 API is an example of having the right balance. It forces programmers to work in a roundabout but efficient way, compared to newer frameworks and languages, where the shorter styles tend to come at a cost. Sure, it's old-fashioned and clumsy, but it has a strange elegance.
mode_13h - Saturday, February 27, 2021 - link
Well, an interesting example was the old argument about immediate mode vs retained mode graphics APIs. Retained mode was intended to be more efficient, but resulted in a much more complicated API. Some folks (I think most notably, John Carmack) decided to compare DX with OpenGL, to see if it was worthwhile, and found that the extra CPU overhead of immediate mode was small and arguably worth the simplicity of using immediate mode.
GeoffreyA - Sunday, February 28, 2021 - link
Yes, if overhead is minimal, any gain in simplicity is worth it. I wonder if graphics programmers find Vulkan and DX12 any harder than OpenGL and DX11. As for the results, they're quite striking: Vulkan, at least, picks up frame rate quite a bit.
mode_13h - Monday, March 1, 2021 - link
Yes! They are both *much* more cumbersome, unless you were really pushing up against (and having to work around) the limitations of their predecessors. So, DX12 and Vulkan both make sense for game *engines*, but not for the average 3D app that was just using the APIs directly.
My experience is really with OpenGL, and it does a heck of a lot of housekeeping, scheduling, and load-balancing for you that Vulkan doesn't. But I've heard that developers haven't fully embraced DX12, either.
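As a small illustration of why the older style is so much less cumbersome, here is roughly what legacy OpenGL immediate mode looks like. This sketch assumes a compatibility-profile GL context already exists (window and context creation are not shown); the driver handles the buffering, scheduling, and state tracking behind these calls:

```cpp
// Legacy OpenGL 1.x "immediate mode": one colored triangle, one call per vertex.
// The Vulkan or DX12 equivalent needs explicit devices, queues, pipelines,
// command buffers, and synchronization before a single triangle appears.
#include <GL/gl.h>

void draw_triangle() {
    glBegin(GL_TRIANGLES);
    glColor3f(1.0f, 0.0f, 0.0f); glVertex2f(-0.5f, -0.5f);
    glColor3f(0.0f, 1.0f, 0.0f); glVertex2f( 0.5f, -0.5f);
    glColor3f(0.0f, 0.0f, 1.0f); glVertex2f( 0.0f,  0.5f);
    glEnd();  // the driver decides when and how this actually reaches the GPU
}
```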
GeoffreyA - Tuesday, March 2, 2021 - link
I don't think many have embraced DX12 but Vulkan has certainly been delivering in engines. I saw a video where Doom, on a 2200G, goes from hopeless to fully playable; and if I'm not mistaken, id removed the OpenGL path from Eternal. Also, quite a boost on the Cemu emulator. As for me, I never did any graphics programming really, except for dabbling very slightly in D3D9 in 2008.
mode_13h - Thursday, March 4, 2021 - link
Don't get me wrong, there are benefits to be had by getting more direct access to the hardware, for those already willing and able to take on the additional complexity. In fact, if you were already having to work around the limitations of OpenGL or DX11, it could conceivably be *less* work to use the new APIs.
GeoffreyA - Friday, March 5, 2021 - link
Yes, I suppose it's sort of a standardised way of doing that, instead of each developer having to come up with his/her own set of "optimisations."
GeoffreyA - Saturday, February 27, 2021 - link
"infinite granularity yields infinite complexity"That's a nice saying. I think that's why, even in life, we rely on layers of abstraction. I decide to eat oatmeal today, not [quark structure of oatmeal]. Watching my favourite 1940s film, or is it a sequence of bytes, representing an H.264 bitstream, decoded on silicon, passed electrically through circuits, and lastly, lighting some liquid crystals. It goes on and on. Life would be impossible.
FunBunny2 - Saturday, February 27, 2021 - link
One might, although he didn't so far as I can recall, substitute 'abstraction' for 'granularity'.
drajitshnew - Saturday, February 20, 2021 - link
Very interesting read. My thanks to @billytallis.
I would like to request a technology primer on ZNS; I can't understand the existing explanations.
Billy Tallis - Saturday, February 20, 2021 - link
I already tried: https://www.anandtech.com/show/15959/nvme-zoned-na...
Hit me up with further questions if there's anything I need to clarify in that article.
GeoffreyA - Saturday, February 20, 2021 - link
Interesting reading, Billy. Thank you.
Martin84a - Saturday, February 20, 2021 - link
Great article. Small feedback: thanks for writing the proper SI/IEC units in the table. However, in the text the space between the number and the unit of measurement is missing in several places, for instance "1.6 to 2.0 Gb/s" and "1.2Gb/s IO speeds", etc. A lot of people are influenced by AnandTech, so if you one day start to write MHZ instead of MHz, that is going to teach a lot of people to write it the improper way. The same goes for leaving out the proper space between the number and the unit of measurement.
Samus - Saturday, February 20, 2021 - link
This is the kind of shit I expect from AT. Awesome article!
Zizy - Monday, February 22, 2021 - link
Considering 512 Gbit is known, as is 8.5 Gbit/mm2, why don't you write ~60 mm2 as the die size?
Also, checking densities of TLC vs QLC, it seems QLC offers negligible density improvements. I wonder why they even bother developing QLC - MLC and TLC both quickly offered density improvements over the prior tech, and in turn lowered $ per GB, while QLC doesn't really. Is that, say, 6-12 month gap in density worth enough to bother, does QLC serve as a nice learning experience for TLC, or is QLC simply seen as having much more potential to grow, so these companies are suffering through the early days?
mode_13h - Monday, February 22, 2021 - link
> Since 8 channels running at 1.2Gb/s is already enough for a SSD to saturate a PCIe 4.0 x4 connection
I don't follow, unless you mean 1.2 GB/sec. Because the way I figure it, 8 * 1.2 Gbps is 1.2 GB/sec and x4 PCIe 4.0 is about 8 GB/sec. However, 8 * 1.2 GB/sec = 9.6 GB/sec, so I guess that answers my question?
Billy Tallis - Monday, February 22, 2021 - link
NAND IO speed is given in bandwidth per pin, similar to GDDR and LPDDR. In this case, we're talking about 8-bit wide interfaces. See http://www.onfi.org/specifications
mode_13h - Monday, February 22, 2021 - link
Thanks for clarifying. Unless I missed it in the article, maybe you could mention that in the future to help inform people like me who don't follow NAND developments closely?
Aside from that minor point, thank you very much for demystifying the current state and ongoing developments in the NAND storage industry.
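To spell out the arithmetic behind the per-pin clarification above, here is a small sketch; the 128b/130b encoding overhead for PCIe 4.0 is the usual published figure, not something taken from the article:

```cpp
// 8 NAND channels, each an 8-bit-wide interface at 1.2 Gb/s per pin,
// compared against the usable bandwidth of a PCIe 4.0 x4 link.
#include <cstdio>

int main() {
    const double gbps_per_pin  = 1.2;  // Gb/s per data pin
    const int pins_per_channel = 8;    // 8-bit wide channel
    const int channels         = 8;

    double nand_GBps = gbps_per_pin * pins_per_channel * channels / 8.0;  // bits -> bytes

    // PCIe 4.0: 16 GT/s per lane with 128b/130b encoding, 4 lanes.
    double pcie_GBps = 16.0 * (128.0 / 130.0) / 8.0 * 4;

    printf("NAND back end: %.1f GB/s, PCIe 4.0 x4: ~%.1f GB/s\n", nand_GBps, pcie_GBps);
    return 0;
}
```

That works out to about 9.6 GB/s of raw NAND bandwidth against roughly 7.9 GB/s on the host link, which is why eight channels at 1.2 GB/s apiece are already enough to saturate PCIe 4.0 x4.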
JoeDuarte - Monday, March 8, 2021 - link
Billy, is there any talk of moving beyond the PCIe interface? It's a major constraint on performance at this point, even PCIe 4.0. Optane could never deliver on its unique capabilities because of the constraints of PCIe. I'm not sure why Intel bothered if they weren't going to support better interfaces to really let it breathe.
CAPI and OpenCAPI are intriguing because they have much lower latency than PCIe. It's something like 20,000 CPU cycles for every PCIe transaction vs. 500 cycles for OpenCAPI: https://en.wikipedia.org/wiki/Coherent_Accelerator...
The I/O would be unleashed if they moved to OpenCAPI or a similar deal like CXL or whatever that one is that Intel is promoting. I assume all this stuff is aimed at enterprises and data centers first, but there should be more coverage and push for it. The end of Moore's Law means we need to look at other aspects of the computing experience, like I/O and storage.
Microsoft's DirectStorage is also intriguing for getting SSDs to send directly to the GPU instead of having the CPU orchestrate everything. And for fast decompression by the GPU. All that will be faster if they move PCs to OpenCAPI.
mode_13h - Tuesday, March 9, 2021 - link
As I understand it, there's a standard for persistent memory DIMMs, and then there's OpenCAPI.
However, modern CPUs can only address a certain amount of physical memory, so I'm not sure whether that would impose a practical scaling limit for memory-addressable SSDs.
The memory bus also isn't great for higher-latency SSDs, since it stalls out a CPU thread, while using PCIe enables you to post transactions or have the device initiate transfers into host memory.
Nexing - Wednesday, March 10, 2021 - link
I am stuck back at the time when TLC's ability to reliably hold 3 bits per cell was being specified and compared against MLC's superior 2-bit long-term retention.
However, I have here a bricked 1 TB Mushkin Reactor, apparently because I didn't need to access it for a year or so...
A few years ago, tired of damaged HDs, I truly thought MLC was technically a good enough answer to my usual mobile storage needs for Virtual Instruments (VIs). In the audio world, many of us would love to have most musical (and non-musical) instruments sampled and available at our fingertips, and that sometimes means a sampled piano weighing 200 or 300 GB on its own. And yes, when we play it, we want no-latency performance, despite the myriad of effects and middleware DSP processing...
///So how, or when, will we be offered SSD tech affordable enough to buy and stack nearby (separated by sound type)? A library that should last, say, 10 years or more, that could sit idle for years and still be ready for full, busy 12-hour performances for days or weeks... (a single 4 or 8 TB unit would be too risky in case of loss; better to carry around no more than 1 or 2 TB per drive).
Right now I would already need two or three of these SSDs, but all I'm offered is TLC or, worse, QLC drives, and even a full in-depth article --this one above-- has no figures or cited specifications that I could find about endurance, or explanations of cell charge leakage over time.
Am I stuck in the past? Lost? Or do manufacturers simply not consider SSDs for medium-to-long-term storage, so that professionals whose work centers on mobile workstations, and who need to swap large amounts of data, just have to temper their aspirations for the speed, low latency, drop resistance, noiselessness, and low weight SSDs already deliver?
I feel like I'm missing something.
mode_13h - Wednesday, March 10, 2021 - link
> Or do manufacturers simply not consider SSDs for medium-to-long-term storage
Exactly. They are in a race for the consumer market of people who use their devices daily or weekly. Intel used to be the only one I saw publishing data retention stats for their SSDs, but I cannot even find detailed specifications for their newer products.
So, this puts you in the expensive data center market, or else you have to look at other storage options.
I would advise keeping your library on optical storage that is rated for stability over the course of decades, and then you just have to copy the instruments you need, before you want to use them.
Nexing - Wednesday, March 10, 2021 - link
Thank you for your answer, mode_13h. I was afraid to face that alternative. So we are not there yet... Medium/long-term SSD storage is not yet available.
And also yes: optical storage, specifically M-Disc technology, is the only long-term solution currently available, lasting hundreds of years and available for about US$2 per 25 GB double-sided Blu-ray. Thankfully such a solution exists for retrieving whole musical projects.
However, once your project or data source surpasses the 100 GB mark, or when you need to plug in and get performance out of that data (e.g. with VIs), particularly in music studios or any of the myriad mobile duties, then neither optical discs nor risky hard drives are good solutions.
///I keep witnessing horror stories of important data losses related to hard drive failures, so I cannot understand the market numbers that preclude developing better MLC, or whatever long-lasting SSD tech is required, for such an untapped storage need.
mode_13h - Wednesday, March 10, 2021 - link
> Medium/long-term SSD storage is not yet available.
No, it's going the other way. Smaller cells mean shorter retention. I recently read pictures off a Compact Flash card from 10 years ago. These days, even a premium SDXC card probably won't hold its contents for more than a year without use.
I don't know how well Optane (3D XPoint) does at longevity. It's not exactly cheap, though.
> I keep witnessing horror stories of important data losses related to hard drive failures
I use RAID-6 and make sure it regularly gets "scrubbed" (i.e. consistency-checked). Of course, I could lose it in a fire, but my high-value data is backed up in the cloud.
> I cannot understand the market numbers that preclude developing better MLC, or whatever long-lasting SSD tech is required, for such an untapped storage need.
It won't be any form of NAND flash. Besides Optane/3D XPoint, the other promising technology seems to be MRAM.
mode_13h - Thursday, March 11, 2021 - link
Sorry, I meant to write NRAM (see Nantero). I've also seen some suggestions that FRAM could be another option, but I'm not sure if its density is competitive.
JoeDuarte - Thursday, March 11, 2021 - link
M-Disc went bankrupt a couple of years ago, and their prices increased afterward (they continued under new ownership).
I don't know their current status, but it doesn't seem like the new owners are doing a good job with production and the website. I think drives that write M-Discs will become harder to find, and the discs will be more and more expensive, or even disappear.