Original Link: https://www.anandtech.com/show/13951/the-samsung-983-zet-znand-ssd-review
The Samsung 983 ZET (Z-NAND) SSD Review: How Fast Can Flash Memory Get?
by Billy Tallis on February 19, 2019 8:00 AM EST

In 2015, Intel and Micron unveiled 3D XPoint memory, a new competitor to flash memory that promised significantly higher performance and endurance. Several years later, Intel has successfully commercialized 3D XPoint memory in a growing range of Optane products, but other alternative non-volatile memory technologies are still largely stuck in the lab.
To compete against Intel's Optane SSDs, Samsung decided to exercise their lead in 3D NAND flash memory to produce a specialized high-performance variant, which they call Z-NAND. Fittingly, the first SSDs they use it in are branded "Z-SSDs". The first two models to be released were the SZ983 and SZ985, which were both high-end drives specifically for datacenter customers.
Meanwhile, with their major datacenter customers taken care of, Samsung is moving to make more of their enterprise and datacenter storage products available through retail distribution channels instead of just large-volume B2B sales. Spearheading that initiative, the SZ983 is now being sold to retail customers as the Samsung 983 ZET.
Enter the 983 ZET: Going Back To SLC
Samsung originally announced Z-NAND in 2016, a year after 3D XPoint memory was announced and before any Optane products had shipped. Fundamentally, the first generation of Z-NAND is an effort to turn back the clock a bit; to step back from today's modern, high-density, (relatively) high-latency Triple Level Cell (TLC) NAND and back to simpler Single Level Cell (SLC) designs.
SLC designs are relatively straightforward: since they only need to store a single bit of data per cell, the cell only needs to be in one of two voltage states. And this makes them both faster to read and faster to write – sometimes immensely so. The tradeoff is that they offer less density per cell – one-half or one-third as much data as the equivalent MLC or TLC NAND – and therefore a higher cost per bit overall. This has led to the rapid adoption of first MLC and then TLC, which for most use cases offer sufficient performance while also providing much greater capacity.
However, there are markets and use cases where absolute speed (and not capacity) is king, and this is where an SLC-based storage solution can provide much better real-world performance; enough so to justify the higher per-bit cost. And it's this market that Intel and Samsung have been exploiting with their 3D XPoint and Z-NAND products, respectively.
Adding an extra wrinkle to all of this is that Samsung's Z-NAND isn't merely SLC NAND; if simply operating existing NAND as SLC were all there was to Z-NAND, then we would expect Toshiba, WD, and SK Hynix to have delivered their competitors by now. Instead, Samsung has taken additional steps to further improve their SLC-based Z-NAND. We'll go into greater detail on this on the next page, but one of the big changes was lowering the read and program times of the NAND, which further improves its read/write performance. This is important both to give Samsung an edge over the aforementioned competition and to ensure Z-NAND is competitive with 3D XPoint, which has proven to be no slouch in this area.
On paper then, Samsung's Z-NAND looks plenty fast for the kinds of workloads and markets Samsung is chasing. Now it comes to Samsung's 983 ZET to deliver on those ambitions.
Samsung 983 ZET Specifications

| | 480 GB | 960 GB |
|---|---|---|
| Controller | Samsung Phoenix | Samsung Phoenix |
| Form Factor | HHHL PCIe AIC | HHHL PCIe AIC |
| Interface, Protocol | PCIe 3.0 x4, NVMe 1.2b | PCIe 3.0 x4, NVMe 1.2b |
| NAND Flash | Samsung 64Gb 48L SLC Z-NAND | Samsung 64Gb 48L SLC Z-NAND |
| Sequential Read | 3400 MB/s | 3400 MB/s |
| Sequential Write | 3000 MB/s | 3000 MB/s |
| 4kB Random Read: QD32 Throughput | 750k IOPS | 750k IOPS |
| 4kB Random Read: QD1 99.99% Latency | 30 µs | 30 µs |
| 4kB Random Write: QD32 Throughput | 60k IOPS | 75k IOPS |
| 4kB Random Write: QD1 99.99% Latency | 30 µs | 30 µs |
| Power Consumption (Read / Write / Idle) | 8.5 W / 9.0 W / 5.5 W | 8.5 W / 9.0 W / 5.5 W |
| Write Endurance | 7.4 PB (8.5 DWPD) | 17.5 PB (10 DWPD) |
| Warranty | 5 years | 5 years |
| Price | $999.99 ($2.08/GB) | $2075.85 ($2.16/GB) |
The Samsung 983 ZET uses the same Phoenix controller that we are familiar with from the TLC-based 983 DCT and the 970 family of consumer NVMe SSDs. Eight channels make for a high-end controller in the consumer market, but this is more of an entry-level controller in the datacenter space: the other flash-based SSDs we're comparing against have more powerful 12- or 16-channel controllers.
The 983 ZET is available in just two capacities, both using a PCIe add-in card form factor. Samsung has demonstrated an M.2 Z-SSD, and this controller is used in several other M.2 and U.2 drives, but the Z-SSDs are a relatively low-volume product and this retail channel version is even more of a niche product, so the limited range of SKUs makes sense.
The sequential read and write specs for the 983 ZET are typical for high-end NVMe drives with PCIe 3.0 x4 interfaces, but it's uncommon to see these speeds at such low capacities: the small per-die capacity of Samsung's Z-NAND gives the 480GB 983 ZET as much parallelism to work with as a 2TB TLC drive. The random read performance of 750k IOPS is impressive for an SSD of any capacity, and while it isn't entirely unprecedented, it is significantly higher than the 550k IOPS that Intel's Optane SSD DC P4800X is rated for.
The random write specs bring the first difference in performance between the two capacities of the 983 ZET, and a stark reminder that we're still dealing with some of the limitations of flash memory. The steady-state random write performance is just 60k to 75k IOPS, an order of magnitude lower than the random read performance. Intel's Optane SSDs are only slightly slower for random writes than random reads, so the 983 ZET won't be able to come close to matching Optane performance on workloads that include a significant quantity of random writes.
Write endurance for the 983 ZET also falls short of the bar set by Intel's Optane SSDs, with 8.5 DWPD for the 480GB 983 ZET and 10 DWPD for the 960 GB model, while the Optane SSD debuted with a 30 DWPD rating that has since been increased to 60 DWPD.
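The endurance ratings are consistent with simple arithmetic: total bytes written equals capacity times drive writes per day times the number of days in the warranty period. A quick sketch (decimal units; the function name is ours, not Samsung's):

```python
def endurance_pb(capacity_gb: float, dwpd: float, years: float = 5) -> float:
    """Total write endurance in petabytes: capacity * drive-writes-per-day * days."""
    days = years * 365
    return capacity_gb * dwpd * days / 1e6  # GB -> PB (decimal units)

print(round(endurance_pb(480, 8.5), 1))  # matches the 480GB drive's 7.4 PB rating
print(round(endurance_pb(960, 10), 1))   # matches the 960GB drive's 17.5 PB rating
```

Both rated figures from the spec table fall out of this calculation over the 5-year warranty period.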
The Competition
This review builds on our recent roundup of enterprise SSDs, and follows the same format and test procedures. This review is strictly focused on the use of the 983 ZET as a datacenter SSD, but we will have a follow-up to assess its suitability as an enthusiast-class workstation/consumer drive.
Our collection of enterprise and datacenter SSDs is much smaller than our almost comprehensive catalog of consumer SSDs, but we do have drives from several different market segments to compare against. Most of these drives were described in detail in our last enterprise SSD review. The most important competitor is obviously the Intel Optane SSD DC P4800X, Intel's flagship and the drive that Samsung's Z-SSDs were created to compete against.
Sadly unavailable for this review is the Micron P320h, an early PCIe SSD that used 34nm planar SLC NAND and advertised similar random read performance, and better random write performance and endurance. The controller was a 32-channel monster with a PCIe 2.0 x8 host interface that used a proprietary protocol rather than the nascent NVMHCI standard that we now know as NVMe. That controller product line passed from IDT to PMC-Sierra to Microsemi and now Microchip, and its descendants are used in two other drives included in this review, both of which use the 16-channel versions rather than the 32-channel:
- The Micron 9100 MAX 2.4TB, based on 16nm MLC with excessive overprovisioning: a 4TB raw capacity but only 2.4TB usable.
- The Memblaze PBlaze5 C900 and D900, both based on Micron 32L TLC NAND. We have a 6.4TB sample of the newer-generation PBlaze5 with 64L TLC on the way for a future review.
What Is Z-NAND?
When Samsung first announced Z-NAND in 2016, it was a year after 3D XPoint memory was announced and before any Optane products had shipped. Samsung was willing to preview some information about the Z-NAND based drives that were on the way, but for a year and a half they kept almost all information about Z-NAND itself under wraps. Initially, the company would only state that Z-NAND was a high-performance derivative of their V-NAND 3D NAND flash memory. At Flash Memory Summit 2017, they confirmed that Z-NAND is a SLC (one bit per cell) memory, and announced that a second generation of Z-NAND is in the works that will introduce an MLC (two bit per cell) version. (For reference, mainstream NAND flash is now almost always three-bit-per-cell TLC.)
If simply operating existing NAND as SLC were all there was to Z-NAND, then we would expect Toshiba, WD, and SK Hynix to have delivered their competitors by now. But there are further tweaks required to challenge 3D XPoint. A year ago at IEEE's International Solid State Circuits Conference (ISSCC), Samsung pulled back the veil a bit and shared more information about Z-NAND. The full presentation was not made public, but PC Watch's coverage captured the important details. Samsung's first-generation Z-NAND is a 48-layer part with a capacity of 64Gb. Samsung's mainstream capacity-optimized NAND is currently transitioning from 64 layers to what's officially "9x" layers, most likely 96. There are probably several reasons why Z-NAND lags almost two generations behind in manufacturing tech, but one important element is that adding layers can be detrimental to performance.
Samsung 3D NAND Comparison

| Generation | 48L SLC Z-NAND | 48L TLC | 64L TLC | 9xL TLC |
|---|---|---|---|---|
| Nominal Die Capacity | 64Gb (8GB) | 256Gb (32GB) | 512Gb (64GB) | 256Gb (32GB) |
| Read Latency (tR) | 3 µs | 45 µs | 60 µs | 50 µs |
| Program Latency (tPROG) | 100 µs | 660 µs | 700 µs | 500 µs |
| Page Size | 2kB, 4kB | 16kB | 16kB | 16kB? |
Compared to their past few generations of TLC NAND, Samsung's SLC Z-NAND improves read latency by a factor of 15-20x, but program latency is only improved by a factor of 5-7x. Note however that the read and program times shown above denote how long it takes to transfer information between the flash memory array and the on-chip buffers; so that 3µs read time doesn't include transferring the data to the SSD controller, let alone shipping it over the PCIe link to the CPU.
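The 15-20x and 5-7x figures follow directly from the latency table above; a quick check of the ratios (dictionary names are ours, values are from the table):

```python
# Latencies in microseconds, taken from the Samsung 3D NAND comparison table
tR = {"Z-NAND": 3, "48L TLC": 45, "64L TLC": 60, "9xL TLC": 50}
tPROG = {"Z-NAND": 100, "48L TLC": 660, "64L TLC": 700, "9xL TLC": 500}

for gen in ("48L TLC", "64L TLC", "9xL TLC"):
    read_gain = tR[gen] / tR["Z-NAND"]      # how much faster Z-NAND reads are
    prog_gain = tPROG[gen] / tPROG["Z-NAND"]  # how much faster Z-NAND programs are
    print(f"vs {gen}: read {read_gain:.1f}x, program {prog_gain:.1f}x")
```

The read improvement works out to 15x, 20x, and about 16.7x against the three TLC generations, while the program improvement is only 6.6x, 7x, and 5x.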
With Samsung using 16kB page sizes for their TLC NAND, the 4kB page size for SLC Z-NAND is a reasonable choice, representing only a slight shrink in the total number of memory cells per page. However, the capability to instead operate with a 2kB page size indicates that small page sizes are an important part of the performance enhancements Z-NAND is supposed to offer.
Missing from this data set is information about the erase block size and erase time. Erasing flash memory is a much slower process than the program operation and it requires activating large and power-hungry charge pumps to generate the high voltages necessary. For this reason, all NAND flash memory groups many pages together to form each erase block, which nowadays tends to be at least several megabytes.
Samsung's Z-NAND may be able to offer far better read and program times than mainstream NAND, but they may not have been able to improve erase times as much. And shrinking erase blocks would significantly inflate the die space required for peripheral circuitry, further harming memory density that is already at a steep disadvantage for 48L SLC compared to mainstream 64L+ TLC.
Test System
Intel provided our enterprise SSD test system, one of their 2U servers based on the Xeon Scalable platform (codenamed Purley). The system includes two Xeon Gold 6154 18-core Skylake-SP processors, and 16GB DDR4-2666 DIMMs on all twelve memory channels for a total of 192GB of DRAM. Each of the two processors provides 48 PCI Express lanes plus a four-lane DMI link. The allocation of these lanes is complicated. Most of the PCIe lanes from CPU1 are dedicated to specific purposes: the x4 DMI plus another x16 link go to the C624 chipset, and there's an x8 link to a connector for an optional SAS controller. This leaves CPU2 providing the PCIe lanes for most of the expansion slots, including most of the U.2 ports.
Enterprise SSD Test System

| System Model | Intel Server R2208WFTZS |
|---|---|
| CPU | 2x Intel Xeon Gold 6154 (18C, 3.0GHz) |
| Motherboard | Intel S2600WFT |
| Chipset | Intel C624 |
| Memory | 192GB total, Micron DDR4-2666 16GB modules |
| Software | Linux kernel 4.19.8, fio version 3.12 |

Thanks to StarTech for providing a RK2236BKF 22U rack cabinet.
The enterprise SSD test system and most of our consumer SSD test equipment are housed in a StarTech RK2236BKF 22U fully-enclosed rack cabinet. During testing for this review, the front door on this rack was generally left open to allow better airflow, since the rack doesn't include exhaust fans of its own. The rack is currently installed in an unheated attic and it's the middle of winter, so this setup provided a reasonable approximation of a well-cooled datacenter.
The test system is running a Linux kernel from the most recent long-term support branch. This brings in the latest Meltdown/Spectre mitigations, though strategies for dealing with Spectre-style attacks are still evolving. The benchmarks in this review are all synthetic benchmarks, with most of the IO workloads generated using FIO. Server workloads are too widely varied for it to be practical to implement a comprehensive suite of application-level benchmarks, so we instead try to analyze performance on a broad variety of IO patterns.
Enterprise SSDs are specified for steady-state performance and don't include features like SLC caching, so the duration of benchmark runs doesn't have much effect on the score, so long as the drive was thoroughly preconditioned. Except where otherwise specified, for our tests that include random writes, the drives were prepared with at least two full drive writes of 4kB random writes. For all the other tests, the drives were prepared with at least two full sequential write passes.
Our drive power measurements are conducted with a Quarch XLC Programmable Power Module. This device supplies power to drives and logs both current and voltage simultaneously. With a 250kHz sample rate and precision down to a few mV and mA, it provides a very high resolution view into drive power consumption. For most of our automated benchmarks, we are only interested in averages over time spans on the order of at least a minute, so we configure the power module to average together its measurements and only provide about eight samples per second, but internally it is still measuring at 4µs intervals so it doesn't miss out on short-term power spikes.
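The averaging the power module performs can be illustrated with a minimal sketch: multiply the synchronized voltage and current samples into instantaneous power, then average fixed-size windows down to the reporting rate. The function below is our own illustration, not Quarch's API:

```python
def downsample_power(voltage_v, current_a, samples_per_avg):
    """Average windows of synchronized V/I samples into power readings (watts).

    Mimics reducing a high-rate capture (e.g. 250kHz, one sample every 4µs)
    to a few readings per second; short spikes still contribute to the average
    because every raw sample is included before averaging.
    """
    powers = [v * i for v, i in zip(voltage_v, current_a)]
    return [sum(powers[i:i + samples_per_avg]) / samples_per_avg
            for i in range(0, len(powers) - samples_per_avg + 1, samples_per_avg)]

# e.g. 250,000 samples/s averaged into ~8 readings/s -> samples_per_avg = 31,250
```

A brief spike in one raw sample raises the averaged reading proportionally, which is why the module can report slowly without missing short-term power excursions.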
QD1 Random Read Performance
Drive throughput with a queue depth of one is usually not advertised, but almost every latency or consistency metric reported on a spec sheet is measured at QD1 and usually for 4kB transfers. When the drive only has one command to work on at a time, there's nothing to get in the way of it offering its best-case access latency. Performance at such light loads is absolutely not what most of these drives are made for, but they have to make it through the easy tests before we move on to the more realistic challenges.
The 983 ZET comes even closer to matching the Optane SSD on our random read power efficiency score. Even in this enterprise configuration, the Samsung Phoenix is still a relatively low-power NVMe SSD controller, and the 983 ZET only draws a bit more power overall than the TLC-based 983 DCT. While the Optane SSD may have delivered almost twice the raw performance, it only has a 15-20% advantage in performance-per-Watt here.
Power Efficiency in kIOPS/W | Average Power in W |
The random read latency stats for the 983 ZET clearly set it apart from the rest of the flash-based SSDs and put it in the same league as the Optane SSD. The Optane SSD's average latency of just under 9µs is still better than the 16µs from the 983 ZET, but the tail latencies for the two are quite similar. Both Z-NAND and 3D XPoint provide better 99.99th percentile latency here than the average latencies of the MLC and TLC drives.
The random read performance of the 983 ZET is clearly optimized specifically for 4kB reads—both smaller and larger transfers take a significant hit to IOPS. By contrast, the Optane SSD's IOPS declines smoothly as the transfer size increases from the minimum of a single 512 byte logical block.
The other flash-based SSDs show fairly low throughput until the transfer sizes get up to around 128kB, but the Z-NAND's smaller page size allows it to exercise parallelism even for smaller transfers: the 960GB 983 ZET has higher throughput for 32kB reads than the TLC-based 983 DCT for 128kB reads.
QD1 Random Write Performance
The Samsung 983 ZET takes the lead for QD1 random write performance, and the Intel Optane SSD doesn't stand out from other flash-based SSDs. Intel's 3D XPoint memory has far lower write latency than the programming times of NAND flash memory (even Z-NAND), but flash-based SSDs are very good at covering this up with DRAM caches. Thanks to their power loss protection capacitors, enterprise SSDs can safely report writes as complete while the data is still in their RAM, and the writes can be deferred and batched in the background. Intel's Optane SSDs do not have any DRAM and instead perform writes directly, without the caching layer (and without the large power loss protection capacitors).
Power Efficiency in kIOPS/W | Average Power in W |
The Samsung 983 ZET also provides the best power efficiency during our QD1 random write test, but the Optane SSD and the TLC-based 983 DCT come close. In terms of total power, the Optane SSD draws about the same performing random writes as random reads, but all of the flash-based SSDs require much more power for writes than reads. The Samsung 983 ZET requires significantly more power than the 983 DCT, but not quite as much as the Intel P4510 and the other flash-based SSDs that use larger controllers.
The latency stats for the Samsung 983 ZET performing random writes at QD1 are the best in the bunch, but it doesn't stand out by much. The Intel P4510 has high tail latencies, but the rest of these NVMe drives have 99.99th percentile latencies that are no worse than four times their average latency. At low queue depths, almost all of these drives have no problem with QoS.
The Samsung 983 ZET's optimization for 4kB accesses is again apparent, but the performance for smaller writes is not crippled as it is for the Memblaze PBlaze5 and Micron 9100 MAX. Using transfers larger than 4kB doesn't yield any steady-state random write throughput increases all the way up to writing 1MB blocks, so IOPS falls off quickly as transfer size grows.
QD1 Sequential Read Performance
The queue depth 1 sequential read performance of the 983 ZET is lower than the TLC-based 983 DCT: about 2GB/s instead of 2.5GB/s. This puts the 983 ZET more in line with Intel's drives, including the Optane SSD. This is one area where the smaller page size of the Z-NAND is detrimental.
Power Efficiency in MB/s/W | Average Power in W |
The 983 ZET draws only slightly less power than the 983 DCT, which combined with its 20% worse performance leads to a lower efficiency score. However, both Samsung drives still provide better performance per Watt than the other drives included here, which all have more power-hungry controllers than the Samsung Phoenix.
The 983 ZET's sequential read speed at QD1 is almost at full speed for block sizes of at least 32kB, but there's a slight improvement at 512kB or larger. The 983 DCT and Intel P4510 top out with 256kB transfers, while the Memblaze PBlaze5 delivers poor read speeds without either a large block size or high queue depth.
QD1 Sequential Write Performance
The Samsung 983 ZET excelled at QD1 random writes, but it is hardly any better than the TLC drives for QD1 sequential writes, and the smaller Z-SSD is substantially slower. This is because even at QD1, there's enough data moving to keep the drive at steady state, where the background garbage collection is limited by the slow block erase operations that affect Z-NAND just as much as traditional NAND flash memory. The Optane SSD is almost three times the speed of the larger Z-SSD. It is followed by the PBlaze5 C900, which benefits from a PCIe 3 x8 interface.
Power Efficiency in MB/s/W | Average Power in W |
Both capacities of the 983 ZET use slightly more power during the sequential write test than the 983 DCT, which leaves the smaller Z-SSD with a significantly worse efficiency score. The larger Z-SSD and the two TLC-based 983 DCTs have the best efficiency scores among the flash-based SSDs, but that's still only two thirds the performance per Watt provided by the Optane SSD.
For sequential writes, the 983 ZET does not penalize sub-4kB transfers in terms of IOPS, but such small writes cannot deliver much throughput. The 983 ZET is close to full steady-state throughput with 8kB transfers, but doesn't actually peak until the transfers are up to 64kB.
Peak Random Read Performance
For client/consumer SSDs we primarily focus on low queue depth performance for its relevance to interactive workloads. Server workloads are often intense enough to keep a pile of drives busy, so the maximum attainable throughput of enterprise SSDs is actually important. But it usually isn't a good idea to focus solely on throughput while ignoring latency, because somewhere down the line there's always an end user waiting for the server to respond.
In order to characterize the maximum throughput an SSD can reach, we need to test at a range of queue depths. Different drives will reach their full speed at different queue depths, and increasing the queue depth beyond that saturation point may be slightly detrimental to throughput, and will drastically and unnecessarily increase latency. SATA drives can only have 32 pending commands in their queue, and any attempt to benchmark at higher queue depths will just result in commands sitting in the operating system's queues before being issued to the drive. On the other hand, some high-end NVMe SSDs need queue depths well beyond 32 to reach full speed.
Because of the above, we are not going to compare drives at a single fixed queue depth. Instead, each drive was tested at a range of queue depths up to the excessively high QD 512. For each drive, the queue depth with the highest performance was identified. Rather than report that value, we're reporting the throughput, latency, and power efficiency for the lowest queue depth that provides at least 95% of the highest obtainable performance. This often yields much more reasonable latency numbers, and is representative of how a reasonable operating system's IO scheduler should behave. (Our tests have to be run with any such scheduler disabled, or we would not get the queue depths we ask for.)
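The selection rule above can be expressed in a few lines. This is our own sketch of the methodology (the function name and example sweep numbers are hypothetical):

```python
def representative_qd(results: dict, threshold: float = 0.95) -> int:
    """Pick the lowest queue depth achieving >= threshold of the best throughput.

    `results` maps queue depth -> measured throughput (IOPS or MB/s) from a
    sweep of queue depths; the returned QD is the one whose throughput, latency,
    and power figures get reported.
    """
    best = max(results.values())
    return min(qd for qd, perf in results.items() if perf >= threshold * best)

# Hypothetical sweep: the absolute peak is at QD64, but QD16 is within 5% of it
sweep = {1: 100, 2: 190, 4: 350, 8: 600, 16: 760, 32: 790, 64: 800, 128: 795}
print(representative_qd(sweep))  # 16
```

Reporting QD16 rather than QD64 in this example avoids the inflated latency that comes from queuing far beyond the saturation point.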
One extra complication is the choice of how to generate a specified queue depth in software. A single thread can issue multiple I/O requests using asynchronous APIs, but this runs into several problems: if each system call issues one read or write command, then context switch overhead becomes the bottleneck long before a high-end NVMe SSD's abilities are fully taxed. Alternatively, if many operations are batched together for each system call, then the real queue depth will vary significantly and it is harder to get an accurate picture of drive latency. Finally, the current Linux asynchronous IO APIs only work in a narrow range of scenarios.
There is work underway to provide a new general-purpose async IO interface that will enable drastically lower overhead, but until that work lands in stable kernel versions, we're sticking with testing through the synchronous IO system calls that almost all Linux software uses. This means that we test at higher queue depths by using multiple threads, each issuing one read or write request at a time.
Using multiple threads to perform IO gets around the limits of single-core software overhead, and brings an extra advantage for NVMe SSDs: the use of multiple queues per drive. The NVMe drives in this review all support 32 separate IO queues, so we can have 32 threads on separate cores independently issuing IO without any need for synchronization or locking between threads.
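The scheme can be illustrated with a minimal Python sketch: N threads, each issuing one synchronous random read at a time, yield an aggregate queue depth of roughly N. This is purely illustrative (the actual testing uses fio; Python's GIL makes this a poor real benchmark, although `os.pread` does release the GIL for the duration of the system call), and the file path and counts are placeholders:

```python
import os
import random
import threading

def reader(fd, file_size, block_size, count, seed):
    """One worker: issue `count` synchronous random reads, one at a time (QD1)."""
    rng = random.Random(seed)
    blocks = file_size // block_size
    for _ in range(count):
        offset = rng.randrange(blocks) * block_size
        os.pread(fd, block_size, offset)  # blocks until this read completes

def run(path, threads=32, block_size=4096, reads_per_thread=1000):
    """Aggregate queue depth ~= `threads`: one outstanding request per thread."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    workers = [threading.Thread(target=reader,
                                args=(fd, size, block_size, reads_per_thread, i))
               for i in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    os.close(fd)
    return threads * reads_per_thread  # total reads issued
```

With 32 threads pinned to separate cores and one NVMe queue per thread, no locking or cross-thread coordination is needed, which is the property the test methodology relies on.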
When performing random reads with a high thread count, the Samsung 983 ZET delivers significantly better throughput than any other drive we've tested: about 775k IOPS, which is over 3GB/s and about 33% faster than the Intel Optane SSD DC P4800X. The Optane SSD hits its peak throughput at QD8, while the 983 ZET requires a queue depth of at least 16 to match the Optane SSD's peak, and the peak for the 983 ZET is at QD64.
Power Efficiency in kIOPS/W | Average Power in W |
At this point, it's no surprise to see the 983 ZET turn in great efficiency scores. It's drawing almost 8W during this test, but that's not particularly high by the standards of enterprise NVMe drives. The TLC-based 983 DCT provides the next-best performance per Watt due to even lower power consumption than the Z-SSDs.
The Optane SSD still holds on to a clear advantage in the random read latency scores, with 99.99th percentile latency that is lower than the average read latency of even the Z-SSDs. The two Z-SSDs do provide lower tail latencies than the other flash-based SSDs, several of which require very high thread counts to reach full throughput and thus end up with horrible 99.99th percentile latencies due to contention for CPU cores.
Peak Sequential Read Performance
Since this test consists of many threads each performing IO sequentially but without coordination between threads, there's more work for the SSD controller and less opportunity for pre-fetching than there would be with a single thread reading sequentially across the whole drive. The workload as tested bears closer resemblance to a file server streaming to several simultaneous users, rather than resembling a full-disk backup image creation.
The Memblaze PBlaze5 C900 has the highest peak sequential read speed thanks to its PCIe 3.0 x8 interface. Among the drives with the more common four lane connection, the Samsung 983 ZETs are tied for first place, but they reach that ~3.1GB/s with a slightly lower queue depth than the 983 DCT or PBlaze5 D900. The Optane SSD DC P4800X comes in last place, being limited to just 2.5GB/s for multi-stream sequential reads.
Power Efficiency in MB/s/W | Average Power in W |
The Samsung 983s clearly have the best power efficiency on this sequential read test, but the TLC-based 983 DCT once again uses slightly less power than the 983 ZET, so the Z-SSDs don't quite take first place. The Optane SSD doesn't have the worst efficiency rating, because despite its low performance, it only uses 2W more than the Samsung drives, far less than the Memblaze or Micron drives.
Steady-State Random Write Performance
The hardest task for most enterprise SSDs is to cope with an unending stream of writes. Once all the spare area granted by the high overprovisioning ratios has been used up, the drive has to perform garbage collection while simultaneously continuing to service new write requests, and all while maintaining consistent performance. The next two tests show how the drives hold up after hours of non-stop writes to an already full drive.
The Samsung 983 ZETs outperform the TLC-based 983 DCTs for steady-state random writes, but otherwise are outclassed by the larger flash-based SSDs and the Optane SSD, which is almost six times faster than the 960GB 983 ZET. Using Z-NAND clearly helps some with steady-state write performance, but the sheer capacity of the bigger TLC drives helps even more.
Power Efficiency in kIOPS/W | Average Power in W |
The 983 ZET uses more power than the 983 DCT, but not enough to overcome the performance advantage; the 983 ZET has the best power efficiency among the smaller flash-based SSDs. The 4TB and larger drives outperform the 983 ZET so much that they have significantly better efficiency scores even while drawing 2.6x the power. The Intel Optane SSD consumes almost twice the power of the 983 ZET but still offers better power efficiency than any of the flash-based SSDs.
The Samsung drives and the Intel Optane SSD all have excellent latency stats for the steady-state random write test. The Intel P4510, Memblaze PBlaze5 and Micron 9100 all have decent average latencies but much worse QoS, with 99.99th percentile latencies of multiple milliseconds. These drives don't require particularly high queue depths to saturate their random write speed, so these QoS issues aren't due to any host-side software overhead.
Steady-State Sequential Write Performance
As with random writes, the steady-state sequential write performance of the Samsung 983 ZET is not much better than its TLC-based sibling. The only way for a flash-based SSD to handle sustained writes as well as the Optane SSD is to have very high capacity and overprovisioning.
Power Efficiency in MB/s/W | Average Power in W |
The Samsung 983 ZET provides about half the performance per Watt of the Optane SSD during sequential writes. That's still decent compared to many other flash-based NVMe SSDs, but the 983 DCT and Memblaze PBlaze5 are slightly more efficient.
Mixed Random Performance
Real-world storage workloads usually aren't pure reads or writes but a mix of both. It is completely impractical to test and graph the full range of possible mixed I/O workloads—varying the proportion of reads vs writes, sequential vs random and differing block sizes leads to far too many configurations. Instead, we're going to focus on just a few scenarios that are most commonly referred to by vendors, when they provide a mixed I/O performance specification at all.

We tested a range of 4kB random read/write mixes at queue depths of 32 and 128. This gives us a good picture of the maximum throughput these drives can sustain for mixed random I/O, but in many cases the queue depth will be far higher than necessary, so we can't draw meaningful conclusions about latency from this test. As with our tests of pure random reads or writes, we are using 32 (or 128) threads each issuing one read or write request at a time. This spreads the work over many CPU cores, and for NVMe drives it also spreads the I/O across the drive's several queues.
The full range of read/write mixes is graphed below, but we'll primarily focus on the 70% read, 30% write case that is a fairly common stand-in for moderately read-heavy mixed workloads.
Queue Depth 32 | Queue Depth 128 |
On the 70/30 mixed random IO tests, the Samsung 983 ZET provides similar performance to a TLC-based 983 DCT of twice the capacity. This leaves both Samsung drives as among the slowest drives in this batch, because the other flash-based drives are mostly higher-capacity models using controllers with higher channel counts than the Samsung Phoenix. The Samsung drives perform about the same at QD128 as at QD32, while some of the larger drives take advantage of the higher queue depth to almost catch up to the Intel Optane SSD.
QD32 Power Efficiency in MB/s/W | QD32 Average Power in W | ||||||||
QD128 Power Efficiency in MB/s/W | QD128 Average Power in W |
The TLC-based 1.92TB 983 DCT uses the least power on these mixed IO tests and it ends up with some of the best efficiency scores among the flash-based SSDs, about 60% of the performance per Watt that the Intel Optane SSD provides. The two 983 ZET drives use a bit more power than the 983 DCT, so they end up with fairly ordinary efficiency scores that are outclassed by the largest TLC drives in the QD128 test.
QD32 | |||||||||
QD128 |
The Samsung 983 ZET starts off the mixed IO tests with extremely high random read performance, but the performance drops very steeply as writes are added to the mix—just 5% writes is enough to almost cut total throughput in half, and the advantage of Z-NAND is gone by the time the workload is 20% writes. Except on the most read-heavy workloads, the flash-based SSDs are limited largely by their steady-state write performance, and the 983 ZET is handicapped there by its low overall capacities.
Aerospike Certification Tool
Aerospike is a high-performance NoSQL database designed for use with solid state storage. The developers of Aerospike provide the Aerospike Certification Tool (ACT), a benchmark that emulates the typical storage workload generated by the Aerospike database. This workload consists of a mix of large-block 128kB reads and writes, and small 1.5kB reads. When the ACT was initially released back in the early days of SATA SSDs, the baseline workload was defined to consist of 2000 reads per second and 1000 writes per second. A drive is considered to pass the test if it meets the following latency criteria:
- fewer than 5% of transactions exceed 1ms
- fewer than 1% of transactions exceed 8ms
- fewer than 0.1% of transactions exceed 64ms
Drives can be scored based on the highest throughput they can sustain while satisfying the latency QoS requirements. Scores are normalized relative to the baseline 1x workload, so a score of 50 indicates 100,000 reads per second and 50,000 writes per second. Since this test uses fixed IO rates, the queue depths experienced by each drive will depend on their latency, and can fluctuate during the test run if the drive slows down temporarily for a garbage collection cycle. The test will give up early if it detects the queue depths growing excessively, or if the large block IO threads can't keep up with the random reads.
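The scoring and pass criteria above are simple enough to express directly. A small Python sketch (the function names are ours, for illustration):

```python
# Baseline "1x" ACT workload, as originally defined in the SATA era.
BASE_READS_PER_SEC = 2000
BASE_WRITES_PER_SEC = 1000

def act_rates(score):
    """Read and write rates (per second) implied by an ACT score,
    which is just a multiple of the 1x baseline workload."""
    return score * BASE_READS_PER_SEC, score * BASE_WRITES_PER_SEC

def act_passes(pct_over_1ms, pct_over_8ms, pct_over_64ms):
    """A run passes only if all three latency criteria hold."""
    return (pct_over_1ms < 5.0 and
            pct_over_8ms < 1.0 and
            pct_over_64ms < 0.1)
```

For example, `act_rates(50)` gives `(100000, 50000)`, matching the score-of-50 example above.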
We used the default settings for queue and thread counts and did not manually constrain the benchmark to a single NUMA node, so this test produced a total of 64 threads scheduled across all 72 virtual (36 physical) cores.
The usual runtime for ACT is 24 hours, which makes determining a drive's throughput limit a long process. For fast NVMe SSDs, this is far longer than necessary for drives to reach steady-state. In order to find the maximum rate at which a drive can pass the test, we start at an unsustainably high rate (at least 150x) and incrementally reduce the rate until the test can run for a full hour, and then decrease the rate further if necessary to get the drive under the latency limits.
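That search boils down to a downward scan over rate multipliers. A minimal sketch (the `passes_at` callback and the step size are hypothetical stand-ins; in practice each probe is an hour-long ACT run):

```python
def find_max_passing_rate(passes_at, start_rate=150, step=10):
    """Walk the rate multiplier down from an unsustainably high starting
    point until a full run stays within ACT's latency limits.
    `passes_at(rate)` launches the benchmark at that rate and reports
    whether it completed within the QoS criteria."""
    rate = start_rate
    while rate > 0 and not passes_at(rate):
        rate -= step
    return rate
```

With a drive that can sustain up to a 63x rate, for instance, this scan settles on 60, the highest probed rate at or below the drive's limit.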
The Samsung 983 ZET outscores its TLC-based sibling, the 983 DCT, but doesn't beat the throughput that the larger TLC drives provide. There's enough write activity in this test to seriously limit the performance of the Z-SSDs, so the drives with higher channel counts and much more overall capacity can pull ahead. The Intel Optane SSD DC P4800X still provides the best score on this test: 2.6 times that of the larger 983 ZET, and 21% faster than the fastest flash-based SSD we have on hand.
[Graphs: ACT power efficiency, and average power in W]
The 983 ZET and 983 DCT have similar power consumption during this test, and both use substantially less power than the other SSDs in this batch. That doesn't translate to any big advantage in power efficiency, though the larger 983 ZET does provide the second-best efficiency score among the flash-based SSDs. The Optane SSD provides 50% better performance per Watt than any of the flash-based SSDs.
Conclusion
The Samsung 983 ZET and related Z-NAND drives are meant to deliver higher performance than any other flash-based SSD currently available. Thanks to the innate benefits of SLC NAND and Samsung's further efforts to optimize the resulting Z-NAND for reads and writes, the company has put together what is undoubtedly some of the best-performing NAND we've ever seen. But is this enough to give the company and its Z-NAND-based drives an edge over the competition, both flash and otherwise?
Compared to other flash-based enterprise SSDs, the 983 ZET certainly provides better performance than is otherwise possible for drives of such low capacity. The random read performance is unmatched by even the largest and most powerful TLC-based drives we've tested so far. But Z-NAND offers little advantage for sustained write performance, so the small capacity and low overprovisioning ratio of the 983 ZET leaves it at a disadvantage compared to similarly priced TLC drives. However, even when its throughput is unimpressive, the 983 ZET never fails to provide very low latency and excellent QoS that no other current flash-based SSD can beat.
While the 983 ZET is an excellent performer by the standards of flash-based SSDs, those aren't its primary competition; Intel's Optane SSDs are. In almost every way, the 983 ZET falls short of the Optane drives that motivated Samsung to develop Z-NAND. Samsung wasn't really aiming quite that high with their Z-SSDs, so the more important question is whether the 983 ZET comes close enough, given that it is about 35% cheaper per GB based on current online pricing. (Volume pricing may differ significantly, but is not generally public information.)
Whether the 983 ZET is worthwhile or preferable to the Optane SSD DC P4800X depends heavily on the workload. The Optane SSD provides great performance on almost any workload regardless of the mix of reads and writes, and its latency is low and consistent. The Samsung 983 ZET's strengths, by comparison, are very narrowly concentrated: it is basically all about random read performance, where its maximum throughput is significantly higher than the Optane SSD's while still being attainable at reasonably low latency and queue depths. Some massive TLC-based enterprise SSDs also get close to 1M random read IOPS, but only at extremely high queue depths. The 983 ZET also offers better sequential read throughput than the Optane SSD, but there are far cheaper drives that can do the same.
The biggest problem for the 983 ZET is that its excellent performance only holds up for extremely read-intensive workloads; it doesn't take many writes to drag performance down. This is because Z-NAND is still afflicted by the need for wear leveling and complicated flash management with very slow block erase operations. On sustained write workloads, those background processes become the bottleneck. Intel's 3D XPoint memory allows in-place modification of data in fine-grained chunks, which is why its write performance doesn't fall off a cliff when the drive fills up. It would be interesting to see how much this performance gap between Z-NAND and 3D XPoint can be alleviated by overprovisioning, but there's not a lot of room to add to the BOM of the 983 ZET before it ends up matching the price of the Optane SSD DC P4800X.
Power efficiency is usually not a big concern for use cases that call for a premium SSD like the 983 ZET or an Optane SSD, but the Samsung 983 ZET does well here, thanks in part to the Samsung Phoenix controller it shares with Samsung's consumer product line. The Phoenix controller is designed to work within the constraints of an M.2 SSD in a battery-powered system, so it uses far less power than most high-end enterprise-only SSD controllers. The 983 ZET does consistently draw a bit more power than the TLC-based 983 DCT, but both still have competitive power efficiency in general. On the random read workloads where the 983 ZET offers unsurpassed performance, it also has a big power efficiency advantage over everything else, including the Intel Optane SSDs.
In the long run, Samsung is still working to develop their own alternative memory technologies; they've publicly disclosed that they are researching Spin-Torque Magnetoresistive RAM (ST-MRAM) and phase change memories, so Z-NAND may end up being more of an interim technology to fill a gap that will hopefully be better served by a new memory in a few years. But in the meantime, Z-NAND does have a niche to compete in, even if it's a bit narrower than the range of use cases that Intel's Optane SSDs are suitable for.