Original Link: https://www.anandtech.com/show/14249/the-intel-optane-memory-h10-review-two-ssds-in-one
The Intel Optane Memory H10 Review: QLC and Optane In One SSD
by Billy Tallis on April 22, 2019 11:50 AM EST

SSD caching has been around for a long time, as a way to reap many of the performance benefits of fast storage without completely abandoning the high capacity and lower prices of slower storage options. In recent years, the fast, small, expensive niche has been ruled by Intel's Optane products using their 3D XPoint non-volatile memory. Intel's third generation of Optane Memory SSD caching products has arrived, bringing the promise of Optane performance to a new product segment. The first Optane Memory products were tiny NVMe SSDs intended to accelerate access to larger slower SATA drives, especially mechanical hard drives. Intel is now supporting using Optane Memory SSDs to cache other NVMe SSDs, with an eye toward the combination of Optane and QLC NAND flash. They've put both types of SSD onto a single M.2 module to create the new Optane Memory H10.
The Intel Optane Memory H10 allows Intel for the first time to put their Optane Memory caching solution into ultrabooks that only have room for one SSD, and have left SATA behind entirely. Squeezing two drives onto a single-sided 80mm long M.2 module is made possible in part by the high density of Intel's four bit per cell 3D QLC NAND flash memory. Intel's 660p QLC SSD has plenty of unused space on the 1TB and 512GB versions, and an Optane cache has great potential to offset the performance and endurance shortcomings of QLC NAND. Putting the two onto one module has some tradeoffs, but for the most part the design of the H10 is very straightforward.
The Optane Memory H10 does not introduce any new ASICs or any hardware to make the Optane and QLC portions of the drive appear as a single device. The caching is managed entirely in software, and the host system accesses the Optane and QLC sides of the H10 independently. Each half of the drive has two PCIe lanes dedicated to it. Earlier Optane Memory SSDs have all been PCIe x2 devices so they aren't losing anything, but the Intel 660p uses a 4-lane Silicon Motion NVMe controller, which is now restricted to just two lanes. In practice, the 660p almost never needed more bandwidth than an x2 link can provide, so this isn't a significant bottleneck.
Intel Optane Memory H10 Specifications

| Advertised Capacity | 256 GB | 512 GB | 1 TB |
| --- | --- | --- | --- |
| Form Factor | single-sided M.2 2280 | | |
| NAND Controller | Silicon Motion SM2263 | | |
| NAND Flash | Intel 64L 3D QLC | | |
| Optane Controller | Intel SLL3D | | |
| Optane Media | Intel 128Gb 3D XPoint | | |
| QLC NAND Capacity | 256 GB | 512 GB | 1024 GB |
| Optane Capacity | 16 GB | 32 GB | 32 GB |
| Sequential Read | 1450 MB/s | 2300 MB/s | 2400 MB/s |
| Sequential Write | 650 MB/s | 1300 MB/s | 1800 MB/s |
| Random Read IOPS | 230k | 320k | 330k |
| Random Write IOPS | 150k | 250k | 250k |
| L1.2 Idle Power | < 15 mW | | |
| Warranty | 5 years | | |
| Write Endurance | 75 TB (0.16 DWPD) | 150 TB (0.16 DWPD) | 300 TB (0.16 DWPD) |
With a slow QLC SSD and a fast Optane SSD on one device, Intel had to make judgement calls in determining the rated performance specifications. The larger two capacities of H10 are rated for sequential read speeds in excess of 2GB/s, reflecting how Intel's Optane Memory caching software can fetch data from both QLC and Optane portions of the H10 simultaneously. Writes can also be striped, but the maximum rating doesn't exceed any obvious limit for single-device performance. The random IO specs for the H10 fall between the performance of the existing Optane Memory and 660p SSDs, but are much closer to Optane performance. Intel's not trying to advertise a perfect cache hit rate, but they expect it to be pretty good for ordinary real-world usage.
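For reference, a quick back-of-the-envelope calculation (a sketch, not Intel's spec methodology) shows why those sequential read ratings imply striping: a single PCIe 3.0 x2 link tops out just under 2GB/s before protocol overhead, so a 2.4GB/s rating can only be met by reading from both halves at once.

```python
# Back-of-the-envelope: why the 2.3-2.4 GB/s sequential read ratings imply
# striping reads across both halves of the H10. Each half gets a PCIe 3.0 x2
# link, and a single x2 link cannot reach 2.4 GB/s even in theory.

GT_PER_S_PER_LANE = 8.0          # PCIe 3.0 raw signaling rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130  # PCIe 3.0 uses 128b/130b encoding
LANES = 2

# Theoretical ceiling of one x2 link, ignoring TLP/DLLP protocol overhead
x2_ceiling_GBps = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * LANES / 8
print(f"PCIe 3.0 x2 ceiling: ~{x2_ceiling_GBps:.2f} GB/s")   # ~1.97 GB/s

# The 1TB H10 is rated for 2400 MB/s sequential reads, which exceeds what
# either x2 link could deliver alone -- so the caching driver must be
# fetching from the Optane and QLC devices simultaneously.
rated_read_GBps = 2.4
print(f"Rated read exceeds one x2 link: {rated_read_GBps > x2_ceiling_GBps}")
```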
The Optane cache should help reduce the write burden that the QLC portion of the H10 has to bear, but Intel still rates the whole device for the same 0.16 drive writes per day that their 660p QLC SSDs are rated for.
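As a sanity check, the arithmetic behind the 0.16 DWPD figure works out as shown below, assuming the rated writes are spread evenly over the five-year warranty period:

```python
# Quick sanity check: drive writes per day (DWPD) from the TBW ratings,
# assuming writes are spread evenly over the 5-year warranty period.
WARRANTY_YEARS = 5

ratings_tb = {256: 75, 512: 150, 1024: 300}  # capacity (GB) -> rated TBW

for capacity_gb, tbw in ratings_tb.items():
    capacity_tb = capacity_gb / 1000
    dwpd = tbw / (capacity_tb * 365 * WARRANTY_YEARS)
    print(f"{capacity_gb} GB: {dwpd:.2f} DWPD")  # ~0.16 DWPD for every capacity
```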
Intel's marketing photos of the Optane Memory H10 show it with a two-tone PCB to emphasize the dual nature of the drive, but in reality it's a solid color. The PCB layout is unique with two controllers and three kinds of memory, but it is also obviously reminiscent of the two discrete products it is based on. The QLC NAND half of the drive is closer to the M.2 connector and features the SM2263 controller and one package each of DRAM and NAND. The familiar Silicon Motion test/debug connections are placed at the boundary between the NAND half and the Optane half. That Optane half contains Intel's small Optane controller, a single package of 3D XPoint memory, and most of the power management components. Both the Intel SSD 660p and the earlier Optane Memory SSDs had very sparse PCBs; the Optane Memory H10 is crowded and may have the highest part count of any M.2 SSD on the market.
On the surface, little has changed with the Optane Memory software; there's just more flexibility now in which devices can be selected to be cached. (Intel has also extended Optane Memory support to Pentium and Celeron branded processors on platforms that were already supported with Core processors.) When the boot volume is cached, Intel's software allows the user to specify files and applications that should be pinned to the cache and be immune from eviction. Beyond that, there's no room for tweaking the cache behavior.
Some OEMs that sell systems equipped with Optane Memory have been advertising memory capacities as the sum of DRAM and Optane capacities, which might be reasonable if we were talking about Optane DC Persistent Memory modules that connect to the CPU's memory controller, but is very misleading when the Optane product in question is an SSD. Intel says to blame the OEMs for this misleading branding, but Intel's own Optane Memory software does the same thing.
Initially, the Optane Memory H10 will be an OEM-only part, available to consumers only pre-installed in new systems—primarily notebooks. Intel is considering bringing the H10 to retail both as a standalone product and as part of a NUC kit, but they have not committed to plans for either. Their motherboard partners have been laying the groundwork for H10 support for almost a year, and many desktop 300-series motherboards already support the H10 with the latest publicly available firmware.
Platform Compatibility
Putting two PCIe devices on one M.2 card is novel to say the least. Intel has put two SSD controllers on one PCB before with high-end enterprise drives like the P3608 and P4608, but those drives use PCIe switch chips to split an x8 host connection into x4 for each of the two NVMe controllers on board. That approach leads to a 40W TDP for the entire card, which is not at all useful when trying to work within the constraints of a M.2 card.
There are also several PCIe add-in cards that allow four M.2 PCIe SSDs to be connected through one PCIe x16 slot. A few of these cards also include PCIe switches, but most rely on the host system supporting PCIe port bifurcation to split a single x16 port into four independent x4 ports. Mainstream consumer CPUs usually don't support this, and are generally limited to x8+x4+x4 or just x8+x8 bifurcation, and only when the lanes are being re-routed to different slots to support multi-GPU use cases. Recent server and workstation CPUs are more likely to support bifurcation down to x4 ports, but motherboard support for enabling this functionality isn't universal.
Even on CPUs where an x16 slot can be split into four x4 ports, further bifurcation down to x2 ports is seldom or never possible. The chips that do support operating a lot of PCIe lanes as narrow x2 or x1 ports are the southbridge/PCH chips on most motherboards. These tend to not support ports any wider than x4, because that's the normal width of the connection upstream to the CPU.
Based on the above, we put theory to the test and tried the Optane Memory H10 with almost every PCIe 3.0 port we had on hand, using whatever adapters were necessary. Our results are summarized below:
Intel Optane Memory H10 Platform Compatibility

| Platform | PCIe Source | NAND Usable | Optane Usable | Optane Memory Caching |
| --- | --- | --- | --- | --- |
| Whiskey Lake | PCH | Yes | Yes | Yes |
| Coffee Lake | CPU | Yes | No | No |
| Coffee Lake | PCH | Yes | Yes | No* |
| Kaby Lake | CPU | Yes | No | No |
| Kaby Lake | PCH | Yes | No | No |
| Skylake | CPU | Yes | No | No |
| Skylake | PCH | Yes | No | No |
| Skylake-SP (Purley) | CPU | Yes | No | No |
| Skylake-SP (Purley) | PCH | Yes | No | No |
| Threadripper | CPU | No | Yes | No |
| Avago PLX Switch | | Yes | No | No |
| Microsemi PFX Switch | | No | Yes | No |
The Whiskey Lake notebook Intel provided for this review is of course fully compatible with the Optane Memory H10, and will be available for purchase in this configuration soon. Compatibility with older platforms and non-Intel platforms is mostly as expected, with only the NAND side of the H10 accessible—those motherboards don't expect to find two PCIe devices sharing a physical M.2 x4 slot, and aren't configured to detect and initialize both devices. There are a few notable exceptions:
First, the H370 motherboard in our Coffee Lake system is supposed to fully support the H10, but GIGABYTE botched the firmware update that claims to have added H10 support: both the NAND and Optane portions of the H10 are accessible when using a M.2 slot that connects to the PCH, but it isn't possible to enable caching. There are plenty of 300-series motherboards that have successfully added H10 support, and I'm sure GIGABYTE will release a fixed firmware update for this particular board soon. Putting the H10 into a PCIe x16 slot that connects directly to the CPU does not provide access to the Optane side, reflecting the CPU's lack of support for PCIe port bifurcation down to x2+x2.
The only modern AMD system we had on hand was a Threadripper/X399 motherboard. All of the PCIe and M.2 slots we tried led to the Optane side of the H10 being visible instead of the NAND side.
We also connected the H10 through two different brands of PCIe 3.0 switch. Avago's PLX PEX8747 switch only provided access to the NAND side, which is to be expected since it only supports PCIe port bifurcation down to x4 ports. The Microsemi PFX PM8533 switch does claim to support bifurcation down to x2 and we were hoping it would enable access to both sides of the H10, but instead we only got access to the Optane half. The Microsemi switch and Threadripper motherboard may both be just a firmware update away from working with both halves of the H10, and earlier Intel PCH generations might also have that potential, but Intel won't be providing any such updates. Even if these platforms were able to access both halves of the H10, they would not be supported by Intel's Optane Memory caching drivers, but third-party caching software exists.
Test Setup
Our primary system for consumer SSD testing is a Skylake desktop. This is equipped with a Quarch XLC Power Module for detailed SSD power measurements and is used for our ATSB IO trace tests and synthetic benchmarks using FIO. This system predates all of the Optane Memory products, and Intel and their motherboard partners did not want to roll out firmware updates to provide Optane Memory caching support on Skylake generation systems. Using this testbed, we can only access the QLC NAND half of the Optane Memory H10.
As usual for new Optane Memory releases, Intel sent us an entire system with the new Optane Memory H10 pre-installed and configured. This year's review system is an HP Spectre x360 13t notebook with an Intel Core i7-8565U Whiskey Lake processor and 16GB of DDR4. In previous years Intel has provided desktop systems for testing Optane Memory products, but the H10's biggest selling point is that it is a single M.2 module that fits in small systems, so the choice of a 13" notebook this year makes sense. Intel has confirmed that the Spectre x360 will soon be available for purchase with the Optane Memory H10 as one of the storage options.
The HP Spectre x360 13t has only one M.2 type-M slot, so in order to test multi-drive caching configurations or anything involving SATA, we made use of the Coffee Lake and Kaby Lake systems Intel provided for previous Optane Memory releases. For application benchmarks like SYSmark and PCMark, the scores are heavily influenced by the differences in CPU power and RAM between these machines so we have to list three sets of scores for each storage configuration tested. However, our AnandTech Storage Bench IO trace tests and our synthetic benchmarks using FIO produce nearly identical results across all three of these systems, so we can make direct comparisons and each test only needs to list one set of scores for each storage configuration.
Intel-provided Optane Memory Review Systems

| Platform | Kaby Lake | Coffee Lake | Whiskey Lake |
| --- | --- | --- | --- |
| CPU | Intel Core i5-7400 | Intel Core i7-8700K | Intel Core i7-8565U |
| Motherboard | ASUS PRIME Z270-A | Gigabyte Aorus H370 Gaming 3 WiFi | HP Spectre x360 13t |
| Chipset | Intel Z270 | Intel H370 | |
| Memory | 2x 4GB DDR4-2666 | 2x 8GB DDR4-2666 | 16GB DDR4-2400 |
| Case | In Win C583 | In Win C583 | |
| Power Supply | Cooler Master G550M | Cooler Master G550M | HP 65W USB-C |
| Display Resolution | 1920x1200 (SYSmark), 1920x1080 (PCMark) | 1920x1080 | 1920x1080 |
| OS | Windows 10 64-bit, version 1803 | | |
Intel's Optane Memory caching software is Windows-only, so our usual Linux-based synthetic testing with FIO had to be adapted to run on Windows. The configuration and test procedure is as close as practical to our usual methodology, but a few important differences mean the results in this review are not directly comparable to those from our usual SSD reviews or the results posted in Bench. In particular, it is impossible to perform a secure erase or NVMe format from within Windows except in the rare instance where a vendor provides a tool that only works with their drives. Our testing usually involves erasing the drive between major phases in order to restore performance without waiting for the SSD's background garbage collection to finish cleaning up and freeing up SLC cache. For this review's Windows-based synthetic benchmarks, the tests that write the least amount of data were run first, and those that require filling the entire drive were saved for last.
Optane Memory caching also requires using Intel's storage drivers. Our usual procedure for Windows-based tests is to use Microsoft's own NVMe driver rather than bother with vendor-specific drivers. The tests of Optane caching configurations in this review were conducted with Intel's drivers, but all single-drive tests (including tests of just one side of the Optane Memory H10) use the Windows default driver.
Our usual Skylake testbed is set up to test NVMe SSDs in the primary PCIe x16 slot connected to the CPU. Optane Memory caching requires that the drives be connected through the chipset, so there's a small possibility that congestion on the x4 DMI link could have an effect on the fastest drives, but the H10 is unlikely to come close to saturating this connection.
We try to include detailed power measurements alongside almost all of our performance tests, but this review is missing most of those. Our current power measurement equipment is unable to supply power to a M.2 slot in a notebook and requires a regular PCIe x4 slot for the power injection fixture. We have new equipment on the way from Quarch to remedy this limitation and will post an article about the upgrade after taking the time to re-test the drives in this review with power measurement on the HP notebook.
Application Benchmarks
With a complex multi-layer storage system like the Intel Optane Memory H10, the most accurate benchmarks will be tests that use real-world applications. BAPCo's SYSmark 2018 and UL's PCMark 10 are two competing suites of automated application benchmarks. Both share the general goal of assigning a score to represent total system performance, plus several subscores covering different common use cases. PCMark 10 is the shorter test to run and it provides a more detailed breakdown of subscores. It is also much more GPU-heavy with 3D rendering included in the standard test suite and some 3DMark tests included in the Extended test. SYSmark 2018 has the advantage of using the full commercial versions of popular applications including Microsoft Office and Adobe Creative Suite, and it integrates with a power meter to record total system energy usage over the course of the test.
The downside of these tests is that they cover only the most common everyday use cases, and do not simulate any heavy multitasking. None of their subtests are particularly storage-intensive, so most scores only vary slightly when changing between fast and slow SSDs.
BAPCo SYSmark 2018
BAPCo's SYSmark 2018 is an application-based benchmark that uses real-world applications to replay usage patterns of business users, with subscores for productivity, creativity and responsiveness. Scores represent overall system performance and are calibrated against a reference system that is defined to score 1000 in each of the scenarios. A score of, say, 2000, would imply that the system under test is twice as fast as the reference system.
[SYSmark 2018 score charts: Creativity, Productivity, Responsiveness, Overall]
The Kaby Lake desktop and Whiskey Lake notebook trade places depending on the subtest; sometimes the notebook is ahead thanks to its extra RAM, and sometimes the desktop is ahead thanks to its higher TDP. These differences usually have a bigger impact than choice of storage, though the Responsiveness test does show that a hard drive alone is inadequate. The Optane Memory H10's score with caching on is not noticeably better than when using the QLC portion alone, and even the hard drive with an Optane cache is fairly competitive with the all-solid state storage configurations.
Energy Usage
The SYSmark energy usage scores measure total system power consumption, excluding the display. Our Kaby Lake test system idles at around 26 W and peaks at over 60 W measured at the wall during the benchmark run. SATA SSDs seldom exceed 5 W and idle at a fraction of a watt, and the SSDs spend most of the test idle. This means the energy usage scores will inevitably be very close. The notebook uses substantially less power despite this measurement including the display. None of the really power-hungry storage options (hard drives, Optane 900P) can fit in this system, so the energy usage scores are also fairly close together.
The Optane Memory H10 was the most power-hungry M.2 option, and leaving the Optane cache off saves a tiny bit of power but not enough to catch up with the good TLC-based drives. The Optane SSD 800P has better power efficiency than most of the flash-based drives, but its low capacity is a hindrance for real-world use.
UL PCMark 10
The Optane cache provides enough of a boost to PCMark 10 Extended scores to bring the H10 into the lead among the M.2 SSDs tested on the Whiskey Lake notebook. The Essentials subtests show the most impact from the Optane storage while the more compute-heavy tasks are relatively unaffected, with the H10 performing about the same with or without caching enabled.
Whole-Drive Fill
This test starts with a freshly-erased drive and fills it with 128kB sequential writes at queue depth 32, recording the write speed for each 1GB segment. This test is not representative of any ordinary client/consumer usage pattern, but it does allow us to observe transitions in the drive's behavior as it fills up. This can allow us to estimate the size of any SLC write cache, and get a sense for how much performance remains on the rare occasions where real-world usage keeps writing data after filling the cache.
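For readers who want to reproduce something similar, the sketch below shows roughly how such a fill test can be scripted with fio on Windows; the device path and logging interval are placeholder assumptions, not our exact test harness.

```python
# A minimal sketch (not AnandTech's actual harness) of a whole-drive fill
# test using fio: 128kB sequential writes at QD32 across the raw device,
# with a bandwidth log so throughput can be plotted as the drive fills.
# TARGET is a placeholder; pointing fio at a raw drive destroys its contents.
import subprocess

TARGET = r"\\.\PhysicalDrive1"   # placeholder device path (Windows raw disk)

subprocess.run([
    "fio",
    "--name=whole_drive_fill",
    f"--filename={TARGET}",
    "--ioengine=windowsaio",     # asynchronous I/O engine on Windows
    "--direct=1",                # bypass the OS page cache
    "--rw=write",                # sequential writes across the whole device
    "--bs=128k",
    "--iodepth=32",
    "--write_bw_log=fill",       # per-interval bandwidth samples -> fill_bw.*.log
    "--log_avg_msec=1000",       # average samples over one second
], check=True)
```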
During a sustained write, the Optane cache on the Intel Optane Memory H10 doesn't change the situation much from how the QLC-only Intel 660p behaves—the Optane cache on its own is only good for about 350MB/s. The SLC write cache on the NAND side is a more important factor that helps sustain high write speed far beyond the 32GB size of the Optane cache. But eventually, all the caches fill up and the very slow write speed of raw QLC takes over.
[Charts: Average Throughput for last 16 GB; Overall Average Throughput]
The overall average write speed when completely filling the Optane Memory H10 is unsurprisingly lower than any of the other drives in this batch. The 1TB Intel 660p was already a bit slower than a 7200RPM hard drive, and our H10 sample has half as much QLC to work with.
Working Set Size
The Optane cache on the H10 is 32GB, but when testing random reads it appears to only be good for about 6-8GB working sets before the cache starts thrashing and performance drops down to roughly what a QLC-only drive can offer. It appears that Intel may be reserving a large portion of the Optane cache to serve as a write buffer, and this might be detrimental to the most read-intensive workloads.
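A working-set sweep like the one behind this observation can be approximated with fio by confining random reads to progressively larger spans; the sketch below is illustrative, with an assumed device path and size list rather than our exact tool.

```python
# Illustrative sketch: sweep the random-read working set size to find where
# a cache stops absorbing reads (IOPS falls off). Not AnandTech's exact tool;
# TARGET and the size list are placeholders.
import json
import subprocess

TARGET = r"\\.\PhysicalDrive1"   # placeholder raw device path

for working_set in ["2g", "4g", "8g", "16g", "32g", "64g"]:
    result = subprocess.run([
        "fio",
        "--name=ws_probe",
        f"--filename={TARGET}",
        "--ioengine=windowsaio",
        "--direct=1",
        "--rw=randread",
        "--bs=4k",
        "--iodepth=1",
        f"--size={working_set}",   # confine random reads to this span
        "--time_based", "--runtime=60",
        "--output-format=json",
    ], capture_output=True, text=True, check=True)
    iops = json.loads(result.stdout)["jobs"][0]["read"]["iops"]
    print(f"{working_set}: {iops:.0f} IOPS")
```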
AnandTech Storage Bench - The Destroyer
The Destroyer is an extremely long test replicating the access patterns of very IO-intensive desktop usage. A detailed breakdown can be found in this article. Like real-world usage, the drives do get the occasional break that allows for some background garbage collection and flushing caches, but those idle times are limited to 25ms so that it doesn't take all week to run the test. These AnandTech Storage Bench (ATSB) tests do not involve running the actual applications that generated the workloads, so the scores are relatively insensitive to changes in CPU performance and RAM from a new testbed, but the jump to a newer version of Windows and newer storage drivers can have an impact.
We quantify performance on this test by reporting the drive's average data throughput, the average latency of the I/O operations, and the total energy used by the drive over the course of the test.
The Intel Optane Memory H10 actually performs better overall on The Destroyer with caching disabled and the Optane side of the drive completely inactive. This test doesn't leave much time for background optimization of data placement, and the total amount of data moved is vastly larger than what fits into a 32GB Optane cache. The 512GB of QLC NAND doesn't have any performance to spare for cache thrashing.
The QLC side of the Optane Memory H10 has poor average and 99th percentile latency scores on its own, and throwing in an undersized cache only makes it worse. Even the 7200RPM hard drive scores better.
The average read latencies for the Optane Memory H10 are worse than all the TLC-based drives, but much better than the hard drive with or without an Optane cache in front of it. For writes, the H10's QLC drags it into last place once the SLC cache runs out.
The Optane cache does help the H10's 99th percentile read latency, bringing it up to par with the Crucial MX500 SATA SSD and well ahead of the larger QLC-only 1TB 660p. The 99th percentile write latency is horrible, but even with the cache thrashing causing excess writes, the H10 isn't quite as badly off as the DRAMless Toshiba RC100.
AnandTech Storage Bench - Heavy
Our Heavy storage benchmark is proportionally more write-heavy than The Destroyer, but much shorter overall. The total writes in the Heavy test aren't enough to fill the drive, so performance never drops down to steady state. This test is far more representative of a power user's day to day usage, and is heavily influenced by the drive's peak performance. The Heavy workload test details can be found here. This test is run twice, once on a freshly erased drive and once after filling the drive with sequential writes.
On the Heavy test, the caching unambiguously helps the Intel Optane Memory H10, bringing its average data rate up into the range of decent TLC-based NVMe SSDs, when the test is run on an empty drive. The full-drive performance is still better with the cache than without, but ultimately the post-SLC behavior of the QLC NAND cannot be hidden by the Optane. None of the TLC-based drives slow down when full as much as the QLC drives do.
The average and 99th percentile latency scores for the H10 are competitive with TLC drives only when the test is run on an empty drive. When the Heavy test is run on a full drive with a full SLC cache and cold Optane cache, latency is worse than even the hard drive with an Optane cache. The average latency for the H10 in the full-drive case is still substantially better than using the QLC portion alone, but the Optane cache doesn't help the 99th percentile latency at all.
Average read latencies from the H10 are significantly worse when the Heavy test is run on a full drive, but it's still slightly better than the SATA SSD. The average write latencies are where the QLC stands out, with a full H10 scoring worse than the hard drive, and with the Optane caching disabled write latency is ten times higher than for a TLC SSD.
The 99th percentile read latency of the H10 with Optane caching off is a serious problem during the full-drive test run, but using the Optane cache brings read QoS back into the decent range for SSDs. The 99th percentile write latency is bad without the Optane cache and worse with it.
AnandTech Storage Bench - Light
Our Light storage test has relatively more sequential accesses and lower queue depths than The Destroyer or the Heavy test, and it's by far the shortest test overall. It's based largely on applications that aren't highly dependent on storage performance, so this is a test more of application launch times and file load times. This test can be seen as the sum of all the little delays in daily usage, but with the idle times trimmed to 25ms it takes less than half an hour to run. Details of the Light test can be found here. As with the ATSB Heavy test, this test is run with the drive both freshly erased and empty, and after filling the drive with sequential writes.
The Intel Optane Memory H10 is generally competitive with other low-end NVMe drives when the Light test is run on an empty drive, though the higher performance of the QLC portion on its own indicates that the H10's score is probably artificially lowered by starting with a cold Optane cache. The full-drive performance is worse than almost all of the TLC-based SSDs, but is still significantly better than a hard drive without any Optane cache.
The average and 99th percentile latencies from the Optane Memory H10 are competitive with TLC NAND when the test is run on an empty drive, and even with a full drive the latency scores remain better than a mechanical hard drive.
The average write latency in the full-drive run is the only thing that sticks out and identifies the H10 as clearly different than other entry-level NVMe drives, but the TLC-based DRAMless Toshiba RC100 is even worse in that scenario.
Unlike the average latencies, both the read and write 99th percentile latency scores for the Optane H10 show that it struggles greatly when full. The Optane cache is not nearly enough to make up for running out of SLC cache.
Random Read Performance
Our first test of random read performance uses very short bursts of operations issued one at a time with no queuing. The drives are given enough idle time between bursts to yield an overall duty cycle of 20%, so thermal throttling is impossible. Each burst consists of a total of 32MB of 4kB random reads, from a 16GB span of the disk. The total data read is 1GB.
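A rough approximation of this burst test using fio might look like the following; the device path is a placeholder and the idle-time handling is simplified compared to our harness.

```python
# Rough sketch of a burst random-read test: 32 bursts of 32MB of 4kB random
# reads at QD1 over a 16GB span, with idle time after each burst sized to
# keep the duty cycle near 20%. Placeholder device path; not the exact harness.
import subprocess
import time

TARGET = r"\\.\PhysicalDrive1"   # placeholder raw device path

for burst in range(32):          # 32 bursts x 32MB = 1GB total read
    start = time.monotonic()
    subprocess.run([
        "fio",
        f"--name=burst_{burst}",
        f"--filename={TARGET}",
        "--ioengine=windowsaio",
        "--direct=1",
        "--rw=randread",
        "--bs=4k",
        "--iodepth=1",
        "--size=16g",            # random reads drawn from a 16GB span
        "--io_size=32m",         # ...but each burst only reads 32MB
    ], check=True)
    busy = time.monotonic() - start
    time.sleep(busy * 4)         # idle 4x the busy time -> ~20% duty cycle
```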
The burst random read test easily fits within the Optane cache on the Optane Memory H10, so it outperforms all of the flash-based SSDs, but is substantially slower than the pure Optane storage devices.
Our sustained random read performance is similar to the random read test from our 2015 test suite: queue depths from 1 to 32 are tested, and the average performance and power efficiency across QD1, QD2 and QD4 are reported as the primary scores. Each queue depth is tested for one minute or 32GB of data transferred, whichever is shorter. After each queue depth is tested, the drive is given up to one minute to cool off so that the higher queue depths are unlikely to be affected by accumulated heat build-up. The individual read operations are again 4kB, and cover a 64GB span of the drive.
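The queue depth sweep can be approximated with fio as sketched below; the device path is a placeholder and the data/time caps are simplified relative to our full procedure.

```python
# Sketch of a queue-depth sweep for sustained 4kB random reads: QD1 through
# QD32, each for up to one minute or 32GB over a 64GB span, with a cool-down
# between steps. Placeholder device path; not the exact harness.
import json
import subprocess
import time

TARGET = r"\\.\PhysicalDrive1"   # placeholder raw device path
results = {}

for qd in [1, 2, 4, 8, 16, 32]:
    proc = subprocess.run([
        "fio",
        f"--name=randread_qd{qd}",
        f"--filename={TARGET}",
        "--ioengine=windowsaio",
        "--direct=1",
        "--rw=randread",
        "--bs=4k",
        f"--iodepth={qd}",
        "--size=64g",                # 64GB span of the drive
        "--io_size=32g",             # stop after 32GB...
        "--runtime=60",              # ...or one minute, whichever comes first
        "--output-format=json",
    ], capture_output=True, text=True, check=True)
    results[qd] = json.loads(proc.stdout)["jobs"][0]["read"]["iops"]
    time.sleep(60)                   # cool-down so heat doesn't skew the next step

# The primary score reported in the charts is the average of QD1, QD2 and QD4.
print(f"QD1/2/4 average: {sum(results[qd] for qd in (1, 2, 4)) / 3:.0f} IOPS")
```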
On the longer random read test that covers a wider span of the disk than the Optane cache can manage, the H10's performance is on par with the TLC-based SSDs.
The Optane cache provides little benefit over pure QLC storage at lower queue depths, but at the higher queue depths the H10 with caching enabled starts to develop a real lead over the QLC portion on its own. Unfortunately, by the time queue depths are this high, the flash-based SSDs have all surpassed the H10's random read throughput.
Random Write Performance
Our test of random write burst performance is structured similarly to the random read burst test, but each burst is only 4MB and the total test length is 128MB. The 4kB random write operations are distributed over a 16GB span of the drive, and the operations are issued one at a time with no queuing.
The burst random write performance of the H10 with caching enabled is better than either half of the drive can manage on its own, but far less than the sum of its parts. A good SLC write cache on a TLC drive is still better than the Optane caching on top of QLC.
As with the sustained random read test, our sustained 4kB random write test runs for up to one minute or 32GB per queue depth, covering a 64GB span of the drive and giving the drive up to 1 minute of idle time between queue depths to allow for write caches to be flushed and for the drive to cool down.
On the longer random write test that covers a much wider span than the Optane cache can handle, the Optane Memory H10 falls behind all of the flash-based competition. The caching software ends up creating more work that drags performance down far below what the QLC portion can manage with just its SLC cache.
Random write performance on the Optane Memory H10 is unsteady but generally trending downward as the test progresses. Two layers of caching getting in each other's way is not a good recipe for consistent sustained performance.
Sequential Read Performance
Our first test of sequential read performance uses short bursts of 128MB, issued as 128kB operations with no queuing. The test averages performance across eight bursts for a total of 1GB of data transferred from a drive containing 16GB of data. Between each burst the drive is given enough idle time to keep the overall duty cycle at 20%.
The burst sequential read performance of the Optane Memory H10 is much lower than what the high-end TLC-based drives provide, but it is competitive with the other low-end NVMe drives that are limited to PCIe 3 x2 links. The Optane Memory caching is only responsible for about a 10% speed increase over the raw QLC speed, so this is obviously not one of the scenarios where the caching drivers can effectively stripe access between the Optane and NAND.
Our test of sustained sequential reads uses queue depths from 1 to 32, with the performance and power scores computed as the average of QD1, QD2 and QD4. Each queue depth is tested for up to one minute or 32GB transferred, from a drive containing 64GB of data. This test is run twice: once with the drive prepared by sequentially writing the test data, and again after the random write test has mixed things up, causing fragmentation inside the SSD that isn't visible to the OS. These two scores represent the two extremes of how the drive would perform under real-world usage, where wear leveling and modifications to some existing data will create some internal fragmentation that degrades performance, but usually not to the extent shown here.
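The sketch below approximates the "fragmented" variant of this test with fio: scramble the test region with random writes, then read it back sequentially. The device path and single queue depth are placeholder simplifications of the full QD1-QD32 sweep.

```python
# Sketch of the "fragmented" sequential read case: random-write a 64GB region
# so its LBAs end up scattered across the flash, then measure sequential reads
# over the same region. Placeholder device path; not the exact harness.
import subprocess

TARGET = r"\\.\PhysicalDrive1"   # placeholder raw device path

# Phase 1: create internal fragmentation the OS can't see.
subprocess.run([
    "fio", "--name=fragment", f"--filename={TARGET}",
    "--ioengine=windowsaio", "--direct=1",
    "--rw=randwrite", "--bs=4k", "--iodepth=32",
    "--size=64g",
], check=True)

# Phase 2: sequential reads over the now-fragmented region (QD1 shown here).
subprocess.run([
    "fio", "--name=seqread_fragmented", f"--filename={TARGET}",
    "--ioengine=windowsaio", "--direct=1",
    "--rw=read", "--bs=128k", "--iodepth=1",
    "--size=64g", "--runtime=60",
], check=True)
```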
On the longer sequential read test, the Optane caching is still not effectively combining the performance of the Optane and NAND halves of the H10. However, when reading back data that was not written sequentially, the Optane cache is a significant help.
The Optane cache is a bit of a hindrance to sequential reads at low queue depths on this test, but at QD8 and higher it provides some benefit over using just the QLC.
Sequential Write Performance
Our test of sequential write burst performance is structured identically to the sequential read burst performance test save for the direction of the data transfer. Each burst writes 128MB as 128kB operations issued at QD1, for a total of 1GB of data written to a drive containing 16GB of data.
The burst sequential write speed of 32GB of Optane on its own is quite poor, so this is a case where the QLC NAND is significantly helping the Optane on the H10. The SLC write cache on the H10's QLC side is competitive with those on the TLC-based drives, but when the caching software gets in the way the H10 ends up with SATA-like performance.
Our test of sustained sequential writes is structured identically to our sustained sequential read test, save for the direction of the data transfers. Queue depths range from 1 to 32 and each queue depth is tested for up to one minute or 32GB, followed by up to one minute of idle time for the drive to cool off and perform garbage collection. The test is confined to a 64GB span of the drive.
The story is pretty much the same on the longer sequential write test, though some of the other low-end NVMe drives have fallen far enough that the Optane Memory H10's score isn't a complete embarrassment. However, the QLC portion on its own is still doing a better job of handling sustained sequential writes than the caching configuration.
There's no clear trend in performance for the H10 during the sustained sequential write test. It is mostly performing between the levels of the QLC and Optane portions, which means the caching software is getting in the way rather than allowing the two halves to work together and deliver better performance than either one individually. It's possible that with more idle time to clear out the Optane and SLC caches we would see drastically different behavior here.
Mixed Random Performance
Our test of mixed random reads and writes covers mixes varying from pure reads to pure writes at 10% increments. Each mix is tested for up to 1 minute or 32GB of data transferred. The test is conducted with a queue depth of 4, and is limited to a 64GB span of the drive. In between each mix, the drive is given idle time of up to one minute so that the overall duty cycle is 50%.
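A simplified fio version of this mixed-workload sweep is sketched below; the device path and the data/time caps are placeholder assumptions.

```python
# Sketch of the mixed random I/O sweep: read/write mixes from 100% reads to
# 100% writes in 10% steps, 4kB accesses at QD4 over a 64GB span, with idle
# time between mixes. Placeholder device path; not the exact harness.
import json
import subprocess
import time

TARGET = r"\\.\PhysicalDrive1"   # placeholder raw device path

for read_pct in range(100, -1, -10):
    proc = subprocess.run([
        "fio",
        f"--name=mixed_{read_pct}r",
        f"--filename={TARGET}",
        "--ioengine=windowsaio",
        "--direct=1",
        "--rw=randrw",
        f"--rwmixread={read_pct}",   # percentage of the mix that is reads
        "--bs=4k",
        "--iodepth=4",
        "--size=64g",
        "--io_size=32g",             # stop after 32GB...
        "--runtime=60",              # ...or one minute, whichever comes first
        "--output-format=json",
    ], capture_output=True, text=True, check=True)
    job = json.loads(proc.stdout)["jobs"][0]
    total_iops = job["read"]["iops"] + job["write"]["iops"]
    print(f"{read_pct}% reads: {total_iops:.0f} IOPS")
    time.sleep(60)                   # idle so the overall duty cycle is ~50%
```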
The performance of the Optane Memory H10 on the mixed random IO test is worse than either half of the drive provides on its own. The test covers a wider span than the 32GB Optane cache can handle, so the caching software's attempts to help end up being detrimental.
The QLC portion of the H10 performs similarly to the Optane caching configuration during the read-heavy half of the test, though the caching makes performance less consistent. During the write heavy half of the test, the QLC-only configuration picks up significant speed over the Optane caching setup, until its SLC cache starts to run out at the very end.
Mixed Sequential Performance
Our test of mixed sequential reads and writes differs from the mixed random I/O test by performing 128kB sequential accesses rather than 4kB accesses at random locations, and the sequential test is conducted at queue depth 1. The range of mixes tested is the same, and the timing and limits on data transfers are also the same as above.
The Optane Memory H10 averages a bit better than SATA SSDs on the mixed sequential IO test, but there's a significant gap between the H10 and the high-end TLC-based drives. This is another scenario where the Optane caching software can't find a way to consistently help, and the H10's overall performance is a bit lower than it would have been relying on just the QLC NAND with its SLC cache.
The caching software contributes to inconsistent performance for the Optane Memory H10 but the general trend is toward lower performance as the workload becomes more write heavy. The QLC portion on its own is able to increase speed during the second half of the test because it is quite effective at combining writes.
Conclusion
The idea behind the Optane Memory H10 is quite intriguing. QLC NAND needs a performance boost to be competitive against mainstream TLC-based SSDs, and Intel's 3D XPoint memory is still by far the fastest non-volatile storage on the market. Unfortunately, there are too many factors weighing down the H10's potential. It's two separate SSDs on one card, so the NAND side of the drive still needs some DRAM that adds to the cost. The caching is entirely software managed, so the NAND SSD controller and the Optane controller cannot coordinate with each other and Intel's caching software sometimes struggles to make good use of both portions of the drive simultaneously.
Some of these challenges are exacerbated by benchmarking conditions; our test suite was designed with SLC write caching in mind but not two layers of cache that are sometimes functioning more like a RAID-0. None of our synthetic benchmarks managed to trigger that bandwidth aggregation between the Optane and NAND portions of the H10. Intel cautions that they have only optimized their caching algorithms for real-world storage patterns, and it is easy to see how some of our tests have differences that may be very significant. (In particular, many of our tests only give the system the opportunity to use block-level caching, but Intel's software can also perform file-level caching.) But this only emphasizes that the Optane Memory H10 is not a one size fits all storage solution.
For the heaviest, most write-intensive workloads, putting a small Optane cache in front of the QLC NAND only postpones the inevitable performance drops. In some cases, trying to keep the right data in the cache causes more performance issues than it solves. However, the kind of real-world workloads that generate that much IO are unlikely to run well on a 15W notebook CPU anyways. The Optane cache doesn't magically transform a low-end SSD into a top of the line drive, and the Optane Memory H10 is probably never going to be a good choice for desktops that can easily accommodate a wider range of storage options than a thin ultrabook.
On lighter workloads that are more typical of what an ultrabook is good for, the Optane Memory H10 is generally competitive with other low-end NVMe offerings and in good conditions it can be more responsive than any NAND flash-only drive. For everyday use, the H10 is certainly preferable over a QLC-only drive, but against TLC-based drives it's a tough sell. We haven't had the chance to perform detailed power measurements of the Optane Memory H10, but there's little chance it can provide better battery life than the best TLC-based SSDs.
If Intel is serious about making QLC+Optane caching work well enough to compete against TLC-only drives, they'll have to do better than the Optane Memory H10. TLC-only SSDs will almost always have a more consistent performance profile than a tiered setup. The Optane cache on the H10 doesn't soften the rough edges enough to make it suitable for heavy workloads, and it doesn't enhance the performance on light workloads enough to give the H10 a significant advantage over the best TLC drives. When the best-case performance of even a QLC SSD is solidly in "fast enough" territory thanks to SLC caching, the focus should be on improving the worst case, not on optimizing use cases that already feel almost instantaneous.
Optane has found great success in some segments of the datacenter storage market, but in the consumer market it's still looking for the right niche. QLC NAND is also still relatively unproven, though recently it has finally started to deliver on the promise of meaningfully lower prices. The combination of QLC and Optane might still be able to produce an impressive consumer product, but it will take more work from Intel than this relatively low-effort product.