Original Link: https://www.anandtech.com/show/9396/samsung-sm951-nvme-256gb-pcie-ssd-review
Samsung SM951-NVMe (256GB) PCIe SSD Review
by Kristian Vättö on June 25, 2015 9:40 AM EST
PCIe and especially NVMe SSDs are without a doubt the hot topic in the SSD industry at the moment. There are still only a handful of drives on the retail market, but as we saw at Computex a few weeks ago, everyone is busy working on their PCIe designs, and we should see more entries to the market later this year, with a big wave of PCIe SSDs arriving in the first half of 2016.
Samsung has always been an early adopter in the SSD space. The company was the first to market with a PCIe 2.0 x4 M.2 SSD (the XP941) back in late 2013, and before that it was the first to adopt TLC NAND, in 2012. Earlier this year Samsung's second-generation client PCIe drive, the SM951, appeared in a Lenovo laptop, but to everyone's surprise the drive wasn't NVMe compatible, as Samsung had earlier announced it would be. When we asked, Samsung said it has an NVMe client drive in development, but it declined to explain why the SM951 still used the AHCI driver stack.
To our surprise, Ganesh found an NVMe-enabled Samsung M.2 SSD inside Intel's Broadwell-U NUC a while back. This was rather confusing at first because Samsung had specifically told us that the SM951 doesn't support NVMe, but after a closer look and a series of emails with Samsung the drive turned out to be an NVMe version of the SM951, or SM951-NVMe as Samsung calls it.
Distinguishing the AHCI and NVMe versions from each other isn't simple, as the difference lies in a single character of the model number. The AHCI version carries the code MZ-HPVxxx0 (where xxx is the capacity in gigabytes), whereas the NVMe version is MZ-VPVxxx0. Since both versions of the SM951 are technically OEM-only, the similar naming isn't really an issue, but if you are shopping for an SM951, I recommend checking the part number carefully before pulling the trigger to make sure you get the version you want.
SM951-NVMe on the left, SM951-AHCI on the right
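Because the two versions differ by only one character, a buyer checking part numbers could use a check along these lines. This is a minimal sketch based solely on the two patterns quoted above (MZ-HPVxxx0 for AHCI, MZ-VPVxxx0 for NVMe); the helper name `classify_sm951` is hypothetical, the example part numbers are simply built from those patterns, and any suffix after the capacity digits is ignored.

```python
import re

# Prefixes quoted in the review: MZ-HPVxxx0 = SM951-AHCI, MZ-VPVxxx0 = SM951-NVMe,
# where xxx is the capacity in GB. Anything after the trailing "0" is ignored.
PART_RE = re.compile(r"^MZ-([HV])PV(\d{3})0", re.IGNORECASE)


def classify_sm951(part_number: str) -> tuple[str, int]:
    """Return (interface, capacity_gb) for an SM951 part number (hypothetical helper)."""
    match = PART_RE.match(part_number.strip())
    if match is None:
        raise ValueError(f"not an SM951 part number: {part_number!r}")
    # The single differing character: H -> AHCI, V -> NVMe.
    interface = "AHCI" if match.group(1).upper() == "H" else "NVMe"
    return interface, int(match.group(2))


print(classify_sm951("MZ-VPV2560"))  # ('NVMe', 256)
print(classify_sm951("MZ-HPV5120"))  # ('AHCI', 512)
```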
On the hardware side, the AHCI and NVMe versions of the SM951 are identical. Both use Samsung's S4LN058A01-8030 controller, dubbed UBX, which is a PCIe 3.0 x4 controller that apparently supports both the AHCI and NVMe driver stacks. That isn't surprising, though, because nearly all client-grade NVMe controllers I know of can support both; it's just a matter of developing two separate firmware builds. Firmware development is likely the reason the SM951-AHCI reached the market first: Samsung already had basic AHCI firmware from the XP941 and its SATA drives, whereas the SM951-NVMe required more development from scratch given how different (and more efficient) the NVMe command set is.
Samsung SM951 NVMe Specifications
Capacity | 128GB | 256GB | 512GB
Form Factor | M.2 2280
Controller | Samsung S4LN058A01 (PCIe 3.0 x4 NVMe)
NAND | Samsung 16nm 64Gbit MLC
Sequential Read | 2,000MB/s | 2,150MB/s | 2,150MB/s
Sequential Write | 650MB/s | 1,260MB/s | 1,550MB/s
4KB Random Read | 300K IOPS | 300K IOPS | 300K IOPS
4KB Random Write | 83K IOPS | 100K IOPS | 100K IOPS
Encryption | N/A
Like the AHCI version, the SM951-NVMe only comes in capacities of up to 512GB. The limit comes from the NAND: the SM951 uses 64Gbit (8GB) dies, and with only four NAND package placements on the M.2 2280 PCB, the maximum capacity with 16 dies per package works out to 512GB (8GB x 16 x 4). It seems that Samsung doesn't have a high-volume 128Gbit MLC die at this point, although we will likely see one with third-generation V-NAND later this year. The first-generation V-NAND die is 128Gbit, but since it only has 24 layers it's not cost-efficient for a client drive, especially not for an OEM-specific one given how cost-sensitive PC OEMs are.
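The capacity ceiling is plain multiplication. The sketch below reproduces the 512GB figure and shows what the same four-package, 16-die layout would give with a 128Gbit die. The function name is illustrative, and "GB" is used in the same raw-NAND sense as the text, ignoring spare area and binary/decimal differences.

```python
def max_capacity_gb(die_gbit: int, dies_per_package: int = 16, packages: int = 4) -> int:
    """Raw NAND capacity in GB for a given die size and package layout."""
    die_gb = die_gbit // 8  # 8 bits per byte: a 64Gbit die holds 8GB
    return die_gb * dies_per_package * packages


print(max_capacity_gb(64))   # 512, the SM951's ceiling
print(max_capacity_gb(128))  # 1024 with a hypothetical 128Gbit MLC die
```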
TechInsights found that the NAND in the SM951 (both AHCI and NVMe) is actually 16nm. While I was aware that the generation character in the part number had changed, I assumed it was simply a second generation of Samsung's 19nm die, because it didn't make any sense to me that Samsung would build a 16nm 64Gbit die. I'm working on an article comparing all the modern 15-16nm NAND processes, so stay tuned for a more in-depth analysis of Samsung's 16nm node.
Boot Support
One of the major questions with every PCIe SSD is whether it is bootable. When the XP941 became available, the situation was rather messy because motherboard OEMs had not yet prepared for PCIe SSDs, which need BIOS/UEFI support from the motherboard side in order to show up in the boot menu. Fortunately, most OEMs fixed this for their 9-series motherboards, and most models now have a BIOS update available with proper support for PCIe and NVMe SSDs.
In short, the SM951 NVMe is bootable in my ASUS Z97 Deluxe with the latest 2401 BIOS. I don't have any other 9-series motherboards at hand, but I suspect that any motherboard with advertised NVMe support and appropriate BIOS will boot from the SM951 NVMe.
For tower Mac Pro users the story isn't as pleasant, though. I put the SM951 NVMe inside my 2009 Mac Pro, but OS X wouldn't even recognize the drive. Even though the custom Apple SSD inside the MacBook is NVMe-based, I suspect that the current version of OS X doesn't include a general NVMe driver, and even if it did, the Mac Pro and its chipset might simply be too old to support NVMe, which honestly isn't surprising for a system more than five years old. In any case, Mac Pro users can still buy and boot from the AHCI version of the SM951, but I wouldn't hold my breath for NVMe support in the future.
Availability
The SM951-NVMe is an OEM part, so availability is very restricted. A handful of online retailers list the drive, but none of them seem to have it in stock yet. RamCity expects stock in mid-July, but told us that even that is uncertain because its distributors are still saying that the NVMe version is in the sampling stage with no schedule for high-volume availability. We got our 256GB sample directly from Samsung, which explains our early access; at this point there seems to be no way to buy the SM951-NVMe. I will provide an update when I hear more about availability.
AnandTech 2015 SSD Test System
CPU | Intel Core i7-4770K running at 3.5GHz (Turbo & EIST enabled, C-states disabled)
Motherboard | ASUS Z97 Deluxe (BIOS 2205)
Chipset | Intel Z97
Chipset Drivers | Intel 10.0.24 + Intel RST 13.2.4.1000
Memory | Corsair Vengeance DDR3-1866 2x8GB (9-10-9-27 2T)
Graphics | Intel HD Graphics 4600
Graphics Drivers | 15.33.8.64.3345
Desktop Resolution | 1920 x 1080
OS | Windows 8.1 x64
- Thanks to Intel for the Core i7-4770K CPU
- Thanks to ASUS for the Z97 Deluxe motherboard
- Thanks to Corsair for the Vengeance 16GB DDR3-1866 DRAM kit, RM750 power supply, Hydro H60 CPU cooler and Carbide 330R case
Thermal Throttling Revisited
When we first tested the SM951-AHCI in February, I noted that the drive seemed to suffer from thermal throttling under sustained workloads, especially sequential writes. I promised to run tests with a heatsink attached to see how the drive performs without any thermal limitations, and now I have some results to present.
For these tests I used the stock 512GB SM951-AHCI and borrowed the M.2 to PCIe adapter with a heatsink from Plextor's M6e Black Edition. Unfortunately I had to send my M6e samples back before I could test the SM951-NVMe, but the purpose of these tests is to show the impact of thermal throttling in actual client workloads rather than to demonstrate maximum peak performance.
Samsung SM951-AHCI 512GB Performance
Test | With Heatsink | Without Heatsink | Performance Delta
The Destroyer (Data Rate) | 471.53MB/s | 455.65MB/s | -3.4%
The Destroyer (Latency) | 1323.6µs | 1388.4µs | -4.9%
Heavy (Data Rate) | 802.42MB/s | 802.17MB/s | 0.0%
Heavy (Latency) | 180.26µs | 181.39µs | -0.6%
Light (Data Rate) | 1,250MB/s | 1,240MB/s | -1.0%
Light (Latency) | 69.08µs | 69.19µs | -0.2%
The impact of thermal throttling in real-world workloads is clearly insignificant. In the worst case, where the drive is under a heavy IO workload, the performance loss is about 5%, while in anything less intensive the difference is within the margin of error. Even though our traces truncate idle times to 25µs, those brief pauses are enough to lighten the load and reduce thermal throttling compared to a sustained synthetic workload.
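The Performance Delta column in the table appears to be computed relative to the heatsink result, with the sign flipped for latency so that a negative number always means slower without the heatsink. That convention is our reading of the table rather than something the text states; the hypothetical `delta_pct` helper below applies it. Recomputed from the rounded figures shown, it reproduces the listed deltas except the Light data rate, which comes out at -0.8%, most likely because 1,250MB/s and 1,240MB/s are themselves rounded.

```python
def delta_pct(with_heatsink: float, without_heatsink: float, higher_is_better: bool) -> float:
    """Percentage change from removing the heatsink, relative to the heatsink result.

    Negative always means "worse without the heatsink": for data rate a lower
    value is worse, for latency a higher value is worse, so latency is sign-flipped.
    """
    change = (without_heatsink - with_heatsink) / with_heatsink
    if not higher_is_better:
        change = -change
    return round(100 * change, 1)


print(delta_pct(471.53, 455.65, higher_is_better=True))   # -3.4  (The Destroyer data rate)
print(delta_pct(1323.6, 1388.4, higher_is_better=False))  # -4.9  (The Destroyer latency)
print(delta_pct(1250, 1240, higher_is_better=True))       # -0.8  (Light data rate; table lists -1.0%)
```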
Under a sustained 4KB random write workload the difference is more significant: without the heatsink the SM951-AHCI averages 7,878 IOPS, whereas with the heatsink it reaches 10,873 IOPS, roughly 38% more.
The same goes for sequential writes, where throttling is evident and even more significant than in the random write workload. Without the heatsink the SM951 can sustain peak throughput for about two minutes, which may not sound long, but at 1.5GB/s that translates to 180GB of data written, and transfers that large are very rare.
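Both sustained-write figures follow from simple arithmetic on the numbers quoted for the 512GB SM951-AHCI; this sketch just makes the calculation explicit (the variable names are illustrative).

```python
# Sustained 4KB random write, 512GB SM951-AHCI (figures from the text).
iops_without_heatsink = 7_878
iops_with_heatsink = 10_873
gain = iops_with_heatsink / iops_without_heatsink - 1
print(f"heatsink gain: {gain:.0%}")  # heatsink gain: 38%

# Sequential write: roughly two minutes at peak before throttling, at ~1.5GB/s.
seconds_at_peak = 2 * 60
throughput_gb_per_s = 1.5
written_gb = seconds_at_peak * throughput_gb_per_s
print(f"written before throttling: {written_gb:.0f}GB")  # written before throttling: 180GB
```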
To sum things up, there is no need to worry about thermal throttling under typical client workloads. There won't be any notable performance loss unless you subject the drive to an intensive sustained workload, which may be relevant to some professional users (e.g. high-end video editing) but not to the typical enthusiast or power user. If you want to ensure that your SM951 operates at full performance at all times, an adapter with a heatsink isn't a bad idea, but there is little to lose in running the drive without one.
Performance Consistency
We've been looking at performance consistency since the Intel SSD DC S3700 review in late 2012, and it has become one of the cornerstones of our SSD reviews. Back in the day, many SSD vendors focused only on high peak performance, which unfortunately came at the cost of sustained performance. In other words, the drives would push high IOPS in certain synthetic scenarios to provide nice marketing numbers, but as soon as you pushed the drive for more than a few minutes you could easily run into hiccups caused by poor performance consistency.
Once we started exploring IO consistency, nearly all SSD manufacturers moved to improve it. For the 2015 suite I haven't made any significant changes to the methodology we use to test IO consistency. The biggest change is the move from VDBench to Iometer 1.1.0 as the benchmarking software, and I've also extended the test from 2000 seconds to a full hour to ensure that all drives reach steady-state during the test.
For better readability, I now provide bar graphs: the first shows average IOPS over the last 400 seconds, and the second shows the standard deviation over the same period. Average IOPS provides a quick look at overall performance, but it can easily hide poor consistency, so standard deviation is necessary for a complete picture of consistency.
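As a sketch of the two bar-graph metrics, the function below takes a per-second IOPS log and returns the average and standard deviation of the last 400 seconds. The text does not say whether population or sample standard deviation is used; this sketch assumes population (`statistics.pstdev`), and the synthetic log in the usage example is made up for illustration, not measured data.

```python
import statistics


def steady_state_stats(iops_log: list[float], window: int = 400) -> tuple[float, float]:
    """Average IOPS and standard deviation over the last `window` seconds.

    `iops_log` holds one IOPS sample per second of the test run.
    Population standard deviation is an assumption; the review doesn't specify.
    """
    if len(iops_log) < window:
        raise ValueError("log is shorter than the analysis window")
    tail = iops_log[-window:]
    return statistics.mean(tail), statistics.pstdev(tail)


# Made-up one-hour log: fast start, then a steady state alternating 5K/7K IOPS.
log = [40_000.0] * 3_200 + [5_000.0, 7_000.0] * 200
avg, sd = steady_state_stats(log)
print(avg, sd)  # 6000.0 1000.0
```

A perfectly flat tail, like the straight line described for the SM951-NVMe, would return a standard deviation of zero, which is why this metric exposes swings that the average hides.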
I'm still providing the same scatter graphs too, of course. However, I decided to drop the logarithmic graphs and go linear-only, since logarithmic graphs aren't as accurate and can be hard to interpret for those who aren't familiar with them. I provide two graphs: one covering the whole duration of the test and another focusing on the last 400 seconds for a closer look at steady-state performance.
Steady-state performance isn't mind-blowing, although it's a little unfair to compare the 256GB SM951-NVMe against a 512GB SM951-AHCI, because available NAND throughput can play a major role in steady-state performance with a properly designed controller and firmware.
The mediocre steady-state performance is offset by outstanding consistency, though. During the last 400 seconds, the SM951-NVMe has a standard deviation of only 1.12, whereas the next-best drives are in the hundreds, with many exceeding 1,000.
As there's practically no variation in performance, the graph is just a straight line after the initial cliff. There are no up and down swings at all, which is something I've yet to see even in an enterprise SSD. Samsung has always done well in IO consistency, but in all honesty the SM951-NVMe sets a new bar. On the downside, I would rather take some variation in exchange for higher performance, because client workloads don't really need this level of consistency, whereas higher performance always helps with intensive IO workloads. Nevertheless, I'm very happy with the direction Samsung is taking: some controller vendors still don't pay enough attention to consistency, but it's clearly a high priority for Samsung.
Unfortunately I couldn't run any tests with added over-provisioning because the hdparm commands I use to limit the capacity do not work with NVMe drives. NVMe has similar commands, but for now there is no publicly available utility for issuing them.
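For reference, limiting capacity for over-provisioning comes down to choosing how many sectors to leave visible; on SATA drives, hdparm can set that count with its `-N` option (the ATA Host Protected Area), though the text doesn't spell out which hdparm invocation is used. The sketch below only does the arithmetic. The 512-byte sector size, the 25% figure and the round 256GB sector count are assumptions for illustration, and the helper name is hypothetical.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def visible_sectors(total_sectors: int, op_percent: int) -> int:
    """Sectors to leave visible so that op_percent of the drive stays unallocated."""
    if not 0 <= op_percent < 100:
        raise ValueError("op_percent must be between 0 and 99")
    # Integer arithmetic avoids floating-point rounding on large sector counts.
    return total_sectors * (100 - op_percent) // 100


total = 256_000_000_000 // SECTOR_BYTES  # round, hypothetical 256GB drive
print(total)                        # 500000000
print(visible_sectors(total, 25))   # 375000000
```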
AnandTech Storage Bench - The Destroyer
The Destroyer has been an essential part of our SSD test suite for nearly two years now. It was crafted to provide a benchmark for very IO intensive workloads, which is where you most often notice the difference between drives. It's not necessarily the most relevant test to an average user, but for anyone with a heavier IO workload The Destroyer should do a good job at characterizing performance. For full details of this test, please refer to this article.
In The Destroyer trace, the SM951 NVMe is faster than the AHCI version despite having only half the NAND, but it is still beaten by the SSD 750 (although the SSD 750 has more NAND as well). As I mentioned in the SSD 750 review, that drive has excellent small IO performance under intensive IO loads, resulting in much lower latency than what the SM951 offers, but since it performs worse with sequential IOs its average data rate is equivalent to the SM951 NVMe's. What's surprising, though, is that the SM951 AHCI pulled from the Lenovo laptop is considerably faster than the stock SM951 we received straight from Samsung. I even ran the trace twice to ensure that it's not a benchmark anomaly, but maybe there is something wrong with my sample, given that even the XP941 and several SATA 6Gbps drives outperform it.
The SM951 NVMe also has a higher share of high-latency IOs than the SSD 750, but that's quite typical of smaller-capacity Samsung drives.
AnandTech Storage Bench - Heavy
While The Destroyer focuses on sustained and worst-case performance by hammering the drive with nearly 1TB worth of writes, the Heavy trace provides a more typical enthusiast and power user workload. By writing less to the drive, the Heavy trace doesn't drive the SSD into steady-state, and thus it gives us a good idea of peak performance combined with some basic garbage collection routines. For full details of the test, please refer to this article.
Since the SM951 is better optimized for typical client workloads than the SSD 750, it outperforms the Intel drive by a healthy margin. We don't see much difference between the NVMe and AHCI versions, though, as the NVMe version has only marginally lower latency than its AHCI sibling.
AnandTech Storage Bench - Light
The Light trace is designed to be an accurate illustration of basic usage. It's basically a subset of the Heavy trace, but we've left out some workloads to reduce the writes and make it more read intensive in general. Please refer to this article for full details of the test.
Under light workloads the NVMe version pulls a bigger lead, with a nearly 20% reduction in average latency and a 10% increase in average data rate. This is also the benchmark where the SM951 really shows its client focus, as the SSD 750 is considerably slower in both data rate and latency.
Random Read Performance
For full details of how we conduct our Iometer tests, please refer to this article.
This is the graph I've been dying to see ever since I first heard about NVMe. Random read performance at low queue depths has mostly been bottlenecked by AHCI latency: at QD1 the controller is asked for only one 4KB chunk of data at a time, so it can read from just one NAND die, and with no parallelism to hide it, the command overhead makes up a tremendous share of the total latency. As the NVMe command set is much simpler and the whole IO stack is lighter, it opens the door to improved low queue depth performance, which is exactly what we are seeing with the SM951 NVMe.
At QD1 the SM951 NVMe delivers about 50MB/s, whereas the best AHCI drives I've seen hover around 30-35MB/s, a gain of roughly 50%. Performance at QD2 and QD4 is also better than what other drives offer, and overall the SM951 NVMe has excellent random read performance at high QDs too.
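The QD1 numbers translate directly into per-IO latency: with only one IO in flight, Little's law reduces to latency = 1 / IOPS. The sketch below assumes decimal megabytes (10^6 bytes) and 4KB = 4096 bytes, neither of which the text specifies, and takes 35MB/s from the top of the quoted AHCI range; the helper name is illustrative.

```python
def avg_latency_us(throughput_mb_s: float, queue_depth: int = 1, block_bytes: int = 4096) -> float:
    """Average per-IO latency implied by throughput, via Little's law.

    latency = queue_depth / IOPS. Assumes MB = 10**6 bytes and 4KB = 4096 bytes.
    """
    iops = throughput_mb_s * 1_000_000 / block_bytes
    return queue_depth / iops * 1_000_000


nvme_us = avg_latency_us(50)  # SM951 NVMe at QD1
ahci_us = avg_latency_us(35)  # top of the quoted 30-35MB/s AHCI range
print(f"NVMe {nvme_us:.1f}us, AHCI {ahci_us:.1f}us, difference {ahci_us - nvme_us:.1f}us")
# NVMe 81.9us, AHCI 117.0us, difference 35.1us
```

Under those assumptions the NVMe drive spends roughly 82µs per 4KB read versus about 117µs for the best AHCI drives, a per-IO saving of around 35µs that is consistent with the lighter command overhead described above.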
Random Write Performance
NVMe doesn't bring similar gains in random write performance, though. This is an area where Intel clearly has an advantage, but given that the SSD 750 carries an 18-channel controller, that is hardly a surprise. Moreover, because the SSD 750 features full power loss protection, Intel can cache more user data in the DRAM buffer without risking data loss, which can further improve random write performance since IOs can be combined more efficiently. Intel's custom driver may also help with random writes, because the native Microsoft driver has some write performance issues due to Force Unit Access (FUA). With FUA, a write isn't considered complete until it has been written to its final medium, i.e. NAND, whereas Intel's driver can consider a write complete once it reaches the DRAM buffer.
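The benefit of completing writes in DRAM can be illustrated with a toy model that counts NAND program operations. This is not how either drive's firmware works: the 16KB program unit is a hypothetical number, and real controllers also stripe writes across dies and channels. It only shows why acknowledging writes from a (power-loss-protected) DRAM buffer lets small IOs be combined, while writes that must reach NAND before completing are programmed one by one.

```python
import math


def nand_programs(num_writes: int, write_kb: int, program_unit_kb: int, write_through: bool) -> int:
    """Toy count of NAND program operations for a burst of small random writes.

    write_through=True: each write must reach NAND before it completes, so every
    IO is programmed separately. write_through=False: writes are acknowledged
    from a DRAM buffer and packed into full program units before flushing.
    """
    if write_through:
        return num_writes * math.ceil(write_kb / program_unit_kb)
    return math.ceil(num_writes * write_kb / program_unit_kb)


PROGRAM_UNIT_KB = 16  # hypothetical program unit, not a figure from the review
print(nand_programs(100, 4, PROGRAM_UNIT_KB, write_through=True))   # 100
print(nand_programs(100, 4, PROGRAM_UNIT_KB, write_through=False))  # 25
```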
Sequential Read Performance
For full details of how we conduct our Iometer tests, please refer to this article.
In sequential read performance the SM951 NVMe is stronger than the SSD 750, but it's actually outperformed by the AHCI version. Given that performance doesn't scale at all and doesn't reach the rated 2,150MB/s, I suspect some thermal throttling is going on, which doesn't affect the SSD 750 and the SM951 AHCI with their heatsinks.
Sequential Write Performance
The same throttling issue appears to be present in the sequential write test, where the SSD 750 is faster thanks to its ability to scale performance with queue depth, whereas the SM951's performance actually declines due to throttling.
Mixed Random Read/Write Performance
For full details of how we conduct our Iometer tests, please refer to this article.
In mixed random performance the SM951 NVMe offers a good boost over the AHCI version, although it still falls well short of the SSD 750. This is an area where I would like to see improvement from Samsung and basically every SSD OEM.
Mixed Sequential Read/Write Performance
Unfortunately the good mixed random performance doesn't translate to mixed sequential performance. The SM951 NVMe takes a considerable hit compared to the AHCI version, although it still offers better performance than any of the SATA 6Gbps drives on the market.
The problem is that performance only drops as the share of writes increases. I suspect there might be some throttling going on, or if not, the firmware isn't properly optimized, because other Samsung drives show a nice "bathtub" curve, with performance dipping at mixed ratios and recovering as the workload approaches pure writes.
ATTO - Transfer Size vs Performance
I'm keeping our ATTO test around because it's a tool that anyone can easily run, and it provides a quick look at performance scaling across multiple transfer sizes. I'm providing the results in a slightly different format because the line graphs didn't work well with multiple drives, and creating the graphs was rather painful since the results had to be entered manually cell by cell, as ATTO doesn't offer a 'save as CSV' function.
AS-SSD Incompressible Sequential Performance
I'm also keeping AS-SSD around, as it's freeware like ATTO and our readers can use it to confirm that their drives operate properly. AS-SSD uses incompressible data for all of its transfers, so it's also a valuable tool for testing SandForce-based drives, which perform worse with incompressible data.
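To see why incompressible data matters, compare how well a zero-filled buffer and a random buffer compress. In this sketch Python's zlib stands in for a controller's on-the-fly compression and `os.urandom` stands in for AS-SSD's incompressible data; neither reflects how SandForce controllers or AS-SSD actually work internally.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(data, 6)) / len(data)


size = 1 << 20                  # 1MiB test buffer
zeros = bytes(size)             # highly compressible (all zero bytes)
random_data = os.urandom(size)  # effectively incompressible

print(f"zeros:  {compression_ratio(zeros):.3f}")        # far below 1
print(f"random: {compression_ratio(random_data):.3f}")  # about 1.0 or slightly above
```

A drive that compresses data before writing it can store the zero-filled buffer with a tiny fraction of the NAND writes, but gains nothing on the random buffer, which is why incompressible test data shows its worst case.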
Final Words
In terms of performance, the NVMe version of the SM951 offers an upgrade over its AHCI sibling. The average data rate (i.e. large IO performance) isn't dramatically better than the AHCI version's, but when it comes to small IO latency the SM951 and NVMe in general show their might. Typically the NVMe version offers about 10-20% improvement in average latency over the AHCI version, which is a healthy boost in performance given that the two use identical hardware.
It's obvious that the SM951-NVMe has been designed for mainstream client workloads. In our Heavy and Light traces it sets new records, but in the most IO intensive The Destroyer trace the SM951-NVMe is outperformed by the SSD 750. While Intel specifically built a client-oriented firmware for the SSD 750, the company made it clear that it focused on sustained random IO performance rather than high peak throughput, and the tradeoff pays off as long as the IO workload is intensive enough (think multiple VMs for instance). Another area where the SSD 750 beats the SM951-NVMe by a substantial margin is steady-state performance, which contributes heavily to The Destroyer benchmark since the trace effectively puts the drive into steady-state.
Speaking of steady-state performance, there are two things I was particularly happy to see in the SM951-NVMe. The first is the unbelievable IO consistency, which isn't that significant for a client drive, but if Samsung can pull off something equivalent (with higher performance, of course) in the enterprise space, then I'll be excited. It never hurts to have that level of consistency in a client drive either, but it just isn't used to its full potential, since client SSDs and workloads are more about peak than sustained performance, the opposite of enterprise workloads.
The second is low queue depth random read performance. This is an area where we haven't seen much improvement in the past few years, because ultimately the bottlenecks have been AHCI overhead and NAND latency. Fixing the latter requires a new type of non-volatile memory (e.g. ReRAM, MRAM or NRAM) with significantly lower read latency, but that isn't on the horizon until around 2020. In the meantime, the only way to improve random read latency is to cut the driver stack overhead, which is exactly the purpose of NVMe. The reason I'm so excited about low queue depth random read performance is that such IOs account for a large share of the total IOs in typical client workloads (especially the less intensive ones), so any improvement translates to better user experience and performance, which is ultimately what a consumer is looking for.
Despite all this, I have to admit that I walk away a little disappointed. A 10-20% performance improvement isn't marginal, but after all the hype about NVMe I was expecting a little more. I have a strong feeling that NVMe is capable of much more, but the technology needs time to mature. From my conversations with SSD OEMs, the generic NVMe driver that Microsoft includes in Windows 8.1 has some severe shortcomings, which is why nearly everyone has their own custom driver, at least for now. I think Samsung and the SM951-NVMe desperately need such a driver to unleash the full potential of the drive, and I sure hope that the retail version will ship with one.
All in all, the SSD 750 remains the best option for very IO-intensive workloads, but for a more typical enthusiast the SM951-NVMe provides better performance, although not substantially better than the AHCI version. If you need an SSD today, I wouldn't wait for the NVMe version, because its availability is a mystery to all and you may end up waiting for months. Nevertheless, if the SM951-NVMe were easily available and reasonably priced, I would give it our "Recommended by AnandTech" award, but for now one can only drool over it.