Original Link: https://www.anandtech.com/show/8979/samsung-sm951-512-gb-review
Samsung SM951 (512GB) PCIe SSD Review
by Kristian Vättö on February 24, 2015 8:00 AM EST
The PCIe SSD revolution is upon us. So far nearly every controller vendor has shown off its PCIe SSD controller design and the latest news I've heard is that we'll be seeing a large number of PCIe SSDs from numerous manufacturers in the second half of 2015 (watch out for Computex and Flash Memory Summit). Samsung got a head start in 2013 with the introduction of the XP941 and to date the company is still the only manufacturer that is shipping a PCIe 2.0 x4 client SSD in volume. There are a couple of Marvell-based PCIe 2.0 x2 products on the market from SanDisk and Plextor, but none that can truly challenge the XP941 in performance.
Despite being an OEM-only product, the XP941 has been relatively popular among enthusiasts. The performance upgrade over a SATA 6Gbps drive is significant enough that it has been worth the premium for a user with an IO-intensive workload. The truth is that SATA 6Gbps has been saturated for quite some time already, so PCIe, and namely the XP941, has been the only way to improve single-drive IO performance affordably (faster PCIe SSDs exist, but due to their enterprise focus the prices make them unreachable for the majority).
Today we have the successor of the XP941 in the house. The SM951 made its first appearance at Samsung SSD Global Summit in July 2014 where it was touted to be the first client SSD with NVMe support. Unfortunately, Samsung changed its initial plans and the SM951 as it's known today does not support NVMe, but it still provides an upgrade from PCIe Gen2 to Gen3, which theoretically doubles the available bandwidth. Samsung is tight-lipped about the reasoning behind the decision to dump NVMe support, but from what I understand the current chipsets don't have proper NVMe support by default. It's likely that Samsung's PC OEM partners wanted to stick with the known AHCI command set for improved compatibility, so Samsung decided to push the introduction of client NVMe SSDs a bit further back.
Some motherboard manufacturers have gone through the extra steps to update their BIOSes with NVMe support, but I haven't been able to get a detailed answer of what exactly needs to be changed to enable NVMe on current chipsets. Anyway, the SM951 is not NVMe enabled and will not gain NVMe support later either, so for now there isn't a single client-oriented NVMe SSD. Samsung has, however, stated that the company is working on client NVMe SSDs, which means we may see one soon after all.
The SM951 is an OEM-only product just like its predecessor. The drive is not yet available through retail channels, but Lenovo uses the drive in its ThinkPad X1 Carbon laptop, which is how we got our early review sample. The drive will be available through RamCity within the next few months and the latest I've heard is that the first batch should be delivered in late May. I was told the pricing will be about 10% higher than what the XP941 currently sells for, which would translate to a bit over a dollar per gigabyte. The pricing will ultimately depend on Samsung's production capacity and demand, so it's too early to quote any exact prices and I will provide an update once the SM951 is available for order and the final pricing is out. Furthermore, since our sample came through Lenovo and carries a Lenovo-specific firmware, I will also be reviewing the 'vanilla' version from RamCity to make sure our results reflect the model that is available for purchase.
Samsung SM951 Specifications
Capacity | 128GB | 256GB | 512GB
Form Factor | M.2 2280 (double-sided)
Controller | Samsung S4LN058A01 (PCIe 3.0 x4 AHCI)
NAND | Samsung 19nm 64Gbit MLC
Sequential Read | 2,000MB/s | 2,150MB/s | 2,150MB/s
Sequential Write | 600MB/s | 1,200MB/s | 1,500MB/s
4KB Random Read | 90K IOPS | 90K IOPS | 90K IOPS
4KB Random Write | 70K IOPS | 70K IOPS | 70K IOPS
L1.2 Power Consumption | 2mW | 2mW | 2mW
Idle Power Consumption | 50mW
Active Power Consumption | 6.5W
Encryption | N/A
Similar to the XP941, the SM951 comes in the M.2 2280 form factor and is available in capacities of 128GB, 256GB and 512GB. The lack of a 1TB model is another change from the original product plan, but it's entirely possible that a 1TB SKU will follow later. Performance-wise, Samsung claims up to 2,150MB/s read and 1,500MB/s write, which is nearly twice the throughput of the XP941. However, that's nowhere near the maximum bandwidth of the PCIe 3.0 x4 bus, which should be about 3.2GB/s (PCIe only has ~80% efficiency with overhead, and that is after the 128b/130b encoding scheme used by PCIe 3.0).
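The bandwidth figure can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the per-lane signalling rates (5GT/s for PCIe 2.0, 8GT/s for PCIe 3.0) and the line encodings (8b/10b and 128b/130b) come from the PCIe specifications, the ~80% efficiency factor is the rough estimate quoted above, and the function and constant names are made up for this example.

```python
# Per-lane signalling rate (transfers per second) and line-encoding ratio
# (payload bits / transmitted bits) for each PCIe generation.
PCIE_GENERATIONS = {
    "2.0": {"transfers_per_s": 5e9, "encoding": 8 / 10},     # 8b/10b
    "3.0": {"transfers_per_s": 8e9, "encoding": 128 / 130},  # 128b/130b
}

# Assumption taken from the article text: roughly 80% of the encoded
# bandwidth is left after packet and protocol overhead.
PROTOCOL_EFFICIENCY = 0.80


def link_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Bandwidth after line encoding, in GB/s (1 GB = 10**9 bytes)."""
    gen = PCIE_GENERATIONS[generation]
    bits_per_s = gen["transfers_per_s"] * gen["encoding"] * lanes
    return bits_per_s / 8 / 1e9


def usable_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Estimated bandwidth left for data after protocol overhead."""
    return link_bandwidth_gb_s(generation, lanes) * PROTOCOL_EFFICIENCY


if __name__ == "__main__":
    for gen in ("2.0", "3.0"):
        print(f"PCIe {gen} x4: {link_bandwidth_gb_s(gen, 4):.2f} GB/s encoded, "
              f"~{usable_bandwidth_gb_s(gen, 4):.2f} GB/s usable")
```

Under these assumptions a PCIe 3.0 x4 link works out to a little under 4GB/s after encoding and a bit over 3.1GB/s after overhead, which is in line with the ~3.2GB/s estimate, while the Gen2 x4 link of the XP941 comes out at about half of that.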
Because the SM951 is an OEM product, the warranty and endurance limitation are specified by the reseller instead of Samsung. We will know more when the drive is available, but I would expect RamCity to offer the same three-year warranty and 72TB endurance as it does for the XP941.
In addition to PCIe 3.0, the SM951 adds support for the PCIe L1.2 power state. That is essentially a PCIe version of DevSleep (but it's not limited to just storage devices) and it allows for power consumption as low as 10µW per lane. In the case of the SM951 the L1.2 power consumption is 2mW, which translates to 500µW per lane, so there seems to be room for further improvement, but the important news is that the L1.2 power state brings the slumber power consumption to the same level as DevSleep. The L1.2 state also has a lower exit latency at 70µs (i.e. how long it takes for the drive to be fully powered on again), whereas the DevSleep requirement is 20ms.
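For reference, the per-lane and latency comparisons above are simple ratios. The snippet below just restates the numbers from the text and the specification table; the variable names are illustrative.

```python
LANES = 4                        # the SM951 uses a PCIe x4 link
L12_DRIVE_POWER_W = 2e-3         # 2mW, from the specification table
L12_FLOOR_PER_LANE_W = 10e-6     # 10µW per lane, the floor quoted for L1.2
L12_EXIT_LATENCY_S = 70e-6       # 70µs
DEVSLEEP_EXIT_LATENCY_S = 20e-3  # 20ms

# Power drawn per lane in L1.2 and how far that is from the quoted floor.
per_lane_w = L12_DRIVE_POWER_W / LANES
headroom = per_lane_w / L12_FLOOR_PER_LANE_W

# How much faster the drive wakes up from L1.2 than the DevSleep limit.
latency_ratio = DEVSLEEP_EXIT_LATENCY_S / L12_EXIT_LATENCY_S

print(f"L1.2 power per lane: {per_lane_w * 1e6:.0f} µW "
      f"({headroom:.0f}x the quoted floor)")
print(f"DevSleep exit latency limit is ~{latency_ratio:.0f}x the L1.2 figure")
```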
Surprisingly, the SM951 doesn't make the transition to 3D V-NAND like the rest of the Samsung SSDs we've seen lately. It's still utilizing planar NAND, which I believe is the same 19nm 64Gbit MLC NAND as in the XP941. There have been some reports claiming that the SM951 uses 16nm NAND based on the change in the generation character (i.e. the last character, which was C in the XP941) of the part number, but because the capacity per die is 64Gbit I'm very doubtful that the process node has changed. It wouldn't make sense to build a 64Gbit die at 16nm process because the peripheral circuitry does not scale as well as the memory array does, which would result in very low array efficiency. In other words, a 128Gbit die at 16nm would be substantially more economical than 64Gbit, hence I believe that the NAND in the SM951 is merely a second iteration of the 19nm 64Gbit die. Besides, Samsung already has 3D NAND technology and is pushing it very aggressively, so investing in a new planar NAND node wouldn't be too logical either.
Bootable? Yes
When the XP941 was released, the number one issue with the drive was the lack of boot support. Because the XP941 was never designed for retail, it didn't have its own legacy drivers that load prior to the motherboard BIOS to enable boot support on any system, and hence the XP941 required a BIOS update from the motherboard manufacturer in order to be used as a boot drive. To date, most Z97 and X99 based motherboards have a BIOS that supports booting from the XP941 (RamCity has an extensive list on the topic), although unfortunately AMD and older Intel chipsets are not supported.
I can confirm that the SM951 is also bootable on a supported motherboard. I tested this with an ASUS Z97 Deluxe using the latest 2205 BIOS and the SM951 shows up like any other drive in the boot menu. I suspect that any motherboard that can boot from the XP941 will also work with the SM951, but obviously I can't guarantee that at this point.
I also verified that the SM951 is bootable in tower Mac Pros (2012 and earlier).
AnandTech 2015 Client SSD Suite
The core of our SSD test suite has remained unchanged for nearly four years now. While we have added new benchmarks, such as performance consistency and Storage Bench 2013, in response to the evolution of the SSD industry, we haven't done a major overhaul to take our testing to the next level. That all changes today with the introduction of our 2015 Client SSD Suite.
Just to be clear, there weren't any flaws in the way we did testing in the past -- there were simply some shortcomings that I've been wanting to fix for a while now, but like any big upgrade it's not done overnight. There are four key areas where I focused in the 2015 Suite and these are modernizing our testbed, depth of information, readability and power consumption.
Our old testbed was old, really old. We were using a Sandy Bridge based system with Intel Rapid Storage Technology 10.2 drivers from 2011, so it doesn't take a genius to figure out that our system was desperately in need of a refresh. The 2015 testbed is the latest of the latest with an Intel Haswell CPU and ASUS Z97 motherboard. For the operating system, we have upgraded from Windows 7 to Windows 8.1 with a native NVMe driver, which ensures that our setup is fully prepared for the wave of PCIe NVMe SSDs arriving in the second half of 2015. We are also using the latest Intel Rapid Storage Technology drivers now, which should provide a healthy boost over the old ones we were using before. I've included the full specs of the new system below.
AnandTech 2015 SSD Test System
CPU | Intel Core i7-4770K running at 3.5GHz (Turbo & EIST enabled, C-states disabled)
Motherboard | ASUS Z97 Deluxe (BIOS 2205)
Chipset | Intel Z97
Chipset Drivers | Intel 10.0.24 + Intel RST 13.2.4.1000
Memory | Corsair Vengeance DDR3-1866 2x8GB (9-10-9-27 2T)
Graphics | Intel HD Graphics 4600
Graphics Drivers | 15.33.8.64.3345
Desktop Resolution | 1920 x 1080
OS | Windows 8.1 x64
- Thanks to Intel for the Core i7-4770K CPU
- Thanks to ASUS for the Z97 Deluxe motherboard
- Thanks to Corsair for the Vengeance 16GB DDR3-1866 DRAM kit, RM750 power supply, Hydro H60 CPU cooler and Carbide 330R case
The second improvement we have made is regarding the depth of information. Every now and then I found myself in a situation where I couldn't explain why one drive was faster than the other in our Storage Bench tests, so the 2015 Suite includes additional Iometer tests at various queue depths to help us understand the drive and its performance better. I'm also reporting more data from the Storage Bench traces to better characterize the drive and providing new metrics that I think are more relevant to client usage than some of the metrics we have used in the past. The goal of the 2015 Suite is to leave no stone unturned when it comes to explaining performance and I'm confident that the new Suite does an excellent job at that.
However, the increase in depth of information creates a readability problem. I know some of you prefer to have easy and quick to read graphs, but it's hard to present a mountain of data in a format that's convenient to read. To give you the best of both worlds, I'm providing both the easy and quick to read graphs as well as the full data for those who want to dig in a bit deeper. That way the benchmarks will remain comfortable to skim through in case you don't have a lot of time on your hands, but alternatively you will get access to far more data than in the past.
Last but not least, I'm taking power testing to a whole new level in our 2015 Suite. In the past, power consumption was merely a few graphs near to the end of the article and to be honest the tests we ran didn't give the full scope of the drive's power behavior. In our 2015 Suite, power is just as important as performance is because I'm practically testing and reporting power consumption in every benchmark (though for now this is limited to SATA drives). In the end, the majority of SSDs are employed in laptops and power consumption can actually be far more critical than performance, so making power consumption testing a first class citizen makes perfect sense.
A Word About Storage Benches and Real World Tests
While I'm introducing numerous new benchmarks and performance metrics, our Storage Bench traces have remained unchanged. The truth is that workloads rarely undergo a dramatic change, so I had no reason to create a bunch of new traces that would ultimately be more or less the same as the ones we have already used for years. That's why I also dropped the year nomenclature from the Storage Benches because a trace from 2011 is still perfectly relevant today and keeping the year might have given some readers the impression that our testing is outdated. Basically, the three traces are now called The Destroyer, Heavy and Light with the first one being our old 2013 Storage Bench and the two latter ones being part of our 2011 Storage Bench.
I know some of you have criticized our benchmarks due to the lack of real world application tests, but the unfortunate truth is that it's close to impossible to build a reliable test suite that can be executed in real time. Especially if you want to test something other than just boot and application launch times, there are simply too many tasks in the background that cannot be properly controlled to guarantee valid results. I think it has become common knowledge that any modern SSD is good enough for an average user and that the differences in basic web-centric workloads are negligible, so measuring the time it takes to launch Chrome isn't an exciting test to be honest.
In late 2013, I spent a tremendous amount of time trying to build a real world test suite with a heavier workload, but I kept hitting the same obstacle over and over again: multitasking. One of the most basic principles of benchmarking is reproducibility, meaning that the same test can be run over and over again without significant unexplainable fluctuation in the results. The issue I faced with multitasking was that once I started adding background operations, such as VMs, large downloads and backups like a heavier user would have in the background, my results were no longer explainable as I had lost control of what was accessing the drive. The swings were significant enough that the results wouldn't hold any ground, which is why you never saw any fruit of my endeavors.
As a result, I decided to drop real world testing (at least for now) and go back to traces, which we have been using for years and know to be a reliable, although not perfect, way to measure performance. Unfortunately there is still no TRIM support in the playback and to speed up the trace playback we've cut the idle times to a maximum of 25 milliseconds. Despite the limitations, I do believe that traces are the best way to measure meaningful real world performance because the IO trace is still straight from a real world workload, which cannot be properly replicated with any synthetic benchmark tool (like Iometer).
Performance Consistency
We've been looking at performance consistency since the Intel SSD DC S3700 review in late 2012 and it has become one of the cornerstones of our SSD reviews. Back in the day many SSD vendors were only focusing on high peak performance, which unfortunately came at the cost of sustained performance. In other words, the drives would push high IOPS in certain synthetic scenarios to provide nice marketing numbers, but as soon as you pushed the drive for more than a few minutes you could easily run into hiccups caused by poor performance consistency.
Once we started exploring IO consistency, nearly all SSD manufacturers made a move to improve consistency and for the 2015 suite, I haven't made any significant changes to the methodology we use to test IO consistency. The biggest change is the move from VDBench to Iometer 1.1.0 as the benchmarking software and I've also extended the test from 2000 seconds to a full hour to ensure that all drives hit steady-state during the test.
For better readability, I now provide bar graphs with the first one being an average IOPS of the last 400 seconds and the second graph displaying the standard deviation during the same period. Average IOPS provides a quick look into overall performance, but it can easily hide bad consistency, so looking at standard deviation is necessary for a complete look into consistency.
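To make the two bar graph metrics concrete, here is a minimal sketch of how they could be derived from a per-second IOPS log. It assumes one sample per second and uses the population standard deviation; the article does not say which variant is used, so treat that choice, the function name and the synthetic data as illustrative only.

```python
from statistics import fmean, pstdev


def steady_state_summary(iops_per_second, window_s=400):
    """Return (average IOPS, standard deviation) over the last window_s samples.

    Assumes the log holds exactly one IOPS sample per second.
    """
    if len(iops_per_second) < window_s:
        raise ValueError("log is shorter than the requested window")
    tail = iops_per_second[-window_s:]
    return fmean(tail), pstdev(tail)


if __name__ == "__main__":
    # Two made-up one-hour logs with the same average but very different
    # consistency: one is flat, the other alternates between extremes.
    steady_drive = [7500] * 3600
    erratic_drive = [14700, 300] * 1800

    for name, log in (("steady", steady_drive), ("erratic", erratic_drive)):
        avg, dev = steady_state_summary(log)
        print(f"{name}: average {avg:.0f} IOPS, standard deviation {dev:.0f}")
```

Both synthetic logs average 7,500 IOPS over the final 400 seconds, yet only the standard deviation reveals that the second one keeps swinging between very high and very low values, which is exactly why the average alone is not enough.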
I'm still providing the same scatter graphs too, of course. However, I decided to dump the logarithmic graphs and go linear-only since logarithmic graphs aren't as accurate and can be hard to interpret for those who aren't familiar with them. I provide two graphs: one that includes the whole duration of the test and another that focuses on the last 400 seconds of the test to get a better scope into steady-state performance.
In steady-state performance the SM951 provides a substantial ~70% upgrade over the XP941 and brings performance nearly to the same level with the 850 Pro. Given that the 850 Pro uses faster V-NAND, the steady-state performance is a pleasant surprise and shows that the SM951 is more than a marginal bump from the XP941. Obviously, drives with more default over-provisioning (i.e. Extreme Pro and Neutron XT) provide higher steady-state performance, but Samsung is doing very well with the default 7% over-provisioning.
The consistency of the SM951 is also great. The Neutron XT is living proof of a drive with high average IOPS but horrible consistency: as we can see in the graph above, its standard deviation is up to dozens of times higher compared to the other drives. That's just not acceptable for a modern drive, especially because there are many drives that can consistently provide high IOPS.
[Scatter graphs: Default | 25% Over-Provisioning]
For a dozen seconds or so, the SM951 is actually able to burst out 100K IOPS, but the performance soon drops to below 10K IOPS and eventually evens out at ~7.5K IOPS. The SM951 is very consistent and doesn't experience any notable IOPS drops, whereas the XP941 regularly drops to a few hundred IOs per second. Increasing the over-provisioning to 25% brings the IOPS to about 35K, which is very decent and again much better than the XP941 that still has odd drops in performance.
[Scatter graphs: Default | 25% Over-Provisioning]
AnandTech Storage Bench - The Destroyer
The Destroyer has been an essential part of our SSD test suite for nearly two years now. It was crafted to provide a benchmark for very IO intensive workloads, which is where you most often notice the difference between drives. It's not necessarily the most relevant test to an average user, but for anyone with a heavier IO workload The Destroyer should do a good job at characterizing performance.
AnandTech Storage Bench - The Destroyer
Workload | Description | Applications Used
Photo Sync/Editing | Import images, edit, export | Adobe Photoshop CS6, Adobe Lightroom 4, Dropbox
Gaming | Download/install games, play games | Steam, Deus Ex, Skyrim, Starcraft 2, BioShock Infinite
Virtualization | Run/manage VM, use general apps inside VM | VirtualBox
General Productivity | Browse the web, manage local email, copy files, encrypt/decrypt files, backup system, download content, virus/malware scan | Chrome, IE10, Outlook, Windows 8, AxCrypt, uTorrent, AdAware
Video Playback | Copy and watch movies | Windows 8
Application Development | Compile projects, check out code, download code samples | Visual Studio 2012
The table above describes the workloads of The Destroyer in a bit more detail. Most of the workloads are run independently in the trace, but obviously there are various operations (such as backups) in the background.
AnandTech Storage Bench - The Destroyer - Specs
Reads | 38.83 million
Writes | 10.98 million
Total IO Operations | 49.8 million
Total GB Read | 1583.02 GB
Total GB Written | 875.62 GB
Average Queue Depth | ~5.5
Focus | Worst case multitasking, IO consistency
The name Destroyer comes from the sheer fact that the trace contains nearly 50 million IO operations. That's enough IO operations to effectively put the drive into steady-state and give an idea of the performance in worst case multitasking scenarios. About 67% of the IOs are sequential in nature with the rest ranging from pseudo-random to fully random.
AnandTech Storage Bench - The Destroyer - IO Breakdown
IO Size | <4KB | 4KB | 8KB | 16KB | 32KB | 64KB | 128KB
% of Total | 6.0% | 26.2% | 3.1% | 2.4% | 1.7% | 38.4% | 18.0%
I've included a breakdown of the IOs in the table above, which accounts for 95.8% of total IOs in the trace. The leftover IO sizes are relatively rare in-between sizes that don't have a significant (>1%) share on their own. Over half of the transfers are large IOs, with one fourth being 4KB in size.
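The shares quoted above can be checked directly against the table. The snippet below only re-adds the published percentages; treating 64KB and 128KB as the "large" IOs is an assumption made for this illustration.

```python
# IO size shares of The Destroyer trace, in percent of total IOs,
# copied from the IO Breakdown table.
DESTROYER_IO_SHARE = {
    "<4KB": 6.0,
    "4KB": 26.2,
    "8KB": 3.1,
    "16KB": 2.4,
    "32KB": 1.7,
    "64KB": 38.4,
    "128KB": 18.0,
}

# Share of the trace that the listed sizes account for.
covered = sum(DESTROYER_IO_SHARE.values())

# Assumption for this example: "large" means the 64KB and 128KB buckets.
large = DESTROYER_IO_SHARE["64KB"] + DESTROYER_IO_SHARE["128KB"]

print(f"Listed sizes cover {covered:.1f}% of all IOs")
print(f"64KB and 128KB IOs together: {large:.1f}%")
print(f"4KB IOs: {DESTROYER_IO_SHARE['4KB']:.1f}%")
```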
AnandTech Storage Bench - The Destroyer - QD Breakdown
Queue Depth | 1 | 2 | 3 | 4-5 | 6-10 | 11-20 | 21-32 | >32
% of Total | 50.0% | 21.9% | 4.1% | 5.7% | 8.8% | 6.0% | 2.1% | 1.4%
Despite the average queue depth of 5.5, half of the IOs happen at a queue depth of one, and scenarios where the queue depth is higher than 10 are rather infrequent.
The two key metrics I'm reporting haven't changed and I'll continue to report both data rate and latency because the two have slightly different focuses. Data rate measures the speed of the data transfer, so it emphasizes large IOs that simply account for a much larger share when looking at the total amount of data. Latency, on the other hand, ignores the IO size, so all IOs are given the same weight in the calculation. Both metrics are useful, although in terms of system responsiveness I think the latency is more critical. As a result, I'm also reporting two new stats that provide us a very good insight to high latency IOs by reporting the share of >10ms and >100ms IOs as a percentage of the total.
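The difference between the two metrics is easier to see with a small example. The sketch below is not the actual Storage Bench tooling: the record layout, the function name and the definition of data rate as total bytes divided by total service time are all assumptions made for illustration, and the sample IOs are invented.

```python
from dataclasses import dataclass


@dataclass
class IORecord:
    size_bytes: int         # amount of data moved by the IO
    service_time_ms: float  # time the drive took to complete it


def summarize(records):
    """Compute data rate, average latency and the share of slow IOs."""
    count = len(records)
    total_bytes = sum(r.size_bytes for r in records)
    total_time_ms = sum(r.service_time_ms for r in records)
    return {
        # Weighted by bytes, so large IOs dominate this figure.
        "data_rate_mb_s": (total_bytes / 1e6) / (total_time_ms / 1000),
        # Every IO counts equally, regardless of its size.
        "avg_latency_ms": total_time_ms / count,
        "pct_over_10ms": 100 * sum(r.service_time_ms > 10 for r in records) / count,
        "pct_over_100ms": 100 * sum(r.service_time_ms > 100 for r in records) / count,
    }


if __name__ == "__main__":
    # Invented sample: 98 fast 4KB IOs and two 128KB IOs, one of them slow.
    sample = [IORecord(4096, 0.1)] * 98 + [
        IORecord(131072, 1.0),
        IORecord(131072, 20.0),
    ]
    for metric, value in summarize(sample).items():
        print(f"{metric}: {value:.3f}")
```

In this made-up sample a single slow 128KB IO is enough to put 1% of the IOs above the 10ms mark even though the average latency stays well under a millisecond, which is the kind of behavior the two new stats are meant to expose.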
The SM951 takes the lead easily and provides a ~34% increase in data rate over the XP941. The advantage over some of the slower SATA 6Gbps drives is nearly threefold, which speaks for the performance benefit that PCIe and especially PCIe 3.0 provide.
The latency benefit isn't as significant, which suggests that the SM951 provides a substantial boost in large IO performance, but the performance at small IO sizes isn't dramatically better.
Despite the lowest average latency, the SM951 actually has the most >10ms IOs, with nearly 2% of the IOs having a latency higher than 10ms. I did some thermal throttling testing (see the dedicated page for full results) and the SM951 seems to throttle fairly aggressively, so my hypothesis is that the high number is due to throttling, which limits the drive's throughput momentarily (and hence increases the latency) to cool down the drive.
However, the SM951 has the least >100ms IOs, which means that despite the possible throttling the maximum service times stay between 10ms and 100ms.
AnandTech Storage Bench - Heavy
While The Destroyer focuses on sustained and worst-case performance by hammering the drive with nearly 1TB worth of writes, the Heavy trace provides a more typical enthusiast and power user workload. By writing less to the drive, the Heavy trace doesn't drive the SSD into steady-state and thus the trace gives us a good idea of peak performance combined with some basic garbage collection routines.
AnandTech Storage Bench - Heavy
Workload | Description | Applications Used
Photo Editing | Import images, edit, export | Adobe Photoshop
Gaming | Play games, load levels | Starcraft II, World of Warcraft
Content Creation | HTML editing | Dreamweaver
General Productivity | Browse the web, manage local email, document creation, application install, virus/malware scan | Chrome, IE10, Outlook, Windows 8, AxCrypt, uTorrent, AdAware
Application Development | Compile Chromium | Visual Studio 2008
The Heavy trace drops virtualization from the equation and goes a bit lighter on photo editing and gaming, making it more relevant to the majority of end-users.
AnandTech Storage Bench - Heavy - Specs
Reads | 2.17 million
Writes | 1.78 million
Total IO Operations | 3.99 million
Total GB Read | 48.63 GB
Total GB Written | 106.32 GB
Average Queue Depth | ~4.6
Focus | Peak IO, basic GC routines
The Heavy trace is actually more write-centric than The Destroyer is. A part of that is explained by the lack of virtualization because operating systems tend to be read-intensive, be that a local or virtual system. The total number of IOs is less than 10% of The Destroyer's IOs, so the Heavy trace is much easier for the drive and doesn't even overwrite the drive once.
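The comparison between the two traces follows from the published specs. The snippet below is a quick check using only the numbers in the two Specs tables plus the 512GB capacity of the review sample; the names are illustrative.

```python
# Figures copied from the Specs tables of the two traces.
TRACES = {
    "The Destroyer": {"ios_million": 49.8, "gb_read": 1583.02, "gb_written": 875.62},
    "Heavy": {"ios_million": 3.99, "gb_read": 48.63, "gb_written": 106.32},
}
DRIVE_CAPACITY_GB = 512  # capacity of the reviewed SM951 sample


def write_share_pct(trace_name: str) -> float:
    """Writes as a percentage of all data transferred in the trace."""
    trace = TRACES[trace_name]
    return 100 * trace["gb_written"] / (trace["gb_read"] + trace["gb_written"])


# Heavy's IO count relative to The Destroyer, and how much of the
# drive's capacity the Heavy trace writes.
io_ratio_pct = 100 * TRACES["Heavy"]["ios_million"] / TRACES["The Destroyer"]["ios_million"]
drive_fills = TRACES["Heavy"]["gb_written"] / DRIVE_CAPACITY_GB

for name in TRACES:
    print(f"{name}: {write_share_pct(name):.1f}% of transferred data is writes")
print(f"Heavy has {io_ratio_pct:.1f}% of The Destroyer's IOs")
print(f"Heavy writes {drive_fills:.2f}x the capacity of a 512GB drive")
```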
AnandTech Storage Bench - Heavy - IO Breakdown
IO Size | <4KB | 4KB | 8KB | 16KB | 32KB | 64KB | 128KB
% of Total | 7.8% | 29.2% | 3.5% | 10.3% | 10.8% | 4.1% | 21.7%
The Heavy trace has more focus on 16KB and 32KB IO sizes, but more than half of the IOs are still either 4KB or 128KB. About 43% of the IOs are sequential with the rest being slightly more full random than pseudo-random.
AnandTech Storage Bench - Heavy - QD Breakdown
Queue Depth | 1 | 2 | 3 | 4-5 | 6-10 | 11-20 | 21-32 | >32
% of Total | 63.5% | 10.4% | 5.1% | 5.0% | 6.4% | 6.0% | 3.2% | 0.3%
In terms of queue depths the Heavy trace is even more focused on very low queue depths with three fourths happening at queue depth of one or two.
I'm reporting the same performance metrics as in The Destroyer benchmark, but I'm running the drive in both empty and full states. Some manufacturers tend to focus intensively on peak performance on an empty drive, but in reality the drive will always contain some data. Testing the drive in full state gives us valuable information on whether the drive loses performance once it's filled with data.
The SM951 performs even more strongly in our Heavy trace and presents a nearly 100% improvement in data rate over the XP941. In full state the SM951 loses a bit of its performance, but that's normal and the drop isn't any bigger than in other drives. Despite the lack of NVMe, it's starting to be clear that the SM951 is significantly faster than its predecessor and any SATA 6Gbps SSD.
The average latency is also cut to less than half, which is actually a more substantial improvement than going from a SATA 6Gbps drive to the XP941.
The share of high latency IOs is also the lowest with only 0.06% of the IOs having a higher than 10ms service time.
AnandTech Storage Bench - Light
The Light trace is designed to be an accurate illustration of basic usage. It's basically a subset of the Heavy trace, but we've left out some workloads to reduce the writes and make it more read intensive in general.
AnandTech Storage Bench - Light - Specs
Reads | 372,630
Writes | 459,709
Total IO Operations | 832,339
Total GB Read | 17.97 GB
Total GB Written | 23.25 GB
Average Queue Depth | ~4.6
Focus | Basic, light IO usage
The Light trace still has more writes than reads, but a very light workload would be even more read-centric (think web browsing, document editing, etc). It has about 23GB of writes, which would account for roughly two or three days of average usage (i.e. 7-11GB per day).
AnandTech Storage Bench - Light - IO Breakdown
IO Size | <4KB | 4KB | 8KB | 16KB | 32KB | 64KB | 128KB
% of Total | 6.2% | 27.6% | 2.4% | 8.0% | 6.5% | 4.8% | 26.4%
The IO distribution of the Light trace is very similar to the Heavy trace with slightly more IOs being 128KB. About 70% of the IOs are sequential, though, so that is a major difference compared to the Heavy trace.
AnandTech Storage Bench - Light - QD Breakdown
Queue Depth | 1 | 2 | 3 | 4-5 | 6-10 | 11-20 | 21-32 | >32
% of Total | 73.4% | 16.8% | 2.6% | 2.3% | 3.1% | 1.5% | 0.2% | 0.2%
Over 90% of the IOs have a queue depth of one or two, which further proves the importance of low queue depth performance.
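The low queue depth shares of all three traces can be compared side by side by summing the first buckets of each QD Breakdown table. The helper below is just a convenience for this illustration; the percentages are copied from the tables.

```python
QD_BUCKETS = ["1", "2", "3", "4-5", "6-10", "11-20", "21-32", ">32"]

# Share of IOs per queue depth bucket (percent), from the QD Breakdown tables.
QD_SHARE = {
    "The Destroyer": [50.0, 21.9, 4.1, 5.7, 8.8, 6.0, 2.1, 1.4],
    "Heavy": [63.5, 10.4, 5.1, 5.0, 6.4, 6.0, 3.2, 0.3],
    "Light": [73.4, 16.8, 2.6, 2.3, 3.1, 1.5, 0.2, 0.2],
}


def share_up_to(trace_name: str, bucket: str) -> float:
    """Cumulative share of IOs at or below the given queue depth bucket."""
    last = QD_BUCKETS.index(bucket)
    return sum(QD_SHARE[trace_name][: last + 1])


for name in QD_SHARE:
    low_qd = share_up_to(name, "2")
    high_qd = 100 - share_up_to(name, "6-10")
    print(f"{name}: {low_qd:.1f}% at QD1-2, {high_qd:.1f}% above QD10")
```

Because the published shares are rounded, the totals per trace land within about 0.1 percentage points of 100% rather than exactly on it.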
The SM951 yet again provides roughly twice the data rate compared to the XP941 and with a full drive the difference is even more significant.
The same goes for average latency where the SM951's score is about one third of the XP941's. The SM951 can without a doubt boost performance with lighter IO loads as well, although in very light workloads the bottleneck tends to be the speed of user input (think about document creation for instance).
Random Read Performance
One of the major changes in our 2015 test suite is the synthetic Iometer tests we run. In the past we used to test just one or two queue depths, but real world workloads always contain a mix of different queue depths as shown by our Storage Bench traces. To get the full scope in performance, I'm now testing various queue depths starting from one and going all the way up to 32. I'm not testing every single queue depth, but merely how the throughput scales with the queue depth. I'm using exponential scaling, meaning that the tested queue depths increase in powers of two (i.e. 1, 2, 4, 8...).
Read tests are conducted on a full drive because that is the only way to ensure that the results are valid (testing with an empty drive can substantially inflate the results and in reality the data you are reading is always valid rather than full of zeros). Each queue depth is tested for three minutes and there is no idle time between the tests.
I'm also reporting two metrics now. For the bar graph, I've taken the average of QD1, QD2 and QD4 data rates, which are the most relevant queue depths for client workloads. This allows for easy and quick comparison between drives. In addition to the bar graph, I'm including a line graph, which shows the performance scaling across all queue depths. To keep the line graphs readable, each drive has its own graph, which can be selected from the drop-down menu.
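As a rough sketch of the procedure described above, the queue depth schedule and the bar graph score could be expressed as follows. The function names are hypothetical and the data rates in the example are placeholders, not measured results.

```python
from statistics import fmean

MINUTES_PER_QUEUE_DEPTH = 3  # each queue depth is tested for three minutes


def queue_depth_sweep(max_qd: int = 32) -> list:
    """Queue depths in powers of two, from 1 up to and including max_qd."""
    depths = []
    qd = 1
    while qd <= max_qd:
        depths.append(qd)
        qd *= 2
    return depths


def bar_graph_score(data_rate_by_qd: dict, depths=(1, 2, 4)) -> float:
    """Average data rate over the client-relevant queue depths."""
    return fmean(data_rate_by_qd[qd] for qd in depths)


if __name__ == "__main__":
    sweep = queue_depth_sweep()
    print("Tested queue depths:", sweep)
    print("Total test time:", len(sweep) * MINUTES_PER_QUEUE_DEPTH, "minutes")

    # Placeholder data rates in MB/s for a hypothetical drive.
    example = {1: 40, 2: 75, 4: 140, 8: 250, 16: 400, 32: 520}
    print("Bar graph score:", bar_graph_score(example), "MB/s")
```

Averaging only QD1, QD2 and QD4 keeps the headline number tied to the queue depths that dominate the Storage Bench traces, while the full sweep remains available in the line graph.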
I'm also plotting power for SATA drives and will be doing the same for PCIe drives as soon as I have the system set up properly. Our datalogging multimeter logs power consumption every second, so I report the average for every queue depth to see how the power scales with the queue depth and performance.
While the other SSDs hover at 60-90MB/s for random reads, the SM951 provides a rather noticeable upgrade at 108MB/s.
Looking at the performance more closely reveals that the SM951 delivers better performance at all queue depths, although obviously the biggest difference is at high queue depths where the SM951 can take advantage of the faster PCIe interface. The SM951 actually does over 150K IOPS when the MB/s figure is translated into IOPS.
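Converting between data rate and IOPS is a single division once the IO size is fixed. The sketch below assumes 4KB (4,096-byte) random IOs and decimal megabytes (10^6 bytes); if the benchmark reports binary megabytes the results shift by about 5%, so the exact figures should be read as approximate.

```python
IO_SIZE_BYTES = 4 * 1024  # 4KB random IOs
BYTES_PER_MB = 10**6      # assumption: decimal megabytes


def mb_s_to_iops(mb_per_s: float, io_size: int = IO_SIZE_BYTES) -> float:
    """Number of IOs per second needed to sustain the given data rate."""
    return mb_per_s * BYTES_PER_MB / io_size


def iops_to_mb_s(iops: float, io_size: int = IO_SIZE_BYTES) -> float:
    """Data rate produced by the given number of IOs per second."""
    return iops * io_size / BYTES_PER_MB


if __name__ == "__main__":
    # 108MB/s is the QD1/QD2/QD4 average quoted for the SM951.
    print(f"108 MB/s of 4KB reads is about {mb_s_to_iops(108):,.0f} IOPS")
    # 150K IOPS is the high queue depth figure mentioned in the text.
    print(f"150,000 IOPS of 4KB reads is about {iops_to_mb_s(150_000):.1f} MB/s")
```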
Random write performance is equally strong. The line graph shows how the SM951 shifts the whole curve up, implying a performance increase at all queue depths. The performance at queue depths of 1 and 2 in particular is noticeably better than on other drives.
Sequential Read Performance
Our sequential tests are conducted in the same manner as our random IO tests. Each queue depth is tested for three minutes without any idle time in between the tests and the IOs are 4K aligned similar to what you would experience in a typical desktop OS.
The sequential tests really show the benefit of PCIe 3.0 because the SM951 is approximately three times faster than a SATA 6Gbps drive and provides a ~40% upgrade over the XP941.
The line graph does reveal some anomalies, though. At first it looks like the performance scales nicely, but after a queue depth of two the performance actually degrades. I'm doing some thermal throttling tests later in the article and it's very likely that the read results are affected by that because I was able to reach 2250MB/s at queue depth of 32 with a cold drive (i.e. I went straight to QD32 without doing the scaling first that heats up the drive).
Sequential Write Performance
Sequential write testing differs from random testing in the sense that the LBA span is not limited. That's because sequential IOs don't fragment the drive, so the performance will be at its peak regardless.
Sequential write performance is also great, but it seems to be affected by thermal throttling even more. That said, I did reach over 1500MB/s with a cold drive.
Mixed Random Read/Write Performance
Mixed read/write tests are also a new addition to our test suite. In real world applications a significant portion of workloads are mixed, meaning that there are both read and write IOs. Our Storage Bench benchmarks already illustrate mixed workloads by being based on actual real world IO traces, but until now we haven't had a proper synthetic way to measure mixed performance.
The benchmark is divided into two tests. The first one tests mixed performance with 4KB random IOs at six different read/write distributions starting at 100% reads and adding 20% of writes in each phase. Because we are dealing with a mixed workload that contains reads, the drive is first filled with 128KB sequential data to ensure valid results. Similarly, because the IO pattern is random, I've limited the LBA span to 16GB to ensure that the results aren't affected by IO consistency. The queue depth of the 4KB random test is three.
Again, for the sake of readability, I provide both an average based bar graph as well as a line graph with the full data on it. The bar graph represents an average of all six read/write distribution data rates for quick comparison, whereas the line graph includes a separate data point for each tested distribution.
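To make the structure of the test concrete, here is a minimal sketch of the six read/write distributions and of the averaging behind the bar graph. The function names are hypothetical and the data rates in the usage lines are placeholders, not measurements from any drive.

```python
from statistics import mean


def read_write_mixes(phases=6, step=20):
    """Return (read %, write %) pairs from 100% reads to 100% writes."""
    return [(100 - step * i, step * i) for i in range(phases)]


def bar_graph_value(rates_mb_s, phases=6):
    """Average the data rates of all tested distributions into one number."""
    if len(rates_mb_s) != phases:
        raise ValueError(f"expected {phases} data rates, got {len(rates_mb_s)}")
    return mean(rates_mb_s)


mixes = read_write_mixes()
# [(100, 0), (80, 20), (60, 40), (40, 60), (20, 80), (0, 100)]

# Placeholder rates in MB/s, one per distribution (not real results).
placeholder_rates = [60, 50, 40, 30, 20, 10]
average_rate = bar_graph_value(placeholder_rates)
```

Because the bar graph is a plain average, a drive that is strong at the pure read and pure write ends can still score well even if it sags in the middle, which is why the line graph is worth reading alongside it.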
Quite surprisingly the SM951 and Samsung drives in general don't do very well with mixed data.
The reason lies in the fact that the performance of Samsung drives plummets when the share of writes is increased. At 80/20 read/write, the Samsung drives manage to do pretty well, but after that the performance declines to about 40MB/s. What's odd is that the performance is also bad with 100% writes, whereas with other drives we usually see a spike here. I'm guessing there's some garbage collection going on here that causes the performance degradation.
Mixed Sequential Read/Write Performance
The sequential mixed workload tests are also tested with a full drive, but I've not limited the LBA range as that's not needed with sequential data patterns. The queue depth for the tests is one.
With 128KB sequential data, however, the SM951 is the king of the hill. There's a clear difference between PCIe and SATA based drives, although it's worth noting that the difference is mostly due to PCIe drives having much higher throughput at 100% reads and writes (i.e. the infamous bathtub curve).
ATTO - Transfer Size vs Performance
I'm keeping our ATTO test around because it's a tool that can easily be run by anyone and it provides a quick look into performance scaling across multiple transfer sizes. I'm providing the results in a slightly different format because the line graphs didn't work well with multiple drives and creating the graphs was rather painful since the results had to be manually inserted cell by cell as ATTO doesn't provide a 'save as CSV' function.
The SM951 does much better than the XP941 at all IO sizes, and read performance in particular scales much better.
AS-SSD Incompressible Sequential Performance
I'm also keeping AS-SSD around as it's freeware like ATTO and can be used by our readers to confirm that their drives operate properly. AS-SSD uses incompressible data for all of its transfers, so it's also a valuable tool when testing SandForce based drives that perform worse with incompressible data.
Our sequential Iometer tests already showed that the SM951 is fast and AS-SSD provides further proof that the drive can easily reach ~1500MB/s.
Thermal Throttling
In the previous pages I mentioned I have suspicions that some of the results have been affected by thermal throttling. To confirm my hypothesis, I took my datalogging multimeter and taped its thermal probe on top of the SM951's controller. Then I ran a 128KB sequential write test at queue depth of 32 and plotted the results in the graph below.
Now it's pretty clear why the performance seemed a bit low in the sequential tests. It takes less than two minutes for the drive to begin throttling itself and the performance drops to ~75MB/s. Because the SM951 is an M.2 drive, it doesn't have a chassis or heatsink to help with the heat dissipation, which combined with the fact that the SM951 is more power hungry than most SATA 6Gbps drives results in throttling issues. That said, the drive shouldn't throttle under normal usage because a continuous two-minute transfer isn't very common, but in some more IO intensive workloads with long transfers (e.g. video editing) there's a chance that performance will be affected by thermal issues.
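For anyone who wants to look for the same behavior in their own logs, the sketch below finds the first second at which throughput falls well below its initial level. The function name, the 10-second baseline window and the 50% threshold are arbitrary choices for illustration, and the sample log is a synthetic placeholder rather than the data behind the graph.

```python
def throttle_onset_second(throughput_mb_s, baseline_seconds=10, drop_fraction=0.5):
    """Return the index of the first sample below drop_fraction of the baseline.

    The baseline is the mean of the first baseline_seconds samples.
    Returns None if throughput never drops below the threshold.
    """
    if len(throughput_mb_s) < baseline_seconds:
        raise ValueError("log is shorter than the baseline window")

    baseline = sum(throughput_mb_s[:baseline_seconds]) / baseline_seconds
    threshold = baseline * drop_fraction

    for second in range(baseline_seconds, len(throughput_mb_s)):
        if throughput_mb_s[second] < threshold:
            return second
    return None


# Synthetic placeholder log: steady writes, then a sharp drop (not measured data).
placeholder_log = [1500.0] * 100 + [75.0] * 20
onset = throttle_onset_second(placeholder_log)
```

A fixed threshold like this only suits an abrupt drop such as the one described above; a drive that throttles gradually would need a smoothed or slope-based check instead.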
In any case, I strongly recommend having a decent amount of airflow inside the case. My system only has two case fans (one front and one rear) and I run it with the side panels off for faster accessibility, so mine isn't an ideal setup for maximum airflow.
TRIM Validation
The move from Windows 7 to 8.1 introduced some problems with the methodology we have previously used to test TRIM functionality, so I had to come up with a new way to test. I tested a couple of different methods, but ultimately I decided to go with the easiest one that can actually be used by anyone. The software is simply called trimcheck and it was made by a developer who goes by the name CyberShadow on GitHub.
Trimcheck tests TRIM by creating a small, unique file and then deleting it. Next the program will check whether the data is still accessible by reading the raw LBA locations. If the data that is returned by the drive is all zeros, it has received the TRIM command and TRIM is functional.
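The final check described above boils down to testing whether the bytes read back are all zeros. The snippet below sketches only that comparison; it is not trimcheck's code, the function name is made up, and the platform-specific, privileged step of reading the raw LBAs is deliberately left out.

```python
def looks_trimmed(raw_bytes: bytes) -> bool:
    """Return True if every byte read back from the old location is zero.

    An empty read is treated as inconclusive and returns False.
    """
    return len(raw_bytes) > 0 and not any(raw_bytes)


# Placeholder buffers standing in for data read from the raw LBA locations.
zeroed_sector = bytes(4096)              # 4096 zero bytes
stale_sector = b"\x00" * 4095 + b"\x01"  # one leftover non-zero byte

trim_ok = looks_trimmed(zeroed_sector)   # True
trim_bad = looks_trimmed(stale_sector)   # False
```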
In the case of the SM951, TRIM appears to be working properly.
Final Words
When the news that the SM951 isn't NVMe enabled hit the Internet, there was a lot of disappointment. Understandably many were expecting that the SM951 would merely be an evolutionary step from the XP941 because the AHCI command set would still limit the full potential of the PCIe interface, but Samsung proved us all wrong. The SM951 is far from being a marginal step up from the XP941 because in most of our tests the SM951 beats the XP941 by a 50-100% margin. As a matter of fact, the upgrade from XP941 to SM951 is bigger than going from a SATA 6Gbps SSD to the XP941. Despite the lack of NVMe, there's no arguing about the fact that the SM951 is the fastest client SSD and by a very healthy margin.
From a performance perspective I have absolutely no complaints aside from thermal throttling. I wouldn't consider it to be a major issue because regardless of some throttling in synthetic tests, the SM951 is easily the highest performing drive. The Destroyer test takes about 10 hours to run on modern drives, so if throttling were a real issue it would show up more clearly in the results too. Besides, my half-open testbed isn't ideal for airflow either, but since I haven't encountered noticeable throttling in the past I wanted to mention it in case anyone runs into performance issues with the drive.
Right now the biggest issue with the drive is its nearly nonexistent availability, though. If you want to get your hands on the drive today, the only known way to do that is to buy Lenovo's ThinkPad X1 Carbon that is configured with a PCIe SSD. The cheapest configuration with a 512GB SM951 comes in at $1,709.10, so there's practically no sane way to get access to the drive (unless, of course, you want the X1 Carbon laptop as well and are willing to pay the price).
That brings us to the next subject. Since retail availability isn't expected until late May at the earliest, there's a chance that the SM951 will no longer be the fastest SSD once it's actually available for purchase. At CES last month, several SSD vendors told me that they should have PCIe SSDs ready for Computex, which is in early June, i.e. right after the SM951 is scheduled to start shipping.
If the SM951 were available today, I would have no reason not to give it our "Recommended by AnandTech" award. Being hands down the fastest client SSD on the market is enough justification for the award, but because the drive won't be shipping for several months I can't be sure that I'll still be recommending the SM951 once it's available. For now the only thing we can do is wait, but at least we can do it in peace knowing that the future is quick and bright.