Original Link: https://www.anandtech.com/show/9165/silicon-motion-sm2256-reference-design-ssd-review-tlc-for-everyone



The SSD industry has been talking about TLC NAND for over three years now. We published our first post, Understanding TLC NAND, back in early 2012, but in three years we have actually seen very little TLC NAND making it to the SSD market. Samsung was an early adopter by introducing the SSD 840 in September 2012, but Samsung has always been a special case as its SSD business is fully vertically integrated. When you design and manufacture everything in-house, it's obvious that you will have a technological advantage when it comes to adopting new technologies.

TLC is tightly linked with both controller technology and NAND production because TLC inherently has a higher error rate, which calls for a stronger controller (although admittedly Samsung has had some issues with TLC). The lack of a suitable controller is why other NAND vendors haven't invested as heavily in TLC as Samsung has: Micron, SanDisk and the like have to rely on whatever third-party controllers are available on the market. Without a high-volume product to put TLC NAND into, there's no reason to produce TLC on a large scale, and without large-scale production the cost savings that TLC brings never materialize. Micron tried to promote its 25nm TLC NAND for a while a few years ago, but it quickly realized that the available SSD controllers weren't capable of creating a reliable product, or at least not one that would bring any cost savings, since the drive would need serious over-provisioning for endurance and ECC parity.

Due to the lack of controller support, nobody other than Samsung and SanDisk has a TLC SSD on the market, although SanDisk had to rely on heavy over-provisioning with a 5:1 parity ratio since the Marvell controller used in the Ultra II wasn't designed with TLC in mind. Silicon Motion's SM2256 will be the first commercially available controller and firmware combo with TLC support, and today we are taking an early look at the platform in the form of a reference design sample. ADATA has already announced its SP550 SSD, which will be based on the new SM2256 controller and available later in the summer, but given how many OEMs rely on Silicon Motion's controllers nowadays, we will likely see a large number of SM2256-based TLC drives entering the market by the end of the year.

Architecturally the SM2256 shares the same core design as its predecessor, the SM2246EN. The design is modular, which allows Silicon Motion to change parts of the controller without redoing the rest. It features the same single 32-bit Argonaut RISC processor core as the SM2246EN, which is quite unique because we have seen many SSD controller vendors move towards multi-core ARM architectures. A single custom core obviously brings efficiency gains, and we've witnessed those in the SM2246EN, but the downside of such limited CPU power shows up in sustained performance, when the controller has to perform garbage collection while also processing host IOs.

The only dramatic change is in the error correction circuitry, as the SM2256 supports Low Density Parity Check (LDPC) error correction codes instead of the more common and less powerful BCH ECC. Silicon Motion calls its ECC technology NANDXtend, and it's a combination of LDPC hard and soft decode along with RAID5-like data recovery. The benefit of having three levels of ECC is performance: LDPC soft decode and recovery from parity both have a noticeable impact on performance and are typically only needed when the drive approaches the end of its life (i.e. when the NAND has been cycled a lot). Uncycled NAND has much higher reliability because the tunnel oxide hasn't yet been worn out by P/E cycles, so very little ECC is needed, and LDPC hard decode is sufficient while also having little impact on performance.
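The escalation described above can be sketched as a read path that only falls back to a slower tier when the faster one can't cope. This is a minimal Python model, not Silicon Motion's NANDXtend implementation: the correction limits `HARD_LIMIT` and `SOFT_LIMIT`, the four-page stripe and the use of plain XOR parity are hypothetical assumptions, and the LDPC decoders themselves are stood in for by simple bit-error thresholds.

```python
from functools import reduce

# Hypothetical correction capabilities, in raw bit errors per codeword.
HARD_LIMIT = 40   # assumed ceiling for LDPC hard decode (fast path)
SOFT_LIMIT = 90   # assumed ceiling for LDPC soft decode (extra sensing reads)

def xor_pages(pages):
    """Byte-wise XOR of equally sized pages, as in RAID5-style parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))

def read_page(index, stripe, parity, bit_errors):
    """Return (data, tier) for page `index`, escalating only when needed.

    stripe: the stripe's data pages (error-free copies stand in for what a
    successful decode would return); bit_errors: raw bit errors per page.
    """
    errors = bit_errors[index]
    if errors <= HARD_LIMIT:
        return stripe[index], "hard"   # typical for lightly cycled NAND
    if errors <= SOFT_LIMIT:
        return stripe[index], "soft"   # slower: needs soft sensing
    # Last resort: rebuild the page from the other (assumed readable) pages
    # in the stripe plus the parity page.
    others = [p for i, p in enumerate(stripe) if i != index]
    return xor_pages(others + [parity]), "parity"
```

Keeping the cheap hard decode as the default path is what preserves performance on fresh NAND; the slower tiers only cost time once the raw error rate has climbed.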

Source: USENIX

The reason why hard decode is faster than soft decode lies in how the voltage of a cell is sensed. Hard sensing is binary, so for an SLC cell like the one in the graph above, each read returns either 1 or 0. However, as you can see, the voltage threshold distributions overlap slightly, and the overlap is far worse with MLC and TLC since there are more voltage states. In soft sensing the voltage range is divided into several segments, which requires more precision and additional read iterations. For example, in segment 4 the bit value can be either 1 or 0 as the distributions overlap, so probability algorithms are used to figure out the correct value. To be honest, ECC codes and the way they work are way over my head, but if you are familiar with ECC and want to learn more, I suggest you simply google LDPC, as there are numerous publicly available academic papers that go into more depth on the topic.
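To make the hard/soft distinction a bit more concrete, here is a toy Python model of a single SLC cell. The Gaussian distributions, their means and width, and the four soft reference voltages are all assumptions chosen for illustration (real NAND distributions aren't exactly Gaussian, and the segment numbering doesn't match the graph above). A hard read compares the voltage against one reference; a soft read uses several references, each costing an extra sensing operation, and turns the segment the voltage lands in into a log-likelihood ratio (LLR) that an LDPC decoder can consume.

```python
import math

# Toy model of one SLC cell: erased cells (bit 1) and programmed cells (bit 0)
# are assumed to follow Gaussian threshold-voltage distributions. The numbers
# are illustrative assumptions, not measured NAND data.
MU_1, MU_0, SIGMA = 1.0, 3.0, 0.6
HARD_REF = 2.0                      # single reference voltage for a hard read
SOFT_REFS = [1.4, 1.8, 2.2, 2.6]    # extra reference voltages for a soft read

def hard_read(v):
    """Hard sensing: one comparison, one bit, no confidence information."""
    return 1 if v < HARD_REF else 0

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def segment_llr(lo, hi):
    """ln(P(bit=1) / P(bit=0)) for a voltage that landed in [lo, hi)."""
    p1 = gaussian_cdf(hi, MU_1, SIGMA) - gaussian_cdf(lo, MU_1, SIGMA)
    p0 = gaussian_cdf(hi, MU_0, SIGMA) - gaussian_cdf(lo, MU_0, SIGMA)
    return math.log(p1 / p0)

def soft_read(v):
    """Soft sensing: find the segment (one read per reference) and its LLR."""
    edges = [-math.inf] + SOFT_REFS + [math.inf]
    for i in range(len(edges) - 1):
        if edges[i] <= v < edges[i + 1]:
            return i, segment_llr(edges[i], edges[i + 1])
    raise ValueError("voltage must be a finite number")
```

A voltage of 1.0V lands in segment 0 with a strongly positive LLR (a confident 1), whereas 2.0V lands in the middle segment where the distributions overlap, so its LLR is close to zero and the decoder has to lean on the code's parity checks to settle the bit.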

Silicon Motion claims that its NANDXtend technology can extend the endurance of TLC NAND by up to three times, making TLC more robust for heavier workloads and also allowing the use of lower quality NAND, which some OEMs may end up using anyway due to the lack of in-house binning equipment. Unfortunately I haven't had time to do extended endurance testing on the SM2256 yet to validate Silicon Motion's claims, but I will be sure to test that once we have a retail drive in our hands.
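To put the "up to three times" claim in perspective, the usual back-of-the-envelope endurance formula is capacity × P/E cycles ÷ write amplification. In this sketch the 1,000-cycle TLC rating and the write amplification of 2 are hypothetical inputs (not figures from Silicon Motion), and treating the 3× claim as 3× the usable P/E cycles is an interpretation.

```python
def endurance_tbw(capacity_gb, pe_cycles, write_amplification):
    """Host terabytes writable before the rated P/E cycles are used up.

    Total NAND writes = capacity x P/E cycles; the host sees that amount
    divided by the write amplification factor. Ignores spare area.
    """
    nand_writes_gb = capacity_gb * pe_cycles
    return nand_writes_gb / write_amplification / 1000  # GB -> TB

# Hypothetical inputs: 500GB drive, assumed 1,000-cycle rating, WAF of 2.
baseline = endurance_tbw(500, 1000, 2.0)
# "Up to three times" modelled as three times the usable P/E cycles.
extended = endurance_tbw(500, 3 * 1000, 2.0)
print(baseline, extended)  # 250.0 750.0
```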

Silicon Motion SM2256 Specifications
Host Interface: SATA 6Gbps
NAND Interface: ONFi 3.0, Toggle 2.0 & asynchronous
Number of NAND Channels: 4
CE per Channel: 8
Sequential Read: 524MB/s
Sequential Write: 400MB/s
4KB Random Read: 90K IOPS
4KB Random Write: 70K IOPS
Encryption: AES-256 & TCG Opal 2.0

The SM2256 has eight chip enables (CE) per channel, meaning that it can simultaneously talk to up to 32 NAND dies, but one CE can control more than one die, allowing capacities of up to 2TB with four dies per CE.
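The 2TB figure follows directly from the channel and CE counts. This sketch assumes 128Gbit dies like the Samsung NAND on our sample; other die densities scale the result proportionally.

```python
def max_capacity_gb(channels, ce_per_channel, dies_per_ce, die_gbit):
    """Raw NAND capacity addressable by the controller, in GB."""
    dies = channels * ce_per_channel * dies_per_ce
    return dies * die_gbit // 8  # 8 Gbit per GB

# 4 channels x 8 CEs x 4 dies per CE = 128 dies of (assumed) 128Gbit each.
print(max_capacity_gb(4, 8, 4, 128))  # 2048
```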

The engineering sample Silicon Motion sent us uses Samsung's 19nm 128Gbit TLC NAND, which is the same NAND that is found inside the 840 EVO. Samsung doesn't sell its TLC NAND to others in high volume, so we likely won't see this configuration on the market at all. The SM2256 does, of course, support TLC NAND from all other vendors as well (even Toshiba's 15nm).

Like all TLC SSD designs we have seen, the SM2256 employs SLC caching to improve performance and endurance. The size of the SLC cache is configurable by the OEM, but generally the cache size is between 3GB and 12GB depending on the capacity of the drive. The OEM can also configure whether all IOs are cached or just smaller ones (e.g. 16KB and below), but the early stock firmware that our sample shipped with caches all IOs regardless of size.
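The OEM-configurable policy can be pictured as a simple routing decision per host write. The `SlcCache` class below is a hypothetical toy model, not Silicon Motion's firmware interface: a size threshold of `None` mimics the sample's "cache everything" firmware, a value such as 16 mimics a "16KB and below" policy, and `fold()` stands for the later migration of cached data into TLC.

```python
class SlcCache:
    """Toy model of an OEM-configurable SLC write cache (hypothetical API)."""

    def __init__(self, size_gb, max_io_kb=None):
        self.capacity_kb = size_gb * 1024 * 1024
        self.used_kb = 0
        self.max_io_kb = max_io_kb  # None = cache every IO regardless of size

    def write(self, io_kb):
        """Return where an incoming host write lands: 'SLC' or 'TLC'."""
        small_enough = self.max_io_kb is None or io_kb <= self.max_io_kb
        if small_enough and self.used_kb + io_kb <= self.capacity_kb:
            self.used_kb += io_kb
            return "SLC"
        return "TLC"  # too large for the policy, or the cache is full

    def fold(self):
        """Migrate cached data into TLC (e.g. at idle); returns KB rewritten."""
        moved, self.used_kb = self.used_kb, 0
        return moved
```

Every KB returned by `fold()` is written to the NAND a second time, which is the cost that comes with SLC caching.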

AnandTech 2015 SSD Test System
CPU: Intel Core i7-4770K running at 3.5GHz (Turbo & EIST enabled, C-states disabled)
Motherboard: ASUS Z97 Deluxe (BIOS 2205)
Chipset: Intel Z97
Chipset Drivers: Intel 10.0.24 + Intel RST 13.2.4.1000
Memory: Corsair Vengeance DDR3-1866 2x8GB (9-10-9-27 2T)
Graphics: Intel HD Graphics 4600
Graphics Drivers: 15.33.8.64.3345
Desktop Resolution: 1920 x 1080
OS: Windows 8.1 x64


Performance Consistency

We've been looking at performance consistency since the Intel SSD DC S3700 review in late 2012, and it has become one of the cornerstones of our SSD reviews. Back in the day, many SSD vendors focused only on high peak performance, which unfortunately came at the cost of sustained performance. In other words, the drives would push high IOPS in certain synthetic scenarios to provide nice marketing numbers, but as soon as you pushed the drive for more than a few minutes you could easily run into hiccups caused by poor performance consistency.

Once we started exploring IO consistency, nearly all SSD manufacturers made a move to improve consistency. For the 2015 suite, I haven't made any significant changes to the methodology we use to test IO consistency. The biggest change is the move from VDBench to Iometer 1.1.0 as the benchmarking software, and I've also extended the test from 2000 seconds to a full hour to ensure that all drives hit steady-state during the test.

For better readability, I now provide bar graphs: the first shows the average IOPS over the last 400 seconds, and the second shows the average IOPS divided by the standard deviation over the same period. Average IOPS provides a quick look into overall performance, but it can easily hide bad consistency, so looking at the standard deviation is necessary for a complete picture of consistency.
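Both bar-graph metrics can be computed from a per-second IOPS log. This is a sketch under two assumptions: one IOPS sample per second, and the population standard deviation (the choice between population and sample deviation isn't specified here). A higher ratio means more consistent performance.

```python
import statistics

def consistency_metrics(iops_per_second, window_s=400):
    """Average IOPS and average/standard deviation over the final window."""
    tail = iops_per_second[-window_s:]
    avg = statistics.fmean(tail)
    sd = statistics.pstdev(tail)  # assumption: population standard deviation
    return avg, (avg / sd if sd else float("inf"))

# Hypothetical log: a fast fresh-out-of-box phase, then a steady state that
# alternates between 4,900 and 5,100 IOPS.
log = [30000] * 3200 + [4900, 5100] * 200
print(consistency_metrics(log))  # (5000.0, 50.0)
```

Only the last 400 samples enter the calculation, so the initial burst doesn't inflate the steady-state average.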

I'm still providing the same scatter graphs too, of course. However, I decided to drop the logarithmic graphs and go linear-only, since logarithmic graphs aren't as accurate and can be hard to interpret for those who aren't familiar with them. I provide two graphs: one that covers the whole duration of the test and another that focuses on the last 400 seconds for a closer look at steady-state performance.

Steady-State 4KB Random Write Performance

The SMI 2256 manages to pull off decent average IOPS under a sustained random IO workload. It can't challenge the 850 EVO, which uses faster 3D V-NAND, but compared to the BX100 with the SM2246EN and 16nm MLC the drop in performance isn't massive. Better yet, the SMI 2256 is quite a bit faster than SanDisk's TLC drive, the Ultra II.

Steady-State 4KB Random Write Consistency

Unfortunately, the performance isn't very consistent, but then again neither is the SM2246EN's, as the BX100 is only marginally better.

SMI2256 500GB IO consistency, full test duration (Default / 25% Over-Provisioning)

The steady-state behavior of the SM2256 appears to be similar to that of its predecessor, the SM2246EN. The baseline performance is fairly low at roughly 2,000 IOPS, but bursts occur frequently and reach up to 25K IOPS, although each burst only lasts for about a second.

SMI2256 500GB IO consistency, last 400 seconds (Default / 25% Over-Provisioning)


AnandTech Storage Bench - The Destroyer

The Destroyer has been an essential part of our SSD test suite for nearly two years now. It was crafted to provide a benchmark for very IO intensive workloads, which is where you most often notice the difference between drives. It's not necessarily the most relevant test to an average user, but for anyone with a heavier IO workload The Destroyer should do a good job at characterizing performance. For full details of this test, please refer to this article.

AnandTech Storage Bench - The Destroyer (Data Rate)

Under a very intensive IO workload, the SMI 2256 is a mediocre performer. The average data rate is high, which suggests good performance at large IO sizes, but the average latency is considerably higher than what MLC drives and the 850 EVO have to offer. Then again, TLC is slower than MLC and 3D TLC, and that's a fact no controller can get around. Ultimately TLC drives, at least at first, will be aimed more towards typical client workloads anyway, which aren't really represented by The Destroyer.

AnandTech Storage Bench - The Destroyer (Latency)

AnandTech Storage Bench - The Destroyer (Latency)

The number of high latency IOs is substantially higher than with the SM2246EN, but at 3.5% it's not alarming, especially because the share of >100ms IOs is very moderate and no worse than that of MLC drives.

AnandTech Storage Bench - The Destroyer (Latency)

One of the inherent issues with TLC that nobody is talking about is increased power consumption, which our testing confirms. TLC requires a higher number of program pulses to reach the correct voltage, resulting in longer total program time as well as increased power draw. In addition, SLC caching means that all data essentially gets written twice, which obviously adds power draw even though SLC programming is much more power efficient.
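The pulse-count effect can be illustrated with incremental step pulse programming (ISPP), the program-and-verify loop commonly used to program NAND. This idealized Python sketch assumes each pulse raises the cell's threshold voltage by exactly one step; the coarse step standing in for SLC and the finer step standing in for TLC (needed to place eight narrow voltage states in a similar window) are hypothetical values, and real devices add per-state targets, step adaptation and program inhibit.

```python
def ispp_pulses(target_v, step_v, start_v=0.0):
    """Count program-verify iterations under idealized ISPP.

    Each program pulse is assumed to raise the threshold voltage by exactly
    step_v, and a verify read follows every pulse until the target is met.
    """
    vth, pulses = start_v, 0
    while vth < target_v:
        vth += step_v   # apply one program pulse
        pulses += 1     # each pulse is followed by one verify read
    return pulses

# Hypothetical step sizes: coarse for SLC, four times finer for TLC.
print(ispp_pulses(2.0, 0.5), ispp_pulses(2.0, 0.125))  # 4 16
```

With these made-up numbers the finer step needs four times as many pulse-and-verify iterations to reach the same voltage, and every iteration costs both time and current.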

AnandTech Storage Bench - The Destroyer (Power)



AnandTech Storage Bench - Heavy

While The Destroyer focuses on sustained and worst-case performance by hammering the drive with nearly 1TB worth of writes, the Heavy trace provides a more typical enthusiast and power user workload. By writing less to the drive, the Heavy trace doesn't drive the SSD into steady-state, and thus the trace gives us a good idea of peak performance combined with some basic garbage collection routines. For full details of the test, please refer to this article.

AnandTech Storage Bench - Heavy (Data Rate)

The performance isn't overwhelming in our Heavy trace either. Again, the average data rate is decent, but in terms of latency the SMI 2256 is worse than SanDisk's Ultra II, and by a fairly significant margin.

AnandTech Storage Bench - Heavy (Latency)

The number of >10ms IOs is alarming, unfortunately. MLC drives usually have less than half a percent, whereas the SMI 2256 is close to 6%.

AnandTech Storage Bench - Heavy (Latency)

Even though the performance isn't that high, the power consumption is among the highest we've tested. It's not substantially higher compared to competing MLC drives, but compared to e.g. the 850 EVO there is a tremendous difference. 

AnandTech Storage Bench - Heavy (Power)



AnandTech Storage Bench - Light

The Light trace is designed to be an accurate illustration of basic usage. It's basically a subset of the Heavy trace, but we've left out some workloads to reduce the writes and make it more read intensive in general. Please refer to this article for full details of the test.

AnandTech Storage Bench - Light (Data Rate)

In our Light suite, which is the most suitable test for the SMI 2256 and its target market, Silicon Motion manages to get closer to the competition. It still won't win any performance awards, but at least it's not significantly slower than the rest, as it was in our more IO intensive trace tests.

AnandTech Storage Bench - Light (Latency)

The number of high latency IOs remains relatively high, though, and it's an area that I hope is fixed before retail drives ship. 

AnandTech Storage Bench - Light (Latency)

Power consumption in light workloads appears to be better, although certainly not at SM2246EN levels.

AnandTech Storage Bench - Light (Power)



Random Read Performance

For full details of how we conduct our Iometer tests, please refer to this article.

Iometer - 4KB Random Read

Random read performance has never been Silicon Motion's biggest strength and it actually slightly decreases with the new SM2256 controller, although that's most likely due to the slower TLC NAND. 

Iometer - 4KB Random Read (Power)

Power also goes up quite significantly, but compared to other drives it's still relatively low, just not SM2246EN low.

SMI 2256 Reference Design 500GB

Queue depth scaling behavior seems to be similar to the SM2246EN, although at a higher power consumption. The scaling could be a little more aggressive, as the QD4 and QD8 scores in particular can't match the competition.

Random Write Performance

Iometer - 4KB Random Write

Random write performance isn't too good as the SMI 2256 turns out to be the slowest of the bunch. I wonder how big of a difference another manufacturer's TLC NAND would make, but we should find that out once the retail drives ship in the next couple of months. 

Iometer - 4KB Random Write (Power)

Power consumption is average, but since performance is low, power efficiency is pretty poor.

SMI 2256 Reference Design 500GB

It's obvious that TLC NAND limits the performance because there's practically no scaling at all with queue depth. The good news is that low QD performance is pretty decent. It's the high QD operations that suffer the most from TLC NAND, but those are very rare in typical client workloads.
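A flat queue depth curve is what Little's law predicts once the backend runs out of parallelism: throughput equals the IOs actually in service divided by the latency of each one. The per-IO latency and the number of `parallel_units` in this sketch are hypothetical values chosen to mimic a backend that saturates early, not measurements of the SM2256.

```python
def iops_at_qd(queue_depth, latency_us, parallel_units):
    """Little's law estimate: throughput = IOs in service / latency per IO.

    Only `parallel_units` IOs can be serviced at once; beyond that point a
    deeper queue just makes IOs wait, adding latency instead of IOPS.
    """
    in_flight = min(queue_depth, parallel_units)
    return in_flight * 1_000_000 / latency_us

# Hypothetical backend: 250us per write, only two writes serviced at a time.
for qd in (1, 2, 4, 8, 16, 32):
    print(qd, iops_at_qd(qd, 250, 2))
```

With these assumed numbers IOPS doubles from QD1 to QD2 and then stays flat at 8,000, however deep the queue gets.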



Sequential Read Performance

For full details of how we conduct our Iometer tests, please refer to this article.

Iometer - 128KB Sequential Read

Despite the somewhat low random read performance, sequential read is strong in the SM2256.

Iometer - 128KB Sequential Read (Power)

Power consumption is up from the SM2246EN, but still moderate. 

SMI 2256 Reference Design 500GB

Scaling is also perfect in the sense that the SM2256 already reaches its maximum, SATA 6Gbps-limited throughput at QD2.
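The SATA ceiling itself is simple arithmetic: SATA 6Gbps uses 8b/10b encoding, so only 8 of every 10 line bits carry data. This sketch computes that upper bound; protocol overhead pushes real-world throughput somewhat lower still.

```python
def sata_payload_mb_s(line_rate_gbps):
    """Upper bound on data throughput after 8b/10b line encoding.

    Every 8 data bits are sent as 10 line bits, so 80% of the line rate
    carries data. Protocol overhead is not modelled.
    """
    data_bits_per_s = line_rate_gbps * 1_000_000_000 * 8 / 10
    return data_bits_per_s / 8 / 1_000_000  # bits -> bytes -> MB

print(sata_payload_mb_s(6))  # 600.0
```

The SM2256's rated 524MB/s sequential read sits comfortably below that 600MB/s bound.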

Sequential Write Performance

Iometer - 128KB Sequential Write

Unfortunately, sequential write performance isn't as good. Low write performance is a known quantity of TLC NAND, but given the 500GB capacity I was expecting a little more because even the Ultra II is faster despite having only half the NAND.

Iometer - 128KB Sequential Write (Power)

Surprisingly, power consumption doesn't go up compared to the SM2246EN, but given the poor throughput, the efficiency isn't anywhere near as good.

SMI 2256 Reference Design 500GB


Mixed Random Read/Write Performance

For full details of how we conduct our Iometer tests, please refer to this article.

Iometer - Mixed 4KB Random Read/Write

Mixed random performance is slightly below the average, but not catastrophic, and power efficiency appears to be rather good.

Iometer - Mixed 4KB Random Read/Write (Power)

The SM2256 loses out in write performance: it is on par with the SM2246EN until the mix shifts to 20/80 and pure writes.

SMI 2256 Reference Design 500GB

 

Mixed Sequential Read/Write Performance

Iometer - Mixed 128KB Sequential Read/Write

The mixed sequential test seems to separate the planar TLC drives from the rest, because both the SM2256 and the Ultra II are at the bottom. The SM2256 does deliver higher performance and better power efficiency than the Ultra II, but it can't keep up with the MLC drives.

Iometer - Mixed 128KB Sequential Read/Write (Power)

Again it's write performance where the SM2256 loses to its MLC competitors: instead of following a bathtub-like curve, performance pretty much just decreases as the share of writes increases and evens out after 40/60.

SMI 2256 Reference Design 500GB


Final Words

To address the elephant in the room first, I'm not overly satisfied with the performance. The SM2256 is slower than the other TLC SSDs on the market (850 EVO & Ultra II), and not by an insignificant margin. Small random IOs in particular are rather slow, which impacts overall performance because many IOs in client workloads are 4KB and random in nature. For very light IO workloads that obviously won't be a major issue, but anything more IO intensive (like virtualization) could be severely impacted by the SM2256's high latency. To be frank, I never expected the SM2256 to perform like the SM2246EN, because no controller can get around the performance limitations of TLC NAND, but given how well the SM2246EN performs, I was expecting the SM2256 to be faster than it is.

Aside from performance, the other problem with the SM2256 is its power consumption. The part I absolutely love about the SM2246EN is its extremely low power consumption, but the SM2256 practically doubles the power draw, which makes it one of the least efficient drives we have tested. Again, expecting the SM2256 to be as efficient as the SM2246EN wouldn't be fair, because TLC inherently has higher power consumption as it needs a higher number of program-verify iterations, but even then doubled power consumption is a bit more than I was expecting.

I do wonder how big of a difference the NAND makes, though. As I mentioned on page one, we will likely never see this configuration on the retail market because Samsung doesn't sell its TLC to third parties in large quantities, so I'm not sure whether Silicon Motion has spent a ton of resources on optimizing the firmware for NAND that won't be used outside of engineering samples. I certainly hope that the SM2256 performs better with NAND from other vendors, because as it stands the performance is quite underwhelming against the competition, though nothing that better optimization couldn't fix. After all, the sample I have doesn't have the final retail firmware on it, so we will have to wait for shipping drives before drawing the final verdict on the SM2256.

In any case, it will all boil down to pricing. If Silicon Motion's OEM partners can drive prices down with the SM2256, I will be totally fine with the performance, because the SM2256 is still more than fine for basic usage. TLC SSD pricing was actually one of the things I was very vocal about at Computex, because OEMs can't price their TLC drives similarly to their MLC ones and expect them to sell well. TLC isn't as good as MLC, and that's a fact nobody can deny. Especially after Samsung's issues with TLC, the market has become more skeptical about TLC in general, so saving a few bucks is no longer enough for educated buyers to choose TLC over MLC. I think the price difference has to be on the order of 10% or so to be worth the lower performance and possible long-term reliability risks that TLC brings. I do believe that the SM2256 is a vehicle capable of delivering such cost savings, but for now we will just need to wait and see what happens.
