Original Link: https://www.anandtech.com/show/16942/usb-32-gen-2x2-portable-ssds-go-native-the-silicon-motion-sm2320-ufd-controller-preview



The external storage market has experienced rapid growth over the last few years, particularly in the retail consumer segment. The demand has been fueled by increased amounts of user-generated multimedia content and high-resolution artwork / assets for games that are better off being installed on external drives. The growth has mainly been in bus-powered flash-based storage devices.

Thunderbolt SSDs are at the top in terms of both performance and price, but the last few years have seen various high-end portable SSDs with a USB interface. The USB 3.2 Gen 2x2 (20Gbps) ecosystem has been slowly gaining traction. Many device solutions have turned up in the retail market over the last couple of years – all of them have been based on ASMedia's ASM2364 bridge chip, with a discrete PCIe 3.0 x4 NVMe SSD downstream of the bridge. While such configurations deliver on the 20Gbps promise, they are not particularly power-efficient.

The power efficiency aspect changed earlier this month with the introduction of the Kingston XS2000 portable SSD. Based on Silicon Motion's SM232x family of USB Flash Drive (UFD) controllers, the product family offers full Gen 2x2 performance at a fraction of the power consumed by the Gen 2x2 SSD solutions currently in retail.

Looking to show off their new controllers, Silicon Motion sent across a bare reference board on which the Kingston XS2000 is based. The firmware used in the Kingston XS2000 and in the reference design is pretty much identical, with the only difference between the two being the absence of the casing and thermal solution on the reference board. The review below presents a detailed evaluation report of the SM2320 reference design. Except for the analysis of the thermal design aspects / temperature profile, it also tracks what consumers can expect from the Kingston XS2000 portable SSD.

Introduction

External bus-powered storage devices have grown in both storage capacity and speed over the last decade. Thanks to rapid advancements in flash technology (including the advent of 3D NAND and NVMe) as well as faster host interfaces (such as Thunderbolt 3 and USB 3.2 Gen 2x2), we now have palm-sized flash-based storage devices capable of delivering 2GBps+ speeds. Traditionally, these portable drives have fallen into one of the six categories below, depending on the performance profile and the components used.

  • 2.5GBps+ class: Thunderbolt SSDs with PCIe 3.0 x4 NVMe drives
  • 2GBps+ class: USB 3.2 Gen 2x2 SSDs with PCIe 3.0 x4 NVMe drives
  • 1GBps+ class: USB 3.2 Gen 2 SSDs with PCIe 3.0 (x4 or x2) NVMe drives
  • 500MBps+ class: USB 3.2 Gen 2 SSDs with SATA drives
  • 400MBps+ class: USB 3.2 Gen 1 SSDs with SATA drives
  • Sub-400MBps class: USB 3.2 Gen 1 flash drives with direct flash-to-USB controllers
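
The USB-based class boundaries above track the usable bandwidth of each link generation. As a rough sketch (the function and dictionary names are illustrative and not part of any tool used in this review), the payload ceiling can be estimated from the signaling rate and the line encoding: USB 3.2 Gen 1 uses 8b/10b encoding, while Gen 2 and Gen 2x2 use 128b/132b. Protocol overhead is ignored here, so real-world transfers land below these figures.

```python
# Rough payload ceiling of a USB link after line encoding only.
# Protocol overhead (packet framing, flow control) is ignored, so
# real transfers land below these figures.

USB_LINKS = {
    # name: (signaling rate in Gbps, payload bits, encoded bits)
    "USB 3.2 Gen 1":   (5.0, 8, 10),
    "USB 3.2 Gen 2":   (10.0, 128, 132),
    "USB 3.2 Gen 2x2": (20.0, 128, 132),
}

def payload_ceiling_mbps(name: str) -> float:
    """Return the encoded-payload ceiling of the named link in MBps."""
    gbps, payload_bits, encoded_bits = USB_LINKS[name]
    bits_per_second = gbps * 1e9 * payload_bits / encoded_bits
    return bits_per_second / 8 / 1e6

for name in USB_LINKS:
    print(f"{name}: ~{payload_ceiling_mbps(name):.0f} MBps")
```

Under these assumptions, the Gen 2x2 ceiling sits a little above 2.4GBps, which is why the fastest USB drives are described as "2GBps+ class" rather than 2.5GBps.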

At the 2021 CES, Phison introduced the U17 (USB 3.2 Gen 2) and U18 (USB 3.2 Gen 2x2) UFD controllers, which added additional categories in the above list - a sub-1GBps performance class using direct flash-to-USB 3.2 Gen 2 controllers and a sub-2GBps performance class using direct flash-to-USB 3.2 Gen 2x2 controllers. The Crucial X6 Portable SSD lineup was upgraded earlier this year to utilize the Phison U17 controller, but the U18 controller doesn't seem to have hit retail yet.

Silicon Motion, on the other hand, was late to the UFD party in terms of putting out their press release. However, they managed to get their design wins shipping along with the public announcement of their SM2320 and SM2321 controllers. The SM2320 USB 3.2 Gen 2x2 controller actually promises 2GBps+ speeds, while the SM2321 USB 3.2 Gen 2 controller enables 1GBps+ flash drives. These numbers seem to offer more than what Phison promises in the U17 and U18, though real-world testing is essential to compare the two controllers beyond the claimed marketing numbers.

The SM2320-based portable SSDs (and the U18-based ones, when they eventually appear for sale) represent the retail market's first look at a non-ASMedia solution in the USB 3.2 Gen 2x2 device market. The advantages of a native UFD controller over a combination of a bridge and NVMe controller are self-evident:

  • Reduced BOM cost, leading to lower retail price for the consumer products
  • Reduced power consumption
  • More space on the PCB to integrate additional flash packages for extra performance
  • Reduced space requirements leading to more compact UFDs
  • Integrated security features preventing hardware-based security attacks possible in a bridge chip / NVMe SSD solution

Silicon Motion sent across their 1TB SM2320 reference design for evaluation around the same time that Kingston started sampling their XS2000 portable SSD (based on the same board). While we were not on Kingston's sampling list for the end product, the SM2320 reference design gives us quite a bit of insight into the XS2000's operations.

The SM2320 board measures 62mm x 25.4mm x 5mm, and weighs a puny 7g. These are without the casing / thermal solution. The board is double-sided and contains four flash packages. A USB 3.2 Gen 2x2 Type-C cable was supplied along with the board.

 

The SM2320 reference board represents the first native UFD USB 3.2 Gen 2x2 solution that we have evaluated. For comparison purposes, we have a couple of leading edge bridge-based 1TB solutions - the WD_BLACK P50, and the Seagate FireCuda Gaming SSD. The DIY Silverstone MS12 is also included. As a representation of the native UFD controller scene for high-performance flash drives, we also have the Crucial X6 4TB Portable SSD based on the Phison U17. Despite its USB 3.2 Gen 2 (10Gbps) performance rating, and its quadrupled capacity point, we believe it offers insights into the power efficiency possible with non-dual chip solutions.

CrystalDiskInfo provides a quick overview of the capabilities of the internal storage device. Since the program handles each bridge chip differently, and the SM2320 is quite new, many of the entries are marked as vendor-specific, and some of the capabilities (such as the interface) are deciphered incorrectly. The temperature monitoring worked well, though.

Comparative Direct-Attached Storage Devices Configuration
Aspect | Silicon Motion SM2320XT Reference (Kingston XS2000) 1TB | SilverStone Tek MS12
Downstream Port | Native Flash | 1x PCIe 3.0 x4 (M.2 NVMe)
Upstream Port | USB 3.2 Gen 2x2 Type-C | USB 3.2 Gen 2x2 Type-C
Bridge Chip | Silicon Motion SM2320XT | ASMedia ASM2364
Power | Bus Powered | Bus Powered
Use Case | Low-power 2GBps-class, compact portable SSD reference design | M.2 2242 / 2260 / 2280 NVMe SSD enclosure; DIY 2GBps-class, compact, and sturdy portable SSD with a USB flash drive-like form-factor
Physical Dimensions | 62 mm x 25.4 mm x 5 mm (without casing) | 107 mm x 34 mm x 16 mm
Weight | 7 grams (without cable and casing) | 53 grams (without cable / SSD ; with thermal pads)
Cable | N/A | 30 cm USB 3.2 Gen 2x2 Type-C to Type-C
S.M.A.R.T Passthrough | Yes | Yes
UASP Support | Yes | Yes
TRIM Passthrough | Yes | Yes
Hardware Encryption | Proprietary | SSD-dependent
Evaluated Storage | Micron 96L 3D TLC | SK hynix P31 PCIe 3.0 x4 NVMe SSD (SK hynix 128L 3D TLC)
Price | USD 160 | USD 70
Review Link | Silicon Motion SM2320XT Reference (Kingston XS2000) 1TB Review | SilverStone Tek MS12 Review

Prior to looking at the benchmark numbers, power consumption, and thermal solution effectiveness, a description of the testbed setup and evaluation methodology is provided.

Testbed Setup and Evaluation Methodology

Direct-attached storage devices (including SD Express cards) are evaluated using the Quartz Canyon NUC (essentially, the Xeon / ECC version of the Ghost Canyon NUC) configured with 2x 16GB DDR4-2667 ECC SODIMMs and a PCIe 3.0 x4 NVMe SSD - the IM2P33E8 1TB from ADATA.

The most attractive aspect of the Quartz Canyon NUC is the presence of two PCIe slots (electrically, x16 and x4) for add-in cards. In the absence of a discrete GPU - for which there is no need in a DAS testbed - both slots are available. In fact, we also added a spare SanDisk Extreme PRO M.2 NVMe SSD to the CPU direct-attached M.2 22110 slot in the baseboard in order to avoid DMI bottlenecks when evaluating Thunderbolt 3 devices. This still allows for two add-in cards operating at x8 (x16 electrical) and x4 (x4 electrical). Since the Quartz Canyon NUC doesn't have a native USB 3.2 Gen 2x2 port, Silverstone's SST-ECU06 add-in card was installed in the x4 slot. All non-Thunderbolt devices are tested using the Type-C port enabled by the SST-ECU06.

The specifications of the testbed are summarized in the table below:

The 2021 AnandTech DAS Testbed Configuration
System | Intel Quartz Canyon NUC9vXQNX
CPU | Intel Xeon E-2286M
Memory | ADATA Industrial AD4B3200716G22; 32 GB (2x 16GB); DDR4-3200 ECC @ 22-22-22-52
OS Drive | ADATA Industrial IM2P33E8 NVMe 1TB
Secondary Drive | SanDisk Extreme PRO M.2 NVMe 3D SSD 1TB
Add-on Card | SilverStone Tek SST-ECU06 USB 3.2 Gen 2x2 Type-C Host
OS | Windows 10 Enterprise x64 (21H1)
Thanks to ADATA, Intel, and SilverStone Tek for the build components

The testbed hardware is only one segment of the evaluation. Over the last few years, the typical direct-attached storage workloads for memory cards have also evolved. High bit-rate 4K videos at 60fps have become quite common, and 8K videos are starting to make an appearance. Game install sizes have also grown steadily even in portable game consoles, thanks to high resolution textures and artwork. Keeping these in mind, our evaluation scheme for portable SSDs and UFDs involves multiple workloads which are described in detail in the corresponding sections.

  • Synthetic workloads using CrystalDiskMark and ATTO
  • Real-world access traces using PCMark 10's storage benchmark
  • Custom robocopy workloads reflective of typical DAS usage
  • Sequential write stress test

The following sections provide a comprehensive overview of the performance of the SM2320 reference design. Prior to the concluding remarks, we also present some observations on the drive's power efficiency.



Synthetic Benchmarks - ATTO and CrystalDiskMark

Benchmarks such as ATTO and CrystalDiskMark help provide a quick look at the performance of the direct-attached storage device. The results translate to the instantaneous performance numbers that consumers can expect for specific workloads, but do not account for changes in behavior when the unit is subject to long-term conditioning and/or thermal throttling. Yet another use of these synthetic benchmarks is the ability to gather information regarding support for specific storage device features that affect performance.

Silicon Motion claims read and write speeds of 2100 and 2000 MBps respectively for specific access traces. The ATTO benchmarks provided below deliver close results. ATTO benchmarking is restricted to a single configuration in terms of queue depth, and is only representative of a small sub-set of real-world workloads. It does allow the visualization of change in transfer rates as the I/O size changes, with optimal performance being reached around 512 KB for writes and 4MB for reads with a queue depth of 4.

CrystalDiskMark Benchmarks

For sequential accesses, the read speeds of the SM2320 reference design match / surpass the bridge-based solutions, while the write speeds are slightly behind (around 1800 MBps against 2100 MBps). In the random access workloads, the DRAM-less nature of the UFD controller results in performance loss for high-queue depth operations. At low queue depths, the write performance of the UFD controller matches the bridge-based solutions with high-end NVMe SSDs.



AnandTech DAS Suite - Benchmarking for Performance Consistency

Our testing methodology for storage bridges / direct-attached storage units takes into consideration the usual use-case for such devices. The most common usage scenario is transfer of large amounts of photos and videos to and from the unit. Other usage scenarios include the use of the unit as a download or install location for games and importing files directly from it into a multimedia editing program such as Adobe Photoshop. Some users may even opt to boot an OS off an external storage device.

The AnandTech DAS Suite tackles the first use-case. The evaluation involves processing five different workloads:

  • AV: Multimedia content with audio and video files totalling 24.03 GB over 1263 files in 109 sub-folders
  • Home: Photos and document files totalling 18.86 GB over 7627 files in 382 sub-folders
  • BR: Blu-ray folder structure totalling 23.09 GB over 111 files in 10 sub-folders
  • ISOs: OS installation files (ISOs) totalling 28.61 GB over 4 files in one folder
  • Disk-to-Disk: Addition of 223.32 GB spread over 171 files in 29 sub-folders to the above four workloads (total of 317.91 GB over 9176 files in 535 sub-folders)
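
As a quick sanity check on the figures above, the sketch below adds up the size and file counts of the four smaller workloads and the extra disk-to-disk data. The variable and function names are our own; only the numbers come from the list.

```python
# (size in GB, file count) for the four smaller workloads, as listed above.
WORKLOADS = {
    "AV":   (24.03, 1263),
    "Home": (18.86, 7627),
    "BR":   (23.09, 111),
    "ISOs": (28.61, 4),
}
# Data added on top of the four sets for the disk-to-disk workload.
DISK_TO_DISK_EXTRA = (223.32, 171)

def disk_to_disk_totals():
    """Return (total size in GB, total file count) for the disk-to-disk set."""
    size_gb = sum(size for size, _ in WORKLOADS.values()) + DISK_TO_DISK_EXTRA[0]
    files = sum(count for _, count in WORKLOADS.values()) + DISK_TO_DISK_EXTRA[1]
    return round(size_gb, 2), files

print(disk_to_disk_totals())
```

The totals agree with the 317.91 GB and 9176 files quoted for the disk-to-disk workload.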

Except for the 'Disk-to-Disk' workload, each data set is first placed in a 29GB RAM drive, and a robocopy command is issued to transfer it to the external storage unit (formatted in exFAT for flash-based units like the SM2320 reference design we are evaluating here).

robocopy /NP /MIR /NFL /J /NDL /MT:32 $SRC_PATH $DEST_PATH
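
For readers unfamiliar with robocopy, the sketch below assembles the same command in Python and annotates each switch. The helper name and the example paths are hypothetical; the switch descriptions paraphrase Microsoft's robocopy documentation.

```python
# Switches used in the robocopy command above, with their meaning.
ROBOCOPY_SWITCHES = {
    "/NP":    "do not print per-file progress percentages",
    "/MIR":   "mirror the source tree (copy subfolders, purge extras)",
    "/NFL":   "do not log file names",
    "/J":     "use unbuffered I/O (recommended for large files)",
    "/NDL":   "do not log directory names",
    "/MT:32": "copy with 32 threads",
}

def build_robocopy_command(src: str, dest: str) -> list[str]:
    """Return the argument list in the same order as the command above."""
    return ["robocopy", *ROBOCOPY_SWITCHES, src, dest]

# Hypothetical paths: R: as the RAM drive, E: as the external drive.
print(" ".join(build_robocopy_command(r"R:\AV", r"E:\AV")))
```

The logging switches keep console output from slowing down the copy, while unbuffered I/O and multi-threading help keep the external drive busy.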

Upon completion of the transfer (write test), the contents from the unit are read back into the RAM drive (read test) after a 10 second idling interval. This process is repeated three times for each workload. Read and write speeds, as well as the time taken to complete each pass are recorded. Whenever possible, the temperature of the external storage device is recorded during the idling intervals. Bandwidth for each data set is computed as the average of all three passes.
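
The averaging step can be sketched as follows. This is a minimal illustration rather than the actual test harness: the function names are ours, the timings are made up, and 1 GB is taken as 1000 MB.

```python
def pass_bandwidth_mbps(size_gb: float, seconds: float) -> float:
    """Bandwidth of one pass in MBps, taking 1 GB = 1000 MB (assumption)."""
    return size_gb * 1000 / seconds

def workload_bandwidth_mbps(size_gb: float, pass_seconds: list[float]) -> float:
    """Average of the per-pass bandwidths for one workload."""
    rates = [pass_bandwidth_mbps(size_gb, t) for t in pass_seconds]
    return sum(rates) / len(rates)

# Made-up timings for three write passes of the 24.03 GB AV set.
example_times = [15.0, 16.0, 15.5]
print(f"{workload_bandwidth_mbps(24.03, example_times):.0f} MBps")
```

Averaging the per-pass rates, rather than dividing the total data by the total time, gives each pass equal weight regardless of how long it took.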

The 'Disk-to-Disk' workload involves a similar process, but with one iteration only. The data is copied to the external unit from the CPU-attached NVMe drive, and then copied back to the internal drive. It does include a larger amount of continuous data transfer in a single direction, as data that doesn't fit in the RAM drive is also part of the workload set.

AnandTech DAS Suite - Performance Consistency

The first three sets of writes and reads correspond to the AV suite. A small gap (for the transfer of the video suite from the internal SSD to the RAM drive) is followed by three sets for the Home suite. Another small RAM-drive transfer gap is followed by three sets for the Blu-ray folder. This is followed up with the large-sized ISO files set. Finally, we have the single disk-to-disk transfer set.

Workloads that are within the SLC cache exhibit good performance consistency. It is only the disk-to-disk set with more than 300GB of continuous data writes that pushes down the instantaneous bandwidth numbers to the 100MBps range. Temperatures are satisfactory, given that the reference design is a bare board with no product-level thermal solution in place. It is likely that a design like that of the Kingston XS2000 should be easily able to bring down the 86C peak to something a lot more comfortable for the flash and the controller.



PCMark 10 Storage Bench - Real-World Access Traces

There are a number of storage benchmarks that can subject a device to artificial access traces by varying the mix of reads and writes, the access block sizes, and the queue depth / number of outstanding data requests. We saw results from two popular ones - ATTO, and CrystalDiskMark - in a previous section. More serious benchmarks, however, actually replicate access traces from real-world workloads to determine the suitability of a particular device for a particular workload. Real-world access traces may be used for simulating the behavior of computing activities that are limited by storage performance. Examples include booting an operating system or loading a particular game from the disk.

PCMark 10's storage bench (introduced in v2.1.2153) includes four storage benchmarks that use relevant real-world traces from popular applications and common tasks to fully test the performance of the latest modern drives:

  • The Full System Drive Benchmark uses a wide-ranging set of real-world traces from popular applications and common tasks to fully test the performance of the fastest modern drives. It involves a total of 204 GB of write traffic.
  • The Quick System Drive Benchmark is a shorter test with a smaller set of less demanding real-world traces. It subjects the device to 23 GB of writes.
  • The Data Drive Benchmark is designed to test drives that are used for storing files rather than applications. These typically include NAS drives, USB sticks, memory cards, and other external storage devices. The device is subjected to 15 GB of writes.
  • The Drive Performance Consistency Test is a long-running and extremely demanding test with a heavy, continuous load for expert users. In-depth reporting shows how the performance of the drive varies under different conditions. This writes more than 23 TB of data to the drive.

Despite the data drive benchmark appearing most suitable for testing direct-attached storage, we opt to run the full system drive benchmark as part of our evaluation flow. Many of us use portable flash drives as boot drives and storage for Steam games. These types of use-cases are addressed only in the full system drive benchmark.

The Full System Drive Benchmark comprises 23 different traces. For the purpose of presenting results, we classify them under five different categories:

  • Boot: Replay of storage access trace recorded while booting Windows 10
  • Creative: Replay of storage access traces recorded during the start up and usage of Adobe applications such as Acrobat, After Effects, Illustrator, Premiere Pro, Lightroom, and Photoshop.
  • Office: Replay of storage access traces recorded during the usage of Microsoft Office applications such as Excel and PowerPoint.
  • Gaming: Replay of storage access traces recorded during the start up of games such as Battlefield V, Call of Duty Black Ops 4, and Overwatch.
  • File Transfers: Replay of storage access traces (Write-Only, Read-Write, and Read-Only) recorded during the transfer of data such as ISOs and photographs.

PCMark 10 also generates an overall score, bandwidth, and average latency number for quick comparison of different drives. The sub-sections in the rest of the page reference the access traces specified in the PCMark 10 Technical Guide.

Booting Windows 10

The read-write bandwidth recorded for each drive in the boot access trace is presented below.

Full System Drive Benchmark Bandwidth (MBps)

Overall, the SM2320 reference design performs similar to a DRAM-less PCIe 3.0 x4 NVMe SSD behind a USB 3.2 Gen 2x2 bridge. This puts it in the middle of the pack in terms of PCMark 10 Storage Bench scores when compared with the other portable SSDs under consideration in this review.



Miscellaneous Aspects and Concluding Remarks

The performance of the storage bridges / drives in various real-world access traces as well as synthetic workloads was brought out in the preceding sections. We also looked at the performance consistency for these cases. Power users may also be interested in performance consistency under worst-case conditions, as well as drive power consumption. The latter is also important when used with battery powered devices such as notebooks and smartphones. Pricing is also an important aspect. We analyze each of these in detail below.

Worst-Case Performance Consistency

Flash-based storage devices tend to slow down in unpredictable ways when subject to a large number of small-sized random writes. Many benchmarks use that scheme to pre-condition devices prior to the actual testing in order to get a worst-case representative number. Fortunately, such workloads are uncommon for direct-attached storage devices, where workloads are largely sequential in nature. Use of SLC caching as well as firmware caps to prevent overheating may cause a drop in write speeds when a flash-based DAS device is subject to sustained sequential writes.

Our Sequential Writes Performance Consistency Test configures the device as a raw physical disk (after deleting configured volumes). A fio workload is set up to write sequential data to the raw drive with a block size of 128K and iodepth of 32 to cover 90% of the drive capacity. The internal temperature is recorded at either end of the workload, while the instantaneous write data rate and cumulative total write data amount are recorded at 1-second intervals.
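
A fio invocation along these lines might look like the sketch below. It is not the exact job used for this review: only the 128K block size, the iodepth of 32, the 90% coverage and the 1-second logging interval come from the description above. The device path, the job name, the log prefix, and the choice of the direct, ioengine and bandwidth-log options are assumptions made for illustration.

```python
def build_fio_command(raw_device: str, log_prefix: str = "seqwrite") -> list[str]:
    """Assemble a fio command line for a sequential-write consistency run."""
    return [
        "fio",
        "--name=seq-write-consistency",   # hypothetical job name
        f"--filename={raw_device}",       # raw physical disk, no volumes
        "--rw=write",                     # sequential writes
        "--bs=128k",                      # block size of 128K
        "--iodepth=32",                   # 32 outstanding I/Os
        "--size=90%",                     # cover 90% of the device
        "--direct=1",                     # assumption: bypass the OS cache
        "--ioengine=windowsaio",          # assumption: async engine on Windows
        f"--write_bw_log={log_prefix}",   # assumption: log write bandwidth
        "--log_avg_msec=1000",            # average log entries over 1 second
    ]

# Hypothetical device path for the drive under test on a Windows host.
print(" ".join(build_fio_command(r"\\.\PhysicalDrive1")))
```

Writing to the raw device removes file system effects, so any drop in the logged bandwidth can be attributed to the drive itself (SLC cache exhaustion or thermal throttling).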

CrystalDiskMark Workloads - Power Consumption

The above graph backs up the power-efficiency claims of Silicon Motion with respect to the SM2320. A peak of 3.84W is much lower than the 5.88W peak obtained with the most power-efficient SK hynix P31 SSD behind the ASMedia ASM2364 bridge chip. The idle power (0.7W) matches the other native UFD solution in the Phison U17 (though that chip is a USB 3.2 Gen 2 solution, unlike the SM2320's Gen 2x2). However, this power number is higher than the best bridge solution's 0.49W. Otherwise, for long idle periods, the reference design will go to a completely idle state (sipping as low as 9mW) after around 20 minutes.
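
To put the quoted figures in relative terms, the short calculation below compares the SM2320 numbers against the best bridge-based numbers cited above. The variable and function names are ours; the wattages are the ones from the paragraph.

```python
def percent_lower(value: float, reference: float) -> float:
    """How much lower `value` is than `reference`, in percent."""
    return (reference - value) / reference * 100

peak_sm2320_w = 3.84        # SM2320 reference design, peak
peak_best_bridge_w = 5.88   # SK hynix P31 behind the ASM2364, peak
idle_sm2320_w = 0.70        # SM2320 reference design, idle
idle_best_bridge_w = 0.49   # best bridge solution, idle

print(f"Peak: {percent_lower(peak_sm2320_w, peak_best_bridge_w):.1f}% lower")
print(f"Idle: {idle_sm2320_w - idle_best_bridge_w:.2f} W higher")
```

In other words, the native controller draws roughly a third less power at peak, at the cost of about 0.2W more when idling before the deep idle state kicks in.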

Final Words

Silicon Motion's SM2320 reference design allowed us to get an idea of the capabilities of high-performance native USB flash drive (UFD) controllers in a USB 3.2 Gen 2x2 configuration. The company also has a USB 3.2 Gen 2 solution in the SM2321, which will be looked at in a retail product shortly.

Performance-wise, the product ends up behaving similarly to a DRAM-less PCIe 3.0 x4 NVMe SSD behind a USB 3.2 Gen 2x2 bridge. The lack of DRAM for flash management may affect the performance for certain workloads, but that is more than made up for by the liberal amount of SLC cache – more than 10% of the drive's capacity. Write-intensive workloads with a larger active set can get negatively affected, but most consumers on the lookout for a UFD controller-based portable SSD are unlikely to put their drive through such access traces. Retail products based on the SM2320 are likely to be much smaller than dual-chip (bridge + controller) solutions.
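
The trade-off can be illustrated with a simple two-phase model of a long sequential write: data lands at the cached speed until the SLC cache is full, and at a lower speed after that. This is a deliberately crude sketch, not a description of the controller's actual behavior, since it ignores cache recovery during idle periods and thermal effects. The example inputs are illustrative assumptions: a 100 GB cache (10% of 1TB, whereas the text says "more than 10%"), 1800 MBps in-cache writes (from the CrystalDiskMark section), and 100 MBps after the cache is exhausted (the instantaneous low seen in the disk-to-disk test, treated here as if it were sustained).

```python
def sustained_write_seconds(total_gb: float, cache_gb: float,
                            cached_mbps: float, post_cache_mbps: float) -> float:
    """Estimate the time for a continuous write under a two-phase model."""
    cached_part = min(total_gb, cache_gb)          # absorbed by the SLC cache
    remainder = max(total_gb - cache_gb, 0.0)      # written after cache is full
    return cached_part * 1000 / cached_mbps + remainder * 1000 / post_cache_mbps

# Illustrative inputs only; see the assumptions stated above.
for size_gb in (50, 100, 300):
    seconds = sustained_write_seconds(size_gb, 100, 1800, 100)
    print(f"{size_gb} GB: ~{seconds / 60:.1f} minutes")
```

Under this model, transfers that fit in the cache complete in under a minute, while anything well beyond it is dominated by the post-cache phase.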

The success of any product depends on the pricing. While we do not have any idea of what Silicon Motion charges its customers for the SM2320, we do have a product already in retail using the solution - the Kingston XS2000.

At $160, it undercuts the WD_BLACK P50 and Seagate FireCuda Gaming SSD by $50. Even a DIY solution with the SK hynix P31 and the SilverStone MS12 ends up at $185. While some of these costlier solutions offer better performance for some workloads, the XS2000 wins on the power efficiency front. Silicon Motion and Kingston seem to have the right pricing strategy to ensure that USB 3.2 Gen 2x2 continues to gain traction in the market.

 
