Original Link: https://www.anandtech.com/show/4712/the-crucial-m4-ssd-update-faster-with-fw0009
The Crucial m4 SSD Update: Faster with FW0009
by Anand Lal Shimpi on August 31, 2011 12:56 AM EST
An Update on SandForce
Before we get to the topic at hand today I wanted to give a brief update on SandForce. In our last SSD article I mentioned that I'd been able to replicate the infamous SF-2281 BSOD bug. In my testing the issue never appears as a full-on BSOD; instead I either see periods of very high IO latency (multiple seconds) or a hard lock requiring a reset. The problem doesn't appear with any regularity on most of my testbeds, however I can get one specific test system (the ASUS P8Z68V-Pro I mentioned in the earlier article) running the right workload to exhibit the issue at least once in any 72 hour period. I don't know whether this issue is related to the BSOD bug that many users complain about, but I do know that the behavior isn't desirable and doesn't appear to impact other comparable SSDs. At the same time, the issue doesn't appear to be present, or at least not as severe, on all platforms. Since the last article I've deployed two more drives in separate systems, neither of which has come back with any serious issues yet.
I still believe whatever issue plagues these drives to be limited in scope, but without a way of predicting whether or not the problem will occur it's still a thorn in SandForce's side. Contrary to what you may have heard, I believe this issue impacts all SF-2xxx based drives and I've reproduced it on drives from multiple vendors.
SandForce is going to be flying down a representative to take a look at my test system to help determine the root cause of the issue.
The Crucial m4 Update
When we first reviewed Crucial's m4 SSD we came away with mixed feelings on the drive. In some cases it was the first or second fastest drive we'd reviewed, while in others it struggled to outperform last year's C300. While Crucial has been diligent in updating the m4 to fix compatibility issues, we haven't seen any of the performance increases Crucial promised at the drive's introduction.
That all changed last week as Crucial posted the latest 0009 firmware for the m4 and Micron C400. The firmware updates drives that shipped with the original 0001 firmware as well as those with the previous 0002 version. Crucial supplies a bootable ISO that you can either burn to a CD or image to a USB drive.
The firmware update process went smoothly for me. I tested on an Intel DH67BL motherboard with the SATA ports set to AHCI. I used a USB stick imaged with the ISO via UNetbootin.
Crucial's release notes indicate improved performance as a major feature of FW0009:
Release Date: 08/25/2011
Change Log:
- Changes made in version 0002 (m4 can be updated to revision 0009 directly from either revision 0001 or 0002)
- Improved throughput performance.
- Increase in PCMark Vantage benchmark score, resulting in improved user experience in most operating systems.
- Improved write latency for better performance under heavy write workloads.
- Faster boot up times.
- Improved compatibility with latest chipsets.
- Compensation for SATA speed negotiation issues between some SATA-II chipsets and the SATA-III device.
- Improvement for intermittent failures in cold boot up related to some specific host systems.
The Test
CPU: Intel Core i7 2600K running at 3.4GHz (Turbo & EIST Disabled) - for AT SB 2011, AS SSD & ATTO
Motherboard: Intel DH67BL
Chipset: Intel H67
Chipset Drivers: Intel 9.1.1.1015 + Intel RST 10.2
Memory: Corsair Vengeance DDR3-1333 2 x 2GB (7-7-7-20)
Video Card: eVGA GeForce GTX 285
Video Drivers: NVIDIA ForceWare 190.38 64-bit
Desktop Resolution: 1920 x 1200
OS: Windows 7 x64
Random Read/Write Speed
The four corners of SSD performance are as follows: random read, random write, sequential read and sequential write speed. Random accesses are generally small in size, while sequential accesses tend to be larger and thus we have the four Iometer tests we use in all of our reviews.
Our first test writes 4KB in a completely random pattern over an 8GB space of the drive to simulate the sort of random access that you'd see on an OS drive (even this is more stressful than a normal desktop user would see). I perform three concurrent IOs and run the test for 3 minutes. The results reported are in average MB/s over the entire time. We use both standard pseudo randomly generated data for each write as well as fully random data to show you both the maximum and minimum performance offered by SandForce based drives in these tests. The average performance of SF drives will likely be somewhere in between the two values for each drive you see in the graphs. For an understanding of why this matters, read our original SandForce article.
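If you want to approximate this sort of test yourself, here's a minimal sketch of the idea in Python. To be clear, it isn't our Iometer setup: it reads instead of writing, runs at a queue depth of 1 rather than 3, and targets a hypothetical pre-filled scratch file (scratch.bin) rather than a raw device, so OS caching will inflate the numbers.

```python
import os, random, time

TEST_PATH = "scratch.bin"   # hypothetical scratch file, assumed pre-filled to >= 8GB
SPAN = 8 * 1024**3          # 8GB test span, as in the article
BLOCK = 4096                # 4KB transfer size
DURATION = 180              # 3 minutes, matching the test length

fd = os.open(TEST_PATH, os.O_RDONLY)
done = 0
start = time.monotonic()
while time.monotonic() - start < DURATION:
    # pick a 4KB-aligned offset uniformly at random within the span
    offset = random.randrange(SPAN // BLOCK) * BLOCK
    os.pread(fd, BLOCK, offset)
    done += BLOCK
elapsed = time.monotonic() - start
os.close(fd)
print(f"~{done / elapsed / 1024**2:.1f} MB/s average over {elapsed:.0f}s")
```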
Random read performance was a strong suit of the old C300 and Crucial scaled back on it a bit when building the m4's firmware. The 0009 firmware improves performance a little bit but not tremendously. As we've seen in our previous reviews however, for desktop users being able to hit over 60MB/s in 4KB random reads doesn't translate into real world performance gains. All of the drives here do very well.
Random write performance also improved a little bit post update. The m4 is still incredibly fast here, easily the second fastest drive we've tested.
Many of you have asked for random write performance at higher queue depths. What I have below is our 4KB random write test performed at a queue depth of 32 instead of 3. While the vast majority of desktop usage models experience queue depths of 0 - 5, higher depths are possible in heavy I/O (and multi-user) workloads:
Crucial's performance doesn't scale up with higher queue depths like SandForce's. That's mostly because with highly compressible data sets, SF's drives don't actually write more as queue depth goes up. The controller has to do more deduping but as long as it can keep up, performance looks like it increases while physical writes to NAND don't. Most random writes on a desktop machine are going to be highly compressible, however most desktop workloads won't see sustained 4KB random writes at a queue depth of 32.
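To make the compressible-data point concrete, here's a toy sketch of the deduplication idea: if many queued 4KB writes carry identical payloads, a content-aware controller only has to commit each unique block to NAND once. This is purely illustrative; the SF-2281's actual engine is proprietary and operates on compressed data, not SHA-256 hashes.

```python
import hashlib, os

# 28 identical (highly compressible) 4KB writes plus 4 unique random ones
queued_writes = [b"\x00" * 4096] * 28 + [os.urandom(4096) for _ in range(4)]

unique = {hashlib.sha256(block).digest() for block in queued_writes}
print(f"{len(queued_writes)} queued writes -> {len(unique)} unique blocks to NAND")
```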
Sequential Read/Write Speed
To measure sequential performance I ran a 1 minute long 128KB sequential test over the entire span of the drive at a queue depth of 1. The results reported are in average MB/s over the entire test length.
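The sequential equivalent of the random sketch above simply walks the span in order with a larger transfer size. Again, this is a rough illustration against the same hypothetical scratch file, not our actual Iometer configuration:

```python
import os, time

fd = os.open("scratch.bin", os.O_RDONLY)   # hypothetical scratch file, assumed non-empty
size = os.fstat(fd).st_size
BLOCK = 128 * 1024                         # 128KB transfers, as in the test
done, offset = 0, 0
start = time.monotonic()
while time.monotonic() - start < 60:       # 1 minute test length
    os.pread(fd, BLOCK, offset)
    offset = (offset + BLOCK) % size       # wrap around at the end of the span
    done += BLOCK
os.close(fd)
print(f"~{done / (time.monotonic() - start) / 1024**2:.1f} MB/s sequential")
```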
Here's where we see some huge gains. The original m4 firmware posted sequential read performance lower than the C300. With the update to FW0009, performance now surpasses the C300 and comes in just slightly behind Intel's SSD 510.
Sequential write performance sees a small gain but nothing major here.
AS-SSD Incompressible Sequential Performance
The AS-SSD sequential benchmark uses incompressible data for all of its transfers. The result is a pretty big reduction in sequential write speed on SandForce based controllers.
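You can get a feel for how different these data patterns look to a compressing controller with a few lines of Python, using zlib as a stand-in for SandForce's proprietary engine:

```python
import os, zlib

repetitive = b"\x00" * (128 * 1024)              # trivially compressible
patterned = (b"anandtech" * 14564)[:128 * 1024]  # repeating text, still compressible
random_data = os.urandom(128 * 1024)             # fully random, incompressible

for name, buf in [("repetitive", repetitive), ("patterned", patterned), ("random", random_data)]:
    ratio = len(zlib.compress(buf)) / len(buf)
    print(f"{name}: compresses to {ratio:.0%} of original size")
```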
AS-SSD tends to post higher numbers than our Iometer sequential tests, and here we see Crucial break the 500MB/s barrier in read performance with the FW0009 update.
Write speed under AS-SSD actually fell a bit; the older firmware was 10% faster here.
AnandTech Storage Bench 2011
Last year we introduced our AnandTech Storage Bench, a suite of benchmarks that took traces of real OS/application usage and played them back in a repeatable manner. I assembled the traces myself out of frustration with the majority of what we have today in terms of SSD benchmarks.
Although the AnandTech Storage Bench tests did a good job of characterizing SSD performance, they weren't stressful enough. All of the tests performed less than 10GB of reads/writes and typically involved only 4GB of writes specifically. That's not even enough to exceed the spare area on most SSDs. Most canned SSD benchmarks don't even come close to writing a single gigabyte of data, but that doesn't mean that simply writing 4GB is acceptable.
Originally I kept the benchmarks short enough that they wouldn't be a burden to run (~30 minutes) but long enough that they were representative of what a power user might do with their system.
Not too long ago I tweeted that I had created what I referred to as the Mother of All SSD Benchmarks (MOASB). Rather than only writing 4GB of data to the drive, this benchmark writes 106.32GB. It's the load you'd put on a drive after nearly two weeks of constant usage. And it takes a *long* time to run.
1) The MOASB, officially called AnandTech Storage Bench 2011 - Heavy Workload, mainly focuses on the times when your I/O activity is the highest. There is a lot of downloading and application installing that happens during the course of this test. My thinking was that it's during application installs, file copies, downloading and multitasking with all of this that you can really notice performance differences between drives.
2) I tried to cover as many bases as possible with the software I incorporated into this test. There's a lot of photo editing in Photoshop, HTML editing in Dreamweaver, web browsing, game playing/level loading (Starcraft II & WoW are both a part of the test) as well as general use stuff (application installing, virus scanning). I included a large amount of email downloading, document creation and editing as well. To top it all off I even use Visual Studio 2008 to build Chromium during the test.
The test has 2,168,893 read operations and 1,783,447 write operations. The IO breakdown is as follows:
AnandTech Storage Bench 2011 - Heavy Workload IO Breakdown
IO Size | % of Total
4KB | 28%
16KB | 10%
32KB | 10%
64KB | 4%
Only 42% of all operations are sequential, the rest range from pseudo to fully random (with most falling in the pseudo-random category). Average queue depth is 4.625 IOs, with 59% of operations taking place in an IO queue of 1.
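For those curious how figures like these are derived, the queue depth at any instant is just the number of operations in flight. Here's a simplified sketch, assuming a hypothetical trace of (submit, complete) timestamp pairs rather than our actual trace format:

```python
def queue_depth_stats(ios):
    """ios: list of (submit, complete) timestamps, one pair per operation."""
    events = []
    for submit, complete in ios:
        events.append((submit, +1))    # IO enters the queue
        events.append((complete, -1))  # IO leaves the queue
    events.sort()

    depth, depth_at_submit = 0, []
    for _, delta in events:
        depth += delta
        if delta == +1:
            depth_at_submit.append(depth)  # depth seen by each new IO

    avg = sum(depth_at_submit) / len(depth_at_submit)
    qd1_share = sum(1 for d in depth_at_submit if d == 1) / len(depth_at_submit)
    return avg, qd1_share

# toy trace: three overlapping IOs followed by one solitary IO
print(queue_depth_stats([(0.0, 0.3), (0.1, 0.4), (0.2, 0.5), (1.0, 1.1)]))
```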
Many of you have asked for a better way to really characterize performance. Simply looking at IOPS doesn't really say much. As a result I'm going to be presenting Storage Bench 2011 data in a slightly different way. We'll have performance represented as Average MB/s, with higher numbers being better. At the same time I'll be reporting how long the SSD was busy while running this test. These disk busy graphs will show you exactly how much time was shaved off by using a faster drive vs. a slower one during the course of this test. Finally, I will also break out performance into reads, writes and combined. The reason I do this is to help balance out the fact that this test is unusually write intensive, which can often hide the benefits of a drive with good read performance.
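Disk busy time, by contrast, is conceptually the union of all intervals during which at least one IO was outstanding, with idle gaps excluded. A minimal sketch over the same hypothetical trace format:

```python
def disk_busy_time(ios):
    """Total time with at least one IO in flight; idle periods excluded."""
    events = sorted([(s, +1) for s, c in ios] + [(c, -1) for s, c in ios])
    busy, depth, last_t = 0.0, 0, 0.0
    for t, delta in events:
        if depth > 0:
            busy += t - last_t   # the drive was busy across this span
        depth += delta
        last_t = t
    return busy

# two overlapping IOs plus one later IO: 0.4s + 0.2s of busy time
print(disk_busy_time([(0.0, 0.3), (0.1, 0.4), (2.0, 2.2)]))  # -> 0.6
```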
There's also a new light workload for 2011. This is a far more reasonable, typical everyday use case benchmark. Lots of web browsing, photo editing (but with a greater focus on photo consumption), video playback as well as some application installs and gaming. This test isn't nearly as write intensive as the MOASB but it's still multiple times more write intensive than what we were running last year.
As always I don't believe that these two benchmarks alone are enough to characterize the performance of a drive, but hopefully along with the rest of our tests they will help provide a better idea.
The testbed for Storage Bench 2011 has changed as well. We're now using a Sandy Bridge platform with full 6Gbps support for these tests.
AnandTech Storage Bench 2011 - Heavy Workload
We'll start out by looking at average data rate throughout our new heavy workload test:
Overall performance in our heavy workload is now on par with last year's C300. We don't see huge gains in our heavy suite but Crucial has never really done well here to begin with. At least the numbers are up above the C300. The SF-2281 based Kingston HyperX has a 33% performance advantage here but if you look at the disk busy time below that amounts to around 4 minutes saved over hours of execution. In other words, you'd be hard pressed to tell the difference between nearly any of these drives.
The next three charts just represent the same data, but in a different manner. Instead of looking at average data rate, we're looking at how long the disk was busy for during this entire test. Note that disk busy time excludes any and all idles, this is just how long the SSD was busy doing something:
AnandTech Storage Bench 2011 - Light Workload
Our new light workload actually has more write operations than read operations. The split is as follows: 372,630 reads and 459,709 writes. The relatively close read/write ratio does better mimic a typical light workload (although even lighter workloads would be far more read centric).
The I/O breakdown is similar to the heavy workload at small IOs, however you'll notice that there are far fewer large IO transfers:
AnandTech Storage Bench 2011 - Light Workload IO Breakdown
IO Size | % of Total
4KB | 27%
16KB | 8%
32KB | 6%
64KB | 5%
Our lighter workload, however, shows about a 6% gain in performance over the older firmware. Granted, a 6% increase in IO bound performance likely won't be noticeable in real world tasks, but it's always nice to see numbers moving up.
Testing TRIM
SSDs by default have no knowledge of what locations in NAND contain valid or invalid data. All writes are committed to NAND and it's not until an LBA is overwritten that the controller gets rid of the data previously at that address. Well designed controllers thrive with as much free space as possible, but without knowing what NAND blocks contain valid data the amount of free space on a drive from the controller's perspective diminishes over time. This results in the oh-so-familiar performance degradation over time that plagues all SSDs. The ATA TRIM command was introduced to help alleviate this issue. In a supported OS with TRIM enabled drivers, whenever data is deleted a command is sent to the SSD to TRIM the LBAs that map to the now invalid data. Upon receiving this command, modern SSDs mark those NAND blocks as invalid and schedule them for recycling. In the end this helps performance remain high over time.
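A deliberately oversimplified model makes the mechanics clearer. Real firmware tracks NAND pages and erase blocks rather than a Python dict, but the bookkeeping problem is the same: without TRIM, the controller only learns a page is stale when its LBA is overwritten.

```python
class ToyFTL:
    """Toy flash translation layer: LBA -> stored payload, plus a stale count."""
    def __init__(self):
        self.mapping = {}  # live data, from the controller's point of view
        self.stale = 0     # invalid pages eligible for garbage collection

    def write(self, lba, data):
        if lba in self.mapping:
            self.stale += 1        # old copy only becomes reclaimable on overwrite
        self.mapping[lba] = data

    def trim(self, lba):
        if self.mapping.pop(lba, None) is not None:
            self.stale += 1        # OS says this data is gone: reclaim early

ftl = ToyFTL()
ftl.write(0, b"a")
ftl.write(0, b"b")   # overwrite eventually frees the old page
ftl.write(1, b"c")
ftl.trim(1)          # TRIM frees it without waiting for a rewrite
print(len(ftl.mapping), "live LBAs,", ftl.stale, "stale pages")
```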
Note that TRIM doesn't solve all fragmentation - you need a good controller with well architected firmware to ensure that fragmentation doesn't become an out of control problem. To understand how the m4's TRIM implementation performs let's first look at its fresh, out-of-box sequential write speed:
Note the relatively consistent write performance averaging 250MB/s. Now let's put the drive in a horribly fragmented state by first writing to all user addressable LBAs sequentially then performing a 4KB random write test at a queue depth of 32 across the entire drive for 20 minutes. This is what a sequential pass looks like after our little torture session:
Note the remarkable falloff in performance. Most of this is due to just how fast the m4 can write 4KB of data randomly across the drive, but it also shows that Crucial manages to reach such high 4KB random write speeds by not adequately combating fragmentation on the fly. Thankfully the drive does seem to recover pretty well; here's what it looks like after a second sequential pass:
Finally if we format the entire drive we get to see how well the m4 responds to the ATA TRIM command. To avoid giving the m4 an easier time I secure erased the drive, re-ran our torture test without the second pass (above) and then TRIMed its contents to produce the graph below:
Performance is back to new. What does all of this tell us? If you're running an OS with TRIM support you'll likely be just fine with the m4. Pseudo-random writes are common within any desktop workload, but if you avoid filling your drive to capacity and have a TRIM supported OS the drive shouldn't get into a bad state. The bigger concern is running the m4 in an OS without official TRIM support (e.g. Mac OS X) where you could find yourself in a particularly bad situation over a long period of time. Even then, it's obvious that sequential write passes over used LBAs cleans the drive up fairly well. Chances are that a standard desktop workload in a TRIM-free OS would be fine over the long run. If not, some sequential writes to any free space would do the trick (e.g. copying a large video file then deleting it).
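If you ever need to force that cleanup by hand, something like the sketch below does the trick: fill free space with a large sequential file, sync it to disk, then delete it. The file name and size here are placeholders, and on a TRIM-enabled OS this is unnecessary.

```python
import os

FILLER_PATH = "filler.bin"        # placeholder path on the drive to clean up
CHUNK = 4 * 1024 * 1024           # 4MB sequential writes

with open(FILLER_PATH, "wb") as f:
    for _ in range(2048):         # ~8GB total; size this to the free space available
        f.write(b"\x00" * CHUNK)
    f.flush()
    os.fsync(f.fileno())          # make sure the writes actually hit the drive

os.remove(FILLER_PATH)            # sequential pass done; hand the space back
```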
Final Words
The 0009 firmware update for Crucial's m4 drive mostly improves its sequential read performance. Unless you're doing a lot of large file copies to another high speed SSD or drive array you likely won't see any huge gains in real world use. Where does the m4 stand in the grand scheme of things? In terms of performance it sits in the same class as the SF-2281 drives and Intel's SSD 510. The latter two generally benchmark better, but all three do well in real world usage.
SSD Pricing
Capacity | Crucial m4 | Intel SSD 510 | Kingston HyperX | OCZ Vertex 3
256GB/250GB/240GB | $390.99 | $504.99 | $489.99 |
128GB/120GB | $227.99 | $276.99 | $244.99 | $219.99
At $390.99 the m4 is one of the most affordable 256GB drives on the market today. You don't get the best absolute performance in all of our tests but you'd be hard pressed to tell the difference. The m4 does quite well in our light workload which is very representative of a typical desktop usage experience. My only concern with the m4 is really how bad the controller will let performance get with sustained random writes. I'd gladly trade lower 4KB random write performance for better sustained numbers.
If you don't want a SandForce drive and want something more affordable than Intel's SSD 510, the m4 just started looking a lot better.