JonnyDough - Monday, December 10, 2012 - link
I'm a bit sad the Intel 520 isn't represented here.
Kristian Vättö - Monday, December 10, 2012 - link
I only included Intel SSD 335 240GB because it's newer and actually a bit faster. You can always use our Bench tool to compare any SSD, here's M5 Pro versus 520: http://www.anandtech.com/bench/Product/731?vs=529
JonnyDough - Monday, December 10, 2012 - link
I got a 240GB drive from a forum guy for $160 shipped. Seemed like a great deal on a solid drive. Besides, I thought that a lot of people have the 520.
mckirkus - Monday, December 10, 2012 - link
This drive scores well in the Anandtech Storage Benchmarks. So my question is whether that means your test doesn't measure the impact of IO consistency, or whether it simply doesn't matter in the real world?
Kristian Vättö - Monday, December 10, 2012 - link
Our Storage Suites are run on an empty drive, whereas in the IO consistency test the drive is first filled with sequential data before being hammered by 4KB random writes. The Storage Suites also consist of various IOs with different transfer sizes, queue depths and data patterns, and as we have shown before, sequential writes recover performance with most SSDs. The SSD is also not being subjected to IO load all of the time; there are lots of idle periods where the SSD can do GC to recover performance.
So, our Storage Suites don't fully ignore IO consistency, but it's hard to say how much of an impact the M5 Pro's IO consistency has on its scores.
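For anyone wanting to see the shape of the consistency test Kristian describes, here is a minimal, hypothetical Python sketch (not the tool AnandTech uses): it fills the raw device sequentially, then issues 4KB random writes and logs IOPS once per second. The device path and run time are placeholders, it runs at an effective queue depth of 1 through the page cache, and it is destructive, so treat it as illustration only.

```python
import os
import random
import time

DEV = "/dev/sdX"      # hypothetical target device -- everything on it will be destroyed
BLOCK = 4096          # 4KB random writes, as in the consistency test
RUN_SECONDS = 1800    # 30 minutes of sustained random writes

fd = os.open(DEV, os.O_WRONLY)
size = os.lseek(fd, 0, os.SEEK_END)

# Phase 1: fill the drive once with sequential data
fill = os.urandom(1024 * 1024)
os.lseek(fd, 0, os.SEEK_SET)
written = 0
while written + len(fill) <= size:
    os.write(fd, fill)
    written += len(fill)

# Phase 2: hammer the full drive with 4KB random writes, logging IOPS per second
blocks = size // BLOCK
chunk = os.urandom(BLOCK)
start = last = time.time()
ios = 0
while time.time() - start < RUN_SECONDS:
    os.lseek(fd, random.randrange(blocks) * BLOCK, os.SEEK_SET)
    os.write(fd, chunk)
    ios += 1
    now = time.time()
    if now - last >= 1.0:
        print(f"{int(now - start)}s\t{ios} IOPS")
        ios, last = 0, now

os.close(fd)
```

A real run would use Iometer or fio with unbuffered I/O at QD32; the sketch only shows the fill-then-random-write structure and the absence of idle time.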
TemjinGold - Monday, December 10, 2012 - link
Curious as to why this metric HASN'T been reviewed yet? I'm sure a lot of us would be curious as to how all the major SSDs do in this.
skytrench - Monday, December 10, 2012 - link
The test is hitting the drive so hard that cleanup operations don't have time to improve matters. More testing is needed. Few usage patterns would resemble indefinite 4KB random writes.
jwilliams4200 - Monday, December 10, 2012 - link
You need to examine the latency for this SSD to see what it is doing. Like you, I was surprised when I first saw the M5P dropping down to such low IOPS under sustained heavy load. Basically, the M5P is rapidly switching between two modes -- a slow throughput mode (presumably doing GC) and a high throughput mode. It certainly does not look pretty when you plot it out.
But there are two (possibly) mitigating factors:
1) The average throughput isn't terrible, especially with at least 20% OP. The more OP, the greater percentage of time the SSD spends in the high throughput mode, thus raising the average throughput. The average throughput still is not as good as the Vector, Neutron, or 840 Pro, but it is not as bad as it looks on the graph.
M5P with 0% OP (avg 7MB/s):
http://i.imgur.com/30ZDE.png
M5P with 20% OP (avg 75MB/s):
http://i.imgur.com/yj0cF.png
2) Importantly, Plextor appears to put an ABSOLUTE cap on worst-case latency. I have never seen the latency go over 500ms, no matter what you throw at it. For comparison, with the Samsung 840 Pro, with no OP and a very heavy load, the latency will, very occasionally, go over even 1000ms. You can easily see the bimodal distribution of latencies for the Plextor if you look at the normal probability scale CDF plot. It seems that Plextor has tuned the firmware so that whenever it is in the slow mode doing GC, it has an absolute limit of 500ms before any IO returns. I guess the price to be paid for that absolute latency cap is that the average and worst-case throughput is lower than the competition -- but not so much lower that it couldn't be considered an acceptable trade-off in order to gain the absolute cap on worst-case latency.
M5P with 0% OP, worst-case latency 500ms:
http://i.imgur.com/pVmWQ.png
Samsung 840 Pro with 0% OP, worst-case latency >1000ms:
http://i.imgur.com/fjA7N.png
Personally, I would still choose the 840 Pro over the Plextor for sustained heavy workloads (I would overprovision either SSD by at least 20%) because the 840 Pro has much better average latency. But I can imagine that some applications might benefit from the absolute 500ms cap on worst-case latency that the Plextor provides.
Note that none of this really matters for the consumer workloads most people would put an SSD under. Under most consumer workloads, neither the Plextor nor any of the others would have performance drops anywhere near as bad as shown in these sustained heavy workload conditions.
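To make the latency discussion above concrete, here is a rough Python sketch (my own, not jwilliams4200's tooling) that summarizes a per-IO latency log: worst case, a few percentiles, and the share of IOs falling into the slow mode. The log format (one latency in milliseconds per line) and the 10 ms cut between the two modes are assumptions.

```python
import sys

# Assumed input: a text file with one per-IO latency in milliseconds per line,
# e.g. logged during a sustained 4KB random write run.
latencies = sorted(float(line) for line in open(sys.argv[1]) if line.strip())

def percentile(p):
    # simple nearest-rank percentile, good enough for a sanity check
    idx = min(len(latencies) - 1, int(p / 100 * len(latencies)))
    return latencies[idx]

print(f"IOs: {len(latencies)}")
print(f"max latency: {latencies[-1]:.1f} ms")   # jwilliams4200 reports <= 500 ms on the M5P
for p in (50, 90, 99, 99.9):
    print(f"p{p}: {percentile(p):.1f} ms")

# Crude look at the bimodal split: how many IOs land in the slow (GC) mode
THRESHOLD_MS = 10.0                              # arbitrary cut between the two modes
slow = sum(1 for lat in latencies if lat >= THRESHOLD_MS)
print(f"IOs slower than {THRESHOLD_MS} ms: {slow / len(latencies):.1%}")
```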
Kevin G - Monday, December 10, 2012 - link
This makes me wonder if Plextor has optimized their firmware for more consumer-oriented loads, which typically have a lower queue depth.
ckevin1 - Tuesday, December 11, 2012 - link
Great analysis, thank you!
The max latency constraint is very clear from the graphs you generated. It's not that the firmware is "bad" necessarily, it is just optimizing for a different performance measurement, one that Anandtech doesn't cover.
I think an analysis of whether max latency is ever more important than max throughput would be interesting, along with some data on how the Plextor compares to other drives in this alternate metric.
Beenthere - Monday, December 10, 2012 - link
Unless the 1.02 firmware corrects some bug or other issue, it's hardly worth the effort to update. As for the listed SSDs, few if any would actually be able to tell the difference in performance between the various drives in actual use. The synthetic benches produce theoretical differences which can't be seen by typical desktop or laptop users.
Jocelyn - Monday, December 10, 2012 - link
Any chance we'll see Steady State testing with the M5 Pro for comparison?
Jocelyn - Monday, December 10, 2012 - link
Had no idea this was a multi-part review, sorry and Thank You :)
jwilliams4200 - Monday, December 10, 2012 - link
"The peaks are actually high compared to other SSDs but having one IO transfer at 3-5x the speed every now and then won't help if over 90% of the transfers are significantly slower."Just a note that the statement quoted above is misleading. From your graphs, you can see that the Plextor is cycling between 100 IOPS and 32,000+ IOPS (there are a scattering of points in between, but the distribution is predominantly bimodal). So it is hardly "3-5x the speed". The peaks are actually more than 300x the low speed. So the average throughput depends almost entirely on what percentage of the time the SSD spends at the 32,000+ IOPS speed. It is easy to see that adding 20% or 25% OP allows the Plextor to spend a larger percentage of its time at the high speed.
In my testing, the throughput peaks always last about 0.4sec, but at 0% OP they only occur with a period of about 7 seconds, while at 20% OP the peaks occur with a period of about 1.4 seconds, so the average throughput increases by about a factor of 5 (= 7 / 1.4) when comparing 20% OP to 0% OP. Incidentally, the SSD is spending about 30% of its time in the high-speed mode with 20% OP, but only about 6% of its time in the high-speed mode with 0% OP.
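A quick back-of-the-envelope check of those numbers (the 0.4 s peak length and the 7 s / 1.4 s periods are jwilliams4200's measurements; the few lines of Python below are just the arithmetic spelled out):

```python
# Peaks last ~0.4 s; they recur every ~7 s at 0% OP and every ~1.4 s at 20% OP.
PEAK_LEN = 0.4

for label, period in (("0% OP", 7.0), ("20% OP", 1.4)):
    duty = PEAK_LEN / period   # fraction of time spent in the high-speed mode
    print(f"{label}: high-speed duty cycle ~{duty:.0%}")

# If nearly all throughput comes from the high-speed mode, average throughput
# scales roughly with the duty cycle, hence the ~5x gain from extra OP.
print(f"expected average-throughput gain: ~{7.0 / 1.4:.1f}x")
```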
chrnochime - Monday, December 10, 2012 - link
jwilliams, got a few models of SSD that you'd recommend for reliability? I have a Plextor M3 Pro 120GB and I'm thinking of buying a second one, probably 128, maybe 256. The Samsung 830 is hard to find cheap now, so I'm looking at the Samsung 840 Pro/Plextor M5 Pro/Corsair/Kingston new ones released in the past 3 months.
Thanks.
jwilliams4200 - Monday, December 10, 2012 - link
At the moment, I think the Plextor M5P is probably the highest quality consumer SSD that you can buy. It has been out for more than 4 months and I have not seen any real bugs or defects reported. I like that Plextor publicizes the details of the tests they do for qualification as well as their production tests, and they are tough tests, so you know that the shipping SSDs that have passed those tests are high quality.
Now, I am assuming that you are not going to be subjecting the SSD to sustained heavy workloads like an hour of 4KQD32 writes. The M5P can do that of course, but if that were your main type of workload, there are higher performance choices like the Samsung 840 Pro or the Corsair Neutron GTX. But the 840 Pro has not been out long enough for me to call it high quality, and the Neutron and Neutron GTX do have a few reports of failures that may (or may not) be indicative of lower quality than the Plextor.
JellyRoll - Tuesday, December 11, 2012 - link
In steady state the 840 is ridiculously good compared to other SSDs.
chrnochime - Tuesday, December 11, 2012 - link
Thank you for the detailed explanation! Much appreciated.
I guess I can go ahead and search for good deals on the M5P. I put reliability and durability (as in how long the drive can last) above all else. A slower speed that makes the drive last that much longer is much preferable to me.
Kristian Vättö - Tuesday, December 11, 2012 - link
Oh, what I meant was that Plextor's peak throughput is around 3-5x the peak throughput of other SSDs.
JellyRoll - Tuesday, December 11, 2012 - link
The consistency testing is totally irrelevant for consumer workloads. Testing 4K full-span random writes is ridiculous on a consumer SSD. This will NEVER be seen by any user outside of an enterprise scenario. This type of testing has been done for ages with enterprise SSDs by several sites. Never for consumer SSDs, as its irrelevance is obvious.
Then mix in the fact that this is done without a filesystem (which no user could ever do, because they need this little thing called an 'operating system'), and top it off with absolutely no TRIM in play. Without a filesystem there is no TRIM. What exactly are you recreating here that is of relevance?
jwilliams4200 - Tuesday, December 11, 2012 - link
I think it is a worthwhile test. At the very least, it is always interesting to see how products react when you hit them very hard. Sometimes you can expose hidden defects that way. Sometimes you can get a better idea of how the product operates under stress (which may sometimes be extrapolated to understand how it operates under lighter workloads). Since SSD reviewers generally only have the product for a few days before publishing a review, putting the equivalent of weeks (or months) of wear on the SSDs in a few days requires hitting them as hard as possible. And there will always be a few users who will subject their SSDs to extremely heavy workloads, so it will be relevant to a few users.
As long as the review mentions that the specific extreme workload being tested is unlikely to match that of the majority of consumers, I think the sustained heavy workload is a valuable component of all SSD reviews.
JellyRoll - Tuesday, December 11, 2012 - link
Without a filesystem or TRIM, the testing is merely pointing out anomalies in SSD performance. Full-span writes in particular reduce performance in many respects, such as GC.
These SSDs are tailored to be used in consumer environments with consumer workloads. This is the farthest thing from a consumer workload that you can possibly get.
The firmware is designed to operate in a certain manner, with filesystems and TRIM functions. They are also optimized for low QD usage and scheduled GC/maintenance that typically occurs during idle/semi-idle times.
Pounding this with sustained and unreal workloads is like saying, "Hey, if we test it for something it wasn't designed for, it doesn't work well!!!!"
Surprise. Of course it doesn't.
Testing against the grain merely shows odd results that will never be observed in real life usage.
This is a consumer product. This SSD is among the best in testing that is actually semi-relevant (though the trace testing isn't conducted with TRIM or a filesystem either, as disclosed by the staff), but the 'consistency testing' places it among the worst.
They aren't allowing the SSD to function as it was designed to, then complaining that it has bad performance.
Kristian even specifically states that users will notice hiccups in performance. Unreal.
jwilliams4200 - Tuesday, December 11, 2012 - link
Wrong again.
There is no one way that an SSD should work, unless you want to say that it is to store and retrieve data written to LBAs. Since there are many ways SSDs can be used, and many different types of filesystems, it is absurd to say that doing a general test to the raw device is irrelevant.
On the contrary, it is clearly relevant and often useful, for the reasons I already explained. In many cases, it is even more relevant than picking one specific filesystem and testing with that, since any quirks of that filesystem could be irrelevant to usage with other filesystems. Besides, anandtech already does tests with a common filesystem (NTFS), so the tests you are so upset about are merely additional information that can be used or ignored.
JellyRoll - Tuesday, December 11, 2012 - link
Anand does not do testing with a filesystem. The trace program operates without a filesystem or the benefit of TRIM. It also substitutes its own data in place of the actual data used by the programs during recording. This leads to incorrect portrayals of system performance when dealing with SSDs that rely upon compression, and also any type of SSD in general since it isn't utilizing a filesystem or TRIM. Utilizing raw I/O to attempt to emulate recordings of user activity on an actual filesystem is an apples to oranges comparison.
There is no 'wrong again' to it. SSDs do store and retrieve data from LBAs, but they are reliant upon the filesystem to issue the TRIM command. Without a filesystem there is no TRIM command. Therefore it is unrealistic testing not relevant to the device tested. The SSDs are tuned for consumer/light workloads, and their internal housekeeping and management routines are adjusted accordingly.
jwilliams4200 - Tuesday, December 11, 2012 - link
Wrong that it is totally irrelevant. Wrong again that the SSDs are only designed for certain workloads. Wrong that there is no TRIM without a filesystem.
Also wrong that anandtech does not do (any) testing with a filesystem. Some of the tests they do can only be run with a filesystem.
JellyRoll - Tuesday, December 11, 2012 - link
You are right that they do some very limited testing with filesystems, such as the ATTO testing. They do use Iometer, though it is possible to test without a filesystem in Iometer, so we aren't sure if they are or not.
Their trace testing does not use a filesystem, and this has been publicly acknowledged. Recording filesystem usage and replaying it without a filesystem, and with different data, does not make sense.
The firmware on SSDs can be tailored to handle tasks, such as GC, during certain times. This is determined by the intended usage model for the SSD. The firmware is also tuned for certain types of access, which explains the huge differences in performance between some firmwares (reference the M4 firmwares). This is SSD 101.
TRIM requires that the SSD be informed of which data is now deleted, which is a function of the filesystem.
Repeating the word 'wrong' isn't doing much to further your argument or actually prove anything.
jwilliams4200 - Tuesday, December 11, 2012 - link
I don't know what else to call it other than wrong. Would you prefer correctionally challenged? :-)
TRIM can be done without a filesystem. I'm not sure why you seem to think filesystems are magical. A filesystem is just a set of routines that handle the interface between the programs using filesystem calls and the raw block device. But there is nothing stopping a program from sending TRIM commands directly to the device. I use TRIM in my tests without a filesystem (hdparm is useful for that).
By the way, it seems to me that we have lost track of the important point here, which is that tests like we are discussing, without a filesystem and with sustained heavy write loads, are NOT meant to be given much weight by people with typical consumer workloads who are comparing SSDs. However, that does not mean that the test is irrelevant or a waste of time. When the test results are used correctly (as I explained earlier), in combination with results from other real-world tests, they provide a useful addition to our knowledge about each SSD.
Please don't assume that I am arguing that these tests are the only tests that should be done, or even that these are the most important tests. They are not. But they are also NOT irrelevant.
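For anyone curious how TRIM can be sent with no filesystem in the picture, here is a hypothetical sketch along the lines jwilliams4200 mentions, shelling out to hdparm's --trim-sector-ranges from Python. The device path and sector range are placeholders, and the command discards live data, so this is illustration only:

```python
import subprocess

DEV = "/dev/sdX"       # placeholder device -- TRIMming it discards whatever is stored there
START_LBA = 0          # first sector of the range to TRIM
SECTORS = 65535        # number of sectors in the range

# hdparm requires an explicit safety flag before it will issue destructive TRIM commands.
subprocess.run(
    [
        "hdparm",
        "--please-destroy-my-drive",
        "--trim-sector-ranges",
        f"{START_LBA}:{SECTORS}",
        DEV,
    ],
    check=True,
)
```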
JellyRoll - Wednesday, December 12, 2012 - link
Yes, you can manually send a TRIM command via a number of techniques. However, this is no substitute for real-time TRIM.
For instance, when replaying a trace (such as those used in the bench here), the filesystem will be tagging data as deleted and issuing TRIM commands in real time. Then it is up to the SSD to process these commands.
This makes a tremendous impact upon the performance of the SSD, as it is intended to. Replaying traces without this crucial element leads to disastrously wrong results. Different SSDs handle TRIM commands better, or worse, than others. As a matter of fact some SSDs have had to get 'exceptions' from Windows because they do not handle the TRIM commands within a specified time range as dictated by spec (SandForce). So there is much more to TRIM than meets the eye, and it has a tremendous impact upon performance. Otherwise it simply would not exist.
What you end up with is traces that were recorded with the benefit of TRIM being replayed without that benefit.
It has already been publicly acknowledged that Anand's trace testing does not have the benefit of TRIM. Of course it doesn't; the SSDs do not have a filesystem to issue the deletion commands.
So yes, irrelevant, incorrect trace testing.
Yes, irrelevant and incorrect 'consistency testing' of consumer SSDs, which are designed to operate with TRIM and state such in their specifications. Pointing out errata on consumer SSDs revealed outside of intended usage is irresponsible.
jwilliams4200 - Wednesday, December 12, 2012 - link
In its default configuration, Windows does not issue TRIM commands for deleted files until you empty the trash (or the size of the trash exceeds some amount). So any trace that is "issuing TRIM commands in real time" is not especially realistic. Besides, some write-intensive workloads do not delete files at all, so there would not be any TRIM commands to be issued (e.g., some types of database files, some types of VM files).
Your problem is that you are making assumptions about how SSDs will be used, and then saying any usage that does not follow your assumptions is irrelevant. As long as you continue to do that, you will continue to be wrong.
JellyRoll - Wednesday, December 12, 2012 - link
These are not assumptions: it is the intended market. These SSDs are designed and sold in the consumer market. Period.
There is a separate class of SSDs designed for these workloads, and they are designed and sold in the enterprise market. Thus the distinction, and the name: "Enterprise SSDs".
That is not hard to figure out. There are two classes simply because the devices are tailored for their intended market and usage model. I do not know how to explain this in simpler terms so that you may 'get it'.
Once you begin speaking of the lack of TRIM commands in VM and database files, surely something 'clicks' that you are beginning to speak of enterprise workloads.
Consumer SSDs = designed to work in a consumer environment (thus the name) with TRIM and other functionality.
jwilliams4200 - Thursday, December 13, 2012 - link
If SSDs were not meant to be used for heavy workloads, then the warranty would state the prohibited workloads. But they do not, other than (most of them) giving a maximum total amount of TB written.
The fact is that SSDs are used for many different applications and many different workloads.
Just because you favor certain workloads does not mean that everyone does.
Fortunately, not everyone thinks like you do on this subject.
JellyRoll - Thursday, December 13, 2012 - link
I would ask you to explain why there is a difference between enterprise-class SSDs and consumer SSDs.
If there is no difference in the design, firmware, and expectations between these two devices, then I am sure that multi-billion dollar companies such as Google are awaiting your final word on this with bated breath. Why, they have been paying much more for SSDs that are designed for enterprise workloads for no reason!
You would think that these massive companies, such as Facebook, Amazon, Yahoo, and Google, with all of their PhDs and humongous datacenters, would have figured out that there is NO difference between these SSDs!
Why, for their datacenters (which average 12 football fields in size), they can just deploy Plextor M5s! By the tens of thousands!
*hehe*
JellyRoll - Wednesday, December 12, 2012 - link
Also, files deleted by operating systems and applications, such as temp and swap files, that do not end up in the recycle bin are subject to immediate TRIM command issuance. These files, thousands of them, are deleted every user session and are never seen in the recycle bin. Care to guess how many temp and cache files Firefox deletes in an hour?
jwilliams4200 - Thursday, December 13, 2012 - link
So now every SSD user must run a web browser and keep the temporary files on the SSD?
Who named you SSD overlord?
skytrench - Thursday, December 13, 2012 - link
Though I side with JR in that the published IO consistency check doesn't show anything useful, modified it might have been an interesting test (different block sizes, taking breaks from only writing, maybe a reduction of load to see at which level the drives still perform well, etc.).
I find the test disrespectful to Plextor, who after all have delivered an improved firmware, and I find Anand comes out of this one lacking in credibility.
JellyRoll - Thursday, December 13, 2012 - link
The point is that since these files are subject to immediate issuance of a TRIM command, and this cannot be done when testing without a filesystem, you cannot attempt to emulate application performance with trace-based testing and the utility they are using. This doesn't just apply to browsers either; this applies to ALL applications. From games, to office docs, to Win7 (or 8), or a simple media player.
All applications, and operating systems, have these types of files that are subject to immediate TRIM commands.
gostan - Tuesday, December 11, 2012 - link
Hi Vatto,
Possible to get the Samsung 830 through the performance consistency test? I know it's been discontinued but it's still available for sale in many places.
Cheers!
Kristian Vättö - Tuesday, December 11, 2012 - link
I don't have any Samsung 830s at my place but I'll ask Anand to run it on the 830. He's currently travelling so it will probably take a while but I'll try to include it in an upcoming SSD review.
lunadesign - Tuesday, December 11, 2012 - link
WOW! My jaw hit the floor when I first saw the 1st graph.
I totally understand that the consistency test isn't realistic for consumer usage. But what about people using the M5 Pro in low-to-moderate use servers? For example, I'm about to set up a new VMware dev server with four 256GB M5 Pros in RAID 10 (using LSI 9260-8i) supporting 8-10 VMs. In this config, I won't have TRIM support at all.
If I understand this consistency test, the drive is being filled and then it's being sent non-stop requests for 30 mins that overwrite data in 4KB chunks. This *seems* to be an ultra-extreme case. In reality, most server drives are getting a choppy, fluctuating mixture of read and write operations with random pauses, so the GC has much more time to do its work than the consistency test allowed, right?
Should I be concerned about using the M5 Pro in low-to-moderate use server situations? Should I over-provision the drives by 25% to ameliorate this? Or, worse yet, are these drives so bad that I should return them while I can? Or is this test case so far away from what I'll likely be seeing that I should use the drives normally with no extra over-provisioning?
JellyRoll - Tuesday, December 11, 2012 - link
For a non-TRIM environment this would be especially bad. You are constrained to the lowest speed of each SSD. In practical use, in a scenario such as yours, you will be receiving the bottom line of performance constantly. The RAID is only as fast as the slowest member, and with each drive dropping into the lower state frequently, one SSD will surely be in this low range constantly. Therefore your RAID will suffer.
These SSDs are simply not designed for this usage. They are tailored for low QD, and the test results show that clearly.
lunadesign - Tuesday, December 11, 2012 - link
Understood, if my server's workload were to be relatively heavy. But do you really think that my server's workload (based on an admittedly rough description above) is going to get into these sorts of problematic situations?
'nar - Tuesday, December 11, 2012 - link
I disagree. RAID 5 stripes, as does RAID 0, so they need to be synchronized (hard drives had to spin in sync). But RAID 1 uses the drive that answers first, as they have the same data. RAID 10 is a bit of both, I suppose, but I also don't agree with your premise that the lack of TRIM forces the drive into a low-speed state in the first place.
Doesn't TRIM just tell the drive what is safe to delete? Unless the drive is near full, why would that affect its speed? TRIM was essential 2-3 years ago, but after SF drives GC got much better. I don't even think TRIM matters on consumer drives now.
For the most part I don't think these "steady state" tests even matter on consumer drives (or servers like lunadesign's). Sure, they are nice tests and have useful data, but they lack real-world data. The name "steady state" is misleading, to me anyway. It will not be a steady state in my computer, as that is not my usage pattern. Why not test the IOPS during standard benchmark runs? Even with 8-10 VMs his server will be idle most of the time. Of course, if all of those VMs are compiling software all day, then that is different, but that's not what VMs are set up for anyway.
JellyRoll - Tuesday, December 11, 2012 - link
GC still does not handle deleted data as efficiently as TRIM. There is still a huge need for TRIM.
We can see the effects of using this SSD for something other than its intended purpose, outside of a TRIM environment. There is a large distribution of writes that are returning sub-par performance in this environment. The array (striped across RAID 1) will suffer low performance, constrained to the speed of the lowest I/O.
There are SSDs designed for this type of use specifically, hence the distinction between enterprise and consumer storage.
cdillon - Tuesday, December 11, 2012 - link
Re: 'nar "RAID 5 stripes, as does RAID 0, so they need to be synchronized(hard drives had to spin in-sync.)"Only RAID 3 required spindle-synced drives for performance reasons. No other RAID level requires that. Not only is spindle-sync completely irrelevant for SSDs, hard drives haven't been made with spindle-sync support for a very long time. Any "synchronization" in a modern RAID array has to do with the data being committed to stable storage. A full RAID 4/5/6 stripe should be written and acknowledged by the drives before the next stripe is written to prevent the data and parity blocks from getting out of sync. This is NOT a consideration for RAID 0 because there is no "stripe consistency" to be had due to the lack of a parity block.
Re: JellyRoll "The RAID is only as fast as the slowest member"
It is not quite so simple in most cases. It is only that simple for a single mirror set (RAID 1) performing writes. When you start talking about other RAID types, the effect of a single slow drive depends greatly on both the RAID setup and the workload. For example, high-QD small-block random read workloads would be the least affected by a slow drive in an array, regardless of the RAID type. In that case you should achieve random I/O performance that approaches the sum of all non-dedicated-parity drives in the array.
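A toy calculation of that last point, with made-up per-drive numbers, contrasting the 'sum of the members' behaviour for high-QD random reads with the 'slowest member' rule that applies to mirrored writes:

```python
# Hypothetical 4-drive array with one slow member (random read IOPS per drive).
drive_read_iops = [90_000, 90_000, 90_000, 40_000]

sum_of_members = sum(drive_read_iops)                         # high-QD random reads approach this
slowest_member = min(drive_read_iops) * len(drive_read_iops)  # "only as fast as the slowest member"

print(f"high-QD random read ceiling (sum of members): {sum_of_members:,} IOPS")
print(f"'slowest member' rule would predict: {slowest_member:,} IOPS")
```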
JellyRoll - Tuesday, December 11, 2012 - link
I agree, but I was speaking specifically to writes.
bogdan_kr - Monday, March 4, 2013 - link
1.03 firmware has been released for the Plextor M5 Pro series. Is there a chance of a performance consistency check for this new firmware?