134 Comments

  • jimhsu - Tuesday, December 4, 2012 - link

    On a TRIM-ed system, is leaving, say, 20% of a partition as free space equivalent to shrinking the partition by that same amount? I would think so, but I'm not getting (subjective) performance like the 25% graph above even with leaving about that much free space available.
  • jimhsu - Tuesday, December 4, 2012 - link

    Drive is a C300 / firmware 0007, BTW.
  • geddarkstorm - Tuesday, December 4, 2012 - link

    On the last article someone was pondering the same sort of question. It sounds like that is not the case, and if the area is flagged as part of a partition, then it is not treated as spare area; hence the need to turn that area into its own partition, format it, and then delete it.

    That's what it sounds like to me, and from the testing done in this article.
  • geddarkstorm - Tuesday, December 4, 2012 - link

    Ok, reading more comments, it seems that if you have TRIM (i.e. Windows 7/8), then free space even within your partition will count as spare area. Hm, it would be interesting to test that assumption just in case, if it hasn't been already.
  • dananski - Tuesday, December 4, 2012 - link

    I would have thought the free space on a partition is free to use as spare area as the drive needs. It sounds like the partitioning is just to stop you accidentally filling it up more than you need.

    Are you looking at the linear-axis graphs? They show you can still get quite a variation in performance despite the 25% spare area, but it's still a tighter range and overall faster than 12%. What would make a big difference is how your C300 controller makes use of this spare area. I'd certainly like to see these graphs for a wider spectrum of SSDs, though I understand that it all takes time.

    Anand, could we see more drives? Also could you look into how RAID 0 affects IO performance consistency? :-)
  • zyxtomatic - Wednesday, January 16, 2013 - link

    Your partitioning comment is exactly what I was just thinking as I read this. How does the drive controller *know* that the extra unpartitioned space is available for use as spare area? There must be something more going on that I'm not understanding.
  • dcaxax - Monday, January 21, 2013 - link

    The Samsung SSD software (Samsung Magician) creates spare area on your drive by shrinking the existing partition by 20% and leaving the free space unallocated.

    This suggests that the spare area used is not free space from the partitioned area, but only unallocated space.

    Though it is not 100% clear if allocated but free space is also usable, it seems unlikely.
  • Flying Goat - Thursday, December 6, 2012 - link

    I think this is unclear. If the entire drive was originally in use, and then you repartition it, I assume the partition manager would have to be smart enough to TRIM the entire drive when the partition was deleted.
  • mayankleoboy1 - Tuesday, December 4, 2012 - link

    For normal consumer usage, the Samsung 840 Pro is the best, but it performs the worst of all here.
    The Corsair Neutron is the worst of all three in desktop usage, but performs the best here.
    (not considering the Intel, as it's not a consumer SSD)
  • dishayu - Tuesday, December 4, 2012 - link

    That's a stunning revelation to me. I had no clue I could increase an SSD's I/O performance manyfold by simply increasing the spare area. Thank you Anand and jwilliams4200.

    I need around 90 gigs on my system SSD. So, I guess I should just create a single 128GB partition on my 256GB drive and then extend the partition using Partition Magic Pro or something in case I absolutely NEED more space in the future. That sounds like the best way; please correct me if I'm wrong.
  • dishayu - Tuesday, December 4, 2012 - link

    Quick question. Why does one graph use logarithmic scale and the other one linear scale? That makes it look like Write IOPS are affected more than Read IOPS when in fact the impact is pretty similar.
  • dishayu - Tuesday, December 4, 2012 - link

    Okay, nevermind, got my answer. And I really should learn to read more carefully. :|
  • B3an - Tuesday, December 4, 2012 - link

    I'm not sure if that's what the article is saying... But either way I'd also like to know for sure if simply partitioning an SSD to have about 75% of usable space will increase I/O performance (even over a new drive with hardly anything on it).

    Think the article needs to make this more clear...
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Updated with more info on this. In short, yes with the controllers used here that's all you need to do to allocate more spare area.

    Take care,
    Anand
  • smpltn - Tuesday, December 4, 2012 - link

    If I wanted to increase the spare area on a drive, should I secure erase and reinstall the Windows, or is simple shrinking the partition through Computer Management enough?
  • smpltn - Tuesday, December 4, 2012 - link

    *simply
  • twtech - Tuesday, December 11, 2012 - link

    You could do that, or you could just set aside 25% from the beginning and not worry about it after that. The delta between 25% and 50% spare area is relatively small; even 20% is probably enough.
  • mayankleoboy1 - Tuesday, December 4, 2012 - link

    Any chance of doing the same testing for SF 1xxx and 2281 arch based SSDs?
  • MadMan007 - Tuesday, December 4, 2012 - link

    I was just going to write this. They might not be the newest, sexiest controllers, and the SF 1xxx is getting old at this point, but there are a LOT of them out there and the SF 2xxx in particular may be the most widespread controller because so many drives use it. SF 2xxx pretty please (and sugar on top if you do one with standard firmware and an Intel too)? :)
  • Notmyusualid - Tuesday, December 4, 2012 - link

    I'll 3rd that.
  • Tjalve - Tuesday, December 4, 2012 - link

    I actually did similar testing on a Vertex 3 120GB drive for my SSD guide. I did this to explain how overprovisioning works. Check it out. This graph shows how free space has an impact on the drive's performance over time.
    For those who don't understand Swedish:
    Minuter = Minutes
    Tom = Empty
    http://www.nordichardware.se/images/labswedish/art...
  • GullLars - Tuesday, December 4, 2012 - link

    +1, should be tested. I only have SandForce in my laptop, and it's not available for testing. If anyone wants to bench and post their results (they won't be directly comparable to this test) it would be interesting. As SandForce uses compression, remember to use incompressible data. It could also be interesting to look at how much the compressibility of the data affects the sustained random write performance, but that's quite a bit more work as you add a dimension.
  • Per Hansson - Tuesday, December 4, 2012 - link

    Yea, the Intel SSD 330 (SandForce) was quite impressive in the previous performance consistency tests, so please do test it with more spare area as well :)
  • Tjalve - Tuesday, December 4, 2012 - link

    Check out my post above
  • Kristian Vättö - Tuesday, December 4, 2012 - link

    Testing Intel SSD 335 240GB as we speak. Any specific OPs you would like to see tested?
  • Khato - Tuesday, December 4, 2012 - link

    Well just for comparison's sake the default and 25% OP numbers would be the most interesting. Definitely looking forward to seeing the results!
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    Results for Intel SSD 335 240GB:

    Default OP: https://dl.dropbox.com/u/7934241/Default%20OP.png
    Non-logarithmic scale: https://dl.dropbox.com/u/7934241/Degault%20OP%20no...

    25% OP: https://dl.dropbox.com/u/7934241/25%25%20OP.png
    Non-logarithmic scale: https://dl.dropbox.com/u/7934241/25%25%20OP%20non-...
  • jwilliams4200 - Wednesday, December 5, 2012 - link

    Did you use incompressible data for both the sequential write and the 4KQD32 random write? Are you sure it is incompressible?

    With IOMeter, for example, there are three choices of data: repeating bytes, pseudo random, and full random. Only "full random" is actually incompressible.
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    Yeah, I did. I admit that the 25% OP graph doesn't make all that much sense, the IOPS should start at ~50K and not below 40K. I can try running the test again just to make sure.
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    Reran the test at 25% OP, looks like something was off in the first run.

    25% OP: https://dl.dropbox.com/u/7934241/25%25%20OP_1.png
    Logarithmic: https://dl.dropbox.com/u/7934241/25%25%20OP_1.png

    I ran it for a total of one hour so here is the graph of the whole run:

    https://dl.dropbox.com/u/7934241/25%25%20OP%201h.p...
  • Khato - Wednesday, December 5, 2012 - link

    Thanks for the data, disappointing though it is. Really looks like all that providing the 25% OP does is delay the inevitable - the end result looks pretty much exactly the same between default and 25% OP, no?
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    Yeah, that seems to be the case. Other drives seem to benefit more from the increased OP.

    Looks like I forgot to include the non-logarithmic graph, so here it is:

    https://dl.dropbox.com/u/7934241/25%25%20OP%20non-...
  • Per Hansson - Thursday, December 6, 2012 - link

    Thank you very much Kristian!
  • MasterYoda - Tuesday, December 4, 2012 - link

    Could you please clarify the actual procedure for setting the amount of spare area on a drive? Is it as simple as leaving unpartitioned space on the drive? If yes, how does the controller know the unpartitioned space is not being used? I feel like I'm missing something.
  • gostan - Tuesday, December 4, 2012 - link

    "With TRIM support even the partitioning step isn't really necessary" - ANAND THE BOSS.

    If you are on Win 7 / 8, there is nothing you need to do. Just format the drive, leave about 25% of free space, and enjoy the performance :D
  • Kristian Vättö - Tuesday, December 4, 2012 - link

    As gostan said, there is no need to partition the drive if you're running an OS with proper TRIM support. TRIM will make sure that if you have 25% free space on the drive, that 25% is actually empty. Without TRIM it's possible that you end up in a situation where you think you have 25% free but the drive is actually full (that depends on how your drive handles garbage collection and how heavy your workload is, but it's not a far-fetched scenario).
  • nathanddrews - Tuesday, December 4, 2012 - link

    On a fresh Windows 7/8 install can you just format 75% of the space to a new partition and leave the 25% unformatted? Or must it be partitioned and formatted?

    I assume if the 25% is formatted, you would disable the recycle bin, indexing, page file, etc. for that spare area?
  • extide - Tuesday, December 4, 2012 - link

    I prefer to leave that extra bit unformatted as it ensures there are never any writes to those LBAs.
  • nathanddrews - Tuesday, December 4, 2012 - link

    So basically short-stroking the SSD. Nice.
  • JNo - Tuesday, December 4, 2012 - link

    Nice analogy! (completely different mechanism behind it all I know but still...)
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    Excellent set of tests and well-chosen comparison SSDs! It is interesting to see how the Vector and Neutron compare to the 840 Pro.

    One point I think is worth making is that highest minimum performance (or lowest maximum latency) is probably more important for most applications than is having a small variation in performance. In other words, if I had a choice between 40,000 IOPS minimum (say 45,000 average with +/- 5000 variation) and 30,000 IOPS minimum with less than +/- 1000 variation, I would choose the one with five times higher variation but 33% higher minimum IOPS. Yet another way to say it is that I value highest worst-case performance over performance consistency -- no reason to cut off the upside if there is already a good floor supporting the downside.

    Only thing that I missed seeing in this article is the Vector at 50% (or perhaps 49% just to be sure to be under 50%). That is potentially interesting due to the way that SSD changes its behavior at 50% full.

    One minor correction: On your second set of graphs, the 50% button is labeled "192GB" instead of "128GB"
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Thanks for the inspiration :)

    That is a very good point on the Vector. Tests on both sides of the 50% mark could be very interesting. I'll see if I can run some this week.

    I also agree on your analysis of which is the more desired configuration. I've been pushing a lot of the controller vendors recently to take performance consistency more seriously. It sounds like at least two of them will have firmware based solutions to this problem within the next 12 months. Hopefully we won't have to make the tradeoff you're outlining anymore if this catches on.

    Thanks for the correction as well.

    Take care,
    Anand
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    Yes, if you have time to run 49% and 51% on the Vector, I think that would be quite interesting!

    One other thing that might be interesting (for all SSDs), if you have time. I noticed that you use a 1sec averaging time. In my data, I used a 0.1sec averaging time, since one of the SSDs I was testing has some variation that was only visible with a resolution of a few hundred milliseconds.

    I wonder if the little "blips" that are visible in your graphs for the Vector and Neutron might have some structure that may become apparent if you increased your time resolution.
  • rob.laur - Wednesday, December 12, 2012 - link

    "Only thing that I missed seeing in this article is the Vector at 50% (or perhaps 49% just to be sure to be under 50%). That is potentially interesting due to the way that SSD changes its behavior at 50% full"

    As per Tweaktown's recent review of the 128GB Vector:

    "Vector 128GB performs very well when half of the drive has data on it. This contradicts some of the statements made about the way storage mode works. If storage mode becomes active when half of the drive has data on it we would see a massive performance drop here, but as you can see, there is only a small drop between 25% and 50%"

    Storage mode is only temporary and lasts for a couple of minutes and then the drive returns to its previous performance. It does not last forever if the drive is filled over 50%.
  • jwilliams4200 - Wednesday, December 12, 2012 - link

    If you read the tweaktown review, it is clear that they are clueless about how storage mode works. If you check the comments on that review, you can see some discussion of how storage mode may actually work.
  • rob.laur - Wednesday, December 12, 2012 - link

    Storage mode was explained by OCZ reps in depth on their forums as well, and they said the same thing Tweaktown's tests showed: that it is indeed a mode that lasts for a couple of minutes. Instead of doing tests on empty drives, which is not a real world scenario, I think Anand needs to start doing his tests with, let's say, 60% of data on the drives so we can really see which performs the best. Mixed reads/writes are also more indicative of real world scenarios than all reads or all writes.
  • jwilliams4200 - Wednesday, December 12, 2012 - link

    OCZ has never properly explained how storage mode works. For example, OCZ's vague explanation cannot explain these test results:

    http://www.tomshardware.com/reviews/vertex-4-firmw...

    As for testing SSDs with static data, I agree that is a useful thing to do.

    But mixed read/writes is more problematic, because there are so many ways that reads and writes can be mixed in real workloads. I agree it is a useful test, but deciding exactly what type of mix (eg., percentage reads, block size of reads, whether the reads and writes are interleaved with 1 IO each, 16 IOs each, etc.) to run is tricky.
  • hpvd - Tuesday, December 4, 2012 - link

    Sorry, I didn't get it in all its details.
    Could you give some concrete advice on applying the results from this great article in the following cases:
    1) Windows, TRIM => simply make a 25% smaller partition than the size of the SSD? Or does only free space within a partition help?
    2) Windows, No TRIM => ?
    3) RAID0 (intel), TRIM => ?
    4) Raid5, no TRIM => ?

    Thanks very much!

    Best regards

    HPVD
  • Kristian Vättö - Tuesday, December 4, 2012 - link

    With TRIM: Just make sure you have 25% free.

    Without TRIM: Make the volume 25% smaller than the full size.
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Just create an empty partition of roughly 25% of the NAND capacity on the drive and you're good to go. If you've got TRIM, go ahead and format that partition then delete it.

    Take care,
    Anand
  • Rick83 - Tuesday, December 4, 2012 - link

    Hey, I'd be interested in getting more information on the underlying data, by way of some very basic statistical analysis. I understand that you probably want to keep the raw data to yourself, but a quick mean/min/max/stddev/outlier plot/table would make this kind of data much more comparable than clicking buttons to look at dots.

    The same holds true for any kind of benchmark or measure - getting out that kind of info appears to me to be quite important.

    I'm still not quite sure why so few tech sites (none that I know of) provide some simple stats on their benchmarking.
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    I was very close to publishing exactly what you asked for but kept it out for ease of presentation. I do believe the types of figures you're talking about are necessary for good comparisons with this sort of data going forward so you will see more of it.

    We don't always include it because we don't necessarily have the source data that lends itself to producing what you're asking for. A lot of our CPU benchmarks for example are very repeatable and thus you don't see much deviation from the average. Things are different with this SSD data because of the way defrag/GC routines have to run.

    Take care,
    Anand
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    I like to see a normal probability scale cumulative distribution function (CDF) to get a good idea of the statistical distribution of SSD performance parameters. Combine that with a textbox showing number of data points, mean, and SD, and you can read off almost anything you want to know.

    Here are a couple examples on a 256GB Samsung 840 Pro, one for 100% capacity utilization, and one for 80% (20% OP):

    http://i.imgur.com/atHo2.png

    http://i.imgur.com/Mvj71.png
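
    For readers who want to build that kind of plot from their own logged data, here is a minimal sketch (Python with numpy/scipy/matplotlib; the per-second IOPS samples below are synthetic placeholders, not measurements from any drive) of an empirical CDF drawn on a normal-probability scale, with the N/mean/SD textbox described above:

    ```python
    # Empirical CDF of per-second IOPS samples on a normal-probability (probit)
    # y-axis, annotated with N, mean and standard deviation. The data here is
    # synthetic; substitute a column of logged per-second results.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    iops = np.random.default_rng(0).normal(36000, 4000, 1800)  # fake 30-min run

    x = np.sort(iops)
    p = (np.arange(1, x.size + 1) - 0.5) / x.size      # empirical CDF positions
    fig, ax = plt.subplots()
    ax.plot(x, norm.ppf(p), '.', ms=3)                 # probit transform of CDF
    ticks = [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99]
    ax.set_yticks(norm.ppf(ticks))
    ax.set_yticklabels([f"{t:.0%}" for t in ticks])
    ax.set_xlabel("4KB random write IOPS (1 s samples)")
    ax.set_ylabel("cumulative fraction of samples")
    ax.text(0.02, 0.95, f"N={x.size}\nmean={x.mean():.0f}\nSD={x.std():.0f}",
            transform=ax.transAxes, va="top",
            bbox=dict(boxstyle="round", fc="w"))
    plt.show()
    ```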
  • brundleflyguy - Tuesday, December 4, 2012 - link

    I still don't understand how to take advantage of this. Do I make the partition 75% of the available size, or do I make the partition 100% of the available size and monitor the space used to keep it under 75%?
  • gattacaDNA - Tuesday, December 4, 2012 - link

    Based on the earlier discussions and these, if you are in the desktop arena and do not have high sustained workloads hitting the SSD, then as long as you ensure you leave 25-30% free on the SSD, you have nothing else to do. As the drive gets a "rest" it can handle the TRIM and cleanup etc.

    Personally, I like this option b/c if I do need the area for a large file etc. then I can use it temporarily. The key is keeping that 25-30% of the drive's size free.

    However, if you have high sustained workloads hitting the SSD (as in these tests) then, from the earlier discussions, it would be best to only allocate 70-75% of the SSD's size to the working partition and leave the rest unpartitioned as extra spare work area on the SSD. jwilliams and Anand, keep us honest here. Cheers.
  • edlee - Tuesday, December 4, 2012 - link

    Great article, but I am not going to buy a 256GB SSD to only use 64GB. I have to use at least 40-50 percent.
  • ShieTar - Tuesday, December 4, 2012 - link

    75% spare wasn't even mentioned in the article. What is your point?
  • edlee - Tuesday, December 4, 2012 - link

    Whoops, I misunderstood; I thought you wrote to leave 75% spare space.
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    In my comments, I usually refer to the percentage of LBAs being utilized, so I often talk about 80%, by which I mean that 20% of the LBAs are reserved for OP. Anand's article emphasizes the percentage of reserved space rather than the percentage utilization.
  • SunLord - Tuesday, December 4, 2012 - link

    Given the results from the tests you ran, the S3700's firmware is optimized to take full advantage of its spare area. While the other drives can and do make use of their spare area, it isn't reserved spare area, so less optimization is possible, resulting in somewhat lower performance. I would imagine you'd see a negative impact on performance if you were to find a way to use the S3700's spare area; it's all about trade-offs.
  • jabber - Tuesday, December 4, 2012 - link

    I did some testing with HDTunePro a few months ago and found that performance smoothed out far more with more space given to spare.

    I didn't increase it by much, just a few GB but it made a marked improvement.
  • designerfx - Tuesday, December 4, 2012 - link

    wtf?

    You compare everything else to show the flaws, but don't test the S3700 through the same gamut?

    sigh.

    If you ask me it's not just a popularity thing, it's also an accuracy thing. Show what happens to the S3700 at 12/25/50%.
  • jwcalla - Tuesday, December 4, 2012 - link

    Come on, man. It's like the point of the article whooshed right by ya... or you went right to the graphs without reading (the second paragraph, in particular).

    The Intel comes out of the box with 30% spare area, which can't be changed. How would he test 12% and 25%? The whole point was to show that the Intel had such consistent performance largely because it had so much spare area. And if you compare the other drives at 25% to the Intel default, that seems to be the case.
  • JNo - Wednesday, December 5, 2012 - link

    That's what I understood too. But if this is the case, why is the S3700 so expensive? Surely the recommendation is, even for enterprise, to buy a bunch of Corsair Neutrons and partition off 25%, get similar performance to the S3700 and save $$$ right?

    Or are they getting something elsewhere with the S3700 - write endurance, throughput, etc.? Because if not, the natural conclusion would be that Intel hasn't actually achieved much with the S3700...
  • JNo - Wednesday, December 5, 2012 - link

    oh, and by the way, great work jwilliams4200 in seeding a whole anandtech article based on work and conclusions you were able to come to just through curiosity and by tinkering away at home (presumably)...
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    Intel S3700 uses eMLC NAND, whereas the others in this case use MLC NAND.
  • kozietulski - Wednesday, December 5, 2012 - link

    The difference and important cost factor is - I believe - the use of eMLC NAND in the S3700. "e"-as-in-enterprise MLC is supposed to have an endurance of about 30k cycles, compared to the regular (aka "c"-as-in-consumer) MLC NAND value of 3-5k.
    What's interesting is that a single NAND wafer may be the source of both types of MLC dies. The dies best suited for enterprise usage are selected and - as far as I understand - programmed differently than consumer dies. The trick is - in short - to use a more gentle, less stressful write sequence, which is good for cell endurance but also means (so to say) a lower signal-to-noise ratio for the written data, which translates into more vulnerability to fluctuations (both static and the dynamic ones resulting from disturbances from operations on nearby pages).
    The net result is higher endurance at the cost of much lower retention of written data. cMLC is supposed to keep data written at the end of its endurance life readable for a year; for eMLC that period is measured in months if not weeks.
  • Denithor - Tuesday, December 4, 2012 - link

    So how much difference does this equate to in real-life typical consumer-level use? Will we even notice the difference?
  • phillyry - Monday, December 10, 2012 - link

    The moral of the story is: If your drive is starting to feel slower, free up some space.

    Of course, if you're fanatical about performance then you should always just keep 30% free.
  • Lepton87 - Tuesday, December 4, 2012 - link

    In random writes the Neutron at 192GB drops to the Intel's performance level only a few times and maintains much better average performance and about the same worst-case performance, and you say that the Intel behaves better in this test? So if a hypothetical drive drew a perfectly smooth line at half the Intel's average performance, you would say it's better than the Intel because it's more consistent? I don't get your new-found fascination with this new Intel enterprise drive.
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    One thing to keep in mind is that Anand was using an averaging time of 1 second. That tends to smooth out the peaks and dips in performance. In other words, if an SSD dropped all the way to zero IOPS for 0.1 second but maintained 40K IOPS for the other 0.9 seconds of averaging time, then Anand's graph would show the blip in throughput as a drop to 36K IOPS. But in reality the speed dropped to ZERO for 0.1 seconds.

    I'm not saying that the Neutron necessarily drops to zero speed at any time. I'm only saying that it is POSSIBLE that it does, given what we know from the graphs in this article. More likely, it does not drop all the way to zero, but I would be willing to bet it does drop (for a fraction of a second) to a significantly lower speed than the graph might lead you to believe.
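
    The arithmetic in that example is easy to check; here is a tiny sketch (synthetic, millisecond-granularity numbers chosen to match the hypothetical above, not measured data) showing how the same 100 ms stall reads as a mild ~36K "blip" with 1-second averaging but as a drop to zero at 0.1-second resolution:

    ```python
    # Illustration of how the averaging window hides short stalls: a 0.1 s
    # stall to zero inside an otherwise steady 40K IOPS stream shows up as a
    # dip to ~36K at 1 s resolution, and as a real zero at 0.1 s resolution.
    import numpy as np

    dt = 0.001                                  # simulate at 1 ms granularity
    iops = np.full(5000, 40000.0)               # five seconds at 40K IOPS
    iops[2400:2500] = 0.0                       # a 100 ms stall in second 3

    def averaged(series, window_s):
        n = int(window_s / dt)
        return series[: series.size // n * n].reshape(-1, n).mean(axis=1)

    print("1.0 s samples:", averaged(iops, 1.0))            # stall shows as ~36K
    print("0.1 s samples min:", averaged(iops, 0.1).min())  # stall shows as 0
    ```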
  • jwilliams4200 - Tuesday, December 4, 2012 - link

    I'll add that it is possible that the Intel S3700 also drops to a lower speed for a fraction of a second, but it would need to be either only slightly lower, or for a very short time (less than a millisecond), or some combination in order for the graph to look as "blip free" as it does.
  • rrohbeck - Wednesday, December 5, 2012 - link

    Yes, we should see the PDF of all latencies. Averaging doesn't make sense here.
  • jwilliams4200 - Wednesday, December 5, 2012 - link

    Well, I wasn't actually trying to say that "averaging doesn't make sense here".

    My point is that if someone is talking about worst case (or best case) performance, that person needs to have data with sufficient time resolution to actually resolve that worst case (or best case). Any time that there is a one-data-point blip in the data, it is an indication that there is insufficient time resolution to be certain that the event in the data is fully resolved. Good methodology is to then increase the time resolution and repeat the experiment until there are no longer any one-data-point blips.
  • cserwin - Tuesday, December 4, 2012 - link

    So, this article finally prompted me to find out what the hell 'write amplification' is.

    So, if you are operating in a condition where the controller is amplifying writes by a factor of 3 or 4, in addition to crappy write performance, would you also be reducing the longevity of your drive by a similar factor?
  • Kristian Vättö - Tuesday, December 4, 2012 - link

    Yes, you would because when you think you're writing let's say 100MB, the controller ends up writing 300-400MB to the NAND.
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Correct. But most of the estimates we do for lifespan assume fairly high write amplification to begin with so endurance concerns are still a non-issue for the vast majority of consumers.

    Take care,
    Anand
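
    As a rough illustration of how those two answers fit together, the back-of-the-envelope sketch below multiplies an assumed daily host write volume by a write amplification factor and divides it into an assumed NAND endurance budget. The capacity, P/E rating and 10 GiB/day figure are illustrative assumptions, not the numbers Anand uses for his estimates.

    ```python
    # Back-of-the-envelope endurance estimate: host writes are multiplied by
    # the write amplification factor before hitting the NAND. All inputs are
    # illustrative assumptions, not specs of any particular drive.
    def years_of_life(nand_gib=256, pe_cycles=3000, host_gib_per_day=10, wa=1.0):
        total_nand_writes_gib = nand_gib * pe_cycles        # lifetime NAND write budget
        nand_gib_per_day = host_gib_per_day * wa            # WA inflates each day's writes
        return total_nand_writes_gib / nand_gib_per_day / 365

    for wa in (1.5, 3.0, 4.0):
        print(f"WA {wa:0.1f}x -> ~{years_of_life(wa=wa):.0f} years")
    ```

    Even at a WA of 4x, the estimate comes out at decades for this kind of light workload, which is why endurance is usually a non-issue for consumer use.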
  • krumme - Tuesday, December 4, 2012 - link

    Now this is great work!

    What a pleasure to read
  • CK804 - Tuesday, December 4, 2012 - link

    "Most client drives on the other hand only feature about 7% of their total NAND capacity set aside as spare area (256GiB of NAND, 238GiB of user storage). "

    That's incorrect. SSD and HDD manufacturers report their products' total capacity in decimal while Windows and other operating systems report the total capacity in binary. This has been a standard practice for as long as I can remember.

    (256GB * 10^9) / 1024^3 = 238.4GB
  • jwcalla - Tuesday, December 4, 2012 - link

    What he said is correct though. If we look at the Samsung 256 GB drive for example, there is 256 GiB of NAND on the PCB (NAND capacities are in binary). But after the over-provisioning is put into place, only 238 GiB is available to the user. This 238 GiB translates to 256 GB, which is the advertised capacity.

    Basically, NAND capacities are in binary, but they're using the bytes lost in the binary <-> decimal conversion as the over-provisioned spare area that's unavailable to the user, so they continue to advertise with the binary units.
  • jwcalla - Tuesday, December 4, 2012 - link

    "so they continue to advertise with the binary* units."

    *decimal. We need an edit button. :)
  • rrohbeck - Wednesday, December 5, 2012 - link

    Easier said:
    GB/GiB=(1024^3)/1E9=1.074.
  • jwilliams4200 - Wednesday, December 5, 2012 - link

    Actually, that is reversed. GiB > GB
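
    To make the unit bookkeeping in this sub-thread concrete, here is a few lines of arithmetic (using the 256 GB advertised / 256 GiB NAND figures quoted from the article above):

    ```python
    # Quick check of the GiB/GB bookkeeping discussed above. The 256 GB figure
    # is the advertised (decimal) capacity; the NAND on the PCB is 256 GiB.
    GB, GiB = 10**9, 2**30

    nand_bytes = 256 * GiB            # physical NAND: binary units
    user_bytes = 256 * GB             # advertised/user capacity: decimal units
    print(f"user capacity = {user_bytes / GiB:.1f} GiB")               # ~238.4 GiB
    print(f"implicit spare area = {1 - user_bytes / nand_bytes:.1%}")  # ~6.9%
    print(f"GiB/GB ratio = {GiB / GB:.3f}")                            # ~1.074
    ```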
  • A5 - Tuesday, December 4, 2012 - link

    "Consumer SSD prices are finally low enough where we're no longer forced to buy the minimum capacity for our needs. "

    Not really? 256GB drives are still well over $200. That's a significant part of a PC build.
  • Kevin G - Tuesday, December 4, 2012 - link

    The Samsung 840 Pro figure has two 192 GB buttons in the second table for selecting: one for 25% and one for 50% spare area. One of those buttons should read 128 GB.
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Fixed, thank you!

    Take care,
    Anand
  • Steve@mac - Tuesday, December 4, 2012 - link

    Hi friends,

    I believe I understood the results.
    But as I'm working on the mac (where everything is mostly simple) I want to ask you:

    HOW can I customize my SSD for the benefit ON THE MAC?

    Is it enough to use Disk Utility and set up (after erasing) one single partition with only 75% of the available space? Or do I have to do more?

    Thanks for answers
  • mayankleoboy1 - Tuesday, December 4, 2012 - link

    There is an app for that
  • Anand Lal Shimpi - Tuesday, December 4, 2012 - link

    Yep that's all you need to do :)

    Take care,
    Anand
  • sonci - Tuesday, December 4, 2012 - link

    Nice article,
    I used to automatically defragment my HDD with Diskeeper or similar tools, but not anymore with an SSD;
    this is an unnecessary step, am I correct?
    I have 25% free on my Vertex 2 SSD. Is that OK, or should I reinstall Win7 and partition only 75% of the capacity of my SSD?
  • mayankleoboy1 - Tuesday, December 4, 2012 - link

    I too have a 256GB Vertex 2. It would be interesting to know how to increase performance for older architectures.
  • Tjalve - Tuesday, December 4, 2012 - link

    I think some people are missing the point here. For client loads you won't see any difference at all. These tests are based on a full drive running a 4K random write pattern at high QD. The only place this scenario could happen in real life is in a server under high load for extended periods of time, like a database server or a file server handling a lot of clients at the same time.

    So you don't have to manually overprovision your drive. However, if you do, the overall WA will be lower and the drive will last longer. But on the other hand you could just try not to fill your drive all the way.
  • rrohbeck - Wednesday, December 5, 2012 - link

    Exactly. As long as you have some main memory free, your OS will buffer writes and you'll never know the difference in a desktop-like environment.
  • Zink - Tuesday, December 4, 2012 - link

    No wonder you have so many complaints about IO consistency, Anand, you torture your MBP SSD to the max daily. I don't think there are too many other enthusiasts who run right to the point of getting disk space errors. I'm not so sure there is really any problem with SSD consistency for desktop use. Wouldn't statistics on max response time help us understand if there are any noticeable slowdowns? All of these drives are providing 25K IOPS during the worst one-second periods when filled reasonably. Unless there are some ridiculously long pauses in there I can't see how that would noticeably affect any workload.
  • JellyRoll - Tuesday, December 4, 2012 - link

    Thanks for the article, this illustrates a point that I have come across when testing several SSDs.
    The Intel isn't that spectacular. The consistent I/O story is marketing hype; there are other enterprise SSDs that perform just as well, or much better.
    One can even take consumer SSDs and reach the same level of performance with similar amounts of OP.
    Will there be an adjustment to the glowing review of the Intel datacenter SSD? Or perhaps some comparisons to other relevant enterprise SSDs, so that users can see that in that space the Intel is merely run-of-the-mill, if not slower, than other alternatives?
  • Lepton87 - Tuesday, December 4, 2012 - link


    "Will there be an adjustment to the glowing review of the Intel datacenter SSD?"

    Of course not, just as the GTX 680 is still the fastest card on the market according to Anand. It was already slower than the 7970 GHz Edition with launch drivers, but AnandTech was one of the few sites to say otherwise.
  • Shadowmaster625 - Tuesday, December 4, 2012 - link

    So is it better to set your partition to 75% when you install windows, or simply keep the full partition but never fill the drive past 75%? Is there any difference between these two methods?
  • Tjalve - Tuesday, December 4, 2012 - link

    No.
  • M477 - Tuesday, December 4, 2012 - link

    These tests were all on 256GB drives. But if you have a 512GB drive, do you still need to leave 25% free (128GB), or is it the amount of space (approx 64GB) that is important?

    I would have thought it is the amount/size of OP space that would be important - not the %.

    Might it also explain why advertised drive speed improves with capacity (all with the same 7% OP)?
  • phillyry - Monday, December 10, 2012 - link

    Very good point. I second that question.
  • Death666Angel - Tuesday, December 4, 2012 - link

    The paper you link to is gone, I get a 404 error with both links :).
  • rigel84 - Tuesday, December 4, 2012 - link

    ...but reliability means even more. Just looking at that OCZ brand makes me think of massive return rates.
    I was wondering, with your name and influence, can't you once in a while (or in an article) include some data from a retailer on return rates, like behardware.com does?

    http://www.behardware.com/articles/881-7/component...
  • Chris_DeepBlue - Tuesday, December 4, 2012 - link

    Hi Anand

    Any chance of adding a Plextor M3 Pro to this test?
    I bought 2 of those (256GB each) which I'll be using in a striped configuration as soon as I get them, and this kind of testing is just great to see how they fare at each capacity level and the minimum performance that can be expected.

    Thanks!
  • Kristian Vättö - Wednesday, December 5, 2012 - link

    My M3 Pro is currently used in a system but I can test the M5 Pro.
  • CougTek - Thursday, December 6, 2012 - link

    Please test the Plextor M5 Pro. We hesitate between it and the Corsair Neutron GTX for our next server build.
  • unityole - Wednesday, December 5, 2012 - link

    Well, good article. Although it'd make sense not to use an SSD to its maximum capacity, why not do actual test runs at these fill percentages? Rather than just IOPS, tests like running a game or encoding videos and seeing how long each takes would be more practical, right?

    As for the broken SandForce firmware, this was already covered by Tweaktown like... half a year ago lol. They have been going on about TRIM not working, and showing fill tests with Vantage scores on their website, as far back as April IIRC. I'm more surprised Anand only picked this up now =/

    But thumbs up, better late than never - unlike SSD Review, which doesn't do tests on a filled drive at all.
  • Tjalve - Wednesday, December 5, 2012 - link

    http://www.nordichardware.se/images/labswedish/art...

    This is something that I did for my SSD guide. It shows the difference in speed depending on how much OP you have, after the drive has been filled with data. This data is based on a Vertex 3 120GB. I also compared the WA.

    Free space: Write Amplification
    0GB: 13.22x
    1GB: 10.78x
    2GB: 9.40x
    4GB: 7.41x
    8GB: 5.46x
    16GB: 3.82x
    Empty: 1.53x
    0GB (compressible): 1.59x
  • khanov - Wednesday, December 5, 2012 - link

    This is very interesting data Tjalve.

    I think what makes the Anandtech article so popular now is that it shows the effects of over-provisioning beyond what you have done. Your data goes between 0GB and 16GB on a 128GB drive. That is a max. of 12.5% OP.

    This Anandtech article shows that OP of 25-50% can have a more dramatic effect on performance, particularly on keeping minimum IOPS at a reasonably high level.

    So I guess people asking for more data re. Sandforce drives really want to know if the sweet spot for their drives is more like 25% or higher, as it seems to be for drives in this article.

    Personally I would love to see some data between 25% and 50% OP as the sweet spot for at least some drives may be 30% or 35% for example.

    Given that 50% OP is a bit drastic for most consumers, it would be great to know that, for example, 33% OP can give you 99% of the performance of 50% OP, if that is the case. Any chance of numbers between 25% and 50% Anand?
  • M477 - Wednesday, December 5, 2012 - link

    Is it the % OP or the actual size of the OP that matters?

    i.e. would you get the same performance boost on a 512GB drive with 12.5% OP (64 GB) as on a 256 GB drive with 25% OP (64GB) ?

    If not, can someone explain...? (i.e. why do you need more OP space as the drive size increases?)
  • unityole - Wednesday, December 5, 2012 - link

    Actually, SandForce does better with data present on the drive, but with partially broken TRIM and slower incompressible write speed.
  • kozietulski - Wednesday, December 5, 2012 - link

    First, please remember we are talking here about performance under a long-term heavy (QD32) random 4K write load. That is really not a common case for desktop drives :). Periods of even heavy random writes that are short enough not to exhaust the pool of clean blocks do not qualify here. Of course the size of the clean block pool depends directly on the absolute size of free space/OP. So one can say that the absolute size of free space defines what really counts as a "short period" of heavy write load and where sustained workloads start :)

    Now, why does the percentage of free space matter for performance under such a sustained workload? Because - at least for modern SSDs - under such a worst-case workload (again, more likely to be found in some enterprise environments than on the desktop) the factor limiting performance is write amplification, and the WA value depends on the average number of free pages the firmware can find in each and every block, relative to the total number of pages per block. The fewer free pages in the average block, the more unwanted baggage must be written back to the NAND array together with the actual data, and hence the higher the WA value.
    Now, if we assume a random distribution of free pages among all NAND blocks (which is a pretty natural assumption for a sustained random write load), then by definition the percentage of free pages per block (and thus WA) tracks the total percentage of free space on the drive.
    Thus the general conclusion: a given SSD with twice as much free space - including implicit OP - should be roughly twice as fast (compared to the same SSD, of course) under our worst-case random workload. Obviously this conclusion only has a chance to hold as long as WA is the performance bottleneck - sooner or later, depending on drive/firmware details, another bottleneck will pop up, breaking that nearly linear relation between WA (and effective free space) and performance.
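
    kozietulski's free-space argument can be illustrated with a toy model. The sketch below (Python, with made-up drive geometry and a simplistic greedy-GC, page-mapped FTL - an assumption for illustration, not how any particular controller actually works) estimates steady-state write amplification for sustained random 4K overwrites at a few spare-area fractions; the trend it produces, with WA falling sharply as spare area grows, is the mechanism described above.

    ```python
    # Toy page-mapped FTL with greedy garbage collection, used to illustrate
    # how write amplification depends on the fraction of spare/free space
    # under sustained random overwrites. Geometry and policy are made up.
    import random

    def simulate_wa(spare_fraction, num_blocks=128, pages_per_block=64, seed=1):
        rng = random.Random(seed)
        total_pages = num_blocks * pages_per_block
        user_pages = int(total_pages * (1.0 - spare_fraction))   # exported LBAs

        lba_loc = [None] * user_pages                    # LBA -> block holding it
        block_lbas = [set() for _ in range(num_blocks)]  # block -> valid LBAs
        free_blocks = list(range(1, num_blocks))         # erased blocks
        state = {"active": 0, "used": 0, "nand": 0}

        def get_slot():
            # Return a block with a free page, collecting garbage if necessary.
            while state["used"] == pages_per_block:
                if free_blocks:
                    state["active"], state["used"] = free_blocks.pop(), 0
                else:
                    gc()
            state["used"] += 1
            return state["active"]

        def place(lba):
            # One physical NAND page write (host data or GC relocation).
            b = get_slot()
            block_lbas[b].add(lba)
            lba_loc[lba] = b
            state["nand"] += 1

        def gc():
            # Greedy policy: reclaim the block with the fewest valid pages.
            victim = min((b for b in range(num_blocks) if b != state["active"]),
                         key=lambda b: len(block_lbas[b]))
            survivors = list(block_lbas[victim])
            block_lbas[victim].clear()
            free_blocks.append(victim)                   # "erase" the victim
            for lba in survivors:                        # relocate still-valid data
                place(lba)

        def host_write(lba):
            if lba_loc[lba] is not None:                 # invalidate the old copy
                block_lbas[lba_loc[lba]].discard(lba)
            place(lba)

        for _ in range(8 * total_pages):                 # warm up to steady state
            host_write(rng.randrange(user_pages))
        state["nand"] = 0
        n_host = 4 * total_pages
        for _ in range(n_host):                          # measured random overwrites
            host_write(rng.randrange(user_pages))
        return state["nand"] / n_host

    for spare in (0.07, 0.12, 0.25, 0.50):
        print(f"{spare:.0%} spare area -> WA ~ {simulate_wa(spare):.1f}")
    ```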
  • Tjalve - Wednesday, December 5, 2012 - link

    As I said in an earlier post, this scenario is VERY unlikely to happen in a client environment. Without doing synthetic benchmarks, like we do now, I would say that it's impossible and will NEVER happen. This is only interesting if you plan to host your database server or similar on client SSDs, which you shouldn't do in any case.

    But regarding the data: I tested between 0 and 16GB as you said, but with the built-in OP of every drive it's a lot more than 12.5%.
    Total NAND = 128GiB, or 137.4GB. That would mean something like 25GB of OP for that test, which is 18%.
  • gamoniac - Wednesday, December 5, 2012 - link

    Anand,
    This is what I call great journalism!

    Would this be applicable to NAND storage on mobile devices that support TRIM either at the OS or controller level? It just seems to me my Android devices run quite a bit slower over time even though I have a lot of storage space left. I don't suppose Android or iOS support TRIM, do they? (And I assume WP8 does...)
  • jwcalla - Wednesday, December 5, 2012 - link

    There's no TRIM in mobile NAND flash memory as the command is part of the SATA interface. In these cases, wear leveling and garbage collection are likely the responsibility of the embedded flash controller.
  • gamoniac - Wednesday, December 5, 2012 - link

    Good point. Thanks.
  • skroh - Wednesday, December 5, 2012 - link

    Apropos of nothing, the last article (besides causing me to sit up at the Intel 330 results) also inspired me to pull up the Intel Toolbox and look up my usage statistics. I have an 80GB X25-M G2 that I bought in May of 2010. I play a lot of games on that system. In the last 2.5 years of usage, my replacement blocks remaining is at 99% and my write cycles remaining is at 98%. Unless there is some hidden drop-off in the curve waiting ahead for me, this drive may last longer than I do!
  • Jerryrigged - Thursday, December 6, 2012 - link

    In the test utilizing the default memory allocation, once saturated, why not reduce the write workload and see how well each drive's garbage collection restores performance?
  • Jerryrigged - Thursday, December 6, 2012 - link

    I did not get to complete that email.
    During the reduced write workload, schedule reading. Some drives are impacted here, such as the Neutron. Do they recover?
  • Impulses - Thursday, December 6, 2012 - link

    Any thoughts on how RAID 0 would impact any of this? (on motherboards that support TRIM over RAID) Same principle (leave 25% free or partition it off) or would RAID change things in any way?
  • AbRASiON - Thursday, December 6, 2012 - link

    :/ very very interesting data - it's appreciated.
  • chrone - Friday, December 7, 2012 - link

    Great article as always, Anand!

    Quick question: will eMMC on smartphones or any flash drive chipset out there benefit from this?
  • jameskatt - Sunday, December 9, 2012 - link

    The biggest problem I have with SSDs is that if you keep one nearly full, it wears out FASTER, then FAILS.

    SSDs have a limited lifespan. Each storage cell on an SSD can only be written to a few thousand times.

    The problem is that if the SSD is nearly full, then the cells in the space left have a higher likelihood of reaching their write limit. When this happens, the entire SSD fails!

    Thus, to keep the SSD from wearing out, it is best to keep 25% of it empty.

    I backup to a regular hard drive on the same computer, ready to go, in case the SSD fails.

    When SSDs fail, it is catastrophically sudden.
  • Impulses - Sunday, December 9, 2012 - link

    None of that is in any way accurate... Like at all. First off, all SSDs employ wear leveling, so even if you constantly keep the drive 80% or 90% full, write operations are still distributed across all cells and data is regularly moved around the drive to account for this.

    You seem to be vastly overstating how limited the lifespan of an SSD is... Feel free to read Anand's previous articles on the matter.

    SSDs don't generally fail suddenly and in catastrophic fashion when you're talking about cell decay either... They have SMART attributes you can monitor like on any hard drive, and even once the write limit for a cell has been reached you should still be able to read from it.
  • skroh - Tuesday, December 11, 2012 - link

    Read my post on the previous page of this thread. I have an SSD that I've been running as my main system volume for more than 2 years. According to the provided tools, I have used about 2% of its predicted life for read/write cycles. According to Anand, when these counters reach zero, they have a built-in safety margin of about 10% beyond that. Unless you are writing gigabytes per day, every day, to your drive, it will likely last longer than the motherboard you have it plugged into or the traditional hard drive you are backing it up to.

    As for the drive dying much faster if you keep it full, Google "spare area." The drive manufacturers already reserve space on the drive to reduce the dreaded write amplification. Keeping a little extra room out of the space visible to you as a user is just common sense for several reasons, including performance, but thanks to the hidden spare area it doesn't have to be anywhere near 25%. In any case, under normal non-server workloads, both wear and performance problems are much smaller than you imagine.
  • scottkarch - Tuesday, December 11, 2012 - link

    None of the RAID controllers on our servers support TRIM. Since the OS can't talk to each drive individually, I was wondering if you either know or could give me an educated guess about the following two scenarios.

    1) Make a logical RAID volume but when partitioning in the OS, set aside 25% of space as unpartitioned

    2) I vaguely recall seeing the option, when making the RAID set, to use a % of the available space... This was in the RAID controller, prior to OS install/config.

    I've already done #1 on a test Citrix server with SSDs. I will have to see if #2 is an option on any rebuilds we do.

    I'm wondering if this can be made to work with SSDs behind an older RAID controller... Thanks. Great article
  • scottkarch - Tuesday, December 18, 2012 - link

    Sorry to bump. I'll try to ask a different way.

    If putting SSDs behind a raid controller, does just making it part of a raid set somehow use all the available storage? Or, if you THEN only make a partition with 70% of the available space, would that unused 30% get treated as free by the disk and give the free blocks needed for garbage collection?

    2 x 512GB SSDs in a RAID1 but only make a partition of 385GB

    or

    6 x 512GB SSDs in a RAID5 = 2.56TB logical volume, and make a single 1.7TB partition.

    Would this accomplish the same thing as free space on bare drives?
  • jonjonjonj - Saturday, December 15, 2012 - link

    I would have loved to see SandForce and Marvell controllers included. I'd be willing to bet more people have a SandForce or Marvell drive than the 4 above combined. Either way it's an interesting article.
  • ssd_distri - Sunday, December 16, 2012 - link

    You put the spotlight on an interesting subject, but used a scenario that is pretty useless for 99.99% of readers - the scenario is totally crazy from a desktop user's standpoint.

    How about:
    - disclose the GB of data that the SSD is filled with for the Heavy and Light Anand SSD benches
    - rerun the benches for each SSD with additional data loaded onto the SSD, so it is filled with default load / 50% / 70% / 80% / 90% (100% is pretty uninteresting imo)
    - plot a chart for the benches like always, only make it bigger and on each result bar mark respective performance losses at higher SSD capacity usage.
    - That way we can see how different SSD controllers manage workload vs increased capacity used in realistic user workloads.... ?

    And should include SF2x and V4 for reference and maybe the TLC 840
  • olee22 - Tuesday, December 25, 2012 - link

    I have this drive as system-boot drive for Windows 7.
    Kingston SSDNow V Series SNV125-S2-64GB
    http://media.kingston.com/support/downloads/MKD_10...

    What is the good way to set it up regarding free space?
    1. Format full free space, leave 20% empty all the time.
    2. Partition into two areas (80% = 52GB, 20%=13GB), and only format the bigger space, and fill it as much as wanted, and leave the smaller area unformatted.

    This SSD has no native TRIM; I use Diskeeper to run garbage collection about every week.
  • olee22 - Tuesday, December 25, 2012 - link

    I plan to upgrade to an Intel SSD 330 180GB, to be a system disk.

    How shall I partition and format the SSD for optimal performance?

    There's a lot written in the article and here on the forum, and I understand extra space is needed, but it's not clear how I should proceed with this drive. Thanks!
  • MauiHawk - Sunday, January 6, 2013 - link

    I recently bought a 64GB SSD to use as an SRT cache drive. Though I don't think in a cache drive scenario write amplification is as big of a deal since SRT enhanced mode doesn't really help write speeds in the first place, I have read in some circumstances SRT can actually lower write performance if the SSD writes are too slow. Therefore I would like to make sure some space is being kept free on the SSD to make sure this doesn't become an issue.

    But I suspect the SRT algorithm may keep some free space available on its own, even if you allow it to use the whole drive. I don't want to limit SRT to only 80% of the drive if it's only going to use 80% of what I give it anyway. But I can't seem to find any info on whether SRT will maintain any free space on its own, and if so, how much.

    Does anybody know?
  • Ravenise - Thursday, January 25, 2018 - link

    Samsung suggests updating Magician with each release to ensure optimal drive performance. There is a massive difference in I/O consistency between Samsung Magician software releases, all pertaining to changes made to RAPID mode. For example, RAPID mode in Samsung Magician 5.1.0 had much faster read performance and poorer write performance - sometimes by 500 MB/s or more on my system depending on the benchmark - whereas in 5.2.0 read and write performance are dramatically more equal. Why the dramatic change? Would not the degree of these changes have as much influence on SSD read/write life as, say, over-provisioning? Either way, does this suggest I/O performance problems can be solved at the software level using RAPID-mode-like management, without the need for excessive over-provisioning? Drives and operating systems that do not use kernel-level enhancements similar to RAPID mode, however, cannot benefit from such enhanced I/O performance, and will probably benefit more from over-provisioning. This also suggests these tests should be redone using each consecutive release of Samsung Magician, to compare and see whether over-provisioning really benefits drives under different versions of the RAPID mode software. This was confirmed on my SATA2 chipset. I am curious whether results will differ on a SATA3 system.
  • Dathide - Thursday, December 3, 2020 - link

    This article needs to be followed up with 17% vs 20% vs 23% filled! What is the sweet spot? It's obvious that 25% to 50% isn't much of a difference. Is it the same for 20% to 25%?
