Original Link: https://www.anandtech.com/show/2319
RAID Primer: What's in a number?
by Dave Robinet on September 7, 2007 12:00 PM EST - Posted in Storage
Introduction
The majority of home users have experienced the agony of at least one hard drive failure in their lives. Power users often experience bottlenecks caused by their hard drives when they try to accomplish I/O-intensive tasks. Every IT person who has been in the industry for any length of time has dealt with multiple hard drive failures. In short, hard drives have long caused the majority of support headaches in standard desktop or server configurations, with little hope of improvement in the near term.
With the increased use of computers in the daily lives of people worldwide, the dollar value of data stored on the average computer has steadily increased. Even as MTBF figures have moved from 8000 hours in the 1980s (example: MiniScribe M2006) to the current levels of over 750,000 hours (Seagate 7200.11 series drives), this increase in data value has offset the relative decrease of hard drive failures. The increase in the value of data, and the general unwillingness of most casual users to back up their hard drive contents on a regular basis, has put increasing focus on technologies which can help users to survive a hard drive failure. RAID (Redundant Array of Inexpensive Disks) is one of these technologies.
Drawing on whitepapers produced in the late 1970s, the term RAID was coined in 1987 by researchers at the University of California, Berkeley in an effort to put into practice theoretical gains in performance and redundancy which could be made by teaming multiple hard drives in a single configuration. While their paper proposed certain levels of RAID, the practical needs of the IT industry have brought several slightly differing approaches. Most common now are:
RAID 0 - Data Striping
RAID 1 - Data Mirroring
RAID 5 - Data Striping with Parity
RAID 6 - Data Striping with Redundant Parity
RAID 0+1 - Data Striping with a Mirrored Copy
Each of these RAID configurations has its own benefits and drawbacks, and is targeted for specific applications. In this article we'll go over each and discuss in which situations RAID can potentially help - or harm - you as a user.
RAID 0
RAID 0 takes two or more disk drives and writes data in a "stripe" across each disk. Data is accessed by requesting the stripe from the array, resulting in the disks more or less simultaneously feeding their portion of the data back to the controller. The overall capacity of the array is equal to the sum of the formatted capacities of all drives, and disk usage is more or less spread evenly among all drives in the array.
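As a rough illustration of the striping described above, the sketch below maps a logical block number to a disk and a position on that disk. The function name and the one-block-per-disk stripe unit are assumptions made for the example; real controllers use configurable stripe sizes.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Hypothetical helper: assumes a stripe unit of exactly one block per disk.
    """
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    disk = logical_block % num_disks      # consecutive blocks rotate across the disks
    offset = logical_block // num_disks   # each full stripe uses one block on every disk
    return disk, offset


# Six consecutive blocks on a three-disk array land on disks 0, 1, 2, 0, 1, 2.
placement = [locate_block(block, 3) for block in range(6)]
print(placement)
```

Because consecutive blocks sit on different disks, a large sequential transfer keeps every drive busy at once, which is where the sustained transfer rate gain comes from.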
The net result is that the system will see much faster sustained transfer rates for both read and write operations compared to a single drive. File access time, however, is not measurably improved by leveraging multiple disks in a RAID 0 set, which means that systems which require frequent access of small, non-contiguous files (as is often the case in desktop configurations) generally do not benefit from RAID 0.
RAID 0 is an excellent choice for video editing and large-scale "solving" applications, where large files need to be read and written in a continuous manner.
Perhaps the greatest drawback to RAID 0 is that the array is rendered inaccessible when a single drive in the array fails. In that sense, RAID 0 isn't actually RAID at all, as it lacks the "Redundant" part of the equation. Data reliability drops with every drive added to a RAID 0 setup, because the array survives only as long as every one of its drives does. Unless frequent backups are made, or the data is not regarded as even remotely important, RAID 0 should be approached with caution.
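To put a number on that risk, here is a small sketch of how the odds of a RAID 0 array surviving shrink as drives are added. It assumes drive failures are independent, and the 3% per-drive failure probability is a made-up figure for illustration, not a measured one.

```python
def raid0_survival(per_drive_failure_prob: float, num_drives: int) -> float:
    """Probability that every drive in a RAID 0 array survives the period.

    Assumes independent failures; the array is lost if any single drive fails.
    """
    return (1.0 - per_drive_failure_prob) ** num_drives


# Hypothetical 3% chance that any one drive fails during the period.
for drives in (1, 2, 4, 8):
    print(drives, "drive(s):", round(raid0_survival(0.03, drives), 4))
```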
Pros:
- Excellent streaming performance
- Maximum capacity available for users (sum of all disks)
Cons:
- No redundancy of data
- Negligible performance benefits for many users
RAID 1
RAID 1 sits at the other extreme of the spectrum. It makes a continuous copy of all data from one disk (which is written to and read from by the system) onto another physical disk which is in "standby" mode. This "standby" disk is held in reserve by the controller for when a failure is detected on the first disk. At that point in time, the controller "fails over" to the second disk in the system, with all data still available to the user.
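The failover behavior can be sketched with a toy model. The `Mirror` class below is purely hypothetical and uses in-memory dictionaries in place of disks; it only shows the idea of copying every write and falling back to the second disk on reads.

```python
class Mirror:
    """Toy RAID 1 pair: every write is copied to both disks, reads fail over."""

    def __init__(self) -> None:
        self.disks = [{}, {}]           # block number -> data, one dict per disk
        self.failed = [False, False]    # set an entry to True to simulate a dead disk

    def write(self, block: int, data: bytes) -> None:
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data      # keep the copy in step with the primary

    def read(self, block: int) -> bytes:
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]      # first healthy disk answers the request
        raise IOError("both disks have failed; data is lost")


mirror = Mirror()
mirror.write(0, b"payroll")
mirror.failed[0] = True                 # primary disk dies
print(mirror.read(0))                   # still served, now from the second disk
```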
While RAID 1 usually offers no performance benefits (and indeed, it can slightly degrade performance in some situations), it does increase the uptime of the host computer by allowing it to remain online even after a disk in the system has failed. This makes it an extremely popular option for mirroring operating systems on enterprise-class servers, and for small office users who do not need massive amounts of data storage but do require constant uptime.
Higher-quality RAID 1 controllers can outperform single drive implementations by making both drives active for read operations. This can in theory reduce file access times (requests are sent to whichever drive is closer to the desired data) as well as potentially doubling data throughput on reads (both drives can read different data simultaneously). Most consumer RAID 1 controllers do not provide this level of sophistication, however, resulting in performance that is at best slightly worse than what would be achieved with a single drive. Many software RAID 1 solutions also lack support for reading from both drives in a RAID 1 set simultaneously.
Pros:
- Redundancy of data
- Lowest cost data redundancy available (one additional disk)
- Simple operations make it easy to implement solution using software only
Cons:
- Poor usage of drive capacity (only 50% of purchased hard drive capacity available)
- Typically no performance benefit over a single hard disk
RAID 5
In an effort to strike a balance between performance and data redundancy, RAID 5 not only stripes data across multiple disks, but also writes a "parity" block among those disks which allows the array to recover from a single failed drive in the set. This parity block is staggered from one drive to the next, so that for any given stripe each drive holds either a portion of the data being read or the parity block which allows that data to be reconstructed. In this way, the array gains some performance benefits in having the data striped among multiple disks, while being able to stay online after the failure of a single disk in the array.
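A minimal sketch of how a parity block allows recovery from one lost drive, assuming the usual XOR-based parity. The block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks, plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] fails: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

XOR is its own inverse, so combining the parity with every surviving block cancels them out and leaves only the missing block.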
Rather than having a dedicated drive for parity as in the less-popular RAID levels 3 and 4, the parity sharing of RAID 5 allows for a more distributed drive access pattern, resulting in improved write performance and a more even disk wear pattern than with a dedicated parity drive.
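The staggering of parity can be pictured with a short sketch. The rotation rule used here (parity moves one disk to the left on each successive stripe) is only one possible layout, chosen as an assumption for illustration; actual controllers differ in the order they use.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk that holds the parity block for a stripe (assumed rotation rule)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Return one row per stripe, marking each disk as data (D) or parity (P)."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


for row in layout(4, 4):
    print(" ".join(row))
```

Over any run of four stripes on four disks, each disk holds parity exactly once, so no single drive absorbs every parity write the way a dedicated parity drive would.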
In its optimal form, RAID 5 provides substantially faster read performance than a single drive or RAID 1 configuration, but write performance suffers due to the need to write (and sometimes recalculate) the parity information for the majority of writes performed. While this write performance is still faster than a single disk configuration in most cases, the true performance benefit of a RAID 5 array is in its read ability.
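The write penalty is easiest to see on a small write that changes a single block. The sketch below assumes XOR parity and a read-modify-write update; the function names and byte values are illustrative only.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity after overwriting one block, without reading the other disks."""
    return xor(xor(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# Overwriting one block costs four disk operations in this scheme:
# read old data, read old parity, write new data, write new parity.
new_block = b"\xaa\x55"
new_parity = update_parity(stripe[1], new_block, parity)
stripe[1] = new_block

# Matches a full recalculation across the whole stripe.
print(new_parity == xor(xor(stripe[0], stripe[1]), stripe[2]))
```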
It should be noted that the read performance of a RAID 5 array improves as the number of disks in the array increases. More disks, however, also mean higher odds that some disk in the array will fail, and each failure brings a performance degradation during rebuilding operations. A larger array is also more likely to become unrecoverable because a second disk fails before the first failed disk is replaced. In the image shown, the array contains a "hot spare" disk - in the event of a disk failure, this "hot spare" disk would be brought into the array to replace the failed disk immediately.
RAID 5 finds a comfortable home in most "read often, write infrequently" server applications which require long periods of uptime, such as web servers, file/print servers, and database servers. Dedicated RAID 5 controllers that include large amounts of RAM can negate much of the write performance penalty, but such setups are quite a bit more costly. Note also that simplified RAID 5 controllers exist that require the CPU to perform the parity calculations, which can result in write performance that is lower than a single drive.
Pros:
- Good usable amount of data (capacity is the sum of all but one drive in the set)
- Fault-tolerant - can survive one disk failure without impact to users
- Strong read performance
Cons:
- Write performance (without a large controller cache) is substantially below that of RAID 0
- Expensive (either in terms of controller cost or CPU usage) due to parity calculations
RAID 6
RAID 6 attempts to address the most glaring of the RAID 5 issues: the comparatively large window in which the array is in a dangerous state due to a failed disk.
As in RAID 5, RAID 6 staggers its parity information across multiple drives. Its major difference, however, is that it writes two parity blocks for every stripe of data, which means that the array is capable of remaining accessible to the users even after having sustained two simultaneous drive failures. The advantage of a RAID 6 array versus a RAID 5 array with a hot spare disk is that no rebuilding is necessary to bring the last disk into the array in the event of a failure. In this sense, performance is more or less guaranteed after a single disk failure under RAID 6, whereas a significant performance hit occurs during the rebuilding required under RAID 5.
RAID 6's parity scheme is not simply multiple copies of the same parity information, but rather two different means of calculating parity information for the same data. This results in a much higher computing overhead than the already-intensive RAID 5 scheme, and a resulting increase in controller/CPU usage. The requirement for a second parity calculation and write also further degrades the write performance of a RAID 6 array versus a RAID 5 solution.
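The article does not say which two calculations are used. One common approach, assumed here, pairs plain XOR parity (P) with a second value (Q) computed in the Galois field GF(2^8). The sketch works on one byte per disk, and every function name is hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(base: int, exponent: int) -> int:
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result


def pq_parity(data: list[int]) -> tuple[int, int]:
    """P is the XOR of the data bytes; Q weights byte i by 2**i in GF(2^8)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[int], x: int, y: int, p: int, q: int) -> tuple[int, int]:
    """Rebuild the bytes at positions x and y; their entries in data are ignored."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                          # leaves dx ^ dy
            qxy ^= gf_mul(gf_pow(2, i), d)    # leaves 2**x * dx ^ 2**y * dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inverse = gf_pow(gx ^ gy, 254)            # a**254 is 1/a for nonzero a
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), inverse)
    return dx, pxy ^ dx


stripe = [0x12, 0x34, 0x56, 0x78]
p, q = pq_parity(stripe)
damaged = [0x12, 0, 0x56, 0]                  # disks 1 and 3 are gone
print(recover_two(damaged, 1, 3, p, q) == (0x34, 0x78))
```

The extra multiplications behind Q are a fair picture of why the second parity costs more to compute than the simple XOR used for the first.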
RAID 6 is an excellent choice for both extremely mission-critical applications and in instances where large numbers of disks are intended to be used in the array to improve read performance. Because of this (and the poor write performance without special hardware), RAID 6 support is typically only included in high-end, expensive controller cards.
Pros:
- Fair usable amount of data (sum of all but two drives in the set)
- Provides more comfortable levels of redundancy for very large array sizes (8+ disks)
- Strong read performance
Cons:
- Expensive (in computing power, in controller cost, and in additional "wasted" disks)
- Write performance is generally very poor compared to other RAID solutions
RAID 1+0 / 0+1
RAID 1+0 (10) or 0+1 attempts to get the best of all worlds: It generally provides the best read and write performance, as well as offering a level of redundancy for its data when compared to RAID 0.
Both RAID 0+1 and RAID 1+0 are considered "nested" solutions, which is to say they use RAID 0's data striping and RAID 1's mirroring capabilities. The difference between the two is that RAID 1+0 (10) creates a striped set from a series of mirrored drives while RAID 0+1 creates a second striped set to mirror the primary striped set.
In practice, the only reason an administrator would choose either RAID 0+1 or 1+0 (10) is in extremely I/O intensive operations which would bottleneck a RAID 5 or RAID 6 array, and where drive cost is not a major concern. The redundancy provided is in reality very low although RAID 1+0 offers better fault tolerance and rebuild capabilities than 0+1.
In a RAID 1+0 array, all but one drive from each RAID 1 set could fail without damaging the mirrored data. However, if the failed drive or drives are not replaced, the last working drive in the set then becomes a single point of failure for the entire array: if that single hard drive fails, all data stored in the entire array is lost.
A RAID 0+1 array can keep operating if one or more drives (with more than four drives in use) fail in the same mirror set. However, if drives fail on both sides of the mirror, then data on the entire array is lost. Also, once a failed drive is replaced, all the disks in the array must participate in the rebuild of its data. In the case of RAID 1+0, only the lost drive has to be re-mirrored, so the rebuild process is substantially faster.
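The difference in fault tolerance can be checked by brute force on a small array. This sketch assumes four drives numbered 0 to 3, grouped as (0, 1) and (2, 3); in RAID 1+0 each group is a mirrored pair, while in RAID 0+1 each group is one striped side of the mirror.

```python
from itertools import combinations

GROUPS = [(0, 1), (2, 3)]  # assumed four-drive layout for both arrays


def raid10_survives(failed: set) -> bool:
    """RAID 1+0 lives while every mirrored pair keeps at least one drive."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed: set) -> bool:
    """RAID 0+1 lives while at least one striped side is fully intact."""
    return any(all(d not in failed for d in side) for side in GROUPS)


def surviving_cases(survives, num_failures: int, num_drives: int = 4) -> tuple:
    """Count the failure combinations the array survives, out of all combinations."""
    cases = list(combinations(range(num_drives), num_failures))
    return sum(survives(set(case)) for case in cases), len(cases)


print("RAID 1+0, two failed drives:", surviving_cases(raid10_survives, 2))  # (4, 6)
print("RAID 0+1, two failed drives:", surviving_cases(raid01_survives, 2))  # (2, 6)
```

Both layouts survive any single failure, but of the six possible two-drive failures, RAID 1+0 rides out four and RAID 0+1 only two.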
Pros:
- Best performance available, as the system disk is essentially a RAID 0 array.
Cons:
- Expensive in terms of drives.
- Usable storage space is only half of the total drive capacity.
- Only minimally fault tolerant.
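The capacity figures quoted in the pros and cons lists above can be collected into one small calculator. It assumes identical drives; the function name, the level labels, and the minimum drive counts in the checks are choices made for this sketch.

```python
def usable_capacity(level: str, num_drives: int, drive_size_gb: float) -> float:
    """Usable space in GB for an array of identical drives (illustrative only)."""
    if level == "0":
        return num_drives * drive_size_gb            # sum of all disks
    if level == "1":
        return drive_size_gb                         # one disk's worth, rest are copies
    if level == "5":
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (num_drives - 1) * drive_size_gb      # all but one drive
    if level == "6":
        if num_drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return (num_drives - 2) * drive_size_gb      # all but two drives
    if level in ("10", "01"):
        if num_drives < 4 or num_drives % 2:
            raise ValueError("nested arrays need an even count of four or more drives")
        return num_drives * drive_size_gb / 2        # half of total capacity
    raise ValueError(f"unknown RAID level: {level}")


# Hypothetical example: four 500 GB drives at each level.
for level in ("0", "5", "6", "10"):
    print("RAID", level, "->", usable_capacity(level, 4, 500), "GB")
```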
Conclusion
In the IT world, some level of RAID is virtually guaranteed to be employed on any production server due to the relatively high failure rate of hard disks compared with most other components in the system. For end-users, though, the picture becomes far murkier. Most home computers spend large amounts of time seeking from small file to small file, with the resulting speed limitation imposed by the physical mechanisms of the drive itself (rotational speed, etc.). These limitations are not overcome even by the top-performing RAID 0. The only benefits, therefore, that users can seek in RAID are to increase overall capacity of their single drive, add a level of redundancy for their system, or to improve large-file performance.
The attraction of RAID for users seeking a large single drive is diminishing by the day, due to the massive single drive sizes on the market today. When a capacity conscious user can get a full terabyte of space in a single physical package, the argument becomes one of backing up said data, rather than seeing a 2TB drive on their system.
In the case of redundancy, there is most certainly an argument for taking advantage of the RAID 1 feature found on many motherboards (and even in most operating systems). As stated previously, most users have experienced a hard drive failure at one point in their lives, and as more of our daily work shifts to a computing platform, data integrity is becoming increasingly important. More to the point, however: should users be more worried about backing up their data to removable media on a periodic basis to protect against the accidental deletion or corruption of data, or about keeping their machine up and running when a complete failure occurs?
This type of question can only be answered by the individual user themselves, and depends on the nature of data being stored on the system. We recently provided a first look at Windows Home Server, which may prove to be a far more compelling backup solution than any form of RAID. That does require the use of an entire computer, but the user-controlled data mirroring, volume shadow copy, and the ability to support multiple systems certainly make it a viable alternative in households with multiple computers.
It also bears mention that redundant storage of data using RAID really isn't a sufficient backup strategy for most businesses, and some form of off-site storage of backups should also be considered. RAID can be useful in making sure that systems remain operational in the event of a hard drive failure, but other catastrophes -- flooding, fire, theft, etc. -- can still claim all of the data on a RAID storage device. If the data is truly important, saving periodic backups to a different medium and storing it at a separate location should be considered.
Large-file performance is likely the most compelling reason to adopt RAID in a home system. For video editing operations, bandwidth in write operations is an absolute must, and RAID 0 fills this need very well. Increasingly, however, hard drives are finding their way into new areas of the home - home theater PCs, PVRs, and home video archival systems are but a few of the "read-often, write-less but always needed" systems which could benefit from a solution like RAID 5 or even the more performance oriented RAID 5+1.
At the end of the day, anyone looking into a more elaborate storage solution owes it to themselves to consider the practical implication of the decisions they make. One size most definitely does not fit all in the world of hard drive storage and RAID, and the wrong choice can certainly be more harmful than helpful in this regard.
We would like to thank Adaptec for providing the charts utilized in our article today.