Raid Levels and Raid Types

RAID, an acronym for Redundant Array of Independent (originally Inexpensive) Disks, is the talk of the day. A RAID is an array of disks presented as a single storage system, to give more performance, fault tolerance and availability of data. It is not a mere collection of disks: the disks are combined under a standard reliability scheme, usually rated by MTBF (mean time between failures), and performance would suffer drastically if the disks were not combined as a single storage unit.


Raid Levels


All the RAID types and models are commonly classified as RAID levels. The numbers identify different ways of combining striping, mirroring and parity; a higher number does not by itself mean a superior or faster array, so the right level depends on the workload.


Hence, the protection RAID gives you also depends on the RAID level you are using. With the exception of RAID0, the levels described below are designed so that no data is lost if a single disk fails. In-depth knowledge of RAID levels will help you when buying RAID servers.


Raid Types



  • RAID0 is simply data striped over several disks. This gives a performance advantage, as it is possible to read parts of a file in parallel. However not only is there no data protection, it is actually less reliable than a single disk, as all the data is lost if a single disk in the array stripe fails.
    [Figure: RAID0 principles]
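The striping described above can be sketched in a few lines of Python. This is only an illustration: the function name `raid0_locate` is made up here, and it assumes the stripe unit is exactly one block, which real controllers let you tune.

```python
def raid0_locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumption for this sketch: the stripe unit is one block, so consecutive
    logical blocks land on consecutive disks.
    """
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block")
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive blocks spread over a three-disk stripe.
layout = [raid0_locate(block, 3) for block in range(6)]
print(layout)
```

Because neighbouring blocks sit on different disks, a large read can be served by all disks at once, and the loss of any one disk leaves a hole in every file larger than one stripe.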





  • RAID1 is data mirroring. Two copies of the data are held on two physical disks, and the data is always identical. RAID1 has a performance advantage, as reads can come from either disk, and is simple to implement. However, it is expensive, as twice as many disks are needed to store the data.
    [Figure: RAID1 principles]





  • RAID2 is a theoretical entity. It stripes data at bit level across an array of disks, then writes check bits, calculated using a Hamming code, to other disks in the array. Theoretical performance is very high, but it would be so expensive to implement that no-one uses it.
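To give a feel for the Hamming code mentioned above, here is a minimal sketch of the classic Hamming(7,4) code in Python. It assumes four data bits and three check bits per codeword, with each of the seven bits imagined to live on its own disk; the function names are illustrative and this is not taken from any real RAID2 product.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return the 4 data bits, repairing at most one flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate one disk returning a wrong bit
print(hamming74_correct(word))
```

Three check disks for every four data disks is part of why the scheme is considered too expensive in practice.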





  • RAID3 stripes each block of data across an array of disks at byte level, then writes parity data to a dedicated parity disk. Successful implementations usually require that all the disks have synchronised rotation. RAID3 is very effective for large sequential data, such as satellite imagery and video.
    [Figure: RAID3 principles]
    In the figure, the right hand disk is dedicated parity, the other three disks are data disks.
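A dedicated parity disk can be sketched as follows. The example assumes plain XOR parity and byte-level striping over three data disks, as in the RAID3 description above; `stripe_with_parity` and `rebuild` are hypothetical helper names, not part of any real controller API.

```python
from functools import reduce


def xor_bytes(chunks):
    """XOR equally sized byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe_with_parity(data, data_disks):
    """Byte-stripe data over the data disks and append one parity disk."""
    padded = data + bytes(-len(data) % data_disks)  # zero-pad to a full stripe
    disks = [padded[i::data_disks] for i in range(data_disks)]
    return disks + [xor_bytes(disks)]


def rebuild(disks, failed):
    """Recreate the contents of one failed disk from all the survivors."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    return xor_bytes(survivors)


disks = stripe_with_parity(b"satellite image", 3)
print(rebuild(disks, 1) == disks[1])  # a lost data disk is recoverable
```

The same XOR rebuild works whichever single disk is lost, including the parity disk itself.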





  • RAID4 writes data in whole blocks onto the data disks (i.e. a block is not split across the disks as it is in RAID3), then parity is generated and written to a dedicated parity disk.
    [Figure: RAID4 principles]
    In the figure, the right hand disk is dedicated parity, the other three disks are data disks.





  • RAID5 data is written in blocks onto data disks, and parity is generated and rotated around the data disks. It gives good general performance and is reasonably cheap to implement, so it is used extensively for general data.
    [Figure: RAID5 principles]
    The figure below illustrates the RAID5 write overhead. If a block of data on a RAID5 disk is updated, then all the unchanged data blocks from the RAID stripe have to be read back from the disks, and new parity calculated, before the new data block and new parity block can be written out. This means that a RAID5 write operation requires 4 IOs. The performance impact is usually masked by a large subsystem cache.
    As Nat Makarevitch pointed out, more efficient RAID5 implementations hang on to the original data and use that to generate the parity according to the formula new_parity = old_data XOR new_data XOR old_parity. If the old data block is retained in cache, and it often is, then this just requires one extra IO to fetch the old parity. In the worst case it needs two extra reads, for the old data and the old parity, not four.
    [Figure: RAID5 write overhead]
    RAID5 often gets a bad press, due to potential data loss on hardware errors and poor performance on random writes. Some database manufacturers will positively tell you to avoid RAID5. The truth is, it depends on the implementation. Avoid software implemented RAID5, as it will not perform. RAID5 on smaller subsystems will not perform unless the subsystem has a large amount of cache. However, RAID5 is fine on enterprise class subsystems like the EMC DMX, the HDS USP or the IBM DDS devices. They all have large, gigabyte size caches and force all write IOs to be written to cache, thus guaranteeing performance and data integrity.
    Most manufacturers will let you have some control over the RAID5 configuration now. You can select your block stripe size and the number of volumes in an array group.
    A smaller stripe size is more efficient for a heavy random write workload, while a larger stripe size works better for sequential writes. A smaller number of disks in an array will perform better, but has a bigger parity overhead. Typical configurations are 3+1 (25% parity) and 7+1 (12.5% parity).
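The parity shortcut quoted above (new_parity = old_data XOR new_data XOR old_parity) and the parity overhead figures can be checked with a short Python sketch. It assumes simple XOR parity over a 3+1 stripe; the helper names and the four-byte blocks are invented for the illustration.

```python
def xor(a, b):
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks):
    """Parity computed the slow way, from every data block in the stripe."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor(parity, block)
    return parity


def small_write_parity(old_data, new_data, old_parity):
    """Parity computed from the changed block and the old parity only."""
    return xor(xor(old_data, new_data), old_parity)


def parity_overhead(data_disks):
    """Fraction of raw capacity spent on parity in an N+1 array."""
    return 1 / (data_disks + 1)


stripe = [b"AAAA", b"BBBB", b"CCCC"]  # data blocks of a 3+1 stripe
old_parity = full_parity(stripe)
new_block = b"ZZZZ"
new_parity = small_write_parity(stripe[1], new_block, old_parity)
stripe[1] = new_block
print(new_parity == full_parity(stripe))
print(parity_overhead(3), parity_overhead(7))
```

The shortcut gives the same parity as re-reading the whole stripe, which is why the unchanged data blocks never need to be touched.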





  • RAID6 is growing in popularity because its double parity is seen as the best way to guarantee data integrity. It was originally used in SUN V2X devices, where there are a lot of disks in a RAID array, and so a higher chance of multiple failures. RAID6 as implemented by SUN does not have a write overhead, as the data is always written out to a different block.
    The problem with RAID6 is that there is no standard method of implementation; every manufacturer has their own method. In fact there are two distinct architectures, RAID6 P+Q and RAID6 DP.
    DP, or Double Parity RAID, uses a mathematical method to generate two independent parity bits for each block of data, and several mathematical methods are used. P+Q generates a horizontal P parity block, then combines those disks into a second vertical RAID stripe and generates a Q parity, hence P+Q. One way to visualise this is to picture three standard four disk RAID5 arrays, then take a fourth array and stripe again to construct a second set of RAID arrays that consist of one disk from each of the first three arrays, plus a fourth disk from the fourth array. The consequence is that those sixteen disks will only contain nine disks' worth of data.
    P+Q architectures tend to perform better than DP architectures and are more flexible in the number of disks that can be in each RAID array. DP architectures usually insist that the total number of disks is prime, something like 4+1, 6+1 or 10+1. This can be a problem as the physical disks usually come in units of eight, and so do not easily fit a prime number scheme.
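The sixteen-disk picture above can be modelled as a 4x4 grid with XOR parity along each row and each column. The sketch below is only an illustration of that picture, not any manufacturer's RAID6 method; each integer stands in for one disk's block, and `build_grid` and `recover` are hypothetical names.

```python
from functools import reduce
from operator import xor


def build_grid(data):
    """Extend a 3x3 grid of data blocks with a parity column and a parity row."""
    rows = [row + [reduce(xor, row)] for row in data]        # horizontal parity
    rows.append([reduce(xor, col) for col in zip(*rows)])    # vertical parity
    return rows


def recover(grid):
    """Fill in failed disks (None) while some row or column has one gap."""
    grid = [list(row) for row in grid]
    n = len(grid)
    lines = [[(r, c) for c in range(n)] for r in range(n)]   # every row
    lines += [[(r, c) for r in range(n)] for c in range(n)]  # every column
    progress = True
    while progress:
        progress = False
        for line in lines:
            gaps = [(r, c) for r, c in line if grid[r][c] is None]
            if len(gaps) == 1:
                r0, c0 = gaps[0]
                rest = (grid[r][c] for r, c in line if (r, c) != (r0, c0))
                grid[r0][c0] = reduce(xor, rest)  # each full line XORs to zero
                progress = True
    return grid


grid = build_grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
damaged = [list(row) for row in grid]
damaged[0][0] = damaged[0][1] = None  # two disks of the same row fail
print(recover(damaged) == grid)
```

Two failures in one row defeat the row parity, but each missing disk is still the only gap in its column, so the vertical parity repairs both. The price is the capacity noted above: nine data disks out of sixteen.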





  • RAID7 is a registered trademark of Storage Computer Corporation, and is basically RAID3 with an embedded operating system in the controller to manage the data and cache to speed up the access.





  • RAID1+0 is a combination of RAID1 mirroring and data striping. This means it has very good performance and high reliability, so it's ideal for mission critical database applications. All that redundancy means that it is expensive.





  • RAID50 is implemented as a set of RAID5 arrays that are then striped in RAID0 fashion for fast access.





  • RAID53 applies this 'RAID then stripe' principle to RAID3. It should really be called RAID3+0. Both RAID50 and RAID53 are expensive to implement in hardware terms.





  • RAID0+1 is implemented as a mirrored array whose segments are RAID0 arrays, which is not the same as RAID10. RAID0+1 has the same fault tolerance as RAID5. The data will survive the loss of a single disk, but at that point all you have left is a striped RAID0 disk set. It does provide high performance, but with lower resilience than RAID10.
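The difference in resilience between RAID10 and RAID0+1 can be made concrete by counting which two-disk failures each layout survives. This sketch assumes a four-disk array with disks numbered 0 to 3; the grouping of disks into mirror pairs or stripe sets, and the function names, are illustrative only.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs):
    """RAID10 survives while every mirror pair still has one good disk."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid01_survives(failed, stripe_sets):
    """RAID0+1 survives while at least one whole stripe set is untouched."""
    return any(not (set(stripe) & failed) for stripe in stripe_sets)


def survivable(survives, groups, num_disks, failures=2):
    """Count the failure combinations of the given size that lose no data."""
    cases = combinations(range(num_disks), failures)
    return sum(survives(set(case), groups) for case in cases)


groups = [(0, 1), (2, 3)]  # mirror pairs for RAID10, stripe sets for RAID0+1
print(survivable(raid10_survives, groups, 4))  # out of 6 possible double failures
print(survivable(raid01_survives, groups, 4))
```

Under these assumptions RAID10 survives four of the six possible double failures and RAID0+1 only two, because in RAID0+1 a single failed disk takes its whole stripe set out of service.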





  • RAID-S, or parity RAID, is a specific implementation of RAID5 used by EMC. It uses hardware facilities within the disks to produce the parity information, and so does not have the RAID5 write overhead. It was originally called RAID-S, and is sometimes called 3+1 or 7+1 RAID.





  • RAIDZ is part of the SUN ZFS file system. It is a software based variant of RAID5 which does not use a fixed size RAID stripe but writes out the current block of data as a varying size RAID stripe. With standard RAID, data is written and read in blocks, and several blocks are usually combined together to make up a RAID stripe. If you need to update one data block, you have to read back all the other data blocks in that stripe to calculate the new RAID parity. RAIDZ eliminates the RAID5 write penalty, as any read and write of existing data will just include the current block. In a failure, data is re-created by reading checksum bytes from the file system itself, not the hardware, so recovery is independent of hardware failures. The problem, of course, is that RAIDZ closely couples the operating system and the hardware. In other words, you have to buy them both from SUN.


