
Why will configuring RAID on your EBS volume not help?

Amazon EBS is a persistent block storage service. The data on each EBS volume is automatically replicated across multiple servers within its Availability Zone to prevent data loss from the failure of any single component, and this replication makes EBS volumes more reliable than a single physical disk. Customers who follow the guidance on the Amazon EBS and Amazon EC2 product detail pages typically achieve good performance out of the box. There are, however, scenarios where you need higher throughput and much better IOPS, and one way people tend to achieve that is by configuring software RAID arrays across several volumes. Software RAID is supported by almost all operating systems and works with EBS volumes; it is used to boost IOPS, achieve better throughput and improve fault tolerance.

Configuring RAID on EBS volumes is mostly pointless, for several reasons. Before jumping into them, it is important to know why RAID is used in the first place. A few of the reasons are :


  • It helps achieve better IOPS
  • Improved redundancy
  • Lower costs

RAID is usually a good option when you want to get the maximum IOPS out of your storage, and it is not complicated to configure either. However, people tend not to notice the other, more problematic side of configuring RAID :

  1. With RAID you can survive a single disk failure and improve performance, but what is often overlooked is that once a RAID array is corrupted, it is nearly impossible to get it back to a normal state. Know more about it here.
  2. Damage to the RAID controller can take down your whole storage solution. A few companies say that they can recover all your data, but even if they succeed, it will take them hours to get it done. If that is mission-critical application data, it will probably cost you more than just a few hours, and if you don’t have a backup you are out of service until your storage is back up (usually the case with SMBs).
  3. It costs! There is not much software that can help you recover from such catastrophic RAID failures, and there is even less help available on the internet.

Knowing this, you might be thinking that these problems arise with a hardware RAID configuration, not with an Amazon EBS volume!

Yes, you’re right. AWS provides several features that help avoid such problems :

  • Point-in-time snapshots : Even if you manage to screw up your RAID, it takes merely minutes to restore it from a point-in-time snapshot. However, you might lose some data, depending on when the snapshot was taken. Know more about it here.
  • Uptime guarantee of 99.99% : Power outages at AWS are rare. The AWS service level agreement covering EC2 and EBS commits to 99.99% monthly uptime, and EBS volumes themselves are designed for 99.999% availability. Know more about it here.
  • Low Annual Failure Rate (AFR) : Amazon EBS volumes are designed to be highly available and reliable, with an annual failure rate (AFR) of between 0.1% and 0.2%. This makes EBS volumes at least 20 times more reliable than typical commodity disk drives, which have an AFR of around 4%. For example, if you have 1,000 EBS volumes running for 1 year, you should expect 1 to 2 of them to fail. With that in mind, it is important to know how you can increase the performance of these volumes in a quicker and cheaper way.
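The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the rates are the ones quoted above (0.1% to 0.2% for EBS, around 4% for commodity disks).

```python
def expected_failures(volume_count: int, afr: float) -> float:
    """Expected number of volumes that fail in one year at the given AFR."""
    return volume_count * afr


def reliability_ratio(disk_afr: float, ebs_afr: float) -> float:
    """How many times lower the EBS failure rate is than the disk failure rate."""
    return disk_afr / ebs_afr


# Rates taken from the text: EBS 0.1% to 0.2%, commodity disks about 4%.
ebs_low = expected_failures(1000, 0.001)
ebs_high = expected_failures(1000, 0.002)
disks = expected_failures(1000, 0.04)

print(f"1,000 EBS volumes: {ebs_low:.0f} to {ebs_high:.0f} failures per year")
print(f"1,000 commodity disks: {disks:.0f} failures per year")
print(f"EBS is {reliability_ratio(0.04, 0.002):.0f}x to "
      f"{reliability_ratio(0.04, 0.001):.0f}x more reliable")
```

The "20 times" figure corresponds to the upper end of the EBS range (0.2%); at 0.1% the ratio would be 40.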

All of these features pretty much prove why you should adopt the cloud. That said, these features only help you recover quickly; they don’t actually solve the issue here. You won’t recover all your data, that’s for sure, but you can recover most of it if you have a proper backup strategy in place and everything works well at the AWS data centers.

So, coming back to the point: why will configuring RAID on your EBS volume not help?

The reasons are as follows :

PIOPS : a more reliable and less complicated way than RAID to achieve higher IOPS

Amazon EBS offers Provisioned IOPS (PIOPS) volumes. This is the highest-performance EBS storage type, designed for critical, I/O-intensive database and application workloads, as well as throughput-intensive database and data warehouse workloads.

RAID, on the other hand, is also used to achieve greater I/O performance than a single volume can deliver, but it comes with certain disadvantages :

  • The combined performance of the striped volumes is limited by the worst-performing volume in the set. Hence, if one of the volumes is degraded, it will affect the overall performance of the RAID array.
  • In a striped (RAID 0) array, the loss of a single volume can result in complete data loss for the array.
  • It requires more bandwidth than a single volume, because the data is written to multiple volumes simultaneously.
  • Booting from a RAID volume is risky and can lead to data loss. GRUB is typically installed on only one device in a RAID array, and if one of the mirrored devices fails, you may be unable to boot the operating system.
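The first two drawbacks can be put into numbers with a rough model. The sketch below assumes that RAID 0 spreads I/O evenly over its members, so the array moves at the pace of its slowest volume, and that volumes fail independently of each other. Both are simplifications. The function names and the sample IOPS figures are hypothetical; the 0.2% AFR is the upper figure quoted earlier.

```python
from typing import Sequence


def striped_iops(member_iops: Sequence[float]) -> float:
    """Estimate RAID 0 IOPS when I/O is spread evenly over all members.

    Every member receives the same share of requests, so the slowest
    member caps what the whole array can sustain.
    """
    return len(member_iops) * min(member_iops)


def raid0_annual_loss_probability(volume_count: int, afr: float) -> float:
    """Chance of losing a RAID 0 array within a year.

    The array is lost as soon as any one member fails. Assumes that
    failures are independent, which is a simplification.
    """
    return 1 - (1 - afr) ** volume_count


healthy = [3000, 3000, 3000, 3000]   # hypothetical per-volume IOPS
degraded = [3000, 3000, 3000, 1000]  # one volume running slow

print(striped_iops(healthy))   # 12000
print(striped_iops(degraded))  # 4000
print(f"{raid0_annual_loss_probability(4, 0.002):.2%} chance of array loss per year")
```

In this model a single slow volume cuts the array to a third of its healthy IOPS, and every volume added to the stripe raises the chance of losing the whole array.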

Want extra IOPS over base performance?
Then switch to EBS-optimized EC2 instances

An Amazon EBS-optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance.

  • They deliver dedicated bandwidth to Amazon EBS, with options between 500 Mbps and 12,000 Mbps, depending on the instance type you use.
  • General Purpose SSD (gp2) volumes are designed to deliver within 10% of their baseline and burst performance 99% of the time in a given year
  • Provisioned IOPS SSD (io1) volumes are designed to deliver within 10% of their provisioned performance 99.9% of the time in a given year
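To get a feel for what the dedicated bandwidth figures mean for a workload, they can be converted into throughput and a rough IOPS ceiling. The sketch below is an assumption-laden estimate, not an AWS formula: it takes 1 Mbps as 1,000,000 bits per second and assumes every I/O is 16 KiB, and the function names are invented for this example.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


def iops_ceiling(mbps: float, io_size_kib: float = 16) -> float:
    """Upper bound on IOPS that fit in the given bandwidth.

    Assumes every request transfers io_size_kib KiB (16 KiB by default,
    an assumption made for this sketch).
    """
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second / (io_size_kib * 1024)


# The two ends of the dedicated bandwidth range mentioned above.
for bandwidth in (500, 12_000):
    print(f"{bandwidth} Mbps = {mbps_to_mb_per_s(bandwidth)} MB/s, "
          f"about {iops_ceiling(bandwidth):,.0f} IOPS at 16 KiB")
```

If the volumes attached to an instance can deliver more IOPS than this ceiling, the instance bandwidth becomes the bottleneck, with or without RAID.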

You can find the list of EBS-optimized EC2 instances here.

With RAID you get to use the maximum IOPS, but if you decide to upgrade a volume in the future, there is every chance that performance will drop for a period of time. EBS-optimized EC2 instances are therefore a better and less complicated way to gain extra IOPS.

RAID exceptions

There will be some use cases where configuring RAID is a viable option. In that case, it is important to know how large your RAID array should be and how many IOPS you want to provision. You have to do proper benchmarking of the IOPS required against the storage size.

A few facts about configuring RAID on Amazon EBS :

  • Typically you can configure RAID levels such as 0, 1, 5 and 6; Amazon EBS works with any software RAID configuration that your operating system supports.
  • As per AWS, the recommended RAID configurations are RAID 0 and RAID 1. RAID 5 and RAID 6 are discouraged on EBS because their parity writes consume some of the IOPS available to the volumes.
  • You can stripe several gp2, io1, st1 or sc1 volumes together in a RAID 0 configuration to use the accumulated bandwidth of these volumes and achieve higher throughput and IOPS.
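When sizing an array, the trade-off between the two recommended levels can be estimated before any benchmarking. The sketch below is a back-of-the-envelope model that assumes identical volumes, a two-volume mirror for RAID 1, and perfect scaling for RAID 0. The class, the function names and the sample numbers are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArrayEstimate:
    usable_gib: int
    write_iops: int
    survives_one_volume_loss: bool


def raid0(volume_count: int, size_gib: int, iops: int) -> ArrayEstimate:
    """Striping: capacity and IOPS add up, but there is no redundancy."""
    return ArrayEstimate(volume_count * size_gib, volume_count * iops, False)


def raid1(size_gib: int, iops: int) -> ArrayEstimate:
    """Two-volume mirror: every write goes to both volumes, so capacity
    and write IOPS stay at the level of a single volume."""
    return ArrayEstimate(size_gib, iops, True)


def volumes_needed(target_iops: int, per_volume_iops: int) -> int:
    """Smallest RAID 0 member count that reaches the target IOPS."""
    return math.ceil(target_iops / per_volume_iops)


print(raid0(4, 500, 3000))
print(raid1(500, 3000))
print(volumes_needed(10_000, 3000))  # 4
```

Real arrays rarely scale perfectly, so these numbers are a starting point for the benchmarking mentioned above rather than a replacement for it.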

Be aware of unpredictable EBS disk I/O

Amazon EBS is known to have unpredictable I/O because of its shared tenancy. Moreover, the network that exists between your instances and your EBS volumes is shared with other customers, and this sharing of resources over the network can make EBS slower than locally attached hardware storage.

AWS has started to add dedicated network connections for storage to make EBS latency more predictable. On a dedicated storage system, by and large, latency and IOPS are strongly correlated. As the number of IOPS increases, latency increases slowly until you saturate the storage bus or the drives themselves. At the point of saturation, pushing more IOPS simply creates a backlog ahead of the storage system (usually at the operating system layer).
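The relationship between IOPS and latency on a dedicated system can be illustrated with a toy queueing model. The sketch below uses the textbook M/M/1 formula as a stand-in; this is an assumption made for illustration, not a description of how EBS or any particular device behaves, and the function names and the 10,000 IOPS limit are invented.

```python
def mean_latency_ms(offered_iops: float, max_iops: float) -> float:
    """Mean response time of an M/M/1 queue, in milliseconds.

    Returns infinity at or beyond saturation, where the queue never drains.
    """
    if offered_iops >= max_iops:
        return float("inf")
    service_time_ms = 1000.0 / max_iops
    utilization = offered_iops / max_iops
    return service_time_ms / (1.0 - utilization)


def backlog_growth_per_second(offered_iops: float, max_iops: float) -> float:
    """Requests per second that pile up once the device is saturated."""
    return max(0.0, offered_iops - max_iops)


MAX_IOPS = 10_000  # hypothetical device limit
for load in (1_000, 5_000, 9_000, 9_900, 12_000):
    print(f"{load:>6} IOPS -> latency {mean_latency_ms(load, MAX_IOPS):.2f} ms, "
          f"backlog +{backlog_growth_per_second(load, MAX_IOPS):.0f}/s")
```

In this model latency creeps up at low load, climbs steeply close to the limit, and past the limit the excess requests simply queue up.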

On a shared storage system, the picture is a lot less clear, as the behavior of other users of the system has a direct impact on the latency you experience. However, this can be detected and mitigated through proper monitoring and analysis of CloudWatch metrics, and by selecting the right storage type (General Purpose SSD, PIOPS, etc.) and instance type.
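As an example of such monitoring, average latency and IOPS can be derived from the per-volume metrics that EBS publishes to CloudWatch. The sketch below works on one already-fetched datapoint: the dictionary keys are EBS metric names, but the values are made up, and the fetching step and the helper names are left out or invented for this illustration.

```python
def average_latency_ms(total_time_seconds: float, op_count: float) -> float:
    """Average time per operation in milliseconds (0 if there were no ops)."""
    if op_count == 0:
        return 0.0
    return total_time_seconds / op_count * 1000.0


def average_iops(read_ops: float, write_ops: float, period_seconds: float) -> float:
    """Average operations per second over the reporting period."""
    return (read_ops + write_ops) / period_seconds


PERIOD_SECONDS = 300  # assumed 5-minute reporting period

# Hypothetical sums for one period; the values are invented.
sample = {
    "VolumeReadOps": 90_000,
    "VolumeWriteOps": 60_000,
    "VolumeTotalReadTime": 45.0,    # seconds spent on reads
    "VolumeTotalWriteTime": 120.0,  # seconds spent on writes
}

read_ms = average_latency_ms(sample["VolumeTotalReadTime"], sample["VolumeReadOps"])
write_ms = average_latency_ms(sample["VolumeTotalWriteTime"], sample["VolumeWriteOps"])
iops = average_iops(sample["VolumeReadOps"], sample["VolumeWriteOps"], PERIOD_SECONDS)

print(f"read {read_ms:.2f} ms, write {write_ms:.2f} ms, {iops:.0f} IOPS")
```

Tracking these values over time shows whether latency rises while IOPS stay flat, which is the kind of noisy-neighbor effect described above.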

Conclusion

Configuring RAID requires multiple volumes and higher bandwidth, and carries a possible risk of data loss. Instead, you can upgrade your volumes to the PIOPS storage type or your instances to EBS-optimized ones. It will cost a little more than normal General Purpose SSD volumes, but you will be able to get maximum throughput with less risk.
