Disaster recovery has been called a killer app for the cloud platform. Cloud-based disaster recovery (DR) offers some compelling advantages over legacy DR solutions, whether those traditional scenarios involve a dedicated IT infrastructure maintained in a secondary facility or just removable media carried off-site.
Cloud DR offers simplicity, faster recovery and lower costs, both in terms of infrastructure and administrative overhead. In short, leveraging the cloud as a DR platform can provide a better value than traditional methods, and it can actually make a highly effective disaster recovery solution attainable for many organizations, regardless of size.
This article defines cloud-based DR, examines the technologies that have made cloud DR (or DR as a service) such a hot topic and looks at different types of cloud DR solutions. We'll also discuss some of the details behind this technology that companies should be aware of before implementing one of these services.
A disaster is anything that causes unplanned downtime to a server or application that's important enough to warrant a DR plan in the first place. This means a disaster can be caused by an isolated hardware failure, data corruption on a storage system or an administrator accidentally shutting down a server. It doesn't require six feet of water in the data center or a tornado. In fact, most downtime events don't involve a site-wide disaster.
Recovery is the ability to restart applications and reconnect users and other applications to them -- not just restore data. Effective recoveries used to require redundant infrastructures but, thanks to virtualization, a second host that's available to run a copy of the critical virtual machine (VM) could now constitute a disaster recovery solution. There's a lot more to reliable disaster recovery than just standing up VMs in a cloud data center, but this technology can simplify the recovery process and potentially lower the cost of legacy DR solutions.
Cloud-based DR defined
Cloud DR can have several definitions, but for our purposes we're going to assume it involves a company running critical applications in its own data center and copying those applications and their data to the cloud in the form of VM images. The solution usually includes the ability to restart those VMs on the host infrastructure where they are stored.
Server virtualization has greatly simplified the recovery process. Encapsulating an entire server instance into a file or two has made restarting a VM almost as easy as restoring a VMDK or VHD file. While it may not satisfy the requirements of all companies, a bare-bones DR solution can be as simple as locating a version of a backed-up VM and copying its associated files back to the original server or to a standby server.
Recovery in place
VMs can also be run "in place" by pointing a recovery server at the backup storage location. This is the functionality that effectively turned cloud-based backup into cloud DR and spawned the DR-as-a-service market segment. While it promises much faster recovery times than moving VM images back from the cloud, there are some potential issues -- mostly around latency -- that users need to be aware of (see the section on "Cloud backup becomes cloud compute").
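To see why running a VM in place can be so much faster than moving the image back from the cloud, it helps to estimate the copy time. The following is a rough sketch, not a sizing tool: the function name, the 500 GB image, the 100 Mbps link and the 0.7 efficiency factor (an allowance for protocol overhead and contention) are all illustrative assumptions.

```python
def transfer_hours(image_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy image_gb gigabytes over a WAN link.

    image_gb   -- size of the VM image(s) in decimal gigabytes (assumed input)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed that is actually usable
    """
    if image_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    bits_to_move = image_gb * 8 * 1000**3            # decimal GB -> bits
    usable_bits_per_sec = link_mbps * 1_000_000 * efficiency
    return bits_to_move / usable_bits_per_sec / 3600


# Hypothetical example: a 500 GB VM image pulled back over a 100 Mbps link.
print(f"{transfer_hours(500, 100):.1f} hours")  # 15.9 hours
```

Under these assumptions the copy alone takes most of a day before the VM can even be restarted, which is the delay that recovery in place avoids.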
What is DRaaS?
Disaster recovery as a service (DRaaS) is essentially cloud DR as we've defined it. In fact, these two terms are used interchangeably by most vendors who want to establish as broad an appeal as possible. There can be some differences in how much infrastructure is rolled into the monthly payment, like whether the cloud provider is just renting cloud resources or also supplying an on-site appliance. But since cloud DR isn't a colocation kind of solution, where the user provides the equipment, they're all still selling DR packaged as a service.
Cloud DR service providers
There are two common approaches to cloud-based DR:
1. An existing backup provider that adds the option to store and run VM images in its cloud.
2. An existing cloud storage-, compute- or infrastructure-as-a-service provider that adds a disaster recovery service.
Cloud backup provider. Backup and disaster recovery overlap at some point, and both are needed for a complete data protection solution. For this reason, it's logical for an existing backup vendor that provides off-site storage to add cloud DR, as many services have. One common architecture involves the use of an on-site appliance, either a physical box or a VM, to control local backups and manage data transfers to the cloud. Many of these solutions offer physical-to-virtual conversion so they can provide data protection for bare-metal (non-virtualized) servers that need to be included in the DR plan.
Most of these service providers also offer the option of running VMs in the cloud, usually in their own cloud facility. However, the failover and failback processes vary by vendor, as does the sophistication of their cloud infrastructures. Many of these products are tailored for small to midsized companies with messaging that touts disaster recovery as a "one-click" solution. Obviously, buyers need to understand how these solutions work and the potential associated risks. That said, there are certainly some benefits to this approach.
The combination of backup and DR can simplify data protection, and using an on-site appliance means a local copy of data is available for faster file restores and server recovery if needed. Since most downtime events involve a single server or application rather than a site-wide outage, having a local copy to recover from is a logical and efficient solution. A hybrid appliance simplifies data transfers with the cloud and assumes the overhead of this process. It also provides some options to improve failback by handling the synchronization with host servers.
Some of the many vendors that have added cloud DR to a data protection solution with an on-site appliance include Acronis nScaled, Axcient, Barracuda Networks, Datto, Quantum, Quorum and Unitrends. There are cloud backup solutions with disaster recovery options that don't involve an on-site appliance, but the hybrid method is the most common implementation and offers some significant advantages.
Cloud infrastructure provider. The other common approach to cloud-based DR involves services that typically run hosting or cloud-based storage and compute infrastructure businesses and have added a DR option. Some offer platform-specific solutions that integrate with the storage systems their clients currently use to leverage embedded replication features. But most offer generic cloud storage and compute services that support storing and running VMs from the cloud.
Their offerings vary widely, from a complete turnkey solution that's installed on-site and managed by the provider to simply supplying the "building blocks" a company needs to essentially create its own cloud DR solution. The specific architecture used depends on the service provider and the client company's environment, but most involve software running on a dedicated server or VM to handle data transfers with the cloud.
These providers typically focus on the higher end of the market, emphasizing the need for more than just a host in the cloud. Their message is that the quality of the cloud infrastructure shouldn't be taken for granted, nor should the need for engineering and support services, which most providers offer. Many strive to give users a seamless experience when an application fails over to the cloud, addressing "front-end" issues of getting users and other applications reconnected to the failed servers, not just the "back-end" job of running stored VMs.
Some of the companies providing these services were active in the legacy DR market, providing redundant infrastructure solutions, but they have now embraced the advantages of VM-based disaster recovery. A small sampling of the vendors in this group includes Amazon (through partners), Databarracks, Egenera, IBM SoftLayer, Rackspace, Seagate's EVault and Windstream. VMware has also set up a cloud DR service using its hypervisor-based replication engine to get VM images into its cloud.
Don't forget local disasters. While hurricanes and natural disasters grab all the headlines, it's far more likely a company will face application downtime from such mundane causes as hardware failure, corrupted software or human errors. For this reason, a cloud DR solution that includes an on-site storage component and the ability to provide LAN-based recoveries for failed servers can have considerable appeal.
Cloud backup becomes cloud compute. When a company needs to run an application in the cloud, its relationship with a cloud provider changes -- the cloud backup provider becomes a cloud compute provider. The company needs to understand what kinds of service-level agreements (SLAs) the provider offers and how long the provider can support running the company's applications. In the case of a regional disaster, a cloud backup provider's compute infrastructure may be quickly overwhelmed if multiple organizations simultaneously initiate recoveries. To ensure they get the service they're expecting, companies need compute-class SLAs and not just best-effort assurances.
Don't forget reconnection. Restarting failed servers in the cloud is the first step in the recovery process, but it's not the only one. Users and other application servers also need to be reconnected to those VMs. Companies looking at cloud-based DR need to be aware of such details as networking, firewalls, port monitoring, intrusion protection and security if they expect to run production applications in the cloud while they fix their primary infrastructures.
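One simple way to rehearse the reconnection step during a DR test is to confirm that the services users depend on actually answer from the recovered VMs. The sketch below is only a minimal illustration using Python's standard socket module: the function name is hypothetical, the host names in the commented example are placeholders, and a successful TCP connection says nothing about firewalls, security or whether the application behind the port is healthy.

```python
import socket


def check_endpoints(endpoints, timeout=3.0):
    """Return {(host, port): True/False} for a list of (host, port) pairs.

    True means a TCP connection could be opened within `timeout` seconds.
    It does not prove the application behind the port is working.
    """
    results = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the name and opens a TCP connection
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # covers refused connections, timeouts and name-resolution failures
            results[(host, port)] = False
    return results


# Hypothetical usage during a failover test (placeholder host names):
# status = check_endpoints([("mail.example.com", 443), ("erp.example.com", 1433)])
# for (host, port), reachable in status.items():
#     print(host, port, "reachable" if reachable else "NOT reachable")
```

Running the same check from a user network and from the other application servers helps show whether both groups can reach the failed-over VMs.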
Exit strategy. Finally, companies need to understand their exit process -- how failback works and how long it will take. The longer an application runs in the cloud, the longer it will take to synchronize with the primary server over a WAN. For large data sets, the solution may include shipping a storage appliance, but that process will still involve a re-sync to accommodate shipping time.
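The trade-off between a WAN re-sync and a shipped appliance can be roughed out with simple arithmetic. This is a sketch under stated assumptions rather than a planning tool: the function names, the 50 GB-per-day change rate, the 100 Mbps link, the 0.7 efficiency factor and the two-day shipping time are all hypothetical, and the model ignores data that changes while the final re-sync is running.

```python
def hours_to_send(gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours to move `gb` decimal gigabytes over a link at an assumed efficiency."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive, efficiency in (0, 1]")
    return gb * 8 * 1000**3 / (link_mbps * 1_000_000 * efficiency) / 3600


def wan_resync_hours(days_in_cloud, change_gb_per_day, link_mbps, efficiency=0.7):
    """Failback over the WAN: every change made while in the cloud crosses the link."""
    changed_gb = days_in_cloud * change_gb_per_day
    return hours_to_send(changed_gb, link_mbps, efficiency)


def shipped_failback_hours(ship_days, change_gb_per_day, link_mbps, efficiency=0.7):
    """Failback via shipped appliance: transit time plus a re-sync of the
    changes that accumulated while the appliance was in transit."""
    in_transit_gb = ship_days * change_gb_per_day
    return ship_days * 24 + hours_to_send(in_transit_gb, link_mbps, efficiency)


# Hypothetical example: 30 days in the cloud, 50 GB of changes per day, 100 Mbps WAN.
print(f"WAN re-sync:       {wan_resync_hours(30, 50, 100):.1f} hours")       # 47.6 hours
print(f"Shipped appliance: {shipped_failback_hours(2, 50, 100):.1f} hours")  # 51.2 hours
```

With these numbers the two options are close. Because the WAN figure grows with every day the application stays in the cloud while the shipping figure does not, a longer stay or a larger data set tips the balance toward the appliance.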
The bottom line on cloud DR
Cloud disaster recovery is an excellent use of current cloud and virtualization technologies. The growth of server virtualization, cloud services and hybrid backup solutions has made "real" disaster recovery a viable option for those companies that could never justify a traditional DR infrastructure. But consideration must be given to the details of a cloud DR solution, especially the recovery aspects, such as whether the provider offers SLAs, how the cloud environment handles reconnection of users and applications, and what the exit strategy is when the disaster has passed. Companies should also ponder a solution that includes an on-site appliance to provide recovery from local or limited disasters.
About the author:
Eric Slack is an analyst at Storage Switzerland, an IT analyst firm focused on storage and virtualization.