Disaster recovery remains an ongoing challenge for storage professionals because, unlike many other areas of IT, the circumstances surrounding DR have become even more complex and difficult to get a handle on in recent years.
First, the definition of a "disaster" has expanded to include almost any service interruption. Second, the causes of a disaster are many. There are the natural disasters that we all hear about (earthquakes, floods, hurricanes) and there are man-made disasters like cyberattacks, ransomware, accidental user error or corporate sabotage. Third, the amount of data we have to consider in case of a disaster has grown exponentially. And, fourth, user expectations for a rapid recovery have increased tremendously. Today, users expect little to no interruption in their key applications.
Meanwhile, as a backdrop to these difficulties, most organizations' DR budgets remain stagnant. As a result, many enterprises -- from the smallest to the largest -- are looking to their cloud disaster recovery plan to eliminate these problems while reducing overall spending on DR.
Backup, archive and DR in the cloud
Understanding that the cloud is a service delivery model (see "What is the cloud?") helps identify which data protection processes and use cases it suits and which it doesn't. Most data protection processes have three basic components: backup, archive and recovery.
Backup is the copying of data with the goal of having a version of that data available for recovery in the case of a failure or loss. Most organizations keep backups for a finite number of years (three to seven is typical), so leveraging the cloud for backup means renting the capacity to store each day's backup for at least that length of time. Depending on the amount of data you need to store, this may or may not be as cost-effective as doing it yourself.
Archive is the long-term retention of data. This should be the place where a final copy of data is stored -- ideally, limited to only two copies of each piece of data in two separate locations and, preferably, two formats. The stronger an organization's archive strategy, the smaller the backup footprint will be in terms of the amount of data protected and the length of time it takes to recover that data. Most organizations need to keep archive data for at least seven years, potentially much longer. Renting storage for decades isn't going to be cost-effective for most organizations, so the cloud may not be the right destination for archive data.
Recovery, specifically disaster recovery, almost always requires the most recent copy of data. Any recoveries outside of that "most recent" time frame should come from an archive, if at all possible. The cloud is a great location for DR, as only the most recent copy of data has to be stored there, and most -- if not all -- providers allow you to use their compute resources in the event of disaster. This saves on the cost of maintaining your own server and storage hardware at a remote DR site, all without adding too much to backup-disk capacity costs. Consequently, cloud-based DR, or disaster recovery as a service (DRaaS), has quickly become an ideal way to solve DR challenges.
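To make the capacity argument concrete, here is a rough Python sketch comparing how much cloud storage a multiyear backup retention policy holds with a DR copy that keeps only the latest version. The data size, daily change rate and price per terabyte are hypothetical figures chosen for illustration, not real provider pricing, and the model ignores compression and deduplication.

```python
def retained_backup_tb(full_tb, daily_change_tb, retention_years):
    """Steady-state capacity: one full copy plus every daily change kept for the retention window."""
    retention_days = retention_years * 365
    return full_tb + daily_change_tb * retention_days


def monthly_rent(capacity_tb, price_per_tb_month):
    """Monthly storage rent for a given capacity."""
    return capacity_tb * price_per_tb_month


# Hypothetical inputs, not real provider pricing.
FULL_TB = 10.0
DAILY_CHANGE_TB = 0.1
PRICE_PER_TB_MONTH = 20.0

backup_tb = retained_backup_tb(FULL_TB, DAILY_CHANGE_TB, retention_years=5)
dr_tb = FULL_TB  # DR keeps only the most recent copy

print(f"Backup retention: {backup_tb:.1f} TB, ${monthly_rent(backup_tb, PRICE_PER_TB_MONTH):,.0f}/month")
print(f"DR latest copy:   {dr_tb:.1f} TB, ${monthly_rent(dr_tb, PRICE_PER_TB_MONTH):,.0f}/month")
```

Under these assumptions, the retained backups occupy many times the capacity of the single DR copy, which is the gap the rental model magnifies month after month.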
What is the cloud?
First, let's be clear: The cloud isn't some magical place that solves all IT problems. It's a data center that has to struggle with the same laws of physics as private data centers. The advantage is that a cloud provider is an entire organization focused solely on delivering IT services with -- for the most part -- top-notch data center designs.
Cloud providers also have the advantage of economies of scale, meaning the sheer quantity of IT products purchased gives them a significant cost advantage. Combine these economies with most providers' heavy investments in process and automation, and you've got operational costs that are typically much lower on a per-unit basis.
Remember, the cloud is primarily an IT delivery model. Instead of buying IT products and services upfront, enterprises rent them on a periodic basis. The rental nature of IT services means that the cloud is very attractive for temporary use and less attractive for more permanent cases.
Let's take a look at the steps you should take and the things you should consider to develop the most effective cloud disaster recovery plan for your organization.
Step 1: Get data to the cloud
The first step in building a cloud disaster recovery plan is, of course, to get your data to your provider. Remember to optimize costs by controlling how much data is stored there -- preferably just the latest copy. Also, while cloud data protection options are numerous, they can typically be broken down into two types: products that back up data and those that replicate data. The difference between them is in how data is stored.
Most cloud backup products store data in a proprietary backup format. In the event of disaster, data has to be extracted and converted into a format a virtual machine can access. Most cloud backup products also leverage an on-premises appliance that captures all backups first and then copies changed data to a cloud location. Replication products, by contrast, nearly all store the data they send to the cloud in a native file-system format that's immediately accessible in the event of a primary-site failure. Customers can even choose to have this data stored on high-performance cloud storage to make "return to operations" even faster.
With backup or replication, there's the challenge of initial data seeding. It can take hours, days or even weeks to transmit the data that will create the cloud baseline that the backup or replication software will need to compare against. To speed up the process, some cloud vendors ship a ruggedized high-capacity NAS to the customer. The foundational data set is copied onto it and then the NAS is shipped back. Ideally, however, more cloud providers would use tape, which is easier to ship and more cost-effective for seeding. Once data is in the cloud, daily updates generally happen easily, because technologies like compression, deduplication and changed block replication significantly thin the amount of data that has to move across the network.
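The contrast between the initial seed and the daily update can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: the 80% link efficiency, the 4 KB block size and the sample data are all assumptions, and real changed block replication tracks changes far more efficiently than rehashing everything.

```python
import hashlib


def seeding_days(data_tb, link_mbps, efficiency=0.8):
    """Estimate days to send data_tb (decimal terabytes) over a link_mbps connection.

    efficiency is an assumed fraction of the link actually usable for the transfer.
    """
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 86_400


def block_digests(data, block_size=4096):
    """Hash each fixed-size block so later versions can be compared against the baseline."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(baseline_digests, current, block_size=4096):
    """Return indexes of blocks that are new or differ from the baseline."""
    changed = []
    for index, digest in enumerate(block_digests(current, block_size)):
        if index >= len(baseline_digests) or digest != baseline_digests[index]:
            changed.append(index)
    return changed


# Hypothetical example: 50 TB over a 100 Mbps link.
print(f"Initial seed: about {seeding_days(50, 100):.0f} days")

# Hypothetical three-block volume in which only the middle block changed.
baseline = block_digests(b"a" * 8192 + b"b" * 4096)
today = b"a" * 4096 + b"c" * 4096 + b"b" * 4096
print("Blocks to send:", changed_blocks(baseline, today))
```

With these made-up inputs, the seed takes close to two months over the wire, while the daily update sends only one block of three. That gap is why shipping physical media for the baseline can make sense.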
There is a third method to move data to the cloud -- run production data in the cloud itself. This involves either using an on-premises cache so that local applications don't suffer from latency or shifting the entire workload to the cloud. While the operational gains of placing both primary storage and secondary storage in the cloud may outweigh the costs, customers have to be comfortable with moving all their data to the cloud. And, although this method also eliminates the ongoing transfer of data to the cloud, you have to make sure your cloud vendor provides acceptable resiliency for the data it's storing.
Step 2: Declaring a disaster in the cloud
At least once in every IT professional's career, a disaster will occur. In fact, it is more likely today than ever, since the definition of "disaster" has expanded from a data center wipe to an important application becoming unavailable. If a disaster is isolated, meaning that it only impacts one workload, then failing over both compute and data to the cloud provider is unnecessary and, frankly, unwanted. Typically, an organization is better off recovering on premises. Some cloud backup products can leverage their appliance in the on-premises data center to host the failed application's data and, in some cases, to provide the compute needed to run a virtual machine version of that application.
Replication-based DRaaS products should be set up to have a local storage target in addition to the cloud. That way, the organization can recover locally or in the cloud.
For more extensive disasters, where your data center becomes unavailable, true failover to the cloud is required. Here, the first step after disaster is to start, in the cloud, all the services the high-priority applications require to operate, such as DNS and directory services, and then all the servers that make up each application. Last, adjust the networking configurations so that users logging in can seamlessly access the now cloud-hosted application.
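The ordering described above, with supporting services first, application servers next and networking last, is essentially a dependency sort. The following Python sketch shows one way to express such a runbook. The service names and their dependencies are hypothetical, and it relies on the standard library's graphlib module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the services it needs running first.
DEPENDS_ON = {
    "dns": set(),
    "directory": {"dns"},
    "database": {"dns", "directory"},
    "app-server": {"database", "directory"},
    "web-frontend": {"app-server"},
}

# The networking change always comes last, once the application is up.
NETWORK_CUTOVER = "redirect user traffic to cloud"


def failover_runbook(depends_on):
    """Order service startup so dependencies come first, then finish with the network change."""
    startup = list(TopologicalSorter(depends_on).static_order())
    return startup + [NETWORK_CUTOVER]


for step, action in enumerate(failover_runbook(DEPENDS_ON), start=1):
    print(step, action)
```

Keeping the dependency map in a reviewable form like this also gives a DR test something specific to validate: If a service is missing from the map, the test failover is where that should surface.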
Obviously, testing the DR process before it's actually needed is critical to make certain all these steps work, especially the networking changes. It's also important to factor end-user conditions into the changes. They likely won't be at a central office and may be logging in from a virtual private network connection at a coffee shop or home.
The process is similar for enterprises that put all their data in the cloud but keep compute on premises. The new network routing issues are the same, but since the data is already in the cloud, you merely need to start the applications alongside that data to get up and running.
Step 3: The return
At some point, you will want to leave the cloud and return to normal on-premises operations. The cloud exit is one of the more difficult aspects of leveraging a cloud disaster recovery plan. The techniques IT uses to facilitate the daily transfer of data to the cloud won't work here because there is no baseline available for comparison.
In most cases, you can restore data from the cloud, albeit slowly, while the provider continues to host your application. Once data's done transferring, perform a quick data sync and then switch operations back to your enterprise's primary data center. Depending on the amount of data to be transferred and available bandwidth, this transfer could take days or even weeks -- all while your organization is paying a surcharge on the compute it's using in the cloud.
DRaaS providers should offer the ability to mass ship data directly to a customer's new data center. As mentioned before, this can be accomplished through ruggedized NAS or tape. This would allow the baseline copy to be created much faster while affected applications still run in the provider's cloud.
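A back-of-the-envelope comparison shows why the shipping option matters. The Python sketch below estimates how long each return path keeps applications running in the provider's cloud and what that costs in compute. Every number in it is hypothetical, including the 80% link efficiency, the five-day shipping turnaround and the daily compute charge, and it leaves out the final data sync.

```python
def network_restore_days(data_tb, link_mbps, efficiency=0.8):
    """Days to pull data_tb (decimal terabytes) back over a link_mbps connection."""
    bits = data_tb * 1e12 * 8
    return bits / (link_mbps * 1e6 * efficiency) / 86_400


def compare_return_paths(data_tb, link_mbps, shipping_days, compute_cost_per_day):
    """Return {option: (days, cloud compute cost accrued while waiting)}.

    shipping_days is an assumed end-to-end turnaround for a shipped NAS or tape,
    including the time to copy data onto and off the device.
    """
    options = {
        "network": network_restore_days(data_tb, link_mbps),
        "shipped device": shipping_days,
    }
    return {name: (days, days * compute_cost_per_day) for name, days in options.items()}


# Hypothetical: 20 TB to return, 100 Mbps link, 5-day shipping, $300/day cloud compute.
for name, (days, cost) in compare_return_paths(20, 100, 5, 300).items():
    print(f"{name}: {days:.1f} days, ${cost:,.0f} in cloud compute")
```

In this example, the network restore runs for more than three weeks, so the compute bill accrued while waiting is several times that of the shipped device. With a faster link or less data, the comparison can easily tip the other way.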
The ordeal of return illustrates another advantage of the "already in the cloud" case. Since no data has to be moved, you can start applications in the new data center -- with data cached locally -- as systems begin to access data in the cloud. Only active data has to be copied on premises.
In terms of storage, the cloud is most valuable and cost-effective for disaster recovery. That's because DR requires less storage capacity than other data protection processes and its data is seldom read, and because the cloud provides access to compute that can leverage that data for a quick restart of mission-critical applications in the event of a disaster. A cloud disaster recovery plan also eliminates much of the cost of an organization's DR strategy, as you only have to pay for compute resources when testing your DR plan or when an actual disaster occurs.