Regardless of the type of disaster recovery solution an organization uses, testing is essential. It would be a tremendous leap of faith to simply assume that a disaster recovery solution works as advertised and is configured correctly. Thorough testing is a must. The question is how to go about that testing.
The only way to adequately protect your data is to have three copies. You need the original data, an on-premises backup (for quick recovery) and an off-premises (in this case, cloud-based) copy, so that you won't lose your backups if the data center is destroyed.
The reason I mention this is that this philosophy of data protection bears heavily on the type of disaster recovery testing that you must do. If you are using the cloud as an off-premises backup solution to supplement your on-premises backups, then routine restorations will usually be made from the on-premises backups. It also means that something must be seriously wrong if you are restoring from the cloud instead of from an on-premises backup. This could be something as simple as a bad tape, but it may very well be something of the magnitude of having your data center destroyed.
That being the case, it is extremely important to perform cloud recovery testing that simulates a real recovery following a major catastrophe. It is a good idea to do this testing from an isolated network segment, so that your production network is not visible to the recovery process.
There are a couple of reasons for doing this. First, you will want to make sure that the recovery testing does not in any way interfere with the production network. Second, following a real-world catastrophe, the resources on your production network wouldn't exist.
The goal behind your first test is to determine what it actually takes to recover your data from the cloud. As you benchmark the recovery process, you might very well discover that bandwidth limitations make it impossible to recover data from the cloud quickly enough to adhere to your service level agreements.
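Before you run that first test, a back-of-the-envelope calculation can tell you whether a cloud restore has any realistic chance of fitting within your service level agreement. The sketch below is only an illustration. The function names, the 450 GB data set, the 100 Mbps link and the 70 percent efficiency factor are all assumptions rather than measurements, and deduplication or compression would change the real numbers.

```python
def estimated_restore_hours(data_gb: float, link_mbps: float,
                            efficiency: float = 0.7) -> float:
    """Rough time, in hours, to pull data_gb gigabytes over a link_mbps link.

    efficiency is an ASSUMED fraction of the nominal bandwidth that the
    restore actually achieves; replace it with a figure from your own tests.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and rates must be positive, efficiency in (0, 1]")
    bits_to_move = data_gb * 8 * 1000**3               # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1000**2 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


def fits_sla(data_gb: float, link_mbps: float, sla_hours: float,
             efficiency: float = 0.7) -> bool:
    """True if the estimated restore finishes within the allowed hours."""
    return estimated_restore_hours(data_gb, link_mbps, efficiency) <= sla_hours


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the arithmetic.
    hours = estimated_restore_hours(data_gb=450, link_mbps=100)
    print(f"Estimated restore time: {hours:.1f} hours")
    print("Fits an 8-hour SLA:", fits_sla(450, 100, sla_hours=8))
```

An estimate like this is no substitute for an actual test restore, but if the arithmetic already exceeds your service level agreement, you know up front that you will need a faster recovery option.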
Obviously, Internet access is a requirement, but depending on how you have backed up and secured your data, there may be other requirements as well. For example, I once saw an organization that was unable to recover cloud data because it lacked the necessary digital certificates. Not every organization protects its cloud backups in the way that this one did, but it is vitally important to find out before a disaster strikes whether a recovery depends on any external components, such as a certificate authority.
Once you have confirmed that you are able to recover data from the cloud, the next type of testing that I recommend doing is performance testing. When disaster strikes, your bosses and the organization's customers will demand to know how long it will be before service is restored. On-screen progress bars are notorious for being inaccurate. The only way to really know how long a recovery will take is to do benchmark testing.
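One simple way to capture benchmark numbers is to wrap each test restore in a timer. The following is a minimal sketch, not tied to any particular backup product: benchmark_restore and the restore callable it accepts are hypothetical names, and the callable stands in for whatever command or API call your backup software actually uses.

```python
import time
from typing import Callable, Dict


def benchmark_restore(restore: Callable[[], int]) -> Dict[str, float]:
    """Time a single restore run and report its throughput.

    restore is a HYPOTHETICAL callable: it should perform the restore with
    your own backup tool and return the number of bytes it restored.
    """
    start = time.perf_counter()
    restored_bytes = restore()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast (for example, stubbed) runs.
    if elapsed > 0:
        mb_per_second = (restored_bytes / 1_000_000) / elapsed
    else:
        mb_per_second = float("inf")
    return {
        "seconds": elapsed,
        "bytes": float(restored_bytes),
        "mb_per_second": mb_per_second,
    }


if __name__ == "__main__":
    def pretend_restore() -> int:
        # Stand-in for a real restore; sleeps briefly and claims 5 MB restored.
        time.sleep(0.05)
        return 5_000_000

    print(benchmark_restore(pretend_restore))
```

Running each benchmark more than once, and at different times of day, gives you a range rather than a single figure, which is usually a more honest answer when someone asks how long the recovery will take.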
As you benchmark the recovery process, try using a variety of data types, because cloud backups tend to rely heavily on deduplication. Deduplication reduces the amount of data that has to cross the Internet, so transfers finish more quickly than they otherwise would. The problem is that some types of data deduplicate better than others. As such, you will likely find that some types of data can be restored much more quickly than others. You can use your benchmark results to plan the order in which data should be restored in a real emergency. You could perform the fast restorations first to get as many resources online as possible before moving on to the longer restorations.
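The fastest-first ordering described above can be sketched in a few lines. This is an illustration only: RestoreJob, restore_order and completion_schedule are made-up names, and the workloads and durations are invented placeholders for figures you would take from your own benchmarks.

```python
from typing import List, NamedTuple, Tuple


class RestoreJob(NamedTuple):
    name: str
    measured_minutes: float  # taken from your own benchmark runs


def restore_order(jobs: List[RestoreJob]) -> List[RestoreJob]:
    """Shortest measured restore first; ties keep their original order."""
    return sorted(jobs, key=lambda job: job.measured_minutes)


def completion_schedule(jobs: List[RestoreJob]) -> List[Tuple[str, float]]:
    """Minutes after the start at which each job finishes, assuming the
    restores run one at a time in the order produced by restore_order."""
    schedule = []
    elapsed = 0.0
    for job in restore_order(jobs):
        elapsed += job.measured_minutes
        schedule.append((job.name, elapsed))
    return schedule


if __name__ == "__main__":
    # Hypothetical workloads and timings, for illustration only.
    jobs = [
        RestoreJob("video archive", 600),
        RestoreJob("file shares", 90),
        RestoreJob("mail database", 240),
    ]
    for name, finished_at in completion_schedule(jobs):
        print(f"{name}: online after {finished_at:.0f} minutes")
```

Sorting purely by duration follows the suggestion of getting as many resources online as possible early. If some slow restores are business-critical, you would need to adjust the order by hand.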
If recovery from the cloud is too time-consuming, then the next logical step is to look for ways to make the recovery process faster. Some cloud providers, for example, will ship you a copy of your data on tape or on a removable storage device in an effort to expedite the recovery process.
You should check with your backup provider ahead of time to determine whether they offer such a service, what the service costs are and what the turnaround time is for receiving a physical copy of your data. It is also a good idea to make sure that the data is in a format that you can actually restore. For example, it does no good to receive a tape containing a copy of your data if you don't have a tape drive that can read the tape.
As you work to test cloud-based disaster recovery, be sure to work through a variety of disaster recovery scenarios. For example, you might start out by testing your ability to do bare metal recovery, but you should also test application-level recovery, file and folder recovery, and infrastructure recovery. Infrastructure recovery involves recovering infrastructure components, such as Active Directory, DNS servers, DHCP servers and enterprise certificate authorities.
As you work through the various recovery types, you should be sure to document the recovery procedures, so that you don't have to resort to using trial and error during a real recovery. Different types of recoveries will inevitably require you to use different recovery procedures. Familiarizing yourself with and documenting these procedures will help to make the recovery process easier (and reduce the chances of making a mistake) in the event of a real disaster.
It is extremely important to verify the recoverability of your cloud backups before disaster strikes. The most effective way to accomplish this is through comprehensive testing that simulates a number of different disaster recovery scenarios.
About the author:
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.