One of the concerns in disaster recovery is how to quickly and effectively execute a technology DR plan to get critical business systems up and running following a disruptive incident. Typical plan documents may have dozens or even hundreds of pages. In an emergency situation, when seconds count, it may take too long to find the right information, gather the right people and then execute the plan.
Options for recovering critical data systems and platforms are numerous, and the DR management software technologies in use are well-established and have proven to be effective. A prime example is "failover" software that monitors existing IT resources, such as servers, looking for possible problems. If a production system suddenly fails, the failover software detects that change in condition, updates the DNS records to point to an available device and redirects processing to the available IT asset.
This option, of course, assumes that backup IT assets are in place -- preferably in an alternate location -- and are configured so they can assume processing duties of the failed device.
The failover software should also be capable of "failback" so that when the disabled device is functioning again, the DNS can be updated to redirect production to the original device.
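To make the failover and failback logic concrete, here is a minimal sketch in Python. It is not how any particular product works: the `FailoverController` class, the in-memory `dns` dictionary standing in for real DNS records, and the `is_healthy` callback are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FailoverController:
    """Hypothetical controller: repoints one hostname between two devices."""
    hostname: str
    primary: str                       # address of the production device
    standby: str                       # address of the backup device
    is_healthy: Callable[[str], bool]  # assumed health probe, supplied by caller
    dns: Dict[str, str] = field(default_factory=dict)  # stand-in for DNS records

    def __post_init__(self) -> None:
        # Production starts on the primary device.
        self.dns.setdefault(self.hostname, self.primary)

    def check(self) -> str:
        """Run one monitoring pass and return the address now in service."""
        active = self.dns[self.hostname]
        if active == self.primary and not self.is_healthy(self.primary):
            # Failover: only redirect if the standby can actually take over.
            if self.is_healthy(self.standby):
                self.dns[self.hostname] = self.standby
        elif active == self.standby and self.is_healthy(self.primary):
            # Failback: the original device is functioning again.
            self.dns[self.hostname] = self.primary
        return self.dns[self.hostname]
```

In practice, a monitoring loop would call `check()` on a schedule, and the health probe and DNS update would talk to real infrastructure rather than a dictionary.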
The above examples are fine for situations where one or two devices fail and no other production systems are affected. But what happens if a massive disaster damages or destroys an entire building, including offices, workstations, telecom systems, data systems, network access devices, storage devices and other IT assets?
Many useful solutions to this challenge are emerging from firms offering cloud-based DR products that are intuitive and easy to use and activate. Examples are EvolveIP, Axcient and Unitrends. Products can be designed to replicate all or part of an entire office and the supporting IT infrastructure so that it's possible to "recover" to a cloud-based office.
Figure 1 shows a more or less normal operating environment using cloud-based DR products. The cloud-based replicated IT environment is regularly updated so that the systems and data are always current.
Figure 2 shows what happens if the primary IT environment and the office areas are suddenly unavailable. A command to the cloud DR product initiates steps to redirect production activities to the cloud-based office.
This is a very simplified example, and assumes a few important factors: 1) employees have Internet access from home or alternative work areas; 2) all DNS tables and other relevant information are available and updated for redirecting service; and 3) IT staff has access to the Internet to remotely manage operations during the disruption.
What does this kind of product do for disaster recovery? First, it means that DR can now be a strategic part of IT operations. Cloud-based failover/failback products make it much easier to integrate DR into IT operations. Second, it means that traditional DR activities can be enhanced with more streamlined recovery and restoration processes, especially for larger-scale recovery scenarios. Third, it means that testing of technology DR plans can be greatly simplified.
Let's examine the testing aspect in more detail. When testing DR plans for data systems, the options range from a simple tabletop exercise to full-system failover and failback. Creation of a playbook or script for these tests is a critical part of a good test. The script documents the steps to take, the proper sequence of steps, the programming commands to enter and the expected outcomes. This is often the most important part of a data system test, because the recovery steps must be in the correct sequence, and the programming commands must be accurate. Otherwise, the test will fail, and in a real disaster such a failure could negatively impact the company.
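A test script of this kind can be expressed in code. The following is a minimal Python sketch under stated assumptions: the `Step` class and `run_playbook` function are hypothetical names, and each step's action is a placeholder function rather than a real recovery command. It runs the steps in sequence, compares each outcome with the expected one, and stops at the first mismatch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One line of a hypothetical DR test script."""
    name: str
    action: Callable[[], str]  # placeholder for the real recovery command
    expected: str              # the outcome the script says we should see


def run_playbook(steps: List[Step]) -> List[Tuple[str, bool]]:
    """Execute steps in order; halt at the first unexpected outcome."""
    results: List[Tuple[str, bool]] = []
    for step in steps:
        ok = step.action() == step.expected
        results.append((step.name, ok))
        if not ok:
            # Later steps depend on earlier ones, so do not continue.
            break
    return results


# Illustrative steps only; the actions just return canned strings.
example_playbook = [
    Step("promote replica database", lambda: "promoted", "promoted"),
    Step("start application servers", lambda: "running", "running"),
    Step("redirect DNS to recovery site", lambda: "updated", "updated"),
]
```

Stopping at the first failure reflects the point that sequence matters: a later step run against a half-recovered system can make matters worse.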
Suppose we could automate the above activities so that a test would be as simple as pushing a single button (or just a few buttons). While "simplified disaster recovery" may be a new term for many of you, the idea is closer to reality than you may realize.
If you can configure a cloud-based replica of certain critical IT assets, or even an entire office, and DR management software is configured and available to redirect production from the main environment to that replica, recovery could be almost instantaneous. You could recover and restart IT operations more quickly than you could relocate IT staff and employees to an alternate location. This, of course, assumes that production data files and databases can be immediately replicated to the cloud. It further assumes that sufficient network bandwidth is available to replicate the data assets to the cloud. It also assumes -- critically -- that you can spin up servers in the cloud.
Naturally, you'll want to give a product like this careful consideration, especially the costs of the cloud compute, storage and replication products you'll need to satisfy your recovery time objectives and recovery point objectives.
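A quick back-of-the-envelope calculation can show whether your network bandwidth supports the recovery point objective you have in mind. The Python sketch below rests on simplifying assumptions that are not from any vendor's sizing guide: decimal units (1 GB = 8,000 megabits), an `efficiency` factor for the usable share of the link, and periodic replication in which the worst-case data loss is one replication interval plus the time needed to ship that interval's changes. The function names are hypothetical.

```python
def replication_hours(data_gb: float, link_mbps: float,
                      efficiency: float = 0.7) -> float:
    """Hours needed to copy data_gb over the link (initial seeding)."""
    megabits = data_gb * 8 * 1000                   # decimal GB to megabits
    seconds = megabits / (link_mbps * efficiency)   # usable throughput only
    return seconds / 3600


def worst_case_rpo_minutes(change_gb_per_hour: float, link_mbps: float,
                           interval_minutes: float,
                           efficiency: float = 0.7) -> float:
    """Assumed model: interval plus transfer time for that interval's changes."""
    changed_gb = change_gb_per_hour * interval_minutes / 60
    transfer_minutes = replication_hours(changed_gb, link_mbps, efficiency) * 60
    return interval_minutes + transfer_minutes
```

For example, under these assumptions, `replication_hours(2000, 100, efficiency=1.0)` shows that seeding 2 TB over a fully usable 100 Mbps link takes about 44 hours, which is why the initial copy is often the bottleneck rather than the ongoing updates.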
Your existing DR plans and procedures will need to be revised. Your plans could become much simpler, especially the parts where you launch the recovery of data systems, data, databases, telecom services, data network services and other IT assets. A simpler recovery process, in turn, could increase the likelihood of a successful data system recovery.
With dramatic improvements in cloud technology and DR management software, the idea of one-touch technology disaster recovery is practically a reality. If your IT DR requirements include rapid failover and failback, potentially on a large scale, it may be just the right time to begin investigating the solutions described in this article.