Each year, organizations spend significant amounts of money to develop and document a disaster recovery (DR) plan.
This effort is usually driven by a business requirement, and project participants typically start out with good intentions, diligently planning for disaster recovery. The end goal is a comprehensive standard operating procedures (SOP) document that can be referenced in the event of a disaster to ensure a timely and successful recovery. Ideally, this exercise is repeated every year. However, many DR strategies are carefully planned once and then not revisited for several years, which is a dangerous habit for organizations to get into.
Disaster recovery plans that are allowed to fall out of date become useless. For many companies, a plan that is two or three years old will not reflect the current environment unless it has been maintained, and it will be difficult to execute in the event of a disaster, especially if the people following it are not fully aware of what has changed. The items that follow are where most DR plan problems typically occur, and they need to be updated regularly in a disaster recovery plan.
Contact information
Employees, suppliers and service providers can change frequently, sometimes faster than the technology we use. Wrong contact information will not automatically doom a recovery effort, but it can cause significant and costly delays.
Hardware configuration
When using a third party for an alternate recovery site, the terms and cost of the service are usually tied to a hardware list that will be made available by the service provider at the recovery site within an agreed-upon amount of time in the event of a disaster. If the hardware configuration documentation is not maintained, the service provider will provide equipment according to the original list. Items on the original list may fall short of meeting the latest requirements and cause a failure to recover within the expected or required timeframe.
Recovery procedures
If the hardware configuration changes, so will the recovery procedures. Failure to update the site-specific recovery procedures can lead to additional delays or errors that are sometimes hard to resolve quickly. Recovery procedures are usually hardware and software dependent, so if the hardware configuration changes were not recorded, chances are the recovery procedures will be incorrect.
Data backup and restore method
Introducing new data backup and recovery technologies is not uncommon, especially in changing or growing environments. For example, adding a disk backup element to a previously all-tape solution is a common practice. That said, failure to document the change, or to ensure that the backup frequency, granularity and restore capabilities still meet the requirements, can introduce gaps in the availability of data and lead to potentially serious recovery problems.
Recovery requirements and priority
The recovery time objectives (RTOs) and recovery point objectives (RPOs) are dictated by business requirements but depend on the backup method and the type of hardware used. Undocumented changes to hardware configuration, the respective recovery procedures or data restorability can lead to a failure to meet RTO, RPO and recovery priority.
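The dependency described above can be checked mechanically once the objectives and the backup details are written down. The following is a minimal sketch, assuming a hypothetical `RecoveryTarget` record per system. The field names, system names and numbers are illustrative and not taken from any real plan. It treats one full backup interval as the worst-case data loss, and it relies on a restore time measured during a test.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    """Hypothetical record for one system in the DR plan."""
    name: str
    rpo_hours: float                # maximum tolerable data loss
    rto_hours: float                # maximum tolerable downtime
    backup_interval_hours: float    # time between successive backups
    estimated_restore_hours: float  # restore time measured in the last test


def find_gaps(targets):
    """Return (system, objective) pairs where the documented setup misses an objective."""
    gaps = []
    for t in targets:
        # Assumption: worst-case data loss is roughly one full backup interval.
        if t.backup_interval_hours > t.rpo_hours:
            gaps.append((t.name, "RPO"))
        if t.estimated_restore_hours > t.rto_hours:
            gaps.append((t.name, "RTO"))
    return gaps


# Illustrative values only.
targets = [
    RecoveryTarget("billing-db", rpo_hours=4, rto_hours=8,
                   backup_interval_hours=24, estimated_restore_hours=6),
    RecoveryTarget("file-server", rpo_hours=24, rto_hours=48,
                   backup_interval_hours=24, estimated_restore_hours=12),
]
print(find_gaps(targets))  # [('billing-db', 'RPO')]
```

A check like this is only as good as the numbers fed into it, which is one more reason to keep the backup method and hardware details in the plan current.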
New systems or applications
As new systems are brought online, failure to include them in the disaster recovery plan with their respective recovery procedures automatically renders the plan outdated. This is a frequent omission with organizations that have already allowed their DR plan to collect a little dust.
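One way to catch this omission early is to compare the current system inventory against the systems that have recovery procedures in the plan. The sketch below assumes both lists can be exported as plain system names. The function name and the sample names are hypothetical.

```python
def compare_inventory_to_plan(inventory, plan_entries):
    """Return (missing_from_plan, no_longer_in_inventory) as sorted lists."""
    inventory, plan_entries = set(inventory), set(plan_entries)
    missing_from_plan = sorted(inventory - plan_entries)
    no_longer_in_inventory = sorted(plan_entries - inventory)
    return missing_from_plan, no_longer_in_inventory


# Hypothetical system names, for illustration only.
inventory = ["billing-db", "crm-app", "file-server", "mail-gateway"]
plan_entries = ["billing-db", "file-server", "legacy-fax"]

missing, retired = compare_inventory_to_plan(inventory, plan_entries)
print("Not in DR plan:", missing)                 # ['crm-app', 'mail-gateway']
print("In plan but not in inventory:", retired)   # ['legacy-fax']
```

The second list is worth reviewing as well, since procedures for retired systems add clutter to a document that has to be followed under pressure.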
Incomplete disaster recovery plans
Although being incomplete does not make a disaster recovery plan obsolete, it may render the plan unusable or unreliable in the event of a disaster. At that point, the difference between an outdated procedure and a missing one becomes secondary.
What follows are a few simple tips to help keep your disaster recovery plan documentation current and usable in time of need.
- Assign ownership: When the project that was initiated to create the DR plan is closed, ownership of the plan must be assigned to someone who is responsible for ensuring it is maintained and updated. This role is typically that of a disaster recovery coordinator and should be assigned to a senior staff member who is less subject to task reassignment or employment changes. This person is responsible for making sure disaster recovery planning is an ongoing process.
- Regular testing: This is the easiest way to ensure that any undocumented changes or omissions in the plan are captured early. The disaster recovery plan must be tested upon completion and regularly from that point on, as well as whenever major changes take place in the IT environment.
- Integration with change management: To capture changes to the environment that may affect the recovery procedures as they occur, disaster recovery planning must be integrated with change management. This can start with inviting the DR coordinator to the change approval board meetings. If your organization does not have a change management process in place, this would be a good time to implement one.
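The change management tip above can be partly automated by flagging any change request that touches a system covered by the DR plan. This is a minimal sketch, assuming a hypothetical `ChangeRequest` record with a set of affected systems. Real change management tools expose their own data models, so the fields and identifiers here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical change record; the fields are illustrative only."""
    change_id: str
    summary: str
    affected_systems: set = field(default_factory=set)


def needs_dr_review(change, dr_covered_systems):
    """Return the DR-covered systems this change touches, sorted for stable output."""
    return sorted(change.affected_systems & set(dr_covered_systems))


dr_covered = {"billing-db", "mail-gateway"}
changes = [
    ChangeRequest("CHG-101", "Replace tape library with disk target", {"billing-db"}),
    ChangeRequest("CHG-102", "Update lobby display software", {"lobby-kiosk"}),
]
for change in changes:
    touched = needs_dr_review(change, dr_covered)
    if touched:
        print(f"{change.change_id}: update DR procedures for {', '.join(touched)}")
```

A flag like this does not replace the DR coordinator's seat at the change approval board. It simply makes it harder for a relevant change to slip by unnoticed.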
Many IT organizations tend to rely heavily on the skills of their IT staff to figure out and work around environment changes that were never recorded in the DR plan, should the plan ever be needed. Unfortunately, in these times of shifting and shrinking staffs, the people assigned to the recovery effort in the event of a disaster may have little knowledge of how things were once set up or how they have changed. This can make the difference between a relatively smooth recovery and a long, painful and costly one that could have been avoided with a little due diligence and a few simple processes.
About this author: Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.