What you will learn in this tip: Having a disaster recovery (DR) plan in place is essential to restore an IT infrastructure in the event of a disaster. But how much DR planning is enough? This tip offers some insight as to where the line should be drawn to avoid over-planning.
In business disaster recovery planning, a DR plan is created so that critical business processes can still run when a disaster strikes. If a disaster strikes and you have no business DR plan, people scramble to remember how systems were configured. The interruption of critical business processes creates pressure to restart activity as quickly as possible to minimize the impact on the company's finances and public image. But without clearly defined procedures, recovery can be unnecessarily delayed and very costly.
At the same time, not all applications and IT systems support critical business functions. At the risk of oversimplifying, IT systems can be divided into three groups: critical, useful and nice-to-have. Critical systems are the focal point when planning for recovery; you can get around to recovering useful components once the pressure starts to ease, and eventually you can take care of the nice-to-have applications once you have a functional IT environment.
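The three-way split can be made concrete with a small inventory sorted by tier. The following is only a minimal sketch; the system names are hypothetical, and a real inventory would come from your own business impact analysis.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Recovery tiers; a lower value is recovered earlier."""
    CRITICAL = 1
    USEFUL = 2
    NICE_TO_HAVE = 3


@dataclass(frozen=True)
class System:
    name: str
    tier: Tier


def recovery_order(systems):
    """Return systems sorted by tier; sorted() is stable, so the
    original order is preserved within each tier."""
    return sorted(systems, key=lambda s: s.tier)


# Hypothetical inventory, for illustration only.
inventory = [
    System("intranet wiki", Tier.NICE_TO_HAVE),
    System("order processing", Tier.CRITICAL),
    System("reporting", Tier.USEFUL),
    System("payroll", Tier.CRITICAL),
]

ordered_names = [s.name for s in recovery_order(inventory)]
print(ordered_names)
```

Only the systems in the first tier need full recovery procedures in the plan; the rest can be handled once the environment is functional again.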
What should be included in your business' disaster recovery plan?
As surprising as it may sound at first, if your organization relies on systems to support highly critical, zero-downtime processes, the supporting IT infrastructure should not even be part of a recovery plan. Systems for which no downtime is acceptable do not need to be documented at length in a disaster recovery plan; they need to be made highly available and capable of failover to an alternate site. For example, imagine the IT staff of a securities company working through recovery documentation to restore the trading system on a weekday, while trading is halted.
But not all companies rely on zero-downtime IT infrastructures to support the business. Many companies have recovery time objectives (RTOs) in the eight- to 24-hour range for their critical applications.
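The distinction between zero-downtime systems and systems with an RTO measured in hours can be expressed as a simple rule. This is a sketch only: the function name and the returned labels are illustrative assumptions, not part of any standard.

```python
from datetime import timedelta


def recovery_approach(rto: timedelta) -> str:
    """Map a recovery time objective to a planning approach.

    Assumption for illustration: an RTO of zero means no downtime is
    acceptable, so the system belongs in a high-availability design
    rather than in the documented recovery plan.
    """
    if rto <= timedelta(0):
        return "high availability with failover"
    return "documented recovery procedure"


print(recovery_approach(timedelta(0)))
print(recovery_approach(timedelta(hours=8)))
print(recovery_approach(timedelta(hours=24)))
```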
Regardless of your environment's recovery time objective, these are some of the key components that should be documented in a disaster recovery plan. Identify these services in your environment and give them priority, because they must be in place before application systems are recovered:
- The network. Put it at the top of your disaster recovery plan; recovering IT systems without a functional network is painful and impractical.
- Data storage array configuration in a centralized storage environment.
- Authentication and name services, such as DNS, and Active Directory in a Windows environment. Permission levels need to be restored to ensure timely recovery.
- Configuration-specific information for the data backup infrastructure. If critical data has to be restored from a traditional backup system, recovery of that system is even more critical.
- Recovery procedures for critical systems.
Note that the DR plan itself must be stored somewhere other than the systems listed above, because those systems may be the ones that are down when the plan is needed.
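The priority list above implies an ordering: shared services come back before the applications that depend on them. One way to derive such an order is a topological sort. In the sketch below, the dependency map is a hypothetical example and should be replaced with the dependencies of your own environment.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the services in its value set.
# These dependencies are hypothetical, for illustration only.
depends_on = {
    "network": set(),
    "storage array": {"network"},
    "dns and directory": {"network"},
    "backup infrastructure": {"network", "storage array", "dns and directory"},
    "critical application": {"backup infrastructure"},
}

# static_order() yields every service after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

for step, service in enumerate(order, start=1):
    print(step, service)
```

If the map ever contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is a useful signal that the recovery sequence needs to be rethought.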
What should not be included in your business disaster recovery plan?
Some things should not end up in a business disaster recovery plan because they only create clutter and unnecessary maintenance work. Some examples include:
- OS and application installation documentation. Focus on site-specific configuration as mentioned earlier; there is no value in rewriting the Windows or Oracle installation manuals.
- Call trees and notification phone lists, if you can eliminate them; these lists change frequently and require too much maintenance. Small companies can instead store entire phone directories on cell phones. Larger organizations need to consider mass emergency notification services; you cannot call 500 or 1,000 employees to notify them that something has happened.
- Disaster recovery procedures for non-essential systems. They make the plan bulkier and harder to maintain. Secondary systems are usually easier to recover because other IT components are already back in place. Site-specific configuration information for secondary systems can be stored elsewhere (e.g., off-site or on electronic media), referenced in the plan and accessed only when those systems are needed. At that point, however, the work is more a system rebuild task than a disaster recovery effort, which is why these procedures should be kept out of the main body of the DR plan.
Automation is key in business disaster recovery planning
Many IT components are now sophisticated enough to allow much of the recovery process to be automated. In such cases, documenting configuration details or the recovery procedure itself becomes secondary. Configuration files can be created, stored off-site and uploaded to a replacement device to restore the environment as it was (assuming, in most of the cases below, that the recovery site has identical equipment). Some examples include:
- SAN fabric zoning
- Firewall rules
- Network switch configuration
- System images in a virtual environment
- Storage array configuration
- Authentication and Name Services (Active Directory)
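As a rough illustration of storing exported configuration files off-site, the sketch below copies files to a target directory and records a SHA-256 checksum for each, so the copies can be verified before they are uploaded to a replacement device. The function names, the manifest format and the demo file are assumptions made for this example; how each device exports its configuration is vendor-specific and not shown.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot_configs(config_files, offsite_dir):
    """Copy exported config files to offsite_dir and write a checksum
    manifest. Files are stored by base name, so names must be unique."""
    offsite = Path(offsite_dir)
    offsite.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for src in map(Path, config_files):
        dest = offsite / src.name
        shutil.copy2(src, dest)
        manifest[src.name] = hashlib.sha256(dest.read_bytes()).hexdigest()
    (offsite / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_snapshot(offsite_dir):
    """Return the names of files whose contents no longer match the manifest."""
    offsite = Path(offsite_dir)
    manifest = json.loads((offsite / "manifest.json").read_text())
    return [
        name for name, digest in manifest.items()
        if hashlib.sha256((offsite / name).read_bytes()).hexdigest() != digest
    ]


# Demo with throwaway files; the file name and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    exported = Path(tmp) / "switch-core.cfg"
    exported.write_text("vlan 10\n")
    offsite = Path(tmp) / "offsite"
    snapshot_configs([exported], offsite)
    clean = verify_snapshot(offsite)       # nothing changed yet
    (offsite / "switch-core.cfg").write_text("vlan 99\n")
    tampered = verify_snapshot(offsite)    # the altered copy is reported
```

In practice the target would be a location at or near the recovery site rather than a local directory, and the snapshot would run on a schedule so the stored configuration tracks changes in production.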
Data replication and failover in DR planning
Last, affordable data replication and failover options are increasingly available for physical and virtual server environments. This is a large topic in itself, but in short, an IT infrastructure that includes off-site data replication and failover capability requires far less documentation and far fewer detailed recovery procedures than an environment that must be rebuilt and recovered.
By leveraging automation, virtualization, replication and failover technologies as outlined in this tip, IT staffs can focus on creating a lightweight, easy-to-maintain business disaster recovery plan that contains only the information required to initiate a successful recovery effort.
About this author: Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.