Disaster recovery planning has become a rallying cry in many organizations, driven by high-profile natural and manmade disasters and by increasingly strict laws and regulations that mandate the protection and long-term retention of data, an organization's most irreplaceable asset after its skilled personnel.
Over the next four columns, I will present an in-depth treatment of what disaster recovery planning involves from a practical standpoint. For those who have not previously worked on a planning project, I hope this series will dispel the myths and help you focus on what needs to be done to develop an effective recovery capability.
DR planning methodologies are often branded to specific consulting practices and represented as complex, convoluted processes known only to a few privileged practitioners. In fact, DR planning is a straightforward application of common sense that follows a pragmatic project plan, much like a systems development lifecycle methodology.
As shown in this diagram, the initial DR planning project involves 10 tasks, which may be grouped into three phases. The first phase may be referred to as "analysis." As shown in the next figure and discussed below, the tasks in this phase include project initiation, data collection and the completion of a preliminary risk analysis. The subsequent phases are "design," in which the capabilities needed to recover from a disaster are created, and "implementation," in which the selected recovery strategies are tested and the test results feed back into the planning process. This column focuses on the first phase; subsequent columns cover the remaining phases.
The disaster recovery planning project typically begins with project initiation. This early task includes assembling a project team, possibly including internal and external subject matter experts, and formulating plan objectives, a budget and other logistical matters. One thing you should consider doing early is standardizing on the software tools that planning team members will use to collect and assemble data. Numerous DR templates are available in print and as software to help standardize data collection, but desktop productivity software such as e-mail, word processing, spreadsheets or an easily managed database works just as well. The point is to have everyone submit documents in the same format for ease of consolidation and analysis later on.
Data collection, the next task, involves gathering information on internal applications and infrastructure, information on risks and exposures drawn from various media, and case study data on the disaster avoidance and recovery techniques used by other companies. The data is often categorized and placed in a "data store," or central reference repository, where those involved in later tasks can access it more readily.
Data collection aims to identify business processes and the applications and infrastructure that support them. You will also want to collect information about what an outage of 24, 48 and 72 hours affecting each business process would cost the organization. Interview department or business unit managers to obtain these costs both in hard dollars and in intangible losses (consumer confidence, etc.).
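To make the idea of a common submission format concrete, here is a minimal sketch of one record per business process, written out as CSV so that any spreadsheet can open it. The field names, the 1-to-5 intangible rating and the sample figures are all assumptions made for illustration; they are not drawn from any particular DR template.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class OutageCostRecord:
    """One interview result for one business process (hypothetical schema)."""
    business_process: str
    department: str
    applications: str        # semicolon-separated application names
    cost_24h: float          # hard-dollar loss after a 24-hour outage
    cost_48h: float          # hard-dollar loss after a 48-hour outage
    cost_72h: float          # hard-dollar loss after a 72-hour outage
    intangible_score: int    # assumed rating: 1 (minor) to 5 (severe)
    intangible_notes: str = ""


def to_csv(records):
    """Serialize records to CSV text with a fixed column order."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(OutageCostRecord)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample data, for illustration only.
sample = [
    OutageCostRecord("Order entry", "Sales", "CRM;ERP",
                     50_000, 120_000, 250_000, 4, "customer confidence"),
    OutageCostRecord("Payroll", "Finance", "HRIS",
                     5_000, 10_000, 40_000, 2),
]
csv_text = to_csv(sample)
```

Because every team member fills in the same columns, the submissions can be concatenated and sorted without any manual cleanup.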
Next, you are off to risk analysis. Risk analysis involves assigning recovery priorities to business processes and to the applications and infrastructure that support them. In part, it is an attempt to predict the company's loss exposure to interruptions of applications. Based on factors such as the accrued dollar loss exposure of an interruption event, a criticality is assigned to each business process and to its related applications and infrastructure.
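The ranking step can be sketched in a few lines. The example below considers only one factor, the accrued dollar loss at 72 hours, and the tier names and dollar thresholds are assumptions chosen for illustration; in practice, intangible losses and management judgment would also shape the final priorities.

```python
def assign_criticality(exposures, critical_at=100_000, vital_at=25_000):
    """Rank processes by loss exposure and label each with a tier.

    exposures: dict mapping process name -> accrued dollar loss at 72 hours.
    The thresholds and tier names are illustrative assumptions.
    """
    # Highest dollar exposure gets recovery priority 1.
    ranked = sorted(exposures.items(), key=lambda item: item[1], reverse=True)
    result = []
    for priority, (process, loss) in enumerate(ranked, start=1):
        if loss >= critical_at:
            tier = "critical"
        elif loss >= vital_at:
            tier = "vital"
        else:
            tier = "deferrable"
        result.append((priority, process, tier))
    return result


# Made-up figures, for illustration only.
exposures = {"Payroll": 40_000, "Order entry": 250_000, "Intranet": 3_000}
priorities = assign_criticality(exposures)
```

The applications and infrastructure behind each process would then inherit the tier of the most critical process they support.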
The "exposure scenario" that will guide plan development is also formulated at this juncture. Ideally, the scenario guiding a contemporary plan will be a total loss of primary facilities, with its impact expressed as the aggregate of all dollar costs and intangible exposures accrued in the 24, 48 and 72 hours following a disaster. This makes a compelling case for plan funding, which may be important later if management commitment begins to wane. Ultimately, the plan you create should address the worst-case scenario but be structured in a modular fashion so that it can also respond flexibly to lesser disasters.
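As a rough illustration of the aggregation, the sketch below sums the hard-dollar costs of every business process at each outage duration. The function name, data layout and figures are assumptions; intangible exposures do not add up numerically and would be listed alongside these totals rather than folded into them.

```python
def total_loss_exposure(process_costs):
    """Sum per-process costs at each outage duration for a total-loss scenario.

    process_costs: dict mapping process name -> {hours: accrued dollar loss}.
    """
    totals = {24: 0, 48: 0, 72: 0}
    for costs in process_costs.values():
        for hours in totals:
            # A missing duration is treated as zero cost (an assumption).
            totals[hours] += costs.get(hours, 0)
    return totals


# Made-up figures, for illustration only.
process_costs = {
    "Order entry": {24: 50_000, 48: 120_000, 72: 250_000},
    "Payroll": {24: 5_000, 48: 10_000, 72: 40_000},
}
scenario = total_loss_exposure(process_costs)
```

With the sample figures, the scenario totals come to 55,000, 130,000 and 290,000 dollars at 24, 48 and 72 hours respectively.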
At the completion of the first phase, planners turn their attention to the nuts and bolts of the plan: strategies for building disaster avoidance and fault tolerance into their current environments, and logistics for recovering data, application hosting platforms, networks and end-user work areas in the wake of a disaster. The philosophy that should guide this undertaking is simply stated: "Eliminate disaster potentials that can be eliminated, and plan to minimize the impact of potentials that cannot be eliminated."
About the author: Jon William Toigo has authored hundreds of articles on storage and technology along with his monthly SearchStorage.com "Toigo's Take on Storage" expert column and backup/recovery feature. He is also a frequent site contributor on the subjects of storage management, disaster recovery and enterprise storage. Toigo has authored a number of storage books, including "Disaster recovery planning: Preparing for the unthinkable, 3/e".