By Paul Kirvan, CISA, CSSP, FBCI, CBCP
When building a data center disaster recovery (DR) plan, remember that you are protecting a significant investment in information technology and communications. Depending on the nature of the disruption, the data center's overall integrity may be untouched or it could be totally destroyed. DR plans need to be flexible and scalable to address a broad range of disruption scenarios. This article, with its associated data center disaster recovery plan template, will help you structure a plan that addresses your data center's operational and people issues.
For purposes of comparison, a data center disaster recovery plan focuses exclusively on a data center facility and its infrastructure, e.g., physical location, construction, security, power sources, and environmental systems. By contrast, a disaster recovery plan is a broad term that describes a process to recover disrupted IT systems, networks, and other critical assets an organization uses.
In this guide on data center disaster recovery planning, learn about the most important ingredients in a successful data center DR plan, who should be involved in the planning process, and how to get started. Then, after you've read our guide, you can download our free data center disaster recovery plan template.
DATA CENTER DISASTER RECOVERY PLAN TEMPLATE AND GUIDE: TABLE OF CONTENTS
>> First steps: An operational risk assessment
>> Developing the data center disaster recovery plan
>> Important data center disaster recovery planning caveats
>> Free data center disaster recovery plan template
A key activity in preparing a data center DR plan is an operational risk assessment of the building (or facility). The assessment analyzes key operating components, such as building location (e.g., access routes, proximity to highways, rail lines and airports; proximity to fuel storage tanks); power generation (e.g., commercial power, backup power systems); power protection (e.g., grounding and bonding, lightning arrestors, line conditioners, surge suppressors); HVAC (heating, ventilation and air conditioning); critical systems (e.g., servers, VoIP systems); network infrastructure (e.g., cabling, connectors, routers, copper and fiber circuits); security (physical access and information security); work space (e.g., offices, conference rooms, cubicles, furniture, lighting); fire protection (e.g., fire detectors, smoke detectors, fire extinguishers, FM200 extinguishment systems); building floors and walls (e.g., fire-rated walls, raised floors); and utilities (e.g., water, power, sewer, communications).
When planning a data center operations risk assessment, coordinate it with IT management and building management (if your firm is a tenant) or facilities management (if the building is your own). Review your objectives for the assessment with these organizations before starting. And review your risk assessment checklist -- if you have one -- with IT management, building management and facilities management to ensure that you are covering all the bases. If possible, ask IT and facilities for any assessments they conducted or have on file. These may help save you time, unless the data is more than a year old.
Perform the following actions in the assessment:
- The data center DR plan development team should meet with various IT groups, such as the internal technology team, application team, and network administrator(s); this ensures that all groups which regularly use the data center's facilities have input into the DR planning process.
- List internal and external data center assets, third-party suppliers and resources, and other stakeholders.
- Gather all relevant infrastructure documents, e.g., building plans, floor plans, system maps, network diagrams, and equipment configurations.
- If available, obtain a copy of the existing data center DR plan; if one does not exist, proceed with the following steps:
- Work with management to find out the most serious threats to the data center, e.g., fire, human error, loss of power, system failure, security breach.
- Work with management to discover the most serious vulnerabilities to the data center, e.g., outdated backup power systems.
- Review history of data center outages and disruptions, and how they were handled.
- Determine the maximum outage time management can accept if the data center is unavailable.
- Identify current procedures for responding to data center disruptions.
- Determine when these procedures were last tested.
- Identify data center emergency team(s); determine their level of training with regard to emergencies.
- Identify data center vendor emergency response capabilities: whether they have ever been used, whether they worked properly, the cost of the services, and the status of service contracts.
Compile results from data center operational assessments into a gap analysis report that identifies what is currently done versus what ought to be done, with recommendations on how to achieve the required level of preparedness and estimated investment required.
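As an illustration of the gap analysis report described above, here is a minimal Python sketch. The `AssessmentItem` fields, the sample entries, and the cost figures are all hypothetical, chosen only to show the "what is currently done versus what ought to be done" comparison; a real report would draw on your own assessment data.

```python
from dataclasses import dataclass


@dataclass
class AssessmentItem:
    """One finding from the operational assessment (hypothetical structure)."""
    area: str
    current: str           # what is currently done
    required: str          # what ought to be done
    recommendation: str    # how to reach the required level
    estimated_cost: float  # illustrative figure only


def gap_report(items):
    """Return report lines for items with a gap, plus the total estimated investment."""
    gaps = [i for i in items if i.current != i.required]
    lines = [
        f"{g.area}: current={g.current}; required={g.required}; "
        f"action={g.recommendation}; est. cost={g.estimated_cost:,.0f}"
        for g in gaps
    ]
    total = sum(g.estimated_cost for g in gaps)
    return lines, total


# Sample data is invented for illustration.
items = [
    AssessmentItem("Backup power", "older diesel generator", "current-model generator",
                   "replace generator", 80000),
    AssessmentItem("Response procedures", "untested", "tested annually",
                   "schedule annual test", 5000),
    AssessmentItem("Fire protection", "FM200 system", "FM200 system",
                   "none", 0),
]

report_lines, total_investment = gap_report(items)
for line in report_lines:
    print(line)
print(f"Total estimated investment: {total_investment:,.0f}")
```

Only the first two sample items appear in the report, because the third already meets its required level.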
As part of the analysis process, examine the impact of a data center disruption on the business. What will happen to critical business processes if data center operations are disrupted? What might happen to the company's image, reputation and competitive position in the aftermath of a data center disruption? In addition to identifying business impacts, the assessment can reveal opportunities for improvement. It also aids development of the DR plan by identifying existing conditions (e.g., an older diesel generator that should be replaced) that may be affected by a disruption.
Once you have analyzed the data center and have identified potential risks to operations, prioritize the risk scenarios (e.g., fire, flood, earthquake, hurricane, vandalism) in order of severity, potential damage, and likelihood of occurrence. This ranking is used to focus the plan's response activities in the proper sequence for the situation.
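One simple way to carry out the prioritization described above is to rate each scenario on the three factors and sort by a combined score. The sketch below is only an assumption about how that might be done: the 1-to-5 rating scale, the multiplication of the three ratings, and the sample values are all hypothetical, and your organization may weight the factors differently.

```python
from typing import NamedTuple


class RiskScenario(NamedTuple):
    name: str
    severity: int    # 1 (minor) to 5 (critical), assumed scale
    damage: int      # 1 (negligible) to 5 (total loss), assumed scale
    likelihood: int  # 1 (rare) to 5 (frequent), assumed scale


def risk_score(scenario):
    """Combine the three ratings into one number (assumed scoring rule)."""
    return scenario.severity * scenario.damage * scenario.likelihood


def prioritize(scenarios):
    """Return scenarios ordered from highest to lowest combined score."""
    return sorted(scenarios, key=risk_score, reverse=True)


# Ratings below are invented for illustration.
scenarios = [
    RiskScenario("vandalism", 2, 2, 3),
    RiskScenario("earthquake", 5, 5, 1),
    RiskScenario("fire", 5, 5, 2),
    RiskScenario("hurricane", 4, 4, 1),
    RiskScenario("flood", 4, 4, 2),
]

ranked = prioritize(scenarios)
for s in ranked:
    print(f"{s.name}: {risk_score(s)}")
```

With these sample ratings, fire ranks first and vandalism last. The ordering depends entirely on the ratings you assign, so review them with IT and facilities management.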
Using the structure noted in the National Institute of Standards and Technology's SP 800-34, "Contingency Planning Guide for Information Technology Systems," we can expand those activities into the following structured sequence.
- The data center plan development team should meet with the internal technology team, facilities department, utility service providers, and relevant vendors to establish the scope of the activity, e.g., internal and external threats, internal and external assets, third-party resources, and linkages to other offices/clients/vendors; be sure to brief senior management on these meetings so they are properly informed.
- Gather all relevant infrastructure documents, e.g., building floor plans, building site plans, utility diagrams, HVAC diagrams, network diagrams and equipment configurations.
- Obtain copies of existing IT disaster recovery plans; if these do not exist, proceed with the following steps:
- Work with management to determine the most serious threats to the data center infrastructure, e.g., fire, human error, loss of power, flooding, system failure, severe weather.
- Identify what management perceives as the most serious vulnerabilities to the data center, e.g., insufficient backup power, minimal building security, proximity of data center to flood plain.
- Review history of data center outages and disruptions, and how the firm handled them.
- Identify what management perceives as the most critical data center assets, e.g., server farms, storage systems, network infrastructure, staffing.
- Determine the maximum outage time management can accept if the identified data center assets are unavailable.
- Identify the operational procedures currently used to respond to critical data center outages.
- Determine when these procedures were last tested to validate their relevance.
- Identify emergency response team(s) for all critical data center disruptions; determine their level of training, especially in emergencies.
- Identify vendor emergency response capabilities: whether they have ever been used; if so, whether they worked properly; how much the company is paying for these services; the status of data center maintenance contracts; and the presence of service-level agreements (SLAs), if used.
- Compile results from all assessments into a gap analysis report that identifies what is currently done versus what ought to be done, with recommendations as to how to achieve the required level of data center preparedness, and investment required.
- Have management review the report and agree on recommended actions.
- Prepare data center disaster recovery plan(s) to address critical assets, e.g., hardware and software, data storage, networks.
- Conduct tests of plans and system recovery assets to validate their operation.
- Update data center disaster recovery plan documentation to reflect changes.
- Schedule next review/audit of data center disaster recovery capabilities.
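Two of the steps above, determining the maximum outage time management can accept and testing the plans, can be tied together with a simple check. The following Python sketch is illustrative only: the `AssetRecovery` structure, the hour values, and the idea of recording a measured recovery time from each test are assumptions, not part of the NIST structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetRecovery:
    """Recovery data for one critical asset (hypothetical structure)."""
    asset: str
    max_outage_hours: float                 # maximum outage management can accept
    tested_recovery_hours: Optional[float]  # measured in the last test; None if never tested


def recovery_shortfalls(assets):
    """List assets that were never tested or recovered slower than the accepted limit."""
    findings = []
    for a in assets:
        if a.tested_recovery_hours is None:
            findings.append((a.asset, "never tested"))
        elif a.tested_recovery_hours > a.max_outage_hours:
            over = a.tested_recovery_hours - a.max_outage_hours
            findings.append((a.asset, f"exceeds limit by {over:g} h"))
    return findings


# Values below are invented for illustration.
assets = [
    AssetRecovery("server farm", max_outage_hours=4, tested_recovery_hours=6),
    AssetRecovery("storage systems", max_outage_hours=8, tested_recovery_hours=5),
    AssetRecovery("network infrastructure", max_outage_hours=2, tested_recovery_hours=None),
]

findings = recovery_shortfalls(assets)
for asset, issue in findings:
    print(f"{asset}: {issue}")
```

Findings such as these could feed the plan updates and the next scheduled review.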
When building a data center disaster recovery plan, keep in mind the following guidance:
- Obtain senior management support so that your plans can be funded.
- Take the data center DR planning process seriously: Plans don't have to be dozens of pages long; rather, they need the right information, and that information should be current and accurate.
- Consider using standards as part of the process, including NIST SP 800-34, ISO/IEC 24762, and BS 25777, as they provide a useful structured format for plans as well as guidance on the issues to address; this aspect is particularly important if plans will be audited.
- Keep the planning process simple by gathering and organizing accurate information.
- Review results with key departments, such as IT and facilities, to ensure that your assumptions are correct.
Data center disaster recovery plans help protect a significant investment for most organizations. While some firms address data center recovery by building a second data center or leasing specially equipped space at a third-party facility, a careful assessment of data center operations and risks is an important starting point in a DR program.
Download our free data center disaster recovery plan template to help you get started in developing a data center DR plan.
About this author: Paul Kirvan, CISA, CSSP, FBCI, CBCP, has more than 20 years of experience in business continuity management as a consultant, author and educator. He has been directly involved with dozens of IT/telecom consulting and audit engagements, ranging from governance program development, program exercising, execution and maintenance to RFP preparation and response. Kirvan currently works as an independent business continuity consultant/auditor and is the secretary of the Business Continuity Institute USA chapter. He can be reached at firstname.lastname@example.org.
This was first published in August 2010