Recovering from a disaster: A data center checklist
Recovering from a disaster is never easy. Arm yourself with our data center checklists to make your disaster recovery process easier.
When you developed your data center disaster recovery (DR) plan, you designed it to protect your organization’s investment in information technology, communications and its staff. Depending on the nature of the disruption, your data center’s overall integrity may be untouched or it could be totally destroyed.
DR plans need to be flexible and scalable to address a broad range of disruption scenarios. In this article, we’ll provide data center checklists with recommended actions you can take in the aftermath of disaster. These checklists will make recovering from a disaster easier. Make sure you have the data center checklist—or a modified version using your own requirements—as you review the effects of a disruptive incident to your data center. Once you have completed an initial assessment of the situation and you are satisfied with the location of your staff, begin executing the DR plan.
Data center disaster recovery assumptions
A data center disaster recovery plan focuses exclusively on a data center facility and its infrastructure, such as its physical location, construction, security, power sources, environmental systems and its people. Be sure you’ve factored in the operational aspects of your data center as well as the people supporting it. This means addressing the following as you build your DR plan:
- Data center technical and management staff, all shifts
- Data center building (e.g., physical infrastructure, construction, location of entrances and exits, raised floor areas)
- Building location (e.g., access routes, proximity to highways, rail lines and airports; proximity to fuel storage tanks)
- Power generation (e.g., commercial power, backup power systems)
- Power protection (grounding and bonding, lightning arrestors, line conditioners, surge suppressors)
- Environment (e.g., heating, ventilation and air conditioning)
- Critical systems (e.g., servers, power distribution units, VoIP systems, call center systems)
- Network infrastructure (e.g., cabling, connectors, routers, copper and fiber circuits, cable racks)
- Security (physical access and information security)
- Work space (e.g., offices, conference rooms, cubicles, furniture, lighting)
- Fire protection (e.g., fire detectors, smoke detectors, fire extinguishers, FM200 extinguishment systems)
- Building floors and walls (fire-rated walls, raised floors)
- Utilities (e.g., water, power, sewer, communications)
Developing the disaster response
When developing disaster response action steps (the incident response part of a DR plan), you should discuss your ideas with building management (if your firm is a tenant) or facilities management (if the building is your own), as well as IT management. Review your response plan with all appropriate internal and external parties (e.g., responders) to ensure that you are covering all the bases.
Factor in the following items as part of your design process:
- Relationships with various IT groups, such as the internal technology team, application team and network administrator(s). This ensures that all groups that regularly use the data center’s facilities have input into the disaster response process
- Relationships with external stakeholders, such as vendors and managed service providers
- Relationships with other company offices (if you have them) as they could be an important part of your recovery (e.g., providing alternate data center space)
- Relevant infrastructure documents, e.g., building plans, floor plans, system maps, network diagrams and equipment configurations
The following items should be factored into your disaster response:
1. Management’s perception of the most serious data center threats, e.g., fire, human error, loss of power, system failure, security breach. Be aware that initial management assumptions may be wrong, so be prepared to make corrections quickly.
2. Management’s perception of the most serious vulnerabilities to the data center, e.g., outdated backup power systems.
3. Results of previous data center outages and disruptions, how they were handled and lessons learned.
4. Management’s maximum acceptable outage time for a data center disruption.
5. Established industry practices for responding to data center disruptions.
6. Experience and lessons learned from other data center disasters.
7. Data center emergency team(s) that are trained in responding to emergencies.
8. Emergency response capabilities of your primary and alternate data center vendors. If those services have ever been used, did they work properly? Also review the cost of the services and the status of service contracts.
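Items 3 and 4 in the list above can be combined into a simple check: compare the duration of past outages against management's maximum acceptable outage time. The following is a minimal Python sketch, not a prescribed method. The `OutageRecord` structure, the function names, the sample incidents and the four-hour limit are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class OutageRecord:
    """One past disruption (item 3 above). Hypothetical structure."""
    cause: str
    duration: timedelta


def exceeds_tolerance(record: OutageRecord, max_acceptable: timedelta) -> bool:
    """True if the outage ran longer than management's stated limit (item 4)."""
    return record.duration > max_acceptable


def outages_over_tolerance(history, max_acceptable):
    """Return the past outages that exceeded the limit, longest first."""
    over = [r for r in history if exceeds_tolerance(r, max_acceptable)]
    return sorted(over, key=lambda r: r.duration, reverse=True)


# Illustrative data only: these incidents and the 4-hour limit are made up.
history = [
    OutageRecord("loss of power", timedelta(hours=6)),
    OutageRecord("system failure", timedelta(hours=2)),
    OutageRecord("human error", timedelta(hours=9)),
]
flagged = outages_over_tolerance(history, timedelta(hours=4))
print([r.cause for r in flagged])
```

Outages flagged this way are natural candidates for the lessons-learned review, since they show where past responses fell short of what management considers acceptable.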
Data center checklist: General response
The following checklist can be used in the initial response stages of a data center disruption. Clearly the nature of the incident will influence which steps you will take and in which sequence. For example, response steps for a power outage will probably be somewhat different than for a fire. Be sure to include these steps in your DR plan.
Scenario 1: Power outage
| Step | Action taken | Comments |
|---|---|---|
| 1 | Determine extent of outage and whether backup power systems engaged | Contact staff via cell phones, check power supplies, use rechargeable flashlights to move around safely |
| 2 | Determine if staff need to evacuate | Meet with key IT staff ASAP |
| 3 | Assess potential damage to firm; ensure that critical data is backed up and protected | Meet with key IT staff ASAP |
| 4 | Contact senior management | Advise on initial situation |
| 5 | Contact utility company | Contact via cell phone unless PBX system is operational; arrange to have emergency crew dispatched |
| 6 | Identify cause of outage, launch remediation efforts | Work with utility company |
| 7 | Assess when data center operations can resume | Meet with key IT staff, utility company |
| 8 | Contact senior management, send regular updates on progress | Advise on response, remediation and ongoing efforts post-outage |
Scenario 2: Server failure
| Step | Action taken | Comments |
|---|---|---|
| 1 | Determine extent of server outage, data loss and other potential outcomes | Contact staff via cell phones, check server(s) in question |
| 2 | Launch remediation efforts, e.g., check power supply, attempt server restart, run diagnostics | Contact vendors as needed |
| 3 | Assess potential damage to firm; ensure that critical applications and data running on server(s) are backed up and protected | Meet with key IT staff ASAP |
| 4 | Identify cause of server outage, continue remediation efforts | Work with staff, vendors |
| 5 | Assess when normal server operations can resume | Meet with key IT technical staff, vendors |
| 6 | Contact senior management, send regular updates on progress | Advise on response, remediation and ongoing efforts post-outage |
Scenario 3: Data center fire
| Step | Action taken | Comments |
|---|---|---|
| 1 | Assess nature and extent of fire | Contact staff via cell phones |
| 2 | Use existing fire suppression equipment to extinguish fire, e.g., sprinklers, hand-held extinguishers | If it is obvious that the fire is more severe in nature, move quickly to seal off the affected area if possible |
| 3 | Dial 911, advise of situation | |
| 4 | Evacuate building staff | Meet with key IT staff ASAP at designated assembly area(s) |
| 5 | If possible, activate data backup measures to protect current data | If offsite storage facilities are available, activate them |
| 6 | Once fire is out, begin damage assessment | Meet with IT staff, building staff, facilities management staff |
| 7 | Update senior management on status | Advise on response, remediation and efforts post-fire |
Response checklist: Major situations
The previous response sequences were for typical data center disasters. The sequence of steps for each situation may vary by organization. Be flexible in your response; modify your actions as dictated by the nature and severity of the incident.
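Because the sequence of steps may vary, it can help to track a checklist in a form that allows steps to be completed out of order. The following is a minimal Python sketch under that assumption. The `Step` and `Checklist` classes are hypothetical, and the sample steps are abbreviated from the server failure scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    number: int
    action: str
    comment: str = ""
    done: bool = False


@dataclass
class Checklist:
    scenario: str
    steps: list = field(default_factory=list)

    def complete(self, number: int) -> None:
        """Mark a step done by its number; steps may be completed in any order."""
        for step in self.steps:
            if step.number == number:
                step.done = True
                return
        raise KeyError(f"no step {number} in {self.scenario!r}")

    def next_step(self):
        """Return the first step not yet marked done, or None if all are done."""
        return next((s for s in self.steps if not s.done), None)


# Abbreviated, illustrative subset of the server failure steps.
server_failure = Checklist("Server failure", [
    Step(1, "Determine extent of server outage", "Contact staff via cell phones"),
    Step(2, "Launch remediation efforts", "Contact vendors as needed"),
    Step(3, "Assess potential damage to firm", "Meet with key IT staff ASAP"),
])
server_failure.complete(1)
server_failure.complete(3)  # handled out of order
print(server_failure.next_step().action)
```

Looking up steps by number rather than by position keeps the record accurate when the team has to reorder its actions as the incident unfolds.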
The following data center checklist offers suggested steps for dealing with a major disruption to a data center. These steps may need to be prefaced by some of the steps in the previous scenarios. Be sure to include these steps in your DR plan.
Response checklist: Building loss
| Step | Action taken | Comments |
|---|---|---|
| 1 | Contact affected business units and advise them to prepare to relocate to an alternate location (or whatever is specified in the recovery plan) | Advise key internal and external staff/organizations as defined in DR plan |
| 2 | Contact external organizations (e.g., vendors, suppliers, couriers and storage companies) to launch emergency service arrangements | Advise key internal and external staff/organizations as defined in DR plan |
| 3 | If hardware systems have been damaged or destroyed, activate process for recovering damaged hardware | Advise key internal and external staff/organizations as defined in DR plan |
| 4 | If software (e.g., operating systems, applications) has been damaged or destroyed, activate process for recovering damaged software | Advise key internal and external staff/organizations as defined in DR plan |
| 5 | If communications systems and network services have been damaged or disrupted, activate process for recovering those assets | Advise key internal and external staff/organizations as defined in DR plan |
| 6 | If email/BlackBerry services have been damaged or destroyed, activate process for recovering those operations | Advise key internal and external staff/organizations as defined in DR plan; business units may need to use other means of communicating if email/BlackBerry servers are destroyed, e.g., text messaging, social networks |
| 7 | If critical data have been damaged or destroyed, activate data recovery and restoration processes | Advise key internal and external staff/organizations as defined in DR plan |
| 8 | If paper and other documents have been destroyed, activate process for recovering destroyed documents | Advise key internal and external staff/organizations as defined in DR plan; if problem cannot be fixed within one day by recreation from backups, discuss with staff and other stakeholders how to manage current operations on an ad hoc basis |
| 9 | If paper and other documents have been damaged, activate process for recovering damaged documents | Advise key internal and external staff/organizations as defined in DR plan; if problem cannot be fixed within one day by recreation from backups, discuss with staff and other stakeholders how to manage current operations on an ad hoc basis |
| 10 | Confirm with management that data center staff need to relocate | Meet with key IT staff, company management, others |
| 11 | Work with corporate facilities and other internal and external groups to begin process of locating alternate data center space, e.g., temporary managed services arrangement and leased space in backup data center facilities until a data center can be completed | DR plan will hopefully have addressed this scenario so that suitable primary and alternate data center space is identified |
| 12 | Initiate and coordinate activities needed to relocate data center operations | DR plan will hopefully have addressed this |
| 13 | Provide regular progress updates to corporate management | DR plan will hopefully have addressed this |
| 14 | Organize and conduct regular recovery team meetings | DR plan will hopefully have addressed this |
The previous steps assume that specific plans have been developed for the various situations listed, such as email recovery, hardware and software recovery, data recovery, document recovery and relocation to an alternate data center.
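The conditional steps in the building loss checklist amount to a lookup from what was damaged to which recovery process to activate. The following is a minimal Python sketch of that idea. The category keys, process labels and function name are illustrative assumptions, not part of any standard, and each label stands in for a documented plan your organization would maintain separately.

```python
# Hypothetical mapping from damage category to the recovery process to activate.
# The order follows the building loss checklist.
RECOVERY_PROCESSES = {
    "hardware": "hardware recovery",
    "software": "software recovery",
    "network": "communications and network recovery",
    "email": "email recovery",
    "data": "data recovery and restoration",
    "documents": "document recovery",
}


def processes_to_activate(damaged):
    """Return the recovery processes for the damaged categories, in checklist order."""
    unknown = set(damaged) - set(RECOVERY_PROCESSES)
    if unknown:
        # A category with no documented process is a gap in the plan.
        raise ValueError(f"no recovery process defined for: {sorted(unknown)}")
    return [name for key, name in RECOVERY_PROCESSES.items() if key in damaged]


print(processes_to_activate({"data", "hardware"}))
```

Raising an error for an unknown category is a deliberate choice in this sketch: it surfaces a missing plan during an exercise rather than during a real incident.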
Post-disaster assessments
Once the situation has been mitigated and recovery can begin, assess the event, determine what happened, what worked and what didn’t work. Schedule and conduct meetings as often as practical to compile this important data, as it may be necessary for insurance claims and even possible lawsuits.
Additional data center disaster recovery resources
Developing a data center disaster response can be very complex, depending on the amount of detail you elect to include. One way to facilitate this process is to review existing standards and data center practices. Three useful ones are:
- National Institute of Standards and Technology’s SP 800-34 standard “Contingency Planning Guide for Information Technology Systems”
- International Organization for Standardization’s standard ISO 24762 (2008) “Guidelines for information and communications technology disaster recovery services”
- International Organization for Standardization’s standard ISO 27031 (2011) “Guidelines for information and communication technology readiness for business continuity”
When building a data center disaster recovery plan, keep in mind the following actions:
1. Secure senior management support so that your plans can be funded, documented and regularly exercised.
2. Take the data center DR process seriously: Plans do not have to be dozens of pages long, but they should contain current and accurate information.
3. Consider using standards as part of the process, such as the ones previously listed.
4. Keep the process simple by gathering and organizing the right information.
5. Review results with key departments, such as facilities, to ensure that your assumptions are correct.
Data center disasters can seriously disrupt business operations. While some firms address data center recovery by building a second data center or leasing specially equipped space at a third-party facility, a careful assessment of data center operations and risks is an important starting point in a DR program. With a well-developed disaster recovery plan, especially one with well-defined recovery and restoration steps, damage to a data center can be minimized.
About this author:
Paul Kirvan, CISA, CSSP, FBCI, CBCP, has more than 20 years of experience in business continuity management as a consultant, author and educator. He has been directly involved with dozens of IT/telecom consulting and audit engagements ranging from governance program development, program exercising, execution and maintenance to RFP preparation and response. Kirvan currently works as an independent business continuity consultant/auditor and is the secretary of the Business Continuity Institute USA chapter. He can be reached at [email protected].