Recovering from a disaster: A data center checklist

Recovering from a disaster is never easy. Arm yourself with our data center checklists to make your disaster recovery process easier.

When you developed your data center disaster recovery (DR) plan, you designed it to protect your organization’s investment in information technology, communications and its staff. Depending on the nature of the disruption, your data center’s overall integrity may be untouched or it could be totally destroyed.

DR plans need to be flexible and scalable to address a broad range of disruption scenarios. In this article, we’ll provide data center checklists with recommended actions you can take in the aftermath of a disaster. These checklists will make recovering from a disaster easier. Keep the data center checklist (or a modified version based on your own requirements) on hand as you review the effects of a disruptive incident on your data center. Once you have completed an initial assessment of the situation and have accounted for your staff, begin executing the DR plan.

Data center disaster recovery planning assumptions

A data center disaster recovery plan focuses exclusively on a data center facility and its infrastructure, such as its physical location, construction, security, power sources, environmental systems and its people. Be sure you’ve factored in the operational aspects of your data center as well as the people supporting it. This means addressing the following as you build your DR plan:

  • Data center technical and management staff, all shifts
  • Data center building (e.g., physical infrastructure, construction, location of entrances and exits, raised floor areas)
  • Building location (e.g., access routes, proximity to highways, rail lines and airports; proximity to fuel storage tanks)
  • Power generation (e.g., commercial power, backup power systems)
  • Power protection (grounding and bonding, lightning arrestors, line conditioners, surge suppressors)
  • Environment (e.g., heating, ventilation and air conditioning)
  • Critical systems (e.g., servers, power distribution units, VoIP systems, call center systems)
  • Network infrastructure (e.g., cabling, connectors, routers, copper and fiber circuits, cable racks)
  • Security (physical access and information security)
  • Work space (e.g., offices, conference rooms, cubicles, furniture, lighting)
  • Fire protection (e.g., fire detectors, smoke detectors, fire extinguishers, FM200 extinguishment systems)
  • Building floors and walls (fire-rated walls, raised floors)
  • Utilities (e.g., water, power, sewer, communications)

Developing the disaster response

When developing disaster response action steps (the incident response part of a DR plan), you should discuss your ideas with building management (if your firm is a tenant) or facilities management (if the building is your own), as well as IT management. Review your response plan with all appropriate internal and external parties (e.g., first responders) to ensure that you are covering all the bases.

Factor in the following items as part of your design process:

  • Relationships with various IT groups, such as the internal technology team, application team and network administrator(s). This ensures that all groups that regularly use the data center’s facilities have input into the disaster response process 
  • Relationships with external stakeholders, such as vendors and managed service providers
  • Relationships with other company offices (if you have them) as they could be an important part of your recovery plan (e.g., providing alternate data center space)
  • Relevant infrastructure documents, e.g., building plans, floor plans, system maps, network diagrams and equipment configurations 

The following items should be factored into your disaster response:

1.      Management’s perception of the most serious data center threats, e.g., fire, human error, loss of power, system failure, security breach. Be aware that initial management assumptions may be wrong, so be prepared to make corrections quickly.

2.      Management’s perception of the most serious vulnerabilities to the data center, e.g., outdated backup power systems.

3.      Results of previous data center outages and disruptions, how they were handled and lessons learned.

4.      Management’s maximum acceptable outage time for a data center disruption.

5.      Established industry practices for responding to data center disruptions.

6.      Experience and lessons learned from other data center disasters.

7.      Data center emergency team(s) that are trained in responding to emergencies.

8.      Emergency response capabilities of your primary and alternate data center vendors. If those capabilities have ever been used, did they work properly? Also consider the cost of the services and the status of service contracts.
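
Item 4 above, the maximum acceptable outage time, lends itself to a simple check during an incident. The following is a minimal sketch in Python, not part of any particular DR product. The names `OutageEstimate` and `exceeds_max_outage` and all durations are hypothetical, chosen only for illustration. It compares elapsed time plus the estimated remaining recovery time against the limit management has set.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class OutageEstimate:
    """Hypothetical record of an outage in progress (illustrative only)."""
    elapsed: timedelta              # time since the disruption began
    estimated_remaining: timedelta  # current best guess until service resumes


def exceeds_max_outage(estimate: OutageEstimate, max_acceptable: timedelta) -> bool:
    """Return True if the projected total outage is longer than the
    maximum acceptable outage time set by management."""
    projected_total = estimate.elapsed + estimate.estimated_remaining
    return projected_total > max_acceptable


# Illustrative values: a 4-hour limit, 1.5 hours elapsed, 3 hours still to go.
estimate = OutageEstimate(
    elapsed=timedelta(hours=1, minutes=30),
    estimated_remaining=timedelta(hours=3),
)
needs_escalation = exceeds_max_outage(estimate, max_acceptable=timedelta(hours=4))
```

A result of `True` here could be the trigger for moving from local remediation to the relocation steps in the plan; where that threshold sits is a management decision, not something the code can supply.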

Data center checklist: General response

The following checklist can be used in the initial response stages of a data center disruption. Clearly the nature of the incident will influence which steps you will take and in which sequence. For example, response steps for a power outage will probably be somewhat different than for a fire. Be sure to include these steps in your DR plan.

Scenario 1: Power outage

Step | Action taken | Comments
1 | Determine extent of outage and whether backup power systems have engaged | Contact staff via cell phones, check power supplies, use rechargeable flashlights to move around safely
2 | Determine if staff need to evacuate | Meet with key IT staff ASAP
3 | Assess potential damage to firm; ensure that critical data is backed up and protected | Meet with key IT staff ASAP
4 | Contact senior management | Advise on initial situation
5 | Contact utility company | Contact via cell phone unless PBX system is operational; arrange to have emergency crew dispatched
6 | Identify cause of outage, launch remediation efforts | Work with utility company, electricians, others
7 | Assess when data center operations can resume | Meet with key IT staff, utility company, others
8 | Contact senior management, send regular updates on progress | Advise on response, remediation and ongoing efforts post-outage
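
A checklist like the one for Scenario 1 can also be kept in a form that a script can track during an incident. The following is a minimal sketch in Python, assuming hypothetical class names (`Step`, `Checklist`) that do not come from any existing tool. Only the first four power-outage steps are loaded, to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One row of a response checklist (hypothetical structure)."""
    number: int
    action: str
    comments: str = ""
    done: bool = False


@dataclass
class Checklist:
    """An ordered list of steps for a single scenario."""
    scenario: str
    steps: List[Step] = field(default_factory=list)

    def complete(self, number: int) -> None:
        """Mark the step with the given number as done."""
        for step in self.steps:
            if step.number == number:
                step.done = True
                return
        raise ValueError(f"No step {number} in checklist '{self.scenario}'")

    def next_open_step(self) -> Optional[Step]:
        """Return the first step not yet done, or None if all are done."""
        for step in self.steps:
            if not step.done:
                return step
        return None


# First four steps of the power outage scenario, taken from the checklist above.
power_outage = Checklist(
    scenario="Power outage",
    steps=[
        Step(1, "Determine extent of outage, if backup power systems engaged"),
        Step(2, "Determine if staff need to evacuate", "Meet with key IT staff ASAP"),
        Step(3, "Assess potential damage to firm; ensure that critical data is "
                "backed up and protected", "Meet with key IT staff ASAP"),
        Step(4, "Contact senior management", "Advise on initial situation"),
    ],
)

power_outage.complete(1)
next_step = power_outage.next_open_step()  # the evacuation decision (step 2)
```

Because the article stresses that the order of steps depends on the incident, `complete` accepts any step number rather than forcing strict sequence; `next_open_step` only reports the earliest step still outstanding.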

Scenario 2: Server failure

Step | Action taken | Comments
1 | Determine extent of server outage, data loss and other potential outcomes | Contact staff via cell phones, check server(s) in question
2 | Launch remediation efforts, e.g., check power supply, attempt server restart, run diagnostics | Contact vendors as needed
3 | Assess potential damage to firm; ensure that critical applications and data running on server(s) are backed up and protected | Meet with key IT staff ASAP
4 | Identify cause of server outage, continue remediation efforts | Work with staff, vendors
5 | Assess when normal server operations can resume | Meet with key IT technical staff, vendors
6 | Contact senior management, send regular updates on progress | Advise on response, remediation and ongoing efforts post-outage

Scenario 3: Data center fire

Step | Action taken | Comments
1 | Assess nature and extent of fire | Contact staff via cell phones
2 | Use existing fire suppression equipment to extinguish fire, e.g., sprinklers, hand-held extinguishers | If it is obvious that the fire is more severe in nature, move quickly to seal off the area if possible
3 | Dial 911, advise of situation |
4 | Evacuate building staff | Meet with key IT staff ASAP at designated assembly area(s)
5 | If possible, activate data backup measures to protect current data | If offsite storage facilities are available, activate them
6 | Once fire is out, begin damage assessment | Meet with IT staff, building staff, facilities management staff
7 | Update senior management on status | Advise on response, remediation and efforts post-fire

Response checklist: Major situations

The previous response sequences were for typical data center disasters. The sequence of steps for each situation may vary by organization. Be flexible in your response; modify your actions as dictated by the nature and severity of the incident. 

The following data center checklist offers suggested steps for dealing with a major disruption to a data center. These steps may need to be prefaced by some of the steps in the previous scenarios. And be sure to include these steps in your DR plan.

Response checklist: Building loss

Step | Action taken | Comments
1 | Contact affected business units and advise them to prepare to relocate to an alternate location (or whatever is specified in the recovery plan) | Advise key internal and external staff/organizations as defined in DR plan
2 | Contact external organizations (e.g., vendors, suppliers, couriers and storage companies) to launch emergency service arrangements | Advise key internal and external staff/organizations as defined in DR plan
3 | If hardware systems have been damaged or destroyed, activate process for recovering damaged hardware | Advise key internal and external staff/organizations as defined in DR plan
4 | If software (e.g., operating systems, applications) has been damaged or destroyed, activate process for recovering damaged software | Advise key internal and external staff/organizations as defined in DR plan
5 | If communications systems and network services have been damaged or disrupted, activate process for recovering those assets | Advise key internal and external staff/organizations as defined in DR plan
6 | If email/BlackBerry services have been damaged or destroyed, activate process for recovering those operations | Advise key internal and external staff/organizations as defined in DR plan; business units may need to use other means of communicating if email/BlackBerry servers are destroyed, e.g., text messaging, social networks
7 | If critical data have been damaged or destroyed, activate data recovery and restoration processes | Advise key internal and external staff/organizations as defined in DR plan
8 | If paper and other documents have been destroyed, activate process for recovering destroyed documents | Advise key internal and external staff/organizations as defined in DR plan; if problem cannot be fixed within one day by recreation from backups, discuss with staff and other stakeholders how to manage current operations on an ad hoc basis
9 | If paper and other documents have been damaged, activate process for recovering damaged documents | Advise key internal and external staff/organizations as defined in DR plan; if problem cannot be fixed within one day by recreation from backups, discuss with staff and other stakeholders how to manage current operations on an ad hoc basis
10 | Confirm with management that data center staff need to relocate | Meet with key IT staff, company management, others
11 | Work with corporate facilities and other internal and external groups to begin process of locating alternate data center space, e.g., temporary managed services arrangement and leased space in backup data center facilities until a new data center can be completed | DR plan will hopefully have addressed this scenario so that suitable primary and alternate data center space is identified
12 | Initiate and coordinate activities needed to relocate data center operations | DR plan will hopefully have addressed this
13 | Provide regular progress updates to corporate management | DR plan will hopefully have addressed this
14 | Organize and conduct regular recovery team meetings | DR plan will hopefully have addressed this

The previous steps assume that specific plans have been developed for the various situations listed, such as email recovery, hardware and software recovery, data recovery, document recovery and relocation to an alternate data center. 
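
Steps 3 through 9 of the building-loss checklist are conditional: each recovery process is activated only if the corresponding asset was damaged. That selection can be sketched in a few lines of Python. The category labels (`"hardware"`, `"data"` and so on) and the function name `processes_to_activate` are hypothetical, invented for this sketch; the process descriptions paraphrase the checklist.

```python
from typing import Dict, List

# Maps a damage category to the recovery process named in the checklist.
# The keys are hypothetical labels chosen for this sketch; the order
# follows the order of the checklist steps.
RECOVERY_PROCESSES: Dict[str, str] = {
    "hardware": "Activate process for recovering damaged hardware",
    "software": "Activate process for recovering damaged software",
    "network": "Activate process for recovering communications systems "
               "and network services",
    "email": "Activate process for recovering email/BlackBerry services",
    "data": "Activate data recovery and restoration processes",
    "documents": "Activate process for recovering damaged or destroyed documents",
}


def processes_to_activate(damaged: List[str]) -> List[str]:
    """Return the recovery processes for the damaged categories,
    in checklist order rather than in the order they were reported."""
    unknown = [d for d in damaged if d not in RECOVERY_PROCESSES]
    if unknown:
        raise KeyError(f"Unknown damage categories: {unknown}")
    return [proc for key, proc in RECOVERY_PROCESSES.items() if key in damaged]


# Example: the damage assessment reports lost data and damaged hardware.
plan = processes_to_activate(["data", "hardware"])
```

Rejecting unknown categories is a deliberate choice in this sketch: a typo in a damage report should surface immediately rather than silently drop a recovery process.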

Post-disaster assessments

Once the situation has been mitigated and recovery can begin, assess the event, determine what happened, what worked and what didn’t work. Schedule and conduct meetings as often as practical to compile this important data, as it may be necessary for insurance claims and even possible lawsuits.

Additional data center disaster recovery planning resources

Developing a data center disaster response can be very complex, depending on the amount of detail you elect to include. One way to facilitate this process is to review existing standards and data center practices. Three useful ones are:

 When building a data center disaster recovery plan, keep in mind the following actions:

1.      Secure senior management support so that your plans can be funded, documented and regularly exercised.

2.      Take the data center DR planning process seriously: Plans do not have to be dozens of pages long, but they should contain current and accurate information.

3.      Consider using standards as part of the process, such as the ones previously listed.

4.      Keep the planning process simple by gathering and organizing the right information.

5.      Review results with key departments, such as facilities, to ensure that your assumptions are correct.

Data center disasters can seriously disrupt business operations. While some firms address data center recovery by building a second data center or leasing specially equipped space at a third-party facility, a careful assessment of data center operations and risks is an important starting point in a DR program. With a well-developed disaster recovery plan, especially one with well-defined recovery and restoration steps, damage to a data center can be minimized.

About this author:
Paul Kirvan, CISA, CSSP, FBCI, CBCP, has more than 20 years of experience in business continuity management as a consultant, author and educator. He has been directly involved with dozens of IT/telecom consulting and audit engagements, ranging from governance program development, program exercising, execution and maintenance to RFP preparation and response. Kirvan currently works as an independent business continuity consultant/auditor and is the secretary of the Business Continuity Institute USA chapter. He can be reached at pkirvan@msn.com.

This was first published in May 2011
