When planning for disaster recovery (DR), IT professionals put a lot of effort into ensuring that the critical components of their IT infrastructure have the redundancies needed to meet the availability or recovery requirements defined by the business.
While this is certainly required at the system and application level, the underlying infrastructure, such as power and cooling, cannot be overlooked. Disasters do not always take the form of destructive events, such as a tornado, hurricane or fire, that can wipe out an entire facility. Many organizations consider a power failure lasting many hours to be a disaster. Most will also agree that failing over to a recovery site because of building maintenance or a service interruption is a potentially risky and costly exercise. Likewise, replacing an uninterruptible power system (UPS) or servicing an air conditioning unit should not force a company to activate its disaster recovery plan. This is why the data center infrastructure must have redundancy and meet the same availability requirements as the IT infrastructure it supports.
Data center infrastructure redundancy covers cooling and power, and it must also include a reliable power distribution path. Essentially, an outage at the facility level should not make IT systems unavailable beyond their defined recovery or availability requirements.
Data center availability rating
Data center availability is usually rated by tiers, which were originally developed by The Uptime Institute in 1995 and have since become widely accepted by the industry. Four tiers describe the availability of a site's infrastructure: Basic (Tier 1), Redundant capacity components (Tier 2), Concurrently maintainable (Tier 3) and Fault tolerant (Tier 4). Note that even a Tier 1 certified facility must include components such as an emergency power generator and a UPS to ensure a basic level of availability. Each subsequent tier builds on those basic redundancies, up to a fully fault-tolerant facility (Tier 4).
A facility without an emergency power generator will not achieve any tier rating. Not every facility seeks availability certification, so this is not necessarily a problem as long as your IT systems' availability requirements can tolerate it.
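The tier progression described above can be expressed as a small classification routine. This is a minimal sketch, not the Uptime Institute's certification method: the `SiteInfrastructure` fields and the `estimate_tier` function are hypothetical, and real certification involves a detailed review of a facility's design and operation. It captures only the two rules stated here: without a generator and a UPS no tier applies, and each tier builds on the one below it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteInfrastructure:
    """Hypothetical summary of a facility's site infrastructure attributes."""
    has_generator: bool
    has_ups: bool
    redundant_capacity_components: bool = False
    concurrently_maintainable: bool = False
    fault_tolerant: bool = False


def estimate_tier(site: SiteInfrastructure) -> Optional[int]:
    """Return a rough tier estimate (1-4), or None if no tier can apply.

    Hypothetical simplification for illustration only; it is not a
    substitute for an actual certification review.
    """
    # Without an emergency generator and a UPS, even Tier 1 is not met.
    if not (site.has_generator and site.has_ups):
        return None
    tier = 1
    # Each tier builds on the previous one, so stop at the first missing attribute.
    for attribute in (site.redundant_capacity_components,
                      site.concurrently_maintainable,
                      site.fault_tolerant):
        if not attribute:
            break
        tier += 1
    return tier
```

For example, `estimate_tier(SiteInfrastructure(has_generator=True, has_ups=True, redundant_capacity_components=True))` returns `2`, while a site with a UPS but no generator returns `None`.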
Common mistakes in data center disaster recovery planning
Even when the goal is not to build the most fault-tolerant facility possible, it pays to avoid the following common mistakes and issues, which are often at the root of outages:
UPS batteries: Organizations that do not have access to an emergency power generator are often tempted to fill the gap with extended UPS battery runtime, hoping to ride out a power failure. The problem is that systems are of little use if they are the only thing running while the rest of the building is without power. Furthermore, systems cannot run very long without air conditioning, which is one piece of equipment that should never be powered by an uninterruptible power system. It is usually a good idea to limit battery runtime to just enough capacity for a graceful system shutdown; anything more is unlikely to provide a good return on investment unless there is a very specific reason, such as security or life support systems.
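When sizing batteries for a graceful shutdown rather than for riding out an outage, a rough energy estimate is often enough for a first pass. The sketch below assumes a simple linear model; the `required_battery_wh` helper, the 90% inverter efficiency and the 25% safety margin are illustrative assumptions, not vendor figures. Real battery runtime depends on discharge rate, temperature and battery age, so sizing should be confirmed against the UPS manufacturer's runtime charts.

```python
def required_battery_wh(load_watts: float,
                        shutdown_minutes: float,
                        inverter_efficiency: float = 0.9,
                        safety_margin: float = 1.25) -> float:
    """Estimate usable battery energy (Wh) needed to cover a graceful shutdown.

    Hypothetical helper using a linear approximation: it ignores battery
    aging, temperature and non-linear discharge behavior. The default
    efficiency and margin are illustrative assumptions.
    """
    if not 0 < inverter_efficiency <= 1:
        raise ValueError("inverter_efficiency must be in (0, 1]")
    hours = shutdown_minutes / 60
    # Inverter losses mean the batteries must supply more energy than the load consumes.
    return load_watts * hours / inverter_efficiency * safety_margin
```

With a 6,000 W load and a 10-minute shutdown window, this estimates roughly 1,389 Wh of usable battery energy. Anything well beyond that figure is the extended runtime the paragraph above cautions against.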
Redundant UPS: Better uninterruptible power systems have built-in redundancies, such as N+1 power modules and maintenance bypass switches, to prevent outages during maintenance such as battery replacement. Deploying dual uninterruptible power systems (N+N) to protect against a total UPS or power circuit failure is good practice, but only if there are dual (A+B) power feeds. A standard of dual-corded servers powered from separate in-rack power distribution units, each fed by an independent UPS, still leaves an exposure: if both UPSs are connected to the same breaker or subpanel, that component is a single point of failure.
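One way to find the kind of hidden exposure described above is to list every power path feeding each server and look for components common to all of them. The following sketch assumes a hand-built inventory; the component names and the `single_points_of_failure` helper are hypothetical. Any component that appears in every path for a server can take that server down on its own.

```python
from typing import Dict, List, Set

# Hypothetical inventory format: each server maps to a list of power paths,
# and each path lists every component power flows through to reach that server.
PowerPaths = Dict[str, List[List[str]]]


def single_points_of_failure(paths: PowerPaths) -> Dict[str, Set[str]]:
    """Return, for each server, the components shared by all of its power paths."""
    result: Dict[str, Set[str]] = {}
    for server, server_paths in paths.items():
        if not server_paths:
            raise ValueError(f"{server} has no power paths defined")
        # A component present in every path can take the server down on its own.
        result[server] = set(server_paths[0]).intersection(*server_paths[1:])
    return result


if __name__ == "__main__":
    inventory = {
        # Dual UPSs and PDUs, but both fed from the same feed, subpanel and breaker.
        "db01": [
            ["utility-feed", "subpanel-1", "breaker-7", "ups-a", "pdu-a"],
            ["utility-feed", "subpanel-1", "breaker-7", "ups-b", "pdu-b"],
        ],
        # Fully independent A and B paths.
        "app01": [
            ["feed-a", "subpanel-a", "ups-a", "pdu-a"],
            ["feed-b", "subpanel-b", "ups-b", "pdu-b"],
        ],
    }
    for server, spofs in single_points_of_failure(inventory).items():
        print(server, sorted(spofs))
    # db01 ['breaker-7', 'subpanel-1', 'utility-feed']
    # app01 []
```

For `db01`, dual UPSs and PDUs do not help because both paths share the same feed, subpanel and breaker; for `app01`, independent A and B paths leave no common component.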
Cooling redundancy and capacity: Cooling problems are a common source of outages and can quickly turn into a disaster. Many IT environments are at risk simply because they have run out of cooling capacity. Capacity must be monitored closely, and shortfalls must be addressed before problems develop.
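A simple capacity check can flag shortfalls before they cause an outage. This sketch assumes that nearly all power drawn by IT equipment ends up as heat, so IT load in kW approximates the cooling load (1 kW is about 3,412 BTU/hr); it ignores other heat sources such as lighting and UPS losses. The `cooling_status` function and its 80% warning threshold are hypothetical. It treats cooling as redundant only if the room can still be cooled with the largest unit out of service.

```python
from typing import List


def cooling_status(it_load_kw: float,
                   unit_capacities_kw: List[float],
                   warn_utilization: float = 0.8) -> str:
    """Classify cooling headroom for a room (hypothetical helper).

    Assumes IT load in kW approximates heat load, and checks N+1:
    the room must stay cooled if the largest unit fails. The 0.8
    warning threshold is an illustrative assumption.
    """
    if not unit_capacities_kw:
        return "CRITICAL: no cooling units"
    total = sum(unit_capacities_kw)
    # Capacity left if the single largest unit fails or is down for maintenance.
    n_minus_1 = total - max(unit_capacities_kw)
    if it_load_kw > total:
        return "CRITICAL: load exceeds total cooling capacity"
    if it_load_kw > n_minus_1:
        return "WARNING: no redundancy; losing one unit would overload cooling"
    if it_load_kw > warn_utilization * n_minus_1:
        return "WARNING: approaching redundant capacity limit"
    return "OK"
```

For example, with three 30 kW units, a 40 kW load returns "OK", a 50 kW load triggers the early warning, and a 70 kW load has already lost its redundancy even though total capacity (90 kW) still exceeds it.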
Building a redundant data center is always a challenge for smaller organizations that have high-availability requirements for their IT systems but lack the budget for a highly redundant or fault-tolerant facility. This is where options such as hosting or colocation become appealing. The cost of access to a hardened facility is shared by many customers and becomes an operational expense rather than a large capital expenditure.
Ultimately, it is the recovery and availability requirements of the IT infrastructure supporting the business that dictate the availability requirements of the data center facility. The higher the impact of an outage on the business, the easier it is to justify the cost of redundancy. IT and data center/storage managers working in areas at high risk for natural disasters should never forget that even the most redundant data center is still, by itself, a single point of failure. A highly redundant and fault-tolerant facility is not a substitute for a disaster recovery strategy or plan.
Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.
Do you know a helpful disaster recovery tip, timesaver or workaround? Email the editors to talk about writing for SearchDisasterRecovery.com.