
IT disaster recovery and business continuity planning for non-catastrophic disasters

Companies need a DR plan that covers non-catastrophic events just as much as catastrophic ones.

For the majority of people, the term disaster evokes images of destruction caused by events such as hurricanes, tornadoes, fires or even terrorist attacks. For this reason, most IT disaster recovery (DR) plans are developed on the assumption that a catastrophic event would destroy an entire facility, along with all of the IT equipment it housed. However, catastrophic events are not the only disasters that can result in the loss of an entire IT facility. In the context of business continuity (BC) and IT disaster recovery planning, more common, non-catastrophic events such as a power outage, a hardware failure or a computer virus can also cause harmful disasters.


These more common events usually do not involve physical destruction, and with proper planning, the risk that they will cause a prolonged outage can be significantly reduced.

Power failure and disaster recovery planning

A power failure is probably one of the most frequent causes of outages, but also one for which planning is relatively straightforward, albeit potentially costly. Uninterruptible power supplies (UPS) can provide enough battery power to protect systems from short power failures or voltage drops. However, keeping systems available during a prolonged outage requires emergency power generators. Proper generator planning includes evaluating the need for redundant generators and the reliability of the refueling service.

Hardware failure

A failed IT component can prevent access to important applications for longer than the defined recovery time objective (RTO). For example, a server for which high availability was deemed unnecessary because its RTO is set to 24 hours becomes a problem if it is not repaired or replaced within that time frame. Service agreements with vendors can be put in place, but their terms should be reviewed carefully, because notification, response time and diagnostic delays are often excluded from the agreed-upon repair or replacement service levels. Keeping spare components or systems on hand can help avoid such situations.
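To make the point about excluded delays concrete, here is a minimal sketch that adds the delays a service agreement may exclude to the contracted repair time and compares the total against the RTO. The class, function and all hour figures are hypothetical illustrations, not terms from any real vendor contract; substitute the figures from your own agreement.

```python
from dataclasses import dataclass


@dataclass
class OutageTimeline:
    """Hours spent in each phase of a hardware outage (hypothetical figures)."""
    notification: float  # time before the vendor is notified of the failure
    response: float      # time for the vendor to acknowledge and dispatch
    diagnosis: float     # time to identify the failed component
    repair: float        # contracted repair/replacement service level

    def total_hours(self) -> float:
        return self.notification + self.response + self.diagnosis + self.repair


def meets_rto(timeline: OutageTimeline, rto_hours: float) -> bool:
    """True if the end-to-end outage fits within the recovery time objective."""
    return timeline.total_hours() <= rto_hours


# A "4-hour repair" contract looks safe against a 24-hour RTO...
contract_only = OutageTimeline(notification=0, response=0, diagnosis=0, repair=4)
# ...but delays excluded from that service level can push the real outage past it.
realistic = OutageTimeline(notification=2, response=8, diagnosis=12, repair=4)

print(meets_rto(contract_only, 24))  # True
print(meets_rto(realistic, 24))      # False (26 hours in total)
```

The design choice is simply to treat the contracted repair time as one term among several, so that reviewing an agreement means asking which of the other terms it leaves uncovered.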

Data deletion

Whether accidental or malicious, data deletion can lead to the loss of intellectual property, but that loss can easily be prevented with a proper data backup strategy. Depending on how critical the data is and how difficult it would be to recreate, the backup strategy can range from traditional tape or disk backup to continuous data protection (CDP). Keep in mind that synchronous data mirroring offers great protection against disk failure but little protection against deletion, because a deletion on the primary copy is immediately replicated to the mirror.
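The difference between mirroring and backup can be illustrated with a toy model. This is a simplified sketch, not how any particular storage product works: the class name, the `take_backup` helper and the file name are hypothetical. It shows a synchronous mirror surviving a disk failure but faithfully replicating a deletion, while a separately kept point-in-time backup still holds the deleted file.

```python
import copy


class MirroredVolume:
    """Toy model of synchronous mirroring: every change is applied to both copies."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.mirror: dict[str, str] = {}

    def write(self, name: str, data: str) -> None:
        self.primary[name] = data
        self.mirror[name] = data

    def delete(self, name: str) -> None:
        # The delete is replicated at once, so the mirror loses the file too.
        self.primary.pop(name, None)
        self.mirror.pop(name, None)

    def recover_from_disk_failure(self) -> None:
        # Primary disk lost: rebuild it from the surviving mirror copy.
        self.primary = dict(self.mirror)


def take_backup(volume: MirroredVolume) -> dict[str, str]:
    """Hypothetical point-in-time copy kept separately from the live volume."""
    return copy.deepcopy(volume.primary)


vol = MirroredVolume()
vol.write("contracts.xlsx", "v1")
nightly = take_backup(vol)

# Scenario 1: disk failure. The mirror protects the data.
vol.primary.clear()
vol.recover_from_disk_failure()
print("after disk failure:", "contracts.xlsx" in vol.primary)    # True

# Scenario 2: accidental deletion. The mirror does not help...
vol.delete("contracts.xlsx")
print("mirror after delete:", "contracts.xlsx" in vol.mirror)    # False

# ...but the point-in-time backup still holds the file.
vol.write("contracts.xlsx", nightly["contracts.xlsx"])
print("restored from backup:", "contracts.xlsx" in vol.primary)  # True
```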

Computer viruses and hacker attacks

Incidents involving computer viruses and hackers are not uncommon and have occasionally been known to paralyze a number of applications for days at a time. Virus protection, intrusion detection and prevention software, and security policies are typically viewed as part of IT security, but they are also part of disaster avoidance. A good data backup strategy reinforces them, since restoring a system can sometimes be faster than finding and eliminating a virus or repairing the damage caused by hacking.

Denial of access

Events such as criminal acts, chemical spills or neighboring fires can lead emergency responders and other external authorities to deny a company's entire staff access to its facility. In these cases, a disaster affects your facility for an undetermined duration without necessarily causing any damage to the IT environment. This reinforces the point that a good disaster recovery plan requires knowledge of your immediate surroundings and a good understanding of the impact they can have on your operations. Technologies that support remote user access, such as Citrix, Web-enabled applications and VMware View, are a good way to provide basic protection against this type of interruption.

Loss of key staff

Too many IT disaster recovery strategies rely on key IT staff being available, and on their knowledge of the environment, to help with recovery in the event of a disaster. This creates a serious exposure if those employees leave the company. It is therefore extremely important to maintain comprehensive documentation of configuration and recovery procedures, and to continually designate and train backup personnel.

Pandemics and business continuity

Pandemics are a type of non-destructive disaster that has been getting tremendous attention lately in business continuity and disaster recovery planning, and rightfully so. A pandemic can put part of a company's staff out of commission for some time and force the rest of the staff to work from home. For companies that rely heavily on IT to deliver their services, this can paralyze operations and turn into a financial disaster. Proper planning and the deployment of remote connectivity technologies such as Citrix, VMware View, VPN services and Windows Terminal Services can allow a company to maintain partial availability of its services and, to some degree, help limit the spread of a virus among its workforce by reducing one-on-one contact until it is safe again.

While it is important to plan for destructive events that can damage an entire data center, disaster recovery plans should first be built on disaster avoidance and be broad enough to address a wide array of threats. Yet they must also be granular enough to keep a single component failure or a non-destructive event from causing a costly interruption that turns into a disaster.

About this author: Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.
