
How frequently should you conduct failover testing?

Marc Staimer of Dragon Slayer Consulting says failover testing should be part of data protection, disaster recovery and business continuity processes.

The answer is relative. Failover testing should be part of data protection, disaster recovery and business continuity processes. It is an essential part of data protection that unfortunately gets short shrift. Failover tests, while often unsuccessful, can reveal errors in processes, designs, architectures, decisions, products and assumptions.

Competent professionals expect issues and problems from failover tests. They learn from the inevitable failures, then fix and improve their processes in a low business-risk environment (i.e., the test). Not-so-competent professionals avoid or minimize testing because they're afraid it will reveal errors in judgment or in promises made to management.

Unfortunately, the absolute worst time to discover failover problems is when a disaster strikes. At that point, there is little that can be done except clean up.

In an ideal world, failover testing would be performed frequently. In practice, time, personnel, resources and budgets dictate the frequency. Business continuity best practices recommend quarterly failover testing; however, that schedule may not be practical for many organizations. At the least, failover testing should be performed semi-annually.

The first step in failover testing is for management to commit to performing the testing on a regular, periodic schedule. The second step is determining application and data value. That value is commonly expressed through the recovery point objective (RPO) and recovery time objective (RTO). For example, if the data is backed up, snapshotted or replicated once a day, then the RPO is 24 hours. If the data is protected four times a day, then the RPO is 6 hours. If the data is protected on every write, as with continuous data protection (CDP), the RPO is "zero." RTO simply defines how much application and data downtime is tolerable.
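The RPO arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming protection events are evenly spaced across the day; the function name `rpo_hours` is hypothetical. CDP is not modeled, because protecting every write makes the RPO effectively zero.

```python
def rpo_hours(protections_per_day: int) -> float:
    """Worst-case data-loss window in hours, assuming evenly spaced protection events."""
    if protections_per_day <= 0:
        raise ValueError("protections_per_day must be positive")
    return 24 / protections_per_day


if __name__ == "__main__":
    # The two schedules used as examples in the text: once a day and four times a day.
    for per_day in (1, 4):
        print(f"{per_day} per day -> RPO of {rpo_hours(per_day):g} hours")
```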

The next step is determining application and data prioritization. This is the process of figuring out which applications, and their data, must be brought up first. What about the employees? How will they work? How will they access their desktops, applications and data? All of that must be worked out as well. It does the organization no good if the applications and data are up and running, but no one has access to them.
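One simple way to turn those priorities into a recovery order is to sort by RTO, tightest first. The sketch below assumes RTO alone drives the order, which ignores dependencies between systems; the application names and RTO values are hypothetical. Note that the employee access path is listed as an item in its own right, so it is not forgotten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    rto_hours: float  # tolerable downtime for this application and its data


def recovery_order(apps):
    """Return apps sorted so the tightest RTO is recovered first (ties broken by name)."""
    return sorted(apps, key=lambda app: (app.rto_hours, app.name))


# Hypothetical inventory, for illustration only.
APPS = [
    App("reporting", 48),
    App("order-entry", 2),
    App("remote-access", 4),  # how employees reach desktops, applications and data
    App("email", 8),
]

if __name__ == "__main__":
    for position, app in enumerate(recovery_order(APPS), start=1):
        print(f"{position}. {app.name} (RTO {app.rto_hours:g} h)")
```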

Once all of the failover processes and procedures have been determined, a written plan is mandatory. It must specify in detail the steps required to handle a failover, the primary individual responsible for each step, a backup individual for each step, and the expected time frame for each step.
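If the written plan is also kept in a structured form, its completeness can be checked automatically. The following is a minimal sketch, not a prescribed format: the `PlanStep` fields mirror the four elements listed above, and the step descriptions and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str
    primary: str           # primary individual responsible for the step
    backup: str            # backup individual for the step
    expected_minutes: int  # expected time frame for the step


def validate(steps):
    """Return a list of human-readable problems; an empty list means the plan is complete."""
    problems = []
    for number, step in enumerate(steps, start=1):
        if not step.primary or not step.backup:
            problems.append(f"step {number}: primary and backup are both required")
        elif step.primary == step.backup:
            problems.append(f"step {number}: backup must differ from primary")
        if step.expected_minutes <= 0:
            problems.append(f"step {number}: expected time frame is missing")
    return problems


# Hypothetical plan, for illustration only.
PLAN = [
    PlanStep("Declare failover and notify stakeholders", "alice", "bob", 10),
    PlanStep("Promote replica storage at recovery site", "bob", "carol", 30),
    PlanStep("Redirect user access to recovery site", "carol", "alice", 15),
]

if __name__ == "__main__":
    print(validate(PLAN) or "plan is complete")
    print("total expected minutes:", sum(s.expected_minutes for s in PLAN))
```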

Testing requires that a realistic, representative portion of that plan be put through its paces multiple times, simulating multiple contingencies, such as the primary person for a process being incapacitated. Following each test, implement what was learned, correct errors and improve the process. The written plan should also be updated after each failover test.
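A test review along these lines can be sketched as follows. It assumes each step records a primary, a backup and an expected duration, and that the test produced a measured duration per step; all step names, people and numbers are hypothetical. The `unavailable` set simulates the contingency of incapacitated staff.

```python
def assign_owner(primary, backup, unavailable):
    """Pick who runs a step: the primary if available, else the backup, else nobody."""
    if primary not in unavailable:
        return primary
    if backup not in unavailable:
        return backup
    return None


def review_test(steps, measured_minutes, unavailable=frozenset()):
    """Compare one test run against the plan and return findings to fix afterwards."""
    findings = []
    for name, (primary, backup, expected) in steps.items():
        if assign_owner(primary, backup, unavailable) is None:
            findings.append(f"{name}: no owner available")
        actual = measured_minutes[name]
        if actual > expected:
            findings.append(f"{name}: took {actual} min, expected {expected} min")
    return findings


# Hypothetical plan excerpt: step -> (primary, backup, expected minutes).
STEPS = {
    "promote replica": ("alice", "bob", 30),
    "redirect user access": ("carol", "alice", 15),
}
MEASURED = {"promote replica": 45, "redirect user access": 10}

if __name__ == "__main__":
    # Contingency: both alice and carol are unreachable during the test.
    for finding in review_test(STEPS, MEASURED, unavailable={"alice", "carol"}):
        print(finding)
```

Each finding becomes an input to the plan update that follows the test.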

One final step must be part of the process: keeping the plan current. The plan is a living document. New applications, data and systems are added to IT environments all of the time, while old ones are retired or taken offline. At least one IT professional, and preferably more, must be responsible for ensuring the failover plan is continuously up to date.

It is ultimately all about prioritization, process, preparation, practice and patience. Create a plan, work the plan, test the plan, evaluate the test, modify the plan, and repeat.


This was last published in October 2014


Join the conversation

1 comment


I don't understand how you can have data protection, disaster recovery and business continuity processes *without* having failover testing. Otherwise you're just doing your handwaving "here a miracle occurs" and you have no idea how it's going to work in practice. 