Disaster recovery testing best practices: Test thoroughly and often

DR testing is something everyone talks about, but it's useless if you don't actually do it. Follow these DR testing best practices to ensure you're always prepared for a disaster.

Disaster recovery testing is something everyone talks about but few actually do. But your disaster recovery plan is useless unless you know it actually works.

Disaster recovery testing best practices dictate regular, consistent and thorough testing, say practitioners. Disaster recovery tools can be one part of the solution. According to Ken Cameron, managing director at the Windsor Group, an IT consulting company, disaster recovery testing tools fall into categories such as communications, replication and automation, and the right ones depend on your specific environment. "Some are specific to storage, some to mainframes, Unix, Linux or Windows," he said. "Recently, some vendors are announcing tools that cross the platform boundaries, but they still have to be matched to your environment," he added.

However, according to Cameron, the best possible test for a large, complex environment is to completely switch your production workload between the primary and backup data centers. "Ideally, you are set up so this takes place dynamically as part of a replicated, fail-over environment. If it is not dynamic, you should switch your workload at least once a year, actually running production from your backup site."

In this article, find out how often you should perform a disaster recovery test, whether a virtualized approach to disaster recovery testing is right for you, and what other disaster recovery testing best practices you can apply to your disaster recovery environment.

Disaster recovery testing best practices table of contents:

>> How often should you perform a disaster recovery test?
>> A virtualized approach to disaster recovery testing
>> Disaster recovery testing best practices

How often should you perform a disaster recovery test?

The City of Mission Viejo, Calif., conducts disaster recovery testing annually, supplemented by quarterly reviews. The process is comparatively simple, thanks in part to the fact that the municipality runs a VMware environment with a disaster recovery site about 50 miles away in San Diego. According to Jackie Alexander, director of information technology, Double-Take (now part of Vision Solutions) DR software, Vizioncore vReplicator (now Quest Software) and VMware Inc. software are the three products used to replicate, along with EMC Corp. Data Domain boxes.

"In our testing it takes us about two hours to bring San Diego up as a full site to provide all our services -- and our basic services are up even faster," she said.

The key, according to Alexander, is a detailed, how-to document that is updated quarterly. "We knew we didn't want a DR plan that would just sit on the shelf," she said. That updating process is done manually each quarter -- though some quarters there are actually no changes. Once it is done, it undergoes "dry run" testing by the IT staff to make sure the document makes sense. "We do that by running through the entire IT staff to make sure they know how to activate a DR event and when and what to activate," she said. In addition, Alexander said she involves a small number of individuals from outside of IT. "Most of us live 20 or more miles outside of the city, so in a disaster we might need people who live locally to help out," she said. "We activate them during our dry runs."

However, involving outsiders comes with a caution. "At first, we took it for granted that they would know how to do certain things or would understand certain things that we knew. In fact, we had to rework the document and then test it to make sure they could really follow it," she explained.

And testing is extensive. "We can do it at any time because as long as we aren't replicating it doesn't impact the production environment; we stop short of that," she said. Steve Gasiamis, a risk management consultant at HEIT Inc., added that a disaster recovery plan should include vendors and other "outside" resources that may need to be contacted during an event.

Alexander said her organization does not use any special tools, just the capabilities included with the backup and DR products.

A virtualized approach to disaster recovery testing

Kevin Laudan, an IT manager at Silicon Genesis in San Jose, Calif., said virtualization can be a part of the disaster recovery testing picture, and presents its own opportunities. "Testing your disaster recovery into a virtual environment makes testing and retesting much easier. The virtual environment will take more time to set up, but it will be worth it in the long run," he said. Laudan said his production environment is a mixture of virtual and physical servers and the DR system depends on using virtual-to-physical and physical-to-virtual products. "There are a whole bunch of products in the background such as Symantec Backup Exec and System Recover Server. In our disaster recovery plan, we make the worst-case assumption that the building is gone and the scenario calls for getting backup hard drives with company data to a safe facility," he explained.

Laudan said he generally just tests the "most critical aspects" of disaster recovery and has found, happily, that the virtual machines "are less of a hiccup" than the physical machines to restore. "This can be a way to really simplify DR if you make sure to test," he said.

Disaster recovery testing best practices

Another option for disaster recovery testing is to simply hire a business continuity (BC) consultancy, said Steve Gasiamis, an experienced risk management consultant at HEIT Inc., an IT managed services provider. When developing a DR testing plan, Gasiamis said, he starts with a business impact analysis, categorizes everything based on its results and focuses primarily on testing the most critical elements. For his clients, most of whom are in the banking industry, testing frequency -- another critical factor in ensuring that DR plans work -- is generally driven by government (FDIC) requirements.
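As a rough illustration of that sequence (analyze business impact, categorize, then test the most critical elements first), the sketch below ranks a small inventory in Python. It is not Gasiamis' method: the SystemImpact fields, the 4-hour and 24-hour tier cutoffs, and the sample systems are all assumptions made up for the example, and a real business impact analysis would weigh far more than two numbers.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    """One row of a hypothetical business impact analysis."""
    name: str
    max_outage_hours: float  # assumed: longest outage the business can tolerate
    hourly_loss: float       # assumed: estimated cost per hour of downtime


def tier(system: SystemImpact) -> str:
    """Assign a test tier; the 4- and 24-hour cutoffs are illustrative only."""
    if system.max_outage_hours <= 4:
        return "critical"
    if system.max_outage_hours <= 24:
        return "important"
    return "deferrable"


def rank_for_testing(systems: list[SystemImpact]) -> list[SystemImpact]:
    """Most critical first: shortest tolerable outage, then highest hourly loss."""
    return sorted(systems, key=lambda s: (s.max_outage_hours, -s.hourly_loss))


if __name__ == "__main__":
    # Made-up inventory, purely for demonstration.
    inventory = [
        SystemImpact("email", 24, 1_000),
        SystemImpact("payments", 2, 50_000),
        SystemImpact("intranet wiki", 72, 50),
        SystemImpact("core ledger", 2, 80_000),
    ]
    for system in rank_for_testing(inventory):
        print(f"{tier(system):<10} {system.name}")
```

Systems that land in the top tier would be candidates for full-scale testing, with the rest covered by lighter reviews.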

Norman Snow, an independent business resilience consultant located in the New York metropolitan area, said one of his disaster recovery testing best practices is to engage both the technology level (IT and related functions) and the business units, so that tests consider continuity of operations and processes along with validation of actual data availability. He also recommends testing semi-annually: once with IT alone and once with business unit involvement.

Different aspects of disaster recovery testing also need to be prioritized, he said: Test what is critical for restoration, recovery and continuity of operations, and don't take shortcuts. "Take notes regarding issues that come up and how they have been resolved," he said. Then, he suggested, "Conduct a post-mortem afterwards with all involved, including vendors where necessary, to address lessons learned and areas for improvement."

"'Be prepared' shouldn't be only the motto of the Boy Scouts but the mantra of IT in this day and age," said Snow. "Falling into complacency is the danger." To avoid complacency, regular testing is the key. Test at least annually or semi-annually. And, said Snow, any time there is a significant change in applications or processes -- anything that will be part of the production environment -- you probably need to test that as soon as possible.

Of course, budgets aren't unlimited. Therefore, Snow advises focusing on the most critical aspects of your operation for full-scale testing, and then in the "off quarters" spend some time testing other aspects of your systems. "You should be especially alert for code dependencies because the things you aren't paying attention to can come back to bite you," he said. Even "tabletop exercises" can help.
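The cadence Snow describes (test on a regular interval, and sooner whenever something significant changes in production) can be expressed as a simple check. The Python sketch below is one way to do that under stated assumptions: the function name dr_test_due, the 183-day stand-in for "semi-annually" and the example dates are all hypothetical, and what counts as a "significant change" remains a judgment call for your own team.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed stand-in for "semi-annually"; use 365 days for an annual cadence.
SEMIANNUAL = timedelta(days=183)


def dr_test_due(
    last_test: date,
    today: date,
    last_significant_change: Optional[date] = None,
    interval: timedelta = SEMIANNUAL,
) -> bool:
    """Return True if a DR test should be scheduled now.

    A test is due when production changed significantly after the last
    test, or when the regular interval has elapsed.
    """
    if last_significant_change is not None and last_significant_change > last_test:
        return True
    return today - last_test >= interval


if __name__ == "__main__":
    # Hypothetical dates: tested in January, application changed in February.
    print(dr_test_due(date(2024, 1, 10), date(2024, 3, 1)))
    print(dr_test_due(date(2024, 1, 10), date(2024, 3, 1),
                      last_significant_change=date(2024, 2, 15)))
```

A check like this only flags that a test is owed; deciding between a full-scale test and a tabletop exercise still depends on how critical the affected systems are.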

Human factors are also very important, including having a "contact list" for your own personnel and any vendors or customers that could be impacted by a disaster. "It goes beyond just making sure you know how to reach your people; in a large scale disaster, your people may be focused on the safety of their families, so you should think about how you could help them," he added.

Remember, said Snow, disaster recovery testing "is an exercise, without any pass/fail; communication is critical during an exercise and an actual event." So be sure to "issue an exercise report as soon as possible [after the test] and file it for future review."

About this author: Alan Earls is a Boston-based freelance writer and a frequent contributor to SearchDataBackup.
