Disaster recovery testing is something everyone talks about but few actually do. But your disaster recovery plan is useless unless you know it actually works.
According to Cameron, the best possible test for a large, complex environment is to completely switch your production workload between the primary and backup data centers. "Ideally, you are set up so this takes place dynamically as part of a replicated, fail-over environment. If it is not dynamic, you should switch your workload at least once a year, actually running production from your backup site."
In this article, find out how often you should perform a disaster recovery test, whether a virtualized approach to disaster recovery testing is right for you, and what other disaster recovery testing best practices you can apply to your disaster recovery environment.
How often should you perform a disaster recovery test?
At the City of Mission Viejo, Calif., disaster recovery testing is conducted annually and supplemented by quarterly reviews. It is comparatively simple, thanks in part to the fact that the municipality runs a VMware environment with a disaster recovery site about 50 miles away in San Diego. According to Jackie Alexander, director of information technology, Double-Take (now part of Vision Solutions) DR software, Vizioncore vReplicator (now Quest Software) and VMware Inc. are the three products used to replicate, along with EMC Corp. Data Domain boxes.
"In our testing it takes us about two hours to bring San Diego up as a full site to provide all our services -- and our basic services are up even faster," she said.
The key, according to Alexander, is a detailed, how-to document that is updated quarterly. "We knew we didn't want a DR plan that would just sit on the shelf," she said. That updating process is done manually each quarter -- though some quarters there are actually no changes. Once it is done, it undergoes "dry run" testing by the IT staff to make sure the document makes sense. "We do that by running through the entire IT staff to make sure they know how to activate a DR event and when and what to activate," she said. In addition, Alexander said she involves a small number of individuals from outside of IT. "Most of us live 20 or more miles outside of the city, so in a disaster we might need people who live locally to help out," she said. "We activate them during our dry runs."
However, involving outsiders comes with a caution. "At first, we took it for granted that they would know how to do certain things or would understand certain things that we knew. In fact, we had to rework the document and then test it to make sure they could really follow it," she explained.
And testing is extensive. "We can do it at any time because as long as we aren't replicating it doesn't impact the production environment; we stop short of that," she said. Steve Gasiamis, a risk management consultant at HEIT Inc., pointed out that a disaster recovery plan should also include vendors and other "outside" resources that may need to be contacted during an event.
Alexander said her organization does not use any special tools, just the capabilities included with the backup and DR products.
A virtualized approach to disaster recovery testing
Kevin Laudan, an IT manager at Silicon Genesis in San Jose, Calif., said virtualization can be a part of the disaster recovery testing picture and presents its own opportunities. "Testing your disaster recovery into a virtual environment makes testing and retesting much easier. The virtual environment will take more time to set up, but it will be worth it in the long run," he said. Laudan said his production environment is a mixture of virtual and physical servers, and the DR system depends on using virtual-to-physical and physical-to-virtual products. "There are a whole bunch of products in the background such as Symantec Backup Exec and System Recovery Server. In our disaster recovery plan, we make the worst-case assumption that the building is gone and the scenario calls for getting backup hard drives with company data to a safe facility," he explained.
Laudan said he generally just tests the "most critical aspects" of disaster recovery and has found, happily, that the virtual machines "are less of a hiccup" than the physical machines to restore. "This can be a way to really simplify DR if you make sure to test," he said.
Disaster recovery testing best practices
Another option for disaster recovery testing is to simply hire a business continuity (BC) consultancy, said Steve Gasiamis, an experienced risk management consultant at HEIT Inc., an IT managed services provider. Gasiamis said that when developing a DR testing plan, he starts with a business impact analysis, categorizes everything based on that analysis and focuses primarily on testing the most critical elements. For his clients, most of whom are in the banking industry, testing frequency -- another critical factor in ensuring that DR plans work -- is generally driven by government (FDIC) requirements.
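The prioritization Gasiamis describes can be sketched in a few lines of Python. This is only an illustration, not his method: the 1-to-5 impact score, the threshold of 4 and the system names are all hypothetical stand-ins for whatever a real business impact analysis would produce.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    impact: int  # hypothetical 1-5 score from a business impact analysis; 5 = most critical


def plan_tests(systems, critical_threshold=4):
    """Rank systems by impact and split them into two test groups.

    Returns (critical, deferred): names to include in full-scale DR tests,
    and names that can wait for lower-priority test windows.
    """
    ranked = sorted(systems, key=lambda s: s.impact, reverse=True)
    critical = [s.name for s in ranked if s.impact >= critical_threshold]
    deferred = [s.name for s in ranked if s.impact < critical_threshold]
    return critical, deferred


# Hypothetical inventory for a small bank.
systems = [
    System("core banking", 5),
    System("email", 3),
    System("online payments", 4),
    System("intranet wiki", 1),
]
critical, deferred = plan_tests(systems)
print("Full-scale test:", critical)
print("Test later:", deferred)
```

In practice the scores would come from the business units rather than IT, and regulatory requirements might force some systems into the critical group regardless of score.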
Norman Snow, an independent business resilience consultant located in the New York metropolitan area, said one of his disaster recovery testing best practices is that DR testing should engage both the technology level, with IT and related functions, and the business unit, so that it considers continuity of operations and processes along with validation of actual data availability. He also recommends testing semi-annually, once with IT alone and once with business unit involvement.
Different aspects of disaster recovery testing also need to be prioritized, he said: test what is critical for restoration, recovery and continuity of operations, and don't take shortcuts. "Take notes regarding issues that come up and how they have been resolved," he said. Then, he suggested, "Conduct a post-mortem afterwards with all involved, including vendors where necessary, to address lessons learned and areas for improvement."
"'Be prepared' shouldn't be only the motto of the Boy Scouts but the mantra of IT in this day and age," said Snow. "Falling into complacency is the danger." To avoid complacency, regular testing is the key. Test at least annually or semi-annually. And, said Snow, any time there is a significant change in applications or processes -- anything that will be part of the production environment -- you probably need to test that as soon as possible.
Of course, budgets aren't unlimited. Therefore, Snow advises focusing on the most critical aspects of your operation for full-scale testing, and then in the "off quarters" spend some time testing other aspects of your systems. "You should be especially alert for code dependencies because the things you aren't paying attention to can come back to bite you," he said. Even "tabletop exercises" can help.
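Snow's cadence (test at least semi-annually, and again after any significant change) is easy to turn into a reminder check. The sketch below is one hedged way to do it in Python; the function name, the 182-day interval and the `significant_change` flag are assumptions made for illustration, not anything Snow prescribes.

```python
from datetime import date, timedelta


def dr_test_due(last_test, today, significant_change=False, interval_days=182):
    """Return True if a DR test should be scheduled now.

    A test is due if production has changed significantly since the last
    test, or if roughly half a year (interval_days) has passed.
    """
    if significant_change:
        return True
    return today - last_test >= timedelta(days=interval_days)


# Hypothetical dates: last test on Jan. 1, checked two months later.
print(dr_test_due(date(2024, 1, 1), date(2024, 3, 1)))
# Same dates, but a new application went into production in between.
print(dr_test_due(date(2024, 1, 1), date(2024, 3, 1), significant_change=True))
```

An organization testing annually would simply pass a longer `interval_days`.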
Human factors are also very important, including having a "contact list" for your own personnel and any vendors or customers that could be impacted by a disaster. "It goes beyond just making sure you know how to reach your people; in a large scale disaster, your people may be focused on the safety of their families, so you should think about how you could help them," he added.
Remember, said Snow, disaster recovery testing "is an exercise, without any pass/fail; communication is critical during an exercise and an actual event." So be sure to "issue an exercise report as soon as possible [after the test] and file it for future review."
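For teams that want the exercise report in a consistent shape, a small record like the following Python sketch may help. It simply captures what Snow recommends writing down (issues and how they were resolved); the class names, fields and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Issue:
    description: str
    resolution: str = ""  # left empty until the issue is resolved


@dataclass
class ExerciseReport:
    exercise_date: date
    participants: list
    issues: list = field(default_factory=list)

    def open_issues(self):
        """Issues still without a recorded resolution."""
        return [i for i in self.issues if not i.resolution]

    def to_text(self):
        """Render a plain-text report suitable for filing."""
        lines = [
            f"DR exercise report: {self.exercise_date.isoformat()}",
            "Participants: " + ", ".join(self.participants),
        ]
        for i in self.issues:
            lines.append(f"- {i.description} -> {i.resolution or 'OPEN'}")
        return "\n".join(lines)


# Hypothetical exercise with one resolved and one open issue.
report = ExerciseReport(
    exercise_date=date(2024, 5, 14),
    participants=["IT staff", "storage vendor"],
    issues=[
        Issue("Vendor phone number out of date", "Contact list updated"),
        Issue("Non-IT staff could not follow step 7"),
    ],
)
print(report.to_text())
```

Open issues from one report make a natural agenda for the post-mortem and for the next exercise.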
About this author: Alan Earls is a Boston-based freelance writer and a frequent contributor to SearchDataBackup.