
Best practices for a strong disaster recovery testing strategy

Testing is a critical part of the disaster recovery planning process. Without proper testing, IT teams might miss crucial updates or make avoidable missteps in a recovery.

By Stuart Burns

Published: 29 Jan 2024

Good disaster recovery testing comes from thorough planning and preparation. An untested plan is another crisis waiting to happen, so it is critical to have a disaster recovery testing strategy in place.

Full disaster recovery plan testing is not something many organizations can do frequently. Planning and executing a disaster recovery test requires two valuable resources: time and money. For that reason alone, DR teams must be realistic about how many tests they can execute each year. Most major applications are end-to-end tested once a year at most, and some only once every three years. The right cadence depends on the DR team's requirements.

This places disaster recovery teams in a dilemma: If they can't test often enough, the recovery plans for critical applications or processes could miss necessary updates. However, if they spread themselves too thin with extraneous testing, they risk using up those same limited resources. A testing strategy must be almost as thorough as the recovery itself. That thoroughness ensures DR teams catch any required changes and get the most out of even limited resources.

To get the most out of a disaster recovery testing strategy, consider incorporating these best practices.

Determine the type of test and plan accordingly

Disaster recovery testing comes in two types: full DR tests and component tests. Component tests are smaller in scope and exercise only a subset of the application. Most component tests are effectively smoke tests that help ensure the smaller parts of the overall application work before the team commits significant resources to a full-blown DR test.
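To make the idea of a component test as a gate more concrete, here is a minimal Python sketch. It assumes each component check can be written as a function that returns True or False; the check names and the placeholder lambdas are hypothetical and not part of any particular DR tool. The full test is only greenlit if every component passes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a component check takes no arguments and returns True on success.
Check = Callable[[], bool]


def run_component_tests(checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run each component check; return (all_passed, names_of_failed_checks)."""
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises counts as a failure instead of aborting the whole run.
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


def ready_for_full_dr_test(checks: Dict[str, Check]) -> bool:
    """Gate: commit resources to a full DR test only if every component passes."""
    all_passed, failed = run_component_tests(checks)
    for name in failed:
        print(f"Component check failed: {name}")
    return all_passed


if __name__ == "__main__":
    # Placeholder checks; real ones might probe a database replica or restore a sample backup.
    example_checks: Dict[str, Check] = {
        "database replica reachable": lambda: True,
        "backup catalog readable": lambda: True,
        "DNS failover record present": lambda: False,
    }
    print("Proceed with full DR test:", ready_for_full_dr_test(example_checks))
```

Run as written, the example reports the failed DNS check and prints `Proceed with full DR test: False`, which is the cue to fix that component before scheduling the full test.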

Before talking about the technical aspects of the test, it's critical to understand what is being tested. Is it a full interactive disaster recovery test in which users are asked to log in, work through a crisis scenario and prove that the application works as expected? Or is it enough to verify that the systems and software are available? Depending on the tools or processes in an organization's DR plan, it might be necessary to perform a full run-through of the plan to see how it will perform in a crisis.

Ensure everything is in place early -- and double-check

It might seem trivial, but failing to check key components before running a full test is one of the most common and preventable mistakes organizations make. The point of a DR test is to ensure things work as expected, but when a problem can be found and fixed outside the full test, it's worth confirming beforehand that everything is in place. This is one area where component testing can come in handy.

A common example is an IT team discovering that required firewall ports are not open. The team might find this during the full DR test, but checking ahead of time saves both time and resources. Remediating firewall issues can be a frustrating process, and it's likely not something security and networking staff want to deal with in the middle of an end-to-end DR test.
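A pre-check for the firewall example can be as simple as attempting a TCP connection to every host and port the recovered environment depends on. The Python sketch below uses only the standard library's `socket.create_connection`; the host names and port numbers in `REQUIRED_ENDPOINTS` are hypothetical placeholders. A failed connection does not by itself prove a firewall is at fault, since the service might simply not be listening, but either way it is something to resolve before the full test.

```python
import socket
from typing import Iterable, List, Tuple

# Hypothetical (host, port) pairs the recovered environment needs; replace with your own.
REQUIRED_ENDPOINTS: List[Tuple[str, int]] = [
    ("dr-db.example.internal", 1433),
    ("dr-app.example.internal", 443),
]


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS lookup failures are all OSError subclasses.
        return False


def unreachable_endpoints(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> List[Tuple[str, int]]:
    """List the endpoints that could not be reached, for follow-up before the full test."""
    return [(h, p) for h, p in endpoints if not port_is_reachable(h, p, timeout)]
```

Running `unreachable_endpoints(REQUIRED_ENDPOINTS)` from the DR network segment a few days before the test gives the networking and security teams a concrete list to work through, instead of discovering the gaps mid-test.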

Good documentation is evergreen

Good documentation is paramount. If less experienced staff run a DR test, they might encounter and resolve several problems along the way. However, if they don't document those issues and their remediations, the lost information can significantly slow down the next DR test or a real recovery.

There are four types of documentation DR teams must have for a strong testing strategy:

The current DR plan as written, with discrete steps and a schedule.
Notes on any issues that came up during testing and how they were fixed. If there was a temporary workaround, outline what it was.
Detailed documentation of the testing process. This should include what is being tested and by whom.
Admin sign-off on test completion.
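The four kinds of documentation are easier to keep consistent from test to test when they are captured as structured records rather than free-form notes. The following Python sketch shows one hypothetical way to do that; the class and field names are illustrative assumptions, not a standard schema or a feature of any DR product. The record points back to the written plan by version, lists what was tested and by whom, logs each issue with its fix and any temporary workaround, and treats the test as complete only after admin sign-off.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class IssueNote:
    """An issue hit during the test and how it was resolved."""
    description: str
    fix: str
    temporary_workaround: Optional[str] = None  # filled in only if the fix was a stopgap


@dataclass
class DRTestRecord:
    """Hypothetical record covering test scope, issues found and admin sign-off."""
    plan_version: str    # which version of the written DR plan was exercised
    scope: str           # what is being tested
    testers: List[str]   # who performed the test
    issues: List[IssueNote] = field(default_factory=list)
    signed_off_by: Optional[str] = None  # admin who signed off on completion

    def open_workarounds(self) -> List[IssueNote]:
        """Issues resolved only by a workaround, which still need a permanent fix."""
        return [i for i in self.issues if i.temporary_workaround]

    def is_complete(self) -> bool:
        """A test counts as complete only once an admin has signed it off."""
        return self.signed_off_by is not None

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so issues serialize as plain dicts.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = DRTestRecord(plan_version="2024.1", scope="Payroll failover", testers=["J. Doe"])
    record.issues.append(IssueNote("Port 1433 blocked", "Firewall rule added"))
    record.issues.append(
        IssueNote("Stale DNS record", "Record updated by hand",
                  temporary_workaround="Hosts-file entry on DR app server")
    )
    record.signed_off_by = "DR admin"
    print(record.to_json())
```

The `open_workarounds` list is a natural input to the next plan revision, since each entry marks a step where the written plan and reality diverged.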

Don't bypass thorough wrap-up and reporting

It might seem simple, but post-test reporting is where many DR teams fall short. Unfortunately, it is also the task that is most visible to management and has the most impact at that level.

Management is rarely interested in the nuts and bolts of IT, but relaying a test's success or failure at a high level is a complex undertaking. This is especially true when a production system is taken down to test a DR scenario. Just as in a real disaster, IT teams should create comprehensive documentation throughout the process so they can tell management how the test went and which areas need attention.

To avoid overloading management with technical details during wrap-up, communicate high-level status in a timely way during the test itself. Keep in mind that some DR tests can take a long time to execute, spanning 24 hours or more. Keeping key stakeholders apprised of progress reassures them and demonstrates good communication.
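Because a long test can run for a day or more, it can help to generate periodic one-line status updates from the team's own phase tracking rather than writing them by hand each time. The Python sketch below assumes phases are tracked as simple (name, state) pairs; the state labels, phase names and dates are hypothetical. It deliberately reports only counts, elapsed time, the current phase and any blockers, leaving the technical detail for the post-test report.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical phase states used by the test team's own tracking.
DONE, IN_PROGRESS, BLOCKED, PENDING = "done", "in progress", "blocked", "pending"


def status_summary(
    phases: List[Tuple[str, str]],
    started: datetime,
    now: Optional[datetime] = None,
) -> str:
    """Roll detailed phase tracking up into a single high-level line for stakeholders."""
    now = now or datetime.now()
    elapsed_hours = (now - started) / timedelta(hours=1)
    done = sum(1 for _, state in phases if state == DONE)
    blocked = [name for name, state in phases if state == BLOCKED]
    current = next((name for name, state in phases if state == IN_PROGRESS), None)

    parts = [f"{done}/{len(phases)} phases complete", f"{elapsed_hours:.1f}h elapsed"]
    if current:
        parts.append(f"now: {current}")
    # Call out blockers explicitly; they are what management most needs to hear about.
    parts.append(f"BLOCKED: {', '.join(blocked)}" if blocked else "no blockers")
    return "DR test status: " + "; ".join(parts)


if __name__ == "__main__":
    phases = [
        ("Fail over database", DONE),
        ("Start app tier", IN_PROGRESS),
        ("User acceptance", PENDING),
    ]
    print(status_summary(phases, datetime(2024, 1, 29, 8, 0), now=datetime(2024, 1, 29, 14, 30)))
    # DR test status: 1/3 phases complete; 6.5h elapsed; now: Start app tier; no blockers
```

Sending a line like this on a fixed schedule, for example every few hours, keeps stakeholders informed without pulling engineers away from the test to write updates.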

Stuart Burns is a virtualization expert at a Fortune 500 company. He specializes in VMware and system integration with additional expertise in disaster recovery and systems management. Burns received vExpert status in 2015.
