Too many disaster recovery (DR) plans end up as documentation exercises and never rise above the day-to-day priorities of the business. Those that do reach the DR testing phase often run into problems that, if not properly addressed, leave a bad mark on the whole DR process. Here are the top reasons why.
Your disaster recovery plan has a failure of scope
One of the biggest challenges in DR testing is determining scope. Are you testing the correct hardware platform compatibility, software versions, data storage, data integrity, communications, etc.? If you plan too big, you are more likely to miss small but very important details. Conversely, if you plan too small, you won't test enough.
The best way to solve the scope problem is with a comprehensive analysis that identifies areas that could fail -- you could even set up a wiki and have various teams add information in real time. With a full view of the points of failure, you can develop different strategies for how things can and should be tested. A biannual test will never give an organization all the coverage it needs -- it must be supplemented with daily tasks to make sure you are ready at any time. Your DR test should be integrated into your daily operations.
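One way to make the failure-point analysis usable day to day is to keep it as a small inventory that records when each item was last tested. The sketch below is only an illustration: the `FailurePoint` fields, the sample entries, and the `max_age_days` thresholds are all assumptions, not part of any particular DR tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FailurePoint:
    name: str           # what could fail (hypothetical example entries below)
    owner: str          # team responsible for testing it
    last_tested: date   # date of the most recent test
    max_age_days: int   # how long a test result is trusted (assumed policy)


def overdue(points, today):
    """Return the failure points whose last test is older than allowed."""
    return [
        p for p in points
        if today - p.last_tested > timedelta(days=p.max_age_days)
    ]


# Illustrative inventory; in practice each team would maintain its own entries.
inventory = [
    FailurePoint("hardware platform compatibility", "infrastructure", date(2009, 1, 5), 180),
    FailurePoint("data integrity of nightly backup", "storage", date(2009, 1, 2), 1),
    FailurePoint("communications failover", "network", date(2008, 6, 1), 90),
]

for p in overdue(inventory, today=date(2009, 1, 10)):
    print(f"{p.name} ({p.owner}) is overdue for testing")
```

Running a check like this every day, rather than only before the scheduled test, is one way to fold DR testing into daily operations.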
Your disaster recovery plan fails to focus on data restoration
It is too easy to focus on the system restoration tasks of a disaster recovery test (e.g., restoring servers, checking images, rebuilding configuration files, installing applications) and take for granted that a periodic test of your data backup tapes is sufficient. While being able to read and restore a backup tape is important, if that tape doesn't have the right data on it, the testing will have given you a false positive.
A DR plan is only as good as the business continuity plan that it supports. The testing should have a clear nexus to the data that supports the business. This testing should be performed more frequently and in more detail than what would normally be done in an annual DR test. Business changes too fast and data is stored in too many different places -- the most complete DR testing has a component that occurs on a frequent basis and includes the proper integration with the business owners to ensure the data captured on backup is the right data.
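A frequent, lightweight check of this kind could compare a test restore against a manifest of the files the business owners say they depend on. The sketch below assumes such a manifest exists as a mapping from relative path to expected SHA-256 digest; the function names, file names, and contents are hypothetical, and a real check would also need to account for data that legitimately changes between backup and restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large restores do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(restore_dir, manifest):
    """Compare restored files against a manifest of expected digests.

    manifest maps relative path -> expected SHA-256 hex digest.
    Returns (missing, mismatched) lists of relative paths.
    """
    missing, mismatched = [], []
    for rel_path, expected in sorted(manifest.items()):
        restored = Path(restore_dir) / rel_path
        if not restored.is_file():
            missing.append(rel_path)
        elif sha256_of(restored) != expected:
            mismatched.append(rel_path)
    return missing, mismatched


# Demonstration with a throwaway directory standing in for a test restore.
manifest = {
    "customers.csv": hashlib.sha256(b"id,name\n1,Acme\n").hexdigest(),
    "orders.csv": hashlib.sha256(b"id,total\n1,100\n").hexdigest(),
    "ledger.csv": hashlib.sha256(b"account,balance\n").hexdigest(),
}

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "customers.csv").write_bytes(b"id,name\n1,Acme\n")  # restored correctly
    Path(tmp, "orders.csv").write_bytes(b"id,total\n")            # stale or truncated copy
    # ledger.csv was never captured by the backup
    missing, mismatched = verify_restore(tmp, manifest)

print("missing:", missing)
print("mismatched:", mismatched)
```

The tape in this example restores without error, yet the check still reports a missing file and a mismatched one, which is exactly the false positive a read-only tape test would hide.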
Learn from the mistakes in your disaster recovery plan
No DR plan is perfect; organizations change too fast and have too many environmental risks, from employee turnover to natural disasters. Because the DR plan cannot be perfect when it is tested, there will be problems. If there are no problems, that should be a sign of insufficient testing. Problems and errors found in DR testing can be viewed as a negative thing if not understood and acted on correctly. Has the organization equated zero errors with a successful recovery test? If so, this is a problem because it leads to under-testing (i.e., testing only what you know will succeed) or to hiding or minimizing the errors encountered.
When testing a DR plan, something should fail. That is why we test things -- to identify the failure points in a safe environment. I'd much rather flunk a practice test than the real thing. The opposite is also true; acing a practice test can give a false sense of security. The important thing when failures occur in DR testing is to learn from them. For example, if you go to test the plan and realize that no one has any idea how to retrieve the offsite storage, or that it was backed up with software that no longer exists -- consider yourself lucky that you found this in testing. Embrace and learn from the failures to avoid them in the future.
As with most problems in life, disasters can't be avoided, but advance preparation goes a long way in minimizing the impact. Organizations must learn how to avoid failures of improper scope and lack of focus on data restoration. They must also learn to embrace the failures experienced when testing occurs to ultimately achieve the promises of a true DR plan -- being prepared for the unexpected.
About this author: Russell Olsen is an IT professional with a solid business foundation. He has a wide range of experience including CIO, VP of Product Development, VP of Operations, and Senior Auditor for a Big Four accounting firm performing technology risk assessments and Sarbanes-Oxley audits. Russell is a CISA, GSNA, and MCP.
This was first published in January 2009