Small to midsized businesses (SMBs) tend to rely on their IT staff's knowledge of the environment and assume they are covered in the event of a disaster. However, that same staff may not be available following a disaster, and the company might have to rely on other technical resources that are less familiar with its IT environment to assist with the recovery effort. This is when the shortcomings of the disaster recovery (DR) plan are exposed, and why it is imperative to test the plan before a disaster happens.
Besides making sure the plan will actually work when needed, regular DR testing offers other benefits, including:
- An opportunity to maintain and update the plan as the environment changes
- A great DR training opportunity
- A higher profile for your DR program and heightened awareness within the company
- Test results that can be available for auditors to review
What to test?
If you only test that your backup software can restore data, you only prove that the software works as advertised, which is something you typically do before purchasing it. This is not to say that you should never test your backups to make sure they are complete and usable, but the ability to restore data does not mean you can recover your IT environment and resume business activity within an acceptable timeframe. It only means you can get the data back.
A good DR exercise should cover all the recovery procedures starting with disaster alert management, the disaster declaration procedure, notification or call tree, chain of command and reporting. These elements are not exclusive to enterprise-class DR plans as SMBs have the same recovery requirements; they need to stay in business.
One of the most ambiguous steps of DR is deciding when to actually declare a disaster. Disasters are not always as obvious as hurricanes or fires and, like firefighters -- who rehearse their entire command structure for emergency response -- your staff must be ready to respond and understand the process that leads to the recovery effort. You can't have an unprepared person declare a disaster at the first sign of trouble, nor do you want undue delays in initiating the recovery effort simply because no one really knows what to do or where to start.
There are various types of DR tests that will yield different results. These include:
Tabletop or walkthrough DR exercise:
This is essentially a rehearsal of the documented procedures without actually executing any hands-on recovery tasks. It is a high-level exercise that is designed to ensure the plan is not missing important steps such as calling vendors, notifying employees, recalling backup media from the vault, etc. It is a good first step into DR testing to ensure the documentation is complete before running a full-scale test.
Hands-on recovery test:
This type of exercise is typically aimed at validating the technological aspect of the recovery strategy. It includes the recovery of systems and data.
Full DR test:
This type of exercise combines the procedural and technological aspects of DR covered by the two previous exercise types. It is the most comprehensive but also the most involved exercise. It should only be performed once the DR plan documentation has been validated and the recovery technology has been tested.
Whatever type of test you perform, the same basic elements must be in place for that test to be successful. These include:
- Exercise type – Will this be a tabletop, a technology recovery or a full-scale test?
- Planning – Clear objectives must be set and documented prior to conducting the test. What will be monitored and what constitutes success should be established before the test.
- Scenario – Circumstances and assumptions must be set to provide guidelines for the test.
- Monitoring – The execution of the entire test must be documented and include overall flow, level of readiness, gaps in the plan, ambiguities, procedural errors, etc.
- Debriefing – The exercise must be followed by a debriefing session for all the participants to gather additional feedback and review the results.
- Plan revisions – Probably the most important part of the test is to use the results to revise and refine the DR plan. Without this step, DR testing loses most of its value.
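As a rough illustration, the elements listed above could be tracked in a simple record so that a skipped step is easy to spot. This is only a sketch: the class name `DRExercise`, its field names and the `missing_elements` helper are hypothetical and do not come from any DR tool or standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExerciseType(Enum):
    """The three exercise types described in this tip."""
    TABLETOP = "tabletop"
    HANDS_ON = "hands-on recovery"
    FULL = "full DR test"


@dataclass
class DRExercise:
    """Hypothetical record of one DR exercise (illustrative field names)."""
    exercise_type: ExerciseType
    objectives: list[str]        # Planning: what the test should achieve
    success_criteria: list[str]  # Planning: what constitutes success
    scenario: str                # Scenario: circumstances and assumptions
    observations: list[str] = field(default_factory=list)    # Monitoring
    debrief_notes: list[str] = field(default_factory=list)   # Debriefing
    plan_revisions: list[str] = field(default_factory=list)  # Plan revisions

    def missing_elements(self) -> list[str]:
        """Return the names of the elements that have not been filled in."""
        checks = {
            "Planning": bool(self.objectives and self.success_criteria),
            "Scenario": bool(self.scenario.strip()),
            "Monitoring": bool(self.observations),
            "Debriefing": bool(self.debrief_notes),
            "Plan revisions": bool(self.plan_revisions),
        }
        return [name for name, done in checks.items() if not done]


# Example: a tabletop exercise that has been planned but not yet run.
exercise = DRExercise(
    exercise_type=ExerciseType.TABLETOP,
    objectives=["Validate the call tree"],
    success_criteria=["Every contact on the call tree is reached"],
    scenario="Primary server room is unavailable",
)
print(exercise.missing_elements())
# prints ['Monitoring', 'Debriefing', 'Plan revisions']
```

An exercise would only be treated as closed once `missing_elements()` returns an empty list, which keeps the plan revision step from being quietly dropped.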
Rarely will a DR test reveal a flawless plan, especially in the early stages of testing. The whole idea behind testing is to expose the flaws in a plan and, therefore, it is important not to "cheat." If a recovery procedure is incomplete, the gap must be noted during the test and addressed in the plan afterward. Simply making an undocumented change to keep the test moving will not help anyone in a real disaster. A successful DR test highlights flaws in the plan and helps move it to the next level of maturity. Even if the entire recovery procedure fails, having identified that failure makes the test a success.
Because IT infrastructure and technology tend to change rapidly, DR exercises should be conducted at least annually, and again whenever major changes that can affect the recovery procedures are made to the IT environment.
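The scheduling rule above can be sketched as a small check: a test is due when a year has passed since the last one, or when a major change has been made to the IT environment since then. The function name `dr_test_due` and its parameters are hypothetical, and the 365-day default is simply one reading of "annual"; what counts as a major change is left to the reader's own judgment.

```python
from datetime import date, timedelta


def dr_test_due(
    last_test: date,
    today: date,
    major_change_since_test: bool,
    max_interval_days: int = 365,  # assumption: "annual" taken as 365 days
) -> bool:
    """Return True when a new DR exercise should be scheduled."""
    overdue = (today - last_test) > timedelta(days=max_interval_days)
    return major_change_since_test or overdue


# Example: last test was five months ago, but a major change has been made since.
print(dr_test_due(date(2024, 1, 15), date(2024, 6, 15), major_change_since_test=True))
# prints True
```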
A word of caution
SMB IT resources are often stretched thin, which makes it difficult to find cycles to perform a useful DR test. In addition, without a test environment, a hands-on DR test can put the production environment and the business at unnecessary risk. Therefore, proper care must be taken to avoid having a DR test turn into a disaster.
Technologies such as server virtualization and data replication offer added benefits to SMBs, which may not have access to an alternate recovery site. They do so first by improving recovery capabilities and then by providing a replica of the production environment at a relatively low cost, so that testing can take place without affecting the business.
A DR plan is not complete until it has been tested. The plan does not have to be overly complicated, and neither does the testing. The objective is to ensure that, no matter how simple or complex the plan is, it will work when it is needed.
Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection.