Pierre Dorion, data center practice director at Long View Systems Inc., discusses the importance of testing in a disaster recovery (DR) plan and best practices for ensuring the effectiveness of a test.
When we talk about DR planning and we talk about the lifecycle around that, the entry state is disaster avoidance and preparedness. Once you've started building your DR plan, it is not complete until you've actually tested it. That's really the first step to making sure that your plan is effective and complete.
Besides making sure your plan is ready, DR testing also offers a great opportunity for internally training your staff. It creates awareness around DR and allows you to run a potential scenario and make sure that your people are up to speed on all the procedures that need to be implemented in the event of a disaster.
Another aspect is that it ties right into your DR plan maintenance. As you test, you highlight potential issues or gaps in your plan. You also get a chance to identify changes in your IT infrastructure that have taken place over time and that you may have overlooked, and then adapt your plan so that you are not stuck should a real disaster take place.
Also, look at certain requirements in contractual agreements. Not only do people want you to have a DR plan to make sure you're a reliable supplier -- they also want to see that you're testing it. Sometimes that may even become the object of an audit or a contractual compliance requirement. So it's important to test, even if it's just to demonstrate that you're doing something about it.
I would say that once you've got your plan together, set your goals and objectives. What are we trying to test and what are we trying to demonstrate? So, we need to put together a disaster scenario as part of the planning around the test. What will be our situation? What do we want to test for and be ready for?
Don't do ad hoc testing. You want something structured that you can follow along with. This really gives us the opportunity to measure our success. If you don't have a predefined testing schedule, it's very difficult to measure what your success will be. And, of course, we need to establish what the success criteria are and what constitutes a successful test.
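To make the idea of predefined success criteria concrete, here is a minimal sketch of how measured results could be checked against targets after a test. The step names, the time targets and the `evaluate` helper are all hypothetical and only for illustration; your own plan defines the real criteria.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Criterion:
    step: str           # hypothetical recovery step named in the test script
    target: timedelta   # assumed maximum time allowed for that step


def evaluate(criteria, measured):
    """Compare measured durations against targets.

    Returns a list of (step, status) pairs, where status is
    'met', 'missed', or 'not run' (no measurement recorded).
    """
    results = []
    for c in criteria:
        actual = measured.get(c.step)
        if actual is None:
            status = "not run"
        elif actual <= c.target:
            status = "met"
        else:
            status = "missed"
        results.append((c.step, status))
    return results


# Illustrative criteria, agreed on before the test starts.
criteria = [
    Criterion("restore database server", timedelta(hours=4)),
    Criterion("restore file server", timedelta(hours=8)),
    Criterion("verify application login", timedelta(hours=1)),
]

# Illustrative timings recorded during the test.
measured = {
    "restore database server": timedelta(hours=3, minutes=30),
    "restore file server": timedelta(hours=9),
}

results = evaluate(criteria, measured)
for step, status in results:
    print(f"{step}: {status}")
```

A step that was missed or never run is not a failed test in itself; it is a gap to feed back into the plan.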
My first piece of advice would be to keep the test as close as possible to a real situation. A lot of times when a DR test is to take place, you see some folks starting to plan for the test -- having things ready, trying to reconfigure the test environments, restoring certain data in advance. If you look at a real disaster situation, you wouldn't get a chance to do all this.
It's important that you do not try to make the test easier. One thing to keep in mind when we're talking about a successful DR test is that the objective is not to demonstrate that everything is flawless. There might be flaws identified in the plan throughout the test and it is important to identify those flaws.
So, it's not necessarily that you restored everything flawlessly and your plan is perfect. This idea sometimes pushes people to try to set things up to ensure that everything is successful and works. The test has to be addressed as a real disaster situation in which you follow your script. This is how you demonstrate that your plan is as complete as possible and how you identify any gaps in the plan.
That's a very good question. When you're about to conduct a DR test, one aspect that should never be overlooked is to make sure that you measure the impact of that test on the company and the business in general. The idea is to protect the company, the IT infrastructure and the business it supports and not to bring it down because you're testing.
When you're preparing for the test, there need to be "what if" scenarios. For example, if I'm going to mobilize my entire IT staff for a DR test, what is the impact of doing that on the business? There is a lot of planning that needs to be done before we actually run the test to make sure that we assess any kind of risk.
One good example here is that a lot of times if you are doing a traditional DR test from tape, you're taking all of your tapes from your vault and bringing them onsite to conduct a DR test. That in itself creates an exposure, and it's a risk that you need to be aware of so that you can take proper measures to ensure that nothing happens to the tapes and that they don't get lost. If something should happen while the tapes and the originals are all in the same location, you need to know what kind of exposure that creates.
It has happened in the past that tapes were lost, and you must remember that you're taking these tapes from a safe environment and all of a sudden you're moving them out of the vault for a test. You need to understand the impact of losing the tapes and take the proper measures to make sure that does not happen.
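One simple measure of that kind is to reconcile what left the vault against what came back once the test is over. The sketch below assumes you keep a list of tape labels at checkout and at return; the labels and the `reconcile_tapes` helper are made up for illustration and do not refer to any particular vaulting product.

```python
def reconcile_tapes(checked_out, returned):
    """Compare tape labels taken from the vault with labels sent back.

    Returns a dict with:
      'missing'    - tapes that left the vault and were not returned
      'unexpected' - tapes returned that were never logged as checked out
    """
    out = set(checked_out)
    back = set(returned)
    return {
        "missing": sorted(out - back),
        "unexpected": sorted(back - out),
    }


# Hypothetical labels logged before and after a DR test.
report = reconcile_tapes(
    checked_out=["T0001", "T0002", "T0003"],
    returned=["T0001", "T0003", "T0009"],
)
print(report)
```

Anything listed as missing needs to be chased down right away, before the test is considered closed.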
There are no specific tools that I know of that do DR testing, but there are a lot of tools to assist you in the process. For example, if server virtualization is not already part of your production environment, the ability to virtualize servers gives you a chance to test a DR environment that doesn't affect the production environment. It's always a challenge when running a DR test to try and restore data to some system in a meaningful way without affecting your production.
You can't easily turn around and take your production environment down because you're doing a DR test. It's also very costly to put together a physical environment that is a match of your production environment. Server virtualization really gives you an advantage here in the sense that you can create a good test environment without affecting your production environment.
It also gives you a chance to recreate your production environment at a much lower cost. After all, you're not trying to run production, so it's quite all right to restore a large number of systems as virtual machines on a single system if we're trying to demonstrate that our recovery script works. That's something you can leverage for DR that really gives us an edge here.
Data replication, especially if it's a tool that's already in your environment, is a way to make your testing a lot easier. This of course depends on what it is that you're testing. If you're already replicating data to an offsite location, you're not necessarily leveraging it as a tool for DR testing -- it is your DR strategy. If you're not using it as part of your strategy, that is a good way to transport data without necessarily exposing your original tapes.
Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and disaster recovery planning services and corporate data protection.
This was first published in November 2008