In an ideal world, what would disaster recovery tests entail?
In my perfect world, we would have defined strategies for data protection and restoration that could be tested in real time, on an ad hoc basis, using either simulated or "live" procedures. Tape backups should be subjected to read/write verification to confirm both that the replicated data is the correct data and that it can be restored when needed. Data replicated to disk must likewise be verified routinely, not just as part of some formal test event.
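The routine verification described above can be automated in its simplest form as a checksum comparison between the source data and the restored copy. The sketch below is a minimal illustration, not a production tool; the file names and the `demo_verification` helper are hypothetical stand-ins for a real backup target.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large backups never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source: Path, restored: Path) -> bool:
    """A restored copy is only good if its checksum matches the original."""
    return sha256_of(source) == sha256_of(restored)


def demo_verification() -> bool:
    # Temporary files stand in for a production file and its restored backup.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "payroll.db"          # hypothetical source file
        original.write_bytes(b"critical business data")
        restored = Path(tmp) / "payroll.db.restored"
        shutil.copy2(original, restored)             # simulate backup + restore
        return verify_restore(original, restored)


print(demo_verification())  # True when the restore matches the source
```

Because a check like this touches only the backup copy, it can run on a schedule every day without interrupting production workloads.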
Ideally, we would also select strategies for application rehosting and network reconnection that lend themselves to testing at any time during the normal operating day, without disrupting normal operations. Geoclustering holds out the promise of such a strategy, as do, to a certain extent, server and storage virtualization techniques. Again, you should be able to confirm that system and network recovery capabilities are up to the task without waiting for a formal test event to find out.
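One low-impact way to confirm recovery readiness continuously, as suggested above, is a scheduled reachability probe against the standby site's services. This is a minimal sketch under assumptions: the host names and ports are hypothetical and would come from your own DR runbook or configuration database.

```python
import socket


def standby_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the standby service succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical standby endpoints for illustration only.
STANDBYS = [("dr-db.example.com", 5432), ("dr-web.example.com", 443)]


def readiness_report(standbys) -> dict:
    """Map each standby endpoint to its current reachability."""
    return {f"{host}:{port}": standby_reachable(host, port)
            for host, port in standbys}
```

A probe like this only establishes network reachability, not application correctness, but running it daily catches broken failover paths long before a formal test would.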
If we could accomplish these goals, formal testing would come down to a much simpler set of logistical tasks: how the disaster would be identified, who would be contacted and in what order, how customers would be notified, how teams would travel to a recovery facility, how externalized services (e.g., clouds, hot sites, vendors tasked to drop-ship recovery supplies, the phone company, user facilities, and so on) would be activated, and how the sequence of recovery tasks would unfold. Those logistics could be tested in a very linear fashion and at far less expense than traditional testing entails today.
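The linear style of testing described above amounts to walking an ordered runbook and recording a pass or fail at each step. The sketch below illustrates the idea with hypothetical steps and trivially passing checks; a real plan would supply its own steps and verification callbacks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunbookStep:
    description: str
    check: Callable[[], bool]  # returns True if this step's logistics are confirmed


def walk_runbook(steps: List[RunbookStep]) -> List[str]:
    """Execute the runbook in order, recording a result per step.

    Stops at the first failure, since later steps depend on earlier ones.
    """
    results = []
    for step in steps:
        ok = step.check()
        results.append(f"{'PASS' if ok else 'FAIL'}: {step.description}")
        if not ok:
            break
    return results


# Hypothetical steps for illustration; real ones come from the DR plan.
steps = [
    RunbookStep("Declare the disaster and notify the response lead", lambda: True),
    RunbookStep("Work the contact tree in priority order", lambda: True),
    RunbookStep("Notify customers via the agreed channel", lambda: True),
]

for line in walk_runbook(steps):
    print(line)
```

Because each step is checked in sequence, this kind of walkthrough can be rehearsed around a conference table rather than at a recovery site, which is where most of the cost savings come from.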
This was first published in July 2013