Putting together an effective disaster recovery (DR) plan can be a complex and arduous process; these 10 DR tips from Jon Toigo will help you through it.
The truth about disaster recovery (DR) planning is that it's a complex and typically underfunded undertaking focused on building a continuity capability that's best organized, not by disaster scenario, but by business process. It's also a task best pursued by an enthusiastic, optimistic and tenacious practitioner well versed in project management techniques (and Middle East peace negotiations) who will work with business stakeholders. One final truth: DR planning must have senior management backing or it won't succeed.
Since we still have some space available, here are 10 essential DR planning tips that you ignore at your own risk.
1. Make a backup. This may seem obvious, especially given that approximately 75% of the world's data is currently protected by copying it to tape and then removing the tape to secure off-site storage. But if you read the stuff that I read or lurk around the back doors of storage analyst conferences and vendor seminars, you've certainly heard many declarations that tape is a deceased technology.
The simple fact is that tape isn't dead, and it's a linchpin of a successful recovery following just about any disaster. Its price/performance metrics are great and getting better: There's no faster way to write data, no higher data density per raised floor tile and no other media with the reliability of tape, and it's dirt cheap. Even if you're mirroring disk to disk locally or replicating asynchronously between two stands of disks, send a backup of the data to tape just to be sure. If you're wondering why, just ask Google, Amazon, the Commonwealth of Virginia or any of the other organizations that have recently brought their systems back to life via tape data restore, despite their investments in lots of disk mirroring gear.
2. Break a mirror. If you're using synchronous disk mirroring (or async replication), break the mirror and check for data deltas. Nobody tests their mirrors because it's a huge hassle. You must quiesce applications using the storage, flush the cache to write data to disk A, replicate it to disk B and then turn everything off. Next, you need to do a file-by-file compare between the primary and secondary disk, and if you like what you find, you can cross your fingers and restart the mirroring or replication process hoping that everything synchronizes again.
Why should you bother with breaking the mirror? Simple. Data tends to be moved around on disk by storage administrators (or these days, server administrators) who may not appreciate the importance of telling the keeper of the DR plan. Thus, you might be mirroring the wrong data or even blank space between disks. At a minimum, you need to know how latency and jitter affect your replication process; these can lead to significant deltas (differences between original and copied data) that can make your recovery data useless.
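The file-by-file compare described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor tool: it assumes both sides of the broken mirror are mounted as ordinary directory trees, and the function names (`file_digest`, `tree_digests`, `compare_trees`) are made up for this example. It hashes every file on each side and reports what is missing or different.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its content digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(primary_root, secondary_root):
    """Report files missing from either side and files whose contents differ."""
    primary = tree_digests(primary_root)
    secondary = tree_digests(secondary_root)
    shared = primary.keys() & secondary.keys()
    return {
        "missing_on_secondary": sorted(primary.keys() - secondary.keys()),
        "missing_on_primary": sorted(secondary.keys() - primary.keys()),
        "content_differs": sorted(
            name for name in shared if primary[name] != secondary[name]
        ),
    }
```

Three empty lists mean the two copies matched at the moment you checked. Anything else is a delta you want to understand before you restart the mirror.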
3. Get real about data archiving. It may seem like moving older data off your primary spindles to an archival repository is outside the scope of DR planning, but a little data archiving and grooming can reduce the workload demand on data protection services, making them more efficient. Based on our analysis of more than 3,000 companies, approximately 40% of the data stored on every hard drive in a typical shop is archival-quality data. It needs to be retained, but it isn't accessed and could be moved off your spinning rust and onto an energy-efficient platform like tape augmented with the Linear Tape File System (LTFS). Set up an effective archiving system for that 40%, purge the additional 30% of data on your disk that's junk, duplicate data or contraband, and you could recover up to 70% of the capacity of every spindle you own. That just might bend the storage cost curve at your company, and management will love you.
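To see what those percentages mean for your own shop, the arithmetic is easy to run. The sketch below uses the 40% archival and 30% junk figures quoted above as defaults; the function name and the 100 TB example are assumptions for illustration, so substitute measurements from your own environment.

```python
def reclaimable_capacity(total_tb, archival_fraction=0.40, junk_fraction=0.30):
    """Estimate how much capacity archiving and purging could free up.

    The default fractions are the article's rough averages, not measurements
    of any particular environment.
    """
    archival_tb = total_tb * archival_fraction
    junk_tb = total_tb * junk_fraction
    reclaimed_tb = archival_tb + junk_tb
    return {
        "archival_tb": archival_tb,    # retained, but moved to the archive tier
        "junk_tb": junk_tb,            # purged outright
        "reclaimed_tb": reclaimed_tb,  # primary capacity freed
        "remaining_tb": total_tb - reclaimed_tb,  # still needs primary disk
    }


# Hypothetical shop with 100 TB of primary disk.
estimate = reclaimable_capacity(100)
for name, value in estimate.items():
    print(f"{name}: {value:.1f} TB")
```

The remaining 30% is also the only data that still needs frequent backup or replication, which is where the payoff for data protection services comes from.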
4. Consider storage virtualization. Forget what you heard back in the late 1990s when storage hardware vendors spilled so much ink condemning software-based storage virtualization as an ineffective technology that would burn up budget bucks with little benefit. Since then, several array makers have transitioned their own products into stands of disk trays topped by 1U rack servers running RAID software and centralized value-add applications under a Windows or Linux OS. For the money, a software-only offering beats a hardware-centric play.
What does it have to do with disaster recovery? Storage virtualization engines -- or storage hypervisors, as they're more fashionably referred to these days -- provide a convenient software layer for consolidating the assortment of data protection functions that are applied in various ways to different data to deliver "defense in depth." That, in turn, simplifies the management of data protection services and enables them to be selectively applied to different data workloads based on requirements.
DR templates to help get you started
These free disaster recovery (DR) planning templates and many others are available at SearchDisasterRecovery.com.
- Business impact analysis (BIA). Use this template as a guide to the questions you should ask during a BIA.
- Risk assessment. After identifying critical apps with your BIA, use this template to determine what internal or external events might disrupt those applications.
- Emergency management plan. This template will help you determine what organizations or individuals will need to be part of the emergency management process.
- Service-level agreement (SLA). This template will help you determine the level of participation expected of disaster recovery participants.
- Business continuity template for SMBs. Small businesses typically have more constraints when assembling a DR plan; this template will help you through the planning process.
5. Try a restore. The biggest problem in DR is when you recover all your backup data only to discover that you don't have everything you need to bring your application back to life.
It isn't enough to have your mailbox data to recover Microsoft Exchange; you also need the mail software, the right .NET version, the ESE or CRCL files, and the software for your hub transport, client access server, unified messaging server and Active Directory roles. Are you capturing all this data? Try a restore and find out.
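One way to make a test restore repeatable is to write down what a complete recovery must contain and check the restored files against that list. The sketch below is only an illustration: the manifest entries and paths are hypothetical placeholders, not real Exchange file locations, and `missing_components` is a name invented for this example.

```python
from pathlib import Path

# Hypothetical manifest: component name -> path expected under the restore
# target. Replace these placeholder paths with whatever your application needs.
REQUIRED_COMPONENTS = {
    "mailbox data": "data/mailboxes",
    "mail server installer": "software/mail-server-setup",
    "runtime installer": "software/runtime-setup",
    "directory service backup": "directory/system-state",
}


def missing_components(restore_root, manifest=REQUIRED_COMPONENTS):
    """Return the names of required components not found under restore_root."""
    root = Path(restore_root)
    return sorted(
        name
        for name, relative_path in manifest.items()
        if not (root / relative_path).exists()
    )
```

An empty result only tells you the pieces are present. You still have to bring the application up from them to know the restore really works.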
6. Set up a virtual tape library (VTL). Fundamentally, a VTL is just some disk on which you can store about 30 days' worth of data that has also been copied to physical tape and moved off site. The benefit of a local disk repository is that it lets you quickly restore individual files that have become corrupted without having to restore an entire file set from tape. You can also use post-processing deduplication to squeeze the data on your disk and reduce the capacity requirements. Post-process dedupe is usually free with your backup software.
7. Test your plan. Do an "ad hoc" tabletop exercise. Put some sticky notes on various pieces of hardware in your data center or on the monitors of your personnel indicating software or hardware failures. Then call your DR team into a meeting room and walk through the procedures to address the mock disaster scenario. This is a lot cheaper than scheduling a formal DR test event and it allows you to test procedures in a linear, sequential way that provides a great rehearsal for recovery team participants.
8. Be proactive. Maintain logs of server downtime and the root causes for downtime incidents. Data from your own shop is better than generic data for ensuring that you continue to retain management support for the continuity capability. Over time, you may be able to show how your disaster prevention measures have improved uptime or mitigated what was previously protracted downtime.
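A downtime log doesn't need to be fancy to be useful. Here is a minimal sketch of the kind of summary you could hand to management, assuming each incident is recorded with a reporting period, a server, a root cause and minutes of downtime. The field names and the sample records are invented for illustration.

```python
from collections import defaultdict


def downtime_by_cause(incidents):
    """Total minutes of downtime per root cause, largest first."""
    totals = defaultdict(int)
    for incident in incidents:
        totals[incident["root_cause"]] += incident["minutes_down"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def downtime_by_period(incidents):
    """Total minutes of downtime per reporting period, in period order."""
    totals = defaultdict(int)
    for incident in incidents:
        totals[incident["period"]] += incident["minutes_down"]
    return sorted(totals.items())


# Illustrative records only; these are not real incidents.
incident_log = [
    {"period": "2013-Q1", "server": "mail01", "root_cause": "disk failure", "minutes_down": 180},
    {"period": "2013-Q1", "server": "db02", "root_cause": "failed patch", "minutes_down": 95},
    {"period": "2013-Q2", "server": "mail01", "root_cause": "disk failure", "minutes_down": 60},
    {"period": "2013-Q2", "server": "web03", "root_cause": "power loss", "minutes_down": 30},
]

for cause, minutes in downtime_by_cause(incident_log):
    print(f"{cause}: {minutes} minutes")
for period, minutes in downtime_by_period(incident_log):
    print(f"{period}: {minutes} minutes")
```

The per-cause totals show where prevention money should go, and the per-period totals are what let you demonstrate that downtime is shrinking over time.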
9. Check in with DR plan stakeholders. Regularly contact the stakeholders in your DR plan, such as the business process owners and IT management, to see what changes are coming in the next quarter. Many potential disasters can be minimized or avoided if you know about things like new business initiatives (e.g., plans to launch a new marketing campaign), new equipment deployments or other technology rollouts such as a new virtualization scheme planned for the near term. These types of events can upset a DR plan. Contingency plans should be created to cope with disruptive changes; plans should be retested and updated after any significant application or infrastructure changes.
10. Polish your rhetorical skills, especially euphemisms. In lean economic times, management interest in funding business continuity strategies usually declines. It isn't that the business is less important, or that dependency on automation has dropped off in a tough economy. In fact, the opposite is true: "Do more with less" means fewer staff are even more dependent on the proper operation of the machine. The issue is a simple matter of spending money where it will produce the most return on investment, and DR plans are an insurance policy that in the best of circumstances will never need to be used. So, if management is losing interest in DR, call it something else. Call it software quality assurance, your new technology test lab or your cloud strategy pilot -- whatever will get you the funding you need to continue operations.
These 10 DR tips will help you to keep your continuity capability on track and in line with business requirements in 2013 and beyond.
About the author:
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.