When disaster strikes or a site outage occurs, there is no substitute for a carefully crafted, well-orchestrated and thoroughly tested disaster recovery (DR) or business continuity plan. The following list is a helpful reminder of the key points to consider when developing a DR plan.
Make sure your backups use all the application-specific agents and facilities needed to capture application state without missing anything, or that Volume Shadow Copy techniques work properly to achieve a workable recovery. Checking application state, function and integrity should be a key objective in all recovery practice sessions.
Back up systems using imaging or other backup tools that capture system state and configuration information as well as file system snapshots. Such images should restore onto bare-metal installs, as long as source and target hardware are 90% or more identical (and the closer the match, the better). Store backups offsite, preferably across the WAN on a distant server of your own, or on someone else's server through a remote backup and recovery service.
In Windows Server 2008, you can make use of its improved Windows Server Failover Clustering (WSFC) abilities to support geographical clustering, so that a clustered server may include one or more server elements at another site. Microsoft doesn't provide replication tools necessary to replicate a storage subsystem across geographically dispersed nodes, so you'll need a third-party solution. Not all server applications and services work with WSFC, so be sure to investigate and test thoroughly before committing to this architecture.
If your environment depends on Active Directory (AD) and its domain controllers, recovery of related servers and their data is essential to restoring operations. Because AD holds the keys to the network (especially access controls, permissions, account and group information, and policy data), it's imperative to make AD recovery one of the first stops on the road to recovery during practice sessions.
Windows recovery, especially from backups, depends on a close match between the hardware in the source and target systems. As DR teams conduct practice runs, they should obtain replacement hardware and make bare-metal restoration of server backups to this hardware part of their practice drills.
This is less critical for an environment with active mirrors or hot standbys, because all those issues will be addressed already. Don't forget that repaired and restored systems will require you to repeat product activation, so be sure to have product keys or license data handy.
Practice not only makes perfect, it also makes sure that your staff and your equipment can perform a successful recovery if a real disaster strikes. Practice at least once a year, if not more often, and perform a postmortem to identify and make necessary recovery plan adjustments.
Make sure you have well-defined data recovery needs and objectives, and that these get exercised thoroughly in your recovery practice sessions. Establish clear recovery point objectives (RPOs), which cap how much recent data you can afford to lose, and recovery time objectives (RTOs), which cap how long operations can stay down, and then make sure you can meet them in practice as well.
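As an illustration of checking a practice session against these objectives, here is a minimal Python sketch. It assumes you record three timestamps per drill: the last usable backup, the start of the simulated outage, and the moment service was restored. The function name, field names and sample values are hypothetical and are not taken from any particular DR tool.

```python
from datetime import datetime, timedelta

def check_objectives(last_good_backup, outage_start, service_restored, rpo, rto):
    """Compare one drill's measured data loss and downtime with the RPO and RTO."""
    data_loss = outage_start - last_good_backup   # work lost since the last usable backup
    downtime = service_restored - outage_start    # how long the service was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Hypothetical drill: backup at 01:00, simulated failure at 09:30, service back at 15:00.
result = check_objectives(
    last_good_backup=datetime(2008, 6, 2, 1, 0),
    outage_start=datetime(2008, 6, 2, 9, 30),
    service_restored=datetime(2008, 6, 2, 15, 0),
    rpo=timedelta(hours=24),   # assumed objective: lose at most one day of data
    rto=timedelta(hours=4),    # assumed objective: be back within four hours
)
print(result["rpo_met"], result["rto_met"])  # prints: True False
```

In this made-up drill the data loss (8.5 hours) is within the RPO, but the 5.5 hours of downtime misses the RTO, which is exactly the kind of finding a postmortem should feed back into the plan.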
Make sure you've got sufficient backup capacity and practice in dealing with restore and recovery for networked storage solutions, whether you use a storage area network (SAN) or network-attached storage (NAS), or both. Microsoft hasn't released a successor to Windows Storage Server 2003 yet, but the company's System Center Data Protection Manager 2007 should make a good alternative to the older product.
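One rough way to judge whether backup capacity is sufficient is a back-of-the-envelope estimate like the Python sketch below. It assumes a scheme of periodic full backups plus daily incrementals, which is only one of several possible schemes. The change rate, retention count and headroom figures are illustrative assumptions, not recommendations, and should be replaced with numbers measured in your own environment.

```python
def required_capacity_gb(full_gb, daily_change_rate, fulls_retained,
                         incrementals_per_full, headroom=0.2):
    """Estimate backup storage for full backups plus daily incrementals.

    All parameters are assumptions for illustration:
      full_gb               size of one full backup
      daily_change_rate     fraction of data that changes per day
      fulls_retained        number of full-backup cycles kept
      incrementals_per_full incrementals taken between two fulls
      headroom              extra fraction reserved for growth
    """
    incremental_gb = full_gb * daily_change_rate
    per_cycle_gb = full_gb + incremental_gb * incrementals_per_full
    return per_cycle_gb * fulls_retained * (1 + headroom)

# Hypothetical SAN volume: 500 GB full backup, 5% daily change,
# four weekly cycles retained, six incrementals per cycle.
needed = required_capacity_gb(full_gb=500, daily_change_rate=0.05,
                              fulls_retained=4, incrementals_per_full=6)
available_gb = 3000  # assumed capacity of the backup target
print(f"need about {needed:.0f} GB, enough space: {needed <= available_gb}")
```

With these sample figures the estimate comes to roughly 3,120 GB, so a 3,000 GB backup target would fall short. Compression, deduplication and differential schemes would all change the arithmetic.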
DR requires careful planning, but implementing a recovery also requires lots of testing that generally occurs during practice sessions. Testing the accuracy, validity and coverage of a DR plan is the only way to make sure that it results in a real recovery, and that RPOs and RTOs are met.
Uninterruptible Power Supplies (UPS)
Although they can't avert real disasters, UPSs are essential for riding out temporary power problems and outages, and backup power generation and failover can carry you through even lengthy power failures, or give you more time to recover operations at a remote site. That said, make sure UPS protection is in place at both primary IT sites and recovery sites, whether hot, warm or cold.
About this author: Ed Tittel is a freelance writer based in Round Rock, TX. He writes regularly about Windows and networking topics for various TechTarget Web sites. His most recent book in this area is Windows Server 2008 For Dummies (Wiley, 2008, ISBN-13: 9780470180433).