Jason Buffington, Jon Toigo, George Crump, Marc Staimer, Paul Kirvan
Published: 04 Dec 2015
Year after year, we read surveys indicating that many organizations are still not confident in their ability to restore normal business operations in a reasonable timeframe following a disaster event. Despite statistics telling us that every minute of downtime can cost a business umpteen thousands of dollars, many shops haven't yet established a corporate disaster recovery plan they know they can rely on.
If you are in this boat -- come on, it's OK to admit it -- or if you just need a refresher on DR best practices, we've got you covered. We asked a group of trusted data protection experts to drop knowledge on five essential parts of any corporate disaster recovery plan.
Class is in session, people. Listen up.
You can't restore what you don't protect
There's no getting around it: The first step of business continuity/disaster recovery (BC/DR) planning is to ensure the survival of your data, and the basis for most data protection strategies is backup, meaning a recurring copy of data stored on another on-site system. But to protect against a site-wide outage or disaster event such as a fire or flood, you must also send a copy of your data off-site. Traditionally, this meant shipping tapes off-site and, for many organizations, it still does. But shops are increasingly turning to replication to accomplish this task.
Modern backup software products enable this approach in a number of ways. Many products integrate with storage system snapshot and replication functionality to send data off-site. Others offer native replication functionality. Almost every backup software product on the market today has some way to send data to the cloud as well. And virtual server backup products can even replicate entire virtual machines to a secondary site or the cloud.
The method you choose to replicate copies from your on-premises backup software to your BC/DR site or service will directly affect your recovery speed. So, first establish your recovery time objective (RTO), the maximum amount of downtime your organization can tolerate, and then choose the backup approach that meets it.
For example, if you replicate data to an off-premises storage system, you have a storage array with your data on it, but you don't have immediate access to applications. But if your backup server replicates the data to another server instance -- as is the case with many virtual server backup products -- you can run applications off-site while restoring your on-site infrastructure.
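One way to make "establish the RTO first, then choose the approach" concrete is to list each candidate approach with an estimated time to resume operations and discard any that miss the target. The sketch below assumes you already have such estimates; the option names and hour figures are hypothetical placeholders, not benchmarks, and `RecoveryOption` and `options_meeting_rto` are illustrative names rather than part of any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    """A candidate way of getting data off-site (hypothetical names and figures)."""
    name: str
    est_recovery_hours: float  # estimated time to resume operations after a disaster


def options_meeting_rto(options: list[RecoveryOption], rto_hours: float) -> list[RecoveryOption]:
    """Keep only the options whose estimated recovery time fits the RTO, fastest first."""
    fitting = [o for o in options if o.est_recovery_hours <= rto_hours]
    return sorted(fitting, key=lambda o: o.est_recovery_hours)


if __name__ == "__main__":
    # Illustrative estimates only; replace them with figures measured in your own DR tests.
    candidates = [
        RecoveryOption("tapes shipped off-site", 72.0),
        RecoveryOption("replicated to off-site storage array", 24.0),
        RecoveryOption("VMs replicated to a standby host", 2.0),
    ]
    for option in options_meeting_rto(candidates, rto_hours=8.0):
        print(option.name)
```

With an eight-hour RTO, only the approach that can run applications off-site survives the filter, which mirrors the storage-array versus server-instance distinction in the example above.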
BIO: Jason Buffington, senior analyst, Enterprise Strategy Group
Get [data] out of town!
While replicating data to an off-site system or service offers benefits, it comes with a number of challenges. Whether your strategy is to replicate the data across a WAN to a secondary site or a cloud-based DR service provider, there are a number of things you need to consider in your business DR plan.
First, the location of your DR site or service is critical, because the radius of severe weather disasters often exceeds several hundred miles. For example, in 2012, Hurricane Sandy spanned more than 1,100 miles. If you opt for a DR service, the cloud data center to which your data is being copied may be fairly close to your primary facility, so be sure to find out where it is.
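Checking the separation between your primary facility and a DR site is simple arithmetic once you have coordinates. A minimal sketch using the haversine great-circle formula follows; the helper names are hypothetical, and the 1,100-mile threshold in the usage line merely echoes the Sandy example and is an assumption, not an industry standard.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(primary: tuple[float, float], dr_site: tuple[float, float], min_miles: float) -> bool:
    """True if the DR site is at least min_miles from the primary facility."""
    return great_circle_miles(*primary, *dr_site) >= min_miles


if __name__ == "__main__":
    new_york = (40.71, -74.01)  # example primary site
    dallas = (32.78, -96.80)    # example DR provider location
    print(round(great_circle_miles(*new_york, *dallas)), far_enough(new_york, dallas, 1100))
```

Straight-line distance is only a first filter; sites far apart can still share a weather pattern, flood plain or power grid.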
The network can also be a hurdle. Replicating over distance across a shared public network exposes you to bandwidth, latency and jitter issues, as well as data deltas (differences between the data states at source and target nodes) that can make the copied data unusable as a foundation for recovery.
Restoring a large amount of data to your primary site over the Internet in a timely manner is also difficult, if not impossible. The network link and bandwidth that enabled you to copy data to the remote site in dribs and drabs may prove woefully inadequate for transporting all of your data back to you in a short timeframe following a disaster. And that assumes the network is still operating after a regional catastrophe, which is by no means assured. As such, your corporate DR plan should consider whether you will be able to continue operations in some manner from your off-site location.
So, if you want to use disk-to-disk copies for DR, you might consider using a tape-based copy as a safety net.
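To put the restore-bandwidth concern in numbers, bulk transfer time is roughly data size divided by effective throughput. This sketch assumes decimal terabytes and a placeholder 80% link efficiency for protocol overhead and contention. Real throughput over a shared public network can be considerably lower, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to move data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved;
    0.8 is a placeholder, not a measurement.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                       # decimal terabytes -> bits
    bits_per_second = link_mbps * 1e6 * efficiency  # usable throughput
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    # 50 TB back over a 1 Gbps link at 80% efficiency
    print(f"{transfer_hours(50, 1000):.1f} hours")  # prints "138.9 hours"
```

At nearly six days for 50 TB, a full restore over the wire can blow through most RTOs, which is why shipping physical media back or running from the DR site may be the faster option.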
BIO: Jon Toigo, president, Toigo Partners International
The hard part: Bringing data back
Backup is a priority for most organizations today, but ironically, establishing a restore plan is often overlooked. However, a well-documented plan for restoring data and applications is an essential part of every disaster recovery plan. Restores are obviously required at the most stressful time -- when something is broken and it needs to be fixed fast. In other words, the pressure is on. So, having a well-architected corporate disaster recovery plan is critical to meeting your RTO.
The key is to know in advance which servers should be recovered and in what order, because not all data, servers or services have the same business value or operational requirements. Know what data needs to be protected best and restored first, and understand your recovery capabilities. For example, will recovery require you to move data from off-site storage before restoring applications? Or will you be able to continue operations in some way from your DR system or service?
For most organizations, resuming critical day-to-day operations involves recovering fewer than 5% of the servers and less than 5% of the data. These are typically critical databases and the applications used to access them. The remaining 95% of data is reference or archival data; while often important, it is not needed to resume immediate operations and can be recovered later as time allows. It is important to understand the application ecosystem, though. For example, if an application is accessed through a Web interface, the application, the database and a Web server will likely all need to be recovered. As a result, recovery should be prioritized by application group.
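Prioritizing by application group amounts to ordering restores by dependency: a Web tier is useless until its application server and database are back. A sketch using Python's standard `graphlib` module (3.9+) groups servers into restore "waves" whose members can be restored in parallel; the server names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter


def recovery_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group servers into restore waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are restored
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical application groups: server -> servers it needs running first
    groups = {
        "order-web": {"order-app"},
        "order-app": {"order-db"},
        "hr-web": {"hr-db"},
    }
    print(recovery_waves(groups))
    # [['hr-db', 'order-db'], ['hr-web', 'order-app'], ['order-web']]
```

Keeping the dependency map in the DR plan, rather than in someone's head, also makes it reviewable when applications change.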
Finally, it is important to understand whether these applications can be recovered to a virtual environment, since many of these critical applications are not virtualized in production. If they can be virtualized, recovery is easier and requires less hardware at the DR site. But if they must remain bare-metal for compatibility or performance reasons, the corporate disaster recovery plan has to account for a standby server to recover to.
It is also important to map out a variety of restore scenarios. Your recovery plan for restoring data if your primary systems are completely destroyed will look different from restoring operations following a power outage, for example.
BIO: George Crump, president, Storage Switzerland
Mid-term exams for your DR plans
Every IT pro worth their salt knows deep down that you need to test a corporate disaster recovery plan. It goes back to that old saw: Plan the work and work the plan. Unfortunately, many don't follow through, or don't test effectively. Testing is the only way to reveal flaws, mistakes and holes in your restore plan.
At least once a year -- preferably twice -- a truly representative test should be conducted. Representative could mean recovering all of the mission-critical applications, their data, systems and processes, or it could mean less. The key is to make the tests as close to reality as possible.
Take the example of restoring a mission-critical relational database. This might require you to bring up that database on a different physical, virtual or cloud platform. How long does it take to get the database consistent, up and running? Are there automated (preferred) or manual tests that verify the data is sound and that the database is running properly? If not, how will you know whether the test succeeded or failed?
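Automated checks can start small: confirm the engine reports the database as consistent, confirm the expected data is present, and record a clear pass/fail. This sketch uses SQLite's `PRAGMA integrity_check` purely as a stand-in for whatever consistency check your production DBMS provides; `verify_restored_db`, the `orders` table and the row threshold are hypothetical.

```python
import sqlite3
import time


def verify_restored_db(conn: sqlite3.Connection, table: str, expected_min_rows: int) -> dict:
    """Run basic post-restore checks and report a pass/fail verdict and timing.

    SQLite stands in for the production database; substitute your DBMS's own checks.
    """
    start = time.monotonic()
    integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
    # Identifiers can't be bound as parameters; only pass trusted table names here.
    rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    integrity_ok = integrity == "ok"
    row_count_ok = rows >= expected_min_rows
    return {
        "integrity_ok": integrity_ok,
        "row_count_ok": row_count_ok,
        "rows": rows,
        "passed": integrity_ok and row_count_ok,
        "check_seconds": time.monotonic() - start,
    }


if __name__ == "__main__":
    # Simulated "restored" database for demonstration
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,), (30.0,)])
    print(verify_restored_db(conn, "orders", expected_min_rows=3))
```

Timing the full restore, not just these checks, is what tells you whether the RTO was actually met.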
After the database is up and running and everything is good, the test is not over. It is imperative to test VPN re-routing, DNS updates and so on. Giving users access to applications and data is just as important to test as bringing up the database.
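Part of the user-access test can be automated as well. As a sketch: after failover, check that each service hostname resolves to an address at the DR site. The hostnames and addresses are placeholders, and `socket.gethostbyname` returns a single IPv4 address, so a production check would likely inspect all records. The resolver is a parameter so the check can be exercised without live DNS, and `dns_points_to_dr` is a hypothetical helper.

```python
import socket
from typing import Callable, Iterable


def dns_points_to_dr(
    hostnames: Iterable[str],
    dr_addresses: set[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, bool]:
    """Map each hostname to True if it resolves to one of the DR site's addresses.

    Lookup failures are reported as False rather than raised.
    """
    results = {}
    for host in hostnames:
        try:
            results[host] = resolve(host) in dr_addresses
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = False
    return results
```

For example, `dns_points_to_dr(["app.example.com"], {"203.0.113.10"})` would report whether that (placeholder) name has been repointed. Any `False` entry is a failed test item, even if the database itself came up cleanly.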
BIO: Marc Staimer, CDS, Dragon Slayer Consulting
Care and feeding of your plan
BC/DR plans should be considered "living documents" that need to be periodically reviewed and updated to ensure the plan remains accurate and that its procedures will actually facilitate recovery when performed.
DR plan maintenance and updating activities should be implemented on both a scheduled and an ad hoc basis. The former activity establishes regular content reviews so that plans are examined at least annually and preferably twice a year. The latter addresses real-time changes in the business that could have a material effect on corporate DR plans and their associated program activities.
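The scheduled half of this policy is easy to track mechanically, and the ad hoc half can be wired in as a simple trigger. The sketch below classifies a plan by the age of its last review. The 182- and 365-day thresholds approximate "twice a year" and "at least annually," the `material_change_since` flag stands in for the ad hoc trigger, and all names and thresholds are assumptions to adapt to your own policy.

```python
from datetime import date


def review_status(
    last_review: date,
    today: date,
    material_change_since: bool = False,
    preferred_days: int = 182,  # roughly twice a year
    max_days: int = 365,        # at least annually
) -> str:
    """Classify a DR plan as 'review_now', 'overdue', 'due' or 'current'."""
    if material_change_since:
        return "review_now"  # ad hoc review triggered by a material business change
    age = (today - last_review).days
    if age > max_days:
        return "overdue"
    if age > preferred_days:
        return "due"
    return "current"
```

Running this across every plan in a nightly job, for instance, turns "we review annually" from an intention into something that can be audited.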
Change management is a formal process that ensures changes to a product, process or system are introduced and implemented in a controlled and coordinated manner. While many organizations have a formal change management process, disaster recovery is rarely included. To ensure that DR plans are kept up-to-date, include them as part of the overall change management process.
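One lightweight way to bring DR into change management is to check each change record against the systems the DR plan covers and flag a plan review whenever they overlap. The record fields and function below are illustrative assumptions, not taken from any particular IT service management tool.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Minimal change record; field names are hypothetical."""
    change_id: str
    affected_systems: set[str]
    dr_review_required: bool = False


def flag_dr_impact(change: ChangeRequest, dr_plan_systems: set[str]) -> set[str]:
    """Flag the change for DR-plan review if it touches any system the plan covers.

    Returns the overlapping systems so reviewers know which plan sections to update.
    """
    overlap = change.affected_systems & dr_plan_systems
    change.dr_review_required = bool(overlap)
    return overlap
```

The returned overlap doubles as a to-do list: each system in it points to a section of the DR plan that may need new recovery steps.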
As noted in the previous section, corporate disaster recovery plan testing is an essential tool to validate the plan and its procedures. Following each test, examine the DR plan to see if lessons learned from the test can be used to update the plan.
BIO: Paul Kirvan, independent BC/DR consultant, Paul Kirvan Associates