Looking at trends in disaster recovery (DR) planning in 2009, it is clear that virtualization has become a critical consideration in DR planning and design. A 2009 DR survey from Symantec Corp. found that 64% of organizations are re-evaluating their disaster recovery plans based on virtualization. But why? One benefit of leveraging virtualization is obvious: more virtual servers at DR sites means reduced capital expense for disaster recovery, as fewer physical servers need to be purchased, thus mitigating a major pain point in DR -- the expense of idle assets.
However, capital cost reduction is only one of the benefits that virtualization brings to DR. Virtualization can also help to simplify the failover and recovery process, shorten recovery times, improve the likelihood of recovery, and enable more comprehensive DR testing. With this windfall of potential improvements, it makes sense that as virtualization becomes more widely adopted in production environments, DR planners are reexamining traditional disaster recovery approaches and practices.
At its most basic level, the disaster recovery process involves moving data from a source to a recovery location, bringing up systems and other infrastructure at the remote location, starting and validating applications and making the applications available to users. Virtualization, with supporting tools and technologies, can play a substantial role in each of these activities. But before considering the various technical options, it is important to consider these key questions:
- Are your recovery point objectives (RPOs) and recovery time objectives (RTOs) defined? This is a critical first step and should be addressed before exploring any technical options, as these objectives will have a major impact on technology direction.
- What are the critical applications in your environment and how many are there? Certain applications may have unique requirements for disaster recovery and may lend themselves to specific solutions. Scalability considerations will also influence technology direction.
- Is data replication already being used in your environment? If so, a DR solution that supports the existing approach may be preferred. For example, some solutions interoperate with storage array-based replication, while others provide their own replication capability.
- How virtualized is your environment? Some solutions are targeted specifically to virtualized environments while others can accommodate physical servers as well. If a single approach for both physical and virtual systems is required, this will impact the solution.
- To what degree should/must the solution be automated? More automation can mean faster recovery, but may require better controls and testing. Automation is valuable when it's needed, but paying extra for automation features that can't be leveraged effectively may not be a good idea.
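The RPO and RTO questions above lend themselves to a simple sanity check before any technology is chosen. The sketch below is purely illustrative -- the function names, safety factor and step durations are hypothetical, not drawn from any DR product:

```python
# Minimal sketch (hypothetical names and numbers): turning recovery
# objectives into concrete planning checks.

def max_replication_interval(rpo_minutes: float, safety_factor: float = 0.5) -> float:
    """Worst-case data loss roughly equals the replication interval,
    so schedule replication at a fraction of the RPO for headroom."""
    return rpo_minutes * safety_factor

def rto_is_feasible(rto_minutes: float, steps: dict) -> bool:
    """Sum the estimated duration of each recovery step (failover,
    VM boot, application validation) and compare against the RTO."""
    return sum(steps.values()) <= rto_minutes

# Example: a 60-minute RPO suggests replicating at least every 30 minutes.
interval = max_replication_interval(60)  # 30.0

# Estimated recovery steps (minutes) for a hypothetical application tier.
steps = {"failover": 10, "vm_boot": 15, "app_validation": 20}
feasible = rto_is_feasible(60, steps)  # 45 minutes of steps fits a 60-minute RTO
```

Even a back-of-the-envelope check like this can reveal early on that a desired RTO is unachievable with the recovery steps as currently estimated, before any product evaluation begins.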
Virtualization can minimize or eliminate hardware dependencies, which goes a long way toward simplifying DR by removing constraints like the necessity of like-hardware at primary and disaster recovery sites. However, taking full advantage of virtualization for DR requires some means of moving or copying virtual machine (VM) images and data to DR locations. If you want the shortest recovery time, this means exploring data replication options.
Fundamentally, there are two approaches to replication: storage-based and host-based. For larger enterprises, the predominant approach is to replicate data between storage arrays. Some virtualization technologies, most notably VMware's Site Recovery Manager (SRM), are specifically designed to support this approach. Site Recovery Manager supports a variety of SAN- and NAS-based storage products that can replicate data at the LUN (logical unit number) level (for SAN) or volume level (for NAS). It automates elements of disaster recovery planning, management, testing and execution within a virtualized environment. SRM can also go beyond data movement and virtual server image copies. It can be used to establish DR policies that manage prioritization and startup sequencing of VMs, customize IP addresses, and generally control the entire end-to-end disaster recovery process. Additionally, it leverages storage array snapshot capabilities to enable non-disruptive testing.
It should be noted that the true value of Site Recovery Manager is its ability to automate error-prone tasks. Given that replication is handled by the storage subsystem, all of the functionality provided by SRM could be accomplished manually. However, disaster recovery processes always come with challenges. For example, IT personnel are called on to perform duties that are often outside of their normal day-to-day activities, and to complete them under perhaps the worst possible circumstances. The value of bringing standardization and consistency to such a scenario should not be underestimated.
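The prioritization and startup-sequencing behavior described above can be sketched generically. The code below is an illustration of the concept only -- it is not SRM's API, and every name in it is hypothetical:

```python
# Generic sketch of priority-based VM startup sequencing, in the spirit
# of the recovery-plan ordering a tool like SRM automates. All names
# are hypothetical; this is not any vendor's implementation.

from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    priority: int                                  # lower number = start earlier
    depends_on: list = field(default_factory=list) # VMs that must already be up

def startup_order(vms):
    """Order VMs by priority tier, starting each VM only after
    its dependencies have been started."""
    ordered, started = [], set()
    for vm in sorted(vms, key=lambda v: v.priority):
        # A real recovery plan would block or alert on an unmet dependency.
        if all(dep in started for dep in vm.depends_on):
            ordered.append(vm.name)
            started.add(vm.name)
    return ordered

plan = [
    VM("app-server", 2, depends_on=["database"]),
    VM("database", 1),
    VM("web-frontend", 3, depends_on=["app-server"]),
]
print(startup_order(plan))  # ['database', 'app-server', 'web-frontend']
```

The point of the sketch is the one SRM makes in practice: encoding startup order and dependencies as a plan, rather than relying on an administrator's memory during a disaster, is exactly the kind of error-prone task worth automating.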
If storage array-based replication is not feasible or not the preferred approach, there are three main host-based technologies available for replication in virtual environments.
First, the Vizioncore Inc. family of products from Quest Software represents a suite of tools that includes data backup and replication optimized for VMware environments. Its vReplicator product performs replication on an individual virtual machine basis, thereby enabling simple, selective replication in environments where production and non-production VMs may co-reside in the same ESX cluster. The product is designed to move only the minimal amount of data necessary, thereby minimizing bandwidth requirements.
Second, for environments where both physical and virtual server replication is required, products from Double-Take Software Inc. and CA XOsoft offer a variety of functions. A popular option for replicating Windows platforms, Double-Take also supports both VMware and Microsoft Hyper-V virtual environments, making it a worthwhile consideration for hybrid physical and virtual infrastructures. CA XOsoft provides application-aware replication capabilities for Exchange, SQL Server, SharePoint and more.
And third, both Double-Take and XOsoft offer continuous data protection (CDP) options, making it possible to "roll back" to earlier versions of a file or database. This not only offers valuable protection against logical data corruption, which replication alone does not protect against, but could potentially obviate the need for traditional backups. In addition, CDP can help to simplify the process of creating independent data copies for non-disruptive disaster recovery testing purposes.
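The "roll back" idea behind CDP can be modeled as a journal of timestamped writes that is replayed up to a chosen point in time. The toy model below illustrates only the concept; it does not reflect how Double-Take, XOsoft or any other product actually implements CDP:

```python
# Toy model of continuous data protection (CDP): every write is
# journaled with a timestamp, so state can be reconstructed as of any
# earlier point in time. Concept illustration only, not a vendor design.

class CDPJournal:
    def __init__(self):
        self._entries = []  # list of (timestamp, key, value)

    def record(self, ts, key, value):
        """Journal a write of `value` to `key` at time `ts`."""
        self._entries.append((ts, key, value))

    def state_as_of(self, ts):
        """Replay journal entries up to `ts` to rebuild earlier state."""
        state = {}
        for t, key, value in sorted(self._entries):
            if t <= ts:
                state[key] = value
        return state

journal = CDPJournal()
journal.record(100, "report.doc", "v1")
journal.record(200, "report.doc", "v2-corrupted")

# Roll back past the corruption by reconstructing state at t=150.
print(journal.state_as_of(150))  # {'report.doc': 'v1'}
```

This is also why CDP helps with non-disruptive DR testing: any point in the journal can be materialized as an independent copy without touching the live replication stream.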
One additional feature of the host-based replication products is the ability to enable hybrid physical-virtual DR designs. While often not ideal, some organizations may prefer deploying key applications on physical servers at their primary site, but find it acceptable to run virtually at their DR location. The host-based offerings discussed here have the ability to convert and replicate between physical and virtual (P2V) servers and back (V2P), thus supporting a hybrid DR model.
While this list is far from complete, it is intended to illustrate the variety of options available to make the job of disaster recovery more effective in terms of both capability and cost. There are other products and even other approaches, but the recommended path for developing a DR strategy is to begin by understanding the range of recovery needs. Factors such as the sheer size of the environment and the numbers and types of applications to support will begin to point toward the kinds of solutions that are likely to fit. Business needs should drive recovery objectives, and a set of desired DR service levels should be defined. Then, a logical architecture should be designed to meet disaster recovery and scalability objectives, and appropriate technology options should be evaluated and selected.
For many, the process of DR planning and design may still bring to mind unpleasant memories of past forays into this area. However, the disaster recovery world, particularly with possibilities brought forth through virtualization, is being redefined for the better.
About this author: Jim Damoulakis is CTO at GlassHouse Technologies, a leading independent provider of storage and infrastructure services. He can be reached at firstname.lastname@example.org.