One of the features that the current version of Hyper-V is missing is the ability to create replicas of virtual machines. Microsoft plans to introduce both synchronous and asynchronous virtual machine replication capabilities in Hyper-V 3.0, but that release is still likely to be about a year away. In the meantime, organizations that want to replicate virtual machines will have to use third-party products for Hyper-V high availability.
There are two main approaches to virtual machine replication in Hyper-V environments: hardware-based replication and software-based replication. Whichever you choose, you will also have to make some decisions about the replication architecture. This article discusses some of the replication options that are available for Hyper-V.
Although replication between two on-premises hosts is useful, organizations often replicate virtual machines between a primary and a secondary data center. That way, if the primary data center is wiped out by a disaster, the secondary data center is available to take over operations.
Synchronous vs. asynchronous replication
One of the most important decisions around Hyper-V high availability that you will have to make pertains to whether you want to perform synchronous or asynchronous replication.
Synchronous replication can be slow, but it has the advantage of guaranteeing that any data written to the primary data center has already been replicated to the secondary data center.
But that reliability requires considerable network bandwidth, and network latency can limit the host server’s performance in both the primary and the secondary data centers.
Because latency is of such a concern, synchronous replication is generally considered to be suitable only when a high-bandwidth connection is available between two data centers that are separated by about 100 miles or less.
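A rough estimate shows why distance matters so much. The sketch below assumes that signals travel through fiber at about 200,000 km/s (roughly two-thirds of the speed of light in a vacuum) and ignores switching, routing and storage delays, so real-world figures will be higher. The function names are illustrative only.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumption: about 2/3 the speed of light in a vacuum
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds, over fiber."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


def max_serial_writes_per_second(distance_miles: float) -> float:
    """Upper bound on writes per second if each write waits for the previous one."""
    return 1000 / round_trip_ms(distance_miles)


for miles in (10, 100, 1000):
    print(f"{miles:>5} miles: at least {round_trip_ms(miles):.2f} ms per round trip")
```

Under these assumptions, 100 miles adds at least about 1.6 ms to every write that must be confirmed remotely, and the penalty grows linearly with distance.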
It is easy to assume that replicating virtual machines means copying data from the primary data center to the secondary data center. However, synchronous replication works a little differently than you might expect. When a write operation occurs, the data is initially written to a cache. The storage cache then writes the data to the storage array at the secondary data center.
Only when the write operation has been confirmed at the secondary data center is the data written to the storage array at the primary data center. The write operation cannot be considered to be complete until the data has been written to the primary data center’s storage array.
The fact that the server must first write the data to a remote storage array means that performance within the primary data center is directly tied to the speed at which data can be replicated to the secondary data center.
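The write order described above can be sketched in a few lines of Python. This is a toy model, not how any particular storage product is implemented: the StorageArray class, the synchronous_write function and the latency figures are all hypothetical, and latency is simply added up rather than measured.

```python
class StorageArray:
    """Toy stand-in for a storage array (hypothetical, for illustration)."""

    def __init__(self, name, latency_ms, available=True):
        self.name = name
        self.latency_ms = latency_ms  # simulated time to complete one write
        self.available = available
        self.blocks = {}

    def write(self, address, data):
        """Store the data and return True, or return False if unreachable."""
        if not self.available:
            return False
        self.blocks[address] = data
        return True


def synchronous_write(address, data, cache, primary, secondary):
    """Apply one write in the order described above; return simulated latency in ms."""
    cache[address] = data  # 1. the data lands in the cache
    if not secondary.write(address, data):  # 2. the remote array is written first
        raise IOError(f"{secondary.name} did not confirm the write")
    if not primary.write(address, data):  # 3. then the local array is written
        raise IOError(f"{primary.name} did not confirm the write")
    del cache[address]  # 4. only now is the write complete
    return secondary.latency_ms + primary.latency_ms


primary = StorageArray("primary", latency_ms=0.5)
secondary = StorageArray("secondary", latency_ms=5.0)
elapsed = synchronous_write(0, b"payload", {}, primary, secondary)
print(f"write acknowledged after about {elapsed} ms")
```

Because the remote write sits on the critical path, the slower of the two arrays (here the hypothetical secondary) dominates the time each write takes.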
Asynchronous replication offers much better performance than synchronous replication does, but it does have the potential for a small amount of data loss.
The process works by sending data from the primary data center to the secondary data center in batches. These batches (which are sometimes referred to as deltas) include all of the write operations that have been performed within a specific length of time. In other words, the Hyper-V host at the primary data center is free to write as much data to the storage array as required, and the write operations are in no way affected by the speed of the replication process. At periodic intervals, copies of all of the new write operations are replicated to the secondary data center and applied to the replica storage.
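The batching behavior described above might look something like the following minimal sketch. The AsyncReplicator class and its method names are hypothetical, and dictionaries stand in for the primary and replica storage.

```python
class AsyncReplicator:
    """Toy model of batch (delta) replication; names are illustrative only."""

    def __init__(self):
        self.primary = {}   # storage at the primary data center
        self.replica = {}   # storage at the secondary data center
        self.pending = []   # writes made since the last batch (the delta)

    def write(self, address, data):
        """Complete the write locally without waiting for the replica."""
        self.primary[address] = data
        self.pending.append((address, data))

    def replicate_batch(self):
        """Ship the accumulated delta to the replica; return the number of writes sent."""
        batch, self.pending = self.pending, []
        for address, data in batch:
            self.replica[address] = data
        return len(batch)


replicator = AsyncReplicator()
replicator.write(0, b"first")
replicator.write(1, b"second")
print("replica before batch:", replicator.replica)
replicator.replicate_batch()   # in practice this would run on a timer
print("replica after batch: ", replicator.replica)
```

Between calls to replicate_batch, the replica lags behind the primary by whatever is in the pending list.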
The problem with this type of replication is that data is not replicated in real time. If a failure occurs in the primary data center, there is a chance that the replica in the secondary data center will not be completely up to date. The amount of data that could be lost depends on the replication frequency: the longer the interval between batches, the more unreplicated writes can accumulate.
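The relationship between replication frequency and potential data loss is simple arithmetic. The sketch below assumes a steady write rate, which real workloads rarely have, and the figures used are hypothetical examples rather than measurements.

```python
def worst_case_loss_mb(write_rate_mb_per_s: float, interval_s: float) -> float:
    """Data written since the last batch if the failure hits just before the next one."""
    return write_rate_mb_per_s * interval_s


def max_interval_s(loss_budget_mb: float, write_rate_mb_per_s: float) -> float:
    """Longest replication interval that keeps worst-case loss within the budget."""
    return loss_budget_mb / write_rate_mb_per_s


# Hypothetical workload: 5 MB/s of writes, replicated every 15 minutes.
print(worst_case_loss_mb(5, 15 * 60), "MB at risk in the worst case")
# Hypothetical target: lose no more than 300 MB.
print(max_interval_s(300, 5), "seconds between batches at most")
```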
Hardware vs. software replication
While it is important to decide whether you want to use synchronous or asynchronous replication for Hyper-V high availability, it is equally important to decide whether to use a hardware- or a software-based replication solution.
Hardware-based replication typically occurs at the storage level. The nice thing about this type of replication is that you might be able to simply enable a replication feature on your existing hardware without having to purchase anything. Storage-level replication almost always optimizes the replication process by using data deduplication and/or compression.
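As a rough illustration of how deduplication and compression reduce replication traffic, the sketch below hashes each block, sends a short reference for any block whose content has already been transferred, and compresses the rest. It uses Python's standard hashlib and zlib modules; the prepare_for_transfer function and the payload format are hypothetical and do not represent any vendor's implementation.

```python
import hashlib
import zlib


def prepare_for_transfer(blocks, already_sent):
    """Return a payload list of ("data", compressed_bytes) or ("ref", digest) entries.

    already_sent is a set of digests for blocks the replica already holds;
    it is updated in place as new blocks are queued.
    """
    payload = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in already_sent:
            payload.append(("ref", digest))  # duplicate: send only the digest
        else:
            already_sent.add(digest)
            payload.append(("data", zlib.compress(block)))  # new: send compressed
    return payload


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
payload = prepare_for_transfer(blocks, set())
raw_size = sum(len(block) for block in blocks)
sent_size = sum(len(body) for _, body in payload)
print(f"{raw_size} bytes of writes reduced to {sent_size} bytes on the wire")
```

The test data here is highly repetitive, so the savings are far larger than a real workload would see.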
The main disadvantage of hardware-based replication is vendor lock-in: storage replication may be supported only if all of the hardware is from the same vendor.
Another issue is that a hardware-level replication solution is not going to be Hyper-V aware. While this may not necessarily cause a problem, it does mean that the replication process will not be optimized for Hyper-V.
Software-based replication has the distinct advantage of allowing for greater flexibility. Rather than an organization being locked into using whatever replication features their storage vendor happens to provide, software-based solutions give organizations the freedom to choose the solution that best meets their needs.
Like hardware solutions, many of the software-based replication products offer data deduplication and compression features. Some of the software-based replication products even allow you to choose between byte-level and block-level replication. As a general rule, byte-level replication is more efficient because it has less overhead. However, block-level replication (which is sometimes based on VSS) allows for rollback and point-in-time recovery capabilities.
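One way to picture the overhead difference is to count how much data each approach would send for the same set of small writes. The sketch below assumes a 4 KB block size, represents each write as a hypothetical (offset, length) pair, and ignores per-write metadata and overlapping byte ranges, so it illustrates the idea rather than modeling any product.

```python
BLOCK_SIZE = 4096  # assumption: 4 KB blocks


def byte_level_bytes(writes):
    """Bytes sent when only the changed bytes are replicated."""
    return sum(length for _, length in writes)


def block_level_bytes(writes, block_size=BLOCK_SIZE):
    """Bytes sent when every block touched by a write is replicated in full."""
    touched = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Three small writes, one of which straddles a block boundary.
writes = [(10, 100), (5000, 20), (4090, 10)]
print("byte-level: ", byte_level_bytes(writes), "bytes")
print("block-level:", block_level_bytes(writes), "bytes")
```

For workloads made up of many small, scattered writes, the gap between the two figures can be large; for large sequential writes it mostly disappears.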
These types of capabilities could prove to be especially useful in Hyper-V environments. While it is true that Hyper-V offers a snapshot feature that allows point-in-time rollback, the use of snapshots is generally discouraged because they impede system performance.
Hyper-V snapshots work by redirecting write operations to a special type of virtual hard disk file called a differencing disk. When a read operation occurs, Hyper-V first looks at the differencing disk to determine whether it contains the requested data. If the data is not found, Hyper-V attempts to read the data from the original virtual hard disk file. As such, snapshots have a direct impact on virtual machine read performance. Performance diminishes further as more snapshots are created, because each one adds another differencing disk to check. Block-level replication could conceivably provide a way for organizations to achieve virtual machine rollback capabilities without incurring the performance penalties of creating a snapshot.
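The snapshot read path described above can be modeled as a lookup through a chain of disks. This is a simplified sketch, not Hyper-V's actual implementation: dictionaries stand in for virtual hard disk files, and the read_block function is hypothetical.

```python
def read_block(address, disk_chain):
    """Read a block from a snapshot chain.

    disk_chain is ordered newest differencing disk first, original
    virtual hard disk last. Returns (data, number_of_disks_checked).
    """
    for disks_checked, disk in enumerate(disk_chain, start=1):
        if address in disk:
            return disk[address], disks_checked
    raise KeyError(f"block {address} not found in any disk")


base_vhd = {0: b"base-0", 1: b"base-1"}   # original virtual hard disk
snapshot_1 = {1: b"changed-1"}            # writes made after the first snapshot
snapshot_2 = {}                           # nothing written since the second snapshot

chain = [snapshot_2, snapshot_1, base_vhd]
for address in (0, 1):
    data, disks_checked = read_block(address, chain)
    print(f"block {address}: {data!r} found after checking {disks_checked} disk(s)")
```

Every additional snapshot adds another entry to the chain, so reads of unchanged data have to pass through more disks before reaching the original file.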
Keep replication optimized for Hyper-V
It is usually best to use a software replication solution, especially if it has been optimized for Hyper-V. Software-based replication products generally offer more features and greater flexibility than hardware-based solutions.
About the author: Brien M. Posey, MCSE, has previously received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the Department of Information Management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.