There's increased interest in high-availability (HA) technologies due to the havoc and lost productivity server outages can cause in both physical and virtual server environments. Virtual server technology can alleviate outage pain by making it easier to restart stalled virtual servers and move resource-strained virtual machines (VMs) to different physical hosts before they crash.
In this tutorial, learn how to choose the right HA solution for your virtual environment, which solutions work best in VMware, Hyper-V and Citrix environments, and the pros and cons of each.
HA SOLUTIONS TUTORIAL: TABLE OF CONTENTS
>> An introduction to HA solutions
>> VMware HA and vMotion
>> Microsoft Windows Server 2008 Failover Cluster
>> Citrix XenServer HA
>> Third-party HA solutions
HA solutions typically employ host-failure detection technologies, resource monitoring, server clustering, shared storage, and automatic server restarts to keep VMs and applications running when hardware or software resources are deficient or fail.
Host-failure detection checks for a physical host or virtual server "heartbeat," which indicates whether or not a host or VM has stopped running. Resource monitoring keeps an eye on used and available physical host resources. Server clustering provides mirrored virtual servers for failover operations.
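As a rough illustration of heartbeat-based failure detection, the sketch below records the last time each host or VM reported in and flags anything that has been silent for too long. The class name, the 15-second timeout and the host and VM names are all assumptions made for this example; they are not taken from any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each host or VM (hypothetical helper)."""
    timeout_s: float  # assumed: silence longer than this counts as a failure
    last_seen: dict = field(default_factory=dict)

    def record(self, name, now):
        # Called whenever a heartbeat arrives from `name`.
        self.last_seen[name] = now

    def failed(self, now):
        # Anything silent for longer than the timeout is presumed down.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=15.0)
monitor.record("host-a", now=0.0)
monitor.record("vm-web01", now=0.0)
monitor.record("host-a", now=12.0)   # host-a keeps reporting
print(monitor.failed(now=20.0))      # vm-web01 has been silent for 20 seconds
```

Timestamps are passed in explicitly rather than read from the system clock so the logic is easy to follow and to test.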
Shared storage allows restarted servers to access data from failed VMs and is required by most HA solutions. It is the foundation of a virtual-server HA system. According to Jeff Boles, a senior analyst and director of validation services for Taneja Group, shared storage allows the movement of VMs across physical hosts. "Since the storage connection thinks that [it's] still attached in the same way, despite a different piece of physical hardware, you have more flexibility there."
Automatic server restart technologies allow HA solutions to reboot failed VMs on the same or different physical hosts.
All three major virtual server environments -- VMware Inc.'s vSphere, Microsoft Corp.'s Hyper-V Server 2008 and Citrix Systems Inc.'s XenServer -- have at least some native tools to deal with both planned and unplanned outages, and there are a number of third-party products that work in conjunction with the virtual platform hypervisors.
VMware's vSphere hypervisor platform utilizes VMware Distributed Resource Scheduler (DRS), VMware HA and vMotion for virtual environment high availability. DRS monitors computing resources including CPU, memory and I/O usage. VMware HA creates VM clusters to monitor VM heartbeats and restart failed VMs. Agents monitor each VM in the cluster and report to a primary host. The primary host determines when to restart failed VMs and keeps track of restart attempts. If the primary host fails, a secondary host is elevated to replace it. VMware HA can be purchased as a component of all vSphere editions.
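The primary-host behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class, the restart cap of two attempts and the host names are invented for illustration, and the promotion rule (next host in the list) is a simplification, not VMware's actual election logic.

```python
class HACluster:
    """Toy model of a primary host that tracks VM restart attempts."""

    def __init__(self, hosts, max_restarts=3):
        self.hosts = list(hosts)          # the first entry acts as the primary
        self.max_restarts = max_restarts  # assumed cap, not from any product
        self.restart_attempts = {}

    @property
    def primary(self):
        return self.hosts[0]

    def host_failed(self, host):
        # Dropping the failed host promotes the next one in line
        # whenever the failed host was the primary.
        self.hosts.remove(host)
        if not self.hosts:
            raise RuntimeError("no hosts left in the cluster")

    def vm_failed(self, vm):
        # Returns True if a restart is attempted, False once the cap is hit.
        attempts = self.restart_attempts.get(vm, 0)
        if attempts >= self.max_restarts:
            return False
        self.restart_attempts[vm] = attempts + 1
        return True


cluster = HACluster(["esx1", "esx2", "esx3"], max_restarts=2)
results = [cluster.vm_failed("vm-db") for _ in range(3)]
print(results)          # two restarts are attempted, the third is refused
cluster.host_failed("esx1")
print(cluster.primary)  # esx2 takes over as primary
```

For simplicity the restart counters survive the change of primary; a real system would have to replicate that state between hosts.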
Live migration tools exist in each hypervisor technology. VMware has vMotion, Microsoft Hyper-V has its Live Migration technology, and XenServer has XenMotion. Live migration technology allows you to avoid crashes due to lagging physical host resources. "[It gives you] the ability to move a virtual machine and its application to any system you want, when you want it," said Marc Staimer, founder and senior analyst at Dragon Slayer Consulting.
If a host's available computing resources drop below configurable thresholds, DRS can use vMotion to move running VMs to other physical hosts to avoid crashes. DRS is available in the vSphere Enterprise and Enterprise Plus editions, and both DRS and vMotion require VMware's vCenter management console. vMotion is a separate purchasable component of all vSphere editions.
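To make the threshold idea concrete, here is a minimal sketch of a migration planner. It assumes a single resource (CPU in MHz), a 20 percent free-capacity threshold and a simple "move the busiest VM to the host with the most headroom" rule; all of these, along with the function and host names, are assumptions for illustration and do not describe how DRS actually makes its decisions.

```python
def free_pct(host):
    """Percentage of a host's CPU capacity that is currently unused."""
    used = sum(host["vms"].values())
    return 100.0 * (host["capacity"] - used) / host["capacity"]


def plan_migrations(hosts, min_free_pct=20.0):
    """Move VMs off hosts whose free capacity is below the threshold.

    `hosts` maps host name -> {"capacity": MHz, "vms": {vm name: MHz used}}.
    The dictionary is updated in place and the list of moves is returned.
    """
    moves = []
    for name, host in hosts.items():
        while host["vms"] and free_pct(host) < min_free_pct:
            vm = max(host["vms"], key=host["vms"].get)  # busiest VM first
            load = host["vms"][vm]
            # Only consider targets that stay above the threshold afterwards.
            targets = [
                other_name
                for other_name, other in hosts.items()
                if other_name != name
                and free_pct(other) - 100.0 * load / other["capacity"] >= min_free_pct
            ]
            if not targets:
                break  # nowhere safe to move this VM
            target = max(targets, key=lambda n: free_pct(hosts[n]))
            hosts[target]["vms"][vm] = host["vms"].pop(vm)
            moves.append((vm, name, target))
    return moves


hosts = {
    "esx1": {"capacity": 10000, "vms": {"db": 5000, "web": 4000}},
    "esx2": {"capacity": 10000, "vms": {"mail": 2000}},
}
moves = plan_migrations(hosts)
print(moves)  # esx1 starts with only 10 percent free, so "db" moves to esx2
```

The planner refuses a move that would push the target below the same threshold, which keeps it from simply shifting the problem to another host.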
Microsoft uses similar clustering and VM-migration technologies to provide HA for its Hyper-V virtualization technology. Windows Failover Cluster uses resource monitors to check the physical host or VM heartbeat. In the case of a host or VM failure, the remaining cluster hosts will restart the failed host or VM. Administrators choose which physical hosts are included in a Failover Cluster. From the Failover Cluster Manager console, administrators can also select which services, applications, and VMs are to be managed by the cluster.
Hyper-V Live Migration allows you to move running VMs from one node in a cluster to another node without disruption. Live Migration was added in Windows Server 2008 R2 Hyper-V. If you don't have the R2 version, you can still use Hyper-V Quick Migration, which saves, moves and restores VMs and results in some downtime. A Live Migration VM move completes in less time than the TCP timeout for the VM being moved, so users don't see any disruption.
Administrators can initiate a Live Migration VM move using the Failover Cluster management console or from the Microsoft System Center Virtual Machine Manager (SCVMM) console. Anil Desai, an independent IT consultant and writer, said you can also create your own scripts if you are on a budget. "If you are an administrator that has Hyper-V but you don't want to pay for SCVMM, you can do these functions using either GUI tool commands or commands from the Windows PowerShell command line utility to initiate a failover or failback," he said.
Citrix introduced high-availability features in XenServer 5.0. The hypervisor creates Resource Pools, which are similar to VMware and Microsoft Hyper-V clusters. The VMs in the pool are monitored for heartbeats, and when a VM failure is detected, the remaining pool nodes will reboot the VM on another node. XenServer HA is included in the XenServer Advanced, Enterprise, and Platinum editions.
XenMotion is Citrix's live migration tool, and its functionality is very similar to both VMware's and Microsoft's live migration tools. However, XenMotion only supports manual migrations out of the box. All editions of XenServer support XenMotion. The Essentials for XenServer, Enterprise and Platinum editions also include Workload Balancing for distributing VM workloads across multiple hosts to avoid overloading a single host.
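The general idea of spreading VM workloads across hosts can be sketched with a simple greedy placement: take the heaviest VM first and always put it on the host that currently carries the least load. This is a generic technique chosen for illustration, not Citrix's Workload Balancing algorithm, and the VM names, load figures and host names are made up.

```python
import heapq


def balance(vm_loads, host_names):
    """Greedy placement: heaviest VM first, always onto the least-loaded host."""
    heap = [(0, name) for name in sorted(host_names)]  # (current load, host)
    heapq.heapify(heap)
    placement = {name: [] for name in host_names}
    # Sort by descending load, then by name so that ties are deterministic.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, host = heapq.heappop(heap)
        placement[host].append(vm)
        heapq.heappush(heap, (total + load, host))
    return placement


placement = balance({"a": 8, "b": 6, "c": 4, "d": 2}, ["xs1", "xs2"])
print(placement)  # each host ends up carrying a total load of 10
```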
A few applications have their own high-availability features, and there are a number of vendors that offer products that build upon the hypervisors' native abilities.
Application-specific HA systems have both good and bad points, said Desai. "The benefit there is that you often get efficient methods of high availability and failover and data replication," Desai explained, "but the drawback is that every one of your applications or VMs has to be treated differently based on the type of application you are supporting."
Oracle Corp.'s Maximum Availability Architecture (MAA) is built into Oracle Database, Exadata Database Machine, Oracle Fusion Middleware, Oracle Applications, Grid Control and Oracle Partners. Its Oracle Real Application Clusters (RAC) automatically transfers and rebalances workloads from failed servers to surviving cluster servers. Oracle Data Guard creates, synchronizes, and monitors standby databases, which can be located at remote sites for disaster recovery purposes. The standby databases can also run queries and backups to relieve demand on the primary database.
Microsoft's SQL Server 2008 also has HA features outside of the Hyper-V failover clustering functions including database mirroring, log shipping, and peer-to-peer replication.
Stratus Technologies Inc.'s quad-core ftServer system for enterprise data centers provides redundant processors, memory, I/O interfaces, power supplies and fans to eliminate any single point of hardware failure. The ftServers can be remotely serviced and have problem reporting. They can even order their own replacement parts. Boles appreciates Stratus' hardware approach. "[It's] a simplified approach versus all these software layers," he explained.
The company's Avance HA software for small- and medium-sized businesses (SMBs) builds a two-node HA cluster with non-identical x86 servers on top of which you load the vSphere Windows or Linux guest OSes. Avance features automated resource sharing between nodes and no requirement for a storage area network (SAN).
Double-Take Software Inc., a Vision Solutions Inc. company, offers Double-Take Availability for vSphere and Hyper-V environments. Double-Take Availability auto-provisions and monitors vSphere and Hyper-V environments, captures byte-level server changes and replicates VMs between local or remote hosts in real time, and enables failover to any virtual server configuration. The product requires Windows Server 2008 Standard, Enterprise, or Datacenter edition and Microsoft .NET 3.5 SP1 for Hyper-V environments, and VMware vCenter 2.0 or later and ESX Server 3.0 or later for vSphere.
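Byte-level replication in general means shipping only the bytes that changed rather than whole files or disks. The sketch below shows that idea by comparing two equal-length buffers and replaying the differences on a replica. It is a conceptual illustration only: the function names are hypothetical, and a real replication product would typically intercept writes as they happen rather than compare snapshots after the fact.

```python
def byte_changes(old, new):
    """Return (offset, data) runs where `new` differs from `old`."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes buffers of equal length")
    changes = []
    start = None  # offset where the current run of differences began
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i
        elif a == b and start is not None:
            changes.append((start, new[start:i]))
            start = None
    if start is not None:
        changes.append((start, new[start:]))
    return changes


def apply_changes(replica, changes):
    """Replay captured changes onto a replica (a bytearray), in place."""
    for offset, data in changes:
        replica[offset:offset + len(data)] = data


source_before = b"hello world"
source_after = b"hellO wOrld"
changes = byte_changes(source_before, source_after)
replica = bytearray(source_before)
apply_changes(replica, changes)
print(changes)                          # only the two changed bytes are sent
print(bytes(replica) == source_after)   # the replica now matches the source
```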
Marathon Technologies Corp.'s everRun MX software for Windows environments also combines two x86 servers into a single OS with redundant hardware for automated HA protection. The product protects applications, VMs, off-the-shelf storage, and network resources, and also does not require a SAN. Marathon's everRun VM for Citrix XenServer provides application and VM protection with automated mirroring and an always-available application instance physically separated from the original VM host.
Other third-party options include BakBone Software Inc.'s NetVault Fast Recover and products from NeverFail Ltd.
Whether you stick with the hypervisors' native tools or employ third-party HA solutions, there are a number of options to help you keep your virtual server environment up and running despite scheduled downtime or unplanned outages.