Uninterruptible power supply (UPS) devices have long been used to protect servers against power failures. Normally, a UPS will trigger an automatic shutdown of a server once its battery begins to run low. In a virtualized environment, however, administrators must plan carefully for UPS-initiated shutdowns, because shutting down a single virtualization host can affect multiple workloads.
Each hypervisor vendor handles this differently, but it is usually fairly easy to make sure that virtual machines (VMs) are shut down gracefully in the event of a power failure.
Windows Server, for example, automatically recognizes the presence of UPS devices and can monitor the amount of battery power remaining. Microsoft's Hyper-V also allows administrators to configure an automatic stop action for each VM. When the host server begins to shut down, Hyper-V can turn off the VMs (a non-graceful shutdown), place them into a saved state, or initiate a shutdown of the guest operating system.
Given these capabilities, it would be simple to configure a Hyper-V server to begin shutting down VMs when the UPS's remaining battery charge drops to 50%. However, that is not the whole story; several other considerations must be taken into account.
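A fixed percentage only means something relative to how long the shutdown actually takes. The Python sketch below converts a threshold into minutes of remaining runtime and works backward from an estimated shutdown time to the smallest safe threshold. It assumes runtime scales linearly with battery charge, which real batteries only approximate, and the runtime, shutdown, and margin figures are hypothetical.

```python
import math


def runtime_at_threshold(full_runtime_min: float, threshold_pct: float) -> float:
    """Minutes of battery left when the charge falls to threshold_pct.

    Assumes runtime is proportional to charge, which real batteries
    only approximate (discharge is not linear under load)."""
    return full_runtime_min * threshold_pct / 100.0


def minimum_safe_threshold(full_runtime_min: float, shutdown_min: float,
                           margin_min: float = 2.0) -> int:
    """Smallest whole-percent threshold that leaves time to finish the
    shutdown plus a safety margin."""
    needed_min = shutdown_min + margin_min
    pct = math.ceil(100.0 * needed_min / full_runtime_min)
    if pct > 100:
        raise ValueError("the UPS cannot cover the shutdown even at full charge")
    return pct


if __name__ == "__main__":
    # Hypothetical UPS with 20 minutes of runtime at full charge under this load.
    print(runtime_at_threshold(20, 50))    # 10.0
    print(minimum_safe_threshold(20, 12))  # 70
```

In this example, a 50% threshold leaves roughly 10 minutes, so a shutdown that takes 12 minutes (plus a 2-minute margin) would need the threshold raised to 70%.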
How long does it take VMs to shut down?
One of the first questions to consider is how long it takes to shut down the VMs. This can be surprisingly difficult to answer, for two reasons. First, VMs are portable. A virtual machine can live-migrate from one virtualization host to another, so unless you have an elaborate set of affinity and anti-affinity rules in place, you never know which host will be running which VMs.
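Because VM placement keeps changing, it can help to plan around a worst case rather than today's placement. This Python sketch assumes each host shuts its VMs down one after another and that you have measured per-VM shutdown times; the VM names, host names, times, and the `max_vms_per_host` capacity limit are all hypothetical.

```python
from typing import Dict, List


def per_host_minutes(placement: Dict[str, List[str]],
                     vm_minutes: Dict[str, float]) -> Dict[str, float]:
    """Shutdown time for each host under the current placement, assuming
    a host shuts its VMs down one after another."""
    return {host: sum(vm_minutes[vm] for vm in vms)
            for host, vms in placement.items()}


def worst_case_minutes(vm_minutes: Dict[str, float], max_vms_per_host: int) -> float:
    """Upper bound if live migration happens to pack the slowest VMs onto
    a single host (limited by how many VMs one host can run)."""
    slowest = sorted(vm_minutes.values(), reverse=True)[:max_vms_per_host]
    return sum(slowest)


if __name__ == "__main__":
    # Hypothetical per-VM shutdown times, in minutes.
    vm_minutes = {"sql01": 4.0, "sp01": 3.0, "web01": 1.0, "web02": 1.0}
    placement = {"hv1": ["sql01", "web01"], "hv2": ["sp01", "web02"]}
    print(per_host_minutes(placement, vm_minutes))  # {'hv1': 5.0, 'hv2': 4.0}
    print(worst_case_minutes(vm_minutes, 3))        # 8.0
```

Sizing the shutdown window from the worst case (8 minutes here) rather than the current busiest host (5 minutes) keeps the plan valid no matter where live migration has moved the VMs.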
Second, shutting down many VMs at once causes large spikes in storage demand. The resulting contention for storage I/O can make the shutdown process take longer than it would if the VMs were shut down in small batches.
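One way to soften that I/O spike is to shut VMs down in small batches, waiting for each batch to finish before starting the next. In this Python sketch, `shutdown_vm` is a hypothetical caller-supplied function that blocks until a guest has finished shutting down, and the batch size is something you would tune by testing against your own storage.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def make_batches(vms: Sequence[str], size: int) -> List[List[str]]:
    """Split the VM list into consecutive batches of at most `size` VMs."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(vms[i:i + size]) for i in range(0, len(vms), size)]


def shutdown_in_batches(vms: Sequence[str], size: int,
                        shutdown_vm: Callable[[str], None]) -> List[List[str]]:
    """Shut down each batch concurrently, then move on to the next batch,
    so only `size` guests generate shutdown I/O at the same time."""
    completed = []
    for batch in make_batches(vms, size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # list() waits for every call and re-raises any exception.
            list(pool.map(shutdown_vm, batch))
        completed.append(batch)
    return completed


if __name__ == "__main__":
    vms = ["web01", "web02", "app01", "app02", "file01"]
    print(shutdown_in_batches(vms, 2, lambda vm: print("shutting down", vm)))
```

Smaller batches reduce peak contention but lengthen the overall shutdown, so the batch size has to balance against the battery runtime available.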
Do other infrastructure components need to remain functional?
A UPS can communicate directly with the server it powers, typically over a USB or serial connection, without any additional components. However, most hypervisor deployments are clustered. For the virtualization infrastructure to remain functional throughout the shutdown process, the host servers must maintain connectivity to storage and to one another. You must therefore make sure that network switches and storage arrays have enough battery power to remain online until the shutdown process completes.
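A simple pre-flight check is to compare the runtime of the UPS feeding each switch and storage array against the host shutdown window. The Python sketch below does that; the component names, runtimes, and the 5-minute default margin are hypothetical values for illustration.

```python
from typing import Dict, List


def components_at_risk(runtime_min: Dict[str, float],
                       shutdown_window_min: float,
                       margin_min: float = 5.0) -> List[str]:
    """Names of components whose UPS runtime would not outlast the host
    shutdown window plus a safety margin."""
    needed_min = shutdown_window_min + margin_min
    return sorted(name for name, minutes in runtime_min.items()
                  if minutes < needed_min)


if __name__ == "__main__":
    # Hypothetical runtimes for the UPS units feeding each component.
    runtimes = {"core-switch": 30.0, "storage-array": 12.0, "hv-hosts": 20.0}
    print(components_at_risk(runtimes, shutdown_window_min=10.0))  # ['storage-array']
```

Here the storage array would go dark before the hosts finish shutting down, which is exactly the situation that leaves VMs in an inconsistent state.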
Do VMs need to be started or shut down in a specific order?
Sometimes VMs need to be shut down and started up in a specific order. This is especially true for multi-tier applications with elaborate dependencies. For example, Microsoft SharePoint Server depends on SQL Server, so you would want to keep SQL Server online until SharePoint Server has finished shutting down. Conversely, when it is time to bring everything back online, SQL Server must start before SharePoint Server.
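Ordering of this kind is a topological sort of the dependency graph: startup follows the sorted order, and shutdown follows its reverse. The Python sketch below uses the standard library's `graphlib` module (Python 3.9+); the VM names are hypothetical, and a real environment would list more tiers.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Each VM maps to the VMs it depends on (hypothetical names).
DEPENDS_ON: Dict[str, Set[str]] = {
    "sharepoint01": {"sql01"},
    "sql01": set(),
}


def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Dependencies come first, so SQL Server starts before SharePoint.
    Raises graphlib.CycleError if the dependencies form a loop."""
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps: Dict[str, Set[str]]) -> List[str]:
    """The reverse of the startup order: dependents stop first."""
    return list(reversed(startup_order(deps)))


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))   # ['sql01', 'sharepoint01']
    print(shutdown_order(DEPENDS_ON))  # ['sharepoint01', 'sql01']
```

Deriving both orders from one dependency map means the shutdown and startup sequences cannot drift out of sync as applications are added.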
Because such dependencies exist, it is sometimes necessary to launch a script in response to dwindling battery power rather than let the operating system shut down VMs indiscriminately.
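At its core, such a script watches the battery level and hands control to an ordered shutdown plan once a threshold is crossed. In this Python sketch, `read_percent` and `run_shutdown_plan` are hypothetical placeholders for however your UPS software exposes the charge level and for your own dependency-aware shutdown logic.

```python
import time
from typing import Callable, Optional


def watch_battery(read_percent: Callable[[], float],
                  threshold_pct: float,
                  run_shutdown_plan: Callable[[], None],
                  poll_seconds: float = 10.0,
                  max_polls: Optional[int] = None) -> bool:
    """Poll the battery charge and run the ordered shutdown plan once the
    charge falls to threshold_pct. Returns True if the plan ran, or False
    if max_polls readings passed without reaching the threshold."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if read_percent() <= threshold_pct:
            run_shutdown_plan()
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Simulated readings stand in for a real UPS query.
    readings = iter([90.0, 70.0, 48.0])
    ran = watch_battery(lambda: next(readings), 50.0,
                        lambda: print("running ordered shutdown"),
                        poll_seconds=0)
    print(ran)  # True
```

Keeping the trigger separate from the plan makes it easy to test the ordering logic without waiting for a real power failure.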
What about failover clusters?
Virtualization hosts are almost always clustered. In a failover clustering environment, a minimum number of hosts must remain online for the cluster to retain quorum, which it needs in order to remain functional and provide failover capabilities.
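Under a simple static majority model, quorum means holding more than half of all votes, where each node and an optional witness carry one vote. The Python sketch below estimates how many nodes can go offline before quorum is lost. Windows Server failover clusters also support dynamic quorum, which can adjust votes as nodes leave gracefully, so treat this as a conservative estimate rather than an exact prediction.

```python
def votes_needed(nodes: int, witness: bool = False) -> int:
    """Votes required for quorum: a strict majority of all votes."""
    total = nodes + (1 if witness else 0)
    return total // 2 + 1


def nodes_that_can_go_offline(nodes: int, witness: bool = False) -> int:
    """How many nodes can drop out while quorum holds, assuming the
    witness (if any) stays reachable and votes never change."""
    total = nodes + (1 if witness else 0)
    return total - votes_needed(nodes, witness)


if __name__ == "__main__":
    print(nodes_that_can_go_offline(4))                # 1
    print(nodes_that_can_go_offline(4, witness=True))  # 2
    print(nodes_that_can_go_offline(5))                # 2
```

The four-node case shows why a witness matters: it turns an even vote count into an odd one, letting one more node shut down before the cluster stops providing failover.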
In the event of a power failure, some cluster nodes will inevitably shut down before others. Assuming that the VMs running on those nodes have shut down gracefully, this should not cause a major I/O storm. Those VMs will fail over, but because they are no longer running, the failover amounts to little more than a change of owner.
Eventually, enough nodes will shut down to cause the cluster to lose quorum. When this happens, the Failover Clustering service will cease to provide failover capabilities, but the hypervisor will continue to function on cluster nodes that are still online and the VMs running on those nodes can continue to shut down gracefully.
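To see when that point arrives, you can walk through the order in which nodes are expected to run out of battery. This Python sketch reports which node's shutdown pushes the cluster below quorum; after that, remaining VMs must finish shutting down on their current hosts without failover. It uses a simple static-majority vote model, ignores dynamic quorum, and the node names and sequence are hypothetical.

```python
from typing import List, Optional


def node_that_breaks_quorum(shutdown_sequence: List[str], nodes: int,
                            witness: bool = False) -> Optional[str]:
    """Return the node whose shutdown drops the cluster below a static
    majority of votes, or None if quorum survives the whole sequence."""
    total = nodes + (1 if witness else 0)
    needed = total // 2 + 1
    remaining = total
    for node in shutdown_sequence:
        remaining -= 1
        if remaining < needed:
            return node
    return None


if __name__ == "__main__":
    order = ["hv1", "hv2", "hv3", "hv4"]
    print(node_that_breaks_quorum(order, nodes=5))  # hv3
```

Knowing which node crosses the line tells you which hosts need their VMs fully shut down before that moment, since failover will not be available afterward.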
Larger organizations may have clusters that span multiple data centers. In these types of environments, it is possible to respond to a power failure by running a script that live-migrates VMs to remote cluster nodes before the battery power runs out.
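Before relying on such a script, it is worth checking whether the inter-site link can move the VMs in time. The Python sketch below computes a lower bound from total VM memory and link speed, then applies a safety factor. Real live migrations take longer than a single memory copy because pages changed during the copy are re-sent, so the default factor of 2, the memory sizes, and the link speed are all hypothetical values to adjust from your own testing.

```python
from typing import Dict


def evacuation_seconds(vm_memory_gb: Dict[str, float], link_gbps: float) -> float:
    """Lower bound on the time to copy every VM's memory to the remote site
    once. Running migrations in parallel does not help here, because they
    share the same link bandwidth."""
    total_gigabits = sum(vm_memory_gb.values()) * 8.0
    return total_gigabits / link_gbps


def evacuation_fits(vm_memory_gb: Dict[str, float], link_gbps: float,
                    runtime_seconds: float, safety_factor: float = 2.0) -> bool:
    """Rough go/no-go check: require the battery runtime to exceed the
    lower bound by a safety factor for re-copied pages and overhead."""
    return evacuation_seconds(vm_memory_gb, link_gbps) * safety_factor <= runtime_seconds


if __name__ == "__main__":
    # Hypothetical VM memory sizes (GB) and inter-site link speed (Gbps).
    vms = {"sql01": 64.0, "sp01": 32.0, "web01": 16.0}
    print(evacuation_seconds(vms, 10.0))      # 89.6
    print(evacuation_fits(vms, 10.0, 600.0))  # True
```

If the check fails, the remaining VMs should fall back to a graceful local shutdown rather than risk being cut off mid-migration.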
Although it is relatively easy to power down VMs in response to dwindling battery power, there are typically other factors that must be taken into account. Shutting down VMs and host servers indiscriminately can have adverse effects on application servers that have a hierarchical set of dependencies.