The two vendors first published a proof of concept of distance VMotion in late June via the blog of Guy Brunsdon, VMware's senior technical marketing architect for networking.
If it proves practical, distance VMotion has the potential to let customers carry out disaster avoidance plans when hurricanes or other disasters are forecast, rather than waiting until after a failure. It could also enable automatic workload balancing across multiple data centers and make it easy to move applications into and out of internal and external clouds.
VMware will support customers if they deploy distance VMotion using the Cisco network, but its support statement includes extensive fine print. For starters, it requires a minimum network bandwidth of 622 Mbps, or an OC-12 connection. And that's only the beginning of a long list of caveats.
Storage VMotion over distance a thornier problem
Moving the application data used by virtual hosts in a distance VMotion scenario remains a more significant hurdle. During a presentation with vCloud service provider partners Tuesday, the companies demonstrated a running SQL Server migrating between California data centers in Sacramento and San Jose. The server, using Dell Inc.'s DVD Store e-commerce application to simulate 16,000 concurrent transactions, continued to run, although there was some performance degradation during the migration.
Cisco manager of product marketing Balaji Sivasubramanian said the demo had included a 5 GB data store that remained at the primary data center and was accessed by the migrated SQL Server over the wide-area network (WAN).
Customers currently have two choices for moving application data along with VMotioned servers over distance, according to a presentation Wednesday afternoon by Chad Sakac, EMC Corp.'s VP of VMware technology alliance. They can access data at the primary location using WAN acceleration to cut down on latency while stretching the SAN fabric along with the IP network over both locations. Or, they can perform Storage VMotion of the data store prior to the VMotion of the virtual machine.
Both of these options can be done today, but Sakac acknowledged that application latency is an obvious drawback of the first. In the second, data migration takes far longer than moving the virtual machine itself. In the VMware/Cisco/EMC test bed, the Storage VMotion of a 300 GB SQL Server took 15 minutes over 200 kilometers, but Storage VMotion times can be orders of magnitude longer in real-world use.
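A rough sanity check on those numbers (a hypothetical back-of-envelope sketch, not the vendors' methodology): the best-case transfer time for a data store is its size divided by link throughput, ignoring latency and protocol overhead entirely.

```python
def min_transfer_hours(size_gb: float, link_gbps: float) -> float:
    """Theoretical minimum transfer time: size over raw link rate.

    Ignores latency, protocol overhead and competing traffic, so real
    Storage VMotion times will only ever be longer than this floor.
    """
    return (size_gb * 8) / link_gbps / 3600


# 300 GB over an assumed 10 Gbps data center interconnect: ~4 minutes at best
print(round(min_transfer_hours(300, 10), 2))    # 0.07 hours

# The same 300 GB over the minimum-supported OC-12 (0.622 Gbps): over an hour
print(round(min_transfer_hours(300, 0.622), 2)) # 1.07 hours
```

Even this idealized floor shows why a data store an order of magnitude larger, or a link an order of magnitude slower, pushes migration from minutes into hours.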
Glenn Exline, enterprise network manager for Computer Sciences Raytheon, based at Patrick Air Force Base in Florida, already had a SAN stretched over 40 kilometers between the base and a secondary location in Cape Canaveral, Fla. As an experiment, Exline said, he tried distance VMotion of a 500 GB SQL Server between the two locations last year.
"Moving data with Storage VMotion -- that's where the latency will truly kill you," he said. Exline said the storage migration took 14 hours over a 2 Gbps fabric using Finisar long-wave SFP transceivers to transmit the data.
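Exline's experience illustrates why latency, not raw bandwidth, dominates: 500 GB over a 2 Gbps fabric would take roughly half an hour if the link were kept full, yet the migration took 14 hours. A hypothetical sketch of the effect (the block size, round-trip time and queue depth below are illustrative assumptions, not figures from the deployment): when each block must be acknowledged before the next is sent, throughput is capped by how much data can be in flight per round trip, regardless of link speed.

```python
def effective_throughput_mbps(block_kb: float, rtt_ms: float,
                              outstanding: int = 1) -> float:
    """Throughput ceiling for acknowledged block transfers over a long link.

    Only `outstanding` blocks can be in flight per round trip, so once
    latency dominates, the ceiling no longer depends on raw link speed.
    """
    return block_kb * 8 * outstanding / rtt_ms


# Light travels through fiber at roughly 5 microseconds/km, so 40 km of
# fiber adds about 0.4 ms of round-trip time before any device latency.
# With assumed 8 KB blocks and one outstanding I/O, the ceiling is
# 160 Mbps -- a small fraction of the 2 Gbps fabric rate.
print(effective_throughput_mbps(8, 0.4))  # 160.0
```

Raising the queue depth or block size lifts the ceiling proportionally, which is the gap WAN acceleration products aim to close.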
For smaller WAN pipes, F5 Networks and NetEx Inc. demonstrated products they claimed could cut down on latency and bandwidth issues while moving data. NetEx claims that its UDP-based WAN acceleration can maintain local VMotion performance and speed Storage VMotion times. F5's BIG-IP version 10 was also positioned in a demo on the show floor as an alternative to Cisco's extended IP network; BIG-IP also includes data deduplication and compression to cut down on migration times. However, VMware so far supports only Cisco's network extension and WAN acceleration products for distance VMotion.
VMware vice president of product marketing Jon Bock said VMware is working with other partners to leverage mirroring and replication to improve Storage VMotion over long distances, but no announcements are imminent.
EMC developing active-active storage, but still in early stages
Sakac said EMC is working on a stretched single-array model of Symmetrix V-Max and a plan for supporting active-active Virtual Machine File System (VMFS) at both ends of the wire to make data stores in two locations look like one. However, these improvements for Storage VMotion over long distances remain a long way off.
Cisco's Sivasubramanian said even after the storage and networking problems are solved, policy management techniques for virtual machines must be updated to make policies follow VMs over distance.
VMware senior staff engineer Shudong Zhou added more caveats in presenting VMware's official support statement in Wednesday's session. VMware Fault Tolerance is latency-sensitive and not supported across sites today. VMware HA is also unsupported: because read and write access is required on both sides of the wire, a "split-brain" condition could result, with VMware HA automatically booting up virtual machines on both sides. Similarly, VMware Distributed Resource Scheduler (DRS) has no concept of multiple locations, which means it could put the VM and its storage at separate sites as it load-balances for performance and power management among hosts.
"This is basically still a proof of concept," said Andrew Storrs, principal consultant for Storrs and Associates. "Companies looking to do this at this stage will be large companies, which tend to have separate storage, server and networking teams, and manually re-routing policies around migrated virtual machines requires coordination among all three of those areas…the whole thing seems like it supports a rare use case, and it sounds like a heck of a lot of work to make sure everything goes off smoothly."