According to a blog post published by a VMware official this week, VMware Inc. and Cisco Systems Inc. have demonstrated a proof of concept for performing live virtual server migrations between data centers. Industry experts see the potential for disaster recovery, but say long-distance VMotion is years away from being realistic.
"Last year, Cisco and VMware began the task of trying to solve these long-distance VMotion issues with the target of seamlessly migrating a virtual machine (VM) between two data centers separated by a reasonable distance," wrote Guy Brunsdon,
Brunsdon wrote that customers often ask whether VMotion, which continues virtual machine operations and maintains user sessions while moving from one physical host to another, can be done between data centers for load balancing, failover during data center maintenance, and disaster recovery -- or, ideally, avoidance. Today, virtual servers can migrate and fail over between data centers, but doing so requires at least a short disruption to the application to restart the virtual machine.
Users already using virtual servers for disaster recovery said they can see the long-term potential for distance VMotion to provide automated high availability between geographic locations. "We would benefit [from that] if we do data center updates for something like fiber paths, air conditioning or power," said Alex Musicante, network analyst for the City of Pittsburgh.
"We could definitely use such a tool -- it's one step closer to the dream of true grid technology, rather than islands of resources," said Nasser Mirzai, technology vice president for Tradebeam Inc.
"In theory, this could get us to nondisruptive DR," said Burton Group analyst Chris Wolf. "Historically the only way to do that is with synchronous mirroring."
Major hurdles before production viability
Today, however, "in theory" is still the operative phrase. As VMware's Brunsdon himself wrote, "This, of course, is a non-trivial thing to do. There is the challenge of moving a VM over distance [which involves some degree of additional latency] without dropping sessions. To maintain sessions with existing technologies means stretching the L2 domain between the sites -- not pretty from a network architecture standpoint. And then there is the storage piece. If you move the virtual machine, it has to remotely access its disk in the other site until a Storage VMotion (SVMotion) occurs."
Getting that SVMotion synced up with the VMotion and sending data over distance in time to avoid a disaster with data intact will be the big task for VMware to work out going forward, Brunsdon wrote. "Remember, this is a proof of concept, so we still have work to do in multiple areas."
Being able to move a virtual machine over distance without dropping user sessions essentially requires synchronizing data that's in memory used by the virtual server at both ends of the wire, and that is the chief rub when it comes to making distance VMotion work. Wolf said he sees VMware potentially drawing on its existing shared-memory technology to make that happen. Today, when virtual servers share memory resources on a physical host, VMware's ESX server deduplicates the common data they store there, such as identical operating system images. "If they can eliminate duplicate operating system and application libraries, they could shorten the time it takes to get that data over the WAN," he said, but warned, "It's a pretty early concept -- especially if you're talking about using this for preemptive enterprise DR. I'd say it'll be the 2011 to 2012 timeframe before it's a viable option."
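The idea Wolf describes, sending each distinct chunk of memory across the WAN only once, can be illustrated with a minimal sketch. This is not VMware's implementation; the 4 KB page size, the SHA-256 fingerprinting and the function names below are all assumptions made purely for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; assumed page size, chosen only for illustration


def split_pages(memory: bytes, page_size: int = PAGE_SIZE) -> list:
    """Cut a memory image into fixed-size pages."""
    return [memory[i:i + page_size] for i in range(0, len(memory), page_size)]


def plan_transfer(pages):
    """Fingerprint each page and keep only one copy of each distinct page.

    Returns (unique, layout): `unique` maps fingerprint -> page contents,
    and `layout` lists the fingerprints in original page order.
    """
    unique = {}
    layout = []
    for page in pages:
        digest = hashlib.sha256(page).hexdigest()
        if digest not in unique:
            unique[digest] = page  # first time this content is seen
        layout.append(digest)
    return unique, layout


def rebuild(unique, layout) -> bytes:
    """Reassemble the full memory image on the receiving side."""
    return b"".join(unique[digest] for digest in layout)


if __name__ == "__main__":
    # Three identical pages (say, a shared library) plus one distinct page.
    memory = (b"A" * PAGE_SIZE) * 3 + b"B" * PAGE_SIZE
    pages = split_pages(memory)
    unique, layout = plan_transfer(pages)
    print(f"{len(pages)} pages in memory, {len(unique)} sent over the wire")
    assert rebuild(unique, layout) == memory
```

In this toy case only the distinct pages plus a short list of fingerprints would have to cross the link, which is the kind of saving Wolf has in mind. A real system would also have to cope with pages that change while the transfer is in progress.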
EMC, NetApp and VMware user Tom Becchetti, who asked that his company not be named because he is not authorized to represent it in the press, doesn't see it happening over long distances even by then. The 50-mile simulation from VMware and Cisco "still won't cut it for people working in, say, San Francisco or Texas, where they have to get data further away than that [for disaster recovery]," and Becchetti doesn't see VMware overcoming the latency issues for wide-area VMotion in the near future. "Strictly because of the laws of physics, I don't think that all hosts are going to be able to go over distance."
But for short distances, Becchetti said, "It's certainly doable, but you also need a good process around what's viable, what will work, and some practicality to it according to uptime requirements."
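Becchetti's point about physics can be put in rough numbers. The sketch below estimates only the best-case propagation delay of light through optical fiber. The refractive index of 1.47 is a typical figure assumed here, the function name is hypothetical, and real links add switching, routing and protocol overhead on top of this floor.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for optical fiber
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles: float,
                  refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    one_way_seconds = distance_km / speed_in_fiber_km_s
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    # 50 miles is the distance cited for the proof of concept;
    # the longer distances are arbitrary comparison points.
    for miles in (50, 500, 2000):
        print(f"{miles:>5} miles: {round_trip_ms(miles):.2f} ms round trip")
```

For the 50-mile distance in the proof of concept, this floor works out to a little under one millisecond per round trip, and it grows in direct proportion to distance. Any operation that must wait for a remote acknowledgment pays that cost every time, which is why the delay cannot be engineered away.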
Musicante also said he saw some risks involved. "If you lose connectivity during the SVMotion part of the process, it could be devastating," he said. "Spanning Layer 2 over multiple data centers is not a particular issue for us, but it's definitely an understood security risk."