Exploring Microsoft Windows clustering and high-availability tools in disaster recovery

Microsoft has beefed up high availability for Windows, Exchange and Hyper-V, but organizations still turn to third-party tools to reduce complexity and cost.

By Dave Raffo, Senior News Director

While Microsoft Corp. has made great strides adding clustering and high availability (HA) to Windows Server, Exchange and Hyper-V, organizations are a long way from relying solely on its products to ensure high availability.

The expense of Windows clustering can scare off smaller companies, and larger organizations want the extra protection offered by third-party replication and failover tools such as BakBone Software Inc.'s NetVault FASTRecover, CA XOsoft, Double-Take Software Inc., InMage Scout, Marathon Technologies Corp.'s everRun, Neverfail Ltd.'s Continuous Availability Suite, SteelEye Technology Inc.'s Protection Suite and Teneros Inc.

When it comes to keeping email running, a belt-and-suspenders approach is often required.

"There's nothing worse than going to the owner of the company and saying 'We'll be down eight to 10 hours,'" said Martin Silverman, director of IT at New York-based furniture distributor EvensonBest LLC, underscoring the need for HA for Exchange and other mission-critical apps.


 Exploring Microsoft Windows clustering and high-availability tools in disaster recovery: Table of contents
 Microsoft high-availability upgrades 
 Microsoft Exchange failover 
 Application failover is the big thing 
 High-availability alternatives for SMBs

 Microsoft high-availability upgrades 

Microsoft has added clustering capabilities for high availability to its operating systems and applications such as Exchange in recent years. Windows Server 2008 R2 failover clustering makes it easier to set up and manage clustered nodes, and eliminates a single point of failure for a server, cluster or application.

Hyper-V Server 2008 R2 supports clustering for virtual servers for free, and can be managed via the Hyper-V Manager and Failover Cluster Manager, or it can be integrated into a System Center Virtual Machine Manager infrastructure. Cluster Shared Volumes (CSVs) and Live Migration also promise to make Hyper-V more HA-friendly. CSVs remove the limit of one virtual machine per LUN that hindered the high-availability capabilities of the first version of Hyper-V. Live Migration moves running virtual machines from one host to another with no downtime.

Microsoft also redesigned Exchange 2010 clustering, moving failover to the database level rather than the server level. Exchange 2010 clustering also removes scalability issues from Exchange 2007 by allowing 100 databases per server and supporting 16 copies of a database in one cluster. Exchange 2010 also combines cluster continuous replication (CCR) and standby continuous replication (SCR) into a new type of clustering called Database Availability Group (DAG), which provides automatic failover for local and off-site replication for the first time in Exchange.

Still, Windows clustering remains expensive, and Windows 2008 failover clustering requires Enterprise or Datacenter editions and is not available on Windows 2008 Standard edition. Hyper-V Live Migration needs a storage area network (SAN) or high-speed network. Exchange 2010 looks like an improvement for high availability, but the jury is still out until it is widely tested and implemented.

Third-party applications also have data protection features missing from native Windows apps, such as continuous data protection (CDP), that come in handy when recovering from server outages.

 Microsoft Exchange failover 

EvensonBest began using XOsoft WANsync in 2003, three years before CA acquired XOsoft and rebranded the product CA XOsoft High Availability. Director of IT Silverman said the furniture company replicates Exchange between headquarters in New York City and offices in Albany, N.Y., Berkeley Heights, N.J., and Washington, D.C., plus about 20 satellite offices, all tied into a Multiprotocol Label Switching (MPLS) ring.

Silverman said he wanted to guarantee availability of Exchange, SQL and file servers in the wake of the September 11 attacks in 2001. When branch offices go down, XOsoft software allows headquarters to take over their operations, Silverman said. It helped EvensonBest recover from the Northeast power grid failure in 2003 by moving email to a server in its Albany, N.Y., office. It also kept Exchange running after an explosion in its headquarters required the building to be evacuated for three days.

Silverman said using XOsoft is simpler than clustering Exchange (he's running 2003). "If we had Exchange in a cluster, we would have a SAN as a data store and two boxes to cluster," he said. "In our situation, we have Exchange Server A and Exchange Server B in the same location, and they replicate instantaneously. If the first server goes down, the second server snaps in within a minute or two. You have the same capability of a cluster, but much less complexity."

His setup also includes other Exchange servers across the WAN in case both New York Exchange servers are lost. "We configure Exchange A and all changes are seamless going into box B," he said. "Box A goes down, B lights up and has all the configuration changes. We only need to ask our end users to stop and restart Outlook so Outlook understands it is connecting to a different server. If we lose New York City completely, the next hop is Albany. The only difference is users in New York will see a slight speed hit, especially on big files, because they're going over the WAN."

Silverman said he could use his Dell EqualLogic iSCSI SAN to replicate data between sites, but there's no automatic failover. "We'd have to script failover," he said. "Replicating the bits is only half the battle. You have to have a way to seamlessly get the boxes switched over. With [XOsoft], that's simple."
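
To illustrate what "scripting failover" involves at its simplest, the sketch below picks the first healthy server from a priority list that mirrors the order Silverman describes: Exchange Server A, then Server B in New York, then Albany. It is a minimal illustration in Python, not EvensonBest's or XOsoft's actual mechanism. The server names, the `Server` type and the health check are hypothetical, and a real script would also have to redirect clients and confirm that replication is current before switching.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Server:
    name: str  # hypothetical label, not a real hostname
    site: str


def pick_active(
    servers: Sequence[Server],
    is_healthy: Callable[[Server], bool],
) -> Optional[Server]:
    """Return the first healthy server in priority order, or None if all are down."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Priority order assumed from the article: local pair first, then the remote site.
FAILOVER_ORDER = [
    Server("exchange-a", "New York"),
    Server("exchange-b", "New York"),
    Server("exchange-albany", "Albany"),
]


def simulate(down: set) -> Optional[str]:
    """Simulate an outage: 'down' holds the names of servers that fail the health check."""
    active = pick_active(FAILOVER_ORDER, lambda s: s.name not in down)
    return active.name if active else None
```

Calling `simulate({"exchange-a", "exchange-b"})` models the loss of both New York servers and selects the Albany server. The selection step is the easy part; as Silverman notes, getting the boxes switched over seamlessly is the other half of the battle.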

EvensonBest is about 70% Windows Server 2008 R2 with the rest Windows 2003, and runs Exchange 2003. Silverman said he didn't consider Exchange 2007's clustering features robust enough for his needs, but will take a look at Exchange 2010. "By the time they get to version two or three of any feature, Microsoft usually makes it competitive," he said. "We'll see where Exchange 2010 leads us."

 Application failover is the big thing

Setting up high availability was one of the top priorities for Lathrop & Gage LLP CIO Ben Weinberger when he moved to the Kansas City, Mo.-based law firm from Florida-based Ruden McClosky in July 2008. Weinberger said his current firm is less likely to suffer natural disasters than his previous one, but nobody's immune to fires or power outages.

He said Lathrop is a straight Windows shop, mostly Windows 2003 with some Windows 2008. He plans to move to Exchange 2010 early next year, in hopes of setting up active-active high availability for headquarters and 12 remote offices.

Weinberger was an XOsoft customer at his former firm, and he's evaluating XOsoft and Double-Take at his new firm. He said he's paying particular attention to how they handle applications. "Everybody does replication now, but application failover is the big thing," he said.

The most important applications for Lathrop are Exchange and SQL, but Weinberger said the firm is running 300 applications. He also prefers host-based data replication over SAN-based replication.

"The advantage of host-based replication over SAN-based replication is it tends to be application aware," Weinberger said. "It knows it's running Exchange; it can check for the consistency of the database. It's also less expensive than SAN-based replication, and you're not locked into identical hardware on both sides. XOsoft seems more slick with the other components, like Assured Recovery, that lets you pause replication for testing. And with its CDP you can roll back to any point in time."

Another key to the equation is the licensing around virtual servers, said Weinberger.

"We were trying to shove as many guests onto a single host box as possible on our secondary site," Weinberger said. "We upgraded our Windows Enterprise license to Windows Datacenter to support a lot of guest virtual machines. By changing to Datacenter Edition, we could run an almost unlimited version of Windows. That was the starting point. Then we looked at what software for replication we could use.

"Double-Take has a good licensing model; you can run an unlimited number of guests. XOsoft did not. CA came back and said they would modify the licensing model so we could run 20-plus guests on a single host. Realistically, we'll probably fit 15 guests on that host. We didn't want to have to buy an individual license for every guest there; that's too much to manage aside from the cost factor."

Weinberger said he's taking a wait-and-see attitude on Exchange 2010's clustering capabilities.

"I'd love to believe that we can get it all done with the native tools, but we won't truly know that until we see it in action," he said. "CCR/SCR in Exchange 2007 was supposed to negate that need [for third-party HA], but it did not."

 High-availability alternatives for SMBs 

Cem Kursunoglu, president of San Francisco-based BayNode Technology Consulting, recommends Neverfail's HA solution to his clients. BayNode concentrates on small- to midsized businesses (SMBs) that are almost totally Windows shops.

Kursunoglu said Microsoft's clustering is too complicated and expensive for smaller businesses, and third-party applications scale better and simplify the process. He recommends Neverfail because it integrates better with Windows applications than other alternatives.

"Depending on the customer's budget, we'll use Neverfail for offsite replication," he said. "If there's no budget, we use CCR and put the nodes in the main office and the disaster recovery [DR] site. But that requires manual tweaking if they need to switch to the DR site. Neverfail's compression technology can squeeze a lot of bandwidth. It will transfer the load from one server to the other, not just make a file transfer."

Kursunoglu also recommends Hyper-V 2 to some of his smaller clients. To keep down storage costs, he recommends they run DataCore Software Corp.'s SANmelody or Hewlett-Packard (HP) Co.'s LeftHand Networks SAN/iQ to turn commodity hardware into an iSCSI SAN.

"Normally if I want to protect an application server, I have to put it in a cluster or use a third-party solution," he said. "Now I can do that with this free tool [Hyper-V], and pair it up with a DataCore SAN."

Taneja Group analyst Jeff Boles agrees with Kursunoglu that third-party tools can simplify and scale clusters better than Microsoft's software, especially for off-site replication.

"You can get part of the way there with Microsoft's cluster, but it's primarily designed for in-site clustering," Boles said. "Other products have more sophisticated application capabilities. They support more nodes, a larger pool of servers -- or they massively simplify the process. Microsoft clustering historically has been complex. Products that introduce a simple layer to clustering add a lot of value."
