Online credit card transaction company Cayan set up effective disaster recovery/high availability using only its...
Windows Server and DataKeeper software for asynchronous replication between two sites.
Cayan's business requires that it stay online without exception, so high availability is as important as recovery from a large disaster. The firm uses Microsoft Windows Server as storage for the SQL database that its customer engagement system runs on. After experimenting with Microsoft's native replication but finding it to be too much work, Cayan avoided buying a SAN by installing SIOS Technologies DataKeeper Cluster Edition software. DataKeeper works with Windows Server Failover Clustering (WSFC) to protect data through real-time block-level replication.
Cayan began testing DataKeeper in late 2013 and put it into production in 2014. At the time, it added a colocation site provided by CenturyLink as part of a data center upgrade. The company uses DataKeeper to asynchronously replicate between databases at its Waltham, Mass., facility and a mirror colocation site in Oak Grove, Illinois.
The secondary site is currently passive, but Cayan is testing what CTO Paul Vienneau calls a "quasi active-active" setup for load balancing, in which both sites accept live traffic. The database node will be active at only one site, and all database traffic will go to that site. If the active site goes down, its SQL Server node will fail over to the other site in less than a minute, Vienneau said.
"If the primary database node is in Waltham, the traffic that goes into Oak Grove will be re-directed from the application servers that sit in Oak Grove across the WAN to the database server in Waltham," Vienneau said. "That gives us minimal [recovery time] there as far as getting back up. If there's a disaster scenario in Waltham, the SQL server failover will take [place] in 30 seconds and we'll be live in the Oak Grove site. Because both sites are quasi-active, there's not a lot of heavy lifting other than the database failing over, and away we go."
Paul Vienneau, CTO, Cayan
Vienneau said Cayan is in the final stages of testing that setup and will soon go live.
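The routing Vienneau describes can be pictured with a minimal sketch. This is an illustration only, not Cayan's code and not the DataKeeper or WSFC API: the class `TwoSiteCluster` and its methods `route` and `fail_site` are hypothetical names, and the model assumes application servers at both sites and exactly one active database node.

```python
from dataclasses import dataclass, field

SITES = ("Waltham", "Oak Grove")  # site names taken from the article


@dataclass
class TwoSiteCluster:
    """Hypothetical model: app servers at both sites, one active database node."""
    active_db_site: str = "Waltham"
    down_sites: set = field(default_factory=set)

    def route(self, entry_site: str) -> dict:
        """Return where a request entering at entry_site is served."""
        if entry_site not in SITES:
            raise ValueError(f"unknown site: {entry_site}")
        if entry_site in self.down_sites:
            raise RuntimeError(f"{entry_site} is down and cannot accept traffic")
        return {
            "app_site": entry_site,
            "db_site": self.active_db_site,
            # True when the app server must reach the database across the WAN
            "crosses_wan": entry_site != self.active_db_site,
        }

    def fail_site(self, site: str) -> None:
        """Mark a site down; if it held the active database node, promote the other site."""
        if site not in SITES:
            raise ValueError(f"unknown site: {site}")
        self.down_sites.add(site)
        if site == self.active_db_site:
            survivor = next(s for s in SITES if s != site)
            if survivor in self.down_sites:
                raise RuntimeError("no surviving site to fail over to")
            self.active_db_site = survivor


cluster = TwoSiteCluster()
print(cluster.route("Oak Grove"))  # served by the Waltham database over the WAN
cluster.fail_site("Waltham")
print(cluster.route("Oak Grove"))  # database is now local to Oak Grove
```

In this simplified model, the only state that changes during a failover is which site holds the active database role, which mirrors Vienneau's point that there is little "heavy lifting" beyond the database failing over.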
Even without the active-active setup, DataKeeper's replication makes for quicker recovery than Cayan had before. Vienneau said it could take hours to restore his database the old way when Cayan relied on a Rackspace managed service.
"There wasn't any of what I call a plausible DR strategy," he said. "We built this from the ground up when we moved out of Rackspace to CenturyLink. We went from zero to 60 with our HA and DR strategy."
Vienneau said Cayan hasn't had any disasters since using DataKeeper but "we've tested considerably."
Vienneau calls his setup high availability instead of only DR "because we can set up fairly quickly in the Oak Grove site, so you can think of it as geo-diverse HA. We have redundancy throughout the entire stack. Because DataKeeper keeps both sites in sync at any point in time and because we can minimize failover, we have two data centers acting as HA."
"It's not a traditional DR where your recovery time is two hours. It's taking us less than a minute. We can make it seamless for the end user. They might see a little lag where the cutover is taking place, but they're not seeing any loss of transactions."
Those end users consist mostly of small businesses that rely on Cayan to process their credit card transactions.
"We're a 24/7/365 shop," he said. "We're not afforded a lot of downtime. We process over $100 billion in credit card transactions a year. If we're down a couple of hours, it's not good for us or our merchants."