SaaS vendor avoids downtime with SIOS high availability failover

Andrew Burton, Senior Site Editor

San Francisco-based ezRez Software, an online travel Software-as-a-Service (SaaS) vendor, runs SIOS Technology Corp.'s SteelEye Protection Suite (SPS) for Linux for high availability failover of its PostgreSQL-based platform. According to Michael Bruttig, senior director of technical operations with ezRez, the suite's LifeKeeper software caught an issue that could have resulted in a global outage.

The ezRez platform allows clients to launch online travel services, such as airline, car and hotel reservations, with the look and feel of a customized implementation. Its clients include AirAsia, American Airlines, American Express, JetBlue, InterContinental Hotels, LAN Airlines, Starwood Hotels and Resorts and United Airlines.

The ezRez data center is housed in a colocation facility operated by Savvis Inc. The environment is made up of six primary database servers and a dedicated high availability failover server (Dell PowerEdge R610 and C1100), attached to an EMC CLARiiON CX4 array over Fibre Channel. "The LifeKeeper software monitors the LUN assignments so if one server goes down, it moves [that server's workload] over to the failover box," Bruttig said.

The incident began with a labeling mistake. "Our data center is on the East Coast, so we rely on Savvis' remote hands service a lot," said Bruttig. "Our cage, at that time, wasn't well labeled, and remote hands moved the wrong server because it was mislabeled."

ezRez maintains a separate database for each of its clients. One server, however, holds what ezRez calls a "shared DB," a resource that all of its clients use. "The [shared] box was unplugged, it automatically failed over, we got a notification from LifeKeeper, but no client calls and no web metrics alerts," said Bruttig. "Our clients monitor us to the minute," he said. "It was probably a 30-second failover. Without the LifeKeeper software, we would have to get into the SAN, move the LUN over, bring up the LUN on the standby box, check the database for consistency, and change the IP on the box." He estimated that a manual recovery would have meant multiple hours of downtime, but could not comment on what that would have cost the company.
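The manual procedure Bruttig describes can be read as an ordered runbook in which each step must succeed before the next begins. The following is only a minimal sketch of that idea, not ezRez's or SIOS's tooling: the `Step` class, `run_manual_failover` function and `placeholder` action are hypothetical names, and the actions are stand-ins that do no real SAN, database or network work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One manual recovery step (hypothetical model, not a SIOS API)."""
    name: str
    action: Callable[[], bool]  # returns True when the step succeeds


def run_manual_failover(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order and stop at the first one that fails.

    Returns (overall success, names of the steps that completed).
    """
    completed: List[str] = []
    for step in steps:
        if not step.action():
            return False, completed
        completed.append(step.name)
    return True, completed


def placeholder() -> bool:
    # Stand-in for real operator work; always succeeds in this sketch.
    return True


# The five steps, in the order quoted in the article.
MANUAL_STEPS = [
    Step("log in to the SAN", placeholder),
    Step("move the LUN to the standby server", placeholder),
    Step("bring up the LUN on the standby server", placeholder),
    Step("check the database for consistency", placeholder),
    Step("change the IP address on the standby server", placeholder),
]

ok, done = run_manual_failover(MANUAL_STEPS)
print(ok, len(done))
```

Each step depends on the one before it, which is part of why a hand-run recovery takes far longer than an automated one: a failure at any point halts the sequence until an operator intervenes.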

ezRez has SPS configured to send SMS and email notifications when a failover happens. "We might actually have it tuned too far," Bruttig said. "We get notifications about connections between all the servers, LUNs, all that good stuff."

After the failover, ezRez ran operations on the failover box until the next scheduled maintenance window, at which point it moved the workload back to the primary server. "There's two ways you can perform the failback [in SPS]," said Bruttig. "There is a command line interface; that's probably the quickest way, because we are PCI compliant and have a lot of restrictions on getting into our production environment. But there is also a really good GUI that's easy to use to move around resources."

As noted above, ezRez's platform is based on the PostgreSQL open-source database management system. SPS's ability to integrate easily with that platform was a major factor in choosing the product. Bruttig had also used SPS in a previous position at another organization.

"It didn’t require any file system modification, there were no special scripts to run," said Bruttig. "[Symantec] Veritas [Cluster Server] requires you to switch your file system to their proprietary file system."

SPS is available on either a perpetual license or a subscription basis. Pricing is based on the number of nodes in a cluster. Each node requires a license, and there is an additional 20% fee per node for support.
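As a rough illustration of how that per-node pricing adds up, the sketch below assumes the 20% support fee is calculated on each node's license price, which the article does not state explicitly. The function name `cluster_cost` and the license price of 1,000 are hypothetical placeholders, not SIOS figures. The node count of seven assumes every server in ezRez's environment (six primary servers plus one failover server) is licensed.

```python
from decimal import Decimal


def cluster_cost(nodes: int,
                 license_price_per_node: Decimal,
                 support_rate: Decimal = Decimal("0.20")) -> Decimal:
    """Total cost of licenses plus support for a cluster.

    Assumes support is charged as a percentage of each node's license price.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    license_total = nodes * license_price_per_node
    support_total = license_total * support_rate
    return license_total + support_total


# Hypothetical example: seven nodes at a placeholder price of 1,000 each.
total = cluster_cost(7, Decimal("1000"))
print(total)
```

Under these assumptions, the total is the license cost multiplied by 1.2, so support adds one-fifth on top of whatever the per-node price turns out to be.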