Hurricane Sandy, which ravaged parts of the East Coast last month, proved that companies with a good disaster recovery plan don't have to feel powerless, even when they lose power in their offices or data centers.
Prepared firms used a variety of business continuity best practices.
24 Seven Talent, an international staffing company for creative industries, lost power at its downtown Manhattan headquarters on the evening of Oct. 29 and did not regain it until Nov. 3. Doug Feltman, 24 Seven's director of systems and applications, said the office had closed on Monday, Oct. 29, as a precaution, but he needed to run the company's most important IT job, payroll, the following day.
Because payroll is its main cost, 24 Seven handles it internally instead of outsourcing it to a provider such as ADP or Paychex. With staffers around the world expecting checks to go out on Oct. 30, Feltman put the company's disaster recovery (DR) plans into effect that morning.
"When we knew it would be a serious issue, that Tuesday (Oct. 30) around 1 p.m., we started bringing up servers in our DR center in our Los Angeles office," he said. "We know we cannot ever miss our payroll by more than a day because we would lose goodwill with our [employees], and that's what drives our business. New York people would understand if they did not get paid during Sandy, but the rest of America and London, Toronto and Paris wouldn't understand. It would be unacceptable."
24 Seven has Dell EqualLogic SANs in its New York and Los Angeles offices and also maintains Quorum onQ high availability appliances at both sites. Feltman also keeps a check printer off-site; he retrieved it and moved it to a shared office space the company maintains elsewhere in Manhattan that still had power. The Los Angeles office used the onQ appliances to bring up the two database servers required to process payroll, and Feltman used remote access servers in New York to finish the payroll on time and print the checks for mailing.
24 Seven licensed the onQ appliances only earlier in October but still had time to test the failover and failback processes before the storm hit. "We tested each recovery node to make sure it would come up OK, and we tested the synchronization between offices," he said. "During the storm, everything came up right away, according to plan. Working with the staff in LA, we were able to bring everything up within an hour."
Cloud DR plans help BUMI
While 24 Seven had only a power outage to worry about, the downtown Manhattan office of BUMI (Back Up My Info) had 35 feet of seawater in the building. Fortunately, the backup and DR service provider uses data centers in Toronto and Kelowna, Canada, to store and replicate customer data with NetApp storage and Asigra backup software.
"When the disaster hit our office, we invoked the BUMI cloud recovery plan that we built last year," said BUMI CEO Jennifer Walzer. "We can take backups in our Toronto data center and start up a virtual environment to create the exact environment through Citrix clients that our team could work off of. We brought that up quickly so we could work through our servers."
BUMI employees worked from their homes to restore customer data remotely for clients in the New York metropolitan area. Many of those customers had sites outside New York that they could recover to. Walzer said some of her staff had no power at home but could access BUMI's remote servers through their cell phones for as long as the phone batteries lasted.
Walzer said she sent Cisco IP phones via FedEx to staffers' homes, where they could charge them if they had power. "Those phones can plug-and-play anywhere," she said. "One lesson I learned was that everybody needed to have one of those phones at home."
She said her staff will not be able to return to the office until December at the earliest. For now, her nine New York employees work from home and meet every Wednesday night for dinner to "bond."
"It's the new normal for us," she said.
Affigent fails over before the storm
Affigent LLC, a technology consulting firm for government agencies, escaped the brunt of the storm at its Herndon, Va., headquarters but failed over to its secondary data center in Chicago as a precaution on the first day of the storm. In 2011, working with its managed service provider, Integrity Virtual IT, Affigent redesigned its infrastructure with DR in mind. Affigent now uses Zerto Virtual Replication to protect data on its SAN at Integrity's data center in Reston, Va.
Matthew Friedman, Affigent's business operations director, said he was prepared for the worst because of advance warnings about Sandy. His management team decided over the weekend that it would begin failing over on the afternoon of Monday, Oct. 29, unless the storm turned out to sea. The final decision came Monday morning.
Affigent's offices never lost power, but Friedman said the company would have stayed up and running even if they had.
"We executed the flip over from the primary site to the DR site Monday afternoon," he said. "We had an hour downtime as planned. It took 30 or 40 minutes to move from one data center to the other, and the other 20 minutes were for testing if the applications were performing as they were supposed to. After that, we were fully operational in the recovery site."
Affigent ran its IT from the Chicago site until switching back on the night of Oct. 31, a process that also took about an hour, including testing all the applications.
Friedman said Affigent took a better-safe-than-sorry approach because it can't afford much downtime. The firm bids for contracts, often on tight deadlines. "An outage is a big deal for us," he said. "Our business is transaction-based. We have to get our quotes out, and our customers don't waive deadlines for those. If you don't hit your deadline, you lose the deal by default. If there's a large procurement on the street and we need to answer it and we miss the deadline, it could mean the loss of millions of dollars of potential business."
He said executing the DR plan gave him confidence that Affigent could keep operating if other disasters hit. "This was a precautionary move," Friedman said. "If we lose connectivity, we know we [are] able to execute from our recovery site. If we had an unplanned event like somebody hit a utility pole and we lost power, we would still be able to initiate transfers from a remote site."