The largest electricity and natural gas distribution company in New England has already gotten to a point many IT pros are still only dreaming of: a cohesive disaster recovery plan and an infrastructure to match. But reaching infrastructure nirvana, as Northeast Utilities discovered, is only the beginning of effective disaster recovery.
The company has around 500 TB of data on EMC DMX 3000 arrays at its main data center just north of Hartford, Conn., and a secondary site south of Hartford. Northeast Utilities serves Connecticut, Western Massachusetts and New Hampshire through subsidiaries Connecticut Light and Power, Public Service of New Hampshire, Yankee Gas and Western Massachusetts Electric Co.
Four years ago, Northeast hired EMC to help it classify its data by application, determine each application's recovery time objective (RTO) and recovery point objective (RPO), and set up failover between the data centers, said Ed Goldberg, the company's business continuity/disaster recovery coordinator. Northeast now has four tiers of recovery. Tier 1, which has a two-hour RTO, is reserved for the company's most mission-critical applications, those that control the electrical grid, so failover to hot standby servers at the secondary data center requires no manual intervention.
In addition to helping set up the disaster recovery plan, EMC consultants helped conduct the company's first live test several years ago. Since then, however, Northeast has been on its own with testing, and that's where the rest of the disaster recovery battle began. The disaster recovery plan held still, but like most infrastructures, Northeast's data centers did not.
"Over the years as we've added new systems and done technology refreshes, it's become more difficult for us to do real DR tests," Goldberg said. A detailed tabletop exercise still takes place annually, but the company hasn't been able to do many live failovers over the last two years, owing to the nature of the company's business. "We can't cause an outage for the test, and we don't have a means of failing back once the server's over at the secondary data center."
With this issue in mind, Goldberg met a representative from Continuity Software at a disaster-planning conference last August. Continuity Software offers prospects a free evaluation that promises to find enough misallocated storage to pay for the software's licensing. "They came in and did their foot-in-the-door scheme," Goldberg said. "But whether or not we've misallocated storage isn't where we saw the value. We see the value in having a scan of our disaster recovery capabilities every night."
But there was another catch. "I told them, you run it," he said. "We didn't have the resources to go around deploying software licenses and agents, and monitoring it every day." Continuity Software offered to run Northeast's disaster recovery monitoring from its own facilities, connecting to Northeast's network through a VPN and calling if there's a critical problem. Issues that don't need an immediate response are raised with Northeast's management during a monthly conference call. The service has been in production since January.
"They've already found gaps," Goldberg said, such as forgetting to associate a certain disaster recovery copy of data with a new server after a hardware refresh. "And if we can explain to them why we meant to configure something a certain way, they'll squelch the alert on it so it doesn't keep coming up."
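Continuity Software's scanning logic isn't described here, but the kind of gap Goldberg mentions can be illustrated with a small, hypothetical Python sketch. It assumes an inventory of production servers and a mapping from each DR copy to the server it is associated with; a refreshed server with no DR copy, or a copy still tied to a retired server, is flagged, and findings an operator has explained are squelched. The names `Inventory`, `nightly_scan` and the server and copy IDs are invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    # Hypothetical model: production servers currently in service.
    servers: set[str]
    # DR copy id -> production server that copy is associated with.
    dr_copies: dict[str, str]
    # Findings an operator has explained and asked to squelch.
    suppressed: set[str] = field(default_factory=set)


def nightly_scan(inv: Inventory) -> list[str]:
    """Return unsquelched DR gaps (illustrative sketch, not a vendor API)."""
    findings = []
    covered = set(inv.dr_copies.values())
    # Servers with no DR copy associated, e.g. a new box after a refresh.
    for server in sorted(inv.servers - covered):
        findings.append(f"unprotected:{server}")
    # DR copies still pointing at servers that are no longer in service.
    for copy_id, server in sorted(inv.dr_copies.items()):
        if server not in inv.servers:
            findings.append(f"orphaned:{copy_id}->{server}")
    # Drop anything previously reviewed and squelched.
    return [f for f in findings if f not in inv.suppressed]


if __name__ == "__main__":
    inv = Inventory(
        servers={"grid-app-02", "billing-01", "test-01"},
        dr_copies={"R1": "grid-app-01", "R2": "billing-01"},
        suppressed={"unprotected:test-01"},
    )
    for finding in nightly_scan(inv):
        print(finding)
```

In this made-up example, `grid-app-01` was replaced by `grid-app-02` during a refresh but its DR copy was never reassociated, so the scan reports both the unprotected new server and the orphaned copy, while the squelched test server stays quiet.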
Goldberg wasn't able to disclose what Northeast pays for the service, but Continuity Software has since opened the remote monitoring service to any customer through its Disaster Recovery Assurance offering, announced in February. The list price is $3,200 per protected server per year. "It's less than the cost of having a consultant come in once a month, and this also checks up on us every night," Goldberg said.
Goldberg admitted there was some nervousness about allowing a third party onto Northeast's network. Careful proof-of-concept testing, which used network "sniffers" while Continuity Software monitored the systems, put that to rest, and firewalls remain between Continuity Software and anything it's not supposed to access.
For now, Goldberg still has some items on his wish list, chief among them mainframe support. "The mainframe is a more stable system, both by nature and because there's less fingers in it – the open systems are always in major flux," he said. "But I'd love for Continuity to be able to monitor that environment also."