SAN FRANCISCO -- Many users at Storage Decisions this week said that disaster recovery has become their top priority for next year. To refresh their disaster recovery plans, they're searching for new approaches to getting data offsite and fine-tuning the disaster recovery testing process.
The renewed focus on disaster recovery planning springs from the completion of primary storage projects and the need to shore up those new storage systems with better data
protection. "Our company just spent close to $600 million on a new electronic health records system," said Rodney Willms, senior storage engineer for Sutter Health. "The implications for that data are just too big to still be relying on tape." Setting up data replication for that application "is the only thing I'm going to be working on in the next 12 to 18 months," he said.
In addition, recovery-time demands are outpacing the capabilities of older disaster recovery approaches. "Ten years ago if you lost a mainframe, restoring from backup tapes would take about a week or so, and even five years ago one week was usually OK," said Jeffrey Scheib, systems administrator for Arizona State University (ASU). "Now, the 'globalness' of enterprises makes it so that a week means you're out of the game, you're competing with companies on the other side of the world who aren't affected by the disaster."
A constant theme among users was the need to move away from tape for getting data offsite. As data volumes grow, recovery from tape becomes less practical. It takes 30,000 tapes to hold the 400 TB of data in his environment, said Dan Stillmaker, director of storage systems for Stanford University, and his shop "would be hard-pressed to meet our two week recovery window -- very hard-pressed." Stanford is currently testing virtual tape libraries (VTLs) from IBM, EMC Corp., Network Appliance Inc. (NetApp) and Sepaton Inc., as well as data replication products.
Tape security is another issue, driven by the increasing frequency of high-profile data breaches and by regulations mandating new security measures. Dan Stafford, systems programmer for Delta Dental of California, said his company, too, is moving to VTL and data replication to replace tape and is leaning toward ProtecTIER from Diligent Technologies Inc.
In the meantime, Delta Dental is getting up to speed on IBM's key management application for the TS1120 self-encrypting drives it has installed. At issue now is synchronizing keys between dual local key management systems with the goal of separating them geographically. "It's all new technology, and it's a learning process," Stafford said.
Searching for new disaster recovery tools
Some users looking for new disaster recovery tools have encountered frustration. ASU is charged with storing half a petabyte of video footage from the Apollo space missions and replicating that data between two sites for disaster recovery. ASU ran into two problems with its NetApp filers: a 16 TB file system limit, and the researchers' habit of frequently moving file directories, which can wreak havoc on performance when hundreds of terabytes of data are attached.
ASU tried NetApp's Data ONTAP GX product to overcome the file system limit and provide a global namespace, but Scheib was frustrated that ONTAP GX doesn't support NetApp disaster recovery tools, such as SnapMirror. Scheib found other clustered NAS systems beyond his budget, especially when switching meant writing off the investment in the NetApp storage already on the floor.
In the end, Scheib designed his own architecture, layering open source ZFS over the NetApp filers and spanning the two locations. ZFS is a 128-bit file system, so its namespace is effectively unlimited, and its performance allows folders and directories to be moved quickly. Because the ZFS namespace spans both sites, data replication can be initiated simply by dragging folders from one directory to another; the actual migration of data over trunked Ethernet links takes much longer, but users retain access throughout the process.
The irony of pairing ZFS with NetApp, when ZFS creator Sun Microsystems Inc. and NetApp are suing each other over the file system, isn't lost on Scheib, but he isn't concerned about that. "By the time that's settled, I hope there will be more prepackaged alternatives to meet my particular needs."
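ASU's design moves data by relocating directories within a namespace that spans both sites. For comparison, the standard way to script site-to-site replication with plain ZFS is snapshot send/receive; the sketch below uses hypothetical pool and host names (`tank/apollo`, `dr-site`) and is not ASU's actual configuration.

```shell
# Hypothetical sketch of scripted ZFS replication between two sites.
# Assumes a dataset "tank/apollo" on the primary and a host "dr-site"
# reachable over SSH; names are illustrative only.

# Take a point-in-time snapshot at the primary site
zfs snapshot tank/apollo@dr-20071101

# Stream the snapshot to the secondary site; -F on the receive side
# rolls the target back to match the incoming stream
zfs send tank/apollo@dr-20071101 | ssh dr-site zfs receive -F tank/apollo

# Subsequent runs ship only the changes since the last snapshot
zfs snapshot tank/apollo@dr-20071102
zfs send -i tank/apollo@dr-20071101 tank/apollo@dr-20071102 | \
    ssh dr-site zfs receive tank/apollo
```

Incremental sends keep the WAN traffic proportional to the changed blocks rather than the full half-petabyte, which is why snapshot-based replication is the usual fallback when a spanned namespace isn't available.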
Disaster recovery testing tips
During a session at Storage Decisions, analyst Jon Toigo, of Toigo Partners, encouraged attendees to approach disaster recovery testing as an ongoing, piecemeal process, rather than a once-a-year disruption. "There are many ways to test and most are nondisruptive," he said. Toigo advocated simulated failures, checklists and structured walk-throughs over literally pulling the plug on systems.
"Some people think that's the only way to test," Toigo said. "Actually, I don't recommend it. Disaster recovery testing should be a rehearsal that trains people to behave well and respond effectively in 'the smoke of battle.'"
Gerald Rogers, senior storage engineer for Progress Energy, said his company is taking a piece-by-piece approach to developing disaster recovery tests for all of its systems and business units. The goal is to then integrate the pieces into a comprehensive disaster recovery plan by 2009.