As data deduplication becomes more prevalent in enterprise data storage, it's changing the way companies look at disaster recovery (DR) and data replication. Deduplication reduces the amount of backed-up data, which in turn means less information has to be sent over the wire, making disaster recovery scenarios and data replication less daunting for data storage managers.
The more that information is deduped, the less information is backed up, and that's better for disaster recovery, said Matthew Lodge, senior director of product marketing for information management at Symantec Corp. "Because you've got so much less information, you can use deduped backup as second-tier disaster recovery technology," Lodge said. "You can replicate to an outside site, and use backup to restore the system in an event. If there's less information live in the system [after deduping], the applications are smaller and run on fewer servers. It's much easier to do disaster recovery for those applications."
Dedupe vendors also generally replicate only the changed elements in data that's already been deduped, which accelerates the replication process. "We don't take a block of data as a block," said Fadi Albatal, VP of product marketing at FalconStor Software. "We only take sector level changes."
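In rough terms, this changed-segment approach amounts to a content-addressed exchange: fingerprint each chunk of the backup, then ship only the chunks the DR site hasn't already seen. The chunk size, hash choice and function names below are illustrative, not any vendor's actual implementation:

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed-size segment, for illustration only

def chunk_hashes(data: bytes):
    """Split data into fixed-size chunks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

def chunks_to_send(data: bytes, remote_index: set):
    """Return only the chunks the DR site does not already hold."""
    out = []
    for i, h in enumerate(chunk_hashes(data)):
        if h not in remote_index:
            out.append((h, data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]))
    return out

# Repetitive data dedupes well: only chunks new to the DR site cross the wire.
backup = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
remote = set(chunk_hashes(b"A" * CHUNK_SIZE))  # DR site already has the "A" chunk
print(len(chunks_to_send(backup, remote)))  # 1 -- only the "B" chunk is sent
```

Here a four-chunk backup produces a single chunk of replication traffic, because the DR site's index already covers the repeated segments.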
"Full disaster recovery copies by replication is something that wasn't practical, and deduplication really makes it practical," said Steve Whitner, product marketing manager at Quantum Corp. "Very few people have the bandwidth to move whole backup sets from one place to another." With dedupe, he said, "you could have a DR copy of data by moving a little bit of data," or only what's changed since the last backup, which cuts down on lengthy transmissions over distance to a disaster recovery site.
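Some back-of-the-envelope arithmetic shows the scale of the difference. The backup size, change rate, dedupe ratio and link speed below are hypothetical numbers chosen only for illustration:

```python
# Hypothetical example: time to ship a backup over a 100 Mbit/s WAN link.
full_backup_tb = 10        # nightly full backup size (assumed)
changed_fraction = 0.02    # ~2% of data changes per day (assumed)
dedupe_ratio = 10          # 10:1 reduction on the changed data (assumed)
link_mbps = 100            # WAN bandwidth in megabits per second (assumed)

def hours_to_send(terabytes: float, mbps: float) -> float:
    """Transfer time for a given payload over a given link, in hours."""
    bits = terabytes * 1e12 * 8
    return bits / (mbps * 1e6) / 3600

naive = hours_to_send(full_backup_tb, link_mbps)
optimized = hours_to_send(full_backup_tb * changed_fraction / dedupe_ratio,
                          link_mbps)
print(round(naive, 1), round(optimized, 2))  # roughly 222.2 vs 0.44 hours
```

Moving the whole backup set would take more than nine days on this link; moving only the changed, deduped data takes under half an hour, which is the gap Whitner is describing.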
Taneja Group analyst Jeff Boles said that data deduplication can help companies achieve better recovery time objectives (RTOs) and recovery point objectives (RPOs) at disaster recovery sites. "Dedupe is great for optimizing costs at the DR site," he said. Mission-critical data is a different story, though. "Data you have to restart or fail over in an hour or two, that type of data is replicating outside of dedupe," said Boles. "Dedupe doesn't play a role there yet."
"Ultimately, the DR part of dedupe is going to become more important than the stuff that happens with local restores," said Whitner. He also said that 99% of recoveries are local. "It's rare to have the whole building go away or all the equipment incinerate."
Tape still in use by some for disaster recovery despite dedupe
Deduping to disk can often cut tape out of the equation, though in disaster recovery planning that's not always the case. "Over the past few years, dedupe has justified for many users taking backup data off of tape as their go-to element for accessing data," said Boles.
"It's a tiny percentage that gets rid of all their tape," said Whitner. "Less tape is easier to manage." But removable media has its place. "If you're truly doing a DR rebuild, people don't do large-scale recovery of systems remotely."
At Pro-Dex Inc., which makes miniature motors, motion controllers and anesthesia delivery systems, IT manager Jamie Rosewitz said they installed Quantum appliances to replace tape backups. "Deduplication has helped reduce the need for backup capacity. We were able to back up more and stop sending tape offsite for disaster recovery protection," he wrote in an email. "We can replicate two sites to one site and that one site replicates back to one of the other sites for a complete off-site DR solution." When they need to restore data, "recovering that data has been faster than tape since all the data is right at hand for the backup software to use," Rosewitz wrote.
The upsurge in data deduplication over the past few years has benefited SMBs and remote or branch offices as well as larger companies. Boles thinks that dedupe has "opened the door to good disaster recovery practices for the SMB market who still can't afford to do DR."
Albatal agrees. "Even the smallest organizations and branch offices are able to replicate data in a cost-effective manner," he said. "They're automating a lot of DR processes based on deduplication. Prior to dedupe, they could only afford one or two full backups -- more than that, it was unsustainable to maintain disk resources."
Albatal said it's important to design deduplication products to ensure faster restores. "With dedupe, data becomes spread all over the repository, so you don't benefit from sequential read," he said. "It's important that you account for that and have your random read process be as efficient as your sequential read."
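The trade-off Albatal describes can be seen in a toy model: a deduped file becomes a "recipe" of chunk fingerprints, and restoring it means fetching chunks from wherever they happen to sit in the shared repository rather than reading one contiguous stream. Everything here (the store layout, chunk size, function names) is a simplified sketch, not FalconStor's design:

```python
import hashlib

# Hypothetical chunk store: hash -> chunk bytes, in whatever order chunks
# first arrived, so a file's chunks end up scattered across the repository.
store = {}

def ingest(data: bytes, size: int = 8):
    """Store each unique chunk once; return the file's 'recipe' of hashes."""
    recipe = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # duplicates are stored only once
        recipe.append(h)
    return recipe

def restore(recipe):
    """Rebuild the file chunk by chunk. Each lookup can land anywhere in
    the repository, so restore speed depends on efficient random reads."""
    return b"".join(store[h] for h in recipe)

original = b"ABCDEFGH" * 3 + b"12345678"
recipe = ingest(original)
assert restore(recipe) == original
print(len(store))  # 2 unique chunks back a 4-chunk file
```

The capacity win (four logical chunks stored as two) is exactly what turns a sequential restore into a series of scattered lookups, which is why Albatal argues the random-read path has to be engineered to keep pace.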
When it comes to disaster recovery planning, "you get lulled into a false sense of security," said Richard Booth, network manager for the North Kingstown, R.I., school department. The department maintains about 3 TB of data in a 1960s-era bomb shelter building with concrete walls and explosion-proof glass. But a roof leak was the wakeup call that prompted Booth to explore off-premises backup and move data off of tapes, which had previously required an employee to swap tapes on weekends to capture all backups. North Kingstown moved to ExaGrid boxes, with a plan in the works to link the main data center with another to automate disaster recovery. Booth likes that his most current backup is always in a deduplication "landing zone," enabling quick restores from the previous day.
The future of data deduplication and disaster recovery
The next steps for the deduplication and DR markets might include tackling the potential issue of having siloed dedupe repositories. "The management challenge becomes more difficult over time," said Lodge. "We see a lot of organizations get started with backup dedupe, reduce the size of backup stores and get a quick ROI, but as they get more and more data it's harder and harder to manage." He said Symantec is focusing on integrated management to automate keeping track of multiple copies of data.
Global deduplication repositories let all applications see copies of the available stored data with multiple nodes accessing the repository at the same time. Albatal said that during the replication process, the replication sources are aware of the composition of the global dedupe repository. "You're trying to build a duplicate-free data center," he said, with a clustered, single repository that services all applications at the same time.
Down the road, "I look for all dedupe vendors to become increasingly sophisticated about how they think about data movement," said Boles. "Vendors are getting better capabilities to get deduped data on a secondary repository, and better at moving that data in an optimized format to another repository."
Bob Petrocelli, CEO of GreenBytes, which recently announced a dedupe offering, said that for the product's next release they're focusing on a "prologue to the replication process" that will check to see if replication targets have already received blocks of data to cut down on duplicates. "It's optimized replication on a global basis," said Petrocelli.
The basics of data storage still apply with deduping and disaster recovery. Users often forget "how important it is to actually delete things," said Lodge, as "they're focusing on keeping all this information around for disaster recovery."
About this author: Christine Cignoli is a Boston-based technology writer. Visit her at www.christinecignoli.com.