How does data deduplication play a part in using disk backup for disaster recovery?
Data deduplication is very similar to WAN optimization. At the disk level, it recognizes identical data segments and creates references, or pointers, to them. If it sees a segment of blocks or a sequence of bytes that it has already stored, it writes a much smaller pointer to disk instead of the entire data segment. If you're considering using disk as your backup target instead of tape, deduplication lets you reduce your data rather than buy a lot of disk space, which makes disk a good replacement for tape media without spending a fortune.
The one thing you need to watch out for with data deduplication is that it's not very well suited for short-term retention backups, because its efficiency relies on history. The more often it recognizes an identical byte sequence, the more data reduction you get. For example, if you back up the same Word document 20 times, a large portion of that document never changes, so instead of storing the file 20 times, deduplication stores each unique byte sequence only once. Deduplication can reduce data storage by 15:1 or 20:1. In some cases you could even double that.
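To make the Word document example concrete, here is a minimal sketch of segment-level deduplication. It assumes fixed 4 KB segments identified by SHA-256 hashes; the `DedupStore` class and `demo_ratio` function are hypothetical names made up for illustration, and real products typically use variable-size segments and also pay a small cost for the pointers, which this sketch ignores.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size, chosen only for illustration


class DedupStore:
    """Hypothetical in-memory store: each unique segment is kept once."""

    def __init__(self):
        self.chunks = {}   # digest -> segment bytes (stored a single time)
        self.backups = []  # one list of digests ("pointers") per backup

    def backup(self, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the segment only if this digest has not been seen before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.backups.append(recipe)

    def restore(self, index: int) -> bytes:
        # Follow the pointers to rebuild the original data.
        return b"".join(self.chunks[d] for d in self.backups[index])

    def logical_bytes(self) -> int:
        # Size of everything that was backed up, before deduplication.
        return sum(len(self.chunks[d]) for r in self.backups for d in r)

    def stored_bytes(self) -> int:
        # Size of the unique segments actually kept (pointer overhead ignored).
        return sum(len(c) for c in self.chunks.values())


def demo_ratio(versions: int = 20):
    """Back up a 100-segment document `versions` times; only the last segment changes."""
    store = DedupStore()
    base = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(99))
    for v in range(versions):
        tail = (1000 + v).to_bytes(4, "big") * (CHUNK_SIZE // 4)
        store.backup(base + tail)
    return store, store.logical_bytes() / store.stored_bytes()


if __name__ == "__main__":
    store, ratio = demo_ratio()
    print(f"unique segments: {len(store.chunks)}, reduction: {ratio:.1f}:1")
```

With 20 backups of a document in which only one segment out of 100 changes each time, this toy store keeps 119 unique segments instead of 2,000, a reduction of roughly 16.8:1. With only one or two backups the ratio stays close to 1:1, which is why short retention periods see little benefit.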
A lot of deduplication technologies are replication capable. Replicating deduplicated data lowers network bandwidth requirements, because only a reduced data set is copied across the network. In the end, when data deduplication is combined with WAN optimization, you can significantly reduce the data stream between two locations.
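The bandwidth saving can be sketched in a few lines. This is an illustration under assumptions, not any vendor's protocol: both sites are modeled as dictionaries keyed by SHA-256 digest, and `index_chunks` and `replicate` are hypothetical helper names. Only segments the remote site does not already hold are sent.

```python
import hashlib


def index_chunks(data: bytes, size: int = 4096) -> dict:
    """Split data into fixed-size segments keyed by their SHA-256 digest."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest(): data[i:i + size]
        for i in range(0, len(data), size)
    }


def replicate(local: dict, remote: dict) -> int:
    """Copy only the segments missing at the remote site; return bytes sent."""
    missing = local.keys() - remote.keys()
    sent = 0
    for digest in missing:
        remote[digest] = local[digest]
        sent += len(local[digest])
    return sent


if __name__ == "__main__":
    local = index_chunks(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
    remote = index_chunks(b"A" * 4096 + b"B" * 4096)
    print("bytes sent:", replicate(local, remote))
```

In this example the local site holds three segments and the remote site already has two of them, so a single 4 KB segment crosses the network instead of all 12 KB.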
What about continuous data protection (CDP)? Who should consider that?
Some people confuse CDP with mirroring, or with traditional data replication. Continuous data protection is about point-in-time copies. Any kind of traditional backup makes a point-in-time copy, whether it's to tape or disk, but only one per backup. At the other extreme, mirroring is a constant copy of all the changes that are made to the data. The problem with mirroring is that if the original copy is corrupted, the replicated copies will also be corrupted. CDP, on the other hand, captures every change made to the data and lets the user create multiple recovery points, as opposed to traditional backup, where you only have one recovery point. In the example of corrupted data, CDP gives the user the ability to roll back changes to the point where the data was valid.
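The difference between mirroring and CDP can be shown with a toy model. This is a sketch under assumptions rather than a description of any product: `Mirror` and `CdpJournal` are hypothetical classes, and the data is modeled as a simple key-value store with a numbered change journal.

```python
class Mirror:
    """Keeps only the latest state, so corruption is copied along with good data."""

    def __init__(self):
        self.copy = {}

    def write(self, key, value):
        self.copy[key] = value  # blindly overwrite, no history kept


class CdpJournal:
    """Records every change in order, so any earlier point can be rebuilt."""

    def __init__(self):
        self.journal = []  # list of (sequence number, key, value)

    def write(self, key, value) -> int:
        seq = len(self.journal) + 1
        self.journal.append((seq, key, value))
        return seq  # each sequence number is a possible recovery point

    def state_at(self, seq: int) -> dict:
        # Replay the journal up to and including the chosen recovery point.
        state = {}
        for s, key, value in self.journal:
            if s > seq:
                break
            state[key] = value
        return state


if __name__ == "__main__":
    mirror, cdp = Mirror(), CdpJournal()
    for value in ["v1", "v2", "CORRUPTED"]:
        mirror.write("report", value)
        cdp.write("report", value)
    print("mirror:", mirror.copy)           # only the corrupted copy remains
    print("cdp rollback:", cdp.state_at(2))  # state just before the corruption
```

After the third, corrupting write, the mirror holds only the bad data, while the journal can be replayed up to the second change to recover the last valid version.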
Who should use this? These technologies are meant for people who need continuous protection but also need to recover data as of a certain point in time. It really depends on how your recovery time objectives (RTOs) are defined. If you have very stringent requirements and need the ability to roll back in a granular fashion, CDP is better than mirroring, which is limited in that it copies blindly, without knowing whether the data at the source is usable.