With the increasing availability of replication products, many are turning to this technology as part of a disaster recovery (DR) strategy to address tighter recovery time objectives (RTOs). W. Curtis Preston, Vice President of Data Protection Services at GlassHouse Technologies, discusses replication in the DR space in this FAQ. His answers are also available below as an MP3 download.
Table of contents:
>>Leveraging replication for DR
>>Synchronous replication vs. asynchronous replication
>>Overcoming the limitations of asynchronous replication
>>Best practices for effective replication
Replication is simply the concept of copying data, as it changes, over to another site. So as you change a file, which changes a few blocks, the idea is that those blocks get sent incrementally over to another system, which then overwrites the same blocks on the target system. So, you continually have an updated copy of what you have on one site, living on another site.
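As a rough illustration of that idea, here is a minimal sketch of shipping only the changed blocks to a target volume. All the names here are invented for illustration; real products do this at the storage array, SAN or host layer:

```python
BLOCK_COUNT = 8

source = [""] * BLOCK_COUNT   # the primary volume, as a list of blocks
target = [""] * BLOCK_COUNT   # the copy at the other site
dirty = set()                 # block numbers changed since the last pass

def write_block(index, data):
    source[index] = data
    dirty.add(index)  # remember which blocks changed

def replicate():
    # Ship only the changed blocks and overwrite the same positions
    # on the target; the whole volume is never re-sent.
    for index in sorted(dirty):
        target[index] = source[index]
    dirty.clear()

write_block(2, "hello")
write_block(5, "world")
replicate()
assert target == source  # the target is now an up-to-date copy
```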
For those people who have tried to use tape and traditional methods of backup and have found that they're simply not able to meet their RTOs, those companies have gone to replication for DR, especially for their critical applications. So basically they're using synchronous or asynchronous replication so that they constantly have a copy of their data offsite. This way, they don't have to do a restore when they're in a DR situation. They simply switch to the copy of the data at the other site.
It's not so much pros and cons as it is that each one is appropriate for a different application. Synchronous replication is meant to keep the copy continuously, completely up to date. It doesn't give the acknowledgement of the write to the application until that write has been copied over to the target site. That makes sure the target system is always up to date with the source system.
The disadvantage to that is that it can create latency in the application, because if the round trip from the source system to the target system takes a long time, you're going to be slowing down your primary application. That's what asynchronous replication is for; it goes ahead and tells the application that the write has been stored, and then it asynchronously copies that write over to the other site.
The challenge with that is that, depending on factors such as how busy your application is and how fast your connection is, you can be anywhere from seconds to hours behind the source machine. In a disaster, that lag is data you lose. So it is a tradeoff between how much data you are willing to lose and how much you want to slow down your primary application.
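That tradeoff can be sketched as two write paths: one that waits for the remote copy before acknowledging, and one that acknowledges immediately and queues the copy. Everything here (`remote_write`, `REMOTE_LATENCY`, the queue) is an invented stand-in for the network hop a real product performs:

```python
import time
from collections import deque

REMOTE_LATENCY = 0.05  # seconds; assumed round trip to the DR site

pending = deque()  # writes acknowledged locally but not yet shipped

def remote_write(block, data):
    time.sleep(REMOTE_LATENCY)  # simulate the round trip to the target

def sync_write(block, data):
    remote_write(block, data)  # wait until the target has the write...
    return "ack"               # ...before acknowledging the application

def async_write(block, data):
    pending.append((block, data))  # queue it; the app is not slowed down
    return "ack"                   # but the target now lags behind

def drain_pending():
    # Background shipment; anything still queued when disaster strikes
    # is the data you lose.
    while pending:
        remote_write(*pending.popleft())
```

With synchronous writes, every acknowledgement pays the full round trip; with asynchronous writes, the acknowledgement is immediate and the lag lives in the queue.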
Generally, people use synchronous only when they're going local or almost local, where something is only a couple of miles away. As soon as we start talking about a double digit number of miles, people generally switch to asynchronous replication because it's simply too much of a hit on their primary application.
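A back-of-the-envelope calculation shows why distance drives that switch. Light in fiber travels at roughly two-thirds the speed of light in a vacuum, which works out to about 124 miles per millisecond, and every synchronous write pays at least the full round trip. The constant below is an approximation, and this is a best case with no switching or protocol overhead:

```python
FIBER_MILES_PER_MS = 124  # ~ (186,000 mi/s * 2/3) / 1000; an approximation

def min_round_trip_ms(miles):
    # Best case: straight fiber, no switching or protocol overhead.
    return 2 * miles / FIBER_MILES_PER_MS

print(min_round_trip_ms(5))    # a couple of miles away: well under 0.1 ms
print(min_round_trip_ms(500))  # out of the hurricane zone: ~8 ms per write
```

A fraction of a millisecond per write is tolerable; eight milliseconds added to every single write usually is not.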
For true DR, you've got to be outside of the blast radius. So if this is Florida and you're liable to get hit anywhere in Florida with a hurricane, then your DR site needs to be outside of Florida. If this is California and you want to be outside of the wildfire range, then you've got to be outside of California.
As soon as you start doing that, then you have to go to asynchronous replication. The challenge again is: how far behind do you want it to get? There are a lot of technologies to help make long-distance replication faster, mainly by minimizing things like the number of round trips. You don't want your asynchronous replication to behave exactly as if it were local, because when you believe the storage is right next to you, you do things like ask eight questions one at a time when you could ask all eight questions in one transmission.
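The eight-questions point is just arithmetic. Assuming a 40 ms round trip to the remote site (an arbitrary figure for illustration), a chatty protocol pays one round trip per question while a batched one pays a single round trip for all of them:

```python
RTT_MS = 40  # assumed round trip to the remote site, purely illustrative

def chatty_cost_ms(questions):
    # One round trip per question: fine locally, painful over distance.
    return len(questions) * RTT_MS

def batched_cost_ms(questions):
    # All questions in one transmission, all answers in one reply.
    return RTT_MS

questions = ["q%d" % i for i in range(8)]
print(chatty_cost_ms(questions))   # 320 ms
print(batched_cost_ms(questions))  # 40 ms
```

The work sent is the same; only the number of times you wait for the wire changes, which is exactly what WAN-optimization technologies exploit.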
So basically I would say that it is absolutely an alternative. What is the other alternative? Writing to tape and FedExing it? That certainly is not going to be as good as asynchronous replication. I would add that CDP (continuous data protection) is an alternative as well; it's basically replication with a back button, so it has the ability to go back in time.
At a minimum, this stuff needs to be monitored. Probably the biggest problem most people have is that they set up that initial replication and then they don't do regular monitoring. I would say that it starts with having a consistent replication setup.
There are lots of different ways to replicate, whether it's storage-array replication, hardware-in-the-SAN replication or software-on-the-host replication. If at all possible, pick one, or at least stay with a common brand. One of the things to do when you select that brand is check out what it offers in terms of monitoring and reporting: letting you know when you're behind, when a link has been broken, or when the target system that you're replicating to isn't there anymore because somebody turned it off.
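As a sketch of the kind of monitoring worth looking for, the check below alerts on the three conditions just mentioned: falling behind, a broken link, and a missing target. `check_replication` and its status fields are invented; a real product would expose this information through its own reporting interface:

```python
MAX_LAG_SECONDS = 300  # assumed alert threshold: 5 minutes behind

def check_replication(status):
    """status: dict with 'link_up', 'target_reachable', 'lag_seconds'."""
    alerts = []
    if not status["link_up"]:
        alerts.append("replication link is down")
    if not status["target_reachable"]:
        alerts.append("target system is unreachable (powered off?)")
    elif status["lag_seconds"] > MAX_LAG_SECONDS:
        alerts.append("target is %d s behind the source" % status["lag_seconds"])
    return alerts  # empty list means replication looks healthy

print(check_replication(
    {"link_up": True, "target_reachable": True, "lag_seconds": 900}))
```

Running a check like this on a schedule is the "regular monitoring" piece that most shops skip after the initial setup.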
Also, to make replication affordable, a lot of people are making the target system an older system that has been moved out of the data center, or a less expensive storage system, using Fibre Channel in the primary site and SATA in the secondary site. You must understand that there will be a significant difference in performance when you change the storage, because that's where a lot of the performance comes from.
W. Curtis Preston is the Vice President of Data Protection Services at GlassHouse Technologies and is the author of "Using SANs and NAS" and "Unix Backup and Recovery," the seminal O'Reilly book on backup.