By Megan Kellett and Andrew Burton
Remote replication is the process of copying data across a wide-area network (WAN) to a disk array at a secondary location. It is suited for organizations that need an up-to-date copy of their data at a remote site to recover from a disaster at the primary data center.
In this tutorial on replication and disaster recovery, learn about the different types of replication options, how WAN optimization can work with remote replication, and trends in replication and disaster recovery today.
REMOTE REPLICATION AND DISASTER RECOVERY TUTORIAL: TABLE OF CONTENTS
>> Array-based vs. host-based data replication tools
>> Network-based replication
>> Synchronous replication vs. asynchronous replication
>> WAN optimization
>> More people are replicating their data, but not correctly
ARRAY-BASED VS. HOST-BASED DATA REPLICATION TOOLS
In array-based replication, the replication software runs on the storage array's controller. It is often used by medium- and large-sized firms, because these companies have deployed storage arrays that offer data replication as a built-in feature or an option. And most people look for that "checkbox feature," said Jon Toigo, CEO of Toigo Partners International.
Steven Ross, executive principal of Risk Masters Inc., thinks array-based replication is the way to go. "It's much better if you're SAN-based [for disaster recovery]," said Ross. "Because you say, we've got all the data we need for all our applications in one format."
However, there are several downsides to array-based replication. According to Toigo, it can be difficult to test and can be expensive due to the investment in hardware. Array-based replication products typically require homogeneous storage systems at the primary and disaster recovery sites. "[Host-based replication], which is accomplished by running replication software on a standard server, is much less costly than buying proprietary hardware and having to commit yourself to the same vendor's gear at all different locations," said Toigo. Also, host-based replication is typically performed in real time, and restores can be faster.
However, while host-based replication can work in heterogeneous storage environments, not every product is supported by every hardware vendor, so it is important to check for compatibility. Host-based replication can also impact the performance of other applications running on the same server.
>> Editor's Tip: Learn about the pros and cons of array-based replication in this tip.
NETWORK-BASED REPLICATION
Network-based replication occurs in the network between storage arrays and servers and can be implemented using a dedicated appliance or a director-class switch. It combines some of the benefits of array-based and host-based replication. Because the replication is offloaded from servers and arrays, it can work across different servers and storage arrays, which is ideal for heterogeneous environments.
However, network-based replication has yet to see much adoption. If this class of products begins to gain traction, "it will be largely driven by the need by companies to rid themselves of a vendor lock-in for data replication," said Ashish Nadkarni, analyst with Glasshouse Technologies, in an email to SearchDisasterRecovery.com. "So far that FUD has not really stuck with customers. Some factors driving the lack of adoption of these technologies are cost, increase in complexity of the environment, and maintenance overhead."
>> Editor's Tip: Learn about the pros and cons of network-based replication in this tip.
SYNCHRONOUS REPLICATION VS. ASYNCHRONOUS REPLICATION
Besides deciding where replication will be implemented, you may also need to determine whether synchronous or asynchronous replication will work best. With synchronous replication, data is written to the primary and secondary storage systems at the same time -- and the write is not considered complete until it's acknowledged by both the local and remote storage systems. Synchronous replication requires substantial bandwidth, which makes it a more expensive proposition. It is also limited to shorter distances, because latency grows with distance. This can limit your ability to place your secondary site far enough away from your primary site, making it less suitable for disaster recovery. Instead, synchronous replication is typically used for high-availability clustering and mission-critical applications.
Asynchronous replication, on the other hand, copies data at scheduled intervals to the disaster recovery site. It's designed to work over long distances and greatly reduces bandwidth requirements. However, because the write is considered complete as soon as the primary system acknowledges it, it is not guaranteed that the secondary system will have the most recent copy of the data if the primary storage fails.
Most array- and network-based replication products support both synchronous and asynchronous replication. However, host-based replication offerings typically only offer asynchronous replication.
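The difference between the two acknowledgment models can be sketched in a few lines. This is a minimal toy model, not a real replication product: the class and method names (`RemoteArray`, `write_sync`, `write_async`) are invented for illustration, and a 10 ms sleep stands in for WAN round-trip latency.

```python
import queue
import threading
import time

class RemoteArray:
    """Stands in for the storage system at the disaster recovery site."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_id, data):
        time.sleep(0.01)  # simulated WAN round-trip latency
        self.blocks[block_id] = data
        return True       # acknowledgment from the remote site

class Replicator:
    """Toy replicator contrasting the two write-acknowledgment models."""
    def __init__(self, remote):
        self.local = {}
        self.remote = remote
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write_sync(self, block_id, data):
        # Synchronous: the write completes only after BOTH arrays
        # acknowledge, so every committed write exists at the DR site.
        self.local[block_id] = data
        return self.remote.write(block_id, data)

    def write_async(self, block_id, data):
        # Asynchronous: acknowledge as soon as the primary has the data;
        # the remote copy catches up later and may lag if the primary fails.
        self.local[block_id] = data
        self._pending.put((block_id, data))
        return True

    def _drain(self):
        while True:
            block_id, data = self._pending.get()
            self.remote.write(block_id, data)
            self._pending.task_done()

    def flush(self):
        self._pending.join()  # wait for the async queue to drain

remote = RemoteArray()
replicator = Replicator(remote)
replicator.write_sync("blk-1", b"payroll")  # remote copy guaranteed on return
replicator.write_async("blk-2", b"logs")    # returns before the remote write
replicator.flush()
```

Note how `write_sync` pays the simulated WAN latency on every call, while `write_async` returns immediately -- the same trade-off that makes synchronous replication distance-limited and asynchronous replication vulnerable to losing the most recent writes.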
>> Editor's Tip: Learn more in this case study from SearchStorage about synchronous replication.
WAN OPTIMIZATION
WAN optimization involves its own set of products with varying functionality, but many remote replication schemes today employ it. WAN optimization products offer a combination of deduplication, compression and caching to speed the transmission of data over the WAN. They also streamline "chatty" protocols and can prioritize data transmission based on how frequently the data is accessed.
While WAN optimization is critical for certain environments, it may not be necessary for every organization.
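The deduplication-plus-compression idea can be sketched with the standard library. This is a simplified model, not how any particular WAN optimization product works: real products use variable-size chunking, dictionaries negotiated per connection, and protocol acceleration, none of which is modeled here. The function name and fixed 4 KB chunk size are assumptions for illustration.

```python
import hashlib
import zlib

def optimize_for_wan(data, seen_hashes, chunk_size=4096):
    """Toy WAN optimizer: split the stream into fixed-size chunks,
    skip chunks both ends have already seen (deduplication), and
    compress the chunks that still need to cross the WAN.
    Returns (payload_bytes_sent, manifest)."""
    payload = 0
    manifest = []  # ordered chunk hashes so the far end can reassemble
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        if digest not in seen_hashes:
            seen_hashes[digest] = chunk
            payload += len(zlib.compress(chunk))  # only new chunks travel
    return payload, manifest

# Replicating the same dump twice: the second pass sends almost nothing
# because every chunk is already cached at the far end.
cache = {}
dump = b"customer-order-records " * 2000
first_pass, _ = optimize_for_wan(dump, cache)
second_pass, _ = optimize_for_wan(dump, cache)
```

The first pass shrinks the transfer through compression alone; the second pass sends only the chunk manifest, which is why WAN optimization pays off most when the same data is replicated repeatedly.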
>> Editor's Tip: Learn how WAN optimization can improve virtual disaster recovery in this tip.
MORE PEOPLE ARE REPLICATING THEIR DATA, BUT NOT CORRECTLY
The type of disaster recovery replication you choose ultimately depends on your individual needs and your budget; both array-based and host-based replication work well in certain environments. The bigger issue isn't who's using what type of data replication -- it's what data people are replicating.
As data storage environments grow, so does the amount of data being replicated. "What is a growing realization in many companies is: How much of their data is actual critical data? How much of that massive growth of data is actually because we have critical stuff or how much is just stuff we don't want to throw away?" asked Ross.
Toigo agreed, and noted that most people are replicating all of their data "because they're too lazy or don't have enough time to actually segregate which data goes to which application and which data is mission critical." Both Toigo and Ross said that if people continue to replicate all of their data, it's going to unnecessarily drive up their costs.
To keep costs down and data replication simple, users should distinguish their mission-critical data from the rest of their data. According to Toigo, the best way to figure this out is to do an upfront analysis of your business processes. Determine which business processes are the most critical to your company, and then find out which applications support those business processes. "Everything inherits its criticality from the business process it serves," said Toigo. "Disaster recovery is not intended to recover data, disaster recovery is intended to recover the business."
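Toigo's "everything inherits its criticality from the business process it serves" rule can be expressed as a simple mapping. The process names, tiers, and application names below are invented examples, not from the article; the point is only the direction of inheritance: process to application to replication plan.

```python
# Criticality flows from business process -> application -> data.
# All names and tiers here are illustrative placeholders.
business_processes = {
    "order-fulfillment": "mission-critical",
    "quarterly-reporting": "important",
    "office-file-shares": "archive-only",
}

apps_by_process = {
    "order-fulfillment": ["erp", "payments-db"],
    "quarterly-reporting": ["data-warehouse"],
    "office-file-shares": ["file-server"],
}

def replication_plan():
    """Each application inherits the criticality of the business process
    it serves; only mission-critical data makes the replication list."""
    tiers = {}
    for process, tier in business_processes.items():
        for app in apps_by_process[process]:
            tiers[app] = tier
    return {app for app, tier in tiers.items() if tier == "mission-critical"}
```

With this inventory, only the ERP system and payments database would be replicated to the DR site; the file server's data might instead go to cheaper backup, keeping replication bandwidth and storage costs down.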
Once the mission-critical data is identified, it's important to back it up and replicate it. Cem Kursunoglu, CEO of San Francisco-based consulting company BayNode, said his company "replicates all mission-critical customer data to off-site disaster recovery sites to ensure business continuity." Most of his customers are located in the San Francisco Bay Area, where strong earthquakes are expected, so "it is the logical path for everyone."
>> Editor's Tip: For more information on remote replication, download our free essential guide on disaster recovery and replication.
This was first published in October 2010