Data replication is the process of copying data from one location to another over a storage area network (SAN), LAN or WAN so you have multiple up-to-date copies of the data.
Replication is a key technology for disaster recovery, and often works in combination with data deduplication, virtual servers, or the cloud to carry out its DR role.
In this tutorial on data replication and disaster recovery, learn how to choose the best replication product, the differences between host-, array-, and network-based data replication, and how new technologies like data deduplication and virtual servers are changing replication and disaster recovery.
DATA REPLICATION AND IT DISASTER RECOVERY TECHNOLOGY TUTORIAL TABLE OF CONTENTS
Choosing a replication product for disaster recovery
Array-, host-, and network-based replication
Data deduplication and data replication
Virtual servers and data replication
Cloud storage and data replication
Several factors determine what type of data replication product is best for an organization. There are two types of data replication products: synchronous and asynchronous. Synchronous replication writes data to the primary and secondary sites at the same time. With asynchronous replication, there is a delay before the data gets written to the secondary site.
Both types have advantages and disadvantages. While data is always current between sites with synchronous replication, it is more expensive than asynchronous replication, introduces latency that slows down the primary application, and works only over limited distances, with practical limits typically in the range of 50 km to 300 km. Synchronous replication is preferred for applications that have low recovery time objectives (RTOs) and can't abide data loss.
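To see why distance limits synchronous replication, it helps to estimate the delay each write picks up while waiting for the remote site to acknowledge it. The sketch below is a rough back-of-the-envelope calculation, not a vendor formula. It assumes light travels through optical fiber at about 200,000 km per second and ignores switch, protocol and array overhead, so real numbers will be higher. The function name `round_trip_ms` is made up for illustration.

```python
# Rough estimate of the propagation delay synchronous replication adds per write.
# Assumption: signals travel through optical fiber at roughly 200,000 km/s,
# which is about 5 microseconds per kilometer in each direction.

FIBER_KM_PER_SECOND = 200_000  # approximate propagation speed in fiber


def round_trip_ms(distance_km: float) -> float:
    """Delay for a write to reach the remote site and be acknowledged, in ms."""
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * 1000


for km in (50, 300, 1000):
    print(f"{km:>5} km: about {round_trip_ms(km):.1f} ms added per write")
```

Under these assumptions, 50 km adds about 0.5 ms to every write and 300 km adds about 3 ms. Because an application issuing writes one after another pays that delay each time, the cost grows quickly with distance.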
Because asynchronous replication is designed to work over distances and requires less bandwidth, it is often a better option for disaster recovery. However, asynchronous replication risks a loss of data during a system outage because data at the target device isn't up to date with the source data.
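The difference between the two modes can be sketched with a toy model. This is an illustration only, not how any particular product works: the `ReplicatedVolume` class and its methods are hypothetical, and a real product would ship queued writes over a network link rather than copy them in memory.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume with one secondary copy (hypothetical)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied at the secondary site

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.synchronous:
            # Synchronous: the write completes only once the secondary has it.
            self.secondary[block] = data
        else:
            # Asynchronous: acknowledge now, ship to the secondary later.
            self.pending.append((block, data))

    def drain(self) -> None:
        """Apply all queued writes at the secondary site."""
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data

    def blocks_at_risk(self) -> int:
        """Blocks whose latest data would be lost if the primary failed now."""
        return sum(1 for b, d in self.primary.items() if self.secondary.get(b) != d)


sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for vol in (sync_vol, async_vol):
    vol.write(0, b"orders")
    vol.write(1, b"payments")

print(sync_vol.blocks_at_risk())   # 0: the secondary is always current
print(async_vol.blocks_at_risk())  # 2: both writes are still queued
async_vol.drain()
print(async_vol.blocks_at_risk())  # 0: the secondary has caught up
```

The queue in the asynchronous case is the window of potential data loss: anything still pending when the primary site fails never reaches the secondary.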
Editor's Tip: For more information about choosing a data replication product, check out our tip on choosing a data replication solution for disaster recovery.
Another differentiator is where the replication takes place. Replication occurs in one of three places: in the storage array, at the host (server), or in the network. Most replication still occurs at the array, although that is changing as host- and network-based options improve.
Storage array-based replication: Most enterprise data storage vendors include replication software on their high-end and midrange storage arrays. Examples include Dell EqualLogic's built-in replication; Symmetrix Remote Data Facility (SRDF) for EMC Corp. high-end arrays and EMC MirrorView for EMC's midrange Clariion arrays; Hewlett-Packard (HP) Co. StorageWorks XP Continuous Access for HP XP arrays and Continuous Access EVA for HP's midrange arrays; Hitachi Data Systems (HDS) Universal Replicator for asynchronous replication and TrueCopy for synchronous replication; and IBM Corp. Global Mirror for asynchronous replication and IBM Metro Mirror for synchronous replication.
The main drawback of array-based replication is lack of heterogeneous support. Except for HDS, which supports other vendors' storage with its USP V systems, vendors' software only allows replication between like arrays. EMC won't even allow replication between high-end Symmetrix and midrange Clariion systems.
Host-based replication: Host-based replication software runs on standard servers, making it the cheapest type of replication and the easiest to manage. However, it does tax the server CPU. Host replication products include CA XOsoft, Double-Take Software, InMage Systems Scout, Neverfail and SteelEye Data Replication. Data backup software applications often include host-based options, such as BakBone's Real-Time Data Protector, CommVault Continuous Data Replication (CDR), EMC RepliStor and Symantec Corp. Backup Exec Continuous Protection Server (CPS) for NetBackup.
Network-based replication: Replication on the network requires an additional device, either an intelligent switch or an inline appliance such as IBM SAN Volume Controller (SVC) or LSI Corp.'s StoreAge Storage Virtualization Manager (SVM). Replication software that runs on intelligent switches includes EMC RecoverPoint and FalconStor Continuous Data Protector (CDP).
Disk array-based replication usually replicates only between the same type of array, while network-based and host-based replication work across storage platforms. Another difference is that host replication only supports asynchronous replication, while array and network replication can be asynchronous or synchronous.
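The tradeoffs described in this section can be written down as a small lookup table. The sketch below is a simplification for illustration: the `ReplicationType` class and `candidates` function are hypothetical names, and the table treats array-based replication as never working across platforms even though exceptions exist, such as the HDS systems mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationType:
    name: str
    cross_platform: bool  # replicates between unlike storage systems
    synchronous: bool     # offers synchronous as well as asynchronous mode


# Simplified summary of the section; real products vary.
TYPES = (
    ReplicationType("array", cross_platform=False, synchronous=True),
    ReplicationType("host", cross_platform=True, synchronous=False),
    ReplicationType("network", cross_platform=True, synchronous=True),
)


def candidates(need_cross_platform: bool = False, need_synchronous: bool = False):
    """Return the replication types that satisfy the stated requirements."""
    return [
        t.name
        for t in TYPES
        if (t.cross_platform or not need_cross_platform)
        and (t.synchronous or not need_synchronous)
    ]


print(candidates(need_cross_platform=True))                         # ['host', 'network']
print(candidates(need_synchronous=True))                            # ['array', 'network']
print(candidates(need_cross_platform=True, need_synchronous=True))  # ['network']
```

Cost, server CPU load and the need for extra devices are left out of this table; they would be additional columns in a fuller comparison.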
Editor's Tip: For more on array-based replication, read our article on The pros and cons of disk array-based replication.
Data deduplication -- the hottest technology in data backup today -- is often combined with replication for disaster recovery purposes. Deduplication reduces the amount of data that gets replicated and lowers the bandwidth requirement to copy data offsite.
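A minimal sketch shows how deduplication cuts replication traffic. It assumes fixed-size 4 KB chunks identified by SHA-256 fingerprints, which is one common approach and not a description of any specific product. The `replicate` function and the `remote_store` dictionary are hypothetical stand-ins for the transfer logic and the offsite chunk store.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; products differ


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def replicate(data: bytes, remote_store: dict) -> int:
    """Send only chunks the remote site lacks; return payload bytes sent."""
    sent = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in remote_store:
            remote_store[digest] = chunk  # stands in for a WAN transfer
            sent += len(chunk)
    return sent


remote = {}
monday = b"A" * 8192 + b"B" * 4096  # 12,288 bytes, two identical "A" chunks
tuesday = monday + b"C" * 4096      # 16,384 bytes, only the tail is new

print(replicate(monday, remote))   # 8192: the duplicate "A" chunk is sent once
print(replicate(tuesday, remote))  # 4096: only the new "C" chunk crosses the WAN
```

The byte counts cover chunk payloads only. A real system also exchanges the fingerprints themselves, which adds a small overhead per chunk.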
There are some drawbacks to the dedupe/replication combination. Inline deduplication, which takes place while data is being written to disk, can impact backup performance. Post-process dedupe, which takes place after the backup completes, can delay replication. Still, data that is deduplicated and replicated offsite can be recovered much faster than data backed up to tapes and stored offsite for disaster recovery.
WAN optimization devices such as those from Cisco Systems, Blue Coat Systems, Riverbed Technology and Silver Peak Systems also use data deduplication to complement replication over the WAN for DR.
Editor's Tip: For more on data deduplication, download our free data deduplication chapter.
Server virtualization is a driver for disaster recovery because virtualization reduces the number of servers required for a disaster recovery site. Virtual servers are stored as files or virtual machine (VM) images on the host, and can be moved by copying the VM image file and booting it on another host, whereas physical servers require the same hardware at the DR site.
Tools for replicating virtual machines include PHD Virtual esXpress, Vizioncore vReplicator or VMware Site Recovery Manager -- if your array supports it -- or tools built into applications such as Oracle that replicate data between servers.
Editor's Tip: For more on server virtualization, read our article on server virtualization strategies for disaster recovery.
The cloud also fits with replication, because it can remove cost and complexity from disaster recovery. It alleviates the need to acquire and manage an off-site location.
Host-based replication is generally the best fit for disaster recovery through the cloud, because storage array- and network-based replication require devices at the source and target locations. Host-based replication lets you move data from standard servers in your environment to the provider's servers off-site.
Editor's Tip: For more on cloud storage and disaster recovery, read our article on how one company turned to a cloud disaster recovery service.
This was first published in September 2009