Published: 01 Feb 2011
Once an expensive option, replication is now available in many forms and is more affordable and effective than ever.
The success -- and adoption -- of replication technology can be largely attributed to advances in local-area, wide-area and storage-area networking, as well as server virtualization and cloud computing. From replicating virtual machine (VM) images for data protection and high availability to the exchange of information with cloud services, replication has proved to be the most suitable and agile data transfer and protection method in increasingly virtualized IT environments.
But it's not just coincidental progress that has elevated replication's role in data management. It's as much, and perhaps more, due to changes in business requirements.
For example, downtime tolerance has been shrinking. A 2010 data protection research survey conducted by Milford, Mass.-based Enterprise Strategy Group (ESG) revealed that 18% of respondents can't accept downtime for tier 1 applications, up from 8% only three years ago. Disaster recovery (DR), a prime user of replication, is becoming a basic requirement, and the same ESG survey identified it as the No. 1 area of investment. Compliance obligations of public companies and certain industries drive the use of replication to ensure accessibility and recoverability of critical data. Globalization, and the resulting need for firms to operate globally with satellite offices dispersed throughout the world, is also spurring replication-based data protection.
Another indicator of the increasing relevance of replication is the diversity and multiplicity in which it has found its way into products: as an array feature, backup application option, business application capability, network appliance add-on and even as a virtualization product option. Able to run in arrays; in the network; on hosts or within applications; asynchronously or synchronously; and at a block, file or sub-file level, replication products and choices are plentiful, maybe even overwhelming at first. But replication offerings can be grouped into categories that offer varying benefits and value propositions for different use cases and environments.
Asynchronous and synchronous replication
Asynchronous replication is the most broadly available replication mode, supported by array-, network- and host-based replication products. Data is committed to the source array first, then buffered or journaled for subsequent replication, so it arrives at the target with a delay that ranges from nearly instantaneous to minutes or even hours. Its tolerance of network latency and limited bandwidth makes it a fit for long-distance replication.
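The commit-locally-then-journal pattern described above can be sketched in a few lines. This is a simplified user-space illustration, not any vendor's implementation; the class and field names are invented for the example.

```python
import queue
import threading

class AsyncReplicator:
    """Minimal sketch of asynchronous replication: writes are committed
    to the source immediately and journaled for later delivery, so the
    target may briefly lag behind the source."""

    def __init__(self):
        self.source = {}              # primary array (block -> data)
        self.target = {}              # replica array
        self.journal = queue.Queue()  # buffered writes awaiting replication
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, block, data):
        # 1. Commit to the source array first -- the host sees low latency.
        self.source[block] = data
        # 2. Journal the write; it reaches the target with some delay.
        self.journal.put((block, data))

    def _drain(self):
        # Background process applies journaled writes to the target in
        # arrival order, preserving the write sequence.
        while True:
            block, data = self.journal.get()
            self.target[block] = data
            self.journal.task_done()

rep = AsyncReplicator()
rep.write(0, b"hello")
rep.write(1, b"world")
rep.journal.join()                # wait until the replica catches up
assert rep.target == rep.source
```

The journal is what lets the host proceed without waiting on the remote site; it is also the piece that must survive a network outage, which is where real products differ most.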
Not all asynchronous replication implementations are equal, though. Key areas of differentiation are how a product deals with network outages, and whether it supports transaction recovery or simply creates a crash-consistent replica that depends on the target OS and application to resolve inconsistencies. For instance, both IBM Global Mirror for the IBM System Storage DS8000 and the Hitachi Data Systems Universal Replicator have provisions to maintain the sequence of writes. "Hitachi Universal Replicator guarantees transaction recovery by sequencing replicated data within consistency groups," explained Sarah Hamilton, Hitachi Data Systems' senior product marketing manager, data resilience and security.
Rarely supported in host-based replication products, synchronous replication is the hallmark of high-end block-based storage arrays and is also supported by most network-based replication products, including the Hewlett-Packard (HP) Co. StorageWorks SAN Virtualization Services Platform (SVSP), IBM SAN Volume Controller (SVC) and LSI Corp. StoreAge Storage Virtualization Manager (SVM). Because a write is acknowledged to the host only after it has been committed successfully to both the replication source and the replication target, synchronous replication guarantees that source and target stay in lockstep. A reliable network and low latency are prerequisites, and supported distances can't exceed 50km to 300km, depending on the array vendor. Its primary use is for high-end transactional applications that require instantaneous failover if the primary node fails. It's less relevant in network-attached storage (NAS) unless the NAS can also serve as block-based storage for high-end transactional applications. A pure NAS play, such as a BlueArc Corp. Titan or Mercury system, usually lacks synchronous replication support. "A NAS doesn't require synchronous replication," asserted Ravi Chalaka, BlueArc's senior director of solutions marketing. Conversely, NetApp filers, with their support for NAS and block-based protocols, especially Fibre Channel (FC), support synchronous replication, allowing NetApp arrays to compete with very high-end block-based storage systems from EMC Corp., Hitachi Data Systems and IBM.
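The acknowledgment rule that distinguishes synchronous replication can be sketched the same way. Again, this is an illustrative model with invented names, not a real product's code path; the latency figure is an assumption for the example.

```python
import time

class SyncReplicator:
    """Sketch of synchronous replication: a write is acknowledged to the
    host only after BOTH source and target have committed it, so every
    acknowledged write is guaranteed to exist on the replica. The
    per-write round trip to the remote site is why distance is limited."""

    def __init__(self, one_way_latency_s=0.0005):  # roughly 100 km of fiber
        self.source = {}
        self.target = {}
        self.latency = one_way_latency_s

    def write(self, block, data):
        self.source[block] = data       # commit locally
        time.sleep(2 * self.latency)    # round trip to the remote site
        self.target[block] = data       # commit remotely
        return True                     # only now acknowledge the host

rep = SyncReplicator()
rep.write(7, b"txn-record")
assert rep.target[7] == rep.source[7]   # the replica is never behind
```

Contrast this with the asynchronous case: here the host pays the network round trip on every write, which is exactly the cost high-end transactional applications accept in exchange for a zero-data-loss failover target.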
Replication and the network
Available bandwidth and latency between the replication source and target systems greatly impact replication performance. For short distances under 50km, latency is negligible and available bandwidth determines replication performance: the bigger the pipe, the faster a given amount of data can be replicated. Beyond buying larger connections, compression and data deduplication are critical techniques for getting the most out of a given amount of bandwidth. Deduplication and compression help shorten backup and restore times, and can result in significant connection and storage savings.
For long-distance replication, latency is the determining factor: the longer the distance, the longer each exchange between replication source and target takes. While latency is less of an issue for asynchronous replication, it effectively caps synchronous replication at a distance of about 50km in most deployments. Implementations make a difference (for instance, IBM Metro Mirror supports up to 300km between source and target), and the impact of latency can be reduced and supported distances extended through the use of WAN accelerators, which perform TCP optimizations and implement techniques to minimize roundtrips. WAN accelerators are available from Cisco Systems Inc., F5 Networks Inc., Juniper Networks Inc., Riverbed Technology Inc. and Silver Peak Systems Inc.
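The latency arithmetic behind those distance limits is easy to work out. The sketch below uses the common approximation that light travels about 200,000 km/s in optical fiber; the figures are back-of-the-envelope bounds, not measurements of any product.

```python
# Back-of-the-envelope latency math for synchronous replication.
# Light travels roughly 200,000 km/s in fiber, so each km adds about
# 5 microseconds one way; a synchronous write pays a full round trip.

SPEED_IN_FIBER_KM_S = 200_000.0

def sync_rtt_ms(distance_km):
    """Minimum round-trip time, in milliseconds, over the given distance."""
    one_way_s = distance_km / SPEED_IN_FIBER_KM_S
    return 2 * one_way_s * 1000.0

def max_serialized_writes_per_s(distance_km):
    """Upper bound for a single stream of dependent writes: each must
    wait for the previous one's remote acknowledgment."""
    return 1000.0 / sync_rtt_ms(distance_km)

for km in (50, 300, 1000):
    print(f"{km:>5} km: RTT >= {sync_rtt_ms(km):.2f} ms, "
          f"<= {max_serialized_writes_per_s(km):,.0f} dependent writes/s")
```

At 50km the fiber alone adds at least 0.5 ms per write; at 300km it's 3 ms, which is why WAN accelerators focus on reducing the number of roundtrips rather than the speed of any single one.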
Array-based replication
Early on, a mechanism to replicate data from one array to another emerged as a necessity, and array vendors quickly added replication to their storage systems -- to high-end arrays first, where it's standard now, and then to midrange and lower-end arrays. Dell Inc. is a perfect illustration of the trend of replication filtering down into the low end of the data storage market. Today, all of Dell's storage systems, with the exception of the lower-end PowerVault arrays, support replication, from Dell/EMC SAN Storage and Dell EqualLogic arrays to the Dell DX Object Storage Platform.
Having replication be a part of the array has many merits. For storage managers, it's simply another array feature: managed similarly to other array functions and options, it takes little effort to leverage. Because it's an array function, deploying it requires very little cross-departmental coordination; the storage group makes it happen and supports it once deployed. Provided by the same supplier, array-based replication is supported by a single vendor, eliminating a great deal of finger-pointing when problems occur. Furthermore, array-based replication is less likely to be disrupted by extraneous activities such as patching and other changes, which are more likely to plague host-based replication products, giving it a higher degree of resilience. "Application failures won't impact array-based replication because the storage system isn't impacted," said Mark Welke, director of data protection solutions at NetApp.
Array-based replication's greatest shortcoming is its requirement for similar source and target arrays, limiting its use to homogeneous storage environments. Most storage vendors don't even support replication between their own array families. Among major storage vendors, NetApp is the lone exception, supporting array-based replication between any of its arrays. Another noteworthy vendor is Hitachi Data Systems, whose Virtual Storage Platform (VSP) and Universal Storage Platform (USP) are able to reach out to other arrays via storage virtualization. And with very few exceptions, such as Dell EqualLogic arrays, replication is an extra-cost option that's charged for by device or replicated capacity.
Block-based FC and iSCSI arrays replicate block changes on volumes and LUNs. Since only changed blocks of a few hundred bytes need to be replicated, it's very fast and efficient. Executed beneath the file system, block-based replication is operating system agnostic and supports replication between any platforms attached to the array. Block-based replication has the potential to take advantage of advanced array features such as deduplication, compression and encryption, and some vendors have enhanced their replication offerings accordingly. For instance, NetApp, with the 8.0.1 release of Data Ontap, added the ability to only replicate data changes in FlexClone volumes between parent and clone images. A FlexClone volume is a thin-provisioned clone, requiring very little actual disk space; but until this latest release, the complete volume had to be replicated instead of the disk-efficient FlexClone.
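The changed-block mechanism described above amounts to tracking which blocks a write touches and shipping only those each cycle. Here's a minimal sketch of that bookkeeping; the class name, the 4KB block size and the offsets are illustrative assumptions, not any array's actual format.

```python
class ChangedBlockTracker:
    """Sketch of block-level change tracking: instead of recopying a
    whole volume or LUN, only the blocks dirtied since the last
    replication cycle are sent to the target."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.dirty = set()   # block numbers modified since the last cycle

    def on_write(self, offset, length):
        # Mark every block the write overlaps as dirty.
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def take_changeset(self):
        # Blocks to ship this cycle; tracking restarts for the next one.
        changes, self.dirty = sorted(self.dirty), set()
        return changes

cbt = ChangedBlockTracker()
cbt.on_write(offset=0, length=100)       # touches block 0
cbt.on_write(offset=8192, length=5000)   # touches blocks 2 and 3
print(cbt.take_changeset())              # -> [0, 2, 3]
```

Because this runs beneath the file system, the tracker neither knows nor cares which OS or application produced the writes, which is exactly why block-based replication is platform agnostic.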
NAS systems usually replicate at the file-system level, which provides file system metadata awareness that can be leveraged during the replication process and enables replication based on criteria such as file size and file type. But it's slower and usually less efficient than block-based replication. The performance impact increases with the number of files and folders in a replication set that need to be parsed; the larger the tree, the longer it takes to parse it. For that reason, BlueArc introduced the object-based JetMirror technology, replacing time-consuming sequential file parsing with an object-based metadata store. "Backups with JetMirror are 2.8 times faster than with NDMP [Network Data Management Protocol] and replication times for very large file stores can be reduced by an order of magnitude," BlueArc's Chalaka claimed.
[Diagram: Key criteria for selecting a replication product]
Network-based replication
Network-based replication usually comes into play in heterogeneous storage environments. It'll work with anyone's array and supports any host platform. Situated in the network, between hosts and arrays, the splitting of I/Os is performed in either an inline appliance or in a Fibre Channel fabric. The I/O splitter looks at the destination address of an incoming write I/O and, if it's part of a replication volume, forwards a copy of the I/O to the replication target. In many ways, network-based replication combines the merits of array- and host-based replication. Having arrived on the market only several years ago, it has the smallest market share, trailing both array-based and host-based replication in revenue and unit numbers, but it's growing at a quicker rate than array-based replication, according to IDC.
Compared to the multitude of array- and host-based replication offerings, there are fewer network-based replication products on the market, and they can be broken into two groups: inline appliances and fabric-based replication products.
Inline appliances, such as the IBM SVC, don't depend on intelligent switches from Brocade Communications Systems Inc. or Cisco Systems Inc. for splitting I/Os; instead, I/Os are terminated and forwarded in the appliance to storage targets. Unlike the wire-speed splitting of fabric-based products, the overhead of terminating and initiating new I/Os causes a small delay. While fabric-based products are based on a split-path architecture where data that isn't part of a replication or virtualized volume is simply passed through, in an inline appliance all traffic has to traverse the replication appliance. As a result, they're more likely to hit a scalability threshold than their fabric-based counterparts. "A variety of hardware options, including cache and number and speed of processors, have enabled IBM to address scalability and performance concerns for the most part," said Greg Schulz, founder and senior analyst at Stillwater, Minn.-based StorageIO Group.
While fabric-based replication products may be technologically superior, with better performance and scalability, they're significantly more complex and require intelligent switches. To use them in environments that don't have intelligent switches, fabric-based replication products usually provide host agents that perform the splitting of I/Os on hosts instead of in the fabric. EMC RecoverPoint, with its continuous data protection (CDP) and remote replication capabilities, is the most prominent fabric-based replication product.
HP StorageWorks SVSP and LSI StoreAge SVM -- the former being an OEM product of the latter -- combine the simplicity of an inline appliance with the performance and scalability of a fabric-based product. The products use a split-path approach where management is handled in-band; however, data movement and normal data flow occur out of band, leading to improved scaling and performance.
Other network-based replication players are FalconStor Software Inc. with its MicroScan and Delta Resync replication features, and InMage Systems Inc.
Host-based replication
In host-based replication products, replication is performed by so-called filter drivers on servers that intercept write I/Os and forward file or block changes to replication targets. Host-based replication has the lowest entry cost and complexity, but both increase as the number of replication nodes expands. While it's easy to manage for a small number of servers, managing a large number of nodes, from initial installation and rollout to ongoing support and monitoring, can be a daunting task.
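The filter-driver idea, intercepting a write on its way down and forwarding a copy, can be illustrated in user space. A real filter driver runs in the kernel's I/O stack; the wrapper class and callback below are an invented analogy, not any product's driver.

```python
import io

class ReplicatingFile:
    """Rough user-space analogy of a host-based filter driver: it sits
    in the write path, lets the local write proceed as usual, and
    forwards a copy of every change to a replication target."""

    def __init__(self, local_file, forward):
        self._f = local_file
        self._forward = forward      # callable that ships changes remotely

    def write(self, data):
        n = self._f.write(data)      # the local write proceeds normally...
        self._forward(data)          # ...and the change is forwarded
        return n

# Stand-in "remote target": a buffer collecting forwarded changes.
replica = bytearray()
f = ReplicatingFile(io.BytesIO(), replica.extend)
f.write(b"journal entry\n")
assert bytes(replica) == b"journal entry\n"
```

Sitting in the write path on the host is what makes this approach storage agnostic, and also why, as noted below, it shares the host's fate when the server itself misbehaves.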
Compared to array- and network-based replication, host-based replication is less isolated and its execution environment less controlled and, as a result, it can be adversely impacted by other applications and server events. Virus infections, resource shortages and application crashes can't bring down array- or network-based replication, but they can definitely stop host-based replication. Standalone host-based replication products are available from the likes of CA with ARCserve, Double-Take Software Inc. (now part of Vision Solutions Inc.), Neverfail Ltd., Quest Software Inc. and SIOS Technology Corp. (formerly SteelEye Technologies Inc.). The products differ in platform support, with Windows supported by all, and features, such as throttling, compression, deduplication, encryption, high-availability (HA) capabilities and management options.
Host-based replication is the nimblest of the three approaches. It's software only; works with any type of storage, including DAS and cloud storage; and is capable of supporting a wide range of platforms, depending on the replication product and vendor. Even though it competes in scenarios where array- and network-based replication may be used, the fact that it's pure software allows it to extend into areas where the other two simply can't compete. First, it's the ideal replication method for cloud storage. Cloud services like the Amazon Elastic Compute Cloud (EC2) simply can't accommodate hardware-based replication, but they allow running replication software to exchange data between clouds, and between the cloud and on-premises servers.
Second, host-based replication software can be incorporated into other applications. Prime examples are backup applications that have added replication to provide and manage both replication-based and traditional data protection. Replicas usually aren't standalone images, but part of a larger data protection process. As replication-based data protection moves mainstream, being able to manage and monitor replicas along with traditional backups becomes more relevant. As a result, CA, EMC, IBM, Symantec Corp. and others have been adding replication-based data protection to their backup suites.
The evolution of replication
Replication has been used for high availability and data protection of critical data and applications for a long time, and it has been slowly eating away at tape as the medium of choice for data protection. As a consequence of shrinking recovery time objectives (RTOs) and the increased need for 24/7 availability of applications and data, this trend is likely to continue, if not accelerate. On the technology side, cloud computing, the virtualization of IT infrastructure, and a flurry of replication options and offerings are aiding this trend.
BIO: Jacob Gsoedl is a freelance writer and a corporate director for business systems. He can be reached at email@example.com.