Replicating data using host-based replication

Replicating data using storage controller-based and network-based products may be the most popular options, but don't overlook host-based replication. Here's why.


Replication options
When you hear "replication," what technology first comes to mind? Your answer may be the Symmetrix Remote Data Facility, given EMC's prominence and the product's lengthy history. Other choices might be Hitachi Data Systems' TrueCopy, IBM's Peer-to-Peer Remote Copy and Network Appliance's SnapMirror. Further down the list, you might include host-based products like EMC/Legato's RepliStor or Veritas' Volume Replicator.

Within any discipline, there's always the risk of groupthink--doing something a certain way because that's the way it's always been done. But sometimes "the way we've always done it" may not be the best approach. When it comes to replication, many of us may have developed such an attitude.

Host-based replication is a mature technology, but those of us who design and recommend SANs often don't consider it an option for architectures that support quick recovery. Has this technology been supplanted by newer and better options? If not, when should host-based replication be considered?

The many tiers of replication
One of the challenges related to replication is that there are so many choices and approaches. Selecting a given approach can have a significant impact on the architecture of the solution, as well as its management and support (see Replication options, this page).

Let's review the non-host-based approaches:

Application--Replication at the application level requires the application to offer this capability. There are potential benefits to this approach, most notably the opportunity for truly transparent failover and recovery from a user perspective. Only a few applications provide replication services, however, and these are built primarily around some type of transaction monitor or messaging service.

Database--Major database vendors and several third parties offer database-level replication. Typically, the full database is replicated periodically and log entries are shipped to the remote standby database as they're created. If communication links are interrupted, transactions are buffered and sent when the links are restored. At the remote location, log entries are either applied immediately or after some scheduled delay (to avoid replicating a corruption of the database).
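
As a rough illustration of the log-shipping behavior described above, the following Python sketch models a primary that buffers log entries while the link is down and a standby that applies them only after a configurable delay. The class names (Primary, Standby), the apply_delay parameter and the in-memory "link" are all assumptions made for illustration; no vendor's product is claimed to work exactly this way.

```python
from collections import deque


class Standby:
    """Hypothetical standby database that applies shipped log entries after a delay."""

    def __init__(self, apply_delay=0):
        self.apply_delay = apply_delay  # time units to hold an entry before applying
        self.pending = deque()          # (arrival_time, entry) pairs not yet applied
        self.applied = []               # entries applied to the standby copy

    def receive(self, entry, now):
        self.pending.append((now, entry))

    def apply_due(self, now):
        # Apply, in arrival order, every entry that has aged past the delay.
        while self.pending and now - self.pending[0][0] >= self.apply_delay:
            self.applied.append(self.pending.popleft()[1])


class Primary:
    """Hypothetical primary that ships log entries and buffers them during outages."""

    def __init__(self, standby):
        self.standby = standby
        self.link_up = True
        self.buffer = deque()  # entries waiting for the link to return

    def commit(self, entry, now):
        self.buffer.append(entry)
        self.flush(now)

    def flush(self, now):
        # Ship buffered entries in order, but only while the link is available.
        while self.link_up and self.buffer:
            self.standby.receive(self.buffer.popleft(), now)


standby = Standby(apply_delay=5)
primary = Primary(standby)

primary.commit("txn-1", now=0)
primary.link_up = False          # link interrupted
primary.commit("txn-2", now=1)   # held in the primary's buffer
primary.commit("txn-3", now=2)
primary.link_up = True           # link restored
primary.flush(now=3)             # buffered entries are sent in order

standby.apply_due(now=5)         # only txn-1 is old enough to apply
applied_at_5 = list(standby.applied)
standby.apply_due(now=8)         # txn-2 and txn-3 arrived at t=3, so they are due at t=8
applied_at_8 = list(standby.applied)
```

The apply delay is what gives an administrator a window to stop a corrupting transaction from reaching the standby copy; setting it to zero models the "applied immediately" case.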

Network--Replication within the network is a newer, but increasingly attractive approach. In a SAN environment, this is done via a dedicated appliance or an app running in an intelligent switch. The goal is to migrate the functionality available in top-tier storage systems to the network level to allow host-transparent synchronous and asynchronous replication across heterogeneous devices.

Storage--A key goal of controller-based replication at the storage system level is to provide a service that's completely transparent to applications and databases, with little or no impact on hosts. Although expensive, it's generally viewed as the current best practice for protecting tier-one applications. Major features include:

  • Synchronous and asynchronous service. If a business requires local and remote copies of data to be continuously in sync on a real-time basis, then storage replication is probably the best solution.
  • Support for consistency groups. Enterprise applications can span multiple volumes assigned to several hosts, each handling different components of the application. If the volumes aren't replicated consistently, the target data may be unusable. Consistency groups ensure that interdependent and interrelated data is kept in sync.
  • Configuration flexibility. Storage-based replication usually provides many configuration options and automation capabilities, including a command-line interface (CLI) and a GUI. The CLI is useful for automation and standardization through scripting.
  • Centralized operation. The ability to manage and monitor all replication functionality across all storage from a single location simplifies operations and enables consistent protection policies across the enterprise.
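
To make the consistency group idea more concrete, here is a minimal Python sketch. It assumes a simplified model in which every write carries a global sequence number and a target image counts as "consistent" only if it holds an unbroken prefix of that sequence. The function names and the sample writes are hypothetical, and real arrays implement this in very different ways.

```python
# Each write is (sequence_number, volume, payload). Dependent writes span volumes.
writes = [
    (1, "log", "begin txn 42"),
    (2, "data", "update row 7"),
    (3, "log", "commit txn 42"),
    (4, "data", "update row 9"),
]


def replicate_independently(writes, shipped_per_volume):
    """Each volume ships its own writes with no cross-volume coordination."""
    target = []
    for volume, count in shipped_per_volume.items():
        volume_writes = [w for w in writes if w[1] == volume]
        target.extend(volume_writes[:count])
    return sorted(target)


def replicate_as_group(writes, shipped_total):
    """All volumes in the group share one ordered stream, so the target holds a prefix."""
    return sorted(writes)[:shipped_total]


def is_consistent(target):
    """Simplified check: the target holds every write up to its highest sequence number."""
    seqs = [w[0] for w in target]
    return seqs == list(range(1, len(seqs) + 1))


# At the moment of failure, the data volume has shipped more writes than the log volume.
independent = replicate_independently(writes, {"log": 1, "data": 2})
grouped = replicate_as_group(writes, 3)

print(is_consistent(independent))  # the target is missing write 3 but has write 4
print(is_consistent(grouped))      # the target holds writes 1 through 3
```

In the independent case the target ends up with a data update whose commit record never arrived, which is the kind of unusable image the consistency group is meant to prevent.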

Examining the host side
Host-based replication is often used in smaller environments or on a departmental basis. A host-based solution can be deployed at a fraction of the cost of a high-end, storage-based implementation, even considering the cost of software licenses, hardware and implementation services.

Host-based replication typically resides at the file system or logical volume level within the operating system. Like storage-based offerings, it's usually transparent to the application, but certainly not transparent to the host operating system or hardware. In most cases, asynchronous replication over the LAN or WAN to a similar (not necessarily identical) system is provided. Distance isn't usually an issue and heterogeneous storage can be deployed easily in a given configuration. Also, most products can transfer data at the byte level rather than the data block level, potentially requiring less bandwidth.
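
The bandwidth difference between byte-level and block-level transfer can be sketched with a little arithmetic. The Python example below is illustrative only: the 4,096-byte block size, the 16 bytes of per-change metadata and the three sample updates are assumptions, not figures from any particular product.

```python
def block_level_bytes(changes, block_size):
    """Bytes sent when every touched block must be shipped whole."""
    touched = set()
    for offset, length in changes:
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


def byte_level_bytes(changes, per_change_overhead=0):
    """Bytes sent when only the changed byte ranges (plus optional metadata) are shipped."""
    return sum(length + per_change_overhead for _, length in changes)


# Three small, non-overlapping updates scattered across a file: (offset, length) in bytes.
changes = [(100, 50), (10_000, 200), (70_000, 10)]

block_total = block_level_bytes(changes, block_size=4096)      # 3 blocks -> 12288 bytes
byte_total = byte_level_bytes(changes, per_change_overhead=16)  # 260 + 48 -> 308 bytes
```

The gap is widest for small, scattered updates like these. For large sequential writes that fill whole blocks, the two approaches send roughly the same amount of data.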

So, what's the downside? One issue is that it's a decentralized solution in a time when most of IT is trending toward centralization. We're consolidating data centers, servers and storage, so why would we decentralize an important disaster recovery (DR) function like replication? However, the cost savings often outweigh this concern.

More practical concerns are scalability and performance. Host-based solutions, by definition, have some degree of impact on the CPU and other server resources. A good understanding of data change rates, resource utilization patterns and available capacity is important when determining if this type of solution will meet your needs.
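
A back-of-the-envelope sizing exercise shows why change rates matter. The Python sketch below estimates the average link speed needed to ship a day's changes and the backlog that builds up during busy hours. All the numbers (108GB of daily change, a 10Mbps link, the hourly profile) are invented for illustration, decimal units are assumed, and host CPU overhead isn't modeled at all.

```python
def required_mbps(changed_gb_per_day, window_hours=24.0, efficiency=1.0):
    """Average link throughput (megabits/s) needed to ship a day's changes in the window."""
    bits = changed_gb_per_day * 1e9 * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e6 / efficiency


def peak_backlog_gb(hourly_change_gb, link_mbps):
    """Track unsent data hour by hour and return the largest backlog seen, in GB."""
    link_gb_per_hour = link_mbps * 1e6 / 8 * 3600 / 1e9
    backlog = peak = 0.0
    for changed in hourly_change_gb:
        backlog = max(0.0, backlog + changed - link_gb_per_hour)
        peak = max(peak, backlog)
    return peak


# Hypothetical day: 8 busy hours at 9 GB/hour, then 16 quiet hours at 2.25 GB/hour.
profile = [9.0] * 8 + [2.25] * 16
daily_change = sum(profile)               # 108 GB

avg_mbps = required_mbps(daily_change)    # about 10 Mbps averaged over 24 hours
peak = peak_backlog_gb(profile, link_mbps=10)  # about 36 GB waiting at the end of the busy period
```

A link sized only for the daily average keeps up over 24 hours, but in this example the remote copy falls as much as 36GB behind during the busy period. That lag is the practical exposure to weigh against the recovery point the business expects.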

Manageability and support must also be considered. It's easy to deploy host-based replication on a few servers, but if widely deployed it can complicate server management.

Where does it fit?
When is host-based replication an option? If synchronous replication is required, most host-based solutions are ruled out. Similarly, host-based solutions typically don't support consistency groups. But there are many cases where a host-based option may fit, including:

  • DR protection for a few key applications, particularly where cluster-aware solutions are required.
  • Low-cost DR protection of file and print servers, often leveraging the many-to-one replication capabilities of these products.
  • Remote office data protection to centralize data and eliminate local backup.
  • Replication for Windows-powered NAS devices that lack a native replication capability.
  • Replication for heterogeneous storage devices.

All replication options should be evaluated with the same scrutiny. Key features, such as a CLI, can significantly impact the usability of any solution.

The replication landscape is evolving. Functionality once found only on high-end arrays is now offered on midtier Fibre Channel and iSCSI platforms. And advances in network-based solutions may change the economics of replication.

In any event, we should strive to find the appropriate tool for the job. For many, host-based replication is the right tool at the right price.
