There are several ways to create redundant copies of data. Most methods rely on data replication technologies such as host-based replication, array-based replication and network-based data replication.
Regardless of which of these technologies is used, administrators must take steps to ensure that the available infrastructure can replicate data quickly enough to keep pace with data growth and the data change rate.
Host-based replication

Host-based replication comes in a number of forms, but is most closely tied to server virtualization. Although a hypervisor's main job is to allow virtual machines (VMs) to utilize a subset of a host server's hardware resources, modern hypervisors provide capabilities that extend beyond the creation and execution of VMs. One such capability is host-based replication.
Virtualization vendors generally do not use the term host-based replication. Microsoft, for example, simply refers to the feature as replication or VM replication. Regardless of the term used, the basic concept behind host-based replication is to keep a virtual machine available during a disaster.
In the simplest virtualized environments, VMs run on a single host server, and that host server becomes a single point of failure: if the host fails, every VM running on it fails, too.
Standalone virtualization hosts are not suitable for running production workloads because the risk of a host-level outage is simply too great. For this reason, virtualization vendors allow host servers to be clustered so VMs can fail over to another host server in the event of a host-level failure.
The problem with failover clustering is that, while it is undeniably better than relying on a standalone host, the cluster itself can still contain single points of failure. For example, many clusters depend on a centralized storage mechanism known as a Cluster Shared Volume, or CSV. If the CSV fails, the entire cluster fails with it. Even a cluster without a CSV dependency can be brought down, along with the VMs running on it, by a data center-level disaster such as a fire or hurricane. Host-based replication provides an extra degree of protection against these types of situations.
The basic idea behind host-based replication is that VMs -- or, in some cases, individual virtual hard disks -- are replicated from one host server or host cluster to another. This allows the hypervisor to create a fully synchronized standby copy of the VM. This virtual server replica usually exists off-site and remains powered off until a disaster requires its activation. At that point, a manual failover process is initiated and the replica VM is brought online.
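The replicate-then-catch-up cycle described above can be sketched in a few lines of Python. This is a minimal illustration of the changed-block tracking idea, not any hypervisor's actual mechanism; the tiny block size and the function names are invented for the example.

```python
BLOCK_SIZE = 4  # illustrative only; real hypervisors track far larger blocks


def changed_blocks(primary: bytes, replica: bytes) -> dict[int, bytes]:
    """Compare the primary disk image against the standby copy and
    return only the blocks that differ (a simplified tracking pass)."""
    changes = {}
    for offset in range(0, len(primary), BLOCK_SIZE):
        block = primary[offset:offset + BLOCK_SIZE]
        if replica[offset:offset + BLOCK_SIZE] != block:
            changes[offset] = block
    return changes


def apply_changes(replica: bytearray, changes: dict[int, bytes]) -> None:
    """Apply the shipped blocks to the standby (powered-off) copy."""
    for offset, block in changes.items():
        replica[offset:offset + len(block)] = block


primary = bytearray(b"AAAABBBBCCCCDDDD")
replica = bytearray(primary)              # initial full synchronization
primary[4:8] = b"XXXX"                    # the running VM changes one block
delta = changed_blocks(bytes(primary), bytes(replica))
apply_changes(replica, delta)             # only the delta crosses the wire
assert replica == primary                 # standby copy is back in sync
```

Because only changed blocks are shipped after the initial synchronization, the ongoing replication traffic is proportional to the data change rate rather than to the total size of the virtual hard disks.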
Array-based replication

Array-based replication happens at the storage hardware level. Some enterprise-class storage arrays include built-in functionality that replicates inbound data to a secondary storage array.
This form of replication tends to be easy to configure. In addition, because the replication process is handled by the storage hardware, it places no additional load on the host servers.
The primary disadvantage to array-based replication is that it has to be supported by the storage hardware. If your storage array does not include a replication feature, there is usually no way to add one, although a firmware update could theoretically introduce replication capabilities. In addition, storage replication features tend to exist primarily in higher-end storage products.
Network-based data replication
Network-based data replication is probably the least familiar of the data replication technologies. While it has been around for quite some time, it doesn't seem to receive as much attention as host- and array-based replication.
There are different varieties of network-based data replication. One common approach uses a data replication appliance that intercepts and then redirects I/O write operations. In doing so, the appliance essentially functions as an I/O splitter.
To understand how this form of network-based data replication works, imagine that a server sends a write request to a network-based storage array. Normally, that request would pass through a network switch and then be sent directly to the storage array.
In the case of network-based data replication, the instruction would not be sent to the storage array, but to the data replication appliance. This appliance would act as the termination point for the write operation.
However, the appliance does not contain any significant amount of storage of its own and therefore does not directly handle the write operation. Instead, it forwards the write operation instruction to two or more storage arrays, creating redundant streams of network traffic that are identical except for the destination address. This allows write operations to occur simultaneously on multiple storage devices.
The result is storage devices that contain identical data, which is kept in sync through the network-based data replication process.
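The splitter behavior described above can be sketched as follows. The class names are invented for illustration, and a real appliance operates on iSCSI or FC traffic rather than Python method calls, but the fan-out logic is the same: one inbound write becomes identical writes to every target.

```python
class StorageArray:
    """Stand-in for a network storage target; real arrays speak iSCSI/FC."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


class ReplicationAppliance:
    """Terminates the inbound write, then regenerates it for every
    target array: redundant streams that differ only in destination."""

    def __init__(self, targets):
        self.targets = targets

    def write(self, lba: int, data: bytes) -> None:
        for target in self.targets:
            target.write(lba, data)


primary, secondary = StorageArray(), StorageArray()
splitter = ReplicationAppliance([primary, secondary])
splitter.write(lba=100, data=b"payload")      # one write from the server...
assert primary.blocks == secondary.blocks     # ...lands on both arrays
```

Note that the appliance itself stores nothing; it only regenerates and forwards the write, which is why the arrays stay identical without the server being aware that replication is happening.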
Like any of the data replication technologies, network-based data replication has its advantages and its disadvantages. The primary advantage is that it works at the network level, and not at the storage or host level. It also offers nearly universal compatibility, which is ideal for organizations that rely on a wide variety of host operating systems (OSes). Administrators can replicate data without having to worry about OS-specific compatibility issues or hardware-specific device drivers.
The primary disadvantage to using network-based data replication is that it offers limited scalability. After all, a network appliance can only accommodate so many inbound and outbound connections. The availability of network ports and network bandwidth is usually the limiting factor.
Another potential disadvantage is performance. Remember, the replication appliance acts as a termination point for write operations. This appliance then has to regenerate the write request and send it across multiple network paths. These extra steps mean write operations do not reach the storage devices as quickly as they would if replication were not being performed. The end result is latency for write operations.
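A back-of-the-envelope model makes the latency penalty concrete. The hop and processing times below are purely assumed figures for illustration; the point is that a replicated write pays for an extra network hop, the appliance's processing time and the slowest target's acknowledgment.

```python
HOP_MS = 0.5        # assumed one-way network hop latency (illustrative)
APPLIANCE_MS = 0.3  # assumed time to terminate and regenerate the write


def direct_write_ms(array_ms: float) -> float:
    """Server -> switch -> array: one hop plus the array's write time."""
    return HOP_MS + array_ms


def replicated_write_ms(array_times_ms: list[float]) -> float:
    """Server -> appliance -> arrays: an extra hop, appliance processing
    and a wait for the slowest target array to acknowledge."""
    return HOP_MS + APPLIANCE_MS + HOP_MS + max(array_times_ms)


# With two targets, the replicated write is gated by the slower array
# and always carries the appliance's added hop and processing time.
assert replicated_write_ms([2.0, 3.5]) > direct_write_ms(2.0)
```

Whatever the actual numbers, the replicated path can never be faster than the direct path to the slowest array, so applications that are sensitive to write latency should be tested carefully before this approach is adopted.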
Network-based data replication comes in various forms. Not all available products depend on the use of an inline network appliance; some are specifically designed to work with SANs. The basic idea behind this approach is that data is already flowing from the servers to the storage through Fibre Channel (FC) host bus adapters and FC switches. The FC switches handle only storage traffic, not general network traffic. And since the FC switches have some intelligence built in, it is easy to see how data replication can be implemented at the FC switch level. Essentially, the FC switch can be programmed to create redundant streams of write I/O destined for multiple storage locations.
Data volume and goals are factors in your replication decision
Regardless of which of these three data replication technologies is used, backup administrators must consider the sheer volume of the data being replicated. This is especially true if the organization's goal is to replicate to a remote data center. Replication always starts with an initial synchronization, and synchronizing large amounts of data across a low-speed connection may prove highly impractical. In these situations, offline seeding can prepopulate the replication target without transmitting vast quantities of data across a low-bandwidth connection. Seeding typically involves copying the data to removable media, shipping that media to the remote site and loading it onto the target device; after that, only the changes made since the copy have to cross the slow link.
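The seed-then-catch-up workflow can be sketched as follows. The dictionaries stand in for file systems and the hash comparison is a simplification of real change tracking, but the principle is the same: ship the bulk of the data physically and send only the deltas over the wire.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content hash used to decide whether a file must be re-sent."""
    return hashlib.sha256(data).hexdigest()


def seed(source: dict[str, bytes]) -> dict[str, bytes]:
    """Full local copy to removable media (simulated as a plain dict)."""
    return dict(source)


def catch_up(source: dict[str, bytes], target: dict[str, bytes]) -> list[str]:
    """After the seeded media is loaded at the remote site, only files
    whose fingerprints differ cross the WAN link."""
    shipped = []
    for name, data in source.items():
        if name not in target or fingerprint(target[name]) != fingerprint(data):
            target[name] = data
            shipped.append(name)
    return shipped


site_a = {"vm1.vhdx": b"big disk image", "vm2.vhdx": b"another image"}
media = seed(site_a)                        # copied locally, shipped physically
site_b = media                              # loaded onto the replication target
site_a["vm1.vhdx"] = b"big disk image v2"   # data changes while media is in transit
assert catch_up(site_a, site_b) == ["vm1.vhdx"]  # only the delta crosses the wire
```

In this sketch, only the one file that changed while the media was in transit is transmitted over the connection; the unchanged file never touches the WAN at all.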