How is an enterprise to choose a clustering technique, if any? This article compares three methods of storage clustering, with examples of where each method is effectively applied. The three methods are paired failover (or "pseudo-clustering"), non-distributed clustering, and distributed clustering. The examples show where each technique can add to or detract from the basic business by improving productivity and efficiency and reducing the complexity of operations.
Method one: Paired failover
The paired failover method is not really a clustering technique. However, since many examples of paired failover exist in the marketplace, many end-users think of it as clustering.
Paired controller failover is more accurately described as a redundancy technique than as a clustering technique. There is no decision making during paired failover; there is only one option. Pairs of controllers cannot operate in a clustered fashion because they are autonomous within the network. Clustering cannot be accomplished in a paired failover architecture because there is no connection whatsoever between a given pair of controllers and storage (disk drives) that is controlled by a different pair of controllers. From one controller pair to another, data cannot be accessed directly -- it can only be replicated.
Given this, the impact on the business may be severe. For example, in a disaster recovery situation, paired failover necessitates the physical relocation and complex reconnection of servers. Server reconnection or reverse replication may take hours or even days for large volumes of data. The expense and risk of lost time, productivity, and transaction opportunity are significant in such situations.
Components of paired failover architecture
In this architecture, a given set of physical disks is managed and accessed by one and only one given, fixed pair of controllers. In addition, any given LUN is managed and made accessible by one and only one controller. Failover (of a controller to its paired counterpart) may occur as a planned event typically triggered by an administrator, or as an unplanned event due to any failure between the server (initiator) and the controller. The controller proceeds to redirect all traffic to its LUNs over to its counterpart, and then must inform the servers using those LUNs to alter their communication path(s), since the paired controller has a different SAN address (e.g., a Fibre Channel worldwide name) than the failing controller. This process takes anywhere from several tens of seconds to minutes. The end result is that all servers using those LUNs from the failed controller now communicate with its paired counterpart. The reverse process is similar, although it is always initiated by administrator intervention.
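The failover sequence described above can be sketched as a small model. This is an illustrative sketch only, not any vendor's implementation: the class names, the `fail_over` function, and the WWN and LUN strings are all hypothetical. It shows the two points the text makes: the target of a failover is always the fixed partner, and the servers must be repointed to a different SAN address.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    wwn: str                                   # SAN address (hypothetical value)
    luns: set = field(default_factory=set)     # LUNs this controller presents
    partner: "Controller | None" = None        # the one fixed counterpart
    healthy: bool = True


@dataclass
class Server:
    name: str
    paths: dict = field(default_factory=dict)  # LUN name -> controller WWN


def make_pair(wwn_a, wwn_b):
    """Create two controllers that are each other's only failover target."""
    a, b = Controller(wwn_a), Controller(wwn_b)
    a.partner, b.partner = b, a
    return a, b


def fail_over(failed, servers):
    """Move every LUN from the failed controller to its fixed partner."""
    target = failed.partner        # no decision is made: there is one option
    failed.healthy = False
    moved = set(failed.luns)
    target.luns |= moved
    failed.luns.clear()
    # The partner has a different SAN address, so each server using the
    # moved LUNs must alter its communication path.
    for server in servers:
        for lun in moved:
            if lun in server.paths:
                server.paths[lun] = target.wwn
    return moved


a, b = make_pair("wwn-a", "wwn-b")
a.luns = {"lun0", "lun1"}
b.luns = {"lun2"}
srv = Server("db1", {"lun0": "wwn-a", "lun2": "wwn-b"})

fail_over(a, [srv])
print(srv.paths)  # {'lun0': 'wwn-b', 'lun2': 'wwn-b'}
```

In a real subsystem the path update is performed by host-resident software and takes tens of seconds to minutes; the model only captures the bookkeeping.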
This architecture forces the enterprise to decide which servers to connect to which controller pair, and how many disk drives, of what size and speed, to place behind each controller pair. Once made, these choices cannot be changed without downtime and data loss. Since the typical application and operating system cannot tolerate a change in LUN address, as advertised by the controller, host software is required to facilitate failover. Without this software, paired failover would result in loss of access to data volumes.
The placement of LUNs (volumes) across the network is also an important decision. LUN movement cannot be facilitated in a paired-controller architecture. In other words, a LUN created and accessed via pair A cannot be either physically or logically moved to pair B. If attempted, the result is loss of access to the LUN. This phenomenon is the leading cause of poor storage utilization.
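One way to see why fixed LUN placement hurts utilization is a small capacity model. The sketch below is an assumption-laden illustration: the `ControllerPair` class, the `can_place` helper, and all capacity figures are hypothetical. It shows free space stranded behind separate pairs, where the total free capacity is sufficient for a new volume but no single pair can hold it.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerPair:
    name: str
    capacity_gb: int                          # disks behind this pair only
    luns: dict = field(default_factory=dict)  # LUN name -> size in GB

    def free_gb(self):
        return self.capacity_gb - sum(self.luns.values())

    def create_lun(self, lun, size_gb):
        if size_gb > self.free_gb():
            raise ValueError(f"pair {self.name}: no room for {lun}")
        self.luns[lun] = size_gb


def can_place(pairs, size_gb):
    """A new LUN must fit behind a single pair; free space is not pooled."""
    return any(p.free_gb() >= size_gb for p in pairs)


pair_a = ControllerPair("A", 1000)
pair_b = ControllerPair("B", 1000)
pair_a.create_lun("lun0", 700)   # leaves 300 GB free behind pair A
pair_b.create_lun("lun1", 600)   # leaves 400 GB free behind pair B

total_free = pair_a.free_gb() + pair_b.free_gb()
print(total_free)                         # 700
print(can_place([pair_a, pair_b], 500))   # False
```

Because an existing LUN cannot be moved from pair A to pair B to consolidate the free space, the 700 GB in this example cannot serve a 500 GB request.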
Best fit for paired failover
The best fit for paired failover is an environment consisting of a small number (typically four or fewer) of servers whose basic requirement is to use a small, fixed (unchanging over time) number of storage volumes, which must be accessible at higher levels of availability than is possible with direct-attached storage (DAS). These servers, by definition, will have dual HBAs that connect directly into the paired failover subsystem, as well as proprietary, host (operating system)-specific software for each server. This method does not scale into environments with many servers and/or many storage volumes. The operational difficulties of LUN assignments, as well as the cost of software licenses, prevent effective use of the method in all other cases.
Next time: Non-distributed clustering
About the author
Robert Peglar is the Chief Architect for XIOtech Corporation. He is responsible for storage architecture, healthcare technology and strategic direction. Robert is XIOtech's principal member of the SNIA, the IP Storage Forum and the Shared Solutions Forum.