What you will learn in this tip: Storage clustering can take on many different shapes, depending on the design and architecture. Learn how to leverage clustered storage in your disaster recovery (DR) environment.
Storage clustering can mean different things depending on the vendor and the architecture on which the technology is built, but the features and benefits are usually similar across platforms. The main selling points of storage clustering typically include increased scalability, higher availability and increased performance.
What is storage clustering?
Storage clustering can be a grouping of nodes that share a common storage device and file system. For example, multiple servers accessing data via a distributed file system are sometimes considered a storage cluster; likewise, multiple systems configured to use the Microsoft Distributed File System (DFS) or the Red Hat Global File System (GFS) on Linux are often referred to as storage clusters. In this context, the true cluster components are the servers or nodes as opposed to the storage devices. Network-attached storage (NAS) devices can be grouped to provide file-level clustered storage, such as NetApp's MetroCluster. Clustering can also be achieved at the block level, where I/O is distributed across multiple storage nodes, as is the case for Hewlett-Packard (HP) Co.'s StorageWorks P4000 G2 SAN (formerly LeftHand Networks).
The main difference between multiple systems (e.g., file servers) grouped in a cluster to provide access to storage such as a distributed file system and a cluster made of storage devices is where the actual clustering and load distribution take place. For example, four clustered servers could share a single disk array. Conversely, a number of standalone systems could access storage spread across four clustered storage devices.
A storage cluster can be defined as a grouping of storage components to which members can be added for increased capacity, performance and availability. A multi-controller storage array could also be considered a storage cluster, but it is typically limited by its number of ports, cache size, I/O performance and number of disks, which eventually caps both capacity growth and performance. Grouping multiple physical storage arrays in a virtualized storage environment is probably the most common form of storage clustering we see today.
Storage clustering aims to increase availability, capacity and performance, which can be achieved by adding storage devices, controllers, etc. Keep in mind that storage clustering and a storage grid are two different things. The distinction is similar to the one between cluster computing and grid computing. While there is not necessarily a clear line separating one from the other, a storage grid is generally more scalable and covers greater distances than a storage cluster.
Disaster recovery strategies and storage clustering
As mentioned earlier, the common attributes of the different storage clustering approaches are greater scalability, increased performance and higher availability. However, not all clusters scale performance and capacity, at least not independently of each other. From a disaster recovery perspective, availability is the clear focal point, but be aware that not all storage clustering options offer the same type of redundancy. One of the main goals of disaster recovery planning is to eliminate as many single points of failure as possible.
In a cluster made of multiple nodes in a distributed file system environment, redundancy exists at the node or server level. For example, in a four-node cluster accessing a distributed file system on a shared storage array, the failure of a single node would be absorbed by the remaining three nodes. That said, the shared storage itself could be the single point of failure if no redundancy exists at the array level; a storage array failure could cause all clustered nodes to lose access to data. The storage array therefore needs its own redundancy features, such as redundant controllers, multiple I/O paths, RAID, hot-spare disks, replication capabilities, etc. Even then, this configuration is not a storage cluster, because clustering the nodes does not by itself guarantee storage availability.
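The effect of a shared array on overall availability can be illustrated with a little arithmetic. The sketch below assumes that component failures are independent and uses made-up availability figures (`NODE` and `ARRAY` are hypothetical values, not vendor data). Redundant members are combined in parallel, and components that must all be up are combined in series.

```python
from math import prod


def parallel(availability: float, count: int) -> float:
    """Availability of `count` redundant members: up if at least one is up."""
    return 1.0 - (1.0 - availability) ** count


def series(*availabilities: float) -> float:
    """Availability of components that must all be up at the same time."""
    return prod(availabilities)


# Hypothetical figures, chosen only for illustration.
NODE = 0.99    # availability of one clustered server
ARRAY = 0.999  # availability of one storage array

# Four clustered nodes: any one of them can serve the data.
nodes = parallel(NODE, 4)

# Four nodes in front of one shared array: the array is in series.
single_array = series(nodes, ARRAY)

# Same nodes, but the data is mirrored on two arrays.
mirrored_arrays = series(nodes, parallel(ARRAY, 2))

print(f"four-node cluster alone:        {nodes:.8f}")
print(f"cluster + single shared array:  {single_array:.8f}")
print(f"cluster + two mirrored arrays:  {mirrored_arrays:.8f}")
```

Under these assumptions, the configuration with a single shared array can never be more available than the array itself, no matter how many nodes are added in front of it.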
A closer look at the storage clustering options shows that the clustering takes place at the storage level rather than at the nodes accessing the storage. For instance, HP StorageWorks P4000 is composed of multiple independent storage devices grouped in a cluster. It uses a technology called Network RAID, which creates a logical volume distributed across the storage cluster. Depending on the Network RAID level selected for the logical volume, one or more copies of each data block are created and striped across the cluster. Because the cluster can extend beyond a single data center, across buildings or even geographies, these replicated data blocks offer a good foundation for a DR strategy.
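The general idea of keeping several copies of each block on different cluster members can be sketched in a few lines. This is a simplified round-robin model written for illustration only; it is not HP's Network RAID algorithm, and the function names and node names are hypothetical.

```python
def place_blocks(num_blocks, nodes, copies):
    """Map each block number to the list of nodes holding a copy of it.

    Copy c of block b goes to nodes[(b + c) % len(nodes)], so the copies
    of any one block always land on different nodes.
    """
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    return {
        block: [nodes[(block + c) % len(nodes)] for c in range(copies)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have a copy on a surviving node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster members
striped_only = place_blocks(8, nodes, copies=1)
two_copies = place_blocks(8, nodes, copies=2)

# With one copy per block, losing a node loses that node's blocks.
print(sorted(readable_blocks(striped_only, {"node-a"})))
# With two copies per block, every block survives a single node failure.
print(sorted(readable_blocks(two_copies, {"node-a"})))
```

In this model, two copies per block tolerate any single node failure but not every two-node failure, which is why the number of copies and where they are placed both matter in DR planning.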
The NetApp MetroCluster combines the redundancy of an active-active storage cluster (dual controllers) with synchronous data replication across a campus or metropolitan area. Replication can take place between clusters of conventional NetApp filers or V-Series open-storage controllers that front third-party storage arrays from EMC Corp., Hitachi Data Systems (HDS), IBM Corp. and other supported vendors. Other storage clustering options are available, such as HP's PolyServe scalable NAS and LSI's ONStor clustered NAS gateway, which uses SAN storage. Another option is 3PAR Utility File Services, which is built on technologies like HP PolyServe, NetApp, ONStor and more. Each option aims to provide the key attributes of storage clustering: scalability, performance and availability.
If storage clustering is being considered as part of a disaster recovery strategy, data availability is the key factor and must be carefully reviewed. The right storage clustering option should not only include redundancy to protect against the failure of a single cluster member or node, but should also offer data recovery options for site-wide failures. In other words, without a geo-clustering or remote data replication component, local clustering protects only against local failures.
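One way to review a design for site-wide exposure is to list, for each volume, where its copies live and then ask which sites the organization cannot afford to lose. The sketch below is a minimal illustration of that check; the site names, array names and volume layout are all hypothetical.

```python
def sites_that_cause_data_loss(volume_copies, member_site):
    """Return the sites whose loss leaves at least one volume with no copy.

    volume_copies: volume name -> list of cluster members holding a copy
    member_site:   cluster member -> site where that member is located
    """
    risky = set()
    for site in set(member_site.values()):
        for members in volume_copies.values():
            # A volume is lost if every one of its copies sits at this site.
            if all(member_site[m] == site for m in members):
                risky.add(site)
                break
    return risky


# Hypothetical layout: two arrays at the primary site, one at a DR site.
member_site = {"array-1": "primary", "array-2": "primary", "array-3": "dr"}

local_only = {"vol-data": ["array-1", "array-2"]}      # local clustering only
geo_replicated = {"vol-data": ["array-1", "array-3"]}  # one remote copy

print(sites_that_cause_data_loss(local_only, member_site))
print(sites_that_cause_data_loss(geo_replicated, member_site))
```

In the local-only layout, the volume survives the loss of either array but not the loss of the primary site; adding a copy at a second site removes that exposure.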
About this author: Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.