When looking for tools to enhance an IT recovery strategy, business continuity and disaster recovery professionals may want to consider IT solutions already in place that were originally designed to accelerate network performance. But there is a long list of caveats to consider.
IT departments have used wide area file services (WAFS) appliances to store, in local devices, copies of files that have been accessed over a wide area network (WAN), so that subsequent users get accelerated performance. WAN optimization products seek to accelerate a broad range of applications accessed by distributed enterprise users by eliminating redundant transmissions, staging data in local caches, compressing and prioritizing data, and streamlining chatty protocols.
Both WAFS appliances and WAN optimizers can enhance an IT recovery strategy, particularly where there are many branch offices using local file servers and performing local tape backups. For example, in the event of local data corruption, a new and perhaps more up-to-date version of the data could be pushed down to the local server faster than retrieval of an offsite backup and restoration of the corrupted data.
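The speed comparison above can be roughed out with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function names, the traffic reduction ratio attributed to the optimizer, and every number in the example are hypothetical assumptions chosen for illustration, and real results depend on how much of the data the appliance can deduplicate or compress.

```python
def wan_restore_hours(data_gb: float, link_mbps: float, reduction_ratio: float = 1.0) -> float:
    """Hours to push data_gb from the central site over the WAN.

    reduction_ratio is the assumed factor by which the optimizer shrinks
    traffic on the wire (1.0 means no reduction at all).
    """
    if data_gb < 0 or link_mbps <= 0 or reduction_ratio < 1.0:
        raise ValueError("need data_gb >= 0, link_mbps > 0 and reduction_ratio >= 1")
    megabits_on_wire = data_gb * 8_000 / reduction_ratio  # 1 GB = 8,000 megabits
    return megabits_on_wire / link_mbps / 3600


def tape_restore_hours(data_gb: float, retrieval_hours: float, tape_mb_per_s: float) -> float:
    """Hours to fetch an offsite tape and then restore data_gb from it."""
    if data_gb < 0 or retrieval_hours < 0 or tape_mb_per_s <= 0:
        raise ValueError("need data_gb >= 0, retrieval_hours >= 0 and tape_mb_per_s > 0")
    restore_hours = data_gb * 1_000 / tape_mb_per_s / 3600  # 1 GB = 1,000 MB
    return retrieval_hours + restore_hours


# Hypothetical branch office: 50 GB to restore, 20 Mbit/s link, 4:1 reduction,
# versus a 4-hour courier retrieval and an 80 MB/s tape drive.
print(f"WAN push:     {wan_restore_hours(50, 20, reduction_ratio=4):.2f} h")
print(f"Offsite tape: {tape_restore_hours(50, retrieval_hours=4, tape_mb_per_s=80):.2f} h")
```

With a slow link, a large data set or little traffic reduction, the comparison can easily tip the other way, which is why the inputs should come from measurements rather than vendor figures.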
Even if replacement of the local server is necessary, the WAFS appliance/WAN optimizer-enhanced recovery strategy would be able to re-create the server and its data from the central site. Similarly, the local copy could be used for a period of time if the primary version at the central data center were to become corrupted or unavailable. Alternatively, WAFS/WAN optimizers can be used to eliminate local servers and local backups altogether.
In addition to considering capability and possible cost savings, an enhanced IT recovery plan should offer improved reliability, increased speed of recovery and data recovery closer to the point of failure. The single greatest issue with leveraging existing WAFS/WAN optimization tools is that doing so makes both daily operations and backup/recovery more dependent on network availability. The network becomes a single point of failure for both. That increases the vulnerability of backup, a process designed to protect the organization from vulnerability, and it necessitates safety countermeasures that may cancel all projected cost savings.
The following characteristics and facilities of IT operations are recommended to provide a reasonable level of network reliability when using a WAFS/WAN optimizer strategy to enhance data backup/recovery. Without these provisions in place, the use of these products to enhance an IT recovery strategy is not recommended.
Redundant Network Connections
High-speed redundant network connections must be in place to all points served by the WAFS/WAN products. Redundant paths for all network segments must be physically as well as logically separated. Redundant connections of different types, from different providers, are advisable.
Mirrored Redundant Data Center
A fully redundant data center where all critical data is mirrored in at least near-real-time as required by business-driven recovery point objectives (RPO) is necessary, since none of the branch offices will function indefinitely without the primary data center. This backup data center must run idle except for ongoing data mirroring and offsite archive backup. Its primary purpose is to act as the near-real-time selective or full failover for the primary data center. It must be located in a hardened site at a sufficient distance from the primary data center to remain unaffected by an event that renders the primary data center or its network inoperable.
Network Staff Reliability
The level of network support staff reliability must be very high: full cross-training to eliminate single points of failure is required, including pre-implementation training on these devices. Recovery tests should be run frequently to prove that recovery time objectives (RTO) and RPO assumptions in various scenarios can be met. Network SWAT teams should be continuously available to diagnose and quickly repair identified network issues.
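One way to make recovery test results comparable across scenarios is to record each run next to its targets and flag which objective, if any, was missed. The sketch below assumes a hypothetical record layout; the field names and the sample figures are illustrative and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTest:
    scenario: str
    rto_minutes: float                 # target: maximum time to restore service
    rpo_minutes: float                 # target: maximum tolerable data loss
    measured_recovery_minutes: float   # observed during the test
    measured_data_loss_minutes: float  # age of the newest data recovered


def failed_objectives(test: RecoveryTest) -> list[str]:
    """Return the objectives ("RTO", "RPO") that the test run did not meet."""
    failures = []
    if test.measured_recovery_minutes > test.rto_minutes:
        failures.append("RTO")
    if test.measured_data_loss_minutes > test.rpo_minutes:
        failures.append("RPO")
    return failures


# Hypothetical test runs for two scenarios.
runs = [
    RecoveryTest("branch file server loss", 240, 60, 180, 90),
    RecoveryTest("primary data center failover", 120, 15, 95, 10),
]
for run in runs:
    missed = failed_objectives(run)
    print(f"{run.scenario}: {'missed ' + ', '.join(missed) if missed else 'met RTO and RPO'}")
```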
IT Operations Maturity
The general level of maturity of IT operations must be very high: full operational visibility at all times; tightly controlled changes in infrastructure and application environments; a thorough and "hostile" QA process for all proposed changes; and continuous monitoring of, and alarming on, all abnormalities, before failures occur whenever possible.
Data Classification Policy
A comprehensive data classification policy must be implemented to determine the correct data backup targets and timing. Data management staff must have significant experience in the use of tools to monitor data usage and continuously optimize access and backup based on usage patterns. Both the data classification policy and the data management staff must have the full support of IT management as well as senior business management. The policy is also what determines the right time interval for asynchronous backup, so that RPO targets can be met within a constrained bandwidth environment.
Note that this is classification of the data for backup frequency and not for sensitivity. It is extremely unlikely that a real-time synchronous approach will be used for all data because of the performance impact of this technique. More frequent intervals transmit smaller quantities of data per transmission and therefore require less bandwidth for each transfer. Less frequent intervals accumulate larger data volumes and therefore require more bandwidth, even though that bandwidth is used less often. This approach is very different from the once-a-day tape backup solution presumably used previously in each office.
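The trade-off between backup interval and bandwidth can be illustrated with a simple model. This is a sketch under stated assumptions, not a sizing method: it assumes data changes at a steady rate, that every change since the last cycle is sent uncompressed, and that the worst-case data loss equals the interval plus the transfer time, so each transfer must finish within the RPO minus the interval. The class, its fields and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClass:
    name: str
    change_rate_mb_per_hour: float  # assumed steady rate of changed data
    rpo_hours: float                # business-driven recovery point objective


def required_mbps(data_class: DataClass, interval_hours: float) -> float:
    """Bandwidth in Mbit/s so that interval + transfer time stays within the RPO."""
    window_hours = data_class.rpo_hours - interval_hours
    if interval_hours <= 0 or window_hours <= 0:
        raise ValueError("interval must be positive and shorter than the RPO")
    volume_mb = data_class.change_rate_mb_per_hour * interval_hours
    return volume_mb * 8 / (window_hours * 3600)  # megabits / seconds


# Hypothetical data class: 900 MB of changes per hour, one-hour RPO.
orders = DataClass("order entry", change_rate_mb_per_hour=900, rpo_hours=1.0)
for interval in (0.25, 0.5, 0.75):
    print(f"{interval:.2f} h interval -> {required_mbps(orders, interval):.2f} Mbit/s")
```

Under these assumptions, lengthening the interval both enlarges each transfer and shrinks the time available to complete it, so the required bandwidth climbs quickly as the interval approaches the RPO.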
If all of the above are in place, a WAFS/WAN optimization implementation can provide additional recovery ease and flexibility, and may increase your capability to meet business-driven RTOs and RPOs. It may also provide lower costs and increased reliability and flexibility for daily operations.
It's up to each organization to determine whether or not it is prepared to do what is necessary to reasonably counter the major risk associated with the use of this strategy: increased dependence on the network for daily operations as well as backup and recovery. Perhaps an organization already has many of the countermeasures outlined above in place. Running the numbers will determine if the benefits are worth the increased soft and hard costs. The last thing any organization wants is to increase the probability of a self-induced IT service interruption!
About this author: Kathleen Lucey, FBCI, is president of Montague Risk Management and of the Business Continuity Institute USA Chapter.