Published: 10 Aug 2006
Put policy managers to work to automate various rule-based data recovery functions.
Almost all storage hardware or software products support some form of rules-based or policy-based storage management. The benefits of automated policy-based storage management include faster failover to standby and redundant systems, shortened recovery times and less human error. But before heading down the road to automated data protection for business continuity (BC) and disaster recovery (DR), it's crucial to thoroughly understand what types of policies various products support, where the policy manager resides and what it's capable of doing.
Automated policy management relies on rules (policies) that describe what actions to take or enforce for different events and conditions. Policies are enforced by policy managers, also known as policy engines. It's not uncommon to have multiple, overlapping policy managers in host-based software, network devices and storage systems.
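To make the relationship between events, policies and a policy manager concrete, here is a minimal sketch in Python. It isn't modeled on any vendor's product; the class names, event kinds and action labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    source: str  # hypothetical component name, e.g. "hba0"
    kind: str    # hypothetical event kind, e.g. "path_down"


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[Event], bool]  # when does this rule apply?
    action: str                         # label of the action to take


class PolicyManager:
    """Evaluates each incoming event against every registered policy."""

    def __init__(self, policies: List[Policy]):
        self.policies = list(policies)

    def evaluate(self, event: Event) -> List[str]:
        # Return the action of every policy whose condition matches.
        return [p.action for p in self.policies if p.condition(event)]


manager = PolicyManager([
    Policy("path-failover", lambda e: e.kind == "path_down",
           "fail over to alternate HBA path"),
    Policy("disk-rebuild", lambda e: e.kind == "disk_failed",
           "start rebuild on hot spare"),
])

print(manager.evaluate(Event("hba0", "path_down")))
```

A real policy engine would also execute the action, log it and possibly notify other policy managers; this sketch stops at deciding which actions apply.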
The level of automation will vary depending on the task being addressed, the criticality of the application and data, and other factors necessary to meet a given level of service. For example, some levels of service may need more automation and tighter integration across different technology domains or technology-specific policy engines, while other tiers may involve a combination of automated and manual intervention.
Automation needs to extend horizontally across different technology domains and functions to support application-wide BC and DR. What to automate, together with when and where to automate, will depend on the application and business service requirements. Basically, you want to automate the tasks that occur most often, such as host bus adapter (HBA) path failover, server failover in a cluster, disk drive rebuild, and the coordination of updates and replication across different locations.
Determining what and where to automate involves assessing and classifying application and business service requirements, including recovery time objectives (RTOs) and recovery point objectives (RPOs). Another part of the assessment process entails classifying the various applications and their accompanying data to establish the level of BC and DR that an application and its data require. A key foundation for automated BC and DR is the use of fault-containment and fault-isolation configuration practices. In other words, eliminate single points of failure using redundant design principles that enable automated and transparent fault isolation and containment to avoid rolling disasters. For example, dual HBAs with automated path failover managers can isolate a fault and keep it from spreading into a larger disaster.
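The classification step can be sketched as a small function that maps an application's RTO and RPO to a protection tier. The tier numbers and threshold values below are assumptions made up for this example, not recommendations; each organization would set its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    rto_minutes: float  # how long recovery may take
    rpo_minutes: float  # how much recent data may be lost


# Hypothetical thresholds: (upper limit in minutes, tier).
RTO_LIMITS = [(60, 1), (480, 2)]
RPO_LIMITS = [(5, 1), (240, 2)]
LOWEST_TIER = 3


def _tier(value, limits):
    # Return the first tier whose limit the value fits under.
    for limit, tier in limits:
        if value <= limit:
            return tier
    return LOWEST_TIER


def protection_tier(app: App) -> int:
    # The stricter of the two objectives drives the tier (1 is most protected).
    return min(_tier(app.rto_minutes, RTO_LIMITS),
               _tier(app.rpo_minutes, RPO_LIMITS))


apps = [
    App("order-entry", rto_minutes=30, rpo_minutes=0),
    App("email", rto_minutes=240, rpo_minutes=60),
    App("archive", rto_minutes=2880, rpo_minutes=1440),
]
for app in apps:
    print(app.name, protection_tier(app))
```

Letting the stricter objective decide the tier is one possible design choice; an application with a tight RTO but a loose RPO still lands in the most protected tier.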
Policy managers and their functions
The types of activities that can be automated include:
- Job and task scheduling
- Application and cluster availability management
- Path and component management
- Data discovery and data classification
- Data migration and archiving
- Backup and replication
- Change control and management
The information that fuels a policy manager, and that is used to set up and identify rules, comes from activity alerts, including event notifications from different sources. In some cases, different policy managers can signal or call on other policy managers to perform or invoke some function. Data classification can also help to identify the data to which policy rules should apply. Classification tools, including traditional storage resource management (SRM) products, focus more on file, data path and access information. They collect and store metadata. Advanced data discovery, analysis and content classification tools, including those from Abrevity Inc., Index Engines Inc., Kazeon Systems Inc., Scentric Inc. and StoredIQ Corp., among others, can support BC and DR automation by helping you understand what data and information may exist in locations that you weren't aware of and thus aren't adequately protecting. A deeper view into your data helps ensure that the data is protected based on its content and context, and not just on its file name or where it's stored.
Classified data can be used to support DR in one of two ways: passive and active. The passive method acts on the data after it has been stored; the active method applies policy rules to the data when it arrives for storage. An example of the passive approach is a virtual tape library that transparently creates a single-instance repository to compact and reduce the amount of data stored when creating a replicated copy for resiliency. An example of an active approach is when data is automatically replicated to multiple locations along with enforcing security, retention and other policy rules as the data is saved to a storage system. Data classification is also instrumental for a wide variety of storage functions, such as data archiving, data search and retrieval, migrating data to different tiers of storage and other information lifecycle management activities.
Additional policy engine considerations
When automating BC and DR, it's important to know what notification mechanisms exist between different hardware and software technologies that will trigger the automation. For example, in an electrical power grid, there are supervisory control and data acquisition (SCADA) mechanisms to detect, monitor and enable management of the grid. For a data and storage infrastructure that supports BC and DR, monitoring, analysis and event-correlation tools provide similar management capabilities. Examples include those from Aptare Inc. (StorageConsole), CentrePath Inc. (Magellan), EMC Corp. (Smarts), Hewlett-Packard Co. (Storage Essentials), Onaro Inc. (SANscreen) and WysDM Software Inc. (WysDM), among others.
Policies for business continuity and disaster recovery
When setting up rules and policies for business continuity and disaster recovery, consider the following:
You should consider adding fault and event-correlation analysis and monitoring tools, and change management software, to your automated BC/DR arsenal. Event-correlation tools can be used to report on the general health and status of your storage and data infrastructure, as well as servers and networks in general. In addition, these tools can shed light on how automated data protection tasks such as backup and replication, and storage in general, are performing.
When setting up policies, sometimes it's helpful to construct a decision tree, flow chart or similar mechanism to guide you through the process of data classification and technology assignment. This can be an effective tool for a storage manager to decide which techniques and technologies are most appropriate to use to support the specific level of automation for BC and DR. Essentially, you will create a decision tree that becomes the rules that dictate what actions the various policy managers will take for different events.
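One way to picture a decision tree that becomes a set of rules is as nested data that a small function walks. The questions and outcomes below are hypothetical examples, not a recommended classification scheme.

```python
# Each internal node asks a yes/no question; each leaf names a technique.
# All questions and outcomes are illustrative assumptions.
TREE = {
    "question": "downtime_sensitive",
    "yes": {
        "question": "data_loss_tolerable",
        "yes": "asynchronous replication with automated failover",
        "no": "synchronous replication with automated failover",
    },
    "no": {
        "question": "rarely_accessed",
        "yes": "archive plus scheduled backup",
        "no": "scheduled backup with manual recovery",
    },
}


def decide(node, answers):
    """Walk the tree using a dict of question -> bool until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


choice = decide(TREE, {"downtime_sensitive": True, "data_loss_tolerable": False})
print(choice)
```

Keeping the tree as data rather than hard-coded branches means the same walker can serve every application class, and the tree itself can be reviewed and changed under change control.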
Depending on how sensitive your applications are to downtime, you should determine how much of a delay in recovery or restart you can tolerate before a policy manager initiates automated failover, restart or recovery. As part of automated recovery, consider the latency, if any, between the time a policy manager decides to take action and the time the action actually occurs. For example, a policy manager in a storage system may determine that a primary network link has failed, so it will initiate a switch to a slower secondary network link for remote data replication. You should know how much time will elapse between the policy manager's initial determination and the completion of the switch to a secondary network path (and perhaps the switch from synchronous to asynchronous modes of operation).
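The timing question can be treated as simple arithmetic: add the time to detect the fault, any deliberate hold-off before acting, and the time the switch itself takes, then compare the total with what the application can tolerate. The function names and the numbers below are assumptions for illustration only.

```python
def time_to_recover(detect_s, hold_off_s, switch_s):
    """Seconds from the fault occurring to the secondary path carrying traffic."""
    return detect_s + hold_off_s + switch_s


def within_tolerance(detect_s, hold_off_s, switch_s, tolerated_s):
    """True if the whole sequence completes inside the tolerated delay."""
    return time_to_recover(detect_s, hold_off_s, switch_s) <= tolerated_s


# Hypothetical figures: 5 s to detect the link failure, a 30 s hold-off to
# ride out transient glitches, and 20 s to switch to the secondary link.
total = time_to_recover(detect_s=5, hold_off_s=30, switch_s=20)
print(total)                                        # 55
print(within_tolerance(5, 30, 20, tolerated_s=60))  # True
print(within_tolerance(5, 30, 20, tolerated_s=45))  # False
```

If the total exceeds the tolerated delay, the options are to shorten the hold-off, speed up detection or the switch, or accept a longer outage for that class of application.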
Automating BC and DR using policies reduces downtime and errors, but it takes time and planning to decide what to automate and to set up the policies and rules. Automated policies rarely control storage alone; they touch almost everything within the data center: servers, networks, applications, databases and lines of business. It's a complicated and difficult task to automate BC and DR procedures; once done, however, the payback is enormous.