Disaster recovery readiness monitoring applications

Planning, developing and implementing disaster recovery plans can be complex, but a new class of apps can help you determine if DR plans are synchronized with your IT operations.

Disaster recovery (DR) planning is typically a significant undertaking that requires many work hours and often a considerable budget. But creating a plan is only part of the process; the plan must be tested frequently enough to ensure it will work as expected. Because testing is time-consuming and often disruptive, it isn’t done as regularly as it should be. A new category of applications -- DR readiness monitoring apps -- can facilitate the testing process.

DR monitoring applications address one of the key causes of recovery failures: configuration drift. Configuration drift occurs when storage and other IT gear are upgraded or replaced, but the DR documentation and procedures aren’t updated to reflect those changes. With the DR docs out of sync with the real-world setup, recovery efforts are more likely to fail. The purpose of monitoring applications is to find those discrepancies and improve the odds of a successful test or actual recovery.
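
To make the idea of configuration drift concrete, here is a minimal Python sketch that compares what the DR documentation says with what is actually deployed. It is an illustration only, not how any commercial product works; the asset names, fields and values are hypothetical.

```python
def find_drift(documented, actual):
    """Return (asset, field, documented_value, actual_value) tuples for every mismatch."""
    drift = []
    for asset in sorted(set(documented) | set(actual)):
        doc_cfg = documented.get(asset)
        act_cfg = actual.get(asset)
        if doc_cfg is None or act_cfg is None:
            # The asset appears in only one of the two inventories.
            drift.append((
                asset,
                "<asset>",
                "present" if doc_cfg is not None else "missing",
                "present" if act_cfg is not None else "missing",
            ))
            continue
        for field in sorted(set(doc_cfg) | set(act_cfg)):
            if doc_cfg.get(field) != act_cfg.get(field):
                drift.append((asset, field, doc_cfg.get(field), act_cfg.get(field)))
    return drift


# Hypothetical data: what the DR plan documents...
documented = {
    "array-01": {"firmware": "4.2", "capacity_tb": 50},
    "db-server": {"os": "RHEL 5", "replica_target": "dr-site-a"},
}
# ...versus what is running in production after an upgrade and a new purchase.
actual = {
    "array-01": {"firmware": "5.0", "capacity_tb": 80},
    "db-server": {"os": "RHEL 5", "replica_target": "dr-site-a"},
    "array-02": {"firmware": "5.0", "capacity_tb": 40},
}

drift_report = find_drift(documented, actual)
for row in drift_report:
    print(row)
```

In this made-up example, the upgraded array and the undocumented second array are both flagged, while the unchanged database server is not.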

What should be monitored?

From a disaster recovery perspective, the logical first point of focus would be protecting critical data. This means backing up data on-site and off-site using a variety of methods, including disk-to-disk (D2D) and disk-to-tape (D2T) backup; data mirroring to an off-site location or cloud service; and shipping backup tapes to an off-site storage facility.

The next focal point would be hardware and applications. You want to ensure these assets are regularly monitored for proper performance. After that, but not necessarily in order of relevance, are network-based assets, such as local-area networks (LANs); wide-area networks (WANs); storage-area networks (SANs); premises-based systems such as routers and switches; and voice systems such as PBX systems and Voice over IP (VoIP) systems.

In a typical data center, operational assets are linked together in a variety of ways, mostly via networking assets. The links and relationships among the various resources may be complex, making it even more difficult to detect configuration drift or other discrepancies.

Even well-designed disaster recovery plans are worthless unless they’re regularly exercised to ensure everything that’s critical to the business is being addressed. But most organizations exercise their DR plans only once a year, or even less frequently.

To maintain a careful vigil over your critical data protection activities, as well as systems, applications and networks, you’ll need timely performance data from all critical assets. If you know the health of your IT infrastructure in real time, you’ll be better prepared to respond quickly and effectively when something disrupts operations.

Existing performance monitoring systems

If your organization is medium to large in size, chances are you have already invested in a variety of performance monitoring tools. You may have applications that scan only one system and others that monitor a broad range of activities, such as network performance or information security.

But if you add disaster recovery on top of your existing monitoring activities, you’ll have to determine if the existing performance monitors can integrate with the DR plans. Even more important is the need for performance data to be optimized for DR plan requirements, such as validating that data replication activities are being completed as planned. And what if you have hundreds or thousands of applications distributed everywhere? How can you determine that everything is performing normally? If a critical system begins to malfunction, it needs to be brought to your attention as quickly as possible.
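
As one example of performance data shaped for DR plan requirements, the following Python sketch checks whether replication jobs have completed recently enough. It is a simplified illustration; the job names, timestamps and the one-hour limit are hypothetical, and a real tool would pull the timestamps from the replication software rather than from a hard-coded table.

```python
from datetime import datetime, timedelta


def stale_replicas(last_sync, now, max_age):
    """Return the names of replication jobs whose last successful sync is older than max_age."""
    return sorted(name for name, ts in last_sync.items() if now - ts > max_age)


# Hypothetical "current time" and last successful sync times per replication job.
now = datetime(2024, 1, 1, 12, 0)
last_sync = {
    "orders-db": datetime(2024, 1, 1, 11, 50),   # 10 minutes old
    "file-share": datetime(2024, 1, 1, 7, 30),   # 4.5 hours old
    "mail-store": datetime(2024, 1, 1, 11, 0),   # exactly 1 hour old
}

# Assumed requirement: every replica must be no more than one hour behind.
late = stale_replicas(last_sync, now, timedelta(hours=1))
print(late)
```

Only jobs strictly older than the limit are reported, so the job that is exactly one hour old passes. A check like this, run on a schedule, is one way a lagging replica could be brought to your attention before a recovery depends on it.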


DR monitoring systems explained

A relatively new category of software products has emerged that provides actionable performance data that can be synchronized with DR plans. According to Jon Toigo, CEO at Toigo Partners International, there are three different types of disaster recovery monitoring tools. “There are software products that store information about your plan and create the plan documents. Next are tools that set up scenarios to help you fail over from one set of technology to another set, while providing data replication services. And third, there are passive tools that monitor the data protection processes.” We’ll focus on the third type of DR monitoring tool.

As a storage manager, your primary concern is likely data protection, so you’ll want something that monitors all protection-related activities, as described earlier. You may assume that normal due diligence activities would be sufficient and another specialized system wouldn’t be necessary.

“Despite your due diligence and efforts to build a high-availability data replica, you may still not have an exact duplicate of your production environment,” said Kathleen Lucey, FBCI, president of Montague Risk Management, vice president of the Business Continuity Institute’s (BCI) USA Chapter and vice chairperson of the BCI Global Membership Council. “There could conceivably be undetected incompatibilities among some components. If incompatibilities do exist, you may not know about them until you switch to the backup site and things don’t work.”

That same kind of attention must be paid to critical IT operational activities such as change management and configuration management. “Are today’s practices in service management and change management necessary to ensure the integrity of DR capabilities and plan documentation?” asked Douglas Weldon, FBCI, an IT executive with a major financial services firm and president of the BCI’s USA Chapter. “The answer is yes, these practices are indeed necessary, but additional tools are needed to fill in the gaps of on-going monitoring.”

“Aside from detecting operating weaknesses, the monitoring product should be able to flag all changes, regardless of size,” said Harvey Betan, MBCI and president of H. Betan Inc., a New York City-based business continuity consultancy. “Being an automated product, it can also inspect the IT environment faster than an individual.”

In an ideal world, data center managers would have a single interface that links all monitoring systems and provides a concise, integrated dashboard of all infrastructure performance. Reports on disaster recovery performance metrics would be one of the outputs.


Disaster recovery monitoring systems perform four primary functions:

  1. Data capture and discovery
  2. Data compilation
  3. Data analysis using predefined configuration data and performance metrics
  4. Data presentation
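
The four functions listed above can be pictured as a simple pipeline. The Python sketch below is a toy illustration under stated assumptions: the readings, metric names and limits are all hypothetical, and the capture stage simply returns canned data where a real product would query live systems.

```python
from collections import defaultdict


def capture():
    """Stage 1, data capture: stand-in for discovery; returns raw (asset, metric, value) readings."""
    return [
        ("backup-server", "last_backup_hours", 6),
        ("backup-server", "failed_jobs", 0),
        ("san-switch", "failed_jobs", 3),
        ("replica-db", "last_backup_hours", 30),
    ]


def compile_readings(readings):
    """Stage 2, data compilation: group the raw readings by asset."""
    compiled = defaultdict(dict)
    for asset, metric, value in readings:
        compiled[asset][metric] = value
    return dict(compiled)


# Hypothetical predefined limits used by the analysis stage.
LIMITS = {"last_backup_hours": 24, "failed_jobs": 0}


def analyze(compiled, limits):
    """Stage 3, data analysis: flag every metric that exceeds its predefined limit."""
    findings = []
    for asset in sorted(compiled):
        for metric, value in sorted(compiled[asset].items()):
            if metric in limits and value > limits[metric]:
                findings.append((asset, metric, value, limits[metric]))
    return findings


def present(findings):
    """Stage 4, data presentation: render the findings as a plain-text summary."""
    if not findings:
        return "All monitored assets within limits."
    return "\n".join(
        f"{asset}: {metric}={value} exceeds limit {limit}"
        for asset, metric, value, limit in findings
    )


findings = analyze(compile_readings(capture()), LIMITS)
report = present(findings)
print(report)
```

Keeping the stages separate means the presentation step could be swapped for a dashboard or an alert without touching how data is captured or analyzed.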

DR monitoring systems typically connect to the systems they monitor via internal networks (e.g., LANs) and external networks (e.g., the Internet). They “sniff” for the activities defined in their logic by sending out specially designed packets.

Data captured during the discovery process is analyzed according to predefined parameters. “These products collect information on applications, systems, hardware configurations, links between systems, etc., to produce a map of the IT infrastructure and the linkages,” Toigo Partners’ Toigo said. “They can also be integrated with configuration management database [CMDB] software for easy reference.” The CMDB stores data about IT infrastructure assets, relationships and configurations. But since it doesn’t have analytic capabilities, it’s difficult to effectively use that data to protect data and ensure business continuity.
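
A map of infrastructure linkages is useful because it can answer the question, "If this component fails, what else is affected?" The Python sketch below shows one simple way such a map might be queried. The asset names and dependencies are hypothetical, and this is not a description of how any of the products discussed here store or traverse their data.

```python
from collections import deque

# Hypothetical dependency map: each asset lists the assets it needs in order to run.
depends_on = {
    "order-app": ["app-server", "orders-db"],
    "orders-db": ["db-server", "san-lun-7"],
    "app-server": ["core-switch"],
    "db-server": ["core-switch"],
    "san-lun-7": ["storage-array"],
}


def impacted_by(failed, depends_on):
    """Return every asset that directly or indirectly depends on the failed asset."""
    # Invert the map so each asset points to the assets that rely on it.
    dependents = {}
    for asset, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, []).append(asset)

    # Breadth-first walk outward from the failed asset.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


impact = impacted_by("storage-array", depends_on)
print(impact)
```

In this example, losing the storage array affects the LUN, the database that sits on it and the application that uses the database, even though the application has no direct link to the array.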

Some examples of DR monitoring systems

Aptare Inc. StorageConsole 8 Fabric Manager. StorageConsole 8 is a monitoring system that addresses data storage; it’s designed to provide greater visibility and management capabilities into SANs. It’s part of an Aptare portfolio that also includes Backup Manager, Capacity Manager, Virtualization Manager and Replication Manager. The SAN mapping capability gives administrators a view of the SAN topology from server to fabric to storage systems. A change management feature performs a dependency analysis based on proposed changes to a SAN.

BMC Software Inc. Atrium Discovery and Dependency Mapping. The system features a library of predefined product configurations (32,000-plus, updated monthly); reference data for hardware power consumption and heat dissipation, and software end-of-life dates; as well as automated diagnostics that identify the location and cause of discovery issues. Software doesn’t have to be installed on discovered devices.

Continuity Software RecoverGuard. The latest version of this product, RecoverGuard 4.0, leverages data contained in the CMDB by scanning the infrastructure, performing analyses of the information it collects, and identifying issues that could impact availability, recoverability or data protection. RecoverGuard’s knowledge base contains more than 2,000 gap signatures and hundreds of potential data protection gaps.


Hewlett-Packard (HP) Co.’s Discovery and Dependency Mapping Advanced Edition (DDMA). This system automates discovery and dependency mapping of services, applications and underlying infrastructure. Mapping helps facilitate failure impact analyses, which can help minimize downtime. Improved visibility into the existing IT infrastructure helps reduce operational expense, defer capital expense and improve business uptime. According to HP, 80% of all service disruptions are caused by faulty changes. This product can provide visibility for improved change management.

VMware Inc.’s vCenter Application Discovery Manager. VMware’s product provides continuous discovery and mapping of applications, their dependencies and configurations. The system provides real-time visibility into the data center from an application standpoint. VMware support enables discovery of application services and configurations in virtual environments, and complements VMware vCenter Server by mapping the physical to virtual dependencies.

Pre-purchase planning for DR monitoring systems

As with any system acquisition, successfully implementing a DR monitoring application takes a fair amount of planning. Perhaps the most important part of the process is determining specific needs and what you hope to accomplish with a product. For more tips about evaluating and buying a DR monitoring app, see the chart below.

Chart: Points to consider when buying a DR monitoring application

“Do the basics: Make sure the product works and works on your configuration,” Montague Risk Management’s Lucey recommended. “Use change control and the monitoring tool to ensure that all production changes are also concurrently applied to backup configurations.”

Betan from H. Betan agreed and added, “The product should also not use too many resources that could slow down normal processing. Be sure to test the system thoroughly before it enters production.”

New tools for an old problem

DR monitoring tools can help ensure your disaster recovery efforts perform optimally by spotting potential problems before they occur. “A DR monitoring system can help users achieve several key goals -- they need to be able to recover and restore their data, re-host the applications and reconnect their users,” Toigo Partners’ Toigo said. Added executive Weldon: “These systems are at an early level of maturity, and since each IT environment will be different, remember that one size will not fit all.”

By identifying potential threats in real time, DR monitoring tools can enhance your ability to respond to and recover from emergencies, thus providing a higher level of disaster recovery and overall preparedness.

BIO: Paul Kirvan is an independent consultant/IT auditor and technical writer/editor/educator with more than 22 years of experience in business continuity and disaster recovery.
