Why monitoring your data protection policy is like herding cats

Ensuring your data protection policy will actually work when needed may require the use of multiple monitoring tools.

As much as the server hypervisor and software-defined data center evangelists would like to argue that high-availability (HA) clustering with failover has replaced traditional data protection and business continuity (BC) techniques, the truth is quite a bit more complex. HA has always been one part of a spectrum of techniques available for building a data protection policy and application recovery strategies, but not every application requires the "always on" method of recovery or merits the high costs associated with HA.

Done properly, data protection requires the assignment of the right set of protection services to the right data. The "right" service in this context is defined as the most appropriate, given the criticality of the application or business process being protected and its associated recovery time requirements. The "right" data refers to data that fits in a particular class. Data has no value except for what it inherits -- like so much DNA -- from the business process it serves, and since not all business processes are equally critical to an organization, it stands to reason that data is not all of the same class. So, planners probably don't need to spend the money or effort delivering HA services to non-critical workloads that don't need to be restored for days -- or even weeks -- following an interruption event.
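The matching of services to data classes described above can be sketched in a few lines of Python. This is a minimal illustration only: the tier thresholds, the service labels attached to them and the `Workload` type are hypothetical assumptions, not figures or names taken from any product or standard.

```python
from dataclasses import dataclass

# Hypothetical tiers (assumption): each entry is the longest recovery time,
# in hours, that the tier is meant to serve, plus the service assigned to it.
TIERS = [
    (0.25, "HA clustering with failover"),
    (4.0, "mirroring or continuous data protection"),
    (24.0, "disk-to-disk replication or snapshots"),
    (float("inf"), "backup with offsite storage"),
]


@dataclass
class Workload:
    name: str
    # Recovery time requirement inherited from the business process served.
    recovery_time_hours: float


def assign_service(workload: Workload) -> str:
    """Return the service of the first tier whose ceiling covers the workload."""
    for max_hours, service in TIERS:
        if workload.recovery_time_hours <= max_hours:
            return service
    raise ValueError("unreachable: the last tier has no upper bound")


if __name__ == "__main__":
    examples = [
        Workload("order-entry", 0.1),
        Workload("payroll", 48),
        Workload("archive", 24 * 14),
    ]
    for w in examples:
        print(f"{w.name}: {assign_service(w)}")
```

The point of the sketch is that the expensive service is reserved for the workloads whose recovery window demands it; everything that can wait days or weeks falls through to the cheapest tier.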

The bottom line is that there's no "one size fits most" data protection strategy. For most shops, a data protection strategy should combine different technologies to deliver various recovery times deemed appropriate to a given workload. Often, a data protection policy evolves over time, with different tools and utility software being brought to bear to provide layer over layer of data protection, usually in response to perceived levels of threat.

Beyond replication

For some data, organizations take great pains to protect and validate it at the moment it's first created, as a hedge against errors in interfaces and protocols that might corrupt bits, or to protect against faulty media. Above the level of media I/O, other techniques may be applied to replicate data as a way to circumvent application errors (continuous data protection), equipment failures (mirroring) or facility issues (backups with offsite storage or synchronous replication across metropolitan area networks). And with 100-year disasters (severe weather events, floods and so on) now seemingly occurring every year, remote asynchronous replication to a backup facility at least 80 kilometers distant is considered the gold standard for protection against big geography events.

Truth be told, a mix of data protection processes is probably being used for most workloads today. Some processes are initiated and controlled by the app software itself, by the database undergirding the app, or by the operating system or hypervisor software if you're running a virtual server. Add in third-party utility software (e.g., backup software from ARCserve [formerly from CA Technologies], IBM Tivoli, Symantec and so on) that's been purchased to perform backups or take snapshots, or data protection services provided at a data management layer (e.g., CommVault or Tarmin) or at a storage virtualization layer (e.g., DataCore), and managing a multi-layer "defense-in-depth" strategy can quickly become complicated.

And that's not all. If you add in hardware-based data protection services -- including on-array mirroring, replication and snapshots performed using value-added software "joined at the hip" to proprietary array controllers -- the data protection service mix mutates into something approximating a herd of cats that is increasingly difficult to monitor and manage.

Cutting through the complexity

Therein lies the appeal of the myth of a single "HA for everything" strategy, enabled by server virtualization. While it would be nice to simplify data protection with clustered failover, the facts on the ground don't support that approach.

For one thing, even if the most optimistic projections of hypervisor deployment hold true, 2016 will still see approximately 21% of mission-critical x86 workloads (revenue-producing, high-performance transaction-processing apps) running on 75% of server hardware without a hypervisor. So, separate strategies will be required for virtualized and non-virtualized servers.

Within the virtualized x86 workloads, some virtual machines (VMs) and their data "disks" (VMDK and VHD files) are simply less important than others. As with pre-virtualized environments, there are many virtual apps, but not all are mission-critical. And like traditional environments, some apps/VMs are used a lot and others not so much, a fact that impacts the frequency with which data needs to be backed up or copied.

If anything, server virtualization has added to the complexity of data protection. Consider that the leading hypervisor software vendor now encourages its users to break up their SANs and deploy direct-attached storage (DAS) behind each virtual host system instead. Once this is done, the vendor recommends the use of on-array mirroring services to replicate data from one "server-side" DAS array to any other connected to a server that might possibly host a given guest machine in the future. The resulting proliferation of replication processes and related traffic can strain local-area networks and exacerbate an already knotty problem with mirroring: Most mirrors go unchecked owing to the dicey process involved in checking mirror consistency. (Checking a mirror typically requires you to quiesce the application, flush the cache to disk one, replicate to disk two, shut down replication, compare the contents of disk one and two for consistency, cross your fingers, restart the mirror, restart the app and pray that everything synchronizes. And then you need to repeat the process for each mirror.)
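The comparison step in the mirror check described above can be sketched as follows. This is a minimal illustration under loud assumptions: both copies are quiesced and readable as ordinary files or disk images, and the function names and chunk size are hypothetical choices, not part of any vendor's tooling. It covers only the "compare disk one and disk two" step, not the quiescing, flushing or restarting around it.

```python
import hashlib
from itertools import zip_longest

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per comparison unit (arbitrary assumption)


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Yield a SHA-256 digest for each fixed-size chunk of the file at path."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield hashlib.sha256(chunk).digest()


def mismatched_chunks(primary, replica, chunk_size=CHUNK_SIZE):
    """Return the indexes of chunks that differ between the two copies.

    If one copy is shorter, its missing chunks are reported as mismatches.
    """
    pairs = zip_longest(
        chunk_digests(primary, chunk_size),
        chunk_digests(replica, chunk_size),
    )
    return [index for index, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python check_mirror.py disk_one.img disk_two.img
    bad = mismatched_chunks(sys.argv[1], sys.argv[2])
    print("mirror consistent" if not bad else f"mismatched chunks: {bad}")
```

Comparing digests chunk by chunk keeps memory use flat and tells you where the copies diverge, but it does nothing to remove the downtime that makes the full procedure so unpopular.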

Monitoring a collection of DR processes

In the final analysis, the current reality is that numerous hardware and software processes are typically brought to bear in the hopes of protecting data from a broad range of threats. Each process has a different price tag and offers a different time-to-data value, and thus must be fitted appropriately to available budget and recovery time objectives/recovery point objectives. The processes must also be coordinated and scheduled to avoid having any deleterious effects on application/VM performance or network throughput. Ideally, the processes should be transparent to users but capable of full monitoring and ad-hoc testing by administrators to confirm that the expected protection is being applied to the expected data targets, and that the result is a redundant data set capable of being recovered within a specified timeframe.
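One piece of that monitoring, confirming that each data target has a sufficiently recent copy, can be sketched as below. The `CopyStatus` record and its fields are hypothetical; in practice the timestamps would have to be collected from whatever backup, replication or array tools are actually in use, which is exactly the hard part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CopyStatus:
    workload: str
    last_good_copy: datetime  # completion time of the latest verified copy
    rpo: timedelta            # recovery point objective for this workload


def rpo_violations(statuses, now=None):
    """Return the workloads whose newest verified copy is older than their RPO.

    All datetimes should be timezone-aware so they can be compared safely.
    """
    now = now or datetime.now(timezone.utc)
    return [s.workload for s in statuses if now - s.last_good_copy > s.rpo]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    statuses = [
        CopyStatus("order-entry", now - timedelta(minutes=5), timedelta(minutes=15)),
        CopyStatus("payroll", now - timedelta(hours=30), timedelta(hours=24)),
    ]
    print(rpo_violations(statuses, now))
```

A check like this only proves that a copy exists and is recent. Whether it can be restored within the required timeframe still has to be confirmed by ad-hoc testing.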

The ability to monitor the data protection service is vital to verifying that it's being applied correctly. The ability to test the results of the data protection service on an ad-hoc basis is key to reducing the burden placed on formal plan testing, the "long-tail cost" of continuity planning. If you can confirm continuously that the right data is being backed up and can be restored when and if necessary, there's no need to test data recovery in a formal test event. That will cut down the testing work and reduce associated costs.

The challenge in monitoring and validating data protection service processes and outcomes is the lack of any comprehensive technology that will capture information about the processes and feed it back in a coherent, single-pane-of-glass dashboard. The world of data protection service monitoring remains bifurcated.

Excellent products, such as ARCserve Unified Data Protection (UDP), provide a combined set of software-based services including tape backup, disk-to-disk replication and mirroring, and replication with clustered failover that can be deployed readily and monitored coherently using the product's own dashboard application. Similar capabilities can be gleaned from other "geo-clustering" products, such as Neverfail Group's Neverfail product line (which is used under the covers in many VMware HA solutions to provide distance failover) and Vision Solutions' Double-Take software.

Two dashboards may be better than one

Unfortunately, these products don't capture or report status information on data protection services created and managed using different software or hardware-based initiators. Perhaps the best hardware-centric monitor is Continuity Software's AvailabilityGuard suite. The company addresses the key challenges confronting business continuity and data protection: configuration drift, synchronicity gaps in mirrors or replicas, and the inefficacy of using formal disaster recovery (DR) tests to validate data restorability. Arguably, it is furthest along with technology to resolve these issues by providing near real-time information on process status -- for the processes it monitors, that is. In contrast to products such as ARCserve UDP, Continuity Software products tend to focus on hardware-based data protection services and not software-based services.

Even in the best of all possible worlds, it may be necessary to have at least two consoles to monitor a defense-in-depth data protection strategy encompassing both hardware and software services. However, the industry's attention to (and consumer interest in) unified data protection (a term first coined by Unitrends, but co-opted today by numerous vendors) bodes well for future prospects of unifying data protection service monitoring and management functions.

Recently, EMC has been touting its Data Protection Advisor as a unified data protection management solution, referring to the product's capability to automate and centralize the collection and analysis of data protection services. Unfortunately, according to its datasheet, the only storage hardware processes Data Protection Advisor supports are EMC's own, although it does enable the integration of status information from third-party backup software products, database platforms and VMware-controlled replication processes. Not quite a true Swiss Army knife, but getting a bit closer.

What's really needed is a distributed systems equivalent of 21st Century Software's mainframe-oriented DR/VFI. The company was an early innovator in providing transparency in mirroring so that mirrored volumes could be compared for consistency without taking down the application. Now, it's making great strides in bringing visibility to active-active clustering and synchronous replication processes. This is technology with tremendous application to IT in all of its topologies.

Also on a wish list for the future is the RESTful enablement of data protection services. Given that the core of data protection is basic data copy, using simple RESTful protocols for moving data copies between targets -- whether two disks, two arrays or two data centers -- would seem to provide a standardized, expedient approach that could truly "universalize" the delivery of data protection services.

For now, "unified data protection service management" is a lot like the term "integrated network management" of a decade ago. The term has several hundred meanings ranging from "all products are listed on the same brochure and are therefore integrated" to "all processes share a common database architecture." So read all spec sheets carefully before you select a product for deployment. Better yet, use the free trial copies available for most of these products to determine which one(s) deliver the capabilities you need to herd your data protection cats.

About the author: 
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.
