Disaster recovery (DR) monitoring tools are put in place to help track changes that occur in data storage environments, and they can notify users of those changes before a disaster occurs. Because they monitor changes, DR monitoring tools can often strengthen data protection and simplify disaster recovery processes. They have other benefits as well, such as reducing overall disaster recovery costs and the need for a large IT staff. Jon Toigo, CEO and managing principal of Toigo Partners International, discusses DR monitoring tools and how they work in this Q&A.
What is a disaster recovery monitoring tool?
A disaster recovery monitoring tool is a piece of software along the spectrum of disaster recovery software. There are three different types of disaster recovery monitoring tools. First, there are software products that are used to store the information you've collected about your plan, and perhaps even to create the planning documents. Second, there are tools that help you set up failover scenarios -- software that fails over from one set of technology to another under certain circumstances, and provides the background services that are required to replicate data. And third, there are passive tools that basically aggregate the data protection processes that are going on, whether those are data backups, where you get an ongoing report on the successful completion of jobs, or ongoing site-to-site data replication over a wide-area network (WAN), where technology has been deployed to replicate data across networks -- an approach also known as a tapeless backup environment.
All of these disaster recovery tools are very important, but what it all boils down to is data protection, which is at the core of disaster recovery. Without data you're dead in the water, and there would be no reason to even make a recovery effort.
How can DR monitoring tools help with data protection?
In the area of data protection, obviously we've got different methodologies being used: tape backup, disk-to-tape and in some cases disk-to-disk, whether it's local or remote mirroring across wide-area networks. Most companies probably already have a lot of data protection processes, and some of them are performed by software they already use. For example, Oracle has its own backup procedure and drives its own data out into a data backup environment. Certain Microsoft applications have built-in data protection capabilities and perform backups; you can even do it on your desktop in Windows 7. In addition to the tools built into business applications and operating software, a lot of people go out and buy data backup software or replication software. This software is freestanding, and it copies data from whatever source you point it at to whatever target you point it at. In the case of backup, the data may go to a set of disks, or to disk and then migrate over to tape, or it might go directly to tape. In site-to-site replication, the software is just replicating data from one location to another. Oftentimes virtual tape libraries (VTLs) are used in that role: you back up to disk, and then software replicates the images or the data sets that have been backed up to those disks across the WAN to a comparable set of hardware somewhere else.
The nice thing about using third-party software instead of built-in data backup and replication technologies is that you don't owe your soul to the company store: you don't have to buy the same vendor's gear at both the local site and the remote site. So these third-party products are a second place where data protection mechanisms come into play. The third approach is the functionality vendors build onto their hardware. You may have value-add features that do automatic replication, point-in-time splitting of mirrors, and point-in-time data sets that are pushed across the WAN to identical hardware over at the recovery center. All of that is handled by a storage array with extra software on it, so you don't have to worry about it on an ongoing basis.
So you've got three layers here. You've got the processes performed by the operating system and the business applications themselves. You've got third-party software for data backup and replication. And then you've got the hardware approaches used by many vendors in the storage arena. The real task as your company grows is to coordinate and confirm that all those processes are happening as expected. It's a hassle to stop a disk-to-disk replication or mirror process and validate that the right data is actually being replicated on the target. Typically you have to quiesce the application and break the mirror, and then you have to do file-by-file comparisons on both the local and the remote site in order to validate that the mirror is working. Backups tend to provide you a report at the completion of the job, or notice that a job has not completed. So if you're using tape backup, you have a lot of information that's generated by that process. And to a greater or lesser extent, some of the business applications and operating system-level procedures will also give you some reports.
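To illustrate that file-by-file validation step, here is a minimal sketch in Python. It assumes (hypothetically) that the primary volume and its replica are both mounted as local directories, and that the application has already been quiesced so the trees are stable while they're compared:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_mirror(primary: Path, replica: Path) -> list:
    """Compare every file on the primary against the replica.

    Returns a list of discrepancies; an empty list means the mirror
    held the same data at the moment of the check.
    """
    problems = []
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        dst = replica / src.relative_to(primary)
        if not dst.is_file():
            problems.append("missing: %s" % dst)
        elif checksum(src) != checksum(dst):
            problems.append("differs: %s" % dst)
    return problems
```

A real validation would also flag extra files on the replica and compare metadata; the point is only that the per-file compare described above is mechanical enough to script.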
How do you coordinate all of the different aspects of DR monitoring tools?
Let's say you've got thousands of applications distributed all over your environment, and you don't have a single pane of glass where you can coordinate and aggregate all of that information. You still have to confirm that everything is working as expected and that you have a valid set of data to recover from. That's a huge challenge and a huge cost accelerator in disaster recovery. Typically you have to allocate personnel and their time to manually check and track all of those outcomes. The purpose of what we're calling DR software in this context is to aggregate those processes and deliver them to you in a sensible way. I can go into a couple of different flavors of those. There are products -- let's call them simple aggregators -- such as Continuity Software's RecoverGuard that can monitor these processes. It goes out and looks at all the mirroring going on in the hardware, and it gives you an ongoing view of the health of those mirrors and those hardware-controlled data replication processes. So if you've got EMC Corp. arrays that are replicating to other EMC arrays, it will monitor those processes and reassure you that you are in fact doing some data replication. It doesn't get granular enough to tell you that you're replicating the right data, but it does tell you that the replication process you set up is occurring on an ongoing basis and without interruption.
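As a sketch of what such an aggregator does internally, the snippet below normalizes status records from different protection processes (backup jobs, array mirrors, software replication) into one failures-first summary. The `ProtectionStatus` record and `aggregate` function are hypothetical names for illustration, not RecoverGuard's actual API:

```python
from dataclasses import dataclass

@dataclass
class ProtectionStatus:
    """One normalized status record from some data protection process."""
    source: str  # e.g. "tape-backup", "array-mirror", "sw-replication"
    job: str     # job or mirror-pair identifier
    ok: bool     # did the last run or check succeed?

def aggregate(statuses):
    """Boil a pile of status records down to a single-pane summary,
    surfacing the failing jobs that need a human's attention."""
    failures = [s for s in statuses if not s.ok]
    return {
        "total": len(statuses),
        "failed": len(failures),
        "failing_jobs": ["%s/%s" % (s.source, s.job) for s in failures],
    }
```

The hard part in a real product is the feeds themselves -- collecting from each vendor's API and normalizing the records -- which is exactly the certification work described below.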
What are the limitations of DR monitoring tools?
The limitation of an aggregator approach that's very hardware focused is that you're not able to see software replication with that particular product. If you're running CA XOsoft or Neverfail or any of these other software-based data replication products, you can't trap them or report on them on the screen with RecoverGuard. You only see certain hardware replication products, and only the ones that RecoverGuard has prioritized and added to its list of supported hardware platforms. There's a certain exercise the vendor has to go through to get access to an application programming interface (API) from each hardware maker: they have to collect that information from the API, normalize it and then present it on their screen. All of this requires a relationship with the hardware vendor, they have to be certified with that vendor, and so on. So you're limited in the types of equipment you can monitor and the number of replication processes you can monitor, and you can only monitor hardware-related replication processes using that product.
Could software products such as CA XOsoft and Neverfail be used as DR monitoring tools?
These products are not, strictly speaking, just for DR monitoring or data protection monitoring; they do that as an additional value-add feature. They're primarily known as geoclustering tools. They allow you to actually replicate the data, and in some cases provide push-button failover of the application from one designated site to another. And they do it all in software, so you don't need to have the same equipment at both the local and remote site. It's kind of a cool strategy if you can justify the network costs and rationalize the need for the technology based on the requirements of the application itself.
The nice thing about the CA XOsoft product, which I know rather well because I've used it a lot, is that it can also aggregate input from the reports coming out of the software I may be using in parts of my organization to do tape backups. I get all of the backup information normalized and reported on the same screen being used to monitor ongoing disk-to-disk replication, and that's all handled by that product. This isn't tracking the hardware-level stuff; it doesn't do any work in securing API access to hardware platforms. In fact, what these products are doing is kind of usurping the value-add that you're paying a lot of money for on the hardware -- the functionality that allows you to do the mirroring. So instead of doing hardware-to-hardware mirroring across like hardware, you can do the mirroring at a software layer that is completely hardware agnostic.
How can DR monitoring tools reduce your costs and simplify DR?
There are a lot of cost savings from using disaster recovery monitoring tools. Basically, for my mission-critical, always-on applications, I'm able to track and make sure my mirroring is on, and that's accomplished using this software. I can also make sure that my tape backups are going swimmingly for those applications that don't need instantaneous failover, whose absence I can deal with for hours or days following an interruption event. I aggregate all those processes onto a single pane of glass, and that means I can have one or two people monitoring all the data protection I've got going on in my shop. Now, the CA product and Neverfail and some of the others are tied into other applications as well. They can monitor and report on things that are going on in VMware, for example, or in Hyper-V, where there are a lot of data replication services available for use. In the case of CA, it can also monitor the activity that's going on in the Microsoft environment. So if I'm using an app that can perform its own backups, I can trap and report that information as well.
That's basically your range of options. You do this in software, which sometimes ignores the hardware value-add features, or you do it with a product like RecoverGuard that looks at the hardware-based replication processes but ignores the software processes. It'd be nice if the two of them could get their act together and come up with some common interoperability that would allow them to share data. Then, if you're using a mixture of hardware-based replication for data protection alongside software-based functionality, you'd still be able to aggregate all of it onto a single screen. That's really the point. The other value of these tools is that they reduce the testing load on you. Testing is the long-tail cost of disaster recovery. The acquisition of hardware and software components for DR is a very small sliver of the overall budget required to do disaster recovery; the remainder is tied up in the personnel and the time required to do testing. Of course, you need to test the plan in order to keep it up to date with changes in the business and in the infrastructure.
What are the major benefits of DR monitoring tools?
If you use a DR monitoring tool, you can spot gaps a lot more readily than you can without one, and you may be able to address those gaps on a day-to-day basis. You can even perform a failover every day if you want, just to validate that the disaster recovery strategy you have in place is actually going to work if you ever need it. None of that is disruptive to your current environment, none of it takes anybody off premises, and there's no need to schedule huge tests to exercise that functionality. So one of the biggest and most time-consuming aspects of testing -- the actual performance of recovery or failover -- can be handled on a day-to-day basis as part of normal operations. That's what one of these DR monitoring tools brings to the party: the ability to shortcut the testing process. It doesn't eliminate the need for testing, but when it does come time to test, the recovery itself has already been validated.
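A daily drill of that kind can be as small as restoring one backup copy into a scratch area and verifying it matches the live source. The sketch below is hypothetical: it assumes the backup target is reachable as a local path, and the plain file copy stands in for whatever restore tooling is actually in use:

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def daily_restore_drill(source: Path, backup_copy: Path, scratch: Path) -> bool:
    """Restore the backup copy into a scratch directory, then confirm it
    matches the live source. Non-disruptive: the source is only read."""
    restored = scratch / backup_copy.name
    shutil.copy2(backup_copy, restored)  # stands in for the real restore step
    return sha256(restored) == sha256(source)
```

Run under a scheduler every night, a check like this turns "will the restore work?" from an annual test-day question into a daily yes/no answer.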
Basically, all you're doing is testing logistics. You're testing whether everyone knows the job they're supposed to do: Is our call tree up to date and accurate? Do I have the right contact names and the keys for decrypting data? All the other things that go into a successful recovery can be tested without waiting for recoveries to occur, rebuilding platforms or restoring data from tape. That stuff can be done every day if you're using a good DR monitoring tool. You can have high confidence that you've got good data; now the only things you have to test are whether the users know what they're doing and whether the key members of the recovery team know what they're doing. That's a much less exhausting set of tasks, less prone to failure and less likely to jam up the testing process or require a lot of retests. I think using a good DR monitoring tool is a win-win.
What companies should be using DR monitoring tools and how do you know if they are right for you?
A small business doesn't really need DR monitoring tools. In a small company, all of its data protection can be accomplished with, if you wanted to, a single DVD. I always tell small firms that the only thing that stands between them and the successful recovery of their business, if they ever had a disaster, is a one-dollar piece of media. All they have to do is make a backup to a DVD or a flash key or whatever they want to use on a weekly basis; depending on the rate at which the data changes, the sensitivity of that information and what line of business they're in, they may also want to encrypt it. Small firms typically don't have the complexity of infrastructure, applications or different types of data being replicated, so they don't need DR monitoring tools. However, a medium-sized or larger company with tens, hundreds or even thousands of different business processes supported by different applications and different data sets has a coordination nightmare, and the only way to really drive cost out of that kind of infrastructure is to deploy a disaster recovery monitoring tool.
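For the small-business case, that weekly copy to a one-dollar piece of media really is scriptable in a few lines. Here's a sketch, assuming the removable media (a DVD staging folder, a flash key, etc.) is mounted as a local directory; for sensitive data you would encrypt the resulting archive afterwards with a tool such as gpg:

```python
import tarfile
from datetime import date
from pathlib import Path

def weekly_backup(data_dir: Path, media_dir: Path) -> Path:
    """Write a dated, compressed archive of the business data onto
    the mounted removable media and return the archive's path."""
    archive = media_dir / ("backup-%s.tar.gz" % date.today().isoformat())
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=data_dir.name)
    return archive
```

Scheduled weekly (cron, Task Scheduler), this plus an occasional test restore is a reasonable floor for a firm with one server's worth of data.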