New automated application recovery products, geared toward SMBs, keep Exchange running 24/7.
In the last five years, data recovery technologies have been introduced that let users recover data to almost any desired previous point in time, shorten recovery times to minutes for even very large data sets and minimize the amount of storage capacity required for data protection tasks. But there's a class of apps for which data recovery by itself isn't sufficient. They require recovery of data and of the app itself in the most automated way possible.
The ultimate objective for any recovery operation is to keep the business running; in the case of an application outage, lessening its impact on revenue or customer service is critical. The key recovery metrics are recovery point objective (RPO), the maximum amount of data loss the business can tolerate, expressed as a span of time, and recovery time objective (RTO), the maximum time the application can be unavailable. It's important to understand what the recovery requirements are for each app before an appropriate solution can be chosen.
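As a rough illustration of matching recovery requirements to candidate solutions, the sketch below compares an application's RPO and RTO targets against what each option can deliver. The class names, option names and all figures are hypothetical and chosen only for illustration; they don't describe any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryRequirement:
    app: str
    rpo_seconds: float  # maximum tolerable data loss, as a span of time
    rto_seconds: float  # maximum tolerable downtime


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    rpo_seconds: float  # data loss this option can hold to (assumed)
    rto_seconds: float  # recovery time this option can hold to (assumed)


def meets(req: RecoveryRequirement, opt: RecoveryOption) -> bool:
    """An option qualifies only if it satisfies both targets."""
    return opt.rpo_seconds <= req.rpo_seconds and opt.rto_seconds <= req.rto_seconds


def shortlist(req: RecoveryRequirement, options: list[RecoveryOption]) -> list[str]:
    """Names of the options that satisfy the requirement."""
    return [opt.name for opt in options if meets(req, opt)]


# Hypothetical figures for illustration only.
mail = RecoveryRequirement("mail", rpo_seconds=60, rto_seconds=300)
options = [
    RecoveryOption("nightly backup", rpo_seconds=24 * 3600, rto_seconds=4 * 3600),
    RecoveryOption("hot standby", rpo_seconds=5, rto_seconds=60),
]
print(shortlist(mail, options))
```

Running the same check per application makes it clear which apps need more than data recovery alone.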
CCA comes in two flavors: shared disk and shared nothing. In the shared-disk model, two or more nodes share a set of physical disks, which limits shared-disk clusters to local configurations. In the shared-nothing model, some form of storage-centric replication is used to keep two physically separate data stores (the source and the target) in sync across a network.
While FTC and CCA solutions do a good job of managing high-availability requirements, there are several issues with these approaches. Early FTC designs from companies like Tandem Computers, Sequoia Systems and Stratus Technologies were proprietary and didn't support mainstream apps. More recently, FTC companies like Hewlett-Packard (HP) Co. and Stratus built systems using commodity hardware and software, but with a proprietary software layer to connect the mainstream operating system to the redundant hardware architecture. This results in higher costs and lengthier development cycles for new releases. CCA is generally more complex than FTC because it requires custom scripting, strict change-control requirements and sophisticated administrators; however, it uses off-the-shelf hardware and software, making it generally more applicable to mainstream applications. But the complexity of clusters makes them less attractive, especially to smaller shops.
New technologies for application recovery
Recent developments have bolstered a new high-availability computing model that comes closer to managing applications for continuity, rather than just very rapid recovery.
Transactional replication. As opposed to traditional replication, transactional replication creates a logical copy of the application and allows the target server to be in a hot standby mode. Because the target server is application-aware, certain integrity checks for physical and logical corruption can be performed as transactions are applied on the target. This ensures that if a transaction commits on the target, it's a valid transaction with valid data. Because it's a form of replication, transactional replication requires the shared-nothing disk model discussed earlier.
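The idea of checking each transaction as it is applied on the target can be sketched as follows. This is a minimal model under stated assumptions, not how any particular database or messaging product implements it: `LogRecord`, `make_record` and `StandbyTarget` are hypothetical names, a SHA-256 checksum stands in for the physical-corruption check, and a strict sequence-number check stands in for the logical one.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    seq: int        # position in the source transaction log
    payload: bytes  # the transaction's data
    checksum: str   # digest computed on the source when the record was written


def make_record(seq: int, payload: bytes) -> LogRecord:
    """Build a record the way the (hypothetical) source would."""
    return LogRecord(seq, payload, hashlib.sha256(payload).hexdigest())


class StandbyTarget:
    """Hot standby that applies only records passing both checks."""

    def __init__(self) -> None:
        self.applied: list[bytes] = []
        self.next_seq = 1

    def apply(self, rec: LogRecord) -> bool:
        # Physical check: the payload must match the digest taken at the source.
        if hashlib.sha256(rec.payload).hexdigest() != rec.checksum:
            return False
        # Logical check (simplified): records must arrive in order, with no gaps.
        if rec.seq != self.next_seq:
            return False
        self.applied.append(rec.payload)
        self.next_seq += 1
        return True


target = StandbyTarget()
target.apply(make_record(1, b"create mailbox"))
# A payload altered in transit no longer matches its checksum and is rejected.
damaged = LogRecord(2, b"garbled", make_record(2, b"send message").checksum)
print(target.apply(damaged))
```

Because a record is appended only after both checks pass, everything in `applied` is known to be intact and in order, which is the property the article attributes to an application-aware target.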
Transactional replication is available for most major relational database management products, as well as for the major messaging products. These capabilities are accessible through external APIs and can be easily leveraged by third parties to create high-availability configurations.
Shadow server. Running an active copy of an application on a target server has a variety of positive implications. First, because the application is already running, application recovery times are very short (measured in seconds for local configurations and minutes for remote ones). Second, as the source and target are kept in sync through replication, data RPO is very good; given that data is checked for corruption as it's applied, it's as good as can be operationally achieved with continuous data protection products. Third, because the shadow server is hosting a logical copy of the application, a variety of recurring activities like backup, archiving or data mining can be offloaded from the source to the target. Fourth, the shadow server can be used to handle any form of maintenance without impacting the source. Patches can be applied and validated first on the shadow server, ensuring higher quality in ongoing maintenance operations. In addition, the shadow server can be used to minimize downtime associated with any planned maintenance operations.
These companies ship appliances that are preconfigured to be a shadow server for a primary Microsoft Exchange server. The appliance is connected to the network and configured for transactional replication. Appliances can be deployed locally for high availability, in remote configurations for disaster recovery, or in hybrid configurations that can provide disaster recovery and high availability. Some of the offerings also include outsourcing the ongoing management of the shadow server (see "ACC products for Microsoft Exchange," below).
The native transactional replication capabilities inherent in most messaging and database products will stop processing at the source and target sites once a "corrupt" transaction has been identified. While this is important for data integrity, it doesn't support the concept of application continuity. Some of the above products can detect data corruption and fix the problem so that the app can continue to run reliably. Suspect transactions are removed from the source and target logs, and the app is allowed to continue. Repair is attempted, and any transactions that can't be repaired are marked for review by administrators.
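The difference between the two behaviors can be shown with a small sketch. It is a simplified model, not any vendor's implementation: the function names are hypothetical, and the `is_valid` and `try_repair` callbacks are placeholders for whatever checks and repairs a real product performs.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def replay_stop_on_corruption(log: list[T], is_valid: Callable[[T], bool]) -> list[T]:
    """Native-style behavior: processing halts at the first suspect transaction."""
    applied: list[T] = []
    for txn in log:
        if not is_valid(txn):
            break
        applied.append(txn)
    return applied


def replay_with_quarantine(
    log: list[T],
    is_valid: Callable[[T], bool],
    try_repair: Callable[[T], Optional[T]],
) -> tuple[list[T], list[T]]:
    """Continuity-style behavior: pull suspect transactions out of the stream,
    attempt repair, and flag anything unrepairable for administrator review."""
    applied: list[T] = []
    needs_review: list[T] = []
    for txn in log:
        if is_valid(txn):
            applied.append(txn)
            continue
        repaired = try_repair(txn)
        if repaired is not None and is_valid(repaired):
            applied.append(repaired)
        else:
            needs_review.append(txn)
    return applied, needs_review


# Toy data: negative numbers stand in for corrupt transactions, and -99 is
# one that cannot be repaired.
log = [1, -2, -99, 4]
valid = lambda t: t >= 0
repair = lambda t: None if t == -99 else abs(t)
print(replay_stop_on_corruption(log, valid))
print(replay_with_quarantine(log, valid, repair))
```

In the first function everything after the suspect transaction waits; in the second, later transactions keep flowing while the unrepairable one is set aside.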
There are downsides to these products. First and foremost, since the shadow server mirrors the source application's data, the overall configuration requires twice the storage capacity. CCA models leverage a shared-disk store, so they're likely to require far less storage. Disk is relatively cheap, but the requirement for twice the amount of storage will limit the cost-effectiveness of these products, particularly for larger application environments.
Because the shadow server runs an active version of the primary application, application software licensing costs will double. CCA architectures require only one active license at any one time, and most cluster vendors give big discounts on the secondary application software license (if it's even required). For small applications, this second application software license may be a minimal expense, but it can become quite onerous for large applications.
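A back-of-the-envelope comparison shows why the doubling matters more as environments grow. The sketch below is an assumption-laden model: the function names, prices and discount are hypothetical, and it counts only storage and application licenses, ignoring server hardware, cluster software and administration costs.

```python
def acc_cost(data_tb: float, price_per_tb: float, license_cost: float) -> float:
    """Shadow-server model: two full data stores and two full licenses."""
    return 2 * data_tb * price_per_tb + 2 * license_cost


def shared_disk_cluster_cost(
    data_tb: float,
    price_per_tb: float,
    license_cost: float,
    secondary_discount: float,
) -> float:
    """Shared-disk cluster: one data store, plus a second license at a discount.

    secondary_discount is a fraction between 0 and 1 (1 means the second
    license is free or not required).
    """
    return data_tb * price_per_tb + license_cost * (1 + (1 - secondary_discount))


# Hypothetical figures: the gap widens as the data set grows.
for data_tb in (1, 50):
    acc = acc_cost(data_tb, price_per_tb=1000, license_cost=500)
    cca = shared_disk_cluster_cost(
        data_tb, price_per_tb=1000, license_cost=500, secondary_discount=0.5
    )
    print(data_tb, acc, cca, acc - cca)
```

With these made-up inputs the absolute cost gap scales roughly with the size of the data set, which is consistent with the point that the approach is easier to justify for smaller environments.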
As presently constituted, application continuity computing (ACC) is best suited for SMBs, not large enterprises. "We already have at least one of every kind of high-availability product there is, and while ACC offers the same types of fast recovery and support for mainstream applications we already enjoy, it wouldn't be cost-effective for our large data sets," says Steven Hirsch, senior VP of technology at NYSE Euronext.
Smaller customers, on the other hand, responded positively. "I'm a one-man IT shop and am managing over 20 applications," says David Clark, IT director at Jones Waldo, a Salt Lake City law firm. "I've already deployed one of these solutions for Exchange, and I would do it in a heartbeat to handle SQL Server." Hugh Smallwood Jr., CIO at Hagerstown, MD-based Ongoing Operations LLC, a business continuity provider for credit unions, agrees that the simplicity of the model is one of the primary reasons for deployment.
"We believe that a recovery strategy based on log replication has significant advantages over traditional shared-disk clustering, and we consciously built native support for this model into Exchange 2007 with features like CCR," says Perry Clarke, product unit manager for Microsoft Exchange Server. "The simplicity of this model makes it appealing not only for SMBs, but also for larger enterprises."
While today's ACC products are focused exclusively on Exchange, this will likely change over the next year. Several ACC vendors say they'll support SQL Server and SharePoint in the future. It's somewhat surprising that an Oracle-based product isn't yet available, but that's also likely to change in the next year. ACC vendors wouldn't talk about specific product roadmaps, but natural extensions would include better integration with other secondary data management functions such as enterprise backup (for visibility within backup catalogs), ediscovery, archiving, data classification, information tiering and destruction.
ACC isn't a replacement for backup and restore, which must still be done on a regular basis to provide for file-level and other partial restore requirements. It does offer data protection advantages, however, by allowing users to offload backup operations to the shadow server. "Do-it-yourself" approaches offering similar application recovery capabilities are available, but they clearly require more sophisticated administrative expertise and, as such, are likely to have higher management costs over time. In the mid and lower range, ACC is an attractive alternative to solutions like FTC, CCA and other approaches targeted at automating application recovery.