Automate application recovery

Today's application continuity computing (ACC) products are best suited for small- and medium-sized businesses, and are focused exclusively on Exchange, which most companies now consider a business-critical application. But the concentration on Exchange will likely change over the next few years, as several ACC vendors plan support for SQL Server and SharePoint in the future.

New automated application recovery products, geared toward SMBs, keep Exchange running 24/7.

Every company has certain applications that must be managed for high availability. Whether driven by business or regulatory mandates, these applications must be available 24/7 or as close to that as possible. Keeping these applications up and running while performing all of their necessary administration and maintenance is a real challenge. Products that meet recovery requirements for both data and apps are starting to appear; however, they're currently geared toward small- and medium-sized businesses (SMBs) and concentrate mainly on Microsoft Exchange.

In the last five years, data recovery technologies have been introduced that let users recover data to almost any desired previous point in time, shorten recovery times to minutes for even very large data sets and minimize the amount of storage capacity required for data protection tasks. But there's a class of apps for which data recovery by itself isn't sufficient. They require recovery of data and of the app itself in the most automated way possible.

The ultimate objective for any recovery operation is to keep the business running; in the case of an application outage, lessening its impact on revenue or customer service is critical. The key recovery metrics are recovery point objective (RPO) and recovery time objective (RTO). It's therefore important to understand what the recovery requirements are for each app before an appropriate solution can be chosen.
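To make the two metrics concrete, here is a minimal Python sketch of how the RPO and RTO actually achieved in a single outage might be computed and compared with an application's stated objectives. The timestamps, objective values and function names are illustrative assumptions, not figures from any product discussed here.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_recoverable_point: datetime, failure_time: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable point and the failure."""
    return failure_time - last_recoverable_point

def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: time between the failure and the return to service."""
    return service_restored - failure_time

def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_objective: timedelta, rto_objective: timedelta) -> bool:
    """True only if both the data-loss and downtime windows are within target."""
    return rpo <= rpo_objective and rto <= rto_objective

# Hypothetical outage: last good replica 5 seconds before the failure,
# service back 45 seconds after it.
failure = datetime(2008, 1, 15, 10, 0, 0)
last_point = failure - timedelta(seconds=5)
restored = failure + timedelta(seconds=45)

rpo = achieved_rpo(last_point, failure)
rto = achieved_rto(failure, restored)
ok = meets_objectives(rpo, rto,
                      rpo_objective=timedelta(minutes=1),
                      rto_objective=timedelta(minutes=5))
```

Running the same check against each application's own objectives is one simple way to see which apps need automated recovery and which can live with conventional data recovery.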

RTO/RPO requirements
Applications with high-availability requirements usually have very short RPO/RTO requirements. For those environments, some form of automated application recovery is often deployed. Two distinct architectures have been available in this space: fault-tolerant computing (FTC) and clustered computing for availability (CCA). FTC uses fully redundant servers with specialized operating systems that run a mirror copy of the application across both "halves" of the server, providing instant and fully transparent recovery in the event of a hardware failure. CCA uses separate server "nodes" connected across a network, as well as fault-management software that monitors for failures and can switch an application workload (including an application, its clients and data) to another node in the cluster. Both approaches do a good job of addressing planned and unplanned downtime, allowing companies to manage application availability to very high levels. FTC platforms usually provide higher overall application availability than CCA architectures.

CCA comes in two flavors: shared disk and shared nothing. In the shared-disk model, two or more nodes share a set of physical disks, which limits this model to local configurations. In the shared-nothing model, some form of storage-centric replication is used to keep two physically separate data stores (the source and the target) in sync across a network.
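The fault-management role in a cluster can be pictured with a small Python sketch: nodes report heartbeats, and workloads on a node whose heartbeat has gone stale are moved to a healthy node. The heartbeat timeout, the "least-loaded node" placement rule and all names are assumptions made for illustration; commercial cluster software uses its own detection and placement logic.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    last_heartbeat: float               # seconds, from any monotonic clock
    workloads: list = field(default_factory=list)

def fail_over(nodes, now, timeout):
    """Move workloads off nodes with stale heartbeats; return the moves made."""
    healthy = [n for n in nodes if now - n.last_heartbeat <= timeout]
    down = [n for n in nodes if now - n.last_heartbeat > timeout]
    moves = []
    if not healthy:
        return moves                    # nowhere to fail over to
    for node in down:
        while node.workloads:
            workload = node.workloads.pop(0)
            # Assumed placement rule: pick the healthy node with the fewest workloads.
            target = min(healthy, key=lambda n: len(n.workloads))
            target.workloads.append(workload)
            moves.append((workload, node.name, target.name))
    return moves

# Hypothetical two-node cluster: node-a has been silent for 30 seconds.
node_a = Node("node-a", last_heartbeat=100.0, workloads=["exchange"])
node_b = Node("node-b", last_heartbeat=129.0)
moves = fail_over([node_a, node_b], now=130.0, timeout=10.0)
```

Even this toy version hints at why clusters demand careful administration: the timeout, the placement rule and the handling of the "no healthy node" case all have to be chosen and tested for each environment.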

While FTC and CCA solutions do a good job of managing high-availability requirements, both approaches have drawbacks. Early FTC designs from companies like Tandem Computers, Sequoia Systems and Stratus Technologies were proprietary and didn't support mainstream apps. More recently, FTC companies like Hewlett-Packard (HP) Co. and Stratus built systems using commodity hardware and software, but with a proprietary software layer to connect the mainstream operating system to the redundant hardware architecture. This results in higher costs and lengthier development cycles for new releases. CCA is generally more complex than FTC because it requires custom scripting, strict change control and sophisticated administrators; however, it uses off-the-shelf hardware and software, making it more broadly applicable to mainstream applications. Even so, the complexity of clusters makes them less attractive, especially to smaller shops.

Defining ACC solutions
Several distinct capabilities set application continuity computing (ACC) products apart from existing fault-tolerant computing (FTC) and clustered computing for availability (CCA) alternatives.

Simplicity of deployment. The ACC model is built around pre-packaged, application-specific hardware/software solutions targeted for easy deployment.

Transparent recovery. Because ACC products maintain a hot standby on a shadow server, recovery occurs within seconds for local configurations, and the same application capabilities are supported in both pre- and post-failover modes.

Based on industry-standard hardware/software. ACC products use commodity hardware/software components and leverage the native data validation capabilities of the application.

New technologies for application recovery
Traditional replication products have been storage centric, replicating at either the block or volume level. They create a physical copy of the data and require the target server to be in standby mode. In addition, these products don't offer any way to monitor and detect logical and/or physical corruption in application objects.

Recent developments have bolstered a new high-availability computing model that comes closer to managing applications for continuity, rather than just very rapid recovery.

Transactional replication. As opposed to traditional replication, transactional replication creates a logical copy of the application and allows the target server to be in a hot standby mode. Because the target server is application-aware, certain integrity checks for physical and logical corruption can be performed as transactions are applied on the target. This ensures that if a transaction commits on the target, it's a valid transaction with valid data. Because it's a form of replication, transactional replication requires the shared-nothing disk model discussed earlier.
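The idea of checking each transaction as it's applied on the target can be sketched in a few lines of Python. The record layout, the SHA-256 checksum (standing in for a physical-corruption check), the sequence-number test (standing in for a logical check) and the in-memory dictionary used as the data store are all assumptions for illustration; real messaging and database products use their own log formats and integrity checks.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    seq: int          # position in the source's transaction log
    key: str          # object being written (for example, a message ID)
    payload: bytes
    checksum: str     # hex SHA-256 of payload, computed at the source

def make_transaction(seq: int, key: str, payload: bytes) -> Transaction:
    """Build a transaction the way a (hypothetical) source server would."""
    return Transaction(seq, key, payload, hashlib.sha256(payload).hexdigest())

class ShadowTarget:
    """Hypothetical application-aware target that validates before committing."""

    def __init__(self):
        self.store = {}
        self.last_applied = 0

    def apply(self, txn: Transaction) -> None:
        # Logical check: transactions must arrive in log order with no gaps.
        if txn.seq != self.last_applied + 1:
            raise ValueError(f"out-of-order transaction {txn.seq}")
        # Physical check: the payload must match the checksum taken at the source.
        if hashlib.sha256(txn.payload).hexdigest() != txn.checksum:
            raise ValueError(f"corrupt transaction {txn.seq}")
        self.store[txn.key] = txn.payload
        self.last_applied = txn.seq

target = ShadowTarget()
target.apply(make_transaction(1, "msg-1", b"hello"))
```

Because nothing is committed until both checks pass, anything that does land in the target's store is known to be valid at the time it was applied.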

Transactional replication is available for most major relational database management products, as well as for the major messaging products. These capabilities are accessible through external APIs and can be easily leveraged by third parties to create high-availability configurations.

Shadow server. Running an active copy of an application on a target server has a variety of positive implications. First, because the application is already running, application recovery times are very short (measured in seconds for local configurations and minutes for remote ones). Second, as the source and target are kept in sync through replication, data RPO is very good; given that data is checked for corruption as it's applied, it's as good as can be operationally achieved with continuous data protection products. Third, because the shadow server is hosting a logical copy of the application, a variety of recurring activities like backup, archiving or data mining can be offloaded from the source to the target. Fourth, the shadow server can be used to handle any form of maintenance without impacting the source. Patches can be applied and validated first on the shadow server, ensuring higher quality in ongoing maintenance operations. In addition, the shadow server can be used to minimize downtime associated with any planned maintenance operations.

New players
By combining transactional replication and shadow server technologies, a small number of storage vendors are delivering products designed to manage application continuity, rather than just provide fast application recovery; current products (with one exception) are available only for Microsoft Exchange. Vendors with offerings in this space include Cemaphore Systems Inc., Sonasoft Corp. and Teneros Inc.

These companies ship appliances that are preconfigured to be a shadow server for a primary Microsoft Exchange server. The appliance is connected to the network and configured for transactional replication. Appliances can be deployed locally for high availability, in remote configurations for disaster recovery, or in hybrid configurations that can provide disaster recovery and high availability. Some of the offerings also include outsourcing the ongoing management of the shadow server (see "ACC products for Microsoft Exchange," below).

The native transactional replication capabilities inherent in most messaging and database products will stop processing at the source and target sites once a "corrupt" transaction has been identified. While this is important for data integrity, it doesn't support the concept of application continuity. Some of the above products can detect data corruption and fix the problem so that the app can continue to run reliably. Suspect transactions are removed from the source and target logs, and the app is allowed to continue. Repair is attempted, and any transactions that can't be repaired are marked for review by administrators.
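The difference between the two behaviors can be shown with a short Python sketch: one replay routine halts at the first suspect entry, while the other sets suspect entries aside, attempts a repair and keeps going. The log entries, the validity rule and the repair step are hypothetical stand-ins; none of the vendors' actual detection or repair logic is represented here.

```python
def replay_stop_on_error(log, is_valid):
    """Stop-on-corruption behavior: halt at the first suspect entry."""
    applied = []
    for entry in log:
        if not is_valid(entry):
            break
        applied.append(entry)
    return applied

def replay_with_quarantine(log, is_valid, try_repair):
    """Continuity behavior: repair what can be repaired, flag the rest, keep going."""
    applied, needs_review = [], []
    for entry in log:
        if is_valid(entry):
            applied.append(entry)
            continue
        repaired = try_repair(entry)
        if repaired is not None and is_valid(repaired):
            applied.append(repaired)
        else:
            needs_review.append(entry)   # left for an administrator to inspect
    return applied, needs_review

# Hypothetical validity rule: body must be a non-empty string with no stray whitespace.
def is_valid(entry):
    body = entry["body"]
    return isinstance(body, str) and body != "" and body == body.strip()

# Hypothetical repair: trim whitespace; anything else can't be repaired.
def try_repair(entry):
    body = entry["body"]
    if isinstance(body, str):
        return {**entry, "body": body.strip()}
    return None

log = [
    {"id": 1, "body": "a"},
    {"id": 2, "body": " b "},    # repairable
    {"id": 3, "body": None},     # not repairable
    {"id": 4, "body": "d"},
]

stopped = replay_stop_on_error(log, is_valid)
applied, needs_review = replay_with_quarantine(log, is_valid, try_repair)
```

In this toy log, the stop-on-error routine applies only the first entry, while the quarantine routine applies three entries and leaves one for review, which is the trade-off between strict integrity and continuity described above.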

There are downsides to these products. First and foremost, since the shadow server will be mirroring the source application, it will contain twice the amount of storage. CCA models leverage a shared-disk store, so they're likely to require far less storage. Disk is relatively cheap, but the requirement for twice the amount of storage will limit the cost-effectiveness of these products, particularly for larger application environments.

ACC products for Microsoft Exchange
MailShadow from Cemaphore Systems Inc. A software-based product designed to run on network-attached Wintel platforms, MailShadow provides a hot standby Exchange server and supports failover at the Active Directory level. Because it uses MAPI to replicate Exchange objects, protection can be configured at the mailbox level (in addition to the entire server) and it can support n-into-one configurations where multiple source mailboxes on multiple Exchange servers can be replicated to a single, centralized Exchange server. MailShadow can also be used to nondisruptively migrate to Exchange 2007.

SonaSafe for Exchange Server from Sonasoft Corp. Deployed as a network-based appliance, SonaSafe is an integrated backup and replication platform that can be used for all Windows-based apps such as Exchange, SQL Server and files. Using MAPI to replicate Exchange objects, SonaSafe supports the ability to protect configurations at the mailbox level. It maintains an active Exchange standby server, providing a platform for fast failover either locally or remotely for Exchange 2000, 2003 and 2007.

Teneros Application Continuity Appliances for Microsoft Exchange from Teneros Inc. Teneros offers managed services for Exchange 2000, 2003 and 2007. The ACA resides at the user's site and is managed remotely by the Teneros Network Operations Center. Transactions are replicated using MAPI (supporting mailbox-level protection) or by directly accessing Exchange log files (for lower overhead on the production Exchange server). Teneros supports full Outlook client functionality in pre- or post-failover modes.

A second downside is licensing. Because the shadow server runs an active version of the primary application, application software licensing costs will double. CCA architectures require only one active license at any one time, and most cluster vendors give big discounts on the secondary application software license (if it's even required). For small applications, this second application software license may be a minimal expense, but it can become quite onerous for large applications.
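A rough back-of-the-envelope model shows how the storage and licensing penalties scale with the size of the application. Every number below (capacity, cost per terabyte, license price, discount) is a made-up placeholder, and the model deliberately ignores server hardware, cluster software and administration costs, which could easily change the comparison.

```python
def shadow_server_cost(data_tb, cost_per_tb, license_cost):
    """Shadow-server model: two full copies of the data and two full licenses."""
    return 2 * data_tb * cost_per_tb + 2 * license_cost

def shared_disk_cluster_cost(data_tb, cost_per_tb, license_cost, second_license_discount):
    """Shared-disk cluster: one copy of the data, second license discounted."""
    second_license = license_cost * (1 - second_license_discount)
    return data_tb * cost_per_tb + license_cost + second_license

# Hypothetical small shop: 1 TB of mail, modest license.
small_gap = (shadow_server_cost(1, 1000, 2000)
             - shared_disk_cluster_cost(1, 1000, 2000, second_license_discount=0.5))

# Hypothetical large environment: 50 TB, much larger license.
large_gap = (shadow_server_cost(50, 1000, 40000)
             - shared_disk_cluster_cost(50, 1000, 40000, second_license_discount=0.5))
```

With these placeholder inputs the extra cost of the shadow-server approach grows from 2,000 to 70,000 as the environment grows, which illustrates why the penalty matters far more in large environments than in small ones.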

As presently constituted, application continuity computing (ACC) is best suited for SMBs, not large enterprises. "We already have at least one of every kind of high-availability product there is, and while ACC offers the same types of fast recovery and support for mainstream applications we already enjoy, it wouldn't be cost-effective for our large data sets," says Steven Hirsch, senior VP of technology at NYSE Euronext.

Smaller customers, on the other hand, responded positively. "I'm a one-man IT shop and am managing over 20 applications," says David Clark, IT director at Jones Waldo, a Salt Lake City law firm. "I've already deployed one of these solutions for Exchange, and I would do it in a heartbeat to handle SQL Server." Hugh Smallwood Jr., CIO at Hagerstown, MD-based Ongoing Operations LLC, a business continuity provider for credit unions, agrees that the simplicity of the model is one of the primary reasons for deployment.

"We believe that a recovery strategy based on log replication has significant advantages over traditional shared-disk clustering, and we consciously built native support for this model into Exchange 2007 with features like CCR," says Perry Clarke, product unit manager for Microsoft Exchange Server. "The simplicity of this model makes it appealing not only for SMBs, but also for larger enterprises."

While today's ACC products are focused exclusively on Exchange, this will likely change over the next year. Several ACC vendors say they'll support SQL Server and SharePoint in the future. It's somewhat surprising that an Oracle-based product isn't yet available, but that's also likely to change in the next year. ACC vendors wouldn't talk about specific product roadmaps, but natural extensions would include better integration with other secondary data management functions such as enterprise backup (for visibility within backup catalogs), e-discovery, archiving, data classification, information tiering and destruction.

When considering high-availability options for Microsoft Exchange, SMBs should take a look at ACC solutions. While ACC isn't up to challenging existing high-availability solutions for large environments, it provides a simple deployment model for smaller applications that requires less care and feeding over time than other data protection approaches, with the added advantage of data validation.

ACC isn't a replacement for backup and restore, which must still be done on a regular basis to provide for file-level and other partial restore requirements. What it does offer is a data protection advantage: users can offload backup operations to the shadow server. "Do-it-yourself" approaches offering similar application recovery capabilities are available, but they clearly require more sophisticated administrative expertise and, as such, are likely to have higher management costs over time. In the mid and lower range, ACC is an attractive alternative to solutions like FTC, CCA and other approaches targeted at automating application recovery.
