Failover and failback operations FAQ

15 Oct 2008 | Jeff Boles

Failover and failback operations can be crucial to the success of a disaster recovery (DR) plan. So how confident are you in your organization's failover processes? Jeff Boles, Senior Analyst with the Taneja Group, discusses the significance of failover and failback to a DR plan and provides best practices for ensuring the effectiveness of these operations.

Table of contents:

>>The importance of failover and failback for DR
>>Defining failover and failback
>>Running a system if failover and failback is delayed
>>Best practices for selecting a recovery site
>>Failover operations for non-critical business operations
>>The location of the backup facility
>>Major players in the failover/failback operations space

Can you discuss the importance of failover and failback for disaster recovery?

Failover and failback are the secret sauce in executing a DR plan. Assuming that your DR infrastructure is set up appropriately (the systems you need are being replicated and protected at the secondary location), failover and failback will be the most disruptive elements of your DR execution.

The disruptiveness of this process is usually defined by the amount of data change you have going on at the primary location, your available bandwidth, and how your data is being copied, mirrored or replicated to that secondary location.
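As a rough illustration of how change rate and bandwidth interact, the sketch below estimates how long it would take to ship the changes accumulated during an outage across the link between sites while new changes keep arriving. The function name, the `efficiency` factor and all figures are hypothetical assumptions, not values from any vendor or from this FAQ.

```python
def resync_hours(change_gb_per_hour, link_mbps, outage_hours, efficiency=0.8):
    """Estimate hours needed to drain the change backlog built up during an outage.

    All inputs are illustrative assumptions:
      change_gb_per_hour -- average data change rate at the active site
      link_mbps          -- nominal link speed in megabits per second
      outage_hours       -- how long changes accumulated before resync began
      efficiency         -- assumed fraction of the link usable for replication
    Returns None when the link cannot keep up with ongoing change.
    """
    # Convert Mbit/s to GB/hour: /8 gives MB/s, *3600 gives MB/hour, /1000 gives GB/hour.
    usable_gb_per_hour = link_mbps * efficiency / 8 * 3600 / 1000
    backlog_gb = change_gb_per_hour * outage_hours
    # New changes keep arriving while the backlog drains.
    net_drain_gb_per_hour = usable_gb_per_hour - change_gb_per_hour
    if net_drain_gb_per_hour <= 0:
        return None
    return backlog_gb / net_drain_gb_per_hour


if __name__ == "__main__":
    hours = resync_hours(change_gb_per_hour=10, link_mbps=100, outage_hours=24)
    print("no catch-up possible" if hours is None else f"about {hours:.1f} hours")
```

A result of `None` signals that the change rate exceeds usable bandwidth, which is the situation where reducing transmitted data matters most.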

If you're an architect, you should be interested in minimizing data transmission while maximizing synchronization between sites. Then focus on how to trigger failover while minimizing the time the operation takes. Many technologies exist in this area, and their differences may affect how well you can execute.

There are some continuous data protection (CDP)-type replication technologies out there, such as InMage, that excel in these areas, minimizing data transmissions and maximizing synchronization between sites.

Can you define failover and failback?

Failover is the process of shifting I/O and its processes from a primary location to a secondary DR location. This typically involves using a vendor's tool or a third-party tool that can temporarily halt I/O and restart it from a remote location.

Halting I/O also suspends any data copying and mirroring activity that may be going on from the primary location to the secondary location. The tool then brings applications and I/O up at the remote location.

During activity at the remote site, changes are usually tracked so that the original location can be re-synchronized and restored to service by replicating only the data that changed between the start and end of the DR event back to the primary location when it comes back up. Failback is the process of re-synchronizing that data back to the primary location, halting I/O and application activity once again, and cutting back over to the original location.
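The change tracking described above can be sketched in a few lines. This is a toy model only: volumes are plain dictionaries keyed by block number, and the `ChangeTracker` class and its method names are hypothetical, not any vendor's API.

```python
class ChangeTracker:
    """Toy model of change tracking at the DR site (hypothetical, for illustration)."""

    def __init__(self):
        self.dirty = set()  # block ids written while running at the DR site

    def write(self, dr_volume, block_id, data):
        """Apply a write at the DR site and remember which block changed."""
        dr_volume[block_id] = data
        self.dirty.add(block_id)

    def failback(self, dr_volume, primary_volume):
        """Copy only the changed blocks back to the primary; return how many were sent."""
        for block_id in sorted(self.dirty):
            primary_volume[block_id] = dr_volume[block_id]
        sent = len(self.dirty)
        self.dirty.clear()
        return sent


if __name__ == "__main__":
    primary = {0: "a", 1: "b", 2: "c"}
    dr = dict(primary)           # replica as of the moment of failover
    tracker = ChangeTracker()
    tracker.write(dr, 1, "B")    # activity during the DR event
    print(tracker.failback(dr, primary), primary == dr)
```

Only the blocks touched during the DR event cross the wire, which is why tracked failback is typically far cheaper than a full copy.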

How long can a system run as a failover operation if the failback process is delayed?

This is really a matter of the operations, availability and performance level of the DR center, and the ability of your staff to support ongoing operations at a remote facility. None of those aspects should be overlooked. Yet the amount of time you may spend in a failover operation, and its potential implications, are often overlooked.

Organizations should consider their DR capabilities in the context of the impact on staff and the long-term availability of the DR center. An extended outage or a far-removed facility may be impossible for you to support with your existing staff. Or, frankly, your staff just may not be available if you have a severe local disaster.

So how much can be automated at that center, in a lights-out, totally unmanaged fashion, to allow failover to operate for an extended period? Also, what is the cost?

Are there any best practices for selecting a recovery site and how crucial is this step to the overall success of a failover operation?

There are lots of facilities out there today. More and more of them are on an even playing field and deliver very similar services. They are becoming commoditized, so the choice is increasingly a cost comparison for similar levels of capability between sites. You should keep in mind your requirements while running in an extended outage and the ability of your personnel when it comes to remote management of the failover facility.

Beyond this, also consider what service-level agreements (SLAs) you currently have that need to be preserved while operating in a failover state and whether a backup facility can meet those SLAs. Quite often when we're assessing a DR facility, we get myopic and think specifically of disaster-type SLAs.

We often neglect the operational SLAs we have in place internally and don't carry those over to the DR infrastructure, even though they may be crucial to continuing business as normal. It doesn't do you any good to maintain uptime and operations if your performance deteriorates to the point that you lose customers or sacrifice revenue.

Are failover operations worthwhile to systems that may not be critical to day-to-day business?

If I were undertaking a project today to evaluate what systems to protect in my enterprise, I would start with a loose consideration of the following:

I would use those criteria to evaluate systems supporting key business processes, all of the infrastructure systems that support key business processes, and then place them on a quadrant-like grid that would visually present the cost of outage and the cost of protection for each primary application in the enterprise.

By projecting the cost of an outage and considering those costs in the context of disaster risk, you may be able to better establish a threshold with senior management for what should be protected in your enterprise. That's how I would begin assessing whether failover is worthwhile for a day-to-day business system that enterprises normally consider less critical.
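A minimal sketch of such a grid follows. It weights the projected outage cost by an assumed yearly disaster probability and places each system in one of four quadrants. The field names, thresholds, quadrant labels and sample figures are all hypothetical; the FAQ does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    outage_cost: float      # projected cost of one outage (assumed figure)
    annual_risk: float      # assumed probability of a disaster in a year
    protection_cost: float  # assumed yearly cost of failover protection


def quadrant(system, loss_threshold, protection_threshold):
    """Place a system on a 2x2 grid of risk-weighted outage cost vs. protection cost."""
    expected_loss = system.outage_cost * system.annual_risk
    high_loss = expected_loss >= loss_threshold
    high_cost = system.protection_cost >= protection_threshold
    if high_loss and not high_cost:
        return "protect first"
    if high_loss and high_cost:
        return "protect, justify budget"
    if not high_loss and not high_cost:
        return "protect if convenient"
    return "likely defer"


if __name__ == "__main__":
    systems = [
        System("order entry", 2_000_000, 0.05, 20_000),
        System("internal wiki", 100_000, 0.05, 80_000),
    ]
    for s in systems:
        print(s.name, "->", quadrant(s, loss_threshold=50_000, protection_threshold=50_000))
```

The thresholds are where the conversation with senior management happens: moving either one shifts systems between quadrants.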

After that exercise is done, I would undertake it for general operational IT systems, like email and general infrastructure services. Then I would add one more element: the potential number of systems and business processes that are dependent upon those services.

If a disruption is caused by a local event, how crucial is the location of the backup facility?

In my view there are two aspects users should consider here: Is your disaster recovery center isolated from a local disaster that may impact your primary location, and is your remote center readily accessible in the event of an extended outage? If it is not readily accessible, your team needs to be comfortable with remote management and you should be sure that all remote operations are supported with lights-out technology.

It's becoming increasingly practical to construct DR solutions in a cloud as well. This requires a larger shift in management for many organizations, but it can harness more cost-effective resources and be more flexible and scalable. More than likely this will require a fairly comprehensive shift to server virtualization.

The issue surrounding this approach is that services may be more difficult to guarantee with SLAs. These services may be more subject to performance degradation from disasters impacting large numbers of customers. It may be very challenging to get a handle on what a cloud-based service provider's infrastructure is capable of; there may be unique security implications, and you may run into various compliance and regulatory hurdles depending on your industry when you start looking at hosting data in the cloud.

DR in the cloud is certainly viable. It may solve a lot of your concerns about the location of a data center because it forces you down the path of managing things in a hands-off way that can be run remotely with any type of personnel.

Who are the major players in the failover/failback operations space?

All of the major vendors have solutions, and various mechanisms even exist at the post-operating system level and from backup vendors. I would go to your major solution provider partners and talk to them about their recommended approaches.

One other important aspect of failover and failback to consider is the assessment of whether your services are appropriately configured for synchronization and complete startup of a recovery environment. In general, the IT practitioner has relied heavily on the manual testing of failover and failback. But that exercise is surrounded with poor assumptions.

You're assuming that your prep for the exercise will help you mitigate issues that you would face during an unpredictable disaster. But discussions with end users suggest that these exercises are often riddled with holes, failures or compromises. As a result, we're seeing an increasing number of vendors bring solutions to the table that will more holistically manage or evaluate the DR environment, including Simple Continuous and Continuity Software. These focus on managing your DR environment and setup, ongoing preservation, and identifying what might be out of spec so that you can correct or automate mitigation of issues that might pop up.

Those solutions, in my view, are disruptive to the marketplace this year. They will have tremendous traction with end users and solve some big issues in large enterprises. But the value proposition extends all the way down the food chain to small businesses that have a fairly complex environment to manage DR for.

Jeff Boles is a Senior Analyst with the Taneja Group.
