Checklist: First steps for limiting your downtime


Rick Cook
11.08.2005

What you will learn from this tip: Keeping downtime to a minimum requires forethought, preparation and practice. This checklist covers what you need to do to win the downtime battle.

"The more you sweat in peace, the less you bleed in war" is as true of IT backup as it is of military operations. Downtime is the enemy and forethought, preparation and practice are the keys to minimizing it.

Downtime is a complex subject and so is the process of limiting it. The checklist that follows hits the top layer, so to speak. Each of these items has many sub-items and each of those sub-items could easily generate its own checklist, or group of checklists.

Manage changes and patches effectively
Changes, upgrades and patches are among the most fertile sources of downtime. This includes both the planned downtime it takes to install the changes and the unplanned downtime when things go wrong.

Change and patch installation isn't an event -- it's a process, and like any process it works best when it is standardized, documented and controlled as much as possible. A good change management process covers when changes and patches should be applied, how they should be installed and how they should be tested. It also includes processes for dealing with the problems that arise. For example, you need to know what to do if the patch produces problems -- do you simply roll back to the pre-patch state, or do you attempt to keep the patch and fix the problem?

Use instant-restore technologies, such as snapshots and Volume Shadow Copies
The ability to instantly restore files or to roll back a system to a recent last-known-good state is a powerful tool for minimizing downtime. While they can't replace true backups, such techniques can solve many problems, especially the most common ones.

Prioritize
Although 'downtime' has several meanings, the most practical meaning is the amount of time you are out of business because of an IT-related occurrence. Minimizing that kind of downtime is critical. Not everything needs to be restored at the same time or with the same urgency. Prioritize your business-critical applications and restore the most important first.

If you think of downtime as 'the time that part of the computer system is unavailable,' setting priorities may not improve overall downtime, but that's usually less important than business continuity.

Set goals for downtime
Your organization should have clear, measurable downtime-related goals, such as how long it will take to restore each business-critical application under various conditions.

Setting these goals isn't just a matter for the IT department. They should be set by, and bought into by, the entire organization. This not only lets everyone know what to expect, it also makes it easier to invest in needed equipment and training to meet those goals.
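One way to keep such goals measurable is to record them as data and check drill results against them. The following is a minimal sketch, not a prescribed method: the application names, priorities and time targets are all hypothetical, and your organization would supply its own.

```python
from dataclasses import dataclass


@dataclass
class RecoveryGoal:
    application: str      # hypothetical application name
    priority: int         # 1 = restore first
    target_minutes: int   # agreed maximum time to restore


def restore_order(goals):
    """Return application names in the order they should be restored."""
    return [g.application for g in sorted(goals, key=lambda g: g.priority)]


def missed_goals(goals, measured_minutes):
    """Return applications whose measured restore time exceeded the goal.

    Applications with no measurement are reported too, since an untested
    goal cannot be counted as met.
    """
    missed = []
    for g in goals:
        actual = measured_minutes.get(g.application)
        if actual is None or actual > g.target_minutes:
            missed.append(g.application)
    return missed


# Illustrative figures only; none of these come from a real installation.
GOALS = [
    RecoveryGoal("file-shares", priority=3, target_minutes=480),
    RecoveryGoal("order-entry", priority=1, target_minutes=60),
    RecoveryGoal("email", priority=2, target_minutes=240),
]

if __name__ == "__main__":
    drill = {"order-entry": 45, "email": 300}  # minutes measured in a drill
    print("Restore order:", restore_order(GOALS))
    print("Goals missed: ", missed_goals(GOALS, drill))
```

Keeping the list in one place also gives the rest of the organization something concrete to review and sign off on.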

Plan carefully
You can't think of every possible cause of downtime, but you can sure try.

Monitor constantly
The best way to limit your downtime is to catch problems before you go down. Log files are your friend. Monitor your system's performance constantly and compare current performance in critical areas to a baseline record. Pay special attention to trends. Often you can spot hardware or software problems early and fix them before they shut you down. You should have some form of automatic warning if critical parameters exceed pre-set levels or if an operation needs a large number of retries. Needless to say, those levels should be high enough to be significant and low enough to give you warning. Among the things to keep an eye on are performance-critical measures such as storage system throughput.

Where to set the alarm levels depends very much on the application and the nature of your installation. Vendors can usually offer you guidance on their hardware and software.
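As a rough illustration of the baseline, trend and retry checks described above, here is a minimal sketch. The 20% drop limit, the retry limit and the sample readings are assumptions chosen for the example, not recommended values; how the readings are collected is left out entirely.

```python
from statistics import mean


def check_metric(name, samples, baseline, max_drop=0.20):
    """Compare recent samples with a baseline and return a warning or None.

    samples  -- recent readings, oldest first (e.g. throughput in MB/s)
    baseline -- the value recorded when the system was known to be healthy
    max_drop -- fraction below baseline that triggers a warning (assumed)
    """
    current = mean(samples)
    floor = baseline * (1 - max_drop)
    if current < floor:
        return f"{name}: average {current:.1f} is below alarm level {floor:.1f}"
    return None


def is_declining(samples):
    """True if there are at least three readings and each is lower than the last."""
    return len(samples) >= 3 and all(b < a for a, b in zip(samples, samples[1:]))


def too_many_retries(retries, max_retries=5):
    """True if an operation needed more retries than the assumed limit."""
    return retries > max_retries


if __name__ == "__main__":
    readings = [98, 91, 84, 76]  # hypothetical storage throughput, MB/s
    print(check_metric("storage throughput", readings, baseline=100))
    print("declining trend:", is_declining(readings))
    print("retry warning:", too_many_retries(8))
```

In this example the average has not yet crossed the alarm level, but every reading is lower than the one before it, which is the kind of trend worth investigating before it becomes an outage.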

Test and drill regularly
Planning is wonderful, but it's not execution. The hard fact is that a depressingly large number of emergency restores -- something like two-thirds by some estimates -- suffer significant problems or fail entirely. Even something as simple as a misplaced (or worse, mislabeled) tape can add hours to your downtime.

Human ingenuity being what it is, we can usually find workarounds. However, you end up working a lot harder for a lot longer and sweating a lot more than if you'd tested everything out beforehand.

The only way to make sure you can execute your plan is to test it constantly. At the very least, make sure your restoration procedures work by doing test restores and comparing the results with the original files. It's better to test the entire recovery procedure from beginning to end, and best to conduct regular recovery drills to make sure everything works and everyone involved is prepared.
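The comparison step of a test restore can be automated. The sketch below assumes the restore has already been written to a scratch directory and that the original files have not changed since the backup was taken; it compares SHA-256 checksums of each file. The function names and directory paths are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return a list of files that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths; point these at the source data and the test restore.
    for problem in verify_restore("/data/projects", "/restore-test/projects"):
        print(problem)
```

An empty result means every original file was found in the restored copy with identical contents. It says nothing about how long the restore took, so it complements a full recovery drill rather than replacing one.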

Document everything
When the system is down you should never have to guess and never have to experiment. Ideally, you should have all the information you need at your fingertips, including all the required procedures to get back up. This should all be filed and cross-indexed, and you should store at least one copy in a location separate from the original computer. You should also keep a copy of your current documentation offsite.

Among the items you need are the version numbers of all the software and firmware you are using, including patches; complete system configuration information; and a duplicate of your tape inventory detailing what is stored on which tapes. It's also a good idea to keep lists of where recovery-related procedures are found in the documentation and current lists of phone numbers for vendor contacts.

Invest appropriately
While much of minimizing downtime is simply a matter of proper procedures, some of it requires investing in the right hardware and software. Consider your recovery goals and look for bottlenecks caused by your present hardware and software. Then spend the money to eliminate those bottlenecks.

You will often trade money for protection or speed. RAID arrays with hot swapping and dual power supplies are more expensive but they can prevent a lot of downtime.

Sometimes architectural changes can reduce downtime. For example, disk-based backup is more expensive than tape, but a disk-based backup system or a disk-to-disk-to-tape system can enormously reduce downtime. The only way to know if the expense is worth it for your enterprise is to do your own analysis.
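A simple starting point for that analysis is to estimate what downtime costs per year under each option and weigh the difference against the price of the upgrade. The sketch below is only a framing device: every figure in it is invented for illustration, and a real analysis would use your own incident history and cost estimates.

```python
def annual_downtime_cost(incidents_per_year, hours_per_incident, cost_per_hour):
    """Estimated yearly cost of downtime for one configuration."""
    return incidents_per_year * hours_per_incident * cost_per_hour


def net_annual_benefit(current_cost, improved_cost, annual_investment):
    """Downtime savings minus the yearly cost of the investment."""
    return (current_cost - improved_cost) - annual_investment


if __name__ == "__main__":
    # All numbers below are hypothetical.
    tape = annual_downtime_cost(4, 8, 5000)   # restores from tape
    disk = annual_downtime_cost(4, 2, 5000)   # restores from disk
    print("Net annual benefit:", net_annual_benefit(tape, disk, 60000))
```

A positive result suggests the investment pays for itself on downtime alone; a negative one means it would need to be justified on other grounds.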



About the author: Rick Cook has been writing about mass storage since the days when the term meant an 80 K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last 20 years, he has been a freelance writer specializing in storage and other computer issues.


All Rights Reserved, Copyright 2008 - 2009, TechTarget