What you will learn in this tip: This technical tip outlines the essentials of disaster recovery and business continuity planning for small- to medium-sized businesses (SMBs). Learn about SMB disaster recovery best practices and what to include in the disaster recovery planning process.
SMB disaster recovery isn't always easy, but following some key disaster recovery best practices is a good start.
IT managers at SMBs often feel that they can easily recover from an outage because they have smaller IT environments and employ smart IT people. Conversely, some managers simply don’t know how to build a disaster recovery strategy. In either case, the result is often no disaster recovery planning at all. An SMB that intends to build a DR plan needs to follow the essentials of disaster recovery planning.
The most important, and most difficult, step in disaster recovery planning is to understand how an unplanned outage would affect an organization. This step is referred to as a business impact analysis (BIA). Without a meaningful way to determine the impact of an unplanned outage, it becomes very difficult to determine what type of disaster recovery strategy is needed.
An “unplanned outage” refers to any unforeseen event that interrupts normal business activity for a period of time, such as an IT systems failure, fire, power outage or natural disaster. Depending on the nature of the interruption, this can cause an organization to lose revenue, have problems with customer satisfaction, lose opportunities or possibly go out of business.
That impact is determined by identifying the most critical business activities or functions, and then predicting what would happen if those processes stopped. This is where many inexperienced planners make a mistake: They are tempted to skip a few steps and jump straight into solution mode.
DR planners should not assume there is a workaround or contingency available in case a highly critical function goes offline.
The intention is to set a recovery time objective (RTO, which refers to how long a process can be down) and a recovery point objective (RPO, which refers to how much data can be lost) for critical functions and IT infrastructure.
Businesses must determine:
- A financial value for a critical function, based on how much money is lost when the revenue stream is interrupted. An organization’s accountant can usually help with this process
- How critical each function is for the organization, based on how a function affects the revenue stream using a rating system (for example, one to five) where one is the most critical and five the least critical
- How long a business function can be interrupted before it starts affecting revenue stream
- How much client or business transaction information can be lost or recreated without seriously affecting the business
- The IT infrastructure and systems upon which the business functions depend
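The items above can be captured in a simple record per business function. The following is a minimal sketch, not a prescribed format: the class name, field names and sample figures are all hypothetical, and it assumes the one-to-five rating described in the list, with ties broken by the larger estimated hourly loss.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    """One row of a business impact analysis (all fields are illustrative)."""
    name: str
    criticality: int        # 1 = most critical, 5 = least critical
    hourly_loss: float      # estimated revenue lost per hour of outage
    rto_hours: float        # how long the function can be down
    rpo_hours: float        # how much data, measured in hours, can be lost
    depends_on: tuple = ()  # supporting IT systems


def rank_by_impact(functions):
    """Order by criticality rating, then by highest hourly loss."""
    return sorted(functions, key=lambda f: (f.criticality, -f.hourly_loss))


# Sample data, invented for illustration only.
functions = [
    BusinessFunction("Payroll", 3, 200.0, 72, 24, ("hr-db",)),
    BusinessFunction("Online orders", 1, 1500.0, 8, 1, ("web-01", "orders-db")),
    BusinessFunction("Invoicing", 1, 900.0, 24, 4, ("erp",)),
]

for f in rank_by_impact(functions):
    print(f.criticality, f.name, f"RTO={f.rto_hours}h", f"RPO={f.rpo_hours}h")
```

Sorting this way puts the functions that need a recovery strategy first, which is the order in which the later steps should look at them.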
The next step is the risk assessment, which complements the impact analysis. Together, the impact of an outage and the anticipated risk indicate whether a recovery strategy needs to be developed.
Assessing risk is another area where planners can get bogged down. Do not attempt to calculate risk based on the probability that an event will occur, or try to calculate annualized loss expectancy; both are complex tasks. Keep it simple and be realistic about the kinds of risks your organization could face, including specific threats tied to an organization’s geographic location. A risk exists for an organization if there’s nothing in place to maintain or quickly recover a critical function.
On the other hand, if a system identified as critical is found to have adequate redundancies and protection, you can move on to the next systems and applications.
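The simple rule described above, that a risk exists when nothing is in place to maintain or quickly recover a critical function, can be sketched as a filter. This is only an illustration: the dictionary keys, the sample entries and the cutoff for what counts as "critical" (a rating of 1 or 2) are assumptions, not part of the original guidance.

```python
def functions_at_risk(functions, critical_threshold=2):
    """Return names of critical functions that have no protection in place.

    Each entry is a dict with hypothetical keys: name, criticality
    (1 = most critical), has_redundancy and has_recovery_provision.
    """
    at_risk = []
    for f in functions:
        is_critical = f["criticality"] <= critical_threshold
        is_protected = f["has_redundancy"] or f["has_recovery_provision"]
        if is_critical and not is_protected:
            at_risk.append(f["name"])
    return at_risk


# Sample data, invented for illustration only.
inventory = [
    {"name": "Online orders", "criticality": 1,
     "has_redundancy": False, "has_recovery_provision": False},
    {"name": "Email", "criticality": 2,
     "has_redundancy": True, "has_recovery_provision": False},
    {"name": "Intranet wiki", "criticality": 5,
     "has_redundancy": False, "has_recovery_provision": False},
]

print(functions_at_risk(inventory))
```

Functions that are already protected, or that are not critical, drop out of the list so the planner can move on to the next system.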
Developing a recovery strategy
Once critical functions and the supporting IT infrastructure have been identified, and the impact of an outage is quantified using a dollar value or rating, a recovery strategy can be developed to help prevent or mitigate losses.
This is also the time to consider any contingencies or redundancies already in place. For example, if a critical application is hosted by a service provider and covered by a service-level agreement, it is probably safe to say that little to no recovery strategy is required for that application. However, a recovery strategy is required for applications that support critical functions and lack provisions to keep them operational.
A specific recovery strategy is determined by an organization’s anticipated financial losses if critical functions are unavailable, as well as the time needed to recover necessary applications.
An application with a recovery time objective of five days may do just fine with a tape backup process, but an application that needs to be up within an eight-hour business day might require remote data replication and/or standby IT systems at a recovery site. Outsourcing disaster recovery is also a viable strategy: Companies that cannot afford the cost of developing their own recovery strategy may consider paying for DR availability services or a DR-as-a-Service subscription.
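As a rough illustration of how an RTO narrows the choice of approach, the two examples above can be written as a lookup. The function name and return labels are hypothetical, and treating eight hours and five days as hard cutoffs is an assumption made for the sketch; the text gives them only as examples, so anything in between is left for case-by-case evaluation.

```python
def suggest_strategy(rto_hours):
    """Map an RTO in hours to a candidate approach (cutoffs are assumptions)."""
    if rto_hours <= 0:
        raise ValueError("rto_hours must be positive")
    if rto_hours <= 8:
        # Needs to be back within a business day.
        return "replication_or_standby"
    if rto_hours >= 5 * 24:
        # Can tolerate five days or more of downtime.
        return "tape_backup"
    # The source gives no example for this range.
    return "evaluate_case_by_case"


for hours in (4, 48, 120):
    print(hours, suggest_strategy(hours))
```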
The key is to always remember that the total cost of a recovery strategy should never exceed the losses it is designed to prevent.
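This cost rule amounts to a single comparison. The sketch below assumes the loss a strategy prevents can be approximated as the hourly loss multiplied by the outage hours it avoids; that approximation, the function names and the sample figures are illustrative assumptions rather than a method given in this tip.

```python
def prevented_loss(hourly_loss, outage_hours_without, outage_hours_with):
    """Estimate the loss avoided by shortening an outage."""
    hours_saved = max(outage_hours_without - outage_hours_with, 0)
    return hourly_loss * hours_saved


def is_cost_justified(strategy_cost, hourly_loss,
                      outage_hours_without, outage_hours_with):
    """True when the strategy costs no more than the loss it prevents."""
    saved = prevented_loss(hourly_loss, outage_hours_without, outage_hours_with)
    return strategy_cost <= saved


# Hypothetical figures: 1,500 lost per hour, recovery cut from 120 h to 8 h.
print(prevented_loss(1500, 120, 8))             # 168000
print(is_cost_justified(60_000, 1500, 120, 8))  # True
print(is_cost_justified(200_000, 1500, 120, 8)) # False
```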
Documenting the recovery plan
The next step is to document the recovery strategy and procedure, which forms the foundation for a disaster recovery plan. Keep it simple: Smaller businesses should not attempt to develop an enterprise-class DR plan. Very detailed disaster recovery plans take time to develop and are hard to maintain. At a high level, the disaster recovery plan should outline the priorities for system recovery, the recovery time objective, recovery procedures, as well as the location of data backups and the contacts for key recovery personnel.
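A high-level plan of this kind can be checked for completeness with very little tooling. The following sketch assumes the plan is kept as a simple dictionary; the section keys mirror the elements listed above, but their names and the sample plan contents are hypothetical.

```python
REQUIRED_SECTIONS = (
    "recovery_priorities",
    "recovery_time_objectives",
    "recovery_procedures",
    "backup_locations",
    "recovery_contacts",
)


def missing_sections(plan):
    """Return the required sections that are absent or empty in the plan."""
    return [s for s in REQUIRED_SECTIONS if not plan.get(s)]


# Sample plan, invented for illustration only.
draft_plan = {
    "recovery_priorities": ["Online orders", "Invoicing", "Payroll"],
    "recovery_time_objectives": {"Online orders": 8, "Invoicing": 24},
    "recovery_procedures": {"Online orders": "Restore orders-db, then web-01"},
    "backup_locations": ["Offsite vault"],
    "recovery_contacts": [],  # still to be filled in
}

print(missing_sections(draft_plan))
```

A check like this keeps the plan short while making sure none of the essential elements is left blank.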
Testing the plan frequently helps identify missing elements before a disaster does. Every test of a recovery procedure exposes gaps and possible improvements, and addressing them over time is how the plan matures.
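One simple way to record what a test reveals is to compare the recovery time measured during the test against the RTO for each system. The sketch below assumes test results are logged as (system, RTO, measured time) tuples; that layout and the sample values are hypothetical.

```python
def find_rto_gaps(test_results):
    """Return (system, hours_over_rto) for each system that missed its RTO.

    test_results is a list of (system, rto_hours, measured_hours) tuples.
    """
    return [
        (system, measured - rto)
        for system, rto, measured in test_results
        if measured > rto
    ]


# Sample test log, invented for illustration only.
results = [
    ("orders-db", 8, 11),
    ("erp", 24, 20),
    ("hr-db", 72, 72),
]

print(find_rto_gaps(results))
```

Each entry in the output is a gap to close before the next test.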
About this author:
Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and DR planning services and corporate data protection.