What does recovery time objective (RTO) mean exactly? We find this acronym in just about every discussion, brochure or write-up about backup products or storage arrays. Many IT folks mistakenly think it refers to the time it takes to restore a system, an application and its data. That's not an RTO; that's reality, and very often the RTO and reality are miles apart.
The RTO, as its full name implies, is a goal or an ideal time in which you need a specific function or service to be available following an interruption. In essence, the RTO represents the maximum amount of time before an organization is negatively impacted by the interruption of one of its core business processes or functions. For this reason, the task of establishing the recovery time objective must start at the business level and not the systems level.
Where to start
Information must be gathered from the various business units and then analyzed to draw conclusions about potential losses and the time frame within which they can be incurred. This business continuity planning process is known as a business impact analysis (BIA), and should address the following items:
What the business unit does: Create a list of the various functions or processes for which the business unit (or department) is responsible -- this includes revenue-generating activities and what happens when a specific business process or function is halted.
Possible losses: Determine the financial or intangible losses an outage can cause. Financial losses include lost revenue, salaries paid to idle workers, extra expenses, fines, etc. Intangible losses include damaged reputation, negative public opinion, depreciated stock, etc.
Critical cycles: Consider the worst possible time at which an interruption could occur (e.g., quarter-end or year-end) in the BIA and RTO.
Dependencies: Identify applications required to perform or assist with a specific business function. Other dependencies include other business functions (input), services, key roles, etc.
Criticality of dependencies: Identify how critical dependencies are to the business function. For example, unavailability of an application that is highly critical to the function may actually halt that function.
Workaround: Create a documented manual process or contingency plan that could temporarily allow a function to be performed to buy some time, thus allowing a longer recovery time objective.
Once the potential losses have been identified, the business is in a position to make a decision regarding what it considers acceptable losses. Because losses are incurred over time, this decision also dictates the maximum outage the business can tolerate for each specific function. The RTO for the business functions must therefore not exceed that maximum tolerable outage.
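To make the relationship between acceptable losses and maximum tolerable outage concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that losses accrue at a constant hourly rate; real losses often escalate over time. The function names and dollar figures are hypothetical and not taken from any particular BIA.

```python
from datetime import timedelta


def max_tolerable_outage(acceptable_loss, loss_per_hour):
    """Return how long an outage can last before losses exceed what the
    business has agreed to accept.

    Assumption: losses accrue linearly at `loss_per_hour`.
    """
    if loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    return timedelta(hours=acceptable_loss / loss_per_hour)


def rto_is_valid(rto, mto):
    """An RTO is acceptable only if it does not exceed the maximum tolerable outage."""
    return rto <= mto


# Hypothetical figures: the business accepts up to $50,000 in losses,
# and the function loses $12,500 for every hour it is down.
mto = max_tolerable_outage(acceptable_loss=50_000, loss_per_hour=12_500)
print(mto)                                      # 4:00:00
print(rto_is_valid(timedelta(hours=3), mto))    # True
print(rto_is_valid(timedelta(hours=5), mto))    # False
```

With these sample numbers, any RTO longer than four hours would expose the business to more loss than it said it could accept.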
Subsequently, dependencies identified as highly critical to a business function must also fall under the same RTO -- this includes applications as well as other business functions and their respective applications. Ultimately, this is where the RTO for each application and supporting infrastructure is established.
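The way a business function's RTO flows down to its highly critical dependencies can be sketched as follows. This is an illustrative Python example only: the function names, applications and hour values are hypothetical, and the rule that a dependency shared by several functions takes the shortest of their RTOs is an assumption added here, not something stated above. The sketch also handles only one level of dependencies.

```python
def derive_dependency_rtos(function_rtos, dependencies):
    """Assign each highly critical dependency the RTO of the function it supports.

    function_rtos: {business_function: rto_in_hours}
    dependencies:  {business_function: [(dependency_name, highly_critical), ...]}

    Assumption: when a dependency supports several functions, it inherits
    the shortest (most demanding) RTO among them.
    """
    derived = {}
    for function, rto in function_rtos.items():
        for dependency, highly_critical in dependencies.get(function, []):
            if not highly_critical:
                continue  # less critical dependencies are not bound to this RTO
            derived[dependency] = min(rto, derived.get(dependency, rto))
    return derived


# Hypothetical business functions and their RTOs, in hours.
function_rtos = {"order entry": 4, "payroll": 24}
dependencies = {
    "order entry": [("ERP", True), ("email", False)],
    "payroll": [("ERP", True), ("HR system", True)],
}

print(derive_dependency_rtos(function_rtos, dependencies))
# {'ERP': 4, 'HR system': 24}
```

In this example the ERP application supports both functions, so it ends up with the 4-hour objective rather than the 24-hour one.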
In closing, recovery time must also always consider notification, response, situational assessment and procurement delays, as all these elements can consume part of the RTO before the actual recovery effort is even initiated.
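As a rough illustration of how those delays eat into the objective, the following Python sketch subtracts them from the RTO to show how much time is actually left for the recovery effort. It assumes the delays occur one after another rather than in parallel, and all durations are hypothetical.

```python
from datetime import timedelta


def remaining_recovery_window(rto, notification, response, assessment, procurement):
    """Return the time left for the actual recovery once pre-recovery delays
    are subtracted from the RTO.

    Assumption: the four delays are sequential. A negative result means the
    RTO is consumed before recovery even begins.
    """
    overhead = notification + response + assessment + procurement
    return rto - overhead


# Hypothetical durations for an 8-hour RTO.
window = remaining_recovery_window(
    rto=timedelta(hours=8),
    notification=timedelta(minutes=30),
    response=timedelta(hours=1),
    assessment=timedelta(hours=1, minutes=30),
    procurement=timedelta(hours=2),
)
print(window)  # 3:00:00
```

Here an 8-hour RTO leaves only three hours for the restore itself, which is the figure the recovery technology and procedures must actually be able to meet.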
About the author:
Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection.