Managing and protecting all enterprise data


The search for cost-effective disaster recovery

Creating an efficient DR strategy starts with determining the value of your company's applications and data. You can find the right mix of DR technologies to protect your data without breaking the bank.

Establishing data's value
Mission-critical data is the organization's most vital data. It's usually tied to primary business processes, primary applications and service-level agreements (SLAs). Losing access to this data means risking organizational death. Lost data almost certainly means loss of business and revenue and potential lawsuits. The organization must have access to this data regardless of the cost. Examples include online transaction processing (OLTP), order entry, customer service and current e-mail. If an application has a recovery point objective (RPO) of immediate or near immediate, and a recovery time objective (RTO) of immediate or near immediate, it's mission-critical data.
Essential data is very important to the organization. It's commonly used for critical, day-to-day operations. The data doesn't require instantaneous recovery to keep the organization running. Essential data may be classified by the organization as secret. Examples include engineering software development code, marketing plans, product pricing, sales forecasts, customer account lists, prospect lists, accounting, etc. Essential data usually has an acceptable RPO of seconds and, in some cases, even minutes. The acceptable RTO is typically minutes to hours.
Important data is used by many day-to-day operations and applications. RPOs of minutes or hours are commonly acceptable without severely hampering operations. RTOs can be from minutes to hours, days or even weeks. Examples of important data can be HR records, contact lists, calendars and schedules.
Less-critical data has relatively low organizational value. Corrupted, damaged, even irretrievably lost data can be reconstructed with nominal effort and cost. There are typically multiple copies of the data and security isn't paramount. RPO ranges from hours to days, weeks and, in some cases, even months. RTO ranges from hours to days. Less-critical data is the largest amount of stored data, and may include e-mail archives, personal data files and historical digital records. Much of this data can be kept off site.
This is an incredibly scary time to be a disaster recovery (DR) manager. It's no longer if a disaster will strike; it's when, how often and of what magnitude. DR is your company's insurance against what might otherwise be a debilitating incident. DR prices are coming down and the options to protect your data are growing. Picking the right combination of options is the key to being cost effective. But before you can select the right combination of DR options, you must establish an application/data classification foundation (ADCF). The following describes how to classify data by its value and then pick the tools (replication, backup, mirroring, snapshot, etc.) that will protect the various classes of data while also keeping your company in compliance with regulatory requirements. The ADCF also lays the groundwork for a comprehensive, workable information lifecycle management process.

Risk-to-benefit actuarial analysis often shows that the cost of meeting an organization's DR objectives for all applications and associated data exceeds what the organization can afford. This leads to a budget-driven DR approach that may meet the DR requirements only for the organization's most vital application data. The rest of the data is either completely unprotected or relegated to a DR approach that falls short of organizational and regulatory requirements. This is also known as the DR GAP (gravely abysmal protection).

The DR GAP is the difference between the level of DR required and the level of DR that can be afforded, or the difference between the actual level provided and the level required.

What is cost-effective DR?
Depending on a company's needs, the definition of cost-effective DR can vary considerably. For our purposes, cost-effective DR meets three conditions: it satisfies the organization's DR needs, it keeps the protected data in compliance with regulations and it falls within the organization's budget. The ADCF is the key element in determining the proper balance between under- and overspending on DR protection.

Use the ADCF to determine the value of the data to be protected. One method that works well is to assign the data to four categories: mission critical, essential, important and less critical (see "Establishing data's value"). For each category, you must set the recovery point objectives (RPOs) and recovery time objectives (RTOs). Next, each application and its associated data must be prioritized and assigned to a category. This isn't as daunting as it might appear. Ongoing surveys suggest that assignment should be based on application availability requirements, regulatory compliance and elimination of business risk.

Setting RPO and RTO benchmarks
Recovery point objective (RPO): RPO is the point in time to which systems and data must be recovered at the DR center. For example, a typical synchronous mirror RPO is to recover the data to the very moment before the interruption at the primary site (i.e., zero data loss). Asynchronous replication RPOs recover the data to within seconds of the interruption at the primary site. Common point-in-time snapshot RPOs recover the data to the last snapshot, which could be anywhere from hours to days old.
Recovery time objective (RTO): RTO is the period of time within which systems and applications must be recovered after a disaster or outage, including application recovery time. Think of RTO as the amount of time required to get everything back to the same level of production as before the primary site interruption.

Once you know the RTO and RPO for each application (see "Setting RPO and RTO benchmarks"), you can provide an operational framework for valuing or prioritizing the data (see "Correlating data to RPO and RTO").
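The assignment of applications to the four categories can be sketched in a few lines of Python. This is only an illustration: the article describes the category boundaries qualitatively ("near immediate," "seconds," "minutes to hours"), so the numeric cutoffs, the `Application` type and the `classify` function below are assumptions to be replaced with your organization's own values.

```python
from dataclasses import dataclass

MINUTE = 60
HOUR = 60 * MINUTE


@dataclass(frozen=True)
class Application:
    name: str
    rpo_seconds: float  # maximum tolerable data loss, in seconds
    rto_seconds: float  # maximum tolerable downtime, in seconds


def classify(app: Application) -> str:
    """Map an application's RPO/RTO to one of the four data-value categories.

    All cutoffs are assumed for illustration; the source gives only
    qualitative ranges.
    """
    # Near-immediate RPO and RTO: mission critical.
    if app.rpo_seconds <= 1 and app.rto_seconds <= MINUTE:
        return "mission critical"
    # RPO of seconds to minutes, RTO of minutes to hours: essential.
    if app.rpo_seconds <= 5 * MINUTE and app.rto_seconds <= 4 * HOUR:
        return "essential"
    # RPO of minutes to hours, with a looser RTO: important.
    if app.rpo_seconds <= 4 * HOUR:
        return "important"
    return "less critical"


if __name__ == "__main__":
    # Example applications taken from the category descriptions; the
    # RPO/RTO figures attached to them are hypothetical.
    apps = [
        Application("order entry", rpo_seconds=0, rto_seconds=30),
        Application("sales forecasts", rpo_seconds=2 * MINUTE, rto_seconds=2 * HOUR),
        Application("HR records", rpo_seconds=1 * HOUR, rto_seconds=48 * HOUR),
        Application("e-mail archives", rpo_seconds=72 * HOUR, rto_seconds=72 * HOUR),
    ]
    for app in apps:
        print(f"{app.name}: {classify(app)}")
```

Because the categories overlap in the article's descriptions, a real classification would also weigh regulatory and business-risk factors that a pure RPO/RTO rule like this one cannot capture.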

The next step is to determine which available DR options satisfy each application/data classification's DR requirements and to then match their total cost of ownership (TCO) to the budget. You'll likely find that one size doesn't fit all and a mix of solutions will be required.
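One way to picture this matching step is as a filter followed by a cost comparison: keep only the options whose achievable RPO and RTO meet the class's requirements, then take the one with the lowest TCO. The sketch below assumes a hypothetical `DrOption` record and `cheapest_option` helper, and every RTO and TCO figure in it is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrOption:
    name: str
    rpo_seconds: float  # data loss the technology can achieve
    rto_seconds: float  # recovery time the technology can achieve
    tco: float          # total cost of ownership over the DR lifecycle


def cheapest_option(
    options: Sequence[DrOption], required_rpo: float, required_rto: float
) -> Optional[DrOption]:
    """Return the lowest-TCO option that meets both objectives, or None."""
    qualifying = [
        o for o in options
        if o.rpo_seconds <= required_rpo and o.rto_seconds <= required_rto
    ]
    return min(qualifying, key=lambda o: o.tco, default=None)


# Hypothetical catalog; only the relative RPO ordering follows the article.
OPTIONS = [
    DrOption("synchronous mirroring", rpo_seconds=0, rto_seconds=60, tco=900_000),
    DrOption("asynchronous replication", rpo_seconds=30, rto_seconds=3_600, tco=400_000),
    DrOption("nightly snapshot", rpo_seconds=86_400, rto_seconds=8 * 3_600, tco=100_000),
]

if __name__ == "__main__":
    choice = cheapest_option(OPTIONS, required_rpo=300, required_rto=4 * 3_600)
    print(choice.name if choice else "no single option qualifies")
```

When the function returns `None`, no single technology meets the requirement, which is the case where a layered combination of options has to be considered.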

DR options
The number of DR options is large and growing. Each technology has advantages and disadvantages, and solves different aspects of the total DR puzzle. A follow-up article in a future issue of Storage magazine will examine the merits of each DR technology in greater depth.

Matching the appropriate DR technology to the DR requirements entails calculating the TCO for each technology option. The TCO includes all capital expenditures (CapEx) as well as operating expenditures (OpEx) for the expected DR lifecycle (see "Simple method for calculating TCO").

When correlating the TCO of selected DR options to each data value, it's likely there will be more data classified as mission critical or essential than the DR budget allows. And in some cases, the application server and its mission-critical data are geographically dispersed and it's not financially practical to use the DR technologies that are typically applied. Current DR regulations may be another key factor in this mismatch.

In that case, it may be necessary to relegate some applications and their data to a lower level of DR. Unfortunately, relegating a portion of the data to the important or less-critical data pools may not be a viable option. It may make the organization non-compliant with current regulations, raising the specter of significant financial penalties. Another issue may arise if the organization lacks personnel skills at the required locations to provide adequate DR.

Eliminating the DR GAP
Bridging the DR GAP requires putting aside the belief that one product will solve all your problems. You need to consider a combination of DR technologies working together in a layered solution. It's the synergistic combination that enables the organization to meet its needs and requirements within its budget.

For example, you may need to use server-to-server replication to a centrally located disk array for remote application servers. From the central location, that data can then be backed up to tape and/or snapped to a DR disk array. This would require fewer backup server licenses and allow better disk array storage utilization.

[Figure: Correlating data to RPO and RTO]

In another example, a continuous snapshot appliance replicates primary data onto a low-cost serial ATA (SATA) RAID or a massive array of idle disks (MAID) array. Then the RAID or MAID array asynchronously replicates the data over long distances to a DR site.

A third example may include an appliance-based distributed backup system that replicates remote sites to a central appliance. At the central facility, the data is then written to tape, MAID, etc. This leverages the DR target platform used for disk-to-disk replication as well.

The possibilities are endless, and each organization will have different needs. The trick is putting together the right combination of technologies that meets the needs at the lowest possible price.

Simple method for calculating TCO
Upfront work: Gather data from human resources and accounting. You need to know:
  1. Average annual IT personnel salary, plus the cost of fringe benefits.
  2. Number of business hours per business year (typically 2,016).
  3. Hourly IT personnel cost (item 1 divided by item 2).
  4. The organizational cost of money: This is important for net present value (NPV) calculations. NPV is a simple time value of money calculation that adds credibility to the TCO calculation. It discounts the value of future cash flows. If this information is difficult to obtain, prime rate plus three points will usually suffice.
Next, estimate the capital expenditures (CapEx) and operating expenditures (OpEx).
CapEx: This will be the easiest aspect of determining DR TCO. Hardware purchased or acquired with a capital lease is CapEx. Hardware acquired with an operating lease is OpEx. Software licenses are usually accounted for as OpEx. If hardware is acquired with a capital lease, the up-front CapEx is the NPV of the lease payments. This is calculated with the NPV formula.
$$\mathrm{NPV} = \sum_{t} \frac{C_t}{(1 + r)^{t}}$$
Σ = Sum over all time periods
C_t = Cash flow in period t (in the case of a capital lease, the monthly payment)
t = Time period (monthly, quarterly, annually, etc.)
r = Cost of money or risk per time period
(5% annually = .4167% monthly)
Each cash flow must be inserted into the formula and then summed. (This is built into Microsoft Excel spreadsheets and HP 12C calculators.) Next, you need to factor in the projected additional hardware purchases or additional leases using the NPV formula. Finally, the cost of any initial and ongoing structural and infrastructure improvements required must be determined using the NPV formula.
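For readers who prefer code to a spreadsheet, the NPV calculation can be sketched as follows. The sketch assumes each payment falls at the end of its period (the convention Excel's NPV function uses); a lease paid in advance would shift every payment one period earlier. The function names and the lease figures are hypothetical.

```python
from typing import Sequence


def periodic_rate(annual_rate: float, periods_per_year: int) -> float:
    """Convert an annual cost of money to a per-period rate by simple division."""
    return annual_rate / periods_per_year


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Discount each cash flow to the present and sum the results.

    cash_flows[0] is assumed to occur at the end of period 1,
    cash_flows[1] at the end of period 2, and so on.
    """
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    # Hypothetical capital lease: 36 monthly payments of 1,000 at a
    # 5% annual cost of money.
    monthly = periodic_rate(0.05, 12)
    payments = [1_000.0] * 36
    print(f"monthly rate: {monthly:.4%}")
    print(f"NPV of lease payments: {npv(monthly, payments):,.2f}")
```

The NPV comes out lower than the 36,000 in nominal payments because later payments are discounted more heavily.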
OpEx: Calculating OpEx starts with fixed costs such as monthly maintenance, software license fees, contract work and professional services. All future expenditures should be discounted using the NPV formula. Growth must also be taken into account.

Be careful to correctly analyze personnel time expenditures, such as research, planning, preparation, implementation, management, operations, change management and troubleshooting. Then, multiply those hours against the hourly cost (don't forget fringe benefits) of personnel. If personnel receive overtime or bonus money for evenings, weekends and holidays, that must be taken into account. Once again, the NPV formula must be used for all cash flow expenditures.

TCO: Add up all of the NPVs for all of the expenditures for both CapEx and OpEx to determine the DR option's TCO.
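Putting the pieces of this method together, a minimal sketch might look like the following. It assumes annual periods, treats the up-front CapEx as already in present-value terms, and rolls personnel time into each year's operating cost; the function names and all dollar figures are hypothetical.

```python
from typing import Sequence

BUSINESS_HOURS_PER_YEAR = 2016  # typical figure cited in the method


def hourly_cost(annual_salary: float, fringe_benefits: float) -> float:
    """Hourly IT personnel cost, including fringe benefits."""
    return (annual_salary + fringe_benefits) / BUSINESS_HOURS_PER_YEAR


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Present value of end-of-period cash flows."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))


def tco(
    upfront_capex: float,
    yearly_opex: Sequence[float],
    yearly_personnel_hours: Sequence[float],
    hourly_rate: float,
    annual_rate: float,
) -> float:
    """Up-front CapEx plus the NPV of each year's OpEx and personnel cost."""
    yearly_costs = [
        opex + hours * hourly_rate
        for opex, hours in zip(yearly_opex, yearly_personnel_hours)
    ]
    return upfront_capex + npv(annual_rate, yearly_costs)


if __name__ == "__main__":
    rate_per_hour = hourly_cost(annual_salary=80_640, fringe_benefits=20_160)
    total = tco(
        upfront_capex=250_000,
        yearly_opex=[40_000, 42_000, 44_000],   # maintenance, licenses, services
        yearly_personnel_hours=[400, 300, 300],
        hourly_rate=rate_per_hour,
        annual_rate=0.08,                        # assumed cost of money
    )
    print(f"hourly personnel cost: {rate_per_hour:.2f}")
    print(f"three-year TCO: {total:,.2f}")
```

Running the same calculation for each candidate DR option gives comparable TCO figures to set against the budget.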

Choosing a DR partner
Choosing DR partners is as important as choosing the technologies those vendors offer. Careful attention must be paid to the following:
  • The DR partners must work together. No one needs finger-pointing when problems arise.
  • Partners should support the final solution.
  • The total DR solution data must always be in a recoverable and usable state regardless of where the failure or disaster occurs. Usable also means as up-to-date as possible based on the RPO.
  • Database management systems must be able to link to and recover from the DR data.
  • The total solution must work with all current and planned organization applications, operating systems, storage, storage infrastructure, platforms, etc.
  • With data storage growing exponentially (estimates are between 30% and 100% per year), the DR solution must scale with it. Assuming a 50% growth rate, DR for a terabyte of storage today will need to scale to more than 11TB in just six years.
  • To maximize control and minimize multisite DR skills (to keep TCO low), it's indispensable to have a centrally located cross-system management console. Central consoles ought to provide an at-a-glance view of the state of all current, active DR configurations. The central management facility should allow initiation of any action that's required, regardless of the DR solution's location. This means no IT personnel are required at the primary application server, allowing for "lights out" DR.
  • Minimizing the need for user involvement (again, to lower TCO) calls for increased automation. Automated recovery from common failures, including server reboots, application crashes and network failures, can significantly reduce or eliminate the need for human intervention.
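The scaling figure in the list above follows from compound growth, which a short sketch can reproduce. The function name is hypothetical; the growth rates are the ones quoted in the list.

```python
def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding annual growth for the given number of years."""
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # 1TB today at the low, assumed and high growth estimates.
    for growth in (0.30, 0.50, 1.00):
        tb = projected_capacity_tb(1.0, growth, 6)
        print(f"{growth:.0%} annual growth: {tb:.2f}TB after six years")
```

At 50% annual growth, 1TB becomes about 11.39TB after six years, consistent with the "more than 11TB" figure; at the 100% end of the range, the same terabyte becomes 64TB.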

Meeting the DR requirements of the organization and regulators has become a challenge of sizable proportions. Matching DR requirements, data and IT skills to the budget too often leads to a large GAP, but this DR GAP can be mitigated, and even eliminated, by building a sound DR foundation. There are six steps to establishing this foundation:

  1. Classify each application and its data into four categories: mission critical, essential, important and less critical.
  2. Determine the required RPO and RTO for each class of data.
  3. Determine the available DR options per class of data.
  4. Establish each option's TCO for the expected life of the implementation.
  5. Objectively evaluate the skills required at every DR location.
  6. Match the data, DR options and skills to the budget to determine the breadth of the DR GAP.
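The final step can be illustrated with a rough sketch that compares the TCO required for each data class against the budget. The `dr_gap` and `fund_by_priority` helpers, the funding order and all dollar figures are assumptions for illustration, not a method prescribed by the article.

```python
from typing import Dict, List

# Funding order assumed to follow the four data-value categories.
PRIORITY = ["mission critical", "essential", "important", "less critical"]


def dr_gap(required_tco: Dict[str, float], budget: float) -> float:
    """Shortfall between the DR spend required and the budget (0 if covered)."""
    return max(0.0, sum(required_tco.values()) - budget)


def fund_by_priority(required_tco: Dict[str, float], budget: float) -> List[str]:
    """Fund classes in priority order; return the classes left unfunded."""
    remaining = budget
    unfunded = []
    for category in PRIORITY:
        cost = required_tco.get(category, 0.0)
        if cost <= remaining:
            remaining -= cost
        else:
            unfunded.append(category)
    return unfunded


if __name__ == "__main__":
    # Hypothetical TCO of the cheapest qualifying option for each class.
    required = {
        "mission critical": 500_000.0,
        "essential": 300_000.0,
        "important": 150_000.0,
        "less critical": 50_000.0,
    }
    budget = 850_000.0
    print(f"DR GAP: {dr_gap(required, budget):,.0f}")
    print("unfunded classes:", fund_by_priority(required, budget))
```

Any class that shows up as unfunded is a candidate for a cheaper layered combination of technologies rather than for simply going unprotected.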

Employing multiple DR technologies instead of trying to force one size to fit all will help to shrink or eliminate the DR GAP. This approach takes more time to plan, implement and iron out the kinks, but the benefits are too compelling to ignore.
