
What your DR plan should protect

If you have a disaster recovery plan in place, you're a step ahead of many other companies. But you need to assess your plan to ensure critical data is being protected properly and that you're not wasting resources by providing too much protection for less-important data.

Many DR plans aren't based on the data's value to the company. Here's how to protect your critical data more effectively while reducing costs.

A disaster recovery (DR) plan often provides too little protection for critical data and too much protection for less-important data.

Important data must be protected from loss or damage caused by human or system error, hacker attacks, viruses, hardware failure or site outages. Protection strategies generally involve keeping a separate copy of the data or a journal of changes; this allows users and applications to access the backup or recovery copy if the primary copy is lost or damaged. Ideally, every recovery copy would be up-to-date and instantly available. However, this level of protection is difficult and expensive to realize, and it's not needed for all applications and data types. Thus, a practical DR plan will set different recovery objectives for different types of data.

When framing the DR plan, maintain a clear distinction between your two objectives: preventing data loss and recovering applications for business resumption. These objectives will drive different protection requirements for various types of data.

Preventing data loss
The impact of data loss is generally reflected in the recovery point objective (RPO) for an application or data set: the maximum amount of recent data, measured in time, that can be lost. You may find that much of your data is already protected by previous backup or archive copies. This is especially true for historical, fixed-content data that's kept online for easy reference and comparison. Because that data doesn't change, the existing backup copies provide full protection against data loss, so mirroring and replication may be expensive overkill. For these applications, an acceptable RPO might be 48 hours or more.

In contrast, the most recent online transaction processing (OLTP) activity generally hasn't been captured by a periodic backup process, so it requires stronger protection. For critical transaction-processing applications, even an RPO of one hour might be too long. However, because very low RPO values are expensive to achieve, it's important to apply them only to the data sets that actually need them.
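To make the RPO comparison concrete, here is a minimal Python sketch. It makes one simplifying assumption: the worst-case data loss equals the interval between recovery points (backups, snapshots or journal writes). The data-set names and intervals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    rpo_hours: float                      # maximum tolerable data loss
    recovery_point_interval_hours: float  # time between backups, snapshots or journal writes


def worst_case_loss_hours(ds: DataSet) -> float:
    # Simplifying assumption: a failure just before the next recovery point
    # loses everything written since the previous one.
    return ds.recovery_point_interval_hours


def meets_rpo(ds: DataSet) -> bool:
    return worst_case_loss_hours(ds) <= ds.rpo_hours


# Hypothetical data sets and intervals, for illustration only.
datasets = [
    DataSet("historical fixed content, daily backup", 48, 24),
    DataSet("OLTP, nightly backup only", 1, 24),
    DataSet("OLTP, journal written every 15 minutes", 1, 0.25),
]

for ds in datasets:
    print(f"{ds.name}: {'meets' if meets_rpo(ds) else 'misses'} its RPO")
```

A nightly backup covers the fixed-content data but cannot meet a one-hour RPO for OLTP data. Only the frequent journal does.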

In addition to preventing data loss, businesses must ensure that critical applications and data sets are restored quickly after an outage. The time allowed for recovery is specified in the recovery time objective (RTO) for each application or data set. Some transaction-processing applications may need to be recovered within a few minutes to avoid losing business or damaging customer satisfaction. In contrast, some project-oriented applications and data sets, such as software development, test systems or marketing analysis databases, might tolerate recovery several days after a disaster without causing major damage to the business.
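Data volume alone can rule out some RTOs. The sketch below estimates how long it takes just to copy a data set back from a recovery copy. It uses decimal units and a hypothetical sustained throughput, and it ignores verification, restart and log replay, so a real recovery would take longer.

```python
def estimated_restore_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Time to copy size_gb back at a sustained rate (1 GB = 1,000 MB).

    Ignores verification, application restart and log replay.
    """
    seconds = size_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def restore_fits_rto(size_gb: float, throughput_mb_per_s: float, rto_hours: float) -> bool:
    return estimated_restore_hours(size_gb, throughput_mb_per_s) <= rto_hours


# Hypothetical figures: a 1 TB data set restored at 200 MB/s.
print(round(estimated_restore_hours(1000, 200), 2))  # 1.39
print(restore_fits_rto(1000, 200, rto_hours=0.25))   # False: a 15-minute RTO rules out a restore
print(restore_fits_rto(1000, 200, rto_hours=72))     # True: a test system can wait days
```

When the copy time alone exceeds the RTO, a restore from backup cannot meet the objective. The data needs a replica that can take over directly.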

Records-retention policy update
It's important to regularly update your company's records-retention policies to identify any missing records that need to be protected in case of a disaster. A business record is defined as any information that's recorded or stored by graphic, electronic, mechanical or other means and contains evidence of business-related activities, events and transactions with ongoing business, legal, compliance, operational or historical value. Whether something is a record depends on its content, not on its format or recording medium.

Your company's general counsel can become a strong source of support for a policy update project, and may provide budget dollars and resources to get it done. The records-retention schedule generally lists required records by business function or department, and the update process should include structured interviews with representatives from these groups. The interview process should update the list of paper records created and retained, and identify the types of electronic records used or retained by each group.

You should also take the opportunity to identify documents and data sets that need special privacy protection or security measures during emergencies. The disaster recovery plan will need to support these special requirements.
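The interview step amounts to comparing what each group says it keeps against what the schedule lists. Here is a minimal sketch. The department names and record types are hypothetical, and real schedules are usually kept in a records-management system rather than in code.

```python
# Hypothetical retention schedule: record types listed per department.
schedule = {
    "finance": {"invoices (paper)", "general ledger"},
    "sales": {"signed contracts (paper)"},
}

# Record types reported in the structured interviews (also hypothetical).
interviews = {
    "finance": {"invoices (paper)", "general ledger", "scanned invoices"},
    "sales": {"signed contracts (paper)", "CRM order history", "e-mail approvals"},
    "hr": {"personnel files"},
}


def missing_from_schedule(schedule, interviews):
    """Per department, record types found in interviews but absent from the schedule."""
    gaps = {}
    for dept, found in interviews.items():
        missing = found - schedule.get(dept, set())
        if missing:
            gaps[dept] = sorted(missing)
    return gaps


print(missing_from_schedule(schedule, interviews))
```

The gaps it reports are mostly electronic records. Those are typically the records an older, paper-oriented schedule misses.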

Assessing the situation
Your approach to DR planning will depend on whether you have a DR plan and when it was last updated. If you have one, determine whether it is:

  • Incomplete, covering only a small part of the critical data and addressing only a few of the likely threats and risks.
  • Current but very expensive, placing all data on the highest-cost storage tier and replicating it locally and remotely for rapid recovery in almost any failure scenario.
  • Current, cost-effective and based on a clear set of data classifications, with appropriate levels of protection and replication for each class.
If your company doesn't have a DR plan, or it covers only a fraction of the critical data and addresses a limited number of likely threats and risks, your first steps are to determine what data needs to be protected and to define the business requirements for protecting and recovering that data. Then you can decide which technical and procedural measures will provide the needed protection and recovery capabilities. The process includes the following steps:

Establish record types and business requirements. To ensure you understand what data is most critical for business recovery, start with a review of the legal, regulatory and operational requirements for records retention, protection and availability. A review of your company's official records-retention schedule will ensure that you address the full range of records and data the company must create and retain. If the retention schedule is incomplete or out of date, you may need to work with your legal and compliance teams to update it so that it covers important electronic records in addition to paper documents (see "Records-retention policy update").

Classify applications and data. For electronic records, divide your data into a few broad categories such as structured, unstructured and semistructured data. Identify applications that support key business processes or have clear legal and regulatory requirements for retention and protection. For example, production OLTP applications like order entry often need high levels of protection and short timeframes for RPO and RTO. But don't neglect records that can tolerate longer RPO and RTO service levels. They must still be appropriately protected. For example, some scanned document files may serve as backup copies of vital paper records--and the organization will need those copies if a disaster causes the destruction of the paper originals.
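The classification step can be expressed as a few explicit rules. The sketch below is one hypothetical rule set. The class names and criteria are assumptions for illustration; real criteria come out of the legal, regulatory and operational review.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    supports_key_process: bool          # e.g. order entry
    regulated: bool                     # clear legal/regulatory retention rules
    copy_of_vital_record: bool = False  # e.g. scans backing up vital paper originals


def classify(app: Application) -> str:
    # Hypothetical rules, checked from most to least demanding.
    if app.supports_key_process:
        return "critical"   # short RPO and RTO
    if app.regulated or app.copy_of_vital_record:
        return "protected"  # longer RPO and RTO, but must not be lost
    return "standard"


apps = [
    Application("order entry", supports_key_process=True, regulated=True),
    Application("scanned contracts", supports_key_process=False, regulated=False,
                copy_of_vital_record=True),
    Application("marketing analysis", supports_key_process=False, regulated=False),
]
for app in apps:
    print(app.name, "->", classify(app))
```

The "protected" class captures the point made above: scanned copies of vital paper records tolerate longer service levels but still need protection.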

Evaluate data importance. Importance is only partly measured by the immediate impact of data loss (or delayed recovery) on the organization's operations, costs and reputation. You should also consider the potential long-term impact of data loss, recognizing that data importance doesn't always decline over time. In the traditional view, access frequency declines as data ages. Accesses to a document or a customer transaction file may be frequent for the first 30 days, then less frequent for a few months and very rare for anything older than 12 months. But some documents and files, such as contracts and agreements, remain quite valuable for longer periods, and may play a critical role in resolving disputes or preparing for legal action years after they were first recorded and stored. Even though the RPO and RTO may seem undemanding for historical reference data, you must provide adequate protection and recovery throughout the specified retention periods.
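The distinction between access frequency and protection obligation can be shown in a few lines. This sketch uses the 30-day and 12-month thresholds from the text. The seven-year retention period is a hypothetical value.

```python
def access_tier(age_days: int) -> str:
    # Traditional pattern: frequent for ~30 days, less frequent for months,
    # rare after 12 months.
    if age_days <= 30:
        return "frequent"
    if age_days <= 365:
        return "occasional"
    return "rare"


def must_protect(age_days: int, retention_years: float) -> bool:
    # The protection obligation lasts for the whole retention period,
    # regardless of how rarely the record is accessed.
    return age_days <= retention_years * 365


# A hypothetical 5-year-old contract under a 7-year retention rule.
age = 5 * 365
print(access_tier(age), must_protect(age, retention_years=7))  # rare True
```

A policy that keyed protection to access tier alone would drop this contract's protection. Keying it to the retention period keeps the contract recoverable.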

Assess the risk. Now that you know which data sets you need to protect and recover, assess the risks to that data. Consider various failure scenarios, including local server or storage hardware failures, site-wide outages due to fire or flood, and regional disasters such as earthquakes or power grid failures. Assess the potential impact of each scenario on the availability, integrity and confidentiality of each data class. For example, what if the order-entry system were down for 48 hours? What if the supporting documents for critical financial transactions were lost, leaving the company unable to pass an audit or satisfy a regulatory examination?
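One way to check a data set against these scenarios is to ask which of its copies would survive each one. The sketch below uses a deliberately simplified failure-domain model. It assumes each scenario destroys everything up to a certain distance from the primary storage, and a copy survives only if it lies beyond that reach. The copy names and levels are hypothetical.

```python
# How far each scenario reaches (simplified): 0 = primary storage system,
# 1 = primary site, 2 = whole region.
SCENARIO_REACH = {
    "storage hardware failure": 0,
    "site-wide outage": 1,
    "regional disaster": 2,
}


def surviving_copies(copies: dict[str, int], scenario: str) -> list[str]:
    """Copies whose location level lies beyond the scenario's reach."""
    reach = SCENARIO_REACH[scenario]
    return [name for name, level in copies.items() if level > reach]


# Copy locations on the same scale (3 would mean outside the region).
order_entry = {
    "array snapshot": 0,          # same storage system as the primary
    "local tape backup": 1,       # same site
    "replica at nearby site": 2,  # different site, same region
}

for scenario in SCENARIO_REACH:
    survivors = surviving_copies(order_entry, scenario)
    print(f"{scenario}: {survivors or 'DATA LOST'}")
```

In this example the order-entry data survives a site-wide outage but not a regional disaster. That exposes a gap the risk assessment should weigh.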

Application recovery objectives

Assign protection and recovery objectives. Beginning with the most critical applications and the most important data sets, define realistic protection levels and recovery objectives. Protection objectives can specify levels of hardware redundancy and failover, and the number of duplicate copies, media types and storage locations. Recovery objectives should include RPO and RTO metrics for all data classes. If you've done a good job of classifying applications and data sets that have common business needs, you should be able to assign consistent RPO and RTO metrics for all the data in each data class (see "Application recovery objectives").
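Consistent metrics per class are easiest to enforce if data sets are assigned to a class and never to individual RPO or RTO values. Here is a minimal sketch. Every class name, data set and number in it is hypothetical.

```python
# Hypothetical objectives per data class, in hours.
CLASS_OBJECTIVES = {
    "critical": {"rpo": 0.25, "rto": 1},
    "protected": {"rpo": 24, "rto": 72},
    "standard": {"rpo": 48, "rto": 168},
}

# Data sets are assigned to a class, never to individual metrics.
ASSIGNMENTS = {
    "order entry": "critical",
    "general ledger": "critical",
    "scanned contracts": "protected",
    "test systems": "standard",
}


def objectives_for(data_set: str) -> dict:
    return CLASS_OBJECTIVES[ASSIGNMENTS[data_set]]


for name in ASSIGNMENTS:
    o = objectives_for(name)
    print(f"{name}: RPO {o['rpo']} h, RTO {o['rto']} h")
```

Each data set inherits its objectives from its class, so two data sets in the same class cannot drift to different metrics. Changing a class's objectives updates every data set in it at once.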

Your protection requirements and recovery metrics, along with application performance requirements, will drive your storage infrastructure configurations and costs. For example, reference data files might be adequately protected with RAID 5 on local disk storage and daily tape-based backups to support an RPO of 24 hours. In contrast, a critical OLTP database with an RPO of one hour (for a site-wide outage) might require local RAID 1 protection, continuous replication to a remote site and frequent snapshot copies to ensure consistent application recovery. The disk storage infrastructure for this OLTP application could cost five to 10 times as much as the reference-data archive storage. By properly classifying your data and assigning recovery objectives based on business needs, you should be able to meet the protection and recovery needs of the most demanding applications while reducing infrastructure costs for the less-demanding data classes.
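The two example configurations can be encoded as a simple lookup from RPO. This sketch covers only those two cases. Any RPO tighter than 24 hours falls through to the replicated configuration. That is a conservative simplification; a real plan may define intermediate tiers.

```python
REFERENCE_CONFIG = ["RAID 5 on local disk", "daily tape backup"]
CRITICAL_CONFIG = [
    "RAID 1 on local disk",
    "continuous replication to a remote site",
    "frequent snapshots for consistent application recovery",
]


def suggest_configuration(rpo_hours: float) -> list[str]:
    # A daily backup can only guarantee a 24-hour recovery point, so any
    # tighter RPO gets the replicated configuration in this simplified model.
    if rpo_hours >= 24:
        return REFERENCE_CONFIG
    return CRITICAL_CONFIG


print(suggest_configuration(24))  # reference data
print(suggest_configuration(1))   # critical OLTP database
```

Because the configuration follows from the RPO, overstating a data set's RPO requirement moves it onto the costlier configuration.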

Continuous improvement. After completing the initial assessment and DR plan, test the plan regularly to uncover deficiencies that require changes to it. As resources permit, or as business needs and risk assessments dictate, plan a more complete data classification and archiving program and a more sophisticated set of recovery and service-level objectives.
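A DR test is most useful when its measurements are checked against the objectives each data set was assigned. Here is a minimal sketch of that check. The field names and test results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrTestResult:
    data_set: str
    rto_hours: float
    rpo_hours: float
    measured_recovery_hours: float   # time until the application was usable again
    measured_data_loss_hours: float  # span of changes missing after recovery


def deficiencies(results: list[DrTestResult]) -> list[str]:
    """List every objective a DR test failed to meet."""
    issues = []
    for r in results:
        if r.measured_recovery_hours > r.rto_hours:
            issues.append(f"{r.data_set}: recovery took {r.measured_recovery_hours} h "
                          f"(RTO {r.rto_hours} h)")
        if r.measured_data_loss_hours > r.rpo_hours:
            issues.append(f"{r.data_set}: lost {r.measured_data_loss_hours} h of data "
                          f"(RPO {r.rpo_hours} h)")
    return issues


# Hypothetical results from one test run.
results = [
    DrTestResult("order entry", rto_hours=1, rpo_hours=0.25,
                 measured_recovery_hours=3, measured_data_loss_hours=0.1),
    DrTestResult("reference archive", rto_hours=72, rpo_hours=48,
                 measured_recovery_hours=20, measured_data_loss_hours=24),
]
for issue in deficiencies(results):
    print(issue)
```

Each reported issue points to a specific part of the plan that needs changing. An empty list means the test met every objective.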

DR plan is too expensive
Some deep-pocketed enterprises have taken an approach, at least for centralized application data, that is simple but very expensive: All data is placed on the highest cost storage tier and replicated locally and remotely for rapid recovery in almost any failure scenario. For organizations with well-endowed storage departments in tightly regulated industries, the obvious next step is more of the same. These companies can continue to treat all data the same and spend more money to get more protection.

However, if capital resources are limited, there's an alternative approach. Recognize that all data isn't the same and manage different data classes in terms of their actual business requirements. In such an environment, 10% to 20% of data is typically underprotected, while 40% or more is overprotected.

A good data classification program can identify critical data that deserves a higher level of protection. It can also pinpoint data sets that are protected and replicated too much, enabling you to redeploy existing resources and meet data growth requirements with less-expensive infrastructure. If this is done well, most enterprises can improve protection, performance and recovery for the most critical data while reducing overall storage costs.
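Once each data set has a required protection level, finding under- and overprotected data is a comparison against what it currently gets. Here is a minimal sketch. The three tier names and the inventory are hypothetical.

```python
# Hypothetical protection tiers, ordered from least to most protection.
TIER_RANK = {"archive": 0, "standard": 1, "replicated": 2}


def protection_gap(required: str, current: str) -> str:
    diff = TIER_RANK[current] - TIER_RANK[required]
    if diff < 0:
        return "underprotected"
    if diff > 0:
        return "overprotected"
    return "appropriate"


# (data set, required tier, current tier)
inventory = [
    ("order entry", "replicated", "standard"),
    ("marketing analysis", "standard", "replicated"),
    ("closed-period ledger", "archive", "archive"),
]
for name, required, current in inventory:
    print(name, protection_gap(required, current))
```

Overprotected data sets are candidates for moving to cheaper infrastructure. The capacity that frees up can go to the underprotected ones.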

DR plan is current and cost-effective
If your DR plan is based on a clear set of data classifications, service-level objectives, and appropriate levels of protection and replication for each class, you're among the fortunate few. Good job! However, there may still be room for improvement.

From a data management and storage perspective, you may be able to identify additional opportunities to improve service levels and reduce costs while continuing to meet the required RPO and RTO metrics. Companies often consider advanced data archiving solutions at this stage. By moving older data out of the current production data set, an organization can ensure proper preservation and protection of the archived data while improving application response times, reducing backup and recovery times, and lowering overall risks and costs.

For example, a production OLTP database may contain 1TB of accounting records and related data, spanning several years of history. If a database archiving product can move 500GB of inactive data (from closed periods, for example) to a static history file, the application can process queries and updates on the active data much more quickly. The database will recover faster from outages or disasters, and the enterprise can reduce its storage costs by placing the history file on less-expensive storage. Once the archive is backed up, it won't need additional backups until it changes--perhaps every 90 days when records from additional historical periods are added to the archive. Archiving applications for e-mail and other data types offer similar benefits in terms of reduced cost, improved performance, better protection and faster recovery time.
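The backup savings in this example can be estimated with simple arithmetic. The sketch below assumes a daily full backup of the active data. It also assumes one full backup of the archive each time it changes (every 90 days, as in the example). Both schedules are assumptions; incremental backups would change the numbers.

```python
def backup_volume_gb(active_gb: int, archive_gb: int, days: int,
                     archive_refresh_days: int = 90) -> int:
    # Assumes a daily full backup of the active data, plus one full backup
    # of the archive each time it is refreshed.
    archive_backups = -(-days // archive_refresh_days)  # ceiling division
    return days * active_gb + archive_backups * archive_gb


print(backup_volume_gb(1000, 0, 90))   # no archiving: 90000 GB over 90 days
print(backup_volume_gb(500, 500, 90))  # 500 GB archived: 45500 GB over 90 days
```

Under these assumptions, archiving half the database cuts backup volume by roughly half. It also halves the amount of active data that must be restored after an outage.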

As a storage professional, you know about data--where it's stored, how it's protected and how it can be recovered. You can also take a broader view as part of a team that understands the business processes, records and applications, and how they can best be protected and recovered. If you want to be a leader and ensure successful DR planning in your environment, embrace the broader view and build a consensus on data classification processes and recovery metrics within the larger business team.

To decide what data you need to protect, begin by developing a good understanding of business requirements. Collaborate with colleagues who have complementary business, legal and technical expertise. Begin today and set realistic goals for a successful DR program or improvement project. Then keep the DR plan current as part of your ongoing process for building effective storage strategies.
