It's always a good idea to ensure the quality of service your disaster recovery (DR)/business continuity (BC) programs provide by establishing service-level agreements (SLAs). Likewise, if your BC/DR program depends on the provision of specialized services, such as a hot site or collocated data center, SLAs are essential tools to ensure that the services you obtain are acceptable. Service-level agreements apply to internal departments as well as outside vendors and service firms. In addition, good IT practice regularly includes service-level agreements as part of any IT program involving the provision of products and/or services.
Service-level agreements specify: 1) a service to be provided; 2) expected performance with regard to what's being delivered; 3) metrics against which performance will be judged; and 4) remedies in case the agreed-upon deliverables aren't satisfactorily provided.
In this article, we'll examine the elements of an SLA for BC/DR activities and also provide you with a free service-level agreement template to help you get started.
SERVICE LEVEL AGREEMENT TEMPLATE: TABLE OF CONTENTS
>> The components of a service-level agreement
>> What services are appropriate for service-level agreements?
>> Disaster recovery metrics and SLAs
>> Download our free service-level agreement template
As with any kind of legal document, be sure to have your organization's legal department review and approve the service-level agreement before it's signed. Depending on how the SLA is structured, it can protect either your organization or the service provider, or both. Most likely, you'll want an SLA to ensure that a service provider delivers products and services according to a set of agreed-upon expectations. A key part of an SLA is agreeing on financial penalties and other remedies if performance is unacceptable.
A comprehensive service-level agreement usually has most or all of the following components:
- Description of services to be provided
- Scope of the services
- Location(s) where services are to be provided
- Responsibilities and duties of service provider
- Responsibilities and duties of service recipient
- Description of acceptable performance levels
- Metrics to be used for evaluating performance
- Process for monitoring, tracking and evaluating performance
- Process for resolving poor performance
- Remedies for failure to provide acceptable performance, including time frames and escalation procedures
- Protection of intellectual property, as applicable
- Compliance with legislation, standards, regulations and accepted practices
- Termination of agreement
A service-level agreement can cover broad-based or precise requirements for service provision. Most importantly, the parties to the SLA must agree on what is to be provided, the metrics to be satisfied (e.g., time frame needed to provide service, percent successful vs. unsuccessful delivery of service), the method of monitoring and reporting service delivery, and the remedies for failure to satisfy SLA requirements.
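One of the metrics mentioned above, percent successful vs. unsuccessful delivery of service, can be tracked with very little tooling. The following is a minimal Python sketch, not a definitive implementation. The `DeliveryRecord` layout, the function names and the 95 percent default target are all assumptions made for illustration; your own SLA would define the actual record fields and threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeliveryRecord:
    """One service request and its agreed time frame (hypothetical layout)."""
    request_id: str
    hours_to_deliver: float
    agreed_hours: float

    @property
    def successful(self) -> bool:
        # A delivery counts as successful if it met the agreed time frame.
        return self.hours_to_deliver <= self.agreed_hours


def success_rate(records: list[DeliveryRecord]) -> float:
    """Return the percentage of records delivered within the agreed time frame."""
    if not records:
        raise ValueError("no delivery records to evaluate")
    met = sum(1 for r in records if r.successful)
    return 100.0 * met / len(records)


def meets_sla(records: list[DeliveryRecord], target_percent: float = 95.0) -> bool:
    """Compare the measured success rate with the agreed target (assumed 95%)."""
    return success_rate(records) >= target_percent


# Hypothetical sample data: three of four requests met the agreed time frame.
sample = [
    DeliveryRecord("REQ-1", 3.0, 4.0),
    DeliveryRecord("REQ-2", 4.0, 4.0),
    DeliveryRecord("REQ-3", 6.5, 4.0),
    DeliveryRecord("REQ-4", 1.0, 4.0),
]
print(success_rate(sample))  # 75.0
print(meets_sla(sample))     # False
```

Keeping the target as a parameter rather than a constant makes it easy to evaluate the same records against different proposed thresholds while the SLA is still being negotiated.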
Let's briefly review examples of services that ought to have service-level agreements in place. On an internal basis, a BC/DR program might require the following:
- Satisfaction of agreed-upon recovery time objectives (RTOs) in the event of a disruption, e.g., certain systems are restored within eight hours of the disruption
- Satisfaction of agreed-upon recovery point objectives (RPOs) in the event of a disruption, e.g., data can be recovered to a point within 15 minutes (0.25 hours) of the disruption
- Completion of one risk assessment for each business unit per year
- Completion of one tabletop exercise for each BC/DR plan annually
- Review and updating of business impact analysis (BIA) data annually
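The RTO and RPO examples in the list above translate directly into time comparisons. The sketch below is one possible way to check them in Python, assuming you can record three timestamps for an incident: when the disruption occurred, when the system was restored, and when the last recoverable copy of the data was taken. The function names and the incident timeline are hypothetical.

```python
from datetime import datetime, timedelta


def rto_met(disruption: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if the system was restored within the RTO after the disruption."""
    return restored - disruption <= rto


def rpo_met(disruption: datetime, last_recovery_point: datetime, rpo: timedelta) -> bool:
    """True if the newest recoverable data is no older than the RPO."""
    return disruption - last_recovery_point <= rpo


# Targets from the examples above: eight-hour RTO, 0.25-hour (15-minute) RPO.
RTO = timedelta(hours=8)
RPO = timedelta(hours=0.25)

# Hypothetical incident timeline, for illustration only.
disruption = datetime(2024, 3, 1, 9, 0)
last_backup = datetime(2024, 3, 1, 8, 50)   # 10 minutes before the disruption
restored = datetime(2024, 3, 1, 16, 30)     # 7.5 hours after the disruption

print(rto_met(disruption, restored, RTO))     # True
print(rpo_met(disruption, last_backup, RPO))  # True
```

Note the direction of each comparison: RTO looks forward from the disruption to the restoration, while RPO looks backward from the disruption to the last recovery point.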
Further, service-level agreements are particularly desirable for externally provided services, such as:
- Hot sites, particularly how quickly the organization has access to its agreed-upon hot site resources upon declaration, such as eight hours or 24 hours
- Recovery of network connectivity to the Internet following disruption of local access facilities, such as within four hours
- Time required to fail over from primary to backup servers, such as one hour
- Time required to recover and restart downed systems via a cloud-based recovery service, such as one hour
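The time targets in the list above can be kept in a simple lookup and compared with what was actually measured during an exercise or a real incident. This is a rough sketch under stated assumptions: the service keys, the function name and the measured values are invented for illustration, and the eight-hour hot site figure is just one of the two options mentioned above.

```python
from __future__ import annotations

# Target durations in hours, taken from the examples in the list above.
# The dictionary keys are hypothetical labels, not standard terms.
SLA_TARGETS_HOURS = {
    "hot_site_access": 8.0,
    "internet_connectivity_recovery": 4.0,
    "server_failover": 1.0,
    "cloud_recovery_restart": 1.0,
}


def find_violations(measured_hours: dict[str, float]) -> dict[str, float]:
    """Return each service that exceeded its target, mapped to the overrun in hours."""
    violations = {}
    for service, actual in measured_hours.items():
        target = SLA_TARGETS_HOURS[service]
        if actual > target:
            violations[service] = actual - target
    return violations


# Hypothetical measurements, for illustration only.
measured = {
    "hot_site_access": 6.5,
    "internet_connectivity_recovery": 5.0,
    "server_failover": 0.75,
    "cloud_recovery_restart": 1.5,
}
print(find_violations(measured))
# {'internet_connectivity_recovery': 1.0, 'cloud_recovery_restart': 0.5}
```

Reporting the size of each overrun, rather than a simple pass or fail, gives both parties something concrete to discuss when applying the remedies defined in the SLA.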
When evaluating performance, benchmarks or metrics for service must be in place. In a previous article, I described tier one and tier two metrics. Tier one metrics are high-level BC/DR metrics. By contrast, tier two metrics are often more detailed and granular, and are typically found in technology-focused disaster recovery plans. Many of these are based on BC professional practices as defined by the Business Continuity Institute (BCI) Good Practice Guidelines and DRI International's Generally Accepted Practices. Examples of each are provided in the following tables.
Tier one action areas and metrics
|Tier one action areas|Examples of metrics|
Tier two action areas and metrics
|Tier two action areas|Examples of metrics|
Key parts of the SLA process include defining the metrics, getting all parties to agree on them, monitoring service delivery against them, evaluating performance and resolving SLA violations.
Don't be surprised if most of your vendors have their own service-level agreement template. If they are proactive about SLAs, it's probably a good thing, as it suggests they take the provision of service seriously. If a vendor seems reluctant to accept your request for an SLA, it's a strong clue that its performance may not fulfill your expectations. The best strategy is to have your own SLAs in place, review the vendor's SLA, decide which one to use, and have your legal staff review everything before signing.
To help you get started with creating your own service-level agreement, we've provided a free service-level agreement template for you to download.
About this author: Paul Kirvan, CISA, FBCVI, CBCP, has more than 20 years of experience in business continuity management as a consultant, author and educator. He has been directly involved with dozens of IT/telecom consulting and audit engagements ranging from governance program development, program exercising, execution and maintenance to RFP preparation and response. Kirvan currently works as an independent business continuity consultant/auditor and is the secretary of the Business Continuity Institute USA chapter. He can be reached at firstname.lastname@example.org.