Figuring out what kind of disaster recovery (DR) site your organization needs requires careful planning, and you will have to balance costs against any risks.
One of the toughest disaster recovery (DR) issues to resolve -- and potentially one of the most expensive decisions in a DR plan -- is determining the type of recovery site your organization will use. The options (cold sites, warm sites and hot sites) can keep you awake at night as you try to figure out which one is best for your organization. The reality is that any of these options can help your organization recover from a disaster, and with some careful planning they can also help you protect critical data.
Recovery site options can vary significantly in cost; we’ll provide some guidance to help you determine the most appropriate recovery site setup, and also give you an idea of what you should expect to pay for private facilities and third-party services.
According to Ted Brown, CBCP, MBCI, CBCV, president and CEO of KETCHConsulting, a business continuity (BC)/DR consultancy in Waverly, Pa., a good way to think of these various options is as alternate sites. “The development of alternate sites came from the need to protect data centers,” Brown said. Each data center contains hardware and software, physical facilities, a variety of systems and services, and the operational and critical data needed to run the business.
There are two fundamental alternate site arrangements: internal and external. “If you have the need and the funding, you can design and build an internal recovery site, typically a second data center, to provide the resources you need to recover and resume data center operations following a disaster at the primary data center,” Brown said. Companies with very large information requirements and aggressive recovery time objectives (RTOs) are more likely to have internal recovery arrangements.
By contrast, organizations with more restricted budgets have a huge variety of options for protecting their data centers and critical information. These options are typically provided externally, and are the familiar cold/warm/hot sites.
With regard to externally provided solutions, the definitions of cold, warm and hot sites are many and varied. While there are currently no official standards for alternate sites, international standards like ISO/IEC 24762:2008, Guidelines for Information and Communications Technology Disaster Recovery Services, address IT DR services provided by vendors and provide a good set of criteria for evaluating these options.
Brown, who has been involved in alternate site planning and implementation for more than 30 years, likes to use the following definitions: “A hot site is a fully operational data center that also has live customer data. A cold site is a type of data center that has no technology installed. It will have power, HVAC and communications in place. A warm site is an equipped data center, but without customer data.”
Cold, warm and hot sites defined
Cold site. Space and associated infrastructure (e.g., power, telecoms and environmental controls) available to support IT systems, which will only be installed when disaster recovery (DR) services are activated.
Warm site. Partially equipped site with some of the required equipment (e.g., computing hardware and software, and supporting personnel); organizations install additional equipment, computing hardware and software, and supporting personnel when DR services are activated.
Hot site. Fully equipped site with the required equipment, computing hardware/software and supporting personnel; it's also fully functional and manned on a 24x7 basis so that it's ready for organizations to operate their IT systems when DR services are activated.
Source: ISO/IEC 24762:2008
In each case, the equipment in place at alternate site facilities is shared by multiple users. If there are multiple disaster declarations, Brown added, the response is usually first come, first served. Some companies will pay extra to have dedicated equipment available only to them.
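The cold, warm and hot site definitions above can be summarized as a small lookup of what still has to be installed when DR services are activated. The following Python sketch is only an illustration of the ISO/IEC 24762:2008 wording quoted in the sidebar; the names SiteType, INSTALL_ON_ACTIVATION and is_ready_to_operate are hypothetical and do not come from the standard or from Brown.

```python
from enum import Enum


class SiteType(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


# What remains to be installed once DR services are activated,
# paraphrased from the sidebar definitions (illustrative only).
INSTALL_ON_ACTIVATION = {
    # Cold: space and infrastructure exist, but no IT systems yet.
    SiteType.COLD: ["IT systems"],
    # Warm: partially equipped; the rest is added at activation.
    SiteType.WARM: [
        "additional equipment",
        "computing hardware and software",
        "supporting personnel",
    ],
    # Hot: fully equipped and staffed, nothing left to install.
    SiteType.HOT: [],
}


def is_ready_to_operate(site: SiteType) -> bool:
    """Return True if nothing remains to be installed at activation."""
    return not INSTALL_ON_ACTIVATION[site]


if __name__ == "__main__":
    for site in SiteType:
        pending = INSTALL_ON_ACTIVATION[site] or ["nothing"]
        print(f"{site.value} site: install {', '.join(pending)}")
```

Under these assumptions only the hot site is ready to operate at the moment of activation, which is the main reason it tends to suit aggressive RTOs.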
The key criteria most likely to influence the selection of a particular alternate site arrangement include internal vs. external resources, RTO and cost vs. risk. For example, it would be ideal to mirror your data in real time to an offsite facility, but the cost to do that may be prohibitive. Brown estimates that data mirroring costs can be up to 10 times the cost of a hot site.
With this option, costs are incurred for the service used, the data mirroring technology and the network bandwidth (usually fairly high) required to transmit large volumes of data in real time. Can your organization risk data loss by using a data protection solution that doesn’t provide real-time mirroring? Because alternate sites are usually shared facilities, Brown noted, they represent a shared risk -- unless you decide to pay additional fees for dedicated access to recovery resources.
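One way to frame the cost vs. risk question is a rough comparison between the extra annual cost of mirroring and the data loss you would expect without it. The Python sketch below is a deliberately simplistic expected-loss model and is not taken from the article or from Brown; every dollar figure and probability in it is hypothetical. The only number drawn from the text is the multiplier of 10, which is the upper bound of Brown's estimate.

```python
def expected_annual_loss(disaster_probability: float,
                         hours_of_data_at_risk: float,
                         loss_per_hour: float) -> float:
    """Expected yearly cost of data lost without real-time mirroring.

    Simplistic model (an assumption): probability of a disaster in a year
    times the hours of unmirrored data times the value of one hour of data.
    """
    return disaster_probability * hours_of_data_at_risk * loss_per_hour


def mirroring_pays_off(hot_site_cost: float,
                       mirroring_multiplier: float,
                       expected_loss: float) -> bool:
    """Return True if the expected loss exceeds the extra cost of mirroring."""
    mirroring_cost = hot_site_cost * mirroring_multiplier
    extra_cost = mirroring_cost - hot_site_cost
    return expected_loss > extra_cost


if __name__ == "__main__":
    # All inputs below are hypothetical, for illustration only.
    loss = expected_annual_loss(
        disaster_probability=0.02,   # assumed 2% chance per year
        hours_of_data_at_risk=24,    # assumed nightly backups
        loss_per_hour=50_000,        # assumed value of one hour of data
    )
    # 10 is the upper bound of Brown's "up to 10 times" estimate.
    print(mirroring_pays_off(hot_site_cost=100_000,
                             mirroring_multiplier=10,
                             expected_loss=loss))
```

With these made-up inputs the extra cost of mirroring far exceeds the expected loss, so the function returns False. An organization with more valuable data or a lower tolerance for loss would plug in different numbers and could reach the opposite conclusion.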
Another important consideration is make vs. buy. Factors that can influence a make vs. buy decision include RTO, cost and risk. According to Brown, an internal solution done right is the better choice, but it’s also the most costly option. The question is whether the reduction in risk justifies the added cost.
A major issue today is work-area recovery, which focuses more on getting people back to work than just getting systems up and running. It’s the biggest growth area in the alternate site business, according to Brown, who estimates there are approximately 1,000 vendors offering work-area recovery. People are a major planning consideration in traditional alternate sites and the primary concern with work-area recovery.
Where will your people work if their primary offices are unavailable? Unless your employees can safely use telecommuting and similar remote-access arrangements, they must be willing to relocate, even temporarily, to another site. According to Brown, a major issue that nobody truly thinks about is what happens if people bring their children to a work-area recovery center. Parents may not be able to (or want to) leave their children with someone while they work at a distant recovery location for what might be an extended time.
Major aspects of work-area recovery deal with human resources issues. “Should the possibility of working in another area for an extended period of time be added to job descriptions?” he asked. Another issue is whether employees should be required to participate in tests.
According to Brown, most major organizations in the public and private sectors have good IT recovery plans. “What they don’t have are good work-area recovery plans for employees, contractors and other staff,” he said. There’s also the issue of senior management’s perception of the value of alternate sites.
Additional points for consideration
Alternate sites should be located far enough away from primary offices that they’re unlikely to be affected by the same disaster or failure events that put the main facilities out of service. Site proximity, operational risks and service-level agreements (SLAs) should all be considered when contracting with disaster recovery service providers.
Alternatives to the traditional alternate sites discussed in this article include colocation facilities, in which your organization can locate disaster recovery equipment in the same building as major service providers, such as telecommunications carriers; and cloud-based recovery services, in which alternate site facilities are located within the “cloud.”