What you will learn in this tip: The rise of cloud disaster recovery has provided another disaster recovery planning option. Here's how one organization turned to a private cloud provisioned for disaster recovery services to meet their recovery needs.
When deciding whether any disaster recovery service is viable, it's important to look at your organization's requirements, understand how critical production systems are architected, and define your recovery point objectives (RPOs) and recovery time objectives (RTOs).
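To make the RPO and RTO definitions concrete, here is a minimal Python sketch of how a backup schedule and an estimated restore time could be checked against those objectives. The class and function names, and all of the numbers, are hypothetical and chosen only for illustration; they are not taken from any particular product or client.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    """Targets agreed with the business (the values used below are hypothetical)."""
    rpo_hours: float  # most data, measured in hours of work, you can afford to lose
    rto_hours: float  # longest outage you can tolerate before systems are back


def worst_case_data_loss_hours(backup_interval_hours: float,
                               replication_lag_hours: float = 0.0) -> float:
    """Worst case: the failure hits just before the next backup would have
    landed off-site, so the newest usable copy is one full backup interval
    plus the replication lag old."""
    return backup_interval_hours + replication_lag_hours


def check_objectives(objectives: RecoveryObjectives,
                     backup_interval_hours: float,
                     replication_lag_hours: float,
                     estimated_restore_hours: float) -> dict:
    """Compare a backup/restore design against the stated RPO and RTO."""
    loss = worst_case_data_loss_hours(backup_interval_hours, replication_lag_hours)
    return {
        "worst_case_loss_hours": loss,
        "rpo_met": loss <= objectives.rpo_hours,
        "rto_met": estimated_restore_hours <= objectives.rto_hours,
    }


# Hypothetical targets and design, for illustration only.
targets = RecoveryObjectives(rpo_hours=24, rto_hours=8)
print(check_objectives(targets,
                       backup_interval_hours=12,
                       replication_lag_hours=2,
                       estimated_restore_hours=6))
```

The point of the sketch is that the RPO is driven by how often data leaves the building, while the RTO is driven by how long a restore takes; only a tested restore gives you a trustworthy figure for the latter.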
The private cloud and disaster recovery
A private cloud may be a viable disaster recovery option if your organization:
- Has no desire to "own" or invest any capital in a disaster recovery solution, and would rather rent all technologies (licensing, storage, processing, services) on a monthly basis.
- Wants to get data backups off tape and replicated (and ideally deduplicated) to an off-site location (private cloud).
- Needs an always-on, resilient network, directory (e.g., Active Directory [AD]), authentication (e.g., single or multi-factor) and firewalls to ensure immediate redirection of users during a primary directory or network outage or a full disaster.
- Runs common technology platforms that leverage virtualization (x86 virtualization) and/or partitioning (e.g., AS/400 LPARs) that can be "burstable" as demanded for testing and/or disaster recovery.
- Wishes to outsource the management of its disaster recovery program and shift the risk of performance to a capable service provider who will own the service-level agreement (SLA) for data replication, testing, documentation and staff augmentation to perform the restoration and recovery when requested.
- Wants to leverage the redundant, replicated set of critical data any time a need presents itself, with a monthly pay-as-you-use model.
- Wishes to maximize the claims that can be made against insurance policies if a disaster occurs.
- Is working with a service provider that is more than a simple colocation provider and has a private cloud, managed hosting or managed services provider (MSP) model.
A private cloud can be configured and controlled to meet your exact requirements. A public cloud, however, has limited flexibility. Overall, it's rare that you will be able to find a cookie-cutter solution for your disaster recovery needs. No two companies are alike in their critical technology, how it's architected, how critical data flows, and the business requirements of each. However, a private-cloud provisioning approach may be viable.
Managed cloud disaster recovery services with a private cloud
The best way to talk about how a "managed cloud recovery program" in a private cloud can work is by describing what has been designed and implemented for actual clients. The following describes a client (who preferred not to be named) that embraced a managed recovery services model that leverages a private cloud and a "burstable" disaster recovery mode.
Insurance provider "Client A's" situation: This company had relied on tape-based backups and had never been able to test recovery because of resource and staff constraints. It was estimated that recovering from a catastrophic disaster could take four to six weeks and that, in the worst case, 48 hours of data would be lost. A board inquiry about DR capabilities made improving the situation a priority. The company had to "find the budget" to implement and test the new solution within three months and report at the next board meeting.
The solution: A "burstable" private cloud managed recovery solution
The private cloud disaster recovery solution achieved the following:
- It achieved a testable 12- to 24-hour RPO and a testable four- to eight-hour RTO.
- Everything is a monthly OPEX rental -- software, hardware, services -- no CAPEX was spent.
- Enterprise backup software with data deduplication and replication functions was licensed, set up and implemented to replicate to the target private cloud site.
- Because the data was deduplicated and initially seeded, the currently available Internet bandwidth and a secure VPN tunnel could carry the deduplicated replication traffic. A second, redundant, diverse network connection was recommended to the client in case of a network outage.
- Backups are performed every night, deduplicated, and replicated to the target private cloud site (with deduplication, replicated backups could run as often as every five minutes if desired or required for specific data or files).
- Backups are monitored by the managed recovery solution provider.
- Select user desktops/laptops were backed up, capturing one of each desktop/laptop "golden image" used within the organization (claims, underwriting, sales, etc.) and any potential local data for certain key users (executives/management teams/other key users) who are suspected of not saving to the network drive.
- Active Directory, authentication and firewalls were provisioned to be "always on" within the monthly rental of a VM, VPN, firewall and bandwidth. This provides AD, firewall and network resiliency every day and allows for immediate failover of user connectivity, authentication and network infrastructure.
- "Burstable" virtualized server farm and managed recovery team for staff augmentation at time of testing or disaster to perform the recovery (and these costs are charged when utilized). This aligned the costs of disaster recovery with what is actually being used and when.
- Burstable virtualized desktop infrastructure provisioned at time of test or disaster.
- Users can log in from any browser to a Web address using their current login and password to access the recovered environment (for user acceptance testing or during a disaster); exercises test 10 to 20 concurrent users.
- The virtual desktops can be cloned and scaled out to the entire workforce at time of disaster to allow employees to work from home, a hotel conference room, etc. Just hand out browser devices and your employees are back working with their applications and corporate data as long as they have Internet connectivity.
- "Request for restoration" procedures and technical procedures for recovery were written by the managed recovery provider. Request for restoration could be for any reason, not just a disaster, and could be to bring up environment or partial environment for Dev/Test or to provide uptime during a hardware or application migration or upgrade.
- Validation test of restoration performed by the managed recovery services provider to a separate isolated, burstable virtualized server and storage environment.
- Environment turned over to Client A's IT and end users for 72 hours to conduct functional and user acceptance testing.
- Once Client A approves the restoration, its IT organization can use the restored disaster recovery environment for the remaining time as a test bed for trying new patches, fixes and code.
- Post-test report provided on findings, timeline of restoration, and recommendations that could continue to improve the recovery point achievable and recovery time achievable (RPA/RTA).
- Letter of test performance provided to Client A (without any confidential technical information) that can be shared with their stakeholders, board, clients or auditors.
- In a real disaster, production should be running within the private cloud/managed service provider's environment.
- Client A can stay as long as needed provided "burstable" monthly fees are paid.
- Client A is provided production-level SLAs on availability (power, Internet, service response time, system availability).
- Client A's backups will continue to be performed by the service provider until they return home.
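The replication items in the list above note that deduplication and initial seeding let the existing Internet link carry the nightly traffic. A rough way to sanity-check that kind of claim for your own environment is sketched below in Python. The function name, the link-efficiency factor and every number in the example are assumptions made for illustration, not figures from Client A.

```python
def replication_hours(changed_gb: float,
                      dedup_ratio: float,
                      link_mbps: float,
                      link_efficiency: float = 0.8) -> float:
    """Estimate hours needed to send one backup cycle's changes off-site.

    changed_gb      -- data changed since the last replicated backup (decimal GB)
    dedup_ratio     -- e.g. 10 means only 1/10 of the changed data is sent
    link_mbps       -- nominal link speed in megabits per second
    link_efficiency -- assumed usable fraction of the link (VPN and protocol overhead)
    """
    if dedup_ratio <= 0 or link_mbps <= 0 or not 0 < link_efficiency <= 1:
        raise ValueError("ratio and link speed must be positive; efficiency in (0, 1]")
    sent_gb = changed_gb / dedup_ratio
    sent_megabits = sent_gb * 8 * 1000          # 1 GB = 8,000 megabits
    seconds = sent_megabits / (link_mbps * link_efficiency)
    return seconds / 3600


# Hypothetical example: 200 GB of nightly change over a 50 Mbps link.
with_dedup = replication_hours(changed_gb=200, dedup_ratio=10, link_mbps=50)
without_dedup = replication_hours(changed_gb=200, dedup_ratio=1, link_mbps=50)
print(f"with 10:1 dedup: {with_dedup:.1f} h, without: {without_dedup:.1f} h")
```

If the estimate does not fit inside the nightly backup window, the options are more bandwidth, a better deduplication ratio, or a longer interval between replicated backups, which in turn loosens the achievable RPO.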
Client A saved hundreds of thousands in CAPEX and tens of thousands in OPEX. Within 60 days, Client A had tested and proven disaster recovery capabilities that were much improved (better RPA and faster RTA) over anything it had thought it could achieve. This was accomplished by using a private cloud provider offering a managed recovery service, allowing for burstable cloud resources (virtual servers and virtual desktops at time of disaster), and rental of all software, hardware, and staff augmentation services. The cloud DR solution has not only proven viable -- it also better matches actual restoration costs with usage and leverages property and casualty insurance riders to their fullest.
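The idea of matching restoration costs with usage can be illustrated with a small Python sketch of a burstable pricing model: a fixed monthly fee for the always-on pieces, plus a charge for burst resources only on the days they are actually running. The function name, the per-day billing unit and all rates are hypothetical; a real provider's pricing structure may differ.

```python
def annual_burstable_cost(always_on_monthly: float,
                          burst_daily_rate: float,
                          burst_days: int) -> float:
    """Yearly DR spend under an assumed pay-as-you-use model.

    always_on_monthly -- fixed rental for the always-on components
                         (replication target, directory, VPN, firewall)
    burst_daily_rate  -- assumed charge per day for burst servers, desktops and staff
    burst_days        -- days in the year the burst environment was running
    """
    if always_on_monthly < 0 or burst_daily_rate < 0 or burst_days < 0:
        raise ValueError("costs and days must be non-negative")
    return 12 * always_on_monthly + burst_daily_rate * burst_days


# Hypothetical rates: a year with two 72-hour tests versus a year that
# also includes a 30-day stay in the provider's environment after a disaster.
test_only_year = annual_burstable_cost(2000, 500, burst_days=6)
disaster_year = annual_burstable_cost(2000, 500, burst_days=6 + 30)
print(test_only_year, disaster_year)
```

In a model like this, the burst charges incurred during a real disaster are tied to a specific event and time period, which keeps them separate from the fixed monthly fee.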
About this author:
Biff Myre has worked with IT-focused organizations, providing solutions to address data sharing, management and protection, disaster recovery, high availability replication and clustering, managed IT services, cloud computing and business continuity across diverse platforms, industries and clients globally. Currently, Myre is a director with Worknet Inc., and works with organizations to help power their business models with managed IT services, cloud computing and virtualization solutions.