1. Identical SRM servers must be installed at the protected site and recovery site. Each SRM site must also have its own vCenter Server.
2. The protected site's SRM server must be configured with VM protection groups. Each group is a collection of virtual machines (VMs) that fail over concurrently. The protection groups are added to the recovery site, where the recovery SRM server usually recovers their VMs concurrently as well.
3. Make sure the external storage system that controls, manages and schedules replication for SRM is validated and certified by VMware. Storage systems that support vSphere SRM can protect up to 1,000 VMs per cluster; vSphere Replication is limited to protecting 500 VMs. Note that vSphere clusters that use vSphere Replication with VSAN rely on lower-cost, off-the-shelf internal flash drives and hard disk drives (HDDs). Those components must also be certified and validated on VMware's hardware compatibility matrix. Don't assume that the host-embedded storage components are on the matrix just because a vSphere cluster is running VSAN and SRM. When unsupported components are used, troubleshooting SRM problems (especially performance problems) can be a very real exercise in frustration and lost time.
4. Be sure that the replication method can meet the recovery point objectives (RPOs) and recovery time objectives (RTOs). Each external storage system has its own RPO and RTO capabilities and limitations. Even vSphere Replication has limitations, such as a best-case RPO of 15 minutes. If a lower RPO is required, use an external storage system that can replicate the protection groups more frequently.
5. Test frequently. Set up a representative test plan and test at least once a quarter. Before each test, check whether the SRM environment has changed; if it has, update the test to reflect the real environment. Evaluate the results of each test, fix the problems and gaps it exposes in the plan, and then update the test plan.
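The numeric checks in the list above (VM counts, the RPO floor and the quarterly test cadence) can be captured in a small planning script. The following is only a minimal sketch in Python: the `DrPlan` class, the `check_plan` function and the 92-day approximation of a quarter are hypothetical names and assumptions, not part of SRM or any VMware API. The limits are the figures quoted above and should be replaced with the values published for the SRM version actually in use.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Figures quoted in the list above; they differ between SRM releases,
# so treat them as placeholders rather than authoritative limits.
MAX_PROTECTED_VMS = {"array": 1000, "vsphere_replication": 500}
MIN_RPO_MINUTES = {"vsphere_replication": 15}  # array floors depend on the vendor
MAX_DAYS_BETWEEN_TESTS = 92  # assumption: roughly one quarter


@dataclass
class DrPlan:
    """Hypothetical summary of a DR plan; not an SRM object."""
    replication: str           # "array" or "vsphere_replication"
    protected_vms: int
    required_rpo_minutes: int
    last_test: date


def check_plan(plan: DrPlan, today: date) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    limit = MAX_PROTECTED_VMS[plan.replication]
    if plan.protected_vms > limit:
        problems.append(
            f"{plan.protected_vms} protected VMs exceeds the limit of {limit}"
        )

    floor = MIN_RPO_MINUTES.get(plan.replication)
    if floor is not None and plan.required_rpo_minutes < floor:
        problems.append(
            f"required RPO of {plan.required_rpo_minutes} min is below "
            f"the best case of {floor} min"
        )

    days_since_test = (today - plan.last_test).days
    if days_since_test > MAX_DAYS_BETWEEN_TESTS:
        problems.append(f"last test was {days_since_test} days ago")

    return problems


if __name__ == "__main__":
    plan = DrPlan("vsphere_replication", 600, 5, date(2024, 1, 1))
    for problem in check_plan(plan, date(2024, 6, 1)):
        print("WARNING:", problem)
```

A script like this does not replace an actual SRM test. It only flags plans whose stated requirements already conflict with the chosen replication method before a test is run.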