Niemann Capital Management, an investment management firm, needed a new data protection and disaster recovery plan that would eliminate backup windows while also meeting its stringent recovery time objective (RTO) of two hours and recovery point objective (RPO) of 30 minutes. The firm switched from a disk-to-disk-to-tape setup to replicating between data centers, and achieved its objectives with time to spare.
NCM replaced its tape-based backup, which ran EMC NetWorker software with a Dell direct-attached storage system, with a Hitachi Data Systems AMS2300 SAN and InMage Scout software. Scout’s asynchronous replication and continuous data protection (CDP) capabilities eliminate the need for traditional backup.
IT director John Etheridge said it’s important for NCM to avoid downtime because the firm manages its customers’ money and needs to be online to trade. NCM’s SQL servers run 24 hours a day.
Etheridge said that with tape, it could take two days to get back online after a failure and a total of 12 days to return to full capacity. Backups also took too long.
“We had continuous problems because the backups impacted production servers,” he said. “Before we only backed up the absolute minimal, only the data we could not afford to lose. We were looking for a different way of doing things.”
Etheridge said a recent DR test produced a 26-minute recovery time, including validation that everything was working and the firm could perform all required functions. He said his RPO is often less than two minutes because of Scout’s CDP capabilities.
“We wanted an RPO to be within 30 minutes,” Etheridge said. “We actually are running within two minutes. And to bring up our primary stack for disaster recovery, we run at about 20 to 23 minutes to switch to the [secondary site].”
NCM is based in Scotts Valley, Calif., and has five offices throughout the U.S. Most of its 170 virtual and physical servers are Windows-based and located in an Equinix facility in Northern California. That facility serves as the primary data center, with a secondary site in Carson City, Nev. Data is continuously replicated between the sites. From the Nevada site, the data gets backed up to tape for compliance.
“Tape backup now is done on the secondary site so it doesn’t impact the production servers. I can’t tell you how many man-hours that has saved us. We use tape but we never access tape [for restores],” Etheridge said.
He said he considered a fault-tolerant system that would require full mirroring and allow for no downtime, but concluded its cost was too high to justify. He liked InMage’s point-in-time recovery capability. Scout uses application-specific APIs to set application-consistent recovery points called AppShots. Etheridge said Scout’s compression and bandwidth throttling enable NCM to get by with 50 Mbps of bandwidth.
Etheridge said wide area network (WAN) optimization was a requirement to reach his RPO goal. He evaluated WAN optimization products from Cisco and Riverbed, but found Scout’s compression and bandwidth throttling provided enough optimization. He originally expected to need a 250 Mbps connection, but gets by with far less.
InMage collects data changes from production servers in real time, and places them in memory before they are written to disk. The changes are sent to a software appliance called the InMage Scout Server, and then transferred to the secondary site. The server offloads compute-intensive tasks from the production systems, such as compression, encryption, WAN acceleration and consolidated bandwidth management.
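To put the bandwidth figures in perspective, the short Python sketch below works out how much data a link could carry during one 30-minute RPO window at the 50 Mbps NCM uses and at the 250 Mbps Etheridge originally expected to need. It is illustrative arithmetic only: the function name is hypothetical, it assumes the link is fully utilized with no protocol overhead, and NCM's actual data change rate is not given in the article.

```python
def data_per_window_gb(link_mbps: float, window_minutes: float) -> float:
    """Decimal gigabytes a link can carry in the window at 100% utilization.

    Assumption for illustration: no protocol overhead, 1 Mbps = 1,000,000 bits/s.
    """
    bits = link_mbps * 1_000_000 * window_minutes * 60
    return bits / 8 / 1_000_000_000


rpo_minutes = 30  # RPO target stated in the article

at_50 = data_per_window_gb(50, rpo_minutes)    # link speed NCM gets by with
at_250 = data_per_window_gb(250, rpo_minutes)  # link speed originally expected

print(f"50 Mbps carries about {at_50:.2f} GB per {rpo_minutes} minutes")
print(f"250 Mbps carries about {at_250:.2f} GB per {rpo_minutes} minutes")
```

Under these assumptions, the changes sent across the WAN after compression would have to stay under roughly 11 GB per half hour for a 50 Mbps link to keep pace with a 30-minute RPO.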
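The general idea of batching changes, compressing them away from the production server and sending them over a capped link can be sketched in a few lines of Python. This is not InMage's implementation or API. The function names, the length-prefixed batch format and the use of zlib are all assumptions made for illustration.

```python
import zlib


def prepare_batch(changes: list[bytes]) -> bytes:
    """Pack changed blocks with 4-byte length prefixes, then compress the batch."""
    payload = b"".join(len(c).to_bytes(4, "big") + c for c in changes)
    return zlib.compress(payload, 6)


def unpack_batch(blob: bytes) -> list[bytes]:
    """Reverse prepare_batch: decompress and split back into the original blocks."""
    payload = zlib.decompress(blob)
    blocks, i = [], 0
    while i < len(payload):
        n = int.from_bytes(payload[i:i + 4], "big")
        blocks.append(payload[i + 4:i + 4 + n])
        i += 4 + n
    return blocks


def transfer_seconds(num_bytes: int, cap_mbps: float) -> float:
    """Time to send num_bytes over a link throttled to cap_mbps (no overhead assumed)."""
    return num_bytes * 8 / (cap_mbps * 1_000_000)


# Hypothetical sample data: highly repetitive blocks compress well.
changes = [b"UPDATE accounts SET balance = 0;" * 100, b"COMMIT;" * 50]
blob = prepare_batch(changes)
raw_size = sum(len(c) for c in changes)

print(f"raw: {raw_size} bytes, compressed: {len(blob)} bytes")
print(f"send time at 50 Mbps: {transfer_seconds(len(blob), 50):.6f} s")
```

How much compression helps depends entirely on the data. The repetitive sample above shrinks dramatically, while already-compressed or encrypted data would not.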
“We bought a 50 megabit burstable to 100 megabit internet connection between California and Nevada,” he said. “We rarely go over 50 megabits per second using InMage. That is outstanding for us. We throttle it on the secondary site to control the speed.”