The Thrasher Group Inc. engineering firm's use of hyper-converged infrastructure evolved from backup to primary storage to -- with the help of Ethernet switching -- a disaster recovery system.
Thrasher Group has four Scale Computing HC3 clusters, totaling 23 nodes and 92.5 TB of capacity. Two clusters reside in the data center in Bridgeport, W.Va. The other Scale HC3 clusters are connected by dark fiber and Brocade Communications Systems Inc.'s Ethernet switching to a disaster recovery (DR) site. The firm uses Double-Take software to fail over and fail back between sites.
The Thrasher Group does engineering for water, sewer, roadways, bridges and airports, in addition to master planning projects.
With two IT people supporting 280 employees, the organization required a storage system that was easy to manage and flexible enough to handle large computer-aided design (CAD) drawings, as well as mainstream business applications. CAD files quickly filled up a small storage system.
IT manager Brad Fortney said the group's data was growing 20% to 25% a year when he arrived in 2010. He quickly realized he didn't have enough capacity with the direct-attached storage and file servers the company had in place.
"Knowing our plans for the future, I said 'We're going to need a lot more space, and it's too expensive and impractical to do it this way,'" Fortney said.
He looked at Scale's HC storage appliances, a precursor to the vendor's hyper-converged systems.
"We didn't really need more horsepower, but we needed some capacity," Fortney said. "Scale was very new, and there's always a little apprehension about using something unproven in your production environment."
He bought a three-node Scale cluster for backup storage to try it out. The system impressed him enough that he went back for another cluster to use as an iSCSI SAN for primary storage. Scale asked Fortney if he wanted to beta test its new software -- code-named Torino -- with a built-in KVM hypervisor.
"That was the beginning stages of what would be their hyper-converged platform," Fortney said. "I throttled it up and played with it during the day. I was impressed. Then they made it available on the next upgrade for free."
Fortney upgraded and did rolling migrations across his primary and secondary clusters, moving VMware virtual machines off to remote sites. He never told his users, and none of them noticed any performance changes.
Scale HC3 clusters help with DR prep
Fortney added Brocade VDX Ethernet switches to his Scale HC3 appliances when Thrasher opened a new data center in 2013. Thrasher Group has since added Brocade's Ruckus ICX campus switches to connect the Scale HC3 clusters, and Brocade Network Advisor management software for a view across sites.
Fortney said he hasn't had any full-blown disasters yet, but he feels adequately prepared if one strikes.
"We haven't had to fail over the entire data center," he said. "We had a couple of servers fail, and we've rolled over, but never had to roll the entire site over yet. But I never know if today will be the day that trailer truck with hazardous materials crashes on the interstate that I can see from here. I find a lot of comfort knowing I can drive over to the DR site, get everything back up, call my ISP [internet service provider] and start taking calls again."
Fortney explored using the cloud for DR, but decided against that for performance reasons.
"We have deadlines to meet, and we want to meet them even if there's an outage," he said. "If we were in the cloud, we wouldn't be able to do that. We would rather invest and have everything on premises than go to a cloud subscription service."
Thrasher Group uses all spinning disk in its Scale HC3 clusters, but Fortney said he expects to add flash this year or next.
"We'll probably get to the IOPS level where I need flash in the next year or two," he said. "I'll probably need at least three or four flash nodes in my main cluster for my file server and Exchange."
Thrasher Group's IT team consists of Fortney and a help desk person -- down from a whopping four-person IT team a few years ago. So he needed gear that didn't take too much time to manage.
"I do everything, and occasionally I'll run the sweeper in the office," he said of his job function. "Most of my network and storage job is checking to see if everything's OK. I also handle the workstations. But probably 70% of my time is spent talking to management, with 30% on the network, hardware and servers. I spend a fair amount of time with managers and equity partners to find out what's coming, so I can stay in front of it."