A snafu with its StorageTek Corp. SAN eventually drove BBC Worldwide, the commercial arm of the British Broadcasting Corporation (BBC), to deploy a remote replication appliance from Kashya Inc. to protect its data from disasters.
BBC Worldwide creates and distributes all the books, videos, DVDs, magazines and television exports associated with the BBC, the U.K.'s treasured national broadcasting service. More than 20 magazines alone are related to BBC programs. In terms of data storage, this amounts to upwards of 25 terabytes.
"We're not a straight-to-market supplier, but our operations still need to run around the clock," said Danny Cooper, IT manager at BBC Worldwide. However, a problem with the company's StorageTek SAN, which caused three significant outages, prompted the broadcaster to rethink its data protection procedures.
BBC Worldwide has two SANs at its headquarters -- each comprising a StorageTek D178 disk array bundled with a Brocade Communications Fibre Channel switch. Fifteen servers in a cluster running Windows and Exchange use storage on both SANs. BBC Worldwide used Robocopy, a command-line copy utility from Microsoft, to copy data from its headquarters to a third SAN at a disaster recovery site. Two problems arose from this scenario.
The first: Faulty electronics in one of its Brocade switches caused the controllers in the disk arrays to lock up completely. "We'd get spurious resets that would disconnect the users from the SAN," Cooper said. The company eventually narrowed this problem down to the switch and had it replaced. The second issue, more persistent and more serious, related to the way Robocopy copies data. "Any locked files couldn't be replicated, and any databases that were open and in use couldn't be replicated," explained Cooper. "Only 85% of our data was being replicated, and the 15% not being replicated was the most up-to-date, important information."
To fix this problem, BBC Worldwide needed a product that could use its existing 100 Mbps IP link and write all the data in a consistent fashion. "We couldn't afford a faster link so the product had to be able to do asynchronous writes," Cooper said. That meant it would buffer or cache information but maintain a consistent write state even when requests came in faster than the link could send them down the wire to be written, he added.
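The buffering behavior Cooper describes can be illustrated with a small model. What follows is a minimal sketch, not Kashya's implementation: the class name `AsyncReplicator`, its methods and the in-memory dictionary standing in for the remote disk are all assumptions made for illustration. Writes are acknowledged locally at once, queued with a sequence number, and shipped to the remote side strictly in that order, so the remote copy always reflects a state the source actually passed through, even while it lags behind.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model of asynchronous, order-preserving replication."""

    def __init__(self):
        self.pending = deque()  # writes accepted locally but not yet sent
        self.next_seq = 0       # sequence number for the next local write
        self.remote = {}        # stand-in for the disk at the recovery site
        self.remote_seq = -1    # sequence number of the last write applied remotely

    def write(self, block, data):
        """Accept a write immediately and queue it for later transmission."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, block, data))
        return seq

    def drain(self, max_writes):
        """Send up to max_writes queued writes, oldest first (models a slow link)."""
        sent = 0
        while self.pending and sent < max_writes:
            seq, block, data = self.pending.popleft()
            # Writes must reach the remote side in the order they were issued.
            if seq != self.remote_seq + 1:
                raise RuntimeError("write order violated")
            self.remote[block] = data
            self.remote_seq = seq
            sent += 1
        return sent

    def lag(self):
        """Number of writes the remote copy is behind the source."""
        return len(self.pending)
```

If five writes arrive while the link can carry only three, the remote copy holds exactly the state after the third write and `lag()` reports two outstanding writes; nothing is applied out of order.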
The broadcaster checked out replication products from StorageTek, Veritas Software Inc., RepliWeb Inc. and Cisco Systems Inc., among others, but found that they were either not appropriate or too expensive, or both. Cooper declined to go into details.
Kashya's KBX5000 Data Protection Appliance turned out to be the most viable solution. It requires two appliances, one at each end of the link. Appliance one watches all disk writes at the headquarters SAN, picks up this data and copies it down to appliance two at the disaster recovery site, where it's then replicated to disk. It relies on continuous small-aperture snapshots (typically seconds apart) that minimize or eliminate the risk of data loss. It's bi-directional and can replicate across any distance, Cooper said.
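The idea behind small-aperture snapshots can be sketched in a few lines. This is an illustrative model only, under stated assumptions: the function names `group_into_snapshots` and `apply_complete`, the tuple layout of a write and the dictionary standing in for a disk are hypothetical and are not taken from the Kashya product. Writes are grouped into time windows of a fixed aperture, and the remote side applies only whole windows, so the most that can be lost is the window still in flight.

```python
def group_into_snapshots(writes, aperture):
    """Group (timestamp_seconds, block, data) writes into fixed time windows.

    Within one window, a later write to the same block replaces the earlier one.
    Returns the windows in time order, each as a {block: data} dictionary.
    """
    snapshots = {}
    for ts, block, data in writes:
        index = int(ts // aperture)  # which window this write falls into
        snapshots.setdefault(index, {})[block] = data
    return [snapshots[i] for i in sorted(snapshots)]


def apply_complete(snapshots, received):
    """Apply only the first `received` snapshots, each as a whole unit."""
    disk = {}
    for snap in snapshots[:received]:
        disk.update(snap)
    return disk
```

With a two-second aperture, a failure that interrupts the third window leaves the remote disk at the state captured by the first two windows, which is a consistent point at most a few seconds old.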
One criticism he had of the product was the requirement to stop the replication process in order to bring a volume online. "It takes five minutes to stop the process, bring the volume online, copy it and then start the application again … We wouldn't be comfortable about doing that on production data on a regular basis," he said.
It took about three days to deploy the system and to synchronize the data between sites. It was a lengthy process, mainly because of a fault on a Cisco router that was slowing down the replication. Since that fault was fixed, BBC Worldwide hasn't had a problem. "It's running as sweet as a nut," Cooper quipped.