Challenges of data deduping across public wide area networks
Date: Jun 27, 2013
As the need to back up more and more data in a secure offsite location increases, some organizations may look to conducting public wide-area-network-based replication to accomplish that task.
But according to Jon Toigo, founder of Toigo Partners International LLC, you have to know your payload and your distance if you intend to rely on data deduping across public WANs for data protection.
"In fact, when I hear people say, 'We're getting rid of tape. We're going to disk-based replication across a wide area network on everything.' I [say], 'Wow, that's cool. Do you have a new job lined up for when the environment fails?'" Toigo said. He later noted that one client of his goes through nine carriers in the public cloud for WAN-based replication.
"There is no predictability about the speed in which the data is delivered to its target. It varies so widely that they can't even blackbox it," he said.
Toigo likened the process to airlines that route flights through intermediate airports rather than in a straight line, because each leg between airports can be shorter than a direct flight.
"You're going across a public wide area network [WAN]. You're moving over distance," he said. "Let's say … you're going to go outside an MPLS [multiprotocol label switching] network, which is within striking distance of the two facilities, which also can mean your primary facility and your backup facility can be eaten up by the same Godzilla attack. You want to go beyond that for a real WAN-based replication scheme …. This is what is going on inside that public switch network: There are a bunch of routers -- gateways, switches -- and the problem is that routing protocols were not designed with wide area replication in mind. 'Open shortest path first.' What does it mean? Open the shortest number of hops between routers in order to get from point A to point B … and you're going to incur distance-induced latency.… Open shortest path first prefers the shortest number of router hops, not the shortest distance [in a straight line]."
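The gap Toigo describes between the fewest router hops and the shortest physical distance can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the five-site topology, the link lengths, and the rule of thumb that light in fiber covers roughly 200 km per millisecond. It also follows Toigo's simplification of route selection as hop counting; real OSPF deployments choose routes by a configurable link cost, so this is not a model of any actual carrier network.

```python
import heapq
from collections import deque

# Hypothetical topology: link lengths in km between made-up router sites.
LINKS = {
    ("A", "X"): 2500,
    ("X", "B"): 2500,
    ("A", "P"): 400,
    ("P", "Q"): 400,
    ("Q", "B"): 400,
}

# Assumed rule of thumb: light in fiber travels about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def build_graph(links):
    """Turn the link table into an undirected adjacency map."""
    graph = {}
    for (u, v), km in links.items():
        graph.setdefault(u, {})[v] = km
        graph.setdefault(v, {})[u] = km
    return graph


def fewest_hops(graph, src, dst):
    """Breadth-first search: the path that crosses the fewest links."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def shortest_distance(graph, src, dst):
    """Dijkstra's algorithm: the path with the fewest total kilometers."""
    heap = [(0, [src])]
    settled = set()
    while heap:
        dist, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return path
        if node in settled:
            continue
        settled.add(node)
        for nxt, km in graph[node].items():
            if nxt not in settled:
                heapq.heappush(heap, (dist + km, path + [nxt]))
    return None


def path_km(graph, path):
    """Total length of a path in km."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def one_way_delay_ms(km):
    """Propagation delay only; ignores queuing and processing at each hop."""
    return km / FIBER_KM_PER_MS


GRAPH = build_graph(LINKS)

if __name__ == "__main__":
    for label, finder in (("fewest hops", fewest_hops),
                          ("shortest distance", shortest_distance)):
        path = finder(GRAPH, "A", "B")
        km = path_km(GRAPH, path)
        print(f"{label}: {' -> '.join(path)}, {km} km, "
              f"{one_way_delay_ms(km):.1f} ms one way")
```

In this made-up topology the two-hop route covers 5,000 km while the three-hop route covers 1,200 km, so a hop-counting choice would accept roughly four times the propagation delay.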
Toigo said that deltas, the differences between the primary data and its remote copy, can compromise the ability to recover a data set at a remote site. He noted that a delay of several minutes between the primary storage and the offsite backup "can be fairly profound and can make that data set unable to restore your applications to their desired recovery point."
Data deduplication and other methods of reducing the data sent over a WAN don't necessarily improve speed, he said.
"If there's congestion on the highway, everyone's moving at a dog's pace. The payload size doesn't matter. The total volume of data that needs to be transmitted does. It represents a statistical norming factor. But the size of the packet itself? In fact, a whole bunch of little packets move less efficiently because you have packet resend requirements."
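Toigo's point that total volume and congestion, rather than payload size, govern delivery time can be put into rough numbers. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: the function name `transfer_hours`, the 1,000 GB backup set, the throughput figures, and the 4:1 reduction ratio are all hypothetical, and the model ignores protocol overhead and the packet resends Toigo mentions.

```python
def transfer_hours(total_gb, effective_mbps, reduction_ratio=1.0):
    """Estimate hours needed to move a data set over a link.

    total_gb        -- size of the data set before reduction, in gigabytes
    effective_mbps  -- throughput actually achieved, in megabits per second
    reduction_ratio -- assumed dedupe/compression ratio (4.0 means 4:1)
    """
    bits_to_send = total_gb / reduction_ratio * 8e9
    seconds = bits_to_send / (effective_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenarios for a 1,000 GB backup set.
    scenarios = [
        ("no reduction, 100 Mbps", 100, 1.0),
        ("4:1 reduction, 100 Mbps", 100, 4.0),
        ("4:1 reduction, congested to 25 Mbps", 25, 4.0),
    ]
    for label, mbps, ratio in scenarios:
        print(f"{label}: {transfer_hours(1000, mbps, ratio):.1f} hours")
```

Under these assumed figures, a 4:1 reduction sent over a path congested to a quarter of its throughput takes about as long as the unreduced data set on the uncongested path, which is why reducing the data does not by itself guarantee a faster transfer.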
Meanwhile, data backup needs are increasing. Toigo cited an IDC study that suggests file-type data is the fastest growing kind of data being backed up.
"And we're using almost half of our capacity to make copies of the other half because we've decided we don't like tape," he said.