By Todd Erickson
The coordination and preparation of a disaster recovery (DR) plan can be a complex operation that spans departments and facilities, and affects every IT system including applications, data storage, and telecommunications. One critical system that you may overlook is your local area network (LAN). Ensuring that your networking infrastructure is up and running so your users have access to critical IT applications and data as quickly as possible is a key component of a comprehensive network disaster recovery plan.
In this LAN DR planning tutorial, you will learn how to recognize the events that should trigger your LAN DR plan, how to prevent and mitigate those networking disasters, and the key elements of a solid LAN DR plan.
There are many news-making disasters that obviously trigger disaster recovery plans, including floods, fires and earthquakes. But there are also a lot of smaller events that might affect your business, such as single-room or single-floor events like flooding; heating, ventilation, and air conditioning (HVAC) malfunctions; fires; and equipment failure. What do you do if the office above your data center floods and destroys your in-wall electrical wiring? What do you do when your server farm trips a circuit breaker and causes a one-room power outage?
Have you thought about what to do if your switches, routers, firewalls, or intrusion prevention systems (IPSs) fail, leaving your users disconnected from each other and the outside world? According to Kevin Tolly, founder of The Tolly Group, an IT testing and consulting services provider, failures of industry-standard equipment are not uncommon. "Many lower-end switch hardware and software is commodity manufactured," he said. They may or may not be well made. However, that commodity manufacturing does mean they are cheap, and stocking spare parts and machines is relatively inexpensive.
Paul Kirvan, an Atlantic Highlands, N.J.,-based IT consultant and auditor with over 35 years of hands-on disaster recovery and business continuity experience, also lists networking cable cuts, security breaches and connector failures as common LAN interruption events.
You may not be able to prevent a disaster, but you can prevent outages caused by many localized disasters. Planning and redundancy are the key elements of an outage-prevention strategy. Take a step-by-step look at all the basic systems that feed your networking infrastructure, such as power and HVAC. Also, look at each networking component and identify possible network failure points. Then plan for location-specific outages and install redundant equipment to eliminate as many single points of failure as possible. Installing redundant equipment in separate physical locations, even separated by rooms, can help in a situation where one room or one floor suffers an outage event.
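One way to look for failure points is to write the topology down as a list of connections and check what happens when each device is removed in turn. The sketch below is a minimal illustration of that idea, not a tool from the article; the device names and the `LINKS` layout are hypothetical, and a real network would need its own map.

```python
from collections import deque


def reachable(links, start, removed):
    """Return the set of devices reachable from `start` when `removed` is down."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(links, source, targets):
    """List devices whose loss cuts `source` off from at least one target."""
    wanted = set(targets)
    candidates = set(links) - {source} - wanted
    return sorted(
        device for device in candidates
        if not wanted <= reachable(links, source, device)
    )


# Hypothetical topology: each device maps to the devices it is cabled to.
LINKS = {
    "users": ["access-sw-1"],
    "access-sw-1": ["users", "core-sw-a", "core-sw-b"],
    "core-sw-a": ["access-sw-1", "firewall"],
    "core-sw-b": ["access-sw-1", "firewall"],
    "firewall": ["core-sw-a", "core-sw-b", "internet"],
    "internet": ["firewall"],
}

spofs = single_points_of_failure(LINKS, "users", ["internet"])
print(spofs)  # ['access-sw-1', 'firewall']
```

In this made-up layout the two core switches back each other up, so only the access switch and the firewall are flagged as candidates for redundant equipment.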
Tolly recommends filtering incoming power to guard against power surges and lightning strikes, and installing redundant power and cooling units. Kirvan noted the importance of securing data center equipment with wall anchors and floor mounts as well as locked enclosures. He also suggests purchasing emergency generators, conducting regular HVAC inspections, and keeping fire detection and suppression systems up to date. If you are concerned about HVAC issues, consider protected wiring enclosures, redundant in-wall wiring, or wireless options.
It's also smart to have production-ready replacement equipment including hubs, switches, servers, routers and connectors. Don't forget software copies, as well. You can incorporate equipment into the network as redundant systems for high-availability networking, install them in physically separate locations for data recovery, or just have them around.
"It can be as simple as buying a couple extra switches and putting them in the closet," said David Davis, director of infrastructure for Schaumburg, Ill.,-based Train Signal Inc., a computer and IT training-services company. With standby equipment readily available, downtime from a blown switch can be limited to a few hours. Make sure you regularly back up production-equipment configurations for quicker equipment recovery.
If you use chassis equipment, recovery can be even easier, Tolly said. Keeping spare management, port, and power blades around could mean recovery is just a swap away. Most chassis manufacturers also allow you to install multiple power supplies so you can plug the switch into two separate power sources for redundancy.
Dave Bartoletti, a senior analyst with the Hopkinton, Mass.,-based Taneja Group, a storage industry consulting firm, also recommends redundant LAN and Internet connections to guard against service provider outages. He also suggests stocking servers with redundant NICs in case of a card failure. With two NICs you can connect two cables per server to separate routers or switches.
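A quick audit can confirm that each server's two NICs really do land on separate switches. The following is a small sketch under assumed data: the `UPLINKS` table and its server and switch names are hypothetical and would come from your own cabling records.

```python
def servers_lacking_redundancy(uplinks):
    """Return servers whose NICs do not reach at least two distinct switches."""
    return sorted(
        server for server, switches in uplinks.items()
        if len(set(switches)) < 2
    )


# Hypothetical cabling records: server name -> switch each NIC plugs into.
UPLINKS = {
    "db-01": ["core-sw-a", "core-sw-b"],    # two NICs, two switches
    "web-01": ["core-sw-a", "core-sw-a"],   # two NICs, same switch
    "file-01": ["core-sw-b"],               # single NIC
}

at_risk = servers_lacking_redundancy(UPLINKS)
print(at_risk)  # ['file-01', 'web-01']
```

A server with two cards cabled to the same switch is flagged along with the single-NIC server, since a failure of that one switch would still take it offline.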
The two main pitfalls of preparing for a disaster are configuration management and disaster recovery testing. According to Bartoletti, both tasks tend to be underfunded. "If you want to protect your network in a disaster recovery site, the challenge is how to keep it current," Bartoletti said. Users come and go, equipment configurations change, and new equipment replaces older equipment. In a perfect world, all of those configuration changes would be backed up and available on DR equipment.
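Keeping DR equipment current comes down to noticing when its saved configuration has drifted from production. As a rough sketch, Python's standard `difflib` module can compare two configuration texts; the configuration snippets below are invented placeholders, and how you export real configurations depends on your equipment.

```python
import difflib


def config_drift(production, standby):
    """Return unified-diff lines showing how the standby copy differs."""
    return list(difflib.unified_diff(
        production.splitlines(),
        standby.splitlines(),
        fromfile="production",
        tofile="standby",
        lineterm="",
    ))


# Invented example configurations; real ones come from your own backups.
PRODUCTION = """hostname access-sw-1
vlan 10
vlan 20
"""
STANDBY = """hostname access-sw-1
vlan 10
"""

drift = config_drift(PRODUCTION, STANDBY)
for line in drift:
    print(line)
```

An empty result means the two copies match. Any lines starting with `-` or `+` are changes that have not yet reached the standby equipment.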
Testing will help you determine if your planning and preparation will pay off. Kirvan recommends testing at least annually. He suggests that IT managers "conduct a system-level test in which one or more LAN devices, network access devices, or other elements are disconnected from the network." Remember, redundancy is your friend, and might save your job.
A local area network disaster recovery plan will most likely be a small portion of a larger organization-wide DR plan, but that shouldn't minimize its importance and the amount of time you put into it. "The key to DR planning is to make sure that you start from the top level, the application itself," Bartoletti said, "and look at the operating system it's running on, the hardware platforms it's running on, then the network connections that connect that to other servers, and then the other servers that it communicates with. And at each one of those points, you have to say to yourself: Here are the different types of failures that I want to protect and how much recovery time and data recovery point am I willing to accept." Successful disaster recovery planning is all about weighing the resources you have for recovery and the amount of downtime you're willing to accept.
Bartoletti recommends assessing each piece of equipment with a hot, warm, or cold rating. A hot rating means the only acceptable amount of downtime is nearly zero minutes, Bartoletti said. To keep downtime to an absolute minimum, you'll need an active duplicate environment running concurrently with the production equipment. If the production equipment fails, you can immediately fail over to the backup equipment. Configuration and application data are replicated to the backup equipment constantly, so very little data would be lost in the event of a failover.
A warm rating means data is replicated on a set schedule, maybe every few hours or once a day. The equipment is always up and ready for a failover, but you may lose data entered between the last replication and the time of the failover.
A cold rating means you have the required equipment with the necessary software, but it needs to be installed, configured and the data loaded from backup media. A longer period of time will pass before your network is back up and running. The temperature of your LAN disaster recovery plan will depend on access to recovery equipment and data, and your budget.
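The three ratings can be turned into a simple rule of thumb once you decide how much downtime and data loss each piece of equipment can tolerate. The sketch below is one possible mapping; the threshold values and the example tolerances are assumptions chosen for illustration, not figures from Bartoletti, and should be replaced with your own requirements.

```python
from datetime import timedelta

# Assumed thresholds for illustration only; set them from your own needs.
HOT_LIMIT = timedelta(minutes=5)              # "nearly zero" downtime or loss
WARM_DOWNTIME_LIMIT = timedelta(hours=4)      # standby gear is already running
WARM_DATA_LOSS_LIMIT = timedelta(hours=24)    # replication at most once a day


def recovery_rating(max_downtime, max_data_loss):
    """Suggest a hot, warm, or cold rating from the stricter tolerance."""
    if max_downtime <= HOT_LIMIT or max_data_loss <= HOT_LIMIT:
        return "hot"
    if (max_downtime <= WARM_DOWNTIME_LIMIT
            or max_data_loss <= WARM_DATA_LOSS_LIMIT):
        return "warm"
    return "cold"


# Hypothetical equipment and tolerances.
print(recovery_rating(timedelta(minutes=1), timedelta(minutes=1)))  # hot
print(recovery_rating(timedelta(hours=2), timedelta(hours=12)))     # warm
print(recovery_rating(timedelta(days=3), timedelta(days=7)))        # cold
```

The function picks the stricter of the two tolerances, so equipment that can be down for days but cannot lose more than a few minutes of data still comes out hot.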
You should also have detailed information about your equipment and software including contact information for all of your equipment manufacturers, distributors and service providers. Not only should you have their main phone numbers and email addresses, but emergency contact information in case your vendors are affected by the same disaster that's brought down your network. Other important information includes equipment model numbers, software versions, registration codes, and warranties. It will make fixing and replacing networking equipment much easier.
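One way to keep this information usable is to store it in a fixed record per device and check for gaps before a disaster, not during one. The following sketch assumes a hypothetical `EquipmentRecord` layout based on the items listed above; the field names and sample values are illustrative only.

```python
from dataclasses import dataclass, fields


@dataclass
class EquipmentRecord:
    """Hypothetical per-device record; the field names are illustrative."""
    name: str
    model_number: str = ""
    software_version: str = ""
    registration_code: str = ""
    warranty_expires: str = ""
    vendor_phone: str = ""
    vendor_email: str = ""
    vendor_emergency_contact: str = ""


def missing_fields(record):
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up sample entry with several details not yet collected.
record = EquipmentRecord(
    name="access-sw-1",
    model_number="EXAMPLE-24P",
    software_version="1.0",
    vendor_phone="555-0100",
)

gaps = missing_fields(record)
print(gaps)
# ['registration_code', 'warranty_expires', 'vendor_email',
#  'vendor_emergency_contact']
```

Running a check like this across the whole inventory gives a short to-do list of the details still to be gathered from vendors.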
Take an inventory of your active IT assets, figure out the different scenarios in which each asset could fail, determine the best course of action for each type of failure, keep the backup information as current as possible, and then test it. A solid LAN DR plan will help your organization get back online and back to work as fast as possible.