At the highest level, an enterprise risk management program considers elements such as market demand, competition and the state of the economy to be business risks. Operational risks are also part of the picture, and business resilience, or the ability to resume business in the event of a disaster, is normally included. This is where disaster recovery planning comes in.
Every good disaster recovery plan (DRP) is based on a recovery strategy defined before the plan itself is developed (hence the term planning). The ideal recovery strategy is not pulled out of a hat; it is based on an understanding of the threats to which the IT environment is exposed, its vulnerabilities, the probability of occurrence and the potential impact to the organization. That, in essence, is the IT risk assessment process.
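The assessment described above is often reduced to a simple likelihood-times-impact score so that mitigation effort can be prioritized. A minimal sketch follows; the threat names and 1-to-5 scores are illustrative examples, not drawn from any particular standard:

```python
# Minimal qualitative risk-scoring sketch: risk = likelihood x impact.
# All threat names and scores below are illustrative assumptions.

def risk_score(likelihood: int, impact: int) -> int:
    """Both inputs on a 1 (low) to 5 (high) scale."""
    return likelihood * impact

# Illustrative threat register for a backup environment
threats = {
    "lost or damaged tape": (3, 4),
    "main site destroyed, offsite copy a week old": (1, 5),
    "failed backup after an unplanned change": (4, 3),
}

# Rank threats by score so mitigation targets the worst exposures first
for name, (likelihood, impact) in sorted(
        threats.items(), key=lambda t: -risk_score(*t[1])):
    print(f"{risk_score(likelihood, impact):>2}  {name}")
```

Even a crude model like this forces the conversation from "what could go wrong" to "what do we fix first."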
Without digging too deeply into the specifics of qualifying or quantifying risk, let's examine some of those risks. The following list is by no means exhaustive; it is merely a starting point. Risk can vary widely based on geography, climate, level of preparedness, corporate culture and more.
Backup storage and the risks involved
| Risk | Exposure |
| --- | --- |
| Single-copy backups | Data loss in the event of lost or damaged tapes |
| Daily backups, but weekly offsite rotation | As much as one week of data loss if the main facility housing the production data is destroyed |
| The offsite vault is the trunk of your car | Hopefully, this exposure requires little explanation |
| Backups exceeding the available window | Can force backup schedules that leave data exposed. For example, full backups run only on weekends because they take more than 24 hours, and are only sent offsite on Monday. |
| Unencrypted data on offsite-bound media | Can create a security issue in some cases (industry specific) |
| Poor or nonexistent change management | Poorly planned changes (configuration changes, software upgrades, etc.) are at the root of many failed backups and create an exposure to data loss |
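The interaction between backup frequency and offsite rotation in the table above can be made concrete with a small calculation. This is a sketch under a simplifying assumption (a total site loss destroys every copy still onsite); the function name and parameters are illustrative:

```python
# Sketch: worst-case data-loss exposure when backups run more often than
# media leaves the building. Assumes (illustratively) that a total site
# loss destroys every copy still onsite.

def worst_case_exposure_days(offsite_interval_days: float,
                             shipping_lag_days: float = 0.0) -> float:
    """A disaster just before the next shipment loses everything created
    since the last backup that actually left the building."""
    return offsite_interval_days + shipping_lag_days

# Daily backups, but tapes only go offsite once a week:
print(worst_case_exposure_days(7))  # up to a week of data at risk
```

The point is that the effective recovery point is set by the slowest link in the chain, not by how often backups run.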
Disk storage and the risks involved
| Risk | Exposure |
| --- | --- |
| Replication or synchronization utility errors | If the production copy of a database becomes corrupted or unusable, can the replicated copy be overwritten with the bad copy by mistake in your environment? Is there a mechanism in place to prevent that from happening? |
| Hardware failure (single point of failure) | Often seen as stating the obvious, but single points of failure must be identified from the host all the way to the allocated storage |
| Insufficient storage masking, mapping, etc. | Many storage experts agree that storage area network (SAN) storage access should be controlled at the host bus adapter (HBA), Fibre Channel (FC) switch and disk array levels to avoid device contention between hosts |
| Poorly documented custom configuration | Exposure to knowledgeable staff being unavailable following a major outage or disaster |
| Lack of segregation of duties | Too many IT personnel with unrestricted access to storage configuration interfaces or utilities can lead to inadvertent changes or poorly communicated actions |
| Poor or nonexistent change management | Probably one of the most common vulnerabilities, but all too often overlooked because IT personnel typically don't see themselves as a threat agent. Poorly planned changes are frequently identified as the cause of storage failures or data loss. |
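The first risk in the table, overwriting a good replica with a corrupted source, is one that a simple guard can catch. The sketch below is illustrative: the health check is a stand-in, and real tooling would run a proper database consistency check before allowing replication to proceed:

```python
# Sketch of a replication guard: refuse to push the source over the
# replica unless the source passes validation. The health-check callback
# is an illustrative stand-in for a real consistency check.

def safe_replicate(source: bytes, replica: bytes, source_is_healthy) -> bytes:
    """Return the new replica contents only if the source validates;
    otherwise keep the existing replica and raise an alert."""
    if not source_is_healthy(source):
        raise RuntimeError("refusing to replicate: source failed validation")
    return source

# Usage with a trivial health check (illustrative only)
new_replica = safe_replicate(b"db pages", b"yesterday's copy",
                             lambda s: len(s) > 0)
```

Whatever the mechanism, the design goal is the same: make the unsafe path require an explicit, deliberate override.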
Obviously, IT environments are subject to many more internal and external threats that can indirectly affect storage; attempting to list them all would exceed the scope of this tip. Other areas to consider include power conditioning, environmental controls, physical security and data integrity. A number of publications on storage best practices are available, and this site offers a lot of valuable advice on the subject. Hopefully, this tip has helped get the thought process started.
About the author: Pierre Dorion is a certified business continuity professional for Mainland Information Systems Inc.
This was first published in June 2008