To plan an effective Microsoft SharePoint data protection strategy, you have to consider both content recovery and disaster recovery (DR). Widespread adoption of SharePoint within an organization will likely mean that SharePoint servers eventually replace existing file servers. In other words, users will access documents via SharePoint Web portals (and, ultimately, SQL databases) rather than as files in a shared directory. The potentially enormous implications of this shift for traditional nightly backup haven't been adequately considered. Masses of files that a backup application currently manages as individual entities would instead be stored in SQL databases under SharePoint that are monolithic from the backup app's perspective.
A more pressing issue is understanding how SharePoint content is recovered. While SharePoint 2007 represents a significant step forward from earlier versions, there is still room for improvement. Take, for example, the wildly successful Microsoft Exchange. Most backup administrators have vivid memories of the difficulties of restoring content from early versions of Exchange. Because of API limitations back then, most backup applications only supported recovery of an entire Exchange Information Store -- a time-consuming procedure of restoring to an alternate server, searching for the specific messages or mailboxes to be recovered, and then migrating the data into the production Information Store.
Prior to the 2007 version, a similarly arduous process was also required for SharePoint. With the 2007 version, however, Microsoft added an important timesaver -- two recycle bins: a user-level and a site-level (or administrative) recycle bin. Recycle bin functionality can be disabled, and both size quotas and object expirations can be applied. In addition, SharePoint inherently supports document versioning, so it's possible to revert to an earlier version without necessarily having to do a recovery. While the recycle bins will help eliminate many nuisance-level file recovery issues, if a document that is no longer available in a recycle bin must be recovered, the only options are the traditional Exchange-like recovery process or a third-party product.
Disaster recovery for SharePoint environments is even trickier. SharePoint provides GUI and command line (stsadm.exe) options for backups, but these options are most effective for small- to mid-sized businesses (SMBs). Also, there's no scheduling mechanism within SharePoint, so automation would need to be performed by scripts executed via the Windows Task Scheduler.
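As a rough illustration of the scripted approach, the Python sketch below builds an stsadm.exe farm backup command that a scheduled task could run each night. It is a minimal sketch under stated assumptions, not a recommended procedure: the install path is the usual SharePoint 2007 default and may differ on your servers, the backup share name is hypothetical, and the weekly-full, daily-differential rotation is only one possible policy.

```python
import subprocess
from datetime import date

# Assumption: default SharePoint 2007 install location; adjust for your farm.
STSADM = (r"C:\Program Files\Common Files\Microsoft Shared"
          r"\web server extensions\12\BIN\stsadm.exe")


def choose_method(today, full_weekday=6):
    """Return 'full' on the chosen weekday (6 = Sunday), else 'differential'.

    The rotation itself is an illustrative policy, not a SharePoint rule.
    """
    return "full" if today.weekday() == full_weekday else "differential"


def build_backup_command(backup_dir, method="full", stsadm=STSADM):
    """Build the argument list for an stsadm farm-level backup."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return [stsadm, "-o", "backup",
            "-directory", backup_dir,
            "-backupmethod", method]


def run_backup(backup_dir, today=None):
    """Run the backup; raises CalledProcessError if stsadm reports failure."""
    today = today or date.today()
    command = build_backup_command(backup_dir, choose_method(today))
    return subprocess.run(command, check=True)
```

A small wrapper script that calls run_backup with a UNC path (for example, the hypothetical \\backupserver\sharepoint share) could then be registered as a daily job in the Windows Task Scheduler. The account running the task would need rights to both the farm and the backup share.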
Of course, because SharePoint largely consists of SQL Server databases, one approach is to simply back up or protect those databases using the tried-and-true approach of your choice. This is certainly a viable option, but it deserves careful thought: traditional SQL backups have no knowledge of the SharePoint environment they came from, so additional time-consuming post-processing will be needed to reintegrate a restored database into SharePoint. Furthermore, SQL backups can't adequately address the aforementioned content recovery issues or the additional steps needed to recover Web front ends (WFEs) and application servers. Even index servers, if they're large, may need a recovery strategy, because rebuilding an index would be very time-consuming.
Alternative recovery approaches for SharePoint data
To meet recovery time objectives (RTOs) and recovery point objectives (RPOs) for both operational recovery and disaster recovery, most organizations will turn to third-party products. The obvious first place to look is your backup app.
Startup vendor AvePoint Inc. has carved out an interesting niche as a SharePoint data protection specialist. Its DocAve 4.1 suite offers a comprehensive set of solutions, including the ability to perform live backups without suspending indexing.
Beyond backup apps, other technologies that lend themselves particularly well to SharePoint data protection are snapshots, replication and virtualization. Regarding virtualization, a number of companies are running SharePoint 2003 servers within virtualized environments with considerable success.
It's imperative to define and deploy standard configurations and consistent processes for SharePoint components and to be sure that these are fully documented. As SharePoint 2007 environments become commonplace, more technology options are sure to appear along with a well-defined body of best practices.
This article originally appeared in Storage magazine.
James Damoulakis is CTO of GlassHouse Technologies, an independent storage services firm with offices across the United States and in the UK.