A reliable power system is generally also an available one. No power system is infallible, but we can improve the chances of its continued operation by putting appropriate safeguards in place. The goal is to minimize the potential for component failure. One useful approach is to identify and address potential single points of failure, which power audits and assessments can uncover. If the failure points can be identified and the threats mitigated or eliminated, we can increase power availability and improve equipment reliability.
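One way to make such an audit concrete is to list every path that can deliver power to a given load and look for components that sit on all of them. The sketch below is a minimal illustration of that idea, not an audit tool. The component names (ATS, UPS-A, PDU-1 and so on) and the path layout are hypothetical.

```python
def single_points_of_failure(paths):
    """Return the components that appear on every supply path to a load.

    `paths` is a list of paths; each path is a list of component names
    running from a power source to the load. A component present on all
    paths takes the load down if it fails, whatever redundancy exists
    elsewhere.
    """
    if not paths:
        raise ValueError("at least one supply path is required")
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return common


# Hypothetical layout: two sources and two UPS units, but one shared
# transfer switch (ATS) and one shared distribution unit (PDU-1).
rack_paths = [
    ["utility", "ATS", "UPS-A", "PDU-1"],
    ["generator", "ATS", "UPS-B", "PDU-1"],
]
spofs = single_points_of_failure(rack_paths)
```

In this made-up layout the result contains ATS and PDU-1: the sources and UPS units are duplicated, but both paths still depend on the same transfer switch and distribution unit.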
Sizing the emergency power system correctly is critical: the initial configuration must be able to handle the anticipated loads. Since many uninterruptible power systems are modular, the backup power environment can be expanded by adding uninterruptible power system modules and batteries. Be sure to size the generator to handle both initial and potentially increased requirements.
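The sizing arithmetic can be sketched roughly as follows. This is an illustration under stated assumptions, not a sizing method from any vendor or standard: the growth rate, planning horizon, headroom margin, module rating and the single spare module are all hypothetical inputs that a real power assessment would supply.

```python
import math


def projected_load_kw(initial_kw, annual_growth, years):
    """Compound the initial load by an assumed annual growth rate."""
    return initial_kw * (1 + annual_growth) ** years


def generator_size_kw(initial_kw, annual_growth, years, headroom=0.25):
    """Generator rating covering the projected load plus assumed headroom."""
    return projected_load_kw(initial_kw, annual_growth, years) * (1 + headroom)


def ups_modules_needed(load_kw, module_kw, spares=1):
    """UPS modules required to carry load_kw, plus `spares` extra modules."""
    if load_kw < 0 or module_kw <= 0:
        raise ValueError("load must be >= 0 and module rating must be > 0")
    return math.ceil(load_kw / module_kw) + spares


# Hypothetical figures: 180 kW today, 10% growth per year over 5 years,
# 50 kW UPS modules.
modules_today = ups_modules_needed(180, 50)
modules_later = ups_modules_needed(projected_load_kw(180, 0.10, 5), 50)
generator_kw = generator_size_kw(180, 0.10, 5)
```

With these figures the UPS can start at five modules and grow to seven, while the generator, which is harder to expand later, is sized up front at roughly 362 kW.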
Besides providing a smooth transition to emergency power during outages, the uninterruptible power system (UPS) conditions power to remove spikes and sags that can damage computer components. Power from commercial sources is often not clean enough to provide the consistent power that sophisticated electronics require. Both a UPS and the devices mentioned earlier (e.g., conditioners, filters) can mitigate this problem.
To ensure that the generator will work when needed, perform regular run tests, especially with a medium to full electrical load. Consider installing emergency power systems equipped with load banks (special circuitry designed to consume energy) capable of providing loads equaling 100% of the generator capacity. This allows full testing without impacting data center operations.
Dealing with data center power emergencies
When something interrupts power system operation, emergency procedures are needed to resolve the problem quickly while minimizing the impact on critical data center systems. Such procedures should list, step by step, the actions to take for each type of emergency. If these procedures are not available, be sure to have access to trained maintenance personnel so a response can be organized. If your data center maintenance is contracted out, you may have minimal on-site staff available to address emergencies. And if on-site employees are not familiar with power system operation, equipment manufacturers may need to supply that knowledge. The risk is that when a power system emergency does occur, even if emergency procedures are in place, nobody on hand is familiar enough with the primary and backup power systems to implement them properly.
Adequate and regularly updated power system documentation is essential. Data centers are dynamic environments, and this documentation must be kept current as critical systems and the infrastructure supporting them are added. Also be sure to locate your primary and backup power systems in secure areas to prevent unauthorized access.
Perhaps the most important strategy for protecting power systems is regular maintenance. This means scheduling tests of primary and backup power systems, performing regular inspections, and following manufacturer recommendations for maintenance and support. Another key aspect of maintenance is benchmarking. The results of the tests performed during maintenance are most meaningful when they are tracked over time, rather than simply recorded as "pass/fail."
Commissioning of power systems within the data center is another protective strategy. True commissioning is different from simple component startup, which only tests the component in question, usually with predefined vendor procedures. Ideally, commissioning tests systems end to end across the data center to make sure all components work together properly.
Today's sophisticated data centers handle mission-critical operations and processes, and it's not feasible to shut them down, even for a short duration. This means that power needs to be available continuously. A properly designed and regularly tested emergency power system helps ensure that critical data center operations are protected.