Measuring Reliability For Uninterruptible Power Supplies and Power Protection Plans

Uninterruptible power supplies (UPS) are installed to protect critical loads, so their reliability should be measured rather than guessed at. Customers need an objective measure with which to compare different manufacturers and UPS models: a UPS exists to shield the loads it protects from vulnerability, and an unreliable one defeats that purpose.

Mean Time Between Failure

MTBF, or Mean Time Between Failure, is one such measure of the reliability of an uninterruptible power supply. It is the average operating time between the UPS being powered up and the system shutting down because of a failure (a failure of the UPS system itself, not a mains power failure). It is expressed in hours.

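Average failure rate is another measure of reliability: the number of failures per unit of operating time (for example, failures per million hours). During the part of a UPS's life when failures occur at a roughly constant rate, the failure rate is the reciprocal of the MTBF (λ = 1/MTBF), so a higher MTBF means a proportionally lower failure rate.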
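As a rough illustration of how these two measures relate, the Python sketch below estimates MTBF as the mean of observed operating intervals, converts it to a failure rate, and estimates the chance of a year without failure. It assumes failures are exponentially distributed (a constant failure rate), which only holds for the flat, random-failure part of a UPS's life; the interval data and the resulting 250,000-hour MTBF are hypothetical figures, not a real product rating.

```python
import math


def mtbf_from_uptimes(uptimes_hours):
    """Estimate MTBF as the mean operating time between successive UPS failures."""
    if not uptimes_hours:
        raise ValueError("need at least one observed operating interval")
    return sum(uptimes_hours) / len(uptimes_hours)


def failure_rate(mtbf_hours):
    """Failure rate in failures per hour, assuming a constant rate: lambda = 1 / MTBF."""
    return 1.0 / mtbf_hours


def reliability(t_hours, mtbf_hours):
    """Probability of running t_hours without a UPS failure (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)


# Hypothetical figures for illustration only.
observed = [210_000, 260_000, 280_000]  # hours between failures seen in the field
mtbf = mtbf_from_uptimes(observed)      # 250,000 hours
lam = failure_rate(mtbf)
print(f"MTBF: {mtbf:,.0f} h")
print(f"failure rate: {lam * 1e6:.1f} failures per million hours")
print(f"P(no failure in one year): {reliability(8_760, mtbf):.3f}")
```

Under this model a 250,000-hour MTBF does not mean a typical unit runs for about 28 years; it means that, across a large population in its useful-life period, roughly 3.4 percent of units would be expected to fail in a year.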

Like any other electronic equipment, uninterruptible power supplies do not fail at a constant rate. Their failures fall into three distinct periods, often shown as a ‘bathtub curve’: a) infant mortality failures, b) random failures and c) wear-out failures.
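The bathtub shape is often modelled as the sum of three hazard (instantaneous failure-rate) terms. The sketch below uses a Weibull hazard with shape below 1 for infant mortality, a constant rate for random failures and a Weibull hazard with shape above 1 for wear-out. All parameter values are hypothetical, picked only to reproduce the shape, and are not measured UPS data.

```python
def weibull_hazard(t_hours, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t_hours / eta) ** (beta - 1)


def bathtub_hazard(t_hours):
    """Illustrative UPS hazard rate (failures per hour) at age t_hours > 0.

    Hypothetical parameters, chosen only to give the familiar bathtub shape.
    """
    infant = weibull_hazard(t_hours, beta=0.5, eta=5_000_000)  # beta < 1: rate falls with age
    random_rate = 1.0 / 200_000                                # constant rate in useful life
    wear_out = weibull_hazard(t_hours, beta=5.0, eta=70_000)   # beta > 1: rate rises with age
    return infant + random_rate + wear_out


for age in (10, 1_000, 20_000, 60_000, 80_000):
    print(f"age {age:>6,} h: {bathtub_hazard(age) * 1e6:7.1f} failures per million hours")
```

With a shape of 1 the Weibull hazard reduces to the constant rate 1/η, which is why the simple exponential MTBF model suits only the flat middle of the curve.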

Infant Mortality UPS Failures

Infant mortality failures occur early in the life of the uninterruptible power supply. IT-sized uninterruptible power supplies can suffer what is termed ‘dead-on-arrival’ failure. This could be due to a component manufacturing defect or to transportation damage; a sudden shock or jolt in transit may weaken a soldered joint, for example. Whilst UPS manufacturers strive to minimise these incidents through stringent quality checks and testing processes, they do happen. Various processes can reduce the risk: UPS rated 10 kVA and above, for example, can be run through short burn-in periods (up to 48 hours) at high ambient temperature to reduce the potential for such failures in the field.

Random UPS Failures

Random failures occur throughout the normal working life of a UPS, at a rate that is low and fairly constant compared with the early and late periods.

Wear Out Failures

Wear-out failures become more common towards the end of an uninterruptible power supply’s working life, which is where the curve rises steeply again. Battery problems account for 98 percent of UPS wear-out failures. Other parts also age: particularly where a UPS has been subjected to high ambient temperatures over long periods, internal cabling insulation becomes brittle and breaks down. Consumable items such as fans and capacitors will also eventually wear out with use and should be part of a regular monitoring regime.

A manufacturer showing you favourable MTBF figures does not necessarily make its products the most reliable. Like most statistics, these figures can be massaged to look better than they actually are, so the important question to ask is: what was the basis for the calculation? There are two primary approaches:

1) A record of the total number of failures for a particular UPS size over a given time period.

Commonly adopted by UPS manufacturers, this is a valuable approach if the field population is large and the time period long enough (longer than the typical life expectancy of a UPS, which is five to ten years).

2) A system value calculated from the known MTBF values of components and assemblies.

This approach is more complex and relies on following standardised calculation methods, so the result is only as good as the component figures and method behind it.
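The two approaches can be sketched in Python as follows. The first divides the accumulated field operating hours by the number of failures recorded; the second is a simple parts-count style prediction that assumes every assembly is in series (any one failing brings the UPS down) and has a constant failure rate, so the assembly failure rates add. The assembly names and all figures are hypothetical.

```python
def field_mtbf(units_in_field, hours_per_unit, failures):
    """Approach 1: total accumulated operating hours divided by recorded failures."""
    if failures <= 0:
        raise ValueError("no failures recorded; MTBF cannot be estimated this way")
    return units_in_field * hours_per_unit / failures


def series_system_mtbf(assembly_mtbfs):
    """Approach 2: combine assembly MTBFs for a series system with constant failure rates.

    The system failure rate is the sum of the assembly failure rates (1 / MTBF each),
    and the system MTBF is the reciprocal of that sum.
    """
    system_rate = sum(1.0 / mtbf for mtbf in assembly_mtbfs.values())
    return 1.0 / system_rate


# Hypothetical field data: 2,000 units running for five years (43,800 h) with 400 failures.
print(f"field MTBF: {field_mtbf(2_000, 43_800, 400):,.0f} h")

# Hypothetical assembly-level MTBFs in hours.
assemblies = {
    "rectifier": 500_000,
    "inverter": 400_000,
    "static switch": 2_000_000,
    "control board": 1_000_000,
}
print(f"predicted system MTBF: {series_system_mtbf(assemblies):,.0f} h")
```

A real UPS with a bypass path or redundant modules is not a pure series system, which is one more reason to ask how a predicted figure was calculated before comparing it with field data.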

Mean Time to Repair

Mean Time to Repair (also called Mean Time to Restore), or MTTR, is the average time taken to return an uninterruptible power supply to normal operation after a failure has shut it down.

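Online UPS are designed to fail safe by transferring their load to the mains supply, so while a UPS is awaiting repair the load depends on the mains alone. The MTBF of the mains power supply is therefore an important consideration alongside the UPS's mean time to repair (average repair time): the longer the repair takes, the greater the chance that the mains also fails in the meantime.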
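To illustrate why repair time matters here, the sketch below estimates the probability that the mains fails while the UPS is out of service. It assumes the load stays on mains for the whole repair and that mains failures are random (exponentially distributed) and independent of the UPS fault; the 4,000-hour mains MTBF is purely hypothetical.

```python
import math


def p_mains_fails_during_repair(mttr_hours, mains_mtbf_hours):
    """Probability of at least one mains failure during the repair window.

    Exponential model: P = 1 - exp(-MTTR / MTBF_mains). math.expm1 keeps the
    result accurate when the ratio is very small.
    """
    return -math.expm1(-mttr_hours / mains_mtbf_hours)


MAINS_MTBF = 4_000  # hypothetical mains MTBF in hours
for mttr in (4, 24, 48):
    p = p_mains_fails_during_repair(mttr, MAINS_MTBF)
    print(f"MTTR {mttr:>2} h: {p:.2%} chance of a mains failure before the UPS is back in service")
```

While MTTR is small compared with the mains MTBF, this risk grows almost in proportion to MTTR, so halving the time to restore roughly halves the exposure.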

As it is highly unlikely that a service engineer will be on site at the very moment a UPS fails, MTTR also needs to include travel time. A realistic figure also depends on the engineer carrying the parts needed to fix the problem in a single visit, which is not always the case. Some uninterruptible power supply manufacturers quote a figure based only on the actual repair time. Although this may be a satisfactory comparison tool, it does not represent the downtime a customer will actually experience. A degree of scepticism is sometimes necessary when comparing marketing data from some manufacturers.
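The effect of counting travel and parts delays can be shown with the standard steady-state availability formula A = MTBF / (MTBF + MTTR). In the sketch below the MTBF, repair, travel and parts-waiting times are all hypothetical, and the function names are illustrative rather than taken from any particular tool.

```python
def effective_mttr(repair_hours, travel_hours=0.0, parts_wait_hours=0.0):
    """Restore time as the customer sees it: travel + waiting for parts + hands-on repair."""
    return travel_hours + parts_wait_hours + repair_hours


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf = 250_000  # hypothetical quoted MTBF in hours
quoted = effective_mttr(repair_hours=2)
realistic = effective_mttr(repair_hours=2, travel_hours=4, parts_wait_hours=24)
for label, mttr in (("repair time only", quoted), ("with travel and parts", realistic)):
    a = availability(mtbf, mttr)
    print(f"{label}: MTTR {mttr:g} h, availability {a:.6f}, "
          f"downtime about {(1 - a) * 8_760 * 60:.0f} min/year")
```

This is the availability of the UPS itself; because an online UPS transfers its load to the mains when it fails, downtime at the load also depends on the mains supply during the repair.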