Introduction
Part 1 of this series about Reliability described some aspects of designing products to have a long installed lifetime; that is – to have good reliability.
In this second part of the series, we cover what customers expect and then look at Reliability Prediction and Reliability Measurement.
Expectations
What is a reasonable Mean Time Between Failures (MTBF)? The answer, of course, depends on the product, how much you paid for it, and how it will be used. At Ozuno, we make products that are installed into buildings. We and our customers have a reasonable expectation that our products will last a long time. Now we need to define what we mean by “a long time”. We can use a fairly simple definition: the time between commercial building refits is around 10 years, so after being installed, a product should last at least that long. Unfortunately, now we need to start on some scary maths.
MTBF for 10 Year Lifetime
The lifetime in hours for 10 years is easily calculated: 1 year = 24 hours per day * 365 days = 8,760 hours, so 10 years = 87,600 hours. We’d like a product to still be operating after that kind of period.
We also need to bear in mind that product failure is a matter of statistical probability, so a measured or predicted MTBF that merely equals the 10 year target is not good enough. For example, if the mean (that is, average) time to failure were 10 years then, for a symmetric distribution, 50% of all failures would happen before reaching 10 years and the remaining 50% would happen after. This is not a very good outcome!
Using some rough values from statistics, and assuming failures are due to truly random causes and so have a normal (Gaussian) distribution:
- The standard deviation must be calculated from measurements; and
- 68% of all failures are within +/- 1 standard deviation about the mean; and
- 95% of all failures are within +/- 2 standard deviations about the mean.
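The 68% and 95% figures can be checked with a few lines of Python. This is a minimal sketch that takes the normal-distribution assumption above at face value; the function name `fraction_within` is purely illustrative.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal population within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within +/- {k} standard deviation(s): {fraction_within(k):.1%}")
```

Whatever falls outside the band is split evenly between early and late failures, which is why the margin above the target lifetime matters.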
Customers, of course, don’t care about averages or statistics: it is no help at all to be assured that your failure is just one of those that feed into the calculation of an average. In a perfect world the actual MTBF would be about 2 standard deviations higher than the target lifetime. Because a standard deviation can only be calculated from measurements taken after products have failed, meeting this objective is difficult!
The solution is to use a rule of thumb: the actual MTBF needs to be significantly higher than the desired 10 year product lifetime. For a useful installed life of 10 years (87,600 hours), the design aim MTBF needs to be about 2 to 4 times higher, or more. That is, a predicted or measured MTBF of about 174,000 to 350,000 hours is a good result and gives confidence that a significant majority of supplied product will still be operating after 10 years. Using this approach, there will still be a small number of failures by the time installed service reaches 87,600 hours (10 years), but these failures should be around 2% or less of product supplied.
Can anyone do better? Of course: just have an even higher design MTBF. The statistics won’t lie, but the cost of making the product may become prohibitive!
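The arithmetic behind the rule of thumb can be sketched as follows. This is an illustration only: it assumes the normal distribution used above, and the helper names `lifetime_hours` and `early_failure_fraction` are made up for this example.

```python
from statistics import NormalDist

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap days


def lifetime_hours(years: float) -> float:
    """Continuous operating hours in the given number of years."""
    return years * HOURS_PER_YEAR


def early_failure_fraction(margin_sd: float) -> float:
    """Fraction of units failing before the target lifetime when the mean
    time to failure sits margin_sd standard deviations above that target
    (normal distribution assumed)."""
    return NormalDist().cdf(-margin_sd)


target = lifetime_hours(10)
print(f"10 year target: {target:,.0f} hours")
for multiple in (2, 4):
    print(f"{multiple}x design aim: {multiple * target:,.0f} hours")

for margin in (0, 2):
    print(f"mean {margin} sd above target: "
          f"{early_failure_fraction(margin):.1%} fail early")
```

With no margin, half the units fail early. With a margin of 2 standard deviations the early failures drop to a little over 2%, which is in line with the "around 2% or less" figure.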
Reliability Predictions
Calculating a predicted MTBF can be very difficult and time consuming. Professionally developed and expensive software tools are available to do these predictions.
RAPIX Zone Controller
We calculated the MTBF of our RAPIX DALI Zone Controller using the Reliability Analytics Toolkit MIL-HDBK-217F parts count method. MIL-HDBK-217F includes several methods; the parts count method is a simplified process that still needs reasonably good information about all components in the bill of materials. The parts count method is notoriously conservative, which means that any MTBF it produces is likely to be 2x to 3x smaller than observations in practice, especially for the “ground benign” environmental classification that is appropriate for typical commercial buildings.
The Zone Controller includes 2 major circuit boards and a purchased CPU card. Each needed to be calculated separately. After that, a complete product reliability was calculated.
Calculation Results:
Board #1 (including CPU sub-assembly): MTBF 407,000 hours
Board #2: MTBF 450,000 hours
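Because a failure of either board takes the whole product out of service, the board failure rates (1/MTBF) add together. This is the same arithmetic as resistances in parallel. A minimal sketch of that combination, where the function name `series_mtbf` is illustrative and not part of any toolkit:

```python
HOURS_PER_YEAR = 24 * 365


def series_mtbf(*mtbfs: float) -> float:
    """Combined MTBF of sub-assemblies that must all work for the product to work.

    Failure rates (1 / MTBF) add, so the combined MTBF is the reciprocal
    of the summed rates.
    """
    return 1.0 / sum(1.0 / m for m in mtbfs)


combined = series_mtbf(407_000, 450_000)
print(f"combined MTBF: {combined:,.0f} hours "
      f"(about {combined / HOURS_PER_YEAR:.0f} years)")
```

The combined figure is always lower than the weakest sub-assembly's MTBF, so adding boards to a product can only reduce the prediction.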
The combined MTBF is calculated using the same method as resistances in parallel: 1 / (1/M1 + 1/M2), where M1 and M2 are the MTBFs of the two boards.
Zone Controller Predicted MTBF: 213,710 hours, or about 24 years*.
As noted above, actual MTBF can reasonably be expected to be higher. By prediction, the Zone Controller meets our desired requirement for an MTBF significantly greater than a typical expected service life.
* (MIL-HDBK-217F parts count method, GB environment, industrial parts grade).
RAPIX DALI Power Supply
A similar calculation for the RAPIX DALI Power Supply yields a predicted MTBF of 541,000 hours. This well exceeds the desired multiple of expected service life, and is once again a very conservative prediction.
Reliability Measurement
We have been tracking the Mean Time Between Failures for our products since original market release.
The RAPIX DALI Power Supply typically has quite a severe service life: it is loaded to 3/4 or more of its capacity and mounted in a warm environment. Sustained power delivery in a warm environment is tough on electronics.
We track all product returns and have checked all failure types, finding no common factors or trends.
Across all the RAPIX DALI Power Supplies sold, only the following underlying faults have been identified:
- Installation wiring not properly connected;
- Factory assembly process fault (1 sample only).
There have been no reliability / product wear-out returns.
Any calculation of MTBF based on field failures is therefore very difficult. In order to estimate the MTBF from field data, we have assumed that 1% of all supplied power supplies failed and that our customers did not return a single one of them. This seems pretty unlikely, but it allows us to calculate some numbers.
Using the Reliability Analytics Toolkit Field MTBF Calculator, we entered the number sold and the average time in the field. We used a 100% duty cycle; that is, we assumed the product is powered up and operating continuously.
The result of the calculation is a series of estimates based on a confidence level:
Confidence (%) | MTBF lower bound (hours) |
---|---|
50 | 4,350,985 |
60 | 4,242,973 |
70 | 4,131,338 |
90 | 3,839,595 |
95 | 3,709,372 |
99 | 3,480,648 |
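The pattern in the table, where the lower bound shrinks as the confidence level rises, can be reproduced with a commonly used chi-squared bound for a time-truncated observation period: MTBF lower bound = 2T / chi-squared quantile at (confidence, 2r + 2), with T the total unit-hours and r the number of failures. Whether the toolkit uses exactly this formula is an assumption. The unit count and field hours below are hypothetical placeholders, not our sales data, and the function names are illustrative.

```python
import math


def poisson_cdf(r: int, mu: float) -> float:
    """P(N <= r) for a Poisson count N with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, r + 1):
        term *= mu / k
        total += term
    return total


def mtbf_lower_bound(unit_hours: float, failures: int, confidence: float) -> float:
    """One-sided lower confidence bound on MTBF (assumed chi-squared method).

    For the even degrees of freedom 2r + 2, the chi-squared quantile can be
    found from the Poisson sum, so only the standard library is needed.
    """
    lo, hi = 0.0, failures + 10.0 * math.sqrt(failures + 1.0) + 50.0
    for _ in range(200):  # bisection on mu = chi-squared quantile / 2
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return unit_hours / hi


units_sold = 1_000            # hypothetical
avg_hours_in_field = 20_000   # hypothetical
assumed_failures = round(0.01 * units_sold)  # the 1% unreturned-failure assumption
total_unit_hours = units_sold * avg_hours_in_field

for confidence in (0.50, 0.60, 0.70, 0.90, 0.95, 0.99):
    bound = mtbf_lower_bound(total_unit_hours, assumed_failures, confidence)
    print(f"{confidence:.0%} | {bound:,.0f}")
```

Demanding more confidence always lowers the bound, which is why the 99% row is the most pessimistic.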
Using a combination of:
- the worst case (i.e. the highest confidence level); and
- some pretty extreme assumptions about failed products that have not been returned (which are unlikely to be happening in practice)
then field failure data indicates an MTBF so large as to be close to meaningless.
This result does not indicate the MTBF is the number shown – it indicates the product has a very large MTBF well exceeding the desired range of 174,000 to 350,000 hours.
A similar result is obtained for RAPIX DALI relays, phase dimmers, sensors and so on.