Failure is a process, not an event.
Equipment failure is never an isolated event, nor is it instantaneous. It happens because a combination of incidents and conditions leads to an eventual fault in one or more components. This can be due to internal wear and tear (for example, fatigue in rotating equipment) or due to external factors (for example, high sustained humidity leading to unexpected corrosion).
Framing failure in this way opens up a wide range of opportunities to minimise downtime as well as extend the operating lifetime of plant equipment.
There are a number of ways to increase the reliability of equipment once it has been installed, and many tend to fall into the following categories:
1. Reactive maintenance
2. Preventative maintenance
3. Predictive maintenance
Reactive maintenance is the act of correcting the fault once it has occurred. This can be expensive and unpredictable, and causes periods of unscheduled downtime that can affect the plant during peak operation.
Preventative maintenance is the act of scheduling regular services and component replacements, typically at conservative intervals derived from the statistical mean time between failures (MTBF) of the components. This is common practice in many facilities.
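As a rough illustration of how a conservative schedule might be derived from failure statistics, here is a minimal Python sketch. The function names, the `safety_factor` parameter and the run times are all hypothetical, chosen for illustration only. Real schedules usually rely on manufacturer data and more careful reliability analysis.

```python
from statistics import mean


def mtbf_hours(times_to_failure):
    """Mean time between failures, from observed run times (hours) that ended in a failure."""
    if not times_to_failure:
        raise ValueError("need at least one observed failure")
    return mean(times_to_failure)


def scheduled_interval_hours(times_to_failure, safety_factor=0.6):
    """Conservative replacement interval: a fraction of the MTBF.

    safety_factor is an illustrative assumption, not a recommended value.
    """
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    return safety_factor * mtbf_hours(times_to_failure)


# Made-up run times to failure for one component type, in hours.
observed = [4000, 5000, 6000]
interval = scheduled_interval_hours(observed, safety_factor=0.5)
print(f"MTBF: {mtbf_hours(observed)} h, replace every {interval} h")
```

The lower the safety factor, the fewer failures slip through, but the more remaining component life is thrown away at each replacement.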
Predictive maintenance is the process of anticipating a fault before it happens, allowing the company to schedule inspections, repairs or replacements before a single part anomaly or minor failure goes on to cause a chain of failures (for example, a cracked gear tooth going on to destroy a gearbox). Component utilisation is generally increased using this approach, leading to reduced downtime and costs when compared to preventative maintenance. It enables a just-in-time maintenance workflow and, with advance warning systems, allows maintenance to be performed outside of peak operation periods.
A good way to look at the failure process is through a "PF curve", which plots equipment condition against time, from the point where a potential failure (P) first becomes detectable to the point of functional failure (F):
As can be seen, the ideal system is one that detects impending failure as early as possible, allowing the maintenance team to order parts, schedule off-peak repair, and carry out those repairs before any damage occurs. In many cases, early predictions allow external factors to be corrected, thereby avoiding downtime and repair costs altogether (for example, correcting for environmental factors before they cause damage).
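The timing argument can be made concrete with a small Python sketch: the earlier a potential failure is detected, the longer the window before functional failure, and the more likely it is that parts can be ordered and repairs scheduled off-peak. The `RepairPlan` class, the `can_repair_before_failure` function and all of the durations are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class RepairPlan:
    """Hypothetical durations, in days, needed to carry out a planned repair."""
    parts_lead_days: float     # time to order and receive parts
    off_peak_wait_days: float  # wait until the next off-peak window
    repair_days: float         # time to carry out the repair itself

    def total_days(self) -> float:
        return self.parts_lead_days + self.off_peak_wait_days + self.repair_days


def can_repair_before_failure(pf_interval_days: float, plan: RepairPlan) -> bool:
    """True if the warning time between detection and failure covers the whole plan."""
    return plan.total_days() <= pf_interval_days


plan = RepairPlan(parts_lead_days=10, off_peak_wait_days=5, repair_days=2)
early = can_repair_before_failure(30, plan)  # fault detected 30 days ahead
late = can_repair_before_failure(7, plan)    # fault detected only 7 days ahead
print(early, late)
```

With these made-up numbers, a 30-day warning leaves room for a planned off-peak repair, while a 7-day warning does not.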
Another way to illustrate the improvements brought on by each stage of this sequence is the cost of upkeep vs downtime over the equipment lifespan:
1. Reactive Maintenance
Here you can see the unpredictability of both the cost of the failure and when the failure occurs. The lifetime of the system might also fall short of expectations if the cost of a repair exceeds the cost of replacing the system.
2. Preventative Maintenance
The benefits of longer lifespan and higher reliability can be seen quite clearly with preventative maintenance. The primary downside is the replacement of parts that could have lasted considerably longer, as well as excess downtime of the equipment. Unexpected failures may still occur towards the end of life.
3. Predictive + Preventative Maintenance
In many cases, current best practice is to employ predictive maintenance in combination with more sparsely scheduled preventative maintenance. This allows higher component utilisation, along with reduced downtime, while still making use of inspections to ensure reliable operation. Equipment lifespan is often longest under this method, as wear and tear of components is usually isolated before it affects adjacent areas of the system.
There are many exceptions to these rules, with two that stand out in particular. The first is that small, low-cost, and isolated systems are often easier to replace than maintain. The second is that large operating equipment with long downtime periods will often find value in scheduling full inspection periods during necessary downtime intervals. This still falls within the hybrid model, but is more of a “preventative” approach that moves further away from just-in-time practices.
How do I deploy predictive maintenance for my system?
There are two main factors to consider:
1. Stages of complexity
2. Stages of deployment
Stages of complexity:
- To start with, you need to begin collecting and storing live data: as much of it as possible, and as early as possible.
- The next step is to identify thresholds for each signal. These should alert you when a signal moves outside its expected operating range.
- To improve on this, anomaly detection models can be deployed to recognise unusual patterns of behaviour across different signal groups and time ranges (for more detail see our post on anomaly detection).
- To start making predictions further up the PF curve, the next improvement generally comes from training supervised learning models (to be discussed in a future post) to predict failure cases. This requires labelling either historic cases of failure or simulated cases of failure and generally requires more engineering effort. However, once trained, this can be a powerful tool to employ across your assets.
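The stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production approach: the function names, the operating band, the window size, the z-score limit and the readings are all hypothetical. A rolling z-score stands in here for the anomaly detection models mentioned above, and the labelling helper only shows how samples ahead of a known failure might be marked for a supervised model.

```python
from collections import deque
from statistics import mean, stdev


def threshold_alerts(readings, low, high):
    """Indices of readings outside the expected operating band [low, high]."""
    return [i for i, x in enumerate(readings) if x < low or x > high]


def rolling_zscore_anomalies(readings, window=5, z_limit=3.0):
    """Indices of readings that deviate strongly from the preceding `window` readings."""
    history = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) / sigma > z_limit:
                flagged.append(i)
        history.append(x)
    return flagged


def label_pre_failure(timestamps, failure_time, horizon):
    """Label 1 for samples within `horizon` before a known failure, else 0."""
    return [1 if failure_time - horizon <= t < failure_time else 0 for t in timestamps]


# Made-up bearing temperatures (degrees C), one reading per time step.
readings = [50.0, 50.2, 49.9, 50.1, 50.0, 50.1, 53.0, 50.0, 49.9]

over_limit = threshold_alerts(readings, low=45.0, high=60.0)
unusual = rolling_zscore_anomalies(readings, window=5, z_limit=3.0)
labels = label_pre_failure(range(len(readings)), failure_time=8, horizon=3)
print(over_limit, unusual, labels)
```

In this made-up series the jump to 53.0 stays inside the threshold band, so the threshold check raises nothing, but the rolling z-score flags it as unusual relative to recent behaviour. That is the kind of earlier signal each added stage is meant to provide.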
Stages of deployment:
- We usually recommend that you begin with your standard baseline schedule for maintenance.
- Once you begin rolling out predictive maintenance models, note down any reductions in reactive maintenance, and then begin to increase the time between relevant scheduled replacements. Depending on the risk involved, you can also substitute scheduled replacements with scheduled component inspections.
- As confidence increases in the predictive maintenance models and their sophistication increases with more data, learning ability, and faster algorithms, preventative maintenance can be phased out over time.
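One way to picture this gradual roll-out is as a simple review rule applied at the end of each maintenance period. The sketch below is an assumption made for illustration, not a recommended policy: the function name, the step size, the upper limit and the rule itself are all hypothetical.

```python
def next_interval_hours(current, baseline, missed_failures,
                        step_hours=100, max_hours=2000):
    """Propose the next scheduled-replacement interval after a review period.

    missed_failures: unplanned failures in the period that the predictive
    models did not warn about. All thresholds here are illustrative.
    """
    if missed_failures > 0:
        # The models missed something: fall back to the baseline schedule.
        return baseline
    # No missed failures: stretch the interval a little, up to a fixed limit.
    return min(current + step_hours, max_hours)


print(next_interval_hours(current=1000, baseline=1000, missed_failures=0))
print(next_interval_hours(current=1950, baseline=1000, missed_failures=0))
print(next_interval_hours(current=1500, baseline=1000, missed_failures=1))
```

Extending in small steps and falling back to the baseline after a missed failure keeps the risk of each change low while confidence in the models is still being built.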
If you would like to talk about implementing predictive maintenance models for your plant or system, or if you would simply like to begin consolidating your equipment data in preparation for such models, please contact us at firstname.lastname@example.org.