Operational Excellence Through an Asset Management Optimisation System
Dr Brian J Cane CEng FIMMM
Paper presented at NACE 13th Middle East Corrosion Conference, Kingdom of Bahrain, 14 - 17 February 2010.
Abstract
There are a variety of factors that can influence the performance of plant assets. These include the design and the operating mode, but one of the most influential factors is the effectiveness of maintenance programmes. Best practice operators generally spend less on maintenance but still achieve good operational performance. The first step in any improvement program is to benchmark the current status of the plant in terms of its operation and maintenance process effectiveness. Once this has been established, programmes can be developed to address any skills or procedural deficiencies. This is followed by an assessment of plant condition related risks to safety and availability. Two distinct phases are thus involved in an Asset Maintenance Optimisation System (AMOS). The first phase is a performance benchmarking exercise and management processes review, covering historical performance relative to industry norms together with audits of the integrity management programme and of its implementation. The second phase involves comprehensive plant-wide implementation of risk-based inspection and maintenance (RBI) planning tools which output risks, safe run lengths and inspection plans at equipment level.
The paper describes the processes involved in Phase 1 and Phase 2. Examples of implementation are illustrated using TWI's RISKWISE software for process units, utilities plant and pipelines.
1. Introduction
With the operational drive for increased plant safety, environmental compliance and profitability there is a clear need for owners and operators to ensure that their methods and strategies for plant asset maintenance are optimised. Central to this requirement is a recognised need for:
- systematic and auditable company-wide and plant-wide management processes which can be benchmarked against best practice
- consistent technical approaches for assessment and subsequent safe and economic planning of inspection and maintenance
In some cases these issues are linked such that the development of optimum inspection and maintenance plans will be strongly influenced by the quality and comprehensiveness of management processes. However, even good management processes and associated manuals do not necessarily mean that the risk of failure or plant availability loss is reduced or indeed, that there is justification in increasing maintenance intervals. In fact, the relationship between the effectiveness of the various management systems and the optimum inspection and maintenance strategy is often complex and cannot always simply be factored into the risk calculation to establish the inspection or maintenance plan. Both of the above needs are therefore best dealt with by means of a phased approach which aims to address each in turn. In addition it can be argued that a phased process is more cost-efficient since findings in the initial phase can be used to prioritise or focus efforts in the subsequent phase.
The Asset Maintenance Optimisation System (AMOS) has been developed to meet this requirement.
The AMOS program is aimed at improving plant performance by focussing attention on the effectiveness of inspection and maintenance programmes. Experience indicates that best practice operators generally spend less on inspection and maintenance but still achieve good operational performance and a good safety record. This is the main driver for adopting such a program.
The first step in any improvement program is to benchmark the current status of the plant in terms of its performance and maintenance process effectiveness. Once this has been established, programmes can be developed to address any quality, procedural or skills deficiencies. This is followed by a focussed assessment of plant condition related risks to safety and availability.
Accordingly, the first phase of AMOS incorporates a performance benchmarking exercise and management processes audit. The second phase draws upon the first phase output to plan selective implementation of risk-based inspection and maintenance planning (RBI) tools. The latter addresses failure risks and safe run lengths on a detailed equipment basis, compliant with methods set out in API RP 580[1].
Figure 1 summarises the AMOS program including formal approaches for first and second phases. For the second phase, TWI's RBI software tool, RISKWISE, is shown for illustrative purposes.
2. AMOS Phase 1: Benchmarking and management processes audit
Typically plant performance is influenced by the quality and effectiveness of a company's management processes. The primary management procedures may be developed at a corporate level but the implementation is largely dependent on the local management. Establishing the plant safety and availability risk will therefore require evaluation against best practice by consideration of:
- historical performance relative to generic industry norms
- asset management processes and programmes
- how well these programmes are implemented.
These areas are detailed below.
2.1 Performance review
This first requires a plant-level or unit-level review of plant performance including:
- Identifying existing and potential safety, availability and reliability issues and critical plant items.
- Establishing future performance requirements.
Following this, a preliminary assessment of equipment failure rate relative to industry experience needs to be conducted. For process plants generic failure frequencies, e.g. collated by API (API RP 581)[2] are used. For utilities plant such as boilers and turbines, NERC-GADS (North American Electric Reliability Corp - Generation Availability Data System) data (Ref[3]) can provide generic failure experience. The output is used to highlight areas of concern requiring corrective action focus if future performance targets are to be achieved.
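As a rough illustration of this preliminary comparison, the sketch below divides an observed failure rate by a generic industry frequency and flags equipment classes whose ratio exceeds one. The equipment classes, failure counts and generic frequencies are hypothetical placeholders, not values taken from API RP 581 or NERC-GADS, and the simple ratio is an assumed screening measure rather than a method prescribed by either source.

```python
def observed_rate(failures, item_count, years):
    """Observed failures per item-year over the review period."""
    return failures / (item_count * years)

def benchmark(records, generic_rates):
    """Return {equipment class: observed rate / generic rate}."""
    ratios = {}
    for name, (failures, item_count, years) in records.items():
        ratios[name] = observed_rate(failures, item_count, years) / generic_rates[name]
    return ratios

# Hypothetical plant history: (failures, number of items, years reviewed)
plant_history = {
    "heat exchanger": (4, 20, 10),
    "pressure vessel": (1, 50, 10),
}
# Hypothetical generic frequencies in failures per item-year
generic = {"heat exchanger": 0.01, "pressure vessel": 0.005}

ratios = benchmark(plant_history, generic)
# Classes performing worse than the generic figure need corrective focus
areas_of_concern = [name for name, r in ratios.items() if r > 1.0]
print(ratios, areas_of_concern)
```

A ratio above one simply marks an equipment class as performing worse than the generic figure, which is the kind of "area of concern" the review is meant to highlight.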
2.2 Management programme audit
The aim of asset management programmes is to ensure that component integrity and plant reliability are economically maintained over the life of the plant. This basically requires that information about the plant, its design, how it is operated and the impact of operation and maintenance on its condition is gathered and assessed by people with appropriate competencies to ensure safe and reliable operation. The work process associated with each programme should result in the correct actions being taken at the appropriate times to meet this requirement. Each management programme should have a series of attributes that define the core actions that need to be taken or the procedures that should be present in order to ensure that all safety and reliability requirements are fulfilled.
For each plant area there are a number of activities or processes that should be active within a comprehensive best practice plant maintenance and integrity management system. Each aspect or attribute of a particular program, activity or process will have a role to play in controlling the condition of the plant. The effectiveness or worth of a particular technical management program can therefore be measured relative to a comprehensive 'best practice' programme by considering plant area attributes and associated metrics. If for example there are deficiencies in the programme, e.g. inadequate or omitted attributes or lack of applicability to a plant area then clearly the likelihood of problems arising will be higher than if all aspects of the best practice program are present.
Existing management programmes are therefore reviewed against best practice programme attributes and metrics to identify any shortcomings or deficiencies. The results are typically presented in the form of charts which portray the 'adequacy' score for each plant area or programme. The results are captured in a formal manner so that improvements can be monitored as corrective actions are made.
2.3 Programme implementation audit
It is important that the significance of inspection findings or operational deviations is evaluated by a Competent Person, i.e. someone with considerable training and experience in that area of plant and the damage mechanisms that can affect it. This is what should be happening for each major component and plant area. In practice, however, these procedures are frequently not comprehensively applied, or are omitted altogether. The degree of application of such procedures gives some indication of the level of risk of a major failure occurring on that component.
This audit essentially involves a series of questions designed to establish precisely what maintenance and inspection is or is not being carried out. The results are again captured in a formal manner to monitor and update the scores as corrective actions are taken.
2.4 Phase 1 Output
The output of the Phase 1 benchmarking exercise is formal documentation detailing the performance review and management audit findings and in particular, the quality, procedural and skills development needs as well as specific corrective action recommendations. The aim is to record the current status and facilitate continuous monitoring of improvements into the future.
Recommended actions will address the priority plant areas for focussed implementation of risk-based inspection and maintenance planning (RBI) software in Phase 2.
A Phase 1 application example for utilities plant audits is given below.
2.5 Example application - utilities plant
The primary management processes to be addressed can be divided into functional areas and plant areas.
Functional areas include: Engineering, Operations, Maintenance, Performance, Training.
Plant areas include: Boiler, Turbine & Generator, Electrical, Auxiliary, Civil Structures.
For each area the process requires the definition of best practice attributes and the associated metrics.
Examples for the engineering function and boiler plant are given in Tables 1 and 2 respectively.
Table 1: Example of plant area attributes and metrics (area: engineering)
Principal objectives | Attributes | Metrics |
To keep plant able to fulfil operational requirements | Comprehensive records of plant design details and drawings and system to track modifications made | Plant information retrieval system; register of plant design modification details and drawings |
| Documented procedure for determining what/why/when plant modifications are required | Plant area performance/needs/remaining life assessment reports |
| Procedures for temporary and permanent weld repairs | Register of repairs (type and location); number of temporary repairs |
| System for reviewing and approving quality of contractors | Approved contractor list |
| System for ensuring quality control on all engineering projects and repairs | Quality control procedure |
| System for routine root cause analysis of trips and significant failures (e.g. boiler tubes etc) | Trip investigation reports and failure investigation reports; ratio of faults to investigation reports |
| Proposed modifications evaluation and review procedure | Cost benefit analysis of plant modifications |
| Procedure to identify best practice for other similar plants | Availability of 'best practice' review documents |
| System for identifying critical spares level and method of optimising spares availability (e.g. shared spares program) | Documented spares management program |
Table 2: Example of programme implementation audit questions
Plant Area | Component | Query | Yes | No | Compliance Factor |
Boiler | Superheater Tubing | Is the inspection policy fully adhered to | ✓ | | 1 |
| | Is the inspection program drawn up by a competent person | ✓ | | 1 |
| | Is the inspection program fully completed | ✓ | | 1 |
| | Are all tube failures recorded | | ✓ | 0 |
| | Are all tube failures subject to root cause analysis | | ✓ | 0 |
| | Are the number of tube failures decreasing | ✓ | | 1 |
| | Are tube repairs/replacements always carried out to documented and approved procedures/engineering standards | ✓ | | 1 |
| | Are temporary repairs used | ✓ | | 1 |
| | Are temporary repairs always subsequently replaced with permanent repairs | ✓ | | 1 |
| | Has a remaining life assessment that considers all feasible failure mechanisms been carried out by a competent person | ✓ | | 1 |
| | Is the remaining life assessment updated by a competent person after each overhaul | ✓ | | 1 |
| | Are historical inspection records available | ✓ | | 1 |
| | Are operating steam temperatures monitored | ✓ | | 1 |
| | Are metal temperatures monitored | ✓ | | 1 |
| | Are significant operational deviations fed back into life assessment process and inspection plans | ✓ | | 1 |
| | Is the damage (e.g. corrosion rate) monitored regularly | ✓ | | 1 |
| | ACTUAL TOTAL | | | 14 |
| | POSSIBLE TOTAL | | | 16 |
| | COMPLIANCE RATIO | | | 0.875 |
The results for each functional or plant area are recorded and used to focus or de-focus efforts in Phase 2.
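The scoring in Table 2 can be reproduced with a few lines of Python. This is a minimal sketch that assumes every query carries an equal weight of one, which is consistent with the possible total of 16 for the 16 queries listed; the function and variable names are illustrative only.

```python
# Compliance factors for the 16 superheater tubing queries in Table 2,
# in table order (1 = compliant, 0 = not compliant).
SUPERHEATER_FACTORS = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

def compliance_ratio(factors):
    """Return (actual total, possible total, ratio) for a list of 0/1 factors."""
    if not factors:
        raise ValueError("at least one audit query is required")
    actual = sum(factors)
    possible = len(factors)  # assumes each query is worth one point
    return actual, possible, actual / possible

actual, possible, ratio = compliance_ratio(SUPERHEATER_FACTORS)
print(f"{actual}/{possible} = {ratio:.3f}")  # 14/16 = 0.875
```

Re-running the same calculation after corrective actions (for example, once tube failures are recorded and analysed) gives the updated score used to track improvement.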
3. AMOS Phase 2 - implementation of risk-based inspection and maintenance planning software
The process of plant asset optimisation is increasingly incorporating risk assessment followed by identification of optimum inspection and maintenance measures to selectively mitigate risks to levels consistent with target maintenance outage plans. This enables focussing of inspection resources on equipment items which have the greatest impact on plant availability and safety if not managed appropriately.
In this phase a detailed equipment item-by-item risk-based inspection and maintenance (RBI) planning exercise is carried out including:
- evaluation of inspection history.
- equipment damage mechanism audit.
- failure probability and consequence analysis.
- run-length and forward inspection/maintenance plan for each equipment item.
The above is facilitated using risk-based inspection and maintenance planning software. The implementation of Phase 2 is illustrated below using TWI's RISKWISE software.
3.1 RISKWISE Software
TWI's RISKWISE software offers a management support tool, which can be used to capture the decision making process and thereby formalise a fully traceable and auditable consistent methodology of inspection and maintenance planning. RISKWISE assesses the failure risk:
$$\mathrm{RISK} = P(t) \times C \qquad (1)$$
$P(t)$ is the failure probability or likelihood.
$C$ is the failure consequence, covering safety as well as business interruption.
$$P(t) = p(t) \times \sum_i PF_i \qquad (2)$$
$p(t)$ is the equipment failure rate or probability of failure for each equipment item, obtained from one or more of the following: generic failure frequency for given equipment (API 581); damage or remaining life models; experience-based rules; trended damage measurements (e.g. corrosion); expert judgement.
$\sum_i PF_i$ is the sum of associated probability factors contributing to the failure probability, e.g. current condition, effectiveness of inspection, severity of operation, operational stability, etc.
$$C = \sum_i CF_i \qquad (3)$$
$\sum_i CF_i$ is the sum of associated consequence factors including: energy release; explosion potential; toxic release; consequential damage; effect on unit availability/production; threat to personnel, community & environment.
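Equations (1) to (3) can be sketched in Python as follows, taking the probability and consequence factors as sums, as the text describes. The factor names and numerical values are hypothetical, chosen only to show the arithmetic; the actual factor scales and weightings used in RISKWISE are not given in this paper.

```python
def failure_probability(base_rate, probability_factors):
    """Equation (2): P(t) = p(t) x sum of the probability factors PF_i."""
    return base_rate * sum(probability_factors.values())

def failure_consequence(consequence_factors):
    """Equation (3): C = sum of the consequence factors CF_i."""
    return sum(consequence_factors.values())

def risk(base_rate, probability_factors, consequence_factors):
    """Equation (1): RISK = P(t) x C."""
    return (failure_probability(base_rate, probability_factors)
            * failure_consequence(consequence_factors))

# Hypothetical inputs for a single equipment item (illustrative only)
example_pf = {
    "current condition": 1.0,
    "effectiveness of inspection": 0.5,
    "severity of operation": 1.5,
}
example_cf = {
    "energy release": 2.0,
    "toxic release": 1.0,
    "effect on production": 3.0,
}
example_risk = risk(0.01, example_pf, example_cf)
print(round(example_risk, 6))
```

With these placeholder values the probability factors sum to 3.0 and the consequence factors to 6.0, so the risk is 0.01 x 3.0 x 6.0.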
RISKWISE outputs failure risk over three time frames. This is used to compute the safe run length through a run-length index (RLI), which is a risk-based indication of the acceptable run period between inspections (Figure 2).
Figure 2: Typical RISKWISE output screen
Based on the estimated remaining life, the RLI accounts for uncertainties in equipment condition and future operation. A schematic illustrating how the RLI relates to such uncertainties is shown in Figure 3.
Figure 3: Schematic depicting run-length index
The RISKWISE implementation process is illustrated in Figure 4.
The software automatically outputs a 'risk fingerprint' for each unit in the form of a risk matrix displaying the risk of failure and computing the associated run length indices across all equipment items. The software uses this to deliver an optimised inspection and maintenance plan for each equipment item to assist in planning shutdown workscopes as well as scheduling future inspections and risk mitigation actions.
The implementation of RISKWISE includes interfacing with plant inspection and maintenance data storage or CMMS systems. This significantly speeds up the implementation process.
RISKWISE comprises software suites covering oil & gas and chemicals process plant, utility boilers and turbines, storage tanks and pipelines.
3.2 Phase 2 output
The output of Phase 2 is a 'living' risk management software tool for optimised planning of inspection and maintenance fully integrated with plant-wide data storage or CMMS systems.
Several implementation examples are summarised below for utility boilers, process units and pipelines.
3.3 Example application: utility boiler
The study was performed on a boiler with the following characteristics:
- 270 MW conventional steam plant.
- Benson type boiler design based on TRD 301 for cyclic operation.
- Commissioned in 1980 & fuelled by natural gas.
- Superheater outlet rated design 555°C at 200 bar.
- Started cyclic operation in 2005.
The primary objectives were to extend boiler inspection interval to 30 months and improve unit availability.
RISKWISE for Boilers was implemented using both manual and automated functions to output risk factors and run-length indices (RLI) for all boiler equipment items. Pre-outage risk analysis and RLI results are shown in Figures 5 & 6.
Figure 5: risk matrix result prior to outage inspection
Figure 6: pre-outage RLI results
Risk mitigation actions were planned on the 14 items having RLI less than 30 months.
An outage inspection including the planned risk reduction actions was carried out. The software was then re-run post-outage to output risk and RLI results based on all outage inspection findings.
Risk mitigation was achieved by increased inspection coverage on high risk items. Fig.7 shows that only two items remained with an RLI less than 30 months. These were related to economiser vibration and evaporator attachments. Control and modification actions respectively were planned to mitigate these at a subsequent outage, extending the overall outage interval to 30 months, which was subsequently accepted by the regulator.
Figure 7: results after outage risk mitigation
The benefits of the study centred on: an extended inspection interval, saving one outage every 30 months; an immediate inspection cost saving by eliminating 30% of inspection activities within the next outage through exempting components with a safe inspection interval >48 months; and a reduced unplanned outage rate through risk mitigation actions on high risk components (eliminating 5% availability loss due to forced outages).
3.4 Example application: Hydrotreater unit
The example study comprised a pilot application of RISKWISE on selected equipment items in a Naphtha Hydrotreater Unit within an oil refinery. The unit was commissioned in the early 1980s. The current inspection periodicity for pressure equipment was generally 48 months. Piping however, was on a 24-month inspection cycle and storage tanks were inspected every 84 months.
The objectives were to:
- Demonstrate the key steps in the RBI process,
- Suggest ways of optimising inspection plans, and
- Identify ways of reducing the risk of failure.
Table 3: Scope of study
Item type | Total | Tank farm | Oxygen stripper | Reactor and heater | Hot/cold separator |
Accumulator | 1 | | | | 1 |
Column | 1 | | 1 | | |
Drum | 1 | | | | 1 |
Reactor | 1 | | | 1 | |
Heater | 1 | | | 1 | |
Piping | 10 | | 2 | 5 | 3 |
S&T exchanger | 18 | | | 16 | 2 |
Tanks | 2 | 2 | | | |
Grand Total | 35 | 2 | 3 | 23 | 7 |
The results are summarised in Fig.8 & 9. Figure 8 shows the results of the risk audit.
Figure 8: risk summary matrix for the selected equipment items
Figure 9: distribution in run-length index (RLI)
The RLI was automatically output for each damage mechanism (DM) considered active within each item of equipment. The DMs are based on API 571 (Ref[6]) incorporated within the RISKWISE process plant software. The distribution of RLI values is shown in Fig.9.
From Fig.9 it can be seen that the computed RLIs vary from zero to 480 months. In view of the currently adopted inspection periods given above and subject to process controlled shutdowns and statutory requirements, there was clearly scope for optimisation.
Details of selected equipment items, their damage mechanisms (DM), the risk audit results, revised inspection plans, and the risk mitigation recommendations are summarised below.
VAPOUR CONDENSER (BUNDLE): Carbon steel; 38°C; DMs: oxygen pitting, water side; RLI = 0 mths (IP = 48 mths); Risk Class: 5A. Resulting Focus/Defocus proposal: Replace bundle at earliest opportunity (or install air cooler); increased RLI = 48 mths (new Risk Class: 2A).
CHARGE HEATER TUBES: 9Cr 1Mo steel; 357°C; DMs: sulphidation, creep, vanadate attack; RLI = 480 mths (IP = 48 mths); Risk Class: 1E. Resulting Focus/Defocus proposal: Relax to visual inspection only at next planned shutdown; unchanged RLI = 480 mths (unchanged Risk Class: 1E).
FEED TANK: Carbon steel; 25°C; DMs: general and pitting corrosion; RLI = 84 mths (IP = 84 mths); Risk Class: 2B. Resulting Focus/Defocus proposal: Internally coat tank floor with epoxy resin at next opportunity; increased RLI = 168 mths (new Risk Class: 1B).
REACTOR: P11 & 12Cr steel; 380°C; DMs: sulphidation, creep cracking, hydrogen attack, H+ embrittlement; RLI = 90 mths; Risk Class: 2E. Resulting Focus/Defocus proposal: Increase inspection interval for creep embrittlement cracking (only) to 96 mths; unchanged RLI = 90 mths (unchanged Risk Class: 2E).
HOT SEPARATOR: Carbon steel; 101°C; DMs: general/pitting corrosion, HIC, stress corrosion cracking; RLI = 90 mths (IP = 48 mths); Risk Class: 2E. Resulting Focus/Defocus proposal: Defer next internal inspection by 48 mths; external UT (only) at normal inspection interval (48 mths); unchanged RLI = 90 mths (unchanged Risk Class: 2E).
FEED LINE: Carbon steel; 24°C; DMs: general/pitting corrosion; RLI = 240 mths (IP = 24 mths); Risk Class: 1D. Resulting Focus/Defocus proposal: Increase external UT interval to 48 mths; unchanged RLI = 240 mths (unchanged Risk Class: 1D).
COMBINED FEED EXCHANGER (BUNDLE): Carbon steel; 170°C; DMs: general/pitting corrosion; RLI = 48 mths (IP = 48 mths); Risk Class: 2E. Resulting Focus/Defocus proposal: Replace bundle in, e.g. Type 321 SS; increased RLI = 180 mths (unchanged Risk Class: 2E).
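One simple way to read these results is to compare each item's RLI with its current inspection period (IP). The sketch below does this for the items listed above that quote both values (the reactor is omitted because no IP is stated for it). The three-way screening rule and its labels are assumptions made for illustration, not the logic applied by RISKWISE, and the actual proposals above also reflect engineering judgement on each damage mechanism.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    rli_months: int  # pre-mitigation run-length index from the study
    ip_months: int   # current inspection period from the study

ITEMS = [
    Item("Vapour condenser (bundle)", 0, 48),
    Item("Charge heater tubes", 480, 48),
    Item("Feed tank", 84, 84),
    Item("Hot separator", 90, 48),
    Item("Feed line", 240, 24),
    Item("Combined feed exchanger (bundle)", 48, 48),
]

def screen(item):
    """Assumed rule: focus where RLI < IP, consider defocus where RLI > IP."""
    if item.rli_months < item.ip_months:
        return "focus"
    if item.rli_months > item.ip_months:
        return "defocus candidate"
    return "keep current period"

for item in ITEMS:
    print(f"{item.name}: RLI {item.rli_months}, IP {item.ip_months} -> {screen(item)}")
```

Under this rule only the vapour condenser bundle is flagged for focus, which matches the immediate replacement recommended for it, while the charge heater tubes, hot separator and feed line appear as candidates for relaxed inspection.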
The major findings and benefits of the study were as follows:
- There was scope for reduction in shutdown inspection for approximately 70% of equipment;
- Incremental run-length extension was feasible after selected equipment modifications.
The study highlighted areas where imminent risk mitigation was needed (e.g. replacement of vapour condenser bundles).
3.5 Example application: Oil & gas pipelines
The approach involves the phased implementation of a risk management programme to provide decision support to the client on planning inspection, repair or replacement of pipeline elements of an ageing pipeline network. The work uses 'RISKWISE for Pipelines' for risk assessment and ranking of pipeline network elements based on information available. This is followed by a focussed inspection programme, the results of which are fed back into RISKWISE for re-assessment of risk/remaining life distributions and risk mitigation actions. Depending on the condition data available, high risk areas are then subject to a quantitative probabilistic evaluation program (TWI's LIFEWISE program) which will allow future maintenance and replacement strategies to be refined.
The assessment is carried out to the requirements of ASME B31.8S (Ref[7]) which covers the risk assessment, inspection and integrity assessment of gas pipelines. NACE RP 0502 (Ref[8]) will be applied in areas not covered by ASME.
Example information required includes:
- As-built design data: pipeline data sheets, drawings, geographical layout (GIS), etc.
- Pipeline operating conditions: flow rates, global pressures and temperatures
- Historical inspection data for in-service external inspection if any
- Results of intelligent pigging if used
- Analysis of scales and debris
- Corrosion monitoring summaries if any
- Details of cathodic protection
- Results of external UT scanning and internal cable operated UT
- Failure/leakage and repair history
The information is used to populate the initial RISKWISE data input followed by segmenting the pipelines and characterising the following failure likelihood and failure consequence factors for each segment:
- Likelihood factors:
  - Current condition: prior internal (pigging) and external inspection data, age of lines, leak history
  - Failure likelihood due to internal and external corrosion: pressure, flow conditions, coatings, coating condition, geographical location, ground conditions, soil type, depth, cathodic protection, climatic conditions, line supports
  - Effectiveness of inspection: comprehensiveness of pigging, external inspection, nature of corrosion damage
  - Third party damage potential
  - Potential for ground movement
  - Operation in relation to design limits
  - Recurring repair issue
- Consequence factors:
  - Failure mode
  - Extent of release
  - Pressure factor
  - Fire and explosion damage potential
  - Location of line
  - Effect of failure on distribution, time to rectify a leak
  - Cost of repair
  - Threat to personnel and environment
  - Adequacy of safety systems
The output of the risk analysis provides a risk-focussed inspection plan.
The inspection programme should include intelligent pigging (where possible). Direct assessment methods, including excavation, visual examination (e.g. of coating condition) and UT thickness measurements, will also be performed in selected high criticality areas.
The overall inspection workscope includes:
- identifying locations for cathodic protection, bell-hole excavation, digital photographs
- soil resistivity assessment
- above-ground pipeline inspection
- effectiveness assessment of existing cathodic protection using 'close interval potential survey' (CIPS). Direct current voltage gradient (DCVG) will also be used to provide an assessment of coating condition
Results of the inspection are fed back to RISKWISE to re-assess and refine the initial risk assessment and run length index (RLI) evaluations and provide initial output of forward inspection and maintenance plans.
High risk/low RLI areas are subject to further refinement by means of a quantitative probabilistic analysis using TWI's LIFEWISE program, which is based on the ASME SRRA (structural reliability and risk assessment) code. The LIFEWISE program incorporates a generic remaining life rule which accepts uncertainty distributions in damage/thinning status, materials data/corrosion rates and operating conditions. The distributions are integrated in a probabilistic analysis to compute failure probability against forward time. The probability can be combined with the failure consequence cost to give risk versus time in monetary value terms. This enables pipeline maintenance or replacement schedules to be optimised in terms of 'net present value' considerations.
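The general idea of such a probabilistic analysis can be illustrated with a small Monte Carlo sketch. It is not the LIFEWISE algorithm or the ASME SRRA procedure: it assumes a simple linear wall-thinning rule (remaining life = thickness margin divided by corrosion rate), and the distribution types, their parameters, the minimum wall thickness and the consequence cost are all hypothetical values chosen for illustration.

```python
import random

def failure_probability_vs_time(years, n_samples=20000, seed=0):
    """Estimate the probability that remaining life is exhausted by each year."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    min_wall_mm = 6.0          # hypothetical minimum allowable wall thickness
    failures = [0] * len(years)
    for _ in range(n_samples):
        # Hypothetical uncertainty in current wall thickness (mm)
        wall_mm = rng.gauss(10.0, 0.5)
        # Hypothetical uncertainty in corrosion rate (mm/year), always positive
        rate_mm_per_year = rng.lognormvariate(-1.2, 0.5)
        # Assumed linear thinning rule for remaining life (years)
        remaining_life = (wall_mm - min_wall_mm) / rate_mm_per_year
        for i, t in enumerate(years):
            if remaining_life <= t:
                failures[i] += 1
    return [count / n_samples for count in failures]

YEARS = [5, 10, 15, 20]
pof = failure_probability_vs_time(YEARS)

CONSEQUENCE_COST = 2.0e6  # hypothetical failure consequence cost
risk_cost = [p * CONSEQUENCE_COST for p in pof]

for t, p, r in zip(YEARS, pof, risk_cost):
    print(f"year {t:2d}: PoF = {p:.3f}, risk = {r:,.0f}")
```

Narrowing the thickness distribution, as a new inspection would, reduces the spread in remaining life, which is one way to see why inspection data can extend the assessed lifetime relative to corrosion modelling alone.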
Elements of the above methodology are illustrated in the example study below.
The study was aimed at establishing the inspection/maintenance focus and the remaining life of oil production, water injection and flare lines in two oil fields: Field A: 180 lines (600 km); Field B: 101 lines (350 km). The main damage mechanisms were CO2, H2S, O2 and microbial corrosion. Results of the initial assessment are shown in Fig.10.
Figure 10: Pipeline network initial assessment results
A quantitative probabilistic assessment was performed on some lines. An example of the output is shown in Fig.11. This illustrates the extension in assessed lifetime obtainable by performing inspection compared with using corrosion modelling only.
Figure 11: Results of probabilistic life analysis
The study enabled a pipeline inspection and replacement scheduling plan to be formalised.
Benefits included: deferred capital spend, extended inspection intervals, minimised risk of business interruption and minimised liability uncertainties.
4. Concluding remarks
This paper has set out a common process for establishing an optimised O&M regime for various plant types. The AMOS approach divides the process into two distinct phases separating the benchmarking and management processes from the equipment level failure risk assessment and maintenance planning. In this way, the first phase allows prioritisation or focus for implementation of the second phase. This step-wise approach is considered to be a cost-efficient way of achieving operational excellence.
5. References
[1] API Recommended Practice 580, Risk-Based Inspection, American Petroleum Institute, 2002.
[2] API RP 581, Risk-Based Inspection Technology, 2nd Edition, American Petroleum Institute, 2008.
[3] North American Electric Reliability Council - Generation Availability Data System (NERC-GADS), 1995.
[4] UK Health & Safety Executive Document CRR 363/2001, Best practice for risk based inspection as a part of plant integrity management, 2005.
[5] Cane, B.J., RISKWISE - A user oriented risk-based inspection & maintenance planning tool, 2nd conference on corrosion in the oil industry, Tehran, 2003.
[6] API RP 571, Damage Mechanisms Affecting Fixed Equipment in the Refining Industry, American Petroleum Institute, 2003.
[7] ANSI/ASME B31.8S - 2004, Managing System Integrity of Gas Pipelines, American Society of Mechanical Engineers, 2005.
[8] NACE RP0502 - 2002, Standard Recommended Practice - Pipeline External Corrosion Direct Assessment Methodology, 2002.