Using Predictive Analytics to Calibrate FMEDA
Why FMEDA gives the best failure rate results
Dr. William M. Goble, ex, Sellersville, PA USA
March 8, 2012
Copyright ex 2009
IEC 61508 Fundamental Concepts
The IEC 61508 safety life cycle is a detailed engineering process that addresses two classes of failure:
- Systematic faults (design mistakes), addressed by software reliability techniques
- Random hardware failures, addressed by probabilistic, performance-based system design (hardware reliability)
SIF Verification Steps
Inputs: Safety Requirements Specification (SIF functional description, target SIL, RRF, mitigated hazards, process parameters, logic, bypass/maintenance requirements, response time, proof test targets, etc.), plus the manufacturer's safety manual, application standards, the manufacturer's failure data, and a failure data database.
8. SIF Conceptual Design: Select Technology (IEC 61511 Clause 11)
9. SIF Conceptual Design: Select Architecture (IEC 61511 Clause 11)
10. SIF Conceptual Design: Determine Test Plan (IEC 61511 Clause 11)
11. SIF Conceptual Design: Reliability/Safety Calculation (IEC 61511 Clause 11); if the target RRF/SIL is not achieved, return to step 8
12. Detailed Design (IEC 61511 Clauses 11 and 12)
Outputs: Equipment Justification Report; H/W & S/W Design Safety Requirements (technology chosen, voting logic, proof test requirements, automatic diagnostic logic, bypass logic, repair time requirements, SIL achieved, etc.); Detailed Design Documentation (loop diagrams, wiring diagrams, logic diagrams, panel layout, PLC programming, PLC program testing, FAT test plan, installation requirements, commissioning requirements, proof test plans, etc.)
Detailed Safety Lifecycle drawings Copyright ex 2008, used with permission.
Field Failure Studies: Pressure Transmitter

Source | Product Type | Failure Rate per hour | Comment
[NRPD95] | Pressure Transducer | 8.31E-06 | Highest number
Refinery Data [Shel00] | Analog Pressure Transducer | 2.71E-06 |
Refinery Data [Shel00] | Microprocessor-Based Pressure Transmitter | 7.19E-06 |
DOW Plant Study [Skwe08] | Pressure Transmitter | 4.96E-07 |
Manufacturer Study [Moor98] | Microprocessor-Based Pressure Transmitter | 3.57E-07 | Lowest number

When researching failure data, the results from various studies vary by an order of magnitude! This has caused many to conclude that the whole probabilistic approach to design cannot work.
Field Failure Studies

Source | Product Type | Failure Rate per hour | Comment
[CCPS89] | Diff. Pressure Transmitter, low | 1.01E-06 |
[CCPS89] | Diff. Pressure Transmitter, mean | 6.56E-05 |
[CCPS89] | Diff. Pressure Transmitter, high | 2.54E-04 | Highest number
[NRPD95] | Pressure Transducer | 8.31E-06 |
Refinery Data [Shel00] | Analog Pressure Transducer | 2.71E-06 |
Refinery Data [Shel00] | Microprocessor-Based Pressure Transmitter | 7.19E-06 |
DOW Plant Study [Skwe08] | Pressure Transmitter | 4.96E-07 |
Manufacturer Study [Moor98] | Microprocessor-Based Pressure Transmitter | 3.57E-07 | Lowest number

In a letter to the editor of IEEE Spectrum magazine, August 4, 2000, Dr. Patrick D. T. O'Connor, a well-known reliability engineer and author, wrote: "IEC 61508 introduces a threat greater than inefficiency and cost, since it relates to safety.... In particular, it requires the quantification of risk probabilities..."
Field Failure Studies: Solenoid Valve

Source | Product Type | Failure Rate per hour | Comment
[NRPD95] | Valve, Pneumatic Solenoid | 1.67E-05 | Highest number
Refinery Data [Shel00] | Solenoid Valve | Not available |
DOW Plant Study [Skwe08] | Solenoid Manufacturer's Valve | 7.02E-07 |
Manufacturer Study [AEAT05] | Solenoid Valve | 1.70E-08 | Lowest number; warranty data

Which number is "correct"? Why the different results?
Field Failure Studies
Owner/Operator Field Failure Studies
- Variations in the amount of data collected
- Different definitions of FAILURE (are wear-out failures included?)
- Categorizing and merging technologies
- Fault isolation often not complete
Manufacturer Field Return Data Studies
- Data analysis methods vary widely
- Rarely is it known what percentage of actual failures are returned to the manufacturer
- Different definitions of FAILURE ("no problem found" scenario)
Owner/Operator Field Failure Studies
- Variations in the amount of data collected
- Different definitions of FAILURE (are wear-out failures included?)
- Categorizing and merging technologies
- Lack of fault isolation
After performing dozens of studies, ex recognized that differences in the data collection process cause results to vary by an order of magnitude or more! The process must be understood in order to analyze the data.
Owner/Operator Field Failure Studies
The process must be understood in order to analyze the data. Questions:
- When is a failure report written?
- What is the definition of failure?
- Are proof test results included?
- Are "as found" conditions recorded for a proof test?
- What were the operating conditions?
Owner/Operator Field Failure Studies
An extensive set of test results from a test shop indicated that a manufacturer had an exceptionally low failure rate when compared to the ex Predictive Analytics benchmark. An onsite visit to the test shop showed that each instrument returned for testing was "cleaned up" before being tested. Cleaning included disassembly and replacement of seals and o-rings. It is surprising that any units failed the test after that refurbishment activity.
The test, repair, and data collection process must be understood to analyze data. As Found conditions must be recorded.
Owner/Operator Field Failure Studies
Data from an owner/operator repair shop was entered into a maintenance management system (MMS), where equipment failure rates were automatically analyzed. The MMS report showed failure rates much lower than expected. An instrument engineer visited the shop and discovered that failure reports were entered into the MMS only when a device was sent out for repair, because that was how the repair purchase order was generated. Units repaired internally were never recorded, as no purchase order was needed. The data in the MMS was very optimistic. The proof test interval was not extended.
The test, repair, and data collection process must be understood to analyze data. As Found conditions for all failures and tests must be recorded.
Owner/Operator Field Failure Studies
A detailed study of a valve showed a low failure rate when compared with the ex Predictive Analytics benchmark, and one conclusion was that this valve would be an excellent choice for safety applications. But the valve was designed and manufactured for control applications. Although not stated in the analysis report, the data may have been gathered under the dynamic operating conditions of a control valve. The failure rates for the same product in a static application would be much higher, as many additional failure modes exist in static applications.
Application conditions must be recorded and listed in any analysis report.
Proof Test Data Collection
The repair shop is not the only source of failure data. Proof test records may be the only good source of "as found" data on fail-dangerous failures. Proof test records must be carefully analyzed with an understanding of the application and the proof test procedures.
What is the PROOF TEST COVERAGE FACTOR?
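To see why the proof test coverage factor matters, a rough sketch can help. The following is an illustrative calculation (not from the slides): a simplified 1oo1 PFDavg approximation in which an imperfect proof test leaves a fraction of dangerous undetected failures untested until end of mission. All numeric values are invented for illustration.

```python
# Illustrative sketch: simplified 1oo1 PFDavg with an imperfect
# proof test. Assumed approximation: failures the test can reveal
# accumulate over the test interval; the untested fraction
# accumulates over the whole mission time. Repair times and common
# cause are ignored for simplicity.

def pfd_avg(lambda_du, coverage, test_interval_h, mission_time_h):
    """Approximate average probability of failure on demand."""
    tested = lambda_du * coverage * test_interval_h / 2
    never_tested = lambda_du * (1 - coverage) * mission_time_h / 2
    return tested + never_tested

lam_du = 2.0e-7          # dangerous undetected failures per hour (hypothetical)
ti = 8760.0              # annual proof test
lt = 87600.0             # 10-year mission time

print(f"PFDavg, 100% coverage: {pfd_avg(lam_du, 1.0, ti, lt):.2e}")
print(f"PFDavg,  70% coverage: {pfd_avg(lam_du, 0.7, ti, lt):.2e}")
```

Even with the same device and the same test interval, dropping the coverage from 100% to 70% raises the computed PFDavg severalfold, which is why the coverage factor must be known before proof test records can be interpreted.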
Manufacturer Field Return Studies
Manufacturer Field Return Data Studies
- Calculation methods vary widely: how many operational hours?
- It cannot be known what percentage of actual failures are returned
- Different definitions of FAILURE ("no problem found" scenario)
Many manufacturers classify a returned item as a failure only if a manufacturing defect is found; many returned items are marked "no problem found." I have seen calculations where operational hours were estimated from shipping records and it was assumed that all failures were returned. However, root cause analysis is typically done, which provides valuable information.
Manufacturer Field Return Studies
Manufacturer Field Return Data Studies
- Calculation methods vary widely
- It cannot be known what percentage of actual failures are returned
- Different definitions of FAILURE ("no problem found" scenario)
Use the warranty period only: count failures during the warranty period, and count operational hours only during the warranty period. Assume a percentage of units returned (10%, 50%, 70%, depending on the cost of the device) to obtain a lower-bound lambda. Some products have a built-in power-on hours counter; for those products, use the TTF number given by the product to calculate an upper bound on the failure rate, the upper-bound lambda.
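The warranty-period approach above can be sketched in a few lines. This is an illustrative calculation only: the numbers and the assumed return fraction are hypothetical, not data from any study cited in the slides.

```python
# Illustrative sketch: bounding a failure rate from warranty-return
# data. Counting only returned failures and assuming ALL failures
# were returned gives an optimistic (lower-bound) lambda; dividing
# by an assumed return fraction gives a rough corrected estimate.

def lambda_bounds(returned_failures, units_in_warranty,
                  warranty_hours, assumed_return_fraction):
    """Return (lower-bound lambda, corrected estimate) per hour."""
    total_hours = units_in_warranty * warranty_hours
    lower = returned_failures / total_hours
    corrected = lower / assumed_return_fraction
    return lower, corrected

lower, corrected = lambda_bounds(
    returned_failures=12,         # hypothetical warranty returns
    units_in_warranty=5000,
    warranty_hours=8760,          # one-year warranty
    assumed_return_fraction=0.5)  # e.g. 50% for a mid-cost device
print(f"lower-bound lambda: {lower:.2e} /h")
print(f"corrected estimate: {corrected:.2e} /h")
```

The spread between the two numbers makes the point of this slide directly: the assumed return fraction alone changes the answer by a factor of two or more.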
Failure Modes, Effects, & Diagnostics Analysis (FMEDA) Concept
CONCEPT: Break the problem into smaller pieces. Perform a detailed study of each component and how each component failure will affect the instrument.
Failure Modes, Effects and Diagnostics Analysis (FMEDA)
- An extension of FMEA
- Technique developed in the early 1990s by ex engineers
- Adds a quantitative failure rate column
- Adds diagnostic and diagnostic coverage (DC) columns
- Documents how each component failure will impact operation of the system, in terms of system failure mode
- Results in a failure rate for each failure mode of the system
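The column-by-column bookkeeping described above can be sketched as a small tabulation. This is a minimal illustrative example, not an actual FMEDA: the component list, failure mode splits, and coverage values are invented.

```python
# Minimal FMEDA-style tabulation (illustrative; all values invented).
# Each row: component failure rate, fraction of failures that are
# dangerous at the product level, and diagnostic coverage for the
# dangerous and safe portions. Summing rows yields the product-level
# failure rate for each failure mode category.

components = [
    # (name, lambda /h, dangerous fraction, DC dangerous, DC safe)
    ("resistor",  1.0e-9, 0.2, 0.0, 0.0),
    ("capacitor", 2.0e-9, 0.5, 0.6, 0.6),
    ("cpu",       8.0e-8, 0.5, 0.9, 0.9),
]

lam_sd = lam_su = lam_dd = lam_du = 0.0
for name, lam, frac_d, dc_d, dc_s in components:
    lam_d = lam * frac_d            # dangerous portion of this row
    lam_s = lam * (1 - frac_d)      # safe portion of this row
    lam_dd += lam_d * dc_d          # dangerous detected
    lam_du += lam_d * (1 - dc_d)    # dangerous undetected
    lam_sd += lam_s * dc_s          # safe detected
    lam_su += lam_s * (1 - dc_s)    # safe undetected

print(f"lambda_DD = {lam_dd:.2e}, lambda_DU = {lam_du:.2e}")
print(f"lambda_SD = {lam_sd:.2e}, lambda_SU = {lam_su:.2e}")
```

The four category sums always add back up to the total component failure rate, which is a useful sanity check when building a real FMEDA sheet.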
FMEDA
Flow: component database (component λs, failure mode distributions) → FMEDA → product λ, product failure modes, diagnostic coverage.
Using a component database, failure rates and failure modes for a product (transmitter, I/O module, solenoid, actuator, valve) can be determined far more accurately than with only a manufacturer's field failure data.
Fault Injection Testing
Simulate component failures and test that the diagnostics perform as expected. Fault injection test suites are determined from the FMEDA to exercise each diagnostic and a sample of other failures.
FMEDA Biggest Negative
Flow: component database (component λs, failure mode distributions) → FMEDA → product λ, product failure modes, diagnostic coverage.
The accuracy of the FMEDA depends on the accuracy of the component database. It must include failure rates and failure mode distributions for each component as a function of operating profile. The useful life of each component should also be listed as a function of operating profile.
Therefore the component database must be based on and calibrated by FIELD FAILURE DATA.
Calibration loop: detailed design information (components used, stress factors, application environment) and field failure data both feed the analysis. Field failure data calibrates the component failure database, which is used in the FMEDA analysis. Product-level FMEDA results (failure rates, failure modes, useful life) are compared against the field data; root cause analysis explains the differences, and the database is adjusted as needed.
FMEDA Based Failure Model
Whenever an FMEDA is done, the results are compared with field failure data. The comparison often reveals new component failure modes and new failure rates, due primarily to the operational profile.
Flow: field failure data (product λ) is compared with the FMEDA product λ derived from the electrical/mechanical component database and industry databases. If there is a significant difference, the component database is updated; otherwise the analysis is finished.
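The "significant difference?" decision in this loop needs some statistical test. The slides do not specify one, so the following is my own simple sketch: treat observed field failures as a Poisson count and flag the FMEDA prediction when the field count lands far in either tail. All inputs are hypothetical.

```python
# Illustrative consistency check for the compare/update loop.
# Assumption (mine, not from the slides): failures in T unit-hours
# follow a Poisson distribution with mean lambda_fmeda * T, and a
# two-sided tail probability below alpha flags a significant gap.

import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), via the pmf recurrence."""
    term = total = math.exp(-mu)
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def significant_difference(lambda_fmeda, failures, unit_hours, alpha=0.05):
    mu = lambda_fmeda * unit_hours          # expected failure count
    lower_tail = poisson_cdf(failures, mu)
    upper_tail = 1 - poisson_cdf(failures - 1, mu) if failures else 1.0
    return 2 * min(lower_tail, upper_tail) < alpha

# Hypothetical case: FMEDA predicts 5e-7 /h, but the field data set
# shows 40 failures in 2e7 unit-hours (four times the expected 10).
print(significant_difference(5e-7, 40, 2e7))
```

A True result would send the analyst back to root cause analysis and, if the difference is explained, to a component database update; a False result ends the loop.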
Ten Billion Unit Operating Hours
After over one hundred field failure studies of both end user and manufacturer data sets, representing more than ten billion unit operating hours, ex has:
- Updated the component failure database to constantly improve the model
- Identified and updated the model whenever differences between the model and the field results could be explained
FMEDA Based Failure Model
A predictive failure rate / failure mode model for some components can be constructed from a hierarchical set of FMEDAs; the component database is the repository of the data. A component-level FMEDA feeds the electrical/mechanical component database, which, together with the environment, feeds the product-level FMEDA that yields the product λ, product failure modes, and diagnostic coverage.
FMEDA Based Failure Model
Examples of component-level FMEDAs include relays and microprocessors. Example: a microprocessor designed for functional safety:
- Duplicate memory with comparison
- Dual CPU with diverse substrate
- Automatic self-diagnostics on registers, instruction decoder, cache, I/O, etc.
- λDU < 2E-9 failures per hour
Comparing FMEDAs
Some FMEA results based on manufacturer warranty return data analysis have produced very low failure rate numbers.

Source | Product Type | λD per hour | Comment
ex FMEDA [exid09b] | Ball Valve, close to trip, full stroke | 4.83E-07 |
Manufacturer's FMEDA [exid09b] | Ball Valve, close to trip, tight shutoff | 1.35E-06 | Highest number
FMEA / Manufacturer Warranty Data [TUVRh10] | Ball Valve | 6.55E-08 | Lowest number
ex FMEDA [exid09a] | Actuator, rack and pinion, spring return | 4.29E-07 |
Manufacturer's FMEDA [exid09a] | Actuator, rack and pinion, double acting | 6.84E-07 | Highest number
FMEA / Manufacturer Warranty Data [TUVRh11] | Actuator, rack and pinion | 8.54E-09 | Lowest number
Comparing FMEDA and Field Failure Results

Source | Product Type | Total Failure Rate per hour | DU Failure Rate | Comment
Refinery Data [Shel00] | Analog Pressure Transducer | 2.71E-06 | |
Refinery Data [Shel00] | Microprocessor-Based Pressure Transmitter | 7.19E-06 | |
DOW Plant Study [Skwe08] | Pressure Transmitter | 4.96E-07 | |
OLF-070 | Pressure Transmitter | 3.00E-07 | |
FMEDA Analog 1151 | Analog Pressure Transducer | 3.53E-07 | 1.20E-07 | No remote seals
FMEDA Analog 1152 | Analog Pressure Transducer | 8.13E-07 | 2.87E-07 | No remote seals
FMEDA Microprocessor 1151 | Microprocessor-Based Pressure Transmitter | 5.64E-07 | 1.15E-07 | No remote seals
FMEDA Microprocessor 3051 | Microprocessor-Based Pressure Transmitter | 5.43E-07 | 9.80E-08 | No remote seals
FMEDA Safety 3051 | 61508 Certified Pressure Transmitter | 5.36E-07 | 3.70E-08 | No remote seals
FMEDA Safety EJX | 61508 Certified Pressure Transmitter | 5.01E-07 | 2.80E-08 | No remote seals
FMEDA 3051 w/ Remote Seal | Microprocessor-Based Pressure Transmitter | 7.04E-07 | 1.81E-07 | Includes remote seal

Field failure data ranged from 4.96E-07 to 7.19E-06 (were impulse lines and remote seals included? no information available). FMEDA data ranged from 3.53E-07 to 8.13E-07.
Comparing FMEDA and Field Failure Results
NAMUR Study, 2009: Dupont, Daniel, "Merging Bottom-up and Top-down Availability for Realistic Analysis of Safety-related Loops," ISBN 978-3-8322-9592-9, Shaker Verlag GmbH, Aachen, 2010.
[Bar chart: total failure rates, FMEDA vs. NAMUR field study, for sensors, logic solvers, and final elements]
FMEDA Results Do Not Include
- Maintenance-induced failures: these are site specific, not product specific. However, as they are real, the exsilentia tool has a Maintenance Capability parameter that adjusts the probability of successful repair, the probability of failures, etc.
- Systematic failures (failures due to design or procedure): systematic failures should be strongly reduced by careful root cause analysis of failures with corrective action. The Maintenance Capability parameter of exsilentia also adjusts for site-specific problems.
- Wear-out failures: instead, a useful life parameter is published.
FMEDA Results
Environment-specific operational parameters are specified:
- Control Room Cabinet
- Field Equipment, 2-wire
- Field Equipment, 4-wire
- Automotive
- Subsea
Predictive Analytic Analysis: USE DESIGN KNOWLEDGE
By developing a detailed understanding of dozens of different designs for a particular product, the ranges of expected failure rates, the ratios of failure rates, and the expected product-level failure modes become known. The results of FMEDA 1 through FMEDA n (failure rates, failure modes, useful life) are combined into a Predictive Analytic Model: a failure rate range, failure modes, and a useful life range. A field failure data set is then analyzed by statistical signature analysis, compared to this benchmark, and the sources of any differences are found.
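The benchmark-building step described above can be sketched as follows. This is an illustrative example only: the FMEDA rate values, the factor-of-three margin, and the flagging rule are invented to show the idea, not taken from the ex methodology.

```python
# Sketch: build a predictive-analytic benchmark from the dangerous
# undetected (DU) rates of n FMEDAs for one product class, then use
# it to flag field studies that fall well outside the expected range
# (a cue to examine that study's data collection process).
from statistics import median

# Hypothetical DU rates from six FMEDAs of similar products
fmeda_du_rates = [1.2e-7, 9.8e-8, 2.9e-7, 3.7e-8, 2.8e-8, 1.8e-7]

benchmark = {
    "lambda_du_low": min(fmeda_du_rates),
    "lambda_du_high": max(fmeda_du_rates),
    "lambda_du_typical": median(fmeda_du_rates),
}

def flag_field_result(field_lambda_du, margin=3.0):
    """True when a field-study rate lies outside the benchmark range
    widened by an assumed tolerance factor."""
    low = benchmark["lambda_du_low"] / margin
    high = benchmark["lambda_du_high"] * margin
    return not (low <= field_lambda_du <= high)

print(flag_field_result(8.5e-9))   # suspiciously low vs. benchmark
print(flag_field_result(1.5e-7))   # within the benchmark range
```

A flagged result does not mean the field study is wrong; it means the collection process (failure definitions, refurbishment before test, missing internal repairs) should be examined, as the earlier slides illustrate.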
Predictive Analytic Analysis: EVALUATE FIELD DATA QUALITY
By comparing field data analysis results with the Predictive Analytic Benchmark, data quality can be judged and issues in the data collection process can be investigated.
Future
End user companies are beginning to see value in keeping good records of failure and proof test events. If a good product database is combined with a good data event capture tool, future field failure data will improve. This will be used to improve the accuracy of the Predictive Analytic Models and the FMEDA results. We do have work to do.
Questions? Comments?
More Information:
1. Free Wednesday Web Seminars: see www.ex.com
2. White Papers
3. Safety Automation Equipment List: www.saelonline.com
4. Books: www.isa.org, www.ex.com
Field Data Collection Tool: SILStat
- Failure database creation and maintenance
- Imports data from other exsilentia applications, including the SILver calculation tool
- Event recorder available for iPad/iPad2, Windows laptop, and Windows desktop
- Built-in failure taxonomies; pick lists simplify selection
- Data analysis service using Predictive Analytics
- Data output reports compatible with PERD