Measurement error is unavoidable. There will always be some measurement variation that is due to the measurement system itself.
The most problematic measurement system issues arise when attribute data is measured in terms that rely on human judgment, such as good/bad or pass/fail. This is because it is very difficult for all testers to apply the same operational definition of what is “good” and what is “bad.”
However, such measurement systems exist throughout industries. One example is quality control inspectors using a high-powered microscope to determine whether a pair of contact lenses is defect-free. Hence, it is important to quantify how well such measurement systems are working.
Understanding Attribute Gage R&R
The tool used for this kind of analysis is called attribute gage R&R. The “R&R” stands for repeatability and reproducibility. Repeatability means that the same operator, measuring the same thing with the same gage, should get the same reading every time. Reproducibility means that different operators, measuring the same thing with the same gage, should get the same reading every time.
Attribute gage R&R reveals two key figures: the percentage of repeatability and the percentage of reproducibility. Ideally, both percentages should be 100 percent. As a rule of thumb, anything above 90 percent is generally adequate, although this depends on the application.
Although gage R&R is based on a two-proportion test, these percentages can be obtained with simple arithmetic; there is no need for sophisticated software. Nevertheless, statistics packages such as Minitab and SigmaXL include an Attribute Agreement Analysis module that does the same and much more, which makes analysts’ lives easier.
However, it is important for analysts to understand what the statistical software is doing to make good sense of the report. In this article, the steps are reproduced using spreadsheet software with a case study as an example.
Steps to Calculate Gage R&R
Setting Up the Attribute Measurement System Analysis
Step 1: Select between 20 and 30 test samples that represent the full range of variation encountered in actual production runs. Practically speaking, if only “clearly good” and “clearly bad” parts are used, the measurement system’s ability to accurately categorize the borderline parts will not be revealed. For maximum confidence, a 50-50 mix of good and bad parts is recommended; a 30:70 ratio is acceptable.
Step 2: Have a master appraiser categorize each test sample into its true attribute category.
Step 3: Select two to three inspectors who do the job. Have them categorize each test sample without knowing what the master appraiser has rated them.
Step 4: Place the test samples in a new random order and have the inspectors repeat their assessments.
Step 5: For each inspector, count the number of times his or her two readings agree. Divide this number by the total inspected to obtain the percentage of agreement. This is the individual repeatability of that inspector.
To obtain the overall repeatability, average the individual repeatability percentages of all inspectors. In this case study, the overall repeatability is 93.33 percent, which means that if the measurements were repeated on the same set of items, there is a 93.33 percent chance of getting the same results. That is not bad, but it is not perfect.
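The Step 5 arithmetic can be sketched in a few lines of Python. The trial readings below are hypothetical, constructed only so that the percentages match those quoted in this case study; a real study would use the inspectors’ actual recorded assessments.

```python
# Sketch of Step 5: repeatability from two randomized trials per inspector.
# All readings are hypothetical, chosen to reproduce the case-study
# percentages (100%, 95%, 85%, overall 93.33%).

def individual_repeatability(trial1, trial2):
    """Fraction of samples on which an inspector's two readings agree."""
    agree = sum(a == b for a, b in zip(trial1, trial2))
    return agree / len(trial1)

# 20 samples, each rated "G" (good) or "B" (bad) in two trials
base = ["G"] * 10 + ["B"] * 10

insp1 = (base, list(base))          # agrees with himself on 20/20 -> 100%

insp2_t2 = list(base)
insp2_t2[0] = "B"                   # one self-disagreement -> 19/20 = 95%
insp2 = (base, insp2_t2)

insp3_t2 = list(base)
for i in (0, 1, 2):                 # three self-disagreements -> 17/20 = 85%
    insp3_t2[i] = "B"
insp3 = (base, insp3_t2)

individual = [individual_repeatability(t1, t2) for t1, t2 in (insp1, insp2, insp3)]
overall = sum(individual) / len(individual)

print([f"{r:.0%}" for r in individual])  # ['100%', '95%', '85%']
print(f"{overall:.2%}")                  # 93.33%
```

A spreadsheet does the same job; the function above is simply a COUNTIF-style comparison of the two trial columns divided by the sample count.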
Understanding the Results
In this case, the individual repeatability of Inspector 1 is 100%, of Inspector 2 is 95% and of Inspector 3 is 85%. This means, for example, that Inspector 3 is consistent with himself only 85% of the time. He is probably inexperienced and needs retraining.
Step 6: For each inspector, compute the percentage of samples on which his or her two assessments agree both with each other and with the standard produced by the master appraiser in Step 2.
This percentage is the individual effectiveness, or reproducibility. In this case, Inspector 1 agrees with the standard only 75% of the time, and Inspector 3 only 65% of the time. Inspector 1 is clearly experienced in this kind of inspection, but he does not know the standard very well. Inspector 2 does. Inspector 3 needs the standard explained to him as well, after receiving general training in attribute inspection.
Step 7: Compute the percentage of times all the inspectors’ assessments agree for the first and second measurement for each sample item.
Step 8: Compute the percentage of the time all the inspectors’ assessments agree with each other and with the standard.
This percentage gives the overall effectiveness of the measurement system. In this case, the result is 65%: the percentage of time all inspectors agree with each other and their shared assessment matches the standard.
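Steps 7 and 8 can be sketched together. A sample counts toward Step 7 only when all six readings (three inspectors, two trials each) match, and toward Step 8 only when that shared reading also matches the standard. The readings below are hypothetical, constructed so that the overall effectiveness reproduces the 65% quoted above; the 70% all-agree figure is an invented intermediate value for illustration.

```python
# Sketch of Steps 7-8: overall agreement and overall effectiveness.
# All readings are hypothetical; only the 65% result mirrors the case study.

def overall_scores(inspectors, standard):
    """inspectors: list of (trial1, trial2) pairs, one per inspector.
    Returns (fraction where all readings agree,
             fraction where they agree AND match the standard)."""
    n = len(standard)
    all_agree = all_match = 0
    for i in range(n):
        readings = {trial[i] for t1, t2 in inspectors for trial in (t1, t2)}
        if len(readings) == 1:                 # Step 7: unanimous sample
            all_agree += 1
            if readings == {standard[i]}:      # Step 8: unanimous AND correct
                all_match += 1
    return all_agree / n, all_match / n

standard = ["G"] * 10 + ["B"] * 10

consensus = list(standard)
consensus[0] = "B"            # everyone consistently calls sample 0 bad (wrongly)

insp1 = (consensus, list(consensus))
insp3 = (consensus, list(consensus))

t2 = list(consensus)
for i in range(1, 7):         # inspector 2's second trial disagrees on 6 samples
    t2[i] = "B" if t2[i] == "G" else "G"
insp2 = (consensus, t2)

agree, effective = overall_scores([insp1, insp2, insp3], standard)
print(f"{agree:.0%}, {effective:.0%}")  # 70%, 65%
```

Note that sample 0 shows why the two percentages differ: the inspectors can be unanimous and still wrong, so overall effectiveness (Step 8) is never higher than overall agreement (Step 7).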
Minitab and SigmaXL produce many more statistics in their attribute agreement analysis output, but for most cases and uses, the analysis outlined in this article should suffice.
So What If the Gage R&R Is Not Good?
The key in all measurement systems is having a clear test method and clear criteria for what to accept and what to reject. If the gage R&R results are poor, revisit the following steps:
- Identify what is to be measured.
- Select the measurement instrument.
- Develop the test method and criteria for pass or fail.
- Test the test method and criteria (the operational definition) with some test samples (perform a gage R&R study).
- Confirm that the gage R&R in the study is close to 100 percent.
- Document the test method and criteria.
- Train all inspectors on the test method and criteria.
- Pilot run the new test method and criteria and perform periodic gage R&Rs to check if the measurement system is good.
- Launch the new test method and criteria.