Inter-rater agreement
Command: Tests → Inter-rater agreement
Description
Use Inter-rater agreement to evaluate the agreement between two classification systems (on a nominal or ordinal scale).
If the raw data are available in the spreadsheet, use Inter-rater agreement in the Statistics menu to create the classification table and calculate Kappa (Cohen, 1960; Cohen, 1968; Fleiss et al., 2003).
Agreement is quantified by the Kappa (K) statistic:
- K is 1 when there is perfect agreement between the classification systems
- K is 0 when there is no agreement better than chance
- K is negative when agreement is worse than chance.
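Although MedCalc performs the calculation for you, the standard definition of Kappa may help in reading the output (this is the general formula, not MedCalc-specific notation): it compares the observed proportion of agreement $p_o$ with the proportion of agreement $p_e$ expected by chance, estimated from the marginal totals of the classification table:

$$K = \frac{p_o - p_e}{1 - p_e}$$

With perfect agreement $p_o = 1$, so K = 1; when the observed agreement equals the chance agreement, $p_o = p_e$ and K = 0.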
Required input
In the dialog form you can enter the two classification systems in a 6x6 frequency table.
Select Weighted Kappa (Cohen, 1968) if the data come from an ordered scale. If the data come from a nominal scale, do not select Weighted Kappa.
In this example, from the 6 cases that observer B has placed in class 1, observer A has placed 5 in class 1 and 1 in class 2; from the 19 cases that observer B has placed in class 2, observer A has placed 3 in class 1, 12 in class 2 and 4 in class 3; and from the 12 cases that observer B has placed in class 3, observer A has placed 2 in class 1, 2 in class 2 and 8 in class 3.
After you have entered the data, click Test.
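For illustration only, the unweighted Kappa for this example table can be reproduced with a short Python sketch (this is not MedCalc's implementation; it simply follows the general formula above):

```python
# Sketch: Cohen's (unweighted) Kappa for the example table above.
# Rows = classes assigned by observer A, columns = classes assigned by observer B.
table = [
    [5,  3, 2],   # observer A: class 1
    [1, 12, 2],   # observer A: class 2
    [0,  4, 8],   # observer A: class 3
]

n = sum(sum(row) for row in table)                  # total number of cases (37)
row_totals = [sum(row) for row in table]            # marginal totals for observer A
col_totals = [sum(col) for col in zip(*table)]      # marginal totals for observer B

# Observed proportion of agreement: cases on the diagonal
p_o = sum(table[i][i] for i in range(len(table))) / n

# Proportion of agreement expected by chance, from the marginal totals
p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

kappa = (p_o - p_e) / (1 - p_e)
print(f"Kappa = {kappa:.3f}")   # roughly 0.50 for this table
```

Weighted Kappa additionally gives partial credit to near-misses by weighting each off-diagonal cell by the size of the disagreement, which is why it is only meaningful for ordered scales.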
Results
MedCalc calculates the value of Kappa with its standard error (SE) and 95% confidence interval (CI).
MedCalc calculates the inter-rater agreement statistic Kappa according to Cohen (1960), and weighted Kappa according to Cohen (1968). Computational details are also given in Altman (1991, p. 406-407). The standard error and 95% confidence interval are calculated according to Fleiss et al. (2003).
The standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted Kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).
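As a general statistical note (not a description of MedCalc internals), a 95% confidence interval based on such a standard error uses the large-sample normal approximation:

$$\text{95\% CI} = K \pm 1.96 \times SE(K)$$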
The K value can be interpreted as follows (Altman, 1991):
| Value of K | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
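If you need to report the corresponding label programmatically, a small helper such as the following hypothetical Python function mirrors Altman's table (values of exactly 0.20 are counted as Poor here, which is an assumption since the table leaves that boundary open):

```python
def agreement_strength(kappa: float) -> str:
    """Map a Kappa value to Altman's (1991) strength-of-agreement label."""
    if kappa <= 0.20:
        return "Poor"       # assumption: boundary value 0.20 counted as Poor
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

print(agreement_strength(0.50))   # -> Moderate
```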
In an optional Comment input field you can enter a comment or conclusion that will be included on the printed report.
Literature
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70:213-220.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.