Use Inter-rater agreement to evaluate the agreement between two classifications (nominal or ordinal scales).
If the raw data are available in the spreadsheet, use Inter-rater agreement in the Statistics menu to create the classification table and calculate Kappa (Cohen 1960; Cohen 1968; Fleiss et al., 2003).
Agreement is quantified by the Kappa (K) statistic:
- K is 1 when there is perfect agreement between the classification systems.
- K is 0 when agreement is no better than chance.
- K is negative when agreement is worse than chance.
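The three cases above can be checked with a minimal sketch of the standard unweighted kappa calculation, K = (Po - Pe) / (1 - Pe), where Po is the observed proportion of agreement and Pe the proportion expected by chance from the marginal totals. The function name `cohen_kappa` and the small 2x2 tables are illustrative assumptions and are not part of MedCalc.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square frequency table (list of rows)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed proportion of agreement: cases on the main diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical 2x2 tables, chosen only to illustrate the three cases.
perfect = [[5, 0], [0, 5]]        # raters always agree
chance = [[25, 25], [25, 25]]     # agreement equals chance expectation
opposite = [[0, 5], [5, 0]]       # raters always disagree

print(cohen_kappa(perfect))    # 1.0
print(cohen_kappa(chance))     # 0.0
print(cohen_kappa(opposite))   # -1.0
```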
In the dialog form you can enter the two classification systems in a 6x6 frequency table.
Select Weighted Kappa (Cohen 1968; Fleiss et al., 2003) if the data come from an ordered scale. If the data come from a nominal scale, do not select Weighted Kappa.
Use linear weights when each one-category step of disagreement has the same importance, so that a disagreement of two categories counts twice as much as a disagreement of one category. Use quadratic weights when larger disagreements should count disproportionately more than smaller ones, so that a disagreement of one category is penalized relatively lightly.
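To make the two weighting schemes concrete, the sketch below builds the agreement weight matrices under the conventional definitions: linear weights 1 - |i - j| / (k - 1) and quadratic weights 1 - ((i - j) / (k - 1))^2 for k ordered categories. These formulas are the usual textbook ones and are an assumption here; this page does not state the exact formulas MedCalc applies. The function name `agreement_weights` is hypothetical.

```python
def agreement_weights(k, scheme):
    """Return a k x k matrix of agreement weights (1 = full credit, 0 = none)."""
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    power = 1 if scheme == "linear" else 2
    return [
        [1 - (abs(i - j) / (k - 1)) ** power for j in range(k)]
        for i in range(k)
    ]


for scheme in ("linear", "quadratic"):
    print(scheme)
    for row in agreement_weights(3, scheme):
        print(row)
# linear
# [1.0, 0.5, 0.0]
# [0.5, 1.0, 0.5]
# [0.0, 0.5, 1.0]
# quadratic
# [1.0, 0.75, 0.0]
# [0.75, 1.0, 0.75]
# [0.0, 0.75, 1.0]
```

With three categories, a disagreement of one category receives a credit of 0.5 under linear weights and 0.75 under quadratic weights, while a disagreement of two categories receives no credit under either scheme.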
In this example:
- Of the 6 cases that observer B placed in class 1, observer A placed 5 in class 1 and 1 in class 2.
- Of the 19 cases that observer B placed in class 2, observer A placed 3 in class 1, 12 in class 2 and 4 in class 3.
- Of the 12 cases that observer B placed in class 3, observer A placed 2 in class 1, 2 in class 2 and 8 in class 3.
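As a rough cross-check of the example counts, the following sketch computes Kappa without weights and with linear and quadratic weights. It assumes the conventional weight definitions (1 - d for linear, 1 - d^2 for quadratic, with d the category distance divided by the number of categories minus 1), which this page does not spell out. The function name `weighted_kappa` and the `scheme` argument are hypothetical.

```python
def weighted_kappa(table, scheme="none"):
    """Kappa for a square frequency table; scheme is 'none', 'linear' or 'quadratic'."""
    if scheme not in ("none", "linear", "quadratic"):
        raise ValueError("unknown weighting scheme")
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if scheme == "none":
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)
        return 1 - d if scheme == "linear" else 1 - d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed agreement and weighted chance-expected agreement.
    p_obs = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    p_exp = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Rows: observer B's classes 1 to 3; columns: observer A's classes 1 to 3.
table = [
    [5, 1, 0],
    [3, 12, 4],
    [2, 2, 8],
]

for scheme in ("none", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(table, scheme), 3))
# none 0.495
# linear 0.517
# quadratic 0.543
```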
After you have entered the data, click the Test button. The program will display the value for Kappa with its Standard Error and 95% confidence interval (CI) (Fleiss et al., 2003).
The standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).
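For readers who want to see how such a standard error can be obtained, the sketch below uses the usual large-sample (non-null) variance formula for weighted kappa, together with a normal-approximation 95% confidence interval (Kappa plus or minus 1.96 standard errors). Both are assumptions for illustration: this page does not confirm that MedCalc computes its results exactly this way, so small differences from the program's output are possible. The function name `kappa_with_se` and the choice of linear weights are hypothetical.

```python
import math


def kappa_with_se(table, weights):
    """Return (kappa, standard error) for a square table and weight matrix."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]
    row_p = [sum(row) for row in p]
    col_p = [sum(p[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]

    p_obs = sum(weights[i][j] * p[i][j] for i, j in cells)
    p_exp = sum(weights[i][j] * row_p[i] * col_p[j] for i, j in cells)
    kappa = (p_obs - p_exp) / (1 - p_exp)

    # Average weight for each row category (over column marginals)
    # and for each column category (over row marginals).
    w_row = [sum(col_p[j] * weights[i][j] for j in range(k)) for i in range(k)]
    w_col = [sum(row_p[i] * weights[i][j] for i in range(k)) for j in range(k)]

    spread = sum(
        p[i][j] * (weights[i][j] - (w_row[i] + w_col[j]) * (1 - kappa)) ** 2
        for i, j in cells
    )
    variance = (spread - (kappa - p_exp * (1 - kappa)) ** 2) / (n * (1 - p_exp) ** 2)
    return kappa, math.sqrt(variance)


# Example table: rows are observer B's classes, columns are observer A's.
table = [[5, 1, 0], [3, 12, 4], [2, 2, 8]]
# Linear agreement weights for three ordered classes (an assumed choice).
linear = [[1 - abs(i - j) / 2 for j in range(3)] for i in range(3)]

kappa, se = kappa_with_se(table, linear)
low, high = kappa - 1.96 * se, kappa + 1.96 * se  # normal approximation
print(f"kappa = {kappa:.3f}, SE = {se:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

Passing an identity matrix as `weights` gives the unweighted Kappa and its standard error.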
The K value can be interpreted as follows (Altman, 1991):
| Value of K | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
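The interpretation bands can be expressed as a small lookup, sketched below. Values of 0.20 or less are labelled Poor, following Altman (1991). How values that fall between two printed bands (for example 0.405) are assigned is an assumption of this sketch: each band is treated as extending up to and including its upper limit. The function name `agreement_strength` is hypothetical.

```python
def agreement_strength(kappa):
    """Map a Kappa value to a strength-of-agreement label (Altman, 1991)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    # Each band is assumed to include its upper limit.
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


print(agreement_strength(0.495))  # Moderate
print(agreement_strength(0.85))   # Very good
```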
In the Comment input field you can enter a comment or conclusion that will be included on the printed report.
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.