Inter-rater agreement (kappa)
Command: Statistics menu
Description
Creates a classification table for two observers from the raw data in the spreadsheet, and calculates the inter-rater agreement statistic Kappa to evaluate the agreement between two classifications on ordinal or nominal scales.
How to enter data
This test is performed on the raw data in the spreadsheet. If you have the data already organised in a table, you can use the Inter-rater agreement command in the Tests menu.
Required input
In the Inter-rater agreement dialog box, two discrete variables with the classification data from the two observers must be identified. Classification data may either be numeric or alphanumeric (string) values.
Weighted Kappa
Kappa does not take the degree of disagreement between observers into account: every disagreement is treated equally as total disagreement. Therefore, when the categories are ordered, it is preferable to use Weighted Kappa (Cohen, 1968) and assign different weights $w_i$ to subjects for whom the raters differ by $i$ categories, so that different levels of agreement can contribute to the value of Kappa.
MedCalc offers two sets of weights, called linear and quadratic. In the linear set, if there are $k$ categories, the weights are calculated as follows:

$$w_i = 1 - \frac{i}{k - 1}$$

and in the quadratic set:

$$w_i = 1 - \frac{i^2}{(k - 1)^2}$$
When there are 5 categories, the weights in the linear set are 1, 0.75, 0.50, 0.25 and 0 for a difference of 0 (= total agreement), 1, 2, 3 and 4 categories respectively. In the quadratic set the corresponding weights are 1, 0.9375, 0.75, 0.4375 and 0.
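As an illustration only (this is not MedCalc's own code), the following Python sketch builds the linear and quadratic weight matrices from the formulas above; the function name `kappa_weights` is hypothetical.

```python
import numpy as np

def kappa_weights(k: int, scheme: str = "linear") -> np.ndarray:
    """Return a k x k agreement-weight matrix for weighted kappa.

    scheme='linear':    w = 1 - |i - j| / (k - 1)
    scheme='quadratic': w = 1 - (i - j)^2 / (k - 1)^2
    """
    i, j = np.indices((k, k))
    d = np.abs(i - j)
    if scheme == "linear":
        return 1.0 - d / (k - 1)
    if scheme == "quadratic":
        return 1.0 - (d / (k - 1)) ** 2
    raise ValueError("scheme must be 'linear' or 'quadratic'")

# First row for k = 5 reproduces the weights quoted above:
print(kappa_weights(5, "linear")[0])     # [1.   0.75 0.5  0.25 0.  ]
print(kappa_weights(5, "quadratic")[0])  # [1.     0.9375 0.75   0.4375 0.    ]
```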
Results
MedCalc calculates the inter-rater agreement statistic Kappa according to Cohen, 1960; and weighted Kappa according to Cohen, 1968. Computation details are also given in Altman, 1991 (p. 406-407). The standard error and 95% confidence interval are calculated according to Fleiss et al., 2003.
The standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).
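For readers who want to check the unweighted and weighted Kappa values outside MedCalc, scikit-learn's `cohen_kappa_score` accepts a `weights` argument of `'linear'` or `'quadratic'`. The ratings below are hypothetical example data, and this sketch does not reproduce MedCalc's standard error or confidence interval.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical classifications of the same 12 subjects by two observers
rater1 = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 1, 2]
rater2 = [1, 2, 3, 3, 3, 4, 4, 4, 5, 4, 1, 2]

print(cohen_kappa_score(rater1, rater2))                       # unweighted Kappa (Cohen, 1960)
print(cohen_kappa_score(rater1, rater2, weights="linear"))     # linear weighted Kappa
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))  # quadratic weighted Kappa
```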
The K value can be interpreted as follows (Altman, 1991):
| Value of K | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
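As a small convenience sketch (not part of MedCalc), the table above can be written as a lookup function; the cut-offs follow Altman (1991), with each band closed at its upper limit, and the function name is hypothetical.

```python
def altman_strength(kappa: float) -> str:
    """Map a Kappa value to Altman's (1991) strength-of-agreement label."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

print(altman_strength(0.55))  # -> "Moderate"
```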
Literature
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.