# Inter-rater agreement (kappa)

Command: Statistics → Agreement & responsiveness → Inter-rater agreement (kappa)

## Description

Creates a classification table from raw data in the spreadsheet for two observers, and calculates the inter-rater agreement statistic Kappa to evaluate the agreement between two classifications on ordinal or nominal scales (Cohen, 1960; Cohen, 1968; Fleiss et al., 2003).
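
For orientation, Kappa compares the observed proportion of agreement $p_o$ with the agreement $p_e$ expected by chance from the raters' marginal totals: $\kappa = (p_o - p_e)/(1 - p_e)$. The following minimal Python sketch (not MedCalc's implementation; the data are made up for illustration) shows the computation from two raw classification columns:

```python
import numpy as np

def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two equal-length rating sequences."""
    categories = sorted(set(rater1) | set(rater2))
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    table = np.zeros((k, k))          # classification table (rows: rater 1)
    for a, b in zip(rater1, rater2):
        table[index[a], index[b]] += 1
    n = table.sum()
    p_o = np.trace(table) / n         # observed proportion of agreement
    p_e = (table.sum(axis=1) @ table.sum(axis=0)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Two observers classifying 10 subjects into 3 ordered grades.
obs1 = [1, 2, 3, 2, 1, 3, 2, 1, 2, 3]
obs2 = [1, 2, 2, 2, 1, 3, 3, 1, 2, 3]
print(round(cohen_kappa(obs1, obs2), 3))   # 0.697
```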

## How to enter data

This test is performed on the raw data in the spreadsheet. If you have the data already organised in a table, you can use the Inter-rater agreement command in the Tests menu.

## Required input

In the *Inter-rater agreement* dialog box, two discrete variables with the classification data from the two observers must be identified. Classification data may either be numeric or alphanumeric (string) values.

## Weighted Kappa

Kappa does not take the degree of disagreement between observers into account: all disagreement is treated equally as total disagreement. Therefore, when the categories are ordered, it is preferable to use weighted Kappa (Cohen, 1968; Fleiss et al., 2003) and assign different weights $w_i$ to subjects for whom the raters differ by $i$ categories, so that different levels of agreement can contribute to the value of Kappa.

MedCalc offers two sets of weights, called linear and quadratic.
In the linear set, if there are $k$ categories, the weights are calculated as follows:

$$w_i = 1 - \frac{i}{k-1}$$

and in the quadratic set:

$$w_i = 1 - \frac{i^2}{(k-1)^2}$$
When there are 5 categories, the weights in the linear set are 1, 0.75, 0.50, 0.25 and 0 for a difference of 0 (= total agreement), 1, 2, 3 and 4 categories respectively. In the quadratic set the corresponding weights are 1, 0.9375, 0.75, 0.4375 and 0.
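
These values can be checked directly against the formulas above (a standalone Python snippet, not part of MedCalc):

```python
k = 5
linear = [1 - i / (k - 1) for i in range(k)]             # [1.0, 0.75, 0.5, 0.25, 0.0]
quadratic = [1 - i**2 / (k - 1) ** 2 for i in range(k)]  # [1.0, 0.9375, 0.75, 0.4375, 0.0]
print(linear)
print(quadratic)
```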

Use linear weights when a difference between the first and second category has the same importance as a difference between the second and third category, and so on. If a difference between the first and second category is less important than a difference between the second and third category, use quadratic weights.
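
Outside MedCalc, scikit-learn's `cohen_kappa_score`, for example, exposes the same linear and quadratic weighting schemes through its `weights` parameter. The snippet below (reusing the illustrative data from the earlier sketch) shows how the weighting choice changes the result:

```python
from sklearn.metrics import cohen_kappa_score

obs1 = [1, 2, 3, 2, 1, 3, 2, 1, 2, 3]
obs2 = [1, 2, 2, 2, 1, 3, 3, 1, 2, 3]

print(cohen_kappa_score(obs1, obs2))                      # unweighted: ~0.697
print(cohen_kappa_score(obs1, obs2, weights="linear"))    # linear weights: ~0.762
print(cohen_kappa_score(obs1, obs2, weights="quadratic")) # quadratic weights: ~0.833
```

Because both disagreements in this example are only one category apart, the weighted statistics are higher than the unweighted one.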

## Results

MedCalc calculates the inter-rater agreement statistic Kappa with 95% confidence interval (Fleiss et al., 2003).

The standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted Kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).
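
MedCalc's interval follows Fleiss et al. (2003). As a rough illustration only, the widely used large-sample approximation below uses a simplified standard error (not the Fleiss formula), so it will not reproduce MedCalc's output exactly:

```python
import math

def kappa_ci(p_o, p_e, n, z=1.96):
    """Approximate 95% CI for unweighted kappa, from the observed (p_o)
    and chance-expected (p_e) agreement proportions and n subjects."""
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))  # simple large-sample SE
    return kappa, kappa - z * se, kappa + z * se

# Values from the illustrative data above: kappa ~0.697, CI ~(0.32, 1.07)
print(kappa_ci(p_o=0.8, p_e=0.34, n=10))
```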

The *K* value can be interpreted as follows (Altman, 1991):

| Value of *K* | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
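
If you need to apply Altman's labels programmatically, a trivial helper mirroring the table might look like this (a hypothetical convenience function, not part of MedCalc):

```python
def altman_strength(kappa):
    """Map a kappa value to Altman's (1991) strength-of-agreement label."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

print(altman_strength(0.697))  # "Good"
```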

## Literature

- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.

## External links

- [Cohen's kappa](https://en.wikipedia.org/wiki/Cohen%27s_kappa) on Wikipedia.