Inter-rater agreement (kappa)

Command: Statistics > Agreement & responsiveness > Inter-rater agreement (kappa)


Creates a classification table from raw data in the spreadsheet for two observers and calculates an inter-rater agreement statistic (Kappa) to evaluate the agreement between two classifications on ordinal or nominal scales (Cohen, 1960; Cohen, 1968; Fleiss et al., 2003).
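To illustrate what the statistic measures, the sketch below computes unweighted Kappa from a k × k classification table of counts. The observed proportion of agreement p_o (the diagonal cells) is compared with the agreement p_e expected by chance from the two observers' marginal totals, and Kappa = (p_o - p_e) / (1 - p_e). This is a minimal Python sketch written for this page, not MedCalc's own code; the function name cohen_kappa and the example table are hypothetical.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[float]]) -> float:
    """Unweighted Kappa from a square k x k table of counts.

    table[i][j] = number of subjects placed in category i by observer A
    and in category j by observer B.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("classification table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    # Observed proportion of agreement: the diagonal cells.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for observer A (rows) and observer B (columns).
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the two observers were independent.
    p_e = sum(row[i] * col[i] for i in range(k))
    if p_e == 1:
        raise ValueError("Kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2 x 2 table: 35 of 50 subjects classified identically.
print(round(cohen_kappa([[20, 5], [10, 15]]), 3))  # prints 0.4
```

Here p_o = 0.70 and p_e = 0.50, so Kappa = 0.20 / 0.50 = 0.40: the observers agree on 70% of subjects, but half of that agreement would be expected by chance alone.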

How to enter data

This test is performed on the raw data in the spreadsheet. If you have the data already organised in a table, you can use the Inter-rater agreement command in the Tests menu.

Required input

In the Inter-rater agreement dialog box, identify the two discrete variables that contain the classification data from the two observers. The classification data may be either numeric or alphanumeric (string) values.
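The classification table is a cross-tabulation of the two variables, one row per subject. A minimal Python sketch of that step follows. It assumes that subjects with a missing value have already been removed and that each observer's values are of one type (all numbers or all strings). classification_table and its categories parameter are hypothetical names, not part of MedCalc.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def classification_table(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    categories: Optional[Sequence[Hashable]] = None,
) -> Tuple[List[Hashable], List[List[int]]]:
    """Cross-tabulate two observers' classifications of the same subjects.

    Returns (categories, table) where table[i][j] counts subjects placed in
    categories[i] by observer A and categories[j] by observer B.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both observers must classify the same subjects")
    if categories is None:
        # Default order: sorted values seen from either observer. For string
        # labels this is alphabetical, which may not match an ordinal scale.
        categories = sorted(set(rater_a) | set(rater_b))
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        # A value missing from an explicit category list raises KeyError.
        table[index[a]][index[b]] += 1
    return list(categories), table


cats, table = classification_table(["yes", "no", "yes", "yes"],
                                   ["yes", "yes", "no", "yes"])
print(cats, table)  # ['no', 'yes'] [[0, 1], [1, 2]]
```

For string labels on an ordered scale (for example "low", "medium", "high"), pass the categories explicitly in their natural order; the order of the categories matters for Weighted Kappa.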

Weighted Kappa

Kappa does not take into account the degree of disagreement between observers: every disagreement is treated as total disagreement. Therefore, when the categories are ordered, it is preferable to use Weighted Kappa (Cohen, 1968; Fleiss et al., 2003) and to assign different weights w_i to subjects for whom the raters differ by i categories, so that partial agreement also contributes to the value of Kappa.
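Weighted Kappa can be written with a k × k matrix of agreement weights w_ij, where w_ij = w_i for a difference of |i - j| categories (1 on the diagonal, smaller for larger disagreements). The observed and chance agreement are replaced by their weighted versions, p_o(w) = Σ w_ij p_ij and p_e(w) = Σ w_ij p_i. p_.j, and Kappa_w = (p_o(w) - p_e(w)) / (1 - p_e(w)). The sketch below assumes this common agreement-weight formulation; weighted_kappa is a hypothetical name and the example table is made up.

```python
from typing import Sequence


def weighted_kappa(table: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]]) -> float:
    """Weighted Kappa from a k x k table of counts and k x k agreement weights.

    weights[i][j] is the credit given when observer A chooses category i and
    observer B chooses category j: 1 on the diagonal, less for larger
    disagreements.
    """
    k = len(table)
    n = sum(sum(r) for r in table)
    if n == 0:
        raise ValueError("table is empty")
    row = [sum(table[i]) / n for i in range(k)]                       # observer A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # observer B marginals
    # Weighted observed agreement and weighted agreement expected by chance.
    p_o = sum(weights[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_e == 1:
        raise ValueError("weighted Kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# With identity weights every disagreement gets no credit, so the result
# is the same as unweighted Kappa.
identity = [[1, 0], [0, 1]]
print(round(weighted_kappa([[20, 5], [10, 15]], identity), 3))  # prints 0.4
```

With weights between 0 and 1 off the diagonal, subjects on which the raters differ by only one category still add to p_o(w), which is exactly the partial credit described above.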

MedCalc offers two sets of weights, called linear and quadratic. In the linear set, if there are k categories, the weights are calculated as follows:

$$ w_i = 1 - \frac{i}{k-1} $$

and in the quadratic set:

$$ w_i = 1 - \frac{i^2}{(k-1)^2} $$

When there are 5 categories, the weights in the linear set are 1, 0.75, 0.50, 0.25 and 0 for a difference of 0 (total agreement), 1, 2, 3 and 4 categories respectively. In the quadratic set the weights are 1, 0.9375, 0.75, 0.4375 and 0.
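Both weight sets can be generated for any number of categories k. This hypothetical helper builds the full k × k matrix, using w = 1 - |i - j| / (k - 1) for the linear set and w = 1 - (|i - j| / (k - 1))^2 for the quadratic set, where |i - j| is the number of categories by which the observers differ. The first row for k = 5 reproduces the 5-category weights quoted above, with the quadratic values shown to four decimals.

```python
from typing import List


def kappa_weights(k: int, scheme: str = "linear") -> List[List[float]]:
    """k x k agreement weights; |i - j| is the number of categories by which
    the two observers differ."""
    if k < 2:
        raise ValueError("at least two categories are needed")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    weights = []
    for i in range(k):
        row = []
        for j in range(k):
            d = abs(i - j) / (k - 1)  # relative distance between categories, 0 to 1
            row.append(1 - d if scheme == "linear" else 1 - d ** 2)
        weights.append(row)
    return weights


print(kappa_weights(5, "linear")[0])     # [1.0, 0.75, 0.5, 0.25, 0.0]
print(kappa_weights(5, "quadratic")[0])  # [1.0, 0.9375, 0.75, 0.4375, 0.0]
```

The matrices are symmetric, so a disagreement of one category earns the same weight whichever observer chose the higher category.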

Use linear weights when a difference between the first and second categories is as important as a difference between the second and third categories, and so on. Use quadratic weights when a difference between the first and second categories is less important than a difference between the second and third categories, and so on.


MedCalc calculates the inter-rater agreement statistic Kappa with its 95% confidence interval (Fleiss et al., 2003).

The standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted Kappa equals a prespecified value other than zero (Fleiss et al., 2003).
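A minimal sketch of such a standard error and confidence interval is given below. It assumes the large-sample formula for a non-zero true Kappa as presented in Fleiss et al. (2003), together with a normal-approximation interval Kappa ± 1.96 SE. We have not confirmed that MedCalc's implementation is identical, and kappa_with_ci is a hypothetical name. Passing identity weights gives the unweighted case.

```python
import math
from typing import Sequence, Tuple


def kappa_with_ci(table: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]],
                  z: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Return (Kappa, standard error, (lower, upper)) for a k x k table of counts.

    Assumption: non-null large-sample standard error of weighted Kappa,
    not verified against MedCalc's output.
    """
    k = len(table)
    n = sum(sum(r) for r in table)
    p = [[table[i][j] / n for j in range(k)] for i in range(k)]   # cell proportions
    row = [sum(p[i]) for i in range(k)]                            # observer A marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]       # observer B marginals
    p_o = sum(weights[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_e = sum(weights[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    kappa = (p_o - p_e) / (1 - p_e)
    # Mean weight of row category i (over B's marginals) and of column j (over A's).
    w_row = [sum(col[j] * weights[i][j] for j in range(k)) for i in range(k)]
    w_col = [sum(row[i] * weights[i][j] for i in range(k)) for j in range(k)]
    s = sum(p[i][j] * (weights[i][j] - (w_row[i] + w_col[j]) * (1 - kappa)) ** 2
            for i in range(k) for j in range(k))
    s -= (kappa - p_e * (1 - kappa)) ** 2
    # Guard against a tiny negative value caused by rounding.
    se = math.sqrt(max(s, 0.0)) / ((1 - p_e) * math.sqrt(n))
    return kappa, se, (kappa - z * se, kappa + z * se)


kappa, se, (lo, hi) = kappa_with_ci([[20, 5], [10, 15]], [[1, 0], [0, 1]])
print(f"Kappa = {kappa:.3f}, SE = {se:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
# Kappa = 0.400, SE = 0.127, 95% CI 0.151 to 0.649
```

This standard error is computed around the estimated Kappa rather than under the assumption that Kappa is zero, which is why it suits the confidence interval and tests against a nonzero value.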

The value of Kappa can be interpreted as follows (Altman, 1991):

Value of Kappa    Strength of agreement
< 0.20            Poor
0.21 - 0.40       Fair
0.41 - 0.60       Moderate
0.61 - 0.80       Good
0.81 - 1.00       Very good
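Applying this table in code requires a decision about the small gaps between the bands (for example between 0.20 and 0.21). This hypothetical helper assumes that each band includes its upper limit, so 0.20 counts as Poor and 0.205 as Fair; negative values fall in the Poor band.

```python
def strength_of_agreement(kappa: float) -> str:
    """Label a Kappa value using the Altman (1991) bands.

    Assumption: each band runs up to and including its upper limit, which
    closes the gaps (such as 0.20 to 0.21) left in the printed table.
    """
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


print(strength_of_agreement(0.40))  # Fair
```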


  • Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
  • Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
  • Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
  • Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.
