MedCalc

Inter-rater agreement

Command: Tests
Next select: Inter-rater agreement

Description

Use Inter-rater agreement to evaluate the agreement between two classifications (nominal or ordinal scales).

If the raw data are available in the spreadsheet, use Inter-rater agreement in the Statistics menu to create the classification table and calculate Kappa (Cohen 1960; Cohen 1968; Fleiss et al., 2003).

Agreement is quantified by the Kappa (K) statistic:

K = (po − pe) / (1 − pe)

where po is the observed proportion of agreement and pe is the proportion of agreement expected by chance (Cohen 1960).

Required input

In the dialog form you can enter the two classification systems in a 6x6 frequency table.

Select Weighted Kappa (Cohen 1968) if the data come from an ordered scale. If the data come from a nominal scale, do not select Weighted Kappa.

Use linear weights when the difference between the first and second category has the same importance as a difference between the second and third category, etc. If the difference between the first and second category is less important than a difference between the second and third category, etc., use quadratic weights.
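The two weighting schemes can be sketched as follows. This is an illustrative example, not MedCalc's internal code: for a scale with k ordered categories, diagonal cells get weight 1 and the weight decreases with the distance between categories, either linearly or quadratically.

```python
def weight_matrix(k, scheme="linear"):
    """Agreement weights for a k-category ordered scale.

    Cells on the diagonal get weight 1; weight decreases with the
    normalized distance between categories, linearly or quadratically.
    """
    w = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w[i][j] = 1 - d if scheme == "linear" else 1 - d * d
    return w
```

With three categories, for example, linear weights give adjacent categories a weight of 0.5, while quadratic weights give them 0.75, so quadratic weights penalize near-misses less severely.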

Inter-rater agreement (Kappa) test - dialog box

In this example, from the 6 cases that observer B has placed in class 1, observer A has placed 5 in class 1 and 1 in class 2; from the 19 cases that observer B has placed in class 2, observer A has placed 3 in class 1, 12 in class 2 and 4 in class 3; and from the 12 cases that observer B has placed in class 3, observer A has placed 2 in class 1, 2 in class 2 and 8 in class 3.
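The unweighted Kappa for this example can be reproduced with a short sketch (again illustrative, not MedCalc's implementation). Rows hold observer A's classifications and columns observer B's, exactly as described above.

```python
def cohen_kappa(table):
    """Unweighted Cohen's Kappa for a square frequency table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n      # observed agreement
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Rows: observer A, columns: observer B, as in the example above.
table = [
    [5, 3, 2],   # A class 1
    [1, 12, 2],  # A class 2
    [0, 4, 8],   # A class 3
]
print(round(cohen_kappa(table), 3))  # → 0.495
```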

After you have entered the data, click the Test button. The program will display the value for Kappa with its standard error and 95% confidence interval (CI).

MedCalc calculates the inter-rater agreement statistic Kappa according to Cohen, 1960; and weighted Kappa according to Cohen, 1968. Computational details are also given in Altman, 1991 (p. 406-407). The standard error and 95% confidence interval are calculated according to Fleiss et al., 2003.

The Standard errors reported by MedCalc are the appropriate standard errors for testing the hypothesis that the underlying value of weighted kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).

The K value can be interpreted as follows (Altman, 1991):

Value of K     Strength of agreement
< 0.20         Poor
0.21 - 0.40    Fair
0.41 - 0.60    Moderate
0.61 - 0.80    Good
0.81 - 1.00    Very good
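The interpretation table above can be expressed as a small lookup function (an illustrative sketch; the boundary values follow Altman, 1991, with values up to 0.20 treated as "Poor"):

```python
def strength_of_agreement(kappa):
    """Qualitative interpretation of a Kappa value (Altman, 1991)."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

print(strength_of_agreement(0.495))  # → Moderate
```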

In the Comment input field you can enter a comment or conclusion that will be included on the printed report.

Literature
