Precision-recall curve
Command: Statistics → ROC curves → Precision-recall curve
Description
A precision-recall curve is a plot of the precision (positive predictive value, y-axis) against the recall (sensitivity, x-axis) for different thresholds. It is an alternative to the ROC curve (Saito & Rehmsmeier, 2015).
MedCalc generates the precision-recall curve from the raw data (not from a sensitivity-PPV table).
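As a rough illustration only (not MedCalc's implementation), the sketch below computes precision-recall pairs for a continuous measurement against a 0/1 classification in Python using scikit-learn's precision_recall_curve; the column names Classification and Param and the data values are purely illustrative.

```python
# Illustrative sketch: precision-recall pairs at each threshold.
# Not MedCalc's routine; data values are made up for demonstration.
import numpy as np
from sklearn.metrics import precision_recall_curve

classification = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])   # 1 = diseased, 0 = non-diseased
param = np.array([3.1, 2.4, 5.0, 6.2, 4.1, 7.8, 5.9, 3.6, 8.4, 6.7])  # measurement of interest

# precision[i] and recall[i] correspond to calling a case positive when
# param >= thresholds[i]; the final (precision=1, recall=0) point has no threshold.
precision, recall, thresholds = precision_recall_curve(classification, param)
for p, r, t in zip(precision, recall, np.append(thresholds, np.inf)):
    print(f"param >= {t:5.2f}: recall = {r:.2f}, precision = {p:.2f}")
```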
How to enter data for a precision-recall curve
To create a precision-recall curve you need a measurement of interest (the parameter you want to study) and a diagnosis that classifies your study subjects into two distinct groups: a diseased and a non-diseased group. This diagnosis should be independent of the measurement of interest.
In the spreadsheet, create a column Classification and a column for the variable of interest, e.g. Param. For every study subject, enter a code for the classification as follows: 1 for diseased cases and 0 for non-diseased or normal cases. In the Param column, enter the measurement of interest (these can be measurements, grades, etc.; if the data are categorical, code them with numerical values).
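The following minimal sketch shows what such a data layout could look like; it uses pandas purely for illustration, since in MedCalc the values are typed directly into the spreadsheet columns.

```python
# Hypothetical example of the Classification / Param layout described above.
import pandas as pd

data = pd.DataFrame({
    "Classification": [1, 1, 0, 0, 1, 0],               # 1 = diseased, 0 = non-diseased
    "Param":          [6.2, 7.8, 3.1, 2.4, 5.9, 4.1],   # measurement of interest
})
print(data)
```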
Required input
- Variable: select the variable of interest.
- Classification variable: select a dichotomous variable indicating diagnosis (0=negative, 1=positive).
If your data are coded differently, you can use the Define status tool to recode your data.
It is important to correctly identify the positive cases.
- Filter: (optionally) a filter in order to include only a selected subgroup of cases (e.g. AGE>21, SEX="Male").
- Graph options:
- Option to mark points corresponding to criterion values.
Results
MedCalc reports:
- The sample sizes in the positive and negative groups.
- The area under the precision-recall curve (AUPRC), calculated using non-linear interpolation (Davis & Goadrich, 2006).
- F1max: the F1 score is a measure of a test's accuracy, defined as the harmonic mean of precision and recall. It is calculated at each measurement level, and F1max is the maximum F1 score over all measurement levels (a computational sketch follows this list).
$$ F_1 = 2 \times \frac{Recall \times Precision}{Recall + Precision } $$
- Associated criterion: the criterion (measurement level) at which F1max was reached.
- The 95% confidence interval for the AUPRC, which is calculated as follows (Boyd et al., 2013; logit method; a sketch of this interval also follows the list):
$$ CI(AUPRC) = \left[ \frac{e^{\mu_\eta - 1.96 \tau}}{1+ e^{\mu_\eta - 1.96 \tau}} \; ; \; \frac{e^{\mu_\eta + 1.96 \tau}}{1+ e^{\mu_\eta + 1.96 \tau}} \right] $$
where
$$ \mu_\eta = \operatorname{logit}(AUPRC) = \ln \left( \frac{AUPRC}{1-AUPRC} \right) $$
$$ \tau = \frac{1}{\sqrt{n \times AUPRC \times (1-AUPRC)}} $$
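The first sketch below (not MedCalc's routine) approximates the AUPRC with the step-wise average-precision estimator rather than the non-linear interpolation of Davis & Goadrich (2006) used by MedCalc, and derives F1max and its associated criterion from the same precision-recall pairs; the data and function name are illustrative, so results may differ slightly from MedCalc's output.

```python
# Illustrative sketch: approximate AUPRC, F1max and the associated criterion.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

def auprc_and_f1max(classification, param):
    precision, recall, thresholds = precision_recall_curve(classification, param)
    auprc = average_precision_score(classification, param)  # step-wise approximation of the AUPRC

    # F1 at every threshold; skip the final (precision=1, recall=0) point,
    # which has no associated threshold. The small constant avoids division by zero.
    f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
    best = np.argmax(f1)
    return auprc, f1[best], thresholds[best]

auprc, f1max, criterion = auprc_and_f1max(
    np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1]),
    np.array([3.1, 2.4, 5.0, 6.2, 4.1, 7.8, 5.9, 3.6, 8.4, 6.7]),
)
print(f"AUPRC ≈ {auprc:.3f}, F1max = {f1max:.3f} at criterion >= {criterion}")
```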
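The second sketch translates the logit interval above into code: a symmetric 95% interval is formed on the logit scale and back-transformed. Here n denotes the sample size appearing in the formula for τ above, and the AUPRC value plugged in is purely illustrative.

```python
# Illustrative sketch of the logit-based confidence interval for the AUPRC.
import math

def auprc_logit_ci(auprc, n, z=1.96):
    mu = math.log(auprc / (1.0 - auprc))               # logit(AUPRC)
    tau = 1.0 / math.sqrt(n * auprc * (1.0 - auprc))   # standard error on the logit scale
    expit = lambda x: math.exp(x) / (1.0 + math.exp(x))  # inverse logit
    return expit(mu - z * tau), expit(mu + z * tau)

print(auprc_logit_ci(auprc=0.82, n=50))   # illustrative values
```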
See also a note on Criterion values.
Graph
If the option to mark points corresponding to criterion values was selected, clicking a marker in the graph displays the corresponding criterion (for positivity) together with the recall (sensitivity), precision (positive predictive value) and F1 score.
Literature
- Boyd K, Eng KH, Page CD (2013) Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg.
- Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
- Efron B (1987) Better Bootstrap Confidence Intervals. Journal of the American Statistical Association 82:171-185.
- Efron B, Tibshirani RJ (1993) An introduction to the Bootstrap. Chapman & Hall/CRC.
- Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10:e0118432.