MedCalc

Sample size: Area under Precision-Recall curve

Description

Calculates the required sample size for estimating a confidence interval for the Area under the Precision-Recall curve (AUPRC).

The calculation is based on the logit method of Boyd et al. (2013). See Precision-recall curve.
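A sketch of the logit interval as we read it in Boyd et al. (2013), and of how a sample size could follow by solving it for n, the number of positive cases. Here θ is the hypothesized AUPRC, B the desired lower or upper bound, z the two-sided critical value (about 1.96 for 95%), r the negative/positive ratio, and expit(x) = 1/(1 + e^(−x)). That MedCalc performs exactly this inversion, with rounding up, is an assumption.

```latex
\eta = \ln\frac{\theta}{1-\theta}, \qquad
\tau = \frac{1}{\sqrt{n\,\theta(1-\theta)}}, \qquad
\mathrm{CI} = \left[\operatorname{expit}(\eta - z\tau),\ \operatorname{expit}(\eta + z\tau)\right]

% Require the bound B to lie exactly z*tau away from eta on the logit scale:
z\tau = \left|\ln\frac{\theta}{1-\theta} - \ln\frac{B}{1-B}\right|
\;\Longrightarrow\;
n = \left\lceil \frac{z^{2}}{\theta(1-\theta)\left[\ln\frac{\theta}{1-\theta} - \ln\frac{B}{1-B}\right]^{2}} \right\rceil,
\qquad n_{\text{neg}} = r\,n
```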

Required input

  • Confidence level (%): select 90, 95 or 99%, or enter a different confidence level if required. A 95% confidence level, which yields a 95% confidence interval, is the most common choice.
  • Area under Precision-Recall curve: the hypothesized Area under the Precision-Recall curve (the AUPRC expected to be found in the study).
  • Desired Lower or Upper bound of the confidence interval: together with the hypothesized AUPRC, this value defines the desired width of the confidence interval. If you aim to show that the AUPRC is larger than a given value, enter that value as the Lower bound; if you aim to show that the AUPRC is smaller than a given value, enter that value as the Upper bound.
  • Ratio of sample sizes in negative / positive groups: enter the desired ratio of the number of negative cases to the number of positive cases. Enter 1 if both groups should have an equal number of cases; enter 2 if you want twice as many cases in the negative group as in the positive group.
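As a rough sketch, the four inputs above can be turned into group sizes with the logit interval. This assumes, rather than confirms, that MedCalc inverts the Boyd et al. (2013) logit interval with n counted as positive cases, uses a two-sided critical value, and rounds up; the function `auprc_sample_size` and its parameters are hypothetical names. Under these assumptions it reproduces the worked example on this page.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def auprc_sample_size(
    confidence: float, auprc: float, bound: float, ratio: float
) -> tuple[int, int, int]:
    """Hypothetical helper returning (n_positive, n_negative, total).

    Assumes the logit interval of Boyd et al. (2013):
        logit(auprc) +/- z / sqrt(n_pos * auprc * (1 - auprc)),
    where n_pos is the number of positive cases, and assumes rounding up.
    """
    if not (0.0 < auprc < 1.0 and 0.0 < bound < 1.0):
        raise ValueError("AUPRC and bound must lie strictly between 0 and 1")
    if bound == auprc:
        raise ValueError("bound must differ from the hypothesized AUPRC")
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 200.0)
    # Desired half-width on the logit scale; works for a lower or an upper bound.
    half_width = abs(logit(auprc) - logit(bound))
    # Solve z / sqrt(n * A * (1 - A)) = half_width for n and round up.
    n_pos = math.ceil((z / half_width) ** 2 / (auprc * (1.0 - auprc)))
    # The negative group is the positive group scaled by the requested ratio.
    n_neg = math.ceil(n_pos * ratio)
    return n_pos, n_neg, n_pos + n_neg


print(auprc_sample_size(95, 0.55, 0.40, 9))  # prints (43, 387, 430)
```

Under this interval the ratio only scales the negative group: the required number of positive cases depends on the AUPRC, the bound and the confidence level alone.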

Example

Sample size calculation for the Area under Precision-Recall curve.

Suppose you want to show that an AUPRC of 0.55 for a particular test is significantly higher than 0.4. Enter 0.55 for Area under Precision-Recall curve and 0.4 for Desired Lower bound of the confidence interval. Because you expect to include 9 times as many negative cases as positive cases, enter 9 for Ratio of sample sizes in negative / positive groups.

Click Calculate.

In the example, 43 cases are required in the positive group and 387 in the negative group, for a total of 430 cases.
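Assuming the logit interval of Boyd et al. (2013), with n the number of positive cases and z ≈ 1.96 for a 95% confidence level, these numbers can be checked by hand (the exact procedure MedCalc uses is not stated on this page):

```latex
\ln\frac{0.55}{0.45} \approx 0.2007, \qquad
\ln\frac{0.40}{0.60} \approx -0.4055, \qquad
0.2007 - (-0.4055) \approx 0.6061

n = \frac{1.96^{2}}{0.55 \times 0.45 \times 0.6061^{2}}
  \approx \frac{3.8416}{0.09092} \approx 42.25
\;\Rightarrow\; 43 \text{ positives}

9 \times 43 = 387 \text{ negatives}, \qquad 43 + 387 = 430 \text{ in total}
```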

Reference

  • Boyd K, Eng KH, Page CD (2013) Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg.

See also