Sample size: Area under Precision-Recall curve
Command: Sample size > Confidence Interval estimation & Precision > Area under Precision-Recall curve
Description
Calculates the sample size required to estimate a confidence interval for the Area under the Precision-Recall curve (AUPRC).
The calculation is based on the logit method of Boyd et al. (2013). See Precision-recall curve.
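As a rough guide to what the logit method implies for sample size, the following is a sketch and not necessarily the exact formula used by the program. It assumes the logit interval of Boyd et al. (2013), in which the standard error on the logit scale depends on the number of positive cases. The symbols are introduced here for illustration only: \theta is the hypothesized AUPRC, b the desired lower or upper bound, z the standard normal quantile for the chosen confidence level, n_+ the number of positive cases, and r the negative / positive ratio.

```latex
% Logit interval (Boyd et al., 2013), with n_+ positive cases:
\eta = \ln\frac{\theta}{1-\theta}, \qquad
\tau = \frac{1}{\sqrt{n_+\,\theta\,(1-\theta)}}, \qquad
\text{CI} = \left[ \frac{e^{\eta - z\tau}}{1 + e^{\eta - z\tau}},\;
                   \frac{e^{\eta + z\tau}}{1 + e^{\eta + z\tau}} \right]

% Setting one limit equal to the desired bound b and solving for n_+:
n_+ = \left\lceil
        \frac{z^{2}}{\theta\,(1-\theta)\,
        \left( \ln\frac{\theta}{1-\theta} - \ln\frac{b}{1-b} \right)^{2}}
      \right\rceil,
\qquad
n_- = \lceil r \, n_+ \rceil
```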
Required input
- Confidence level (%): select the confidence level: 90, 95 or 99%. A 95% confidence level (the value for a 95% confidence interval) is the most common selection. You can enter a different confidence level if required.
- Area under Precision-Recall curve: the hypothesized Area under the Precision-Recall curve (the AUPRC expected to be found in the study).
- Desired Lower or Upper bound of the confidence interval: this value defines the desired width of the confidence interval. If your aim is to show that the AUPRC is larger than a certain value, enter the value of the Lower bound; if your aim is to show that the AUPRC is lower than a certain value, enter the value for the Upper bound.
- Ratio of sample sizes in negative / positive groups: enter the desired ratio of negative to positive cases. If you want both groups to have an equal number of cases, enter 1; if you want twice as many cases in the negative group as in the positive group, enter 2.
Example
Suppose you want to show that the AUPRC of 0.55 for a particular test is significantly higher than 0.4. Enter 0.55 for Area under Precision-Recall curve and 0.4 for Desired Lower bound of the confidence interval. You expect to include 9 times as many negative cases as positive cases, so enter 9 for the Ratio of sample sizes in negative / positive groups.
Click Calculate.
In the example, 43 cases are required in the positive group and 387 in the negative group, giving a total of 430 cases.
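The figures in this example can be reproduced with a short script. This is a minimal sketch, not the program's own code: it assumes the logit interval of Boyd et al. (2013) with the standard error based on the number of positive cases, and that both group sizes are rounded up. The function name `auprc_sample_size` and its parameters are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist


def auprc_sample_size(auprc, bound, ratio_neg_pos, confidence=0.95):
    """Return (positives, negatives, total) so that one limit of the
    logit confidence interval around `auprc` reaches `bound`.

    Assumption: logit-scale standard error is 1 / sqrt(n_pos * auprc * (1 - auprc)),
    as in the logit interval described by Boyd et al. (2013).
    """
    if not (0 < auprc < 1 and 0 < bound < 1) or auprc == bound:
        raise ValueError("auprc and bound must lie in (0, 1) and differ")

    # Two-sided standard normal quantile, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    def logit(p):
        return math.log(p / (1 - p))

    # Largest logit-scale standard error that still reaches the bound.
    tau = abs(logit(auprc) - logit(bound)) / z

    # Solve tau = 1 / sqrt(n_pos * auprc * (1 - auprc)) for n_pos, rounding up.
    n_pos = math.ceil(1 / (tau ** 2 * auprc * (1 - auprc)))
    n_neg = math.ceil(ratio_neg_pos * n_pos)
    return n_pos, n_neg, n_pos + n_neg


print(auprc_sample_size(0.55, 0.40, 9))  # (43, 387, 430)
```

Because the logit interval is symmetric on the logit scale, the same function can be used with an upper bound, for example `auprc_sample_size(0.55, 0.70, 9)`.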
Reference
- Boyd K, Eng KH, Page CD (2013) Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In: Blockeel H, Kersting K, Nijssen S, Železný F (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg.