Summary statistics
Description
Calculates summary statistics: mean, median, standard deviation, percentiles, etc.
Required input
In the Summary statistics dialog box you select the variable of interest. You can also enter a selection criterion in the Select field to include only a selected subgroup of cases, as described in the Introduction part of this manual.
You can click the Options
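Sample size: the number of cases n is the number of numeric entries for the variable that fulfil the selection criterion.
Lowest value and highest value: the lowest and highest value of all observations (range).
Arithmetic mean: the arithmetic mean is the sum of all observations divided by the number of observations:
$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$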
95% confidence interval (CI) for the mean: this is a range of values, calculated using the method described later (see Standard Error of the Mean), which contains the population mean with a 95% probability.
Median: when you have n observations, and these are sorted from smaller to larger, then the median is equal to the value with order number (n+1)/2. The median is equal to the 50th percentile. If the distribution of the data is Normal, then the median is equal to the arithmetic mean. The median is not sensitive to extreme values or outliers, and therefore it may be a better measure of central tendency than the arithmetic mean.
95% confidence interval (CI) for the median: this is a range of values that contains the population median with a 95% probability (Campbell & Gardner, 1988). This 95% confidence interval can only be calculated when the sample size is not too small.
Variance: the variance is the mean of the square of the differences of all values with the arithmetic mean. With x_i the individual observations and \bar{x} their arithmetic mean, the variance (s²) is calculated using the formula:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
Standard deviation: the standard deviation (s or SD) is the square root of the variance, and is a measure of the spread of the data:
$$ s = \sqrt{s^2} $$
When the distribution of the observations is Normal, then 95% of all observations are located in the interval mean - 1.96 SD to mean + 1.96 SD (for other values see table: Values of the Normal distribution).
This interval should not be confused with the smaller 95% confidence interval for the mean. The interval mean - 1.96 SD to mean + 1.96 SD represents a descriptive 95% confidence range for the individual observations, whereas the 95% CI for the mean represents the statistical uncertainty of the arithmetic mean.
Relative standard deviation (RSD): this is the standard deviation divided by the mean. If appropriate, this number can be expressed as a percentage by multiplying it by 100 to obtain the coefficient of variation.
Standard error of the mean (SEM): this is calculated by dividing the standard deviation by the square root of the sample size:
$$ \mathrm{SEM} = \frac{s}{\sqrt{n}} $$
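The definitions above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the sample values are invented, and the variance is assumed to use the n - 1 divisor (the usual sample variance). The multiplier 1.96 is recovered from the standard Normal distribution rather than typed in.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample, invented for illustration only.
values = [23.1, 25.4, 26.0, 24.8, 27.3, 25.9, 22.7, 28.1]

n = len(values)                          # sample size
mean = statistics.fmean(values)          # arithmetic mean
variance = statistics.variance(values)   # sample variance (assumed divisor: n - 1)
sd = math.sqrt(variance)                 # standard deviation
rsd = sd / mean                          # relative standard deviation
cv_percent = 100 * rsd                   # coefficient of variation, in percent
sem = sd / math.sqrt(n)                  # standard error of the mean

# Two-sided 95% multiplier of the standard Normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Descriptive 95% range for individual observations (not the CI for the mean).
obs_low, obs_high = mean - z * sd, mean + z * sd

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"RSD = {rsd:.3f} (CV = {cv_percent:.1f}%)")
print(f"95% of observations expected in {obs_low:.2f} to {obs_high:.2f}")
```

Because the SEM divides the SD by the square root of n, it is always smaller than the SD for n > 1, which is why the confidence interval for the mean is narrower than the range for the individual observations.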
The SEM is used to calculate confidence intervals for the mean. When the distribution of the observations is Normal, or approximately Normal, then there is 95% confidence that the population mean is located in the interval
$$ \bar{x} \pm t \times \mathrm{SEM} $$
where \bar{x} is the arithmetic mean and t is taken from the t-distribution with n - 1 degrees of freedom.
Skewness: the coefficient of Skewness is a measure for the degree of symmetry in the variable distribution. If the corresponding P-value is low (P<0.05) then the variable symmetry is significantly different from that of a Normal distribution, which has a coefficient of Skewness equal to 0 (Sheskin, 2011).
Kurtosis: the coefficient of Kurtosis is a measure for the degree of peakedness/flatness in the variable distribution. If the corresponding P-value is low (P<0.05) then the variable peakedness is significantly different from that of a Normal distribution, which has a coefficient of Kurtosis equal to 0 (Sheskin, 2011).
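To make the two coefficients concrete, here is a minimal Python sketch using the plain moment-based definitions. This is an assumption made for illustration: the program may apply a small-sample correction, and the associated P-values are not reproduced here. The function names and sample data are hypothetical. Kurtosis is expressed as excess kurtosis, so that a Normal distribution gives 0, as in the text.

```python
import statistics

def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k over the sample."""
    m = statistics.fmean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based coefficient of skewness (0 for a symmetric sample)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so that a Normal distribution gives 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0

# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A positive skewness indicates a longer tail to the right; a negative excess kurtosis indicates a flatter distribution than the Normal.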
Test for Normal distribution: the result of this test is expressed as 'accept Normality' or 'reject Normality', with P value. If P is higher than 0.05, it may be assumed that the data have a Normal distribution and the conclusion 'accept Normality' is displayed. If the P value is less than 0.05, then the hypothesis that the distribution of the observations in the sample is Normal should be rejected, and the conclusion 'reject Normality' is displayed. In the latter case, the sample cannot accurately be described by arithmetic mean and standard deviation, and such samples should not be submitted to any parametric statistical test or procedure, such as a t-test. To test the possible difference between samples that are not Normally distributed, the Wilcoxon test can be used, and correlation can be estimated by means of rank correlation.
When the sample size is small, it may not be possible to perform the selected test and an appropriate message will appear. In this case you can visually evaluate the symmetry and peakedness of the distribution using the histogram or cumulative frequency distribution.
Percentiles (or "centiles"): when you have n observations, and these are sorted from smaller to larger, then the p-th percentile is equal to the observation with rank number (Lentner, 1982; Schoonjans et al., 2011):
When the rank number R(p) is a whole number, then the percentile coincides with the sample value; if R(p) is a fraction, then the percentile lies between the values with ranks adjacent to R(p) and in this case MedCalc uses interpolation to calculate the percentile. The formula for R(p) is only valid when
E.g. the 5th and 95th percentiles can only be estimated when n ≥ 20.
Therefore it makes no sense to quote the 5th and 95th percentiles when the sample size is less than 20. In this case it is advised to quote the 10th and 90th percentiles, provided the sample size is not less than 10.
The percentiles can be interpreted as follows: p % of the observations lie below the p-th percentile, e.g. 10% of the observations lie below the 10th percentile.
The 25th percentile is called the 1st quartile, the 50th percentile is the 2nd quartile (and equals the Median), and the 75th percentile is the 3rd quartile. The numerical difference between the 25th and 75th percentiles is the interquartile range.
Between the 2.5th and 97.5th percentiles lie 95% of the values, and this range is called the 95% central range. The 90% central range is defined by the 5th and 95th percentiles, and the 10th and 90th percentiles define the 80% central range.
Log transformation
If the option Log transformation was selected, the program will display the back-transformed results. The back-transformed mean is named the Geometric mean. Variance, Standard deviation and Standard error of the mean cannot be back-transformed meaningfully and are not reported.
Presentation of results
The description of the data in a publication will include the sample size and arithmetic mean. The standard deviation can be given as an indicator of the variability of the data: the mean was 25.6 mm (SD 3.2 mm). The standard error of the mean can be given to show the precision of the mean: the mean was 25.6 mm (SE 1.6 mm).
When you want to make an inference about the population mean, you can give the mean and the 95% confidence interval of the mean: the mean was 25.6 (95% CI 22.4 to 28.8).
If the distribution of the variable is positively skewed, then a mathematical transformation of the data may be applied to obtain a Normal distribution, e.g. a logarithmic or square root transformation. After calculations you can convert the results back to the original scale.
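As a small illustration of converting results back to the original scale, the following Python sketch computes a geometric mean by averaging base-10 logarithms and taking the antilog. The data are invented, the helper name antilog10_interval is hypothetical, and the choice of base 10 is an assumption made for the example.

```python
import math
import statistics

# Hypothetical positive, right-skewed sample, invented for illustration only.
values = [12.0, 15.5, 19.8, 25.6, 33.0, 41.7]

# Work on the log scale (assumed base 10), then convert back.
logs = [math.log10(x) for x in values]
mean_log = statistics.fmean(logs)        # mean on the log scale
geometric_mean = 10 ** mean_log          # back-transformed mean

def antilog10_interval(low, high):
    """Back-transform the limits of an interval computed on the log10 scale."""
    return 10 ** low, 10 ** high

print(f"arithmetic mean = {statistics.fmean(values):.2f}")
print(f"geometric mean  = {geometric_mean:.2f}")
```

The interval limits are back-transformed one by one, so an interval that is symmetrical on the log scale is no longer symmetrical around the geometric mean on the original scale.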
After such a transformation it is useless to report the back-transformed standard deviation or standard error of the mean. Instead, you can antilog the confidence interval if a logarithmic transformation was applied, or square the confidence interval if a square root transformation was applied (Altman et al., 1983). The resulting confidence interval will then not be symmetrical, reflecting the shape of the distribution. If, for example, after logarithmic transformation of the data, the mean is 1.408 and the 95% confidence interval is 1.334 to 1.482, then you will antilog these statistics and report: the mean was 25.6 mm (95% CI 21.6 to 30.3).
If the distribution of the variable is not Normal even after logarithmic or other transformation, then it is better to report the median and a percentile range, e.g. the interquartile range, or the 90% or 95% central range: the median was 25.6 mm (95% central range 19.6 to 33.5 mm). The sample size should be taken into consideration when you decide whether to use the interquartile range or the 90% or 95% central range (see percentiles) (Altman, 1980).
The precision of the reported statistics should correspond to the precision of the original data. The mean and 95% CI can be given to one decimal place more than the raw data; the standard deviation and standard error can be given with one extra decimal (Altman et al., 1983).
Finally, the summary statistics in the text or table may be complemented by a graph (see distribution plots).
Literature