Calculation of Trimmed Mean, SE and confidence interval

The k-times trimmed mean is calculated as the mean of the sample after the k smallest and k largest observations are deleted from the sample.

If the number of observations to be trimmed is specified as a percentage p, then p is the percentage of observations to be trimmed at each tail of the sample. If, for example, the percentage is 20%, then MedCalc trims 20% at the lower tail and 20% at the upper tail.

With the proportion $ \gamma = p/100 $, the number of observations to trim at each tail, k, is given by $$ k = \operatorname{trunc} ( n \gamma ) $$

If $ n \gamma $ is not an integer, it is rounded down to the nearest integer. Note that SPSS, R and Excel also round down, whereas SAS rounds up.

With the observations sorted in ascending order, $ x_1 \le x_2 \le \dots \le x_n $, the k-times trimmed mean is calculated as

$$ \bar{x}_{tk} = \frac{1} {n-2k} \sum_{i=k+1}^{n-k}{x_i} $$
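The calculation of k and of the trimmed mean can be sketched in Python as follows. This is only an illustration of the formulas above, not MedCalc's own code; the function name `trimmed_mean` and the sample data are hypothetical.

```python
import math

def trimmed_mean(values, percent):
    """Return (trimmed mean, k) with `percent` % trimmed at each tail."""
    x = sorted(values)                  # x_1 <= x_2 <= ... <= x_n
    n = len(x)
    # k = trunc(n * gamma), gamma = percent / 100. Multiplying before
    # dividing keeps the product exact for whole-number percentages.
    k = math.trunc(n * percent / 100)
    if n - 2 * k < 1:
        raise ValueError("no observations left after trimming")
    kept = x[k:n - k]                   # x_{k+1} ... x_{n-k}
    return sum(kept) / len(kept), k

# Hypothetical sample with one extreme value
data = [2, 4, 5, 6, 7, 8, 9, 10, 12, 50]
mean, k = trimmed_mean(data, 20)        # k = 2, mean of 5..10 = 7.5
```

With 20% trimming and n = 10, two observations are dropped at each tail, so the extreme value 50 no longer influences the result.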

The Standard Error of the trimmed mean is based on the Winsorized mean and Winsorized sum of squared deviations (Tukey & McLaughlin, 1963). The Winsorized mean is calculated as

$$ \bar{x}_{wk} = \frac{1}{n} \left( (k+1) x_{k+1} + \sum_{i=k+2}^{n-k-1}{x_i} + (k+1) x_{n-k} \right) $$

and the Winsorized sum of squared deviations is calculated as

$$ s^{2}_{wk} = (k+1) {(x_{k+1} - \bar{x}_{wk})}^2 + \sum_{i=k+2}^{n-k-1}{({x_i}-\bar{x}_{wk})}^2 + (k+1) {(x_{n-k} - \bar{x}_{wk})}^2 $$

The Standard Error of the trimmed mean can then be calculated as:

$$ \text{SE}(\bar{x}_{tk}) = \frac{s_{wk}}{\sqrt{(n-2k)(n-2k-1)} } $$
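As a minimal sketch of these three formulas, the Python snippet below Winsorizes a sample (the k smallest values are replaced by $x_{k+1}$ and the k largest by $x_{n-k}$) and derives the standard error of the trimmed mean. The function name `winsorized_se` and the data are hypothetical and serve only as an illustration.

```python
import math

def winsorized_se(values, k):
    """Return (Winsorized mean, Winsorized SSD, SE of the trimmed mean)."""
    x = sorted(values)
    n = len(x)
    h = n - 2 * k                       # observations left after trimming
    if h < 2:
        raise ValueError("need at least 2 observations after trimming")
    # Winsorized sample: x_{k+1} appears k+1 times, x_{n-k} appears k+1 times
    w = [x[k]] * k + x[k:n - k] + [x[n - k - 1]] * k
    w_mean = sum(w) / n
    ssd = sum((v - w_mean) ** 2 for v in w)
    se = math.sqrt(ssd / (h * (h - 1)))
    return w_mean, ssd, se

# Hypothetical sample, k = 2 (20% of n = 10)
data = [2, 4, 5, 6, 7, 8, 9, 10, 12, 50]
w_mean, ssd, se = winsorized_se(data, 2)
```

For this sample the Winsorized values are 5, 5, 5, 6, 7, 8, 9, 10, 10, 10, which gives a Winsorized mean of 7.5 and a Winsorized sum of squared deviations of 42.5.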

The confidence interval for the trimmed mean is defined as

$$ \bar{x}_{tk} \pm t_{(1- \frac{\alpha}{2}, n-2k-1)} \text{SE}(\bar{x}_{tk}) $$

Comparison of 2 independent trimmed means, the Yuen-Welch test

These calculations are based on the method given by Yuen, 1974 (see Wilcox, 2022).

Yuen's test statistic is

$$ T = \frac{ \bar{x}_{tk1} - \bar{x}_{tk2} } { \sqrt {d_1 + d_2 } } $$

where $ \bar{x}_{tk1} $ and $ \bar{x}_{tk2} $ are the trimmed means of the 2 samples, and $d_1$ and $d_2$ are the squared standard errors of these trimmed means.

The estimated number of degrees of freedom is:

$$ df = \frac {(d_1+d_2)^2} { \frac{d_1^2}{h_1-1}+\frac{d_2^2}{h_2-1} } $$

where $ h_j = n_j-2k_j$ is the number of observations left after trimming.
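The test statistic and its degrees of freedom can be sketched as below. The sketch assumes that the numerator is the difference between the two trimmed means and that each $d_j$ is the squared standard error obtained from the Winsorized sum of squared deviations, as in the single-sample case. The names `_trim_stats` and `yuen_statistic` and the data are hypothetical; the P-value lookup in the t-distribution is left to a statistics library and is not shown.

```python
import math

def _trim_stats(values, percent):
    """Return (trimmed mean, squared SE d, h) for one sample."""
    x = sorted(values)
    n = len(x)
    k = math.trunc(n * percent / 100)
    h = n - 2 * k
    if h < 2:
        raise ValueError("need at least 2 observations after trimming")
    w = [x[k]] * k + x[k:n - k] + [x[n - k - 1]] * k   # Winsorized sample
    w_mean = sum(w) / n
    ssd = sum((v - w_mean) ** 2 for v in w)
    t_mean = sum(x[k:n - k]) / h
    return t_mean, ssd / (h * (h - 1)), h

def yuen_statistic(sample1, sample2, percent=20):
    """Return (T, df) for two independent samples."""
    m1, d1, h1 = _trim_stats(sample1, percent)
    m2, d2, h2 = _trim_stats(sample2, percent)
    t_stat = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t_stat, df

# Hypothetical samples: the second is the first shifted by 3
a = [2, 4, 5, 6, 7, 8, 9, 10, 12, 50]
b = [5, 7, 8, 9, 10, 11, 12, 13, 15, 53]
t_stat, df = yuen_statistic(a, b, 20)
```

Because the two hypothetical samples have identical spread, $d_1 = d_2$ and the estimated degrees of freedom reduce to $2(h-1) = 10$.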

The P-value is taken from the t-distribution with df degrees of freedom.

The 95% confidence interval for the difference is

$$ (\bar{x}_{tk1} - \bar{x}_{tk2}) \pm t \sqrt {d_1 + d_2 } $$

where t is the $ 1- \alpha / 2 $ quantile of the t-distribution with df degrees of freedom, with $ \alpha = 0.05 $.

Comparison of the trimmed means of paired samples

These calculations are based on the method given by Wilcox, 2022.

The squared standard error of the difference between the trimmed means $ \bar{x}_{tk1} - \bar{x}_{tk2} $ is estimated by: $$ \frac {1}{h(h-1)} \left\{ \sum { (x_{1i}-\bar{x}_{wk1})^2 } + \sum { (x_{2i}-\bar{x}_{wk2})^2 } - 2 \sum { (x_{1i}-\bar{x}_{wk1})(x_{2i}-\bar{x}_{wk2}) } \right\} $$

where $ h = n-2k$ is the number of observations left after trimming and, following Wilcox, the sums run over all n pairs after each sample has been Winsorized separately.

Letting

$$ d_j = \frac {1}{h(h-1)} \sum { (x_{ji}-\bar{x}_{wkj})^2 } $$

and

$$ d_{12} = \frac {1}{h(h-1)} \sum { (x_{1i}-\bar{x}_{wk1})(x_{2i}-\bar{x}_{wk2}) } $$

the test statistic T is given by $$ T = \frac{ \bar{x}_{tk1} - \bar{x}_{tk2} } { \sqrt {d_1 + d_2 - 2d_{12}} } $$

The P-value is taken from the t-distribution with h−1 degrees of freedom.
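A minimal Python sketch of the paired calculation follows. It assumes, in line with Wilcox's description, that each sample is Winsorized separately while the pairing is kept, that the sums run over all n Winsorized pairs, and that the numerator is the difference between the two trimmed means. The names `winsorize` and `paired_trimmed_statistic` and the data are hypothetical.

```python
import math

def winsorize(values, k):
    """Clamp values to [x_{k+1}, x_{n-k}] while keeping the original order."""
    s = sorted(values)
    lo, hi = s[k], s[len(s) - k - 1]
    return [min(max(v, lo), hi) for v in values]

def paired_trimmed_statistic(x1, x2, percent=20):
    """Return (T, degrees of freedom) for paired samples."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have equal length")
    n = len(x1)
    k = math.trunc(n * percent / 100)
    h = n - 2 * k
    if h < 2:
        raise ValueError("need at least 2 observations after trimming")
    w1, w2 = winsorize(x1, k), winsorize(x2, k)
    wm1, wm2 = sum(w1) / n, sum(w2) / n          # Winsorized means
    c = 1 / (h * (h - 1))
    d1 = c * sum((a - wm1) ** 2 for a in w1)
    d2 = c * sum((b - wm2) ** 2 for b in w2)
    d12 = c * sum((a - wm1) * (b - wm2) for a, b in zip(w1, w2))
    tm1 = sum(sorted(x1)[k:n - k]) / h           # trimmed means
    tm2 = sum(sorted(x2)[k:n - k]) / h
    t_stat = (tm1 - tm2) / math.sqrt(d1 + d2 - 2 * d12)
    return t_stat, h - 1

# Hypothetical paired measurements
before = [2, 4, 5, 6, 7, 8, 9, 10, 12, 50]
after = [3, 4, 7, 5, 9, 8, 12, 10, 11, 40]
t_stat, df = paired_trimmed_statistic(before, after, 20)
```

Clamping rather than sorting in `winsorize` matters here: the cross-product term $d_{12}$ is only meaningful if each Winsorized value stays paired with its original partner.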

The 95% confidence interval for the difference is

$$ (\bar{x}_{tk1} - \bar{x}_{tk2}) \pm t \sqrt {d_1 + d_2 - 2d_{12}} $$

where t is the $ 1- \alpha / 2 $ quantile of the t-distribution with h−1 degrees of freedom, with $ \alpha = 0.05 $.

References

Tukey JW, McLaughlin DH (1963) Less Vulnerable Confidence and Significance Procedures for Location Based on a Single Sample: Trimming/Winsorization 1. Sankhya A, 25:331–352.
Wilcox RR (2022) Introduction to robust estimation and hypothesis testing. 5th ed. Elsevier Academic Press.
Yuen KK (1974) The two-sample trimmed t for unequal population variances. Biometrika 61:165–170.