Transforming data to normality

Most statistical methods (the parametric methods) assume that the sample is drawn from a population in which the values follow a Normal distribution. One of the first steps in the statistical analysis of your data is therefore to check the distribution of each variable.

In the summary statistics, MedCalc can automatically perform a Chi-squared test or Kolmogorov-Smirnov test of the assumption that the data are normally distributed. However, these tests can only be performed when the sample size is large enough. If the test cannot be performed, the symmetry and peakedness of the distribution should be judged from the histogram: the Normal distribution is symmetrical and neither very peaked nor very flat-topped. Deviation from the Normal distribution can also be assessed from the cumulative frequency plot.
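Symmetry and peakedness can also be expressed as numbers: skewness (close to 0 for a symmetrical distribution) and excess kurtosis (close to 0 for a Normal distribution, negative for a flat-topped one, positive for a peaked one). The following Python sketch is only an illustration outside MedCalc. The function name, the sample values and the use of simple population moment formulas are assumptions, not MedCalc's own calculation.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments.

    Assumption: plain moment formulas without small-sample correction.
    """
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Hypothetical example values, chosen only to illustrate the two shapes
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 3, 5, 20]

print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_skewed))
```

A clearly positive skewness, as for the second list, indicates a long tail to the right.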

The following graph is a histogram of data that are not normally distributed but show positive skewness (skewed to the right).

Histogram of data with positive skewness.

This histogram is typical for distributions that will benefit from a logarithmic transformation.

Next follows the graph for the same data after logarithmic transformation. The transformation was obtained by entering LOG(FSH) instead of FSH in the dialog box.

Data with positive skewness after Log transformation.
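The effect of a logarithmic transformation on skewness can be illustrated outside MedCalc with a short Python sketch. The values below are an idealised, hypothetical sample (each value is double the previous one), not the FSH data shown in the graphs, and the helper function is an assumption made for this example. A base-10 logarithm is used here; the base changes only the scale of the transformed values, not the shape of their distribution. Note that a logarithm can only be taken of values greater than zero.

```python
import math


def skewness(values):
    """Skewness from population moments (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: each value is twice the previous one
raw = [1, 2, 4, 8, 16, 32, 64]

# Logarithmic transformation; all values must be greater than zero
logged = [math.log10(x) for x in raw]

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

For this idealised sample the transformed values are evenly spaced, so the skewness after transformation is essentially zero. Real data will usually show a reduction rather than a complete removal of skewness.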

Other spreadsheet functions that can be useful for transformation of data to Normality are:

SQRT(var) : square root transformation

SQRT(SQRT(var)) : equivalent to var^(1/4)

var^(1/3) : cube root transformation (^ is the symbol for "to the power of")

1/var : reciprocal transformation

The effect of these functions on the distribution of a variable can be evaluated by plotting a cumulative frequency graph.
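As a rough illustration of the transformations listed above, the following Python sketch applies each of them to a small sample and computes the points of a cumulative frequency graph. It does not reproduce MedCalc's implementation: the dictionary keys simply mirror the spreadsheet formulas, and the function name and sample values are hypothetical. The sketch assumes all values are greater than zero.

```python
import math

# Python equivalents of the spreadsheet formulas (assumed mapping)
TRANSFORMS = {
    "LOG(var)": math.log10,
    "SQRT(var)": math.sqrt,
    "SQRT(SQRT(var))": lambda x: math.sqrt(math.sqrt(x)),  # var^(1/4)
    "var^(1/3)": lambda x: x ** (1.0 / 3.0),
    "1/var": lambda x: 1.0 / x,
}


def cumulative_frequency(values):
    """Return sorted (value, cumulative relative frequency) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Hypothetical positive sample values
sample = [1.0, 4.0, 16.0, 81.0]

for name, func in TRANSFORMS.items():
    transformed = [func(x) for x in sample]
    print(name, cumulative_frequency(transformed))
```

Plotting the pairs for each transformation gives the cumulative frequency graph for that transformed variable. Note that the reciprocal transformation reverses the order of the values, so the largest original value becomes the smallest transformed value.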