# Correlation coefficient

Command: Statistics > Correlation > Correlation coefficient

## Description

Correlation analysis is used to determine whether the values of two variables are associated. The two variables should be random samples, and should have a Normal distribution (possibly after transformation).

## Required input

When you select Correlation in the menu, the following box appears on the screen:

Complete this box in the same way as the box for summary statistics, except that two variables must now be selected. To select a variable from the variables list, click the button and select the variable in the list that is displayed. Then move the cursor to the Variable X field and click the button again to select the second variable in the list.

Finally, you can select a logarithmic transformation for one or both variable(s) to obtain Normal distributions. See Logarithmic transformation.
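As an illustration of what a logarithmic transformation does to the data before the correlation is calculated, here is a minimal Python sketch. The function name `log_transform` is hypothetical, and the use of base-10 logarithms is an assumption; the base does not affect the correlation coefficient, because changing the base only rescales the transformed values.

```python
import math


def log_transform(values):
    """Return the base-10 logarithm of each value (assumed base).

    Logarithms are only defined for positive numbers, so zero or
    negative observations are rejected rather than silently dropped.
    """
    if any(v <= 0 for v in values):
        raise ValueError("logarithmic transformation requires positive values")
    return [math.log10(v) for v in values]
```

A right-skewed variable such as `[1, 10, 100]` becomes evenly spaced after the transformation.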

After you click OK you obtain the requested statistics in the results window:

## Results

Sample size: the number of data pairs n

Pearson's correlation coefficient r with P-value. The correlation coefficient is a number between -1 and 1. It expresses the degree to which, on average, two variables change together in a linear fashion.

If one variable increases when the other variable increases, then there is a positive correlation. In this case the correlation coefficient will be closer to 1. For instance, the height and age of children are positively correlated.

If one variable decreases when the other variable increases, then there is a negative correlation and the correlation coefficient will be closer to -1.
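The sign and size of r can be checked by hand with a short Python sketch of the standard Pearson formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the age and height figures are made up for illustration; this is not the program's own code.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data pairs are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: age (years) and height (cm) of five children
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]
print(round(pearson_r(ages, heights), 2))  # positive, close to 1
```

Reversing the order of one of the two lists gives a negative coefficient of the same size.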

The P-value is the probability that you would have found a correlation at least as strong as the current result if the correlation coefficient were in fact zero (null hypothesis). If this probability is lower than the conventional 5% (P<0.05), the correlation coefficient is called statistically significant.
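One common way to obtain this P-value is to convert r into a t statistic, t = r·√((n−2)/(1−r²)), with n−2 degrees of freedom, and take the two-sided tail area of Student's t distribution. The Python sketch below assumes that approach; this page does not state which method the program uses. The function names are hypothetical, and the tail area is computed by simple numerical integration (Simpson's rule) to keep the example free of external libraries.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def correlation_p_value(r, n, steps=2000):
    """Two-sided P-value for the null hypothesis that the true correlation is zero.

    steps must be even (Simpson's rule requirement).
    """
    if n < 3:
        raise ValueError("at least three data pairs are required")
    if abs(r) >= 1.0:
        return 0.0  # a perfect correlation gives an infinite t statistic
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule for the area under the density between 0 and |t|
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3
    # The two tails are what remains outside the interval (-|t|, |t|)
    return max(0.0, 1.0 - 2.0 * central_area)
```

With 12 data pairs, for example, a coefficient of about 0.576 sits right at the conventional 5% threshold, whereas with larger samples much smaller coefficients reach significance.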

It is, however, important not to confuse correlation with causation. When two variables are correlated, there may or may not be a causative connection, and this connection may moreover be indirect. Correlation can only be interpreted in terms of causation if the variables under investigation provide a logical (biological) basis for such interpretation.

95% confidence interval (CI) for the correlation coefficient: this is the range of values that contains the 'true' correlation coefficient with 95% confidence.
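A widely used way to calculate such an interval is Fisher's z transformation: r is transformed to z = atanh(r), which is approximately Normally distributed with standard error 1/√(n−3), and the limits are transformed back with tanh. The sketch below assumes that method; this page does not say which one the program applies. The function name `correlation_ci` and the sample size used in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Confidence interval for a correlation coefficient via Fisher's z transformation."""
    if n < 4:
        raise ValueError("at least four data pairs are required")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # Fisher's z transformation
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical example: r = 0.45 observed in 100 data pairs
low, high = correlation_ci(0.45, 100)
print(f"95% CI {low:.2f} to {high:.2f}")
```

The interval is not symmetric around r, because the back-transformation compresses values as they approach -1 or 1.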

## Presentation of results

Report the number of data pairs (sample size) and the correlation coefficient (two decimal places), together with the P-value and the 95% confidence interval, for example: the correlation coefficient was 0.45 (P<0.0001, 95% CI 0.27 to 0.59).
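If the results are assembled in a script, the reporting convention above can be applied with a small formatting helper. The following Python sketch is only an illustration: the function name `format_correlation_report`, the way the sample size is shown, and the 0.0001 cut-off for very small P-values are assumptions, not part of the program.

```python
def format_correlation_report(n, r, p, ci_low, ci_high):
    """Format a correlation result with r and CI limits to two decimal places."""
    # Very small P-values are reported as an inequality (assumed cut-off)
    p_text = "P<0.0001" if p < 0.0001 else f"P={p:.4f}"
    return (f"n={n}: the correlation coefficient was {r:.2f} "
            f"({p_text}, 95% CI {ci_low:.2f} to {ci_high:.2f})")


# Hypothetical values for illustration
print(format_correlation_report(100, 0.45, 0.00001, 0.27, 0.59))
```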

The relationship between two variables can easily be represented graphically by a scatter diagram.
