Correlation coefficient
Command: Statistics > Correlation > Correlation coefficient
Description
Correlation analysis is used to determine whether the values of two variables are associated. The two variables should be random samples, and should have a Normal distribution (possibly after transformation).
Required input
This box is completed in the same way as the box for summary statistics, except that two variables must be selected: Variable Y and Variable X. To pick a variable from the variables list, click the button and select the variable in the list that is displayed. Then move the cursor to the Variable X field and click the button again to select the second variable.
Finally, you can select a logarithmic transformation for one or both variables to obtain Normal distributions. See Logarithmic transformation.
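As a rough illustration of what a logarithmic transformation does to the analysis, the sketch below computes the correlation coefficient on raw and on log-transformed values. The data are made up, the helper name `pearson_r` is hypothetical, and base-10 logarithms are an assumption (the correlation coefficient does not depend on the base). The sketch only shows where the transformation is applied; it does not test for Normality.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data that follow a power law: weight = 2 * length**3
length = [1.0, 2.0, 4.0, 8.0, 16.0]
weight = [2.0 * x ** 3 for x in length]

# Correlation on the raw values
r_raw = pearson_r(length, weight)

# Correlation after a logarithmic transformation of both variables
log_length = [math.log10(x) for x in length]
log_weight = [math.log10(y) for y in weight]
r_log = pearson_r(log_length, log_weight)

print(f"raw: r = {r_raw:.4f}, log-transformed: r = {r_log:.4f}")
```

The transformation is applied to the values first, and the correlation coefficient is then computed on the transformed values.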
After you click OK you obtain the requested statistics in the results window:
Results
Variable Y | WEIGHT |
---|---|
Variable X | LENGTH |
Sample size | 100 |
Correlation coefficient r | 0.4459 |
Significance level | P<0.0001 |
95% Confidence interval for r | 0.2734 to 0.5906 |
Scatter diagram |
Sample size: the number of data pairs n
Pearson's correlation coefficient r with P-value. The Pearson correlation coefficient is a number between -1 and 1 that expresses the degree to which, on average, two variables change together in a linear fashion.
If one variable increases when the other increases, there is a positive correlation and the correlation coefficient will be closer to 1. For instance, the height and age of children are positively correlated.
If one variable decreases when the other increases, there is a negative correlation and the correlation coefficient will be closer to -1.
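The sign and size of r can be illustrated with a minimal sketch that computes the coefficient as the sample covariance divided by the product of the two sample standard deviations. The function name `pearson_r` and both data sets are made up for illustration; this is the textbook definition, not necessarily the exact routine the program uses.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson r = sample covariance / (sd of x * sd of y)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))

# Made-up example: height increases with age, so r is positive
age_years = [2.0, 4.0, 6.0, 8.0, 10.0]
height_cm = [86.0, 102.0, 115.0, 128.0, 138.0]
r_pos = pearson_r(age_years, height_cm)

# Made-up example: the second variable decreases as the first increases
decreasing = [138.0, 128.0, 115.0, 102.0, 86.0]
r_neg = pearson_r(age_years, decreasing)

print(f"positive correlation: r = {r_pos:.3f}")
print(f"negative correlation: r = {r_neg:.3f}")
```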
The P-value is the probability of finding a correlation at least as strong as the one observed if the true correlation coefficient were in fact zero (null hypothesis). If this probability is lower than the conventional 5% (P<0.05), the correlation coefficient is called statistically significant.
It is, however, important not to confuse correlation with causation. When two variables are correlated, there may or may not be a causative connection, and this connection may moreover be indirect. Correlation can only be interpreted in terms of causation if the variables under investigation provide a logical (biological) basis for such interpretation.
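The P-value for r is commonly obtained from the statistic t = r·sqrt((n-2)/(1-r²)), which follows a Student t distribution with n-2 degrees of freedom under the null hypothesis. The sketch below assumes this standard test (the text above does not state which method the program uses) and evaluates the two-sided tail area by simple numerical integration. The function names are hypothetical, and the code does not handle r equal to exactly 1 or -1.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def correlation_p_value(r, n, steps=2000):
    """Return (t, two-sided P) for H0: the true correlation is zero."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule (steps must be even) for the area between 0 and |t|
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0
    # Two-sided tail area: everything outside the interval (-|t|, |t|)
    p = 2.0 * (0.5 - central_area)
    return t, max(p, 0.0)

# Values from the results table: r = 0.4459, n = 100
t_stat, p = correlation_p_value(0.4459, 100)
print(f"t = {t_stat:.2f}, significant at 5%: {p < 0.05}")
```

With the values from the results table, the resulting P-value falls below 0.0001, in line with the significance level reported there.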
95% confidence interval (CI) for the Pearson correlation coefficient: this is the range of values that contains the 'true' correlation coefficient with 95% confidence.
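A confidence interval for r is often computed with the Fisher z transformation: r is transformed with the inverse hyperbolic tangent, a Normal interval with standard error 1/sqrt(n-3) is built on that scale, and the limits are transformed back. The sketch below assumes this method; the text does not say which one the program applies, and the function name `pearson_ci` is hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for r via the Fisher z transformation.

    z_crit is the two-sided Normal critical value (1.959964 for 95%).
    """
    z = math.atanh(r)               # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    low = math.tanh(z - z_crit * se)
    high = math.tanh(z + z_crit * se)
    return low, high

# Values from the results table: r = 0.4459, n = 100
low, high = pearson_ci(0.4459, 100)
print(f"95% CI: {low:.4f} to {high:.4f}")
```

For the values in the results table, this should give limits close to the reported interval of 0.2734 to 0.5906. Note that the interval is not symmetric around r, because the back-transformation is non-linear.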
Presentation of results
Report the number of data pairs (sample size) and the correlation coefficient (to two decimal places), together with the P-value and the 95% confidence interval, for example: the correlation coefficient was 0.45 (n=100, P<0.0001, 95% CI 0.27 to 0.59).
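The reporting convention above can be sketched as a small formatting helper. The function name `format_correlation` is hypothetical, and the P-value passed in is only a placeholder for any value below 0.0001; the other numbers come from the results table.

```python
def format_correlation(r, n, p, ci_low, ci_high):
    """Format a correlation result with r and CI limits to two decimals."""
    # Very small P-values are reported as an inequality, not an exact value
    p_text = "P<0.0001" if p < 0.0001 else f"P={p:.4f}"
    return (f"the correlation coefficient was {r:.2f} "
            f"(n={n}, {p_text}, 95% CI {ci_low:.2f} to {ci_high:.2f})")

# p=0.00005 is a placeholder standing in for "below 0.0001"
summary = format_correlation(0.4459, 100, 0.00005, 0.2734, 0.5906)
print(summary)
```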
The relationship between two variables can easily be represented graphically by a scatter diagram.
Literature
- Armitage P, Berry G, Matthews JNS (2002) Statistical methods in medical research. 4th ed. Blackwell Science.
- Bland M (2000) An introduction to medical statistics. 3rd ed. Oxford: Oxford University Press.
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.