Regression analysis is a statistical method used to describe the relationship between two variables and to predict one variable from another (if you know one variable, then how well can you predict a second variable?).
Whereas for correlation the two variables need to have a Normal distribution, in regression analysis only the dependent variable Y should have a Normal distribution. The variable X does not need to be a random sample with a Normal distribution (the values for X can be chosen by the experimenter). However, the variability of Y should be the same for each value of X.
When you select Regression in the menu, a dialog box appears on the screen.
In this dialog box you identify the 2 variables. To select a variable from the variables list, click the button and select the variable in the list. Next, move the cursor to the Independent X field and again use the button to select the variable in the list.
Optionally, you may also enter selection criteria in order to include only a selected subgroup of cases in the statistical analysis. Again, you can select the button to obtain a list of selection criteria already used for the current data.
Finally, a regression equation (regression model, equation of the approximating curve) has to be selected. The program offers a choice of 5 different equations.
In these equations, X represents the independent variable and Y the dependent variable. The coefficients a, b and c are calculated by the program using the method of least squares.
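As an illustration of the method of least squares, the following minimal sketch fits the straight line Y = a + b X using the standard closed-form estimates. The function name `fit_line` and the data values are hypothetical and chosen only for this example; the sketch does not reproduce the program's own implementation.

```python
from statistics import mean

def fit_line(x, y):
    """Return least-squares estimates (a, b) for the line Y = a + b X."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of X, and sum of cross-products of X and Y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```

For these data the slope is close to 2 and the intercept close to 0.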
The following statistics will be displayed in the results window:
Sample size: the number of data pairs n
Coefficient of determination R²: this is the proportion of the variation in the dependent variable explained by the regression model, and is a measure of the goodness of fit of the model. It can range from 0 to 1, and is calculated as follows:
$$R^2 = \frac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
where Y are the observed values for the dependent variable, Ȳ is the average of the observed values and Yest are the predicted values for the dependent variable (the predicted values are calculated using the regression equation).
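The coefficient of determination can be sketched in a few lines as the explained variation divided by the total variation. The function name `r_squared`, the data and the coefficients 0.05 and 1.99 are hypothetical; the coefficients are the least-squares estimates for these particular data.

```python
from statistics import mean

def r_squared(y, y_est):
    """Proportion of the variation in y explained by the predictions y_est."""
    y_bar = mean(y)
    explained = sum((ye - y_bar) ** 2 for ye in y_est)  # variation due to the model
    total = sum((yi - y_bar) ** 2 for yi in y)          # total variation in Y
    return explained / total

# Hypothetical data and a fitted line Y = 0.05 + 1.99 X
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_est = [0.05 + 1.99 * xi for xi in x]
r2 = r_squared(y, y_est)
print(f"R2 = {r2:.4f}")
```

A value this close to 1 indicates that nearly all of the variation in Y is accounted for by the fitted line.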
Residual standard deviation: the standard deviation of the residuals (residuals = differences between observed and predicted values). It is calculated as follows (shown here for the straight line, which has two estimated coefficients):
$$s_{res} = \sqrt{\frac{\sum (Y - Y_{est})^2}{n - 2}}$$
The residual standard deviation is sometimes called the Standard error of estimate (Spiegel, 1961).
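A minimal sketch of this statistic follows. It assumes that the divisor is the number of data pairs minus the number of estimated coefficients (2 for a straight line); the function name `residual_sd`, the parameter `n_coef` and the data are hypothetical.

```python
from math import sqrt

def residual_sd(y, y_est, n_coef=2):
    """Standard deviation of the residuals (observed minus predicted values)."""
    ss_res = sum((yi - ye) ** 2 for yi, ye in zip(y, y_est))
    # Assumption: degrees of freedom = n minus the number of fitted coefficients
    return sqrt(ss_res / (len(y) - n_coef))

# Hypothetical data and a fitted line Y = 0.05 + 1.99 X
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_est = [0.05 + 1.99 * xi for xi in x]
s_res = residual_sd(y, y_est)
print(f"Residual SD = {s_res:.4f}")
```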
The equation of the regression curve: the selected equation with the calculated values for a and b (and for a parabola a third coefficient c). E.g. Y = a + b X
Next, the standard errors are given for the intercept (a) and the slope (b), followed by the t-value and the P-value for the hypothesis that these coefficients are equal to 0. If the P-values are low (e.g. less than 0.05), then you can conclude that the coefficients are different from 0.
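The standard errors and t-values for a straight-line fit can be sketched with the usual textbook formulas, as below. The function name `coefficient_tests` and the data are hypothetical. The P-value is left out because the Python standard library has no t distribution; with SciPy installed, a two-sided P-value could be obtained as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
from math import sqrt
from statistics import mean

def coefficient_tests(x, y):
    """Standard errors and t-values for intercept a and slope b of Y = a + b X."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation with n - 2 degrees of freedom
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_res = sqrt(ss_res / (n - 2))
    se_b = s_res / sqrt(sxx)
    se_a = s_res * sqrt(1 / n + x_bar ** 2 / sxx)
    # t-value = coefficient / standard error, testing the hypothesis coefficient = 0
    return {"a": a, "b": b, "se_a": se_a, "se_b": se_b,
            "t_a": a / se_a, "t_b": b / se_b, "df": n - 2}

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
res = coefficient_tests(x, y)
print(res)
```

In this example the t-value for the slope is large and the t-value for the intercept is small, so only the slope would be expected to differ clearly from 0.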
Note that when you use the regression equation for prediction, you may only apply it to values in the range of the actual observations. E.g. when you have calculated the regression equation for height and weight for school children, this equation cannot be applied to adults.
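One way to guard against such extrapolation in your own calculations is to check each new X value against the observed range before predicting. The sketch below assumes a straight-line equation; the function name `predict_in_range` and all values are hypothetical.

```python
def predict_in_range(x_new, a, b, x_observed):
    """Predict Y = a + b X, refusing values outside the observed range of X."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(f"X = {x_new} lies outside the observed range [{lo}, {hi}]")
    return a + b * x_new

# Hypothetical observations and fitted coefficients
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = predict_in_range(3.0, 0.05, 1.99, x)
print(y_pred)
```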
Analysis of variance: the analysis of variance table divides the total variation in the dependent variable into two components, one which can be attributed to the regression model (labeled Regression) and one which cannot (labeled Residual). If the significance level for the F-test is small (less than 0.05), then the hypothesis that there is no (linear) relationship can be rejected.
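The partition of the total variation can be sketched for a straight-line fit as follows. The function name `anova_table` and the data are hypothetical. The significance level of the F-test is not computed here because it requires the F distribution, which is not in the Python standard library.

```python
from statistics import mean

def anova_table(x, y):
    """Sums of squares, degrees of freedom and F-ratio for Y = a + b X."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((a + b * xi - y_bar) ** 2 for xi in x)              # Regression
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # Residual
    df_reg, df_res = 1, n - 2
    f_ratio = (ss_reg / df_reg) / (ss_res / df_res)
    return {"ss_total": ss_total, "ss_reg": ss_reg, "ss_res": ss_res,
            "df_reg": df_reg, "df_res": df_res, "F": f_ratio}

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
table = anova_table(x, y)
print(table)
```

The Regression and Residual sums of squares add up to the total, and for a straight line the F-ratio equals the square of the t-value for the slope.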
Presentation of results
If the analysis shows that the relationship between the two variables is too weak to be of practical help, then there is little point in quoting the equation of the fitted line or curve. If you give the equation, you should also report the standard error of the slope, together with the corresponding P-value. The residual standard deviation should also be reported (Altman, 1980). The number of decimal places of the regression coefficients should correspond to the precision of the raw data.
The accompanying scatter diagram should include the fitted regression line when this is appropriate. This figure can also include the 95% confidence interval, or the 95% prediction interval, which can be more informative, or both. The legend of the figure must clearly identify the interval that is represented.