Regression
Description

Regression analysis is a statistical method used to describe the relationship between two variables and to predict one variable from another (if you know one variable, how well can you predict a second variable?).

Whereas for correlation the two variables need to have a Normal distribution, in regression analysis only the dependent variable Y should have a Normal distribution. The variable X does not need to be a random sample with a Normal distribution (the values for X can be chosen by the experimenter). However, the variability of Y should be the same for each value of X.

Required input

When you select Regression in the menu, a dialog box appears on the screen.
In this dialog box you identify 2 variables, which you can select from the variables list.

Optionally, you may also enter selection criteria in order to include only a selected subgroup of cases in the statistical analysis.

Finally, a regression equation (regression model, equation of approximating curve) has to be selected. The program offers a choice of 5 different equations:
where X represents the independent variable and Y the dependent variable. The coefficients a, b and c are calculated by the program using the method of least squares.

Results

The following statistics will be displayed in the results window:
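Sample size: the number of data pairs n.

Coefficient of determination R²: this is the proportion of the variation in the dependent variable explained by the regression model, and is a measure of the goodness of fit of the model. It can range from 0 to 1, and is calculated as follows:

$$R^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

where Y are the observed values for the dependent variable, \bar{Y} is the mean of the observed values, and \hat{Y} are the values predicted by the regression equation.

Residual standard deviation: the standard deviation of the residuals (residuals = differences between observed and predicted values). It is calculated as follows:

$$s_{res} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

The denominator n - 2 applies to an equation with two coefficients (a and b); with a third coefficient c, as for the parabola, the residual degrees of freedom are n - 3.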
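As a minimal sketch of the two statistics described above, the following Python code fits a straight line Y = a + b X by least squares and then computes R² and the residual standard deviation. It assumes the standard textbook definitions (R² as explained variation divided by total variation, and residual degrees of freedom equal to n minus the number of fitted coefficients); the function names and the data are hypothetical and are not taken from the program.

```python
import math


def fit_line(x, y):
    """Least-squares estimates (a, b) for the straight line Y = a + b X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def goodness_of_fit(x, y, a, b, n_coefficients=2):
    """Return (R squared, residual standard deviation) for Y = a + b X."""
    n = len(x)
    mean_y = sum(y) / n
    predicted = [a + b * xi for xi in x]
    # Explained variation: predicted values around the mean of Y.
    ss_explained = sum((pi - mean_y) ** 2 for pi in predicted)
    # Total variation: observed values around the mean of Y.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Unexplained variation: observed minus predicted values.
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    r_squared = ss_explained / ss_total
    s_res = math.sqrt(ss_residual / (n - n_coefficients))
    return r_squared, s_res


# Hypothetical data pairs, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
r2, s_res = goodness_of_fit(x, y, a, b)
print(f"Y = {a:.3f} + {b:.3f} X, R2 = {r2:.4f}, residual SD = {s_res:.4f}")
```

For the other equations offered by the program, the same least-squares principle applies, but the details (for example, on which scale the residuals are measured) are not covered by this sketch.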
The residual standard deviation is sometimes called the Standard error of estimate (Spiegel, 1961).

The equation of the regression curve: the selected equation with the calculated values for a and b (and for a parabola a third coefficient c). E.g. Y = a + b X

Next, the standard errors are given for the intercept (a) and the slope (b), followed by the t-value and the P-value for the hypothesis that these coefficients are equal to 0. If the P-values are low (e.g. less than 0.05), then you can conclude that the coefficients are different from 0.

Note that when you use the regression equation for prediction, you may only apply it to values in the range of the actual observations. E.g. when you have calculated the regression equation for height and weight for school children, this equation cannot be applied to adults.

Analysis of variance: the analysis of variance table divides the total variation in the dependent variable into two components, one which can be attributed to the regression model (labeled Regression) and one which cannot (labeled Residual). If the significance level for the F-test is small (less than 0.05), then the hypothesis that there is no (linear) relationship can be rejected.

Presentation of results

If the analysis shows that the relationship between the two variables is too weak to be of practical help, then there is little point in quoting the equation of the fitted line or curve. If you give the equation, you should also report the standard error of the slope, together with the corresponding P-value. The residual standard deviation should also be reported (Altman, 1980). The number of decimal places of the regression coefficients should correspond to the precision of the raw data.

The accompanying scatter diagram should include the fitted regression line when this is appropriate. This figure can also include the 95% confidence interval, or the 95% prediction interval, which can be more informative, or both. The legend of the figure must clearly identify the interval that is represented.
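The quantities to be reported (standard errors and t-values of the coefficients, the F-ratio of the analysis of variance, and the 95% confidence and prediction intervals) can be sketched for the straight-line case as follows. This is an illustration using the standard textbook formulas, not the program's own implementation. The function names and the data are hypothetical, and the critical t-value must be supplied by the user (here 3.182, the two-sided 95% value for 3 degrees of freedom from a t-table), because the Python standard library has no t-distribution.

```python
import math


def line_regression_summary(x, y):
    """Fit Y = a + b X by least squares; return the statistics in a dict."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Analysis of variance: total = regression + residual.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_residual
    df_residual = n - 2
    s_res = math.sqrt(ss_residual / df_residual)

    # Standard errors of the slope and the intercept.
    se_b = s_res / math.sqrt(sxx)
    se_a = s_res * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {
        "n": n, "a": a, "b": b, "mean_x": mean_x, "sxx": sxx,
        "s_res": s_res, "se_a": se_a, "se_b": se_b,
        "t_a": a / se_a, "t_b": b / se_b,
        # The regression component has 1 degree of freedom for a straight line.
        "f_ratio": (ss_regression / 1) / (ss_residual / df_residual),
    }


def intervals_at(summary, x0, t_crit):
    """Confidence and prediction interval for Y at X = x0.

    t_crit is the two-sided critical t-value for n - 2 degrees of freedom.
    """
    s = summary
    y0 = s["a"] + s["b"] * x0
    core = 1 / s["n"] + (x0 - s["mean_x"]) ** 2 / s["sxx"]
    half_ci = t_crit * s["s_res"] * math.sqrt(core)      # mean of Y at x0
    half_pi = t_crit * s["s_res"] * math.sqrt(1 + core)  # a single new Y at x0
    return (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)


# Hypothetical data pairs, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

summary = line_regression_summary(x, y)
conf_int, pred_int = intervals_at(summary, x0=3, t_crit=3.182)
print("slope SE:", summary["se_b"], "t:", summary["t_b"])
print("F-ratio:", summary["f_ratio"])
print("95% CI:", conf_int, "95% prediction interval:", pred_int)
```

The prediction interval is always wider than the confidence interval, because it covers a single new observation and not the mean of Y at the given X. The P-values for the t-values and the F-ratio would be read from the corresponding distribution tables or obtained from a statistics library.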