Multiple regression

Command: Statistics > Regression > Multiple regression

Description

Multiple regression is a statistical method used to examine the relationship between one dependent variable Y and one or more independent variables Xi. The regression parameters or coefficients bi in the regression equation

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$

are estimated using the method of least squares. In this method, the sum of the squared residuals (the differences between the observed values of the dependent variable and the regression plane) is minimized. The regression equation represents a (hyper)plane in a (k+1)-dimensional space, where k is the number of independent variables X1, X2, X3, ... Xk, and the additional dimension is for the dependent variable Y.
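
As an illustration of the least-squares step, the following Python sketch estimates the coefficients by solving the normal equations. It is not the program's own implementation: the function name `fit_least_squares` and the sample data are hypothetical, and the plain Gauss-Jordan solver is used only to keep the example free of external libraries.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    X is a list of rows (k predictor values per observation);
    y is the list of observed values of the dependent variable.
    """
    # Design matrix: a leading 1 in each row gives the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Normal equations (A'A) b = A'y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefficients = fit_least_squares(X, y)
print([round(b, 6) for b in coefficients])
```

Because the sample data lie exactly on a plane, the fitted coefficients reproduce the generating values b0 = 1, b1 = 2 and b2 = 3 up to rounding error.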

Required input

In the Multiple regression dialog box, first identify the dependent variable. For the independent variables, enter the names of the variables that you expect to influence the dependent variable.

You can click the drop-down button to obtain a list of variables. In this list, you can select a variable by clicking its name.

Options

Method: select the way independent variables are entered into the model.

  • Enter: enter all variables in the model in one single step, without checking their significance
  • Forward: enter significant variables sequentially
  • Backward: first enter all variables into the model and next remove the non-significant variables sequentially
  • Stepwise: enter significant variables sequentially; after entering a variable in the model, check and possibly remove variables that became non-significant.

Enter variable if P<

A variable is entered into the model if its associated significance level is less than this P-value.

Remove variable if P>

A variable is removed from the model if its associated significance level is greater than this P-value.
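
The interplay of the Stepwise method with the two P-value thresholds can be sketched as follows. This is only an illustration under stated assumptions, not the program's actual algorithm: the function `stepwise_select`, the callback `p_value`, the default thresholds, the `max_steps` guard and all P-values in `P_TABLE` are hypothetical.

```python
def stepwise_select(candidates, p_value, p_enter=0.05, p_remove=0.1, max_steps=100):
    """Select variables stepwise.

    p_value(variable, model) must return the P-value of `variable` when the
    variables in `model` (a list that includes it) are fitted together.
    """
    selected = []
    for _ in range(max_steps):  # guard against endless enter/remove cycles
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        # Entry step: P-value of each remaining variable if it were added.
        trial = {v: p_value(v, selected + [v]) for v in remaining}
        best = min(trial, key=trial.get)
        if not trial[best] < p_enter:
            break  # no variable satisfies "Enter variable if P<"
        selected.append(best)
        # Removal step: drop variables that became non-significant.
        while selected:
            current = {v: p_value(v, selected) for v in selected}
            worst = max(current, key=current.get)
            if not current[worst] > p_remove:
                break
            selected.remove(worst)
    return selected


# Hypothetical P-values, keyed by (variable, set of variables in the model).
P_TABLE = {
    ("age", frozenset({"age"})): 0.01,
    ("weight", frozenset({"weight"})): 0.03,
    ("height", frozenset({"height"})): 0.40,
    ("weight", frozenset({"age", "weight"})): 0.02,
    ("height", frozenset({"age", "height"})): 0.30,
    ("age", frozenset({"age", "weight"})): 0.20,
    ("height", frozenset({"weight", "height"})): 0.50,
}


def lookup_p(variable, model):
    # Stand-in for fitting the model and reading the variable's P-value.
    return P_TABLE[(variable, frozenset(model))]


chosen = stepwise_select(["age", "weight", "height"], lookup_p)
print(chosen)
```

In this made-up example, age is entered first, weight is entered next, and age is then removed because its P-value rises above the removal threshold once weight is in the model. Dropping the removal step gives a Forward-style procedure.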

Results

After clicking the OK button, the following statistics are displayed in the results window:

Sample size: the number of data records n

Coefficient of determination R2: this is the proportion of the variation in the dependent variable explained by the regression model, and is a measure of the goodness of fit of the model. It can range from 0 to 1, and is calculated as follows:

$$R^2 = \frac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

where Y are the observed values for the dependent variable, $\bar{Y}$ is the average of the observed values and Yest are the predicted values for the dependent variable (the predicted values are calculated using the regression equation).
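
A minimal Python sketch of this calculation, assuming the usual definition of R2 as the explained sum of squares divided by the total sum of squares. The function name `r_squared` and the data are hypothetical; the predicted values are the least-squares fit Yest = 1.9 X for X = 1, 2, 3, 4.

```python
def r_squared(observed, predicted):
    """Proportion of variation explained: explained SS / total SS.

    Valid when `predicted` comes from a least-squares fit with an intercept.
    """
    mean_y = sum(observed) / len(observed)
    ss_explained = sum((p - mean_y) ** 2 for p in predicted)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return ss_explained / ss_total


observed = [2, 4, 5, 8]            # hypothetical observed values Y
predicted = [1.9, 3.8, 5.7, 7.6]   # fitted values Yest for the same records
r2 = r_squared(observed, predicted)
print(round(r2, 4))  # 0.9627
```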

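R2-adjusted: this is the coefficient of determination adjusted for the number of independent variables in the regression model. Unlike the coefficient of determination, R2-adjusted may decrease if variables are entered in the model that do not add significantly to the model fit. With n the sample size and k the number of independent variables:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

or

$$R^2_{adj} = 1 - \frac{\sum (Y - Y_{est})^2 / (n - k - 1)}{\sum (Y - \bar{Y})^2 / (n - 1)}$$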
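
The penalty for extra variables can be seen in a short Python sketch, assuming the standard adjustment 1 - (1 - R2)(n - 1)/(n - k - 1). The function name `adjusted_r_squared` and the numbers are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R2 for k independent variables and sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: three extra variables raise R2 only from 0.800 to 0.805.
adj_small = adjusted_r_squared(0.800, n=30, k=1)
adj_large = adjusted_r_squared(0.805, n=30, k=4)
print(round(adj_small, 3), round(adj_large, 3))
```

Here R2 increases slightly when the variables are added, but R2-adjusted drops from about 0.793 to about 0.774.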

Multiple correlation coefficient: this coefficient is a measure of how tightly the data points cluster around the regression plane, and is calculated by taking the square root of the coefficient of determination.

When discussing multiple regression analysis results, generally the coefficient of multiple determination is used rather than the multiple correlation coefficient.

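Residual standard deviation: the standard deviation of the residuals (residuals = differences between observed and predicted values). It is calculated as follows:

$$s_{res} = \sqrt{\frac{\sum (Y - Y_{est})^2}{n - k - 1}}$$

where n is the sample size and k is the number of independent variables.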
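
A minimal Python sketch of the residual standard deviation, assuming the usual definition: the square root of the residual sum of squares divided by n - k - 1. The function name `residual_sd` and the data are hypothetical.

```python
import math


def residual_sd(observed, predicted, k):
    """Standard deviation of the residuals for a model with k predictors."""
    n = len(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(ss_residual / (n - k - 1))


observed = [2, 4, 5, 8]            # hypothetical observed values
predicted = [1.9, 3.8, 5.7, 7.6]   # fitted values from a one-predictor model
sd = residual_sd(observed, predicted, k=1)
print(round(sd, 4))
```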

The regression equation: the regression coefficients bi, each with its standard error sbi, partial correlation coefficient rpartial, t-value and P-value.

The partial correlation coefficient rpartial is the coefficient of correlation of the variable with the dependent variable, adjusted for the effect of the other variables in the model.
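
For reference, one standard identity relates the partial correlation coefficient of a variable to the t-value of its regression coefficient. It is given here as an aid to interpretation and is an assumption about how the value may be obtained, not a statement of the program's method; n is the sample size and k the number of independent variables.

```latex
r_{partial} = \frac{t}{\sqrt{t^2 + (n - k - 1)}}
```

The sign of rpartial therefore follows the sign of the regression coefficient.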

If P is less than the conventional 0.05, the regression coefficient can be considered to be significantly different from 0, and the corresponding variable contributes significantly to the prediction of the dependent variable.

Analysis of variance: the analysis of variance table divides the total variation in the dependent variable into two components, one which can be attributed to the regression model (labeled Regression) and one which cannot (labeled Residual). If the significance level for the F-test is small (less than 0.05), then the hypothesis that there is no (linear) relationship can be rejected, and the multiple correlation coefficient can be called statistically significant.
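
The partition into Regression and Residual components can be sketched in Python as follows. The function name `anova_table` and the data are hypothetical, and the P-value of the F-test is left out because it requires the F distribution, which the standard library does not provide.

```python
def anova_table(observed, predicted, k):
    """Split the total variation into regression and residual parts."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_regression = ss_total - ss_residual
    df_regression, df_residual = k, n - k - 1
    # F-ratio: regression mean square divided by residual mean square.
    f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)
    return {
        "Regression": (ss_regression, df_regression),
        "Residual": (ss_residual, df_residual),
        "F": f_ratio,
    }


observed = [2, 4, 5, 8]            # hypothetical observed values
predicted = [1.9, 3.8, 5.7, 7.6]   # least-squares fitted values, k = 1
table = anova_table(observed, predicted, k=1)
print(table)
```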

Zero order correlation coefficients: these are the simple correlation coefficients for the dependent variable Y and all independent variables Xi separately.
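
As a sketch, the zero order coefficients are ordinary Pearson correlations computed one independent variable at a time. The function name `pearson_r`, the variable names and the data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Simple (zero order) correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


y = [2, 4, 5, 8]                                          # dependent variable
predictors = {"X1": [1, 2, 3, 4], "X2": [3, 1, 4, 2]}     # independent variables
zero_order = {name: pearson_r(values, y) for name, values in predictors.items()}
print({name: round(r, 3) for name, r in zero_order.items()})
```

Each coefficient ignores the other independent variables, which is what distinguishes it from the partial correlation coefficient reported with the regression equation.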

Repeat procedure

If you want to repeat the Multiple regression procedure, for example to add or remove variables in the model, press function key F7. The dialog box will reappear with the previous entries (see F7 - Repeat key).

Literature

  • Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
  • Armitage P, Berry G, Matthews JNS (2002) Statistical methods in medical research. 4th ed. Blackwell Science.
