Scatter diagram & regression line

Command: Statistics > Regression > Scatter diagram & regression line

Description

In a scatter diagram, the relation between two numerical variables is presented graphically. One variable (the independent variable X) defines the horizontal axis and the other (the dependent variable Y) defines the vertical axis. The values of the two variables on the same row of the data spreadsheet give the coordinates of one point in the diagram.

Required input

The dialog box for the scatter diagram is similar to the one for Regression:

Dialog box for scatter diagram with regression line

Variables

Select the 2 variables to be represented in the graph. Optionally, you may also enter a data filter in order to include only a selected subgroup of cases in the statistical analysis.

Regression equation

By default the option Include constant in equation is selected. This is the recommended option that will result in ordinary least-squares regression. When you need regression through the origin (no constant a in the equation), you can uncheck this option (an example of when this is appropriate is given in Eisenhauer, 2003).
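The difference between the two settings can be illustrated with a minimal sketch in plain Python. This is not MedCalc's own code; the function names and the small data set are hypothetical and only show how the slope changes when the constant is dropped.

```python
def fit_with_constant(x, y):
    """Least-squares fit of Y = a + b X; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_through_origin(x, y):
    """Least-squares fit of Y = b X (no constant a); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data that follow Y = 1 + 2 X exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

a, b = fit_with_constant(x, y)       # recovers a = 1, b = 2
b_origin = fit_through_origin(x, y)  # slope is pulled upward: 70 / 30
print(a, b, b_origin)
```

Because these data have a non-zero intercept, forcing the line through the origin distorts the slope, which is why the constant is normally kept.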

MedCalc offers a choice of 5 different regression equations (X represents the independent variable and Y the dependent variable):

Y = a + b X              straight line
Y = a + b Log(X)         logarithmic curve
Log(Y) = a + b X         exponential curve
Log(Y) = a + b Log(X)    geometric curve
Y = a + b X + c X²       parabola

When you select an equation that contains a logarithmic transformation for one of the variables, the program will use a logarithmic scale for the corresponding variable.
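All five equations are linear in their coefficients, so each can be fitted by least squares after transforming X, Y or both. The sketch below illustrates this in plain Python. It is not MedCalc's implementation: the helper names are hypothetical, and treating Log as the base-10 logarithm is an assumption.

```python
import math


def solve(A, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][k] * c[k] for k in range(i + 1, n))) / M[i][i]
    return c


def least_squares(columns, target):
    """Coefficients of the least-squares fit, via the normal equations."""
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * t for u, t in zip(ci, target)) for ci in columns]
    return solve(A, rhs)


def log10_all(values):
    # Assumption: Log denotes the base-10 logarithm; values must be positive.
    return [math.log10(v) for v in values]


def fit(x, y, model):
    """Return [a, b], or [a, b, c] for the parabola."""
    ones = [1.0] * len(x)
    if model == "straight line":
        return least_squares([ones, x], y)
    if model == "logarithmic curve":
        return least_squares([ones, log10_all(x)], y)
    if model == "exponential curve":
        return least_squares([ones, x], log10_all(y))
    if model == "geometric curve":
        return least_squares([ones, log10_all(x)], log10_all(y))
    if model == "parabola":
        return least_squares([ones, x, [v * v for v in x]], y)
    raise ValueError("unknown model: " + model)


# Hypothetical data generated from Y = 2 + 3 Log(X) and Y = 1 + 2 X + 0.5 X².
log_coef = fit([1.0, 10.0, 100.0, 1000.0], [2.0, 5.0, 8.0, 11.0], "logarithmic curve")
par_coef = fit([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.5, 7.0, 11.5, 17.0], "parabola")
print(log_coef, par_coef)
```

The normal equations are used here only to keep the example self-contained; a numerical library would normally be preferred for anything beyond small data sets.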

Options

  • 95% Confidence: two curves will be drawn, one on either side of the regression line. These curves represent a 95% confidence interval for the regression line. This interval includes the true regression line with 95% probability. The curves are not parallel to the line: the interval is narrowest near the mean of the independent variable and widens towards both ends of the observed range.
  • 95% Prediction: two curves will be drawn, one on either side of the regression line. These curves represent the 95% prediction interval for the regression curve. The 95% prediction interval is much wider than the 95% confidence interval. For any given value of the independent variable, a new observation of the dependent variable is expected to fall within this interval with 95% probability.
  • Line of equality: option to draw a line of equality (y=x) in the graph.
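The two intervals can be illustrated with the standard textbook formulas for a straight line with a constant. The sketch below, in plain Python, is an illustration under these assumptions and not MedCalc's code. The function name and data are hypothetical, and the critical value 3.182 is the two-sided 95% value of the t distribution for 3 degrees of freedom, hard-coded because the example has 5 points.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Fitted value and half-widths of the confidence and prediction
    intervals at x0, for the straight line Y = a + b X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    spread = 1.0 / n + (x0 - mean_x) ** 2 / sxx
    fitted = a + b * x0
    half_conf = t_crit * s * math.sqrt(spread)        # for the line itself
    half_pred = t_crit * s * math.sqrt(1.0 + spread)  # for a new observation
    return fitted, half_conf, half_pred


# Hypothetical data, 5 points, so 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t(0.975, df = 3), taken from a t table

at_mean = intervals_at(x, y, 3.0, T_CRIT)
at_edge = intervals_at(x, y, 5.0, T_CRIT)
print(at_mean, at_edge)
```

Both half-widths are larger at the edge of the data (x = 5) than at the mean (x = 3), and the prediction half-width always exceeds the confidence half-width because of the extra term for the scatter of a single new observation.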

Residuals

In regression analysis, residuals are the differences between the predicted values and the observed values for the dependent variable. The residual plot allows the visual evaluation of the goodness of fit of the selected model.

To obtain a residuals plot, select this option in the dialog box. This graph will be displayed in a second window.

Subgroups

Click the Subgroups button if you want to identify subgroups in the scatter diagram. A new dialog box is displayed in which you can select a categorical variable. The graph will use different markers for the different categories in this variable, and optionally will show regression lines for all cases and for each subgroup.

Examples


Scatter diagram with regression line


Regression line and 95% confidence interval


Regression line and 95% prediction interval


Regression line, 95% confidence interval and 95% prediction interval

 

When you click a point on the regression line, the program will give the x-value and the f(x) value calculated using the regression equation.

You can press Ctrl+P to print the scatter diagram, or function key F10 to save the picture as a file on disk. To define other titles or colors in the graph, or to change the axis scaling, see Format graph.

If you want to repeat the scatter diagram, possibly to select a different regression equation, then you only have to press function key F7. The dialog box will re-appear with the previous entries (see F7 - Repeat key).

Extrapolation

MedCalc only shows the regression line within the range of observed values. As a rule, it is not recommended to extrapolate the regression line beyond the observed range. For particular applications, however, such as the evaluation of stability data, extrapolation may be useful; see for example the ICH guideline Evaluation of Stability Data (PDF).

To allow extrapolation, right-click in the graph and select Allow extrapolation in the popup menu.

Residuals plot

When you select the option Residuals plot in the Regression line dialog box, the program will display a second window with the residuals plot. Residuals are the differences between the predicted values and the observed values for the dependent variable. The residual plot allows for the visual evaluation of the goodness of fit of the selected model or equation. Residuals may point to possible outliers (unusual values) in the data or to problems with the regression model. If the residuals display a certain pattern, you should consider selecting a different regression model.
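As a small illustration of such a pattern, the sketch below fits a straight line to data that actually follow a parabola and lists the residuals. It is plain Python with hypothetical names and data, not MedCalc's code, and it assumes the sign convention observed value minus fitted value.

```python
def straight_line_residuals(x, y):
    """Fit Y = a + b X by least squares and return the residuals
    (observed minus fitted, an assumed sign convention)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data following Y = X², fitted with the wrong (straight line) model.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0]

residuals = straight_line_residuals(x, y)
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. A U-shaped pattern of this kind, rather than a random scatter around zero, suggests that a curved model such as the parabola would fit better.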

Literature

  • Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
  • Eisenhauer JG (2003) Regression through the origin. Teaching Statistics 25:76-80.
