MedCalc

Scatter diagram & regression line

Description

In a scatter diagram, the relation between two numerical variables is presented graphically. One variable (the independent variable X) defines the horizontal axis and the other (the dependent variable Y) defines the vertical axis. Each row in the data spreadsheet gives one point in the diagram, with the values of the two variables on that row as its coordinates.

Required input

The dialog box for the scatter diagram is similar to the one for Regression:

Dialog box for scatter diagram with regression line

Variables

  • Variable Y and Variable X: select the dependent and independent variables Y and X.
  • Weights: select a variable containing relative weights that should be given to each observation (for weighted least-squares regression). Select the dummy variable "*** AutoWeight 1/SD^2 ***" for an automatic weighted regression procedure to correct for heteroscedasticity (Neter et al., 1996). This dummy variable appears as the first item in the drop-down list for Weights.
  • Filter: you may also enter a data filter in order to include only a selected subgroup of cases in the statistical analysis.
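To make the weighting options concrete, here is a minimal Python sketch of weighted least-squares fitting of a straight line. The `auto_weights` function is a hypothetical illustration of a 1/SD^2 scheme in the spirit of Neter et al. (1996): it regresses the absolute residuals of an unweighted fit on X to estimate how the standard deviation changes with X, then weights each case by 1/SD^2. MedCalc's exact AutoWeight procedure is not described on this page, so treat the function names and steps as assumptions.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; w holds relative weights."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def auto_weights(x, y, floor=1e-12):
    """Hypothetical 1/SD^2 weights, loosely following Neter et al. (1996)."""
    n = len(x)
    a, b = wls_line(x, y, [1.0] * n)  # step 1: ordinary (unweighted) fit
    abs_res = [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]
    c, d = wls_line(x, abs_res, [1.0] * n)  # step 2: estimate SD as a linear function of x
    sd = [max(c + d * xi, floor) for xi in x]  # guard against SD estimates <= 0
    return [1.0 / s ** 2 for s in sd]  # step 3: weight = 1/SD^2


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.3, 7.8, 10.4, 11.7]
    print(wls_line(x, y, auto_weights(x, y)))
```

Cases with a small estimated SD receive large weights, so they pull the line more strongly than the noisier cases.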

Regression equation

By default the option Include constant in equation is selected. This is the recommended option and results in ordinary least-squares regression with an intercept (the constant a). When you need regression through the origin (no constant a in the equation), you can uncheck this option (an example of when this is appropriate is given in Eisenhauer, 2003).
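The difference between the two settings shows up directly in the least-squares slope. The sketch below (plain Python; the function names are illustrative, not a MedCalc API) uses the textbook formulas: with a constant, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b x̄; through the origin, b = Σxy / Σx².

```python
def fit_with_constant(x, y):
    """Ordinary least squares for y = a + b*x (constant included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b  # the line passes through (mean of x, mean of y)


def fit_through_origin(x, y):
    """Least squares for y = b*x (no constant): b = sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [3, 5, 7, 9]  # exactly y = 1 + 2x
    print(fit_with_constant(x, y))     # (1.0, 2.0)
    print(fit_through_origin(x, y))    # about 2.333
```

Because these data have a nonzero intercept, forcing the line through the origin inflates the slope (7/3 instead of 2); regression through the origin is only appropriate when Y is known to be 0 at X = 0.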

MedCalc offers a choice of 5 different regression equations (x represents the independent variable and y the dependent variable):

  • y = a + b x: straight line
  • y = a + b log(x): logarithmic curve
  • log(y) = a + b x: exponential curve
  • log(y) = a + b log(x): geometric curve
  • y = a + b x + c x²: quadratic regression (parabola)
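Apart from the parabola, each of these equations is a straight line fitted to transformed data, so one least-squares routine covers four of them; the parabola is fitted by solving its three normal equations. The sketch below is illustrative only: it assumes base-10 logarithms (the base changes the coefficients but not the fitted curve), uses hypothetical model names, and requires positive values wherever a logarithm is taken.

```python
import math


def _line(x, y):
    """Least-squares straight line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def _solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [vi] for row, vi in zip(m, v)]  # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    sol = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        s = a[r][3] - sum(a[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = s / a[r][r]
    return sol


def fit(model, x, y):
    """Return the coefficients (a, b) or (a, b, c) of the chosen equation."""
    log = math.log10  # assumption: base-10 logarithms
    if model == "straight":
        return _line(x, y)
    if model == "logarithmic":  # y = a + b log(x)
        return _line([log(xi) for xi in x], y)
    if model == "exponential":  # log(y) = a + b x
        return _line(x, [log(yi) for yi in y])
    if model == "geometric":    # log(y) = a + b log(x)
        return _line([log(xi) for xi in x], [log(yi) for yi in y])
    if model == "quadratic":    # y = a + b x + c x^2, via the normal equations
        s = [sum(xi ** k for xi in x) for k in range(5)]
        t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
        m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
        return tuple(_solve3(m, t))
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    print(fit("quadratic", [0, 1, 2, 3], [1, 6, 17, 34]))  # close to (1, 2, 3)
```

Note that for the exponential and geometric curves the least-squares criterion is applied to log(y), not to y itself.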

When you select an equation that contains a logarithmic transformation of one of the variables, the program uses a logarithmic scale for the corresponding axis.

Options

  • 95% Confidence: two curves are drawn on either side of the regression line. These curves represent the 95% confidence interval for the regression line, that is, for the mean value of Y at each value of X. If the sampling were repeated many times, about 95% of the intervals constructed in this way would contain the true mean value of Y at that X.
  • 95% Prediction: two curves are drawn on either side of the regression line. These curves represent the 95% prediction interval. For any given value of the independent variable, a new observation of the dependent variable is expected to fall within this interval with 95% probability. The prediction interval is always wider than the 95% confidence interval because, in addition to the uncertainty of the regression line, it includes the scatter of individual observations around the line.
  • Line of equality: option to draw the line of equality (y = x) in the graph.
  • Heat map: option to display a heat map, in which the background color indicates the density of points and can reveal clusters of observations.
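Both bands can be computed from the same straight-line fit; they differ only in the standard error used at each X value. The sketch below uses the standard textbook formulas for simple linear regression and is not taken from MedCalc. The critical value `t_crit` is passed in to keep the code dependency-free; it is the 97.5th percentile of Student's t distribution with n − 2 degrees of freedom, for example `scipy.stats.t.ppf(0.975, n - 2)`.

```python
import math


def intervals(x, y, x0, t_crit):
    """Fitted value, 95% confidence interval and 95% prediction interval at x0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # residual standard deviation, n - 2 degrees of freedom
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y0 = a + b * x0
    se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)      # uncertainty of the line
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)  # plus scatter of one new observation
    ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)
    pi = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)
    return y0, ci, pi


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.2, 3.8, 6.1, 8.0, 9.9]
    print(intervals(x, y, 3.0, t_crit=3.182))  # 3.182: t(0.975) with 3 df
```

The extra 1 under the square root of `se_pred` is what makes the prediction interval wider. Both bands are narrowest at the mean of X and widen toward the ends of the observed range.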

Residuals

In regression analysis, residuals are the differences between the observed values and the predicted values of the dependent variable (observed minus predicted). The residual plot allows visual evaluation of the goodness of fit of the selected model.

To obtain a residuals plot, select this option in the dialog box. This graph will be displayed in a second window.

Subgroups

Click the Subgroups button if you want to identify subgroups in the scatter diagram. A new dialog box is displayed in which you can select a categorical variable. The graph uses different markers for the different categories of this variable and can optionally show regression lines for all cases and for each subgroup.
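Conceptually, the subgroup option amounts to fitting one regression line to all cases and one line to the cases in each category. A minimal Python sketch, with hypothetical function names and the straight-line equation only:

```python
from collections import defaultdict


def ols(x, y):
    """Least-squares straight line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_by_subgroup(x, y, group):
    """Return (fit for all cases, {category: fit for that subgroup})."""
    cols = defaultdict(lambda: ([], []))
    for xi, yi, g in zip(x, y, group):
        cols[g][0].append(xi)
        cols[g][1].append(yi)
    per_group = {
        g: ols(gx, gy)
        for g, (gx, gy) in cols.items()
        if len(set(gx)) >= 2  # a line needs at least two distinct x values
    }
    return ols(x, y), per_group


if __name__ == "__main__":
    x = [1, 2, 3, 1, 2, 3]
    y = [3, 5, 7, 2, 3, 4]
    g = ["A", "A", "A", "B", "B", "B"]
    print(fit_by_subgroup(x, y, g))
```

Here subgroup A follows y = 1 + 2x and subgroup B follows y = 1 + x, while the line for all cases (y = 1 + 1.5x) lies between them and describes neither group well, which is why separate subgroup lines can be informative.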

Examples

Scatter diagram with regression line

Regression line and 95% confidence interval

Regression line and 95% prediction interval

Regression line, 95% confidence interval and 95% prediction interval

Regression line with heatmap


When you click a point on the regression line, the program displays the x-value and the f(x) value calculated with the regression equation.

Regression line showing f(x)
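The f(x) value is the regression equation evaluated at x, transformed back to the original Y scale for the equations that use log(y). A sketch, assuming base-10 logarithms and hypothetical model names (not MedCalc's own code):

```python
import math


def evaluate(model, coef, x):
    """Evaluate a fitted equation at x and return y on the original scale."""
    if model == "straight":       # y = a + b x
        a, b = coef
        return a + b * x
    if model == "logarithmic":    # y = a + b log(x)
        a, b = coef
        return a + b * math.log10(x)
    if model == "exponential":    # log(y) = a + b x, so y = 10^(a + b x)
        a, b = coef
        return 10 ** (a + b * x)
    if model == "geometric":      # log(y) = a + b log(x), so y = 10^a * x^b
        a, b = coef
        return 10 ** (a + b * math.log10(x))
    if model == "quadratic":      # y = a + b x + c x^2
        a, b, c = coef
        return a + b * x + c * x ** 2
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    print(evaluate("exponential", (1, 0.5), 2))  # 10^(1 + 0.5*2) = 100.0
```

If natural logarithms were used instead, the back-transformation would use `math.exp` and `math.log` in place of powers of 10 and `math.log10`.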

You can press Ctrl+P to print the scatter diagram, or function key F10 to save the graph as a file on disk. To define other titles or colors in the graph, or to change the axis scaling, see Format graph.

To repeat the scatter diagram, for example to select a different regression equation, press function key F7. The dialog box reappears with the previous entries (see Recall dialog).

Extrapolation

MedCalc only shows the regression line within the range of observed values. As a rule, extrapolating the regression line beyond the observed range is not recommended. For some applications, however, such as the evaluation of stability data, extrapolation may be useful; see for example the ICH guideline Evaluation of Stability Data.

To allow extrapolation, right-click in the graph and click Allow extrapolation on the context menu.

Allow extrapolation
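The default behavior can be pictured as clipping the drawn line to the range of the observed X values. The helpers below are hypothetical (not MedCalc code), and the assumption that an extrapolated line spans the whole visible X axis is made here for illustration only.

```python
def drawing_range(x_obs, axis_min, axis_max, allow_extrapolation=False):
    """Hypothetical helper: x-interval over which the fitted line is drawn."""
    if allow_extrapolation:
        return axis_min, axis_max  # assumption: extend across the visible x-axis
    # default: clip the line to the observed range of x (and to the axis)
    return max(min(x_obs), axis_min), min(max(x_obs), axis_max)


def sample_line(a, b, lo, hi, steps=5):
    """Evenly spaced (x, a + b*x) points for drawing a straight line from lo to hi."""
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return [(xi, a + b * xi) for xi in xs]


if __name__ == "__main__":
    x_obs = [2, 5, 8]
    print(drawing_range(x_obs, 0, 10))        # (2, 8)
    print(drawing_range(x_obs, 0, 10, True))  # (0, 10)
```

Any point drawn outside the observed range relies on the assumption that the fitted relation continues to hold there, which the data themselves cannot confirm.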

Residuals plot

When you select the option Residuals plot in the Regression line dialog box, the program displays a second window with the residuals plot. Residuals are the differences between the observed values and the predicted values of the dependent variable (observed minus predicted). The residual plot allows visual evaluation of the goodness of fit of the selected model or equation. Residuals may point to possible outliers (unusual values) in the data or to problems with the regression model. If the residuals display a systematic pattern, consider selecting a different regression model.

Residuals plot
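The residuals behind this plot are simple to compute. A minimal sketch for the straight-line equation (plain Python, hypothetical function name), using the observed minus predicted convention:

```python
def residuals(x, y):
    """Residuals (observed minus predicted) of a straight-line least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # data that follow y = x^2 exactly, fitted with a straight line
    print(residuals([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

Here the straight-line fit is y = −2 + 4x, and the residuals are positive at both ends and negative in the middle. A curved pattern like this suggests choosing a different equation, in this case the quadratic one.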

Literature

  • Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
  • Eisenhauer JG (2003) Regression through the origin. Teaching Statistics 25:76-80.
  • Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models. 4th ed. Boston: McGraw-Hill.

See also