 # Scatter diagram

Command: Statistics > Correlation > Scatter diagram

## Description

In a scatter diagram, the relation between two numerical variables is presented graphically. One variable (the variable X) defines the horizontal axis and the other (variable Y) defines the vertical axis. Each row in the data spreadsheet supplies one point in the diagram: its X value and its Y value give the point's coordinates.

## Required input

Variables and Filter

Select the 2 variables. You can click the button to obtain a list of variables. Optionally, you may also enter a data filter in order to include only a selected subgroup of cases in the graph.

Options

• You can select a Logarithmic transformation for both variables (in this case the program will use a logarithmic scale for the corresponding axis in the graph).
• Line of equality: option to draw a line of equality (y=x) in the graph.
• Heat map: option to display a heat map, where background color coding indicates the density of points (see example below), suggesting clusters of observations.
• Click the Subgroups button if you want to identify subgroups in the scatter diagram. A new dialog box is displayed in which you can select a categorical variable. The graph will use different markers for the different categories in this variable.
• Trend line: option to plot one of the following trend lines:
  • Moving average trend line: option to plot a moving average trend line for each variable.
    A moving average trend line smooths out fluctuations in data to show a pattern or trend more clearly. It takes a specific number of data points (set by the window width option), averages them, and uses the average value as a point in the trend line. If window width is set to 3, for example, then the average of the first 3 data points is used as the first point in the moving average trend line. The average of the 2nd, 3rd and 4th data points is used as the second point in the trend line, and so on.
  • LOESS smoothing: option to plot a LOESS (Local Regression Smoothing) trend line. The degree of smoothing is controlled by the span (%), which is the proportion (expressed as a percentage) of the total number of points that contribute to each local fitted value. Larger values result in smoother trend lines.
  • Reduced major axis line: option to show the reduced major axis regression line.
    Reduced major axis (RMA) regression is an alternative to ordinary least squares (OLS) regression. In RMA regression, measurement errors are taken into account for both the dependent (Y-axis) and independent (X-axis) variables. This is different from traditional OLS regression, which takes into account only the error on the Y-axis.
  • Isotonic regression curve:
    • if the tendency of the data is increasing: a fitted free-form line that is non-decreasing everywhere and lies as close to the observations as possible
    • if the tendency of the data is decreasing: a fitted free-form line that is non-increasing everywhere and lies as close to the observations as possible
    The isotonic regression curve is estimated using the pool adjacent violators algorithm.
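
Three of the trend-line options above can be illustrated with a short Python sketch. This is not the program's own implementation: the function names `moving_average`, `rma_line` and `pava` are hypothetical, the input is assumed to be already sorted by the X variable, and the RMA slope uses the standard textbook definition (ratio of the standard deviations of Y and X, with the sign of the correlation).

```python
from statistics import mean, stdev


def moving_average(values, window):
    """Average each run of `window` consecutive values (window width option)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def rma_line(x, y):
    """Return (slope, intercept) of the reduced major axis line.

    Assumes the textbook definition: slope = sign(r) * sd(y) / sd(x).
    """
    mx, my = mean(x), mean(y)
    # The sign of the covariance equals the sign of the correlation.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if cov >= 0 else -1.0
    slope = sign * stdev(y) / stdev(x)
    return slope, my - slope * mx


def pava(y, increasing=True):
    """Isotonic fit of y (ordered by x) using pool adjacent violators."""
    vals = list(y) if increasing else [-v for v in y]
    blocks = []  # each block holds [sum, count] of pooled values
    for v in vals:
        blocks.append([v, 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted if increasing else [-f for f in fitted]


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))   # window width 3
    print(rma_line([1, 2, 3], [2, 4, 6]))
    print(pava([1, 3, 2, 4]))                   # increasing tendency
    print(pava([4, 2, 3, 1], increasing=False)) # decreasing tendency
```

With a window width of 3, the first moving-average value is the mean of the first three data points, as described above. In `pava`, adjacent values that violate the required ordering are replaced by their common mean, which is what produces the flat steps typical of an isotonic regression curve.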

## Results

After you click OK, the program displays the scatter diagram.

[Figure: scatter diagram]

[Figure: the same scatter diagram, with the categorical variable "Treatment" used to identify different subgroups in the graph]

[Figure: example of scatter diagram with heat map]

## Trend line examples

[Figure: moving average trend line]

[Figure: LOESS smoothing]

[Figure: reduced major axis regression line]

[Figure: isotonic regression curve]

## Literature

• Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
• Smith RJ (2009) Use and misuse of the reduced major axis for line-fitting. American Journal of Physical Anthropology, 140:476-486. 