Construction of a Box-and-Whisker plot
Box-and-whisker plot
A Box‑and‑Whisker plot (Tukey, 1977) is constructed as follows:
- a box is drawn from the 1st to the 3rd quartile (the 25th and 75th percentiles)
- a horizontal line is drawn at the median (the 50th percentile)
- the Interquartile range (IQR) is calculated: IQR = 3rd − 1st quartile
- an imaginary line is drawn at the 3rd quartile + 1.5 × IQR; this is the Upper Inner fence
- the highest value (observation, measurement) that does not exceed the upper inner fence is the upper adjacent value; a horizontal line is drawn at this value
- a vertical line is drawn from the 3rd quartile to the upper adjacent value
- an imaginary line is drawn at the 3rd quartile + 3 × IQR; this is the Upper Outer fence
- all values higher than the upper inner fence are always represented in the graph
- a value higher than the upper outer fence is called a Far out value (these are drawn using a different symbol)
- a value higher than the upper inner fence but not higher than the upper outer fence is called an Outside value
- similar lines are drawn at the lower side of the plot
Note that John Tukey did not use the term 'outlier' for 'outside' and 'far out' values.
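The construction steps above can be sketched in Python. This is only a minimal sketch, not MedCalc's implementation: the function name tukey_box_stats and the sample data are hypothetical, and the quartiles are computed with the "inclusive" interpolation of the standard library's statistics.quantiles, which may differ from the percentile method MedCalc uses.

```python
from statistics import quantiles


def tukey_box_stats(values, inner=1.5, outer=3.0):
    """Return the quantities needed to draw a Box-and-Whisker plot.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other percentile definitions give slightly different fences.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Imaginary fence lines on both sides of the box.
    lower_inner, upper_inner = q1 - inner * iqr, q3 + inner * iqr
    lower_outer, upper_outer = q1 - outer * iqr, q3 + outer * iqr

    # Adjacent values: the most extreme observations still inside the inner fences.
    upper_adjacent = max(x for x in data if x <= upper_inner)
    lower_adjacent = min(x for x in data if x >= lower_inner)

    # Outside values lie beyond an inner fence but not beyond the outer fence.
    outside = [
        x for x in data
        if upper_inner < x <= upper_outer or lower_outer <= x < lower_inner
    ]
    # Far out values lie beyond an outer fence.
    far_out = [x for x in data if x > upper_outer or x < lower_outer]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "inner_fences": (lower_inner, upper_inner),
        "outer_fences": (lower_outer, upper_outer),
        "adjacent": (lower_adjacent, upper_adjacent),
        "outside": outside,
        "far_out": far_out,
    }


if __name__ == "__main__":
    # Hypothetical sample with one outside value and one far out value.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
    for name, value in tukey_box_stats(sample).items():
        print(f"{name}: {value}")
```

The whiskers run from the box to the two adjacent values, while the entries in outside and far_out are the points that would be drawn individually, with a different symbol for the far out values.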
Notched Box-and-Whisker plot
A notched Box-and-Whisker plot (McGill et al., 1978) is constructed in the same way as the Box-and-Whisker plot described above, but adds notches around the median that represent a confidence interval(*) for the median.
(The illustration does not show all the details of the regular Box-and-Whisker plot)
The notches surrounding the medians provide a rough measure of the significance of differences between the medians. Specifically, if the notches about two medians do not overlap in the display, the medians are, roughly, significantly different at about a 95% confidence level (McGill et al., 1978).
In the following example, there is probably no significant difference between the medians in the two samples because the notches overlap.
MedCalc calculates the notches according to McGill et al. (1978), as follows:
Median ± 1.57 × IQR / √N
where IQR is the Interquartile range and N is the number of cases in the sample.
(*) Important: this confidence interval is not a 95% confidence interval of the median. It is a confidence interval that allows comparison of the medians.
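The notch calculation and the overlap rule can be illustrated with a short Python sketch. It assumes the notch limits of McGill et al. (1978), median ± 1.57 × IQR / √N; the function names median_notch and notches_overlap and the sample data are hypothetical, and the quartiles again use the "inclusive" method of statistics.quantiles, which may not match MedCalc's percentile method.

```python
from math import sqrt
from statistics import quantiles


def median_notch(values):
    """Return (lower, upper) notch limits around the median.

    Assumption: half-width = 1.57 * IQR / sqrt(N), after McGill et al. (1978).
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(sample_a, sample_b):
    """True if the notch intervals of the two samples share any point."""
    low_a, high_a = median_notch(sample_a)
    low_b, high_b = median_notch(sample_b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # Hypothetical samples: the second is shifted slightly, the third strongly.
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    b = [3, 4, 5, 6, 7, 8, 9, 10, 11]
    c = [11, 12, 13, 14, 15, 16, 17, 18, 19]
    print("a vs b overlap:", notches_overlap(a, b))
    print("a vs c overlap:", notches_overlap(a, c))
```

Non-overlapping notches suggest, roughly, that the medians differ at about the 95% confidence level; overlapping notches mean no such difference is indicated.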
Literature
- McGill R, Tukey JW, Larsen WA (1978) Variations of box plots. The American Statistician, 32, 12-16.
- Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.
MedCalc procedures that offer Box-and-Whisker plots
- Box-and-Whisker plot: Box-and-Whisker plot for one variable (no Notched Box-and-Whisker plot)
- Data comparison graphs: Box-and-Whisker plots for two variables
- Multiple comparison graphs: Box-and-Whisker plots for subgroups (one-way classification) of one variable
- Clustered multiple comparison graphs: Box-and-Whisker plots for subgroups (two-way classification) of one variable
- Multiple variables graphs: Box-and-Whisker plots for several variables
- Clustered multiple variables graphs: Box-and-Whisker plots for subgroups of several variables