# Violin plots

The violin plot (Hintze & Nelson, 1998) combines the box-and-whisker plot (Tukey, 1977) and a data density trace into one diagram. The density trace supplements traditional summary statistics by graphically showing more detailed distributional characteristics of the data. MedCalc smooths the density trace using a kernel density estimator.

The illustration shows the distribution of the data as dots on the left and the corresponding violin plot on the right. A small box-and-whisker diagram is drawn inside the violin plot.

The violin plot is wider in sections where there are more data and narrower in sections where there are fewer data.

Like box plots, violin plots nicely illustrate differences between distributions of variables, or between subcategories of one or more variables, but only when the ranges of the distributions are not too different.

Example of a violin plot in MedCalc:

Notice that the surface area of both violins is the same (see the mathematical details below).

## Formatting the violin plot

To change the line color and fill style of the violins, right-click on a violin and select "Format Violin plot":

You can select the colors and styles in the following dialog box:

## Violin maximum width

In principle, the surface area of each violin within the same graph should be the same (see the mathematical details below). When you select the option "Make all same width", each violin is enlarged to the maximum width possible in the graph.

With the option "Make all same width", the resulting graph is:

This may be aesthetically more pleasing in some cases, but it **could be misleading**. In the example, the figure now suggests that at the maximum width of each violin the number of cases (more correctly, the density) is the same in both samples. That is not the case: the violin on the left is drawn wider than it should be.

Note that this example was generated using two samples with equal sample size.
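The effect of "Make all same width" can be illustrated outside MedCalc with a small sketch. It is not MedCalc's code: the Gaussian kernel, the simplified bandwidth rule, the simulated samples, and all function and variable names are assumptions made for illustration. Two samples of equal size but different spread are compared; with equal areas the more spread-out sample has a lower peak density, so stretching it to the same maximum width inflates it by the ratio of the two peaks.

```python
import math
import random
import statistics


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h (assumed kernel)."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def bandwidth(data):
    """Simplified rule-of-thumb bandwidth (an assumption, not MedCalc's rule)."""
    return 0.9 * statistics.stdev(data) * len(data) ** (-1 / 5)


def peak_density(data, steps=400):
    """Largest density value on an evenly spaced grid over the data range."""
    h = bandwidth(data)
    lo, hi = min(data), max(data)
    return max(kde(lo + (hi - lo) * i / steps, data, h) for i in range(steps + 1))


rng = random.Random(1)  # fixed seed so the sketch is reproducible
narrow = [rng.gauss(50.0, 2.0) for _ in range(100)]  # small spread
wide = [rng.gauss(50.0, 8.0) for _ in range(100)]    # large spread, same n

peak_narrow = peak_density(narrow)
peak_wide = peak_density(wide)

# Equal-area drawing: one common scale, so the maximum width of each violin
# is proportional to its peak density.
common_scale = 1.0 / max(peak_narrow, peak_wide)
width_narrow = peak_narrow * common_scale
width_wide = peak_wide * common_scale

# Same-width drawing: each violin gets its own scale so that both reach the
# maximum width. The lower-peaked violin is inflated by this factor.
inflation_of_wide = peak_narrow / peak_wide

print(width_narrow, width_wide, inflation_of_wide)
```

In this sketch the inflation factor is greater than 1, which is the distortion described above: after rescaling, equal widths no longer mean equal densities.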

## Mathematical details

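For a sample $x_1, \ldots, x_n$, the kernel density estimator is defined as follows:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$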

where *K* is the kernel function:

and the parameter *h* is a smoothing parameter, which in MedCalc is defined as:

where $\hat{\sigma}$ is the standard deviation of the sample, *n* is the sample size, and *IQR* is the interquartile range.

Note that the area under the kernel density curve for each "violin" is 1, and therefore the surface area of each violin is the same, independent of sample size.
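The unit-area property can be checked numerically with a short sketch. It rests on assumptions that are not confirmed by this text: a Gaussian kernel for *K*, and a Silverman-style rule of thumb for *h* built from the same quantities named above (standard deviation, *IQR*, and *n*), with the constants 0.9 and 1.34 taken from that rule rather than from MedCalc. The density has the usual form (1 / (*n* *h*)) Σ *K*((*x* − *x_i*) / *h*). The function names and simulated samples are hypothetical.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(data):
    """Silverman-style bandwidth from the sample SD, IQR and n (an assumption)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sigma, (q3 - q1) / 1.34) * n ** (-1 / 5)


def density(x, data, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def area_under_density(data, h, steps=2000):
    """Trapezoidal integral of the density over a range that covers the tails."""
    lo, hi = min(data) - 6 * h, max(data) + 6 * h
    dx = (hi - lo) / steps
    ys = [density(lo + i * dx, data, h) for i in range(steps + 1)]
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
small_sample = [rng.gauss(10.0, 2.0) for _ in range(20)]
large_sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

area_small = area_under_density(small_sample, rule_of_thumb_bandwidth(small_sample))
area_large = area_under_density(large_sample, rule_of_thumb_bandwidth(large_sample))
print(area_small, area_large)  # both should be very close to 1
```

Each kernel integrates to 1 and the estimate is an average of *n* such kernels, so the total area stays at 1 whether the sample has 20 or 200 observations.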

## Literature

- Hintze JL, Nelson RD (1998) Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician 52:181-184.
- Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.

## MedCalc procedures that offer Violin plots

- Data comparison graphs: Violin plots for two variables
- Multiple comparison graphs: Violin plots for subgroups (one-way classification) of one variable
- Clustered multiple comparison graphs: Violin plots for subgroups (two-way classification) of one variable
- Multiple variables graphs: Violin plots for several variables
- Clustered multiple variables graphs: Violin plots for subgroups of several variables

To obtain a violin plot with the Box-and-Whisker diagram inside it, you have to select both options in the graph's dialog box.
