# Violin plots

The violin plot (Hintze & Nelson, 1998) combines the box-and-whisker plot (Tukey, 1977) and a data density trace into one diagram. The density trace supplements traditional summary statistics by graphically showing more detailed distributional characteristics of the data. MedCalc smooths the density trace using a kernel density estimator.

The illustration shows, at the left side, the distribution of the data as dots, and at the right side the corresponding violin plot. Inside the violin plot a small box-and-whisker diagram is drawn.

The violin plot is wider in sections where there are more data and narrower in sections where there are fewer data.

Like box plots, violin plots nicely illustrate differences between the distributions of variables, or between subcategories of one or more variables, but only when the ranges of the distributions are not too different.

Example of a violin plot in MedCalc:

Notice that the surface area of both violins is the same (see the mathematical details below).

## Formatting the violin plot

To change the line color and fill style of the violins, right-click on a violin and select "Format Violin plot":

You can select the colors and styles in the following dialog box:

## Violin maximum width

In principle, the surface area of each violin within the same graph should be the same (see the mathematical details below). By selecting the option "Make all same width", each violin is enlarged to the maximum width possible in the graph.

With the option "Make all same width", the resulting graph is:

This may be aesthetically more pleasing in some cases, but it can be misleading. In this example, the figure now suggests that at the maximum width of each violin the number of cases (more correctly, the density) is the same in both samples. That is not the case: the violin at the left side is drawn wider than it should be.

Note that this example was generated using two samples with equal sample size.

Use this option with care.
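The effect of the option can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MedCalc's actual drawing code: the function `half_widths`, its parameters, and the use of two normal density curves as stand-ins for the density traces are all hypothetical. Without the option, one common scale factor is applied to all violins, so widths stay comparable; with the option, each violin is divided by its own peak density.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, used here only as a stand-in for a density trace."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def half_widths(densities, max_half_width, same_width):
    """Scale density traces to plotted half-widths (hypothetical helper).

    densities: one list of density values per violin.
    same_width=False: a single common scale factor, so areas stay equal.
    same_width=True: each violin is stretched to the full available width.
    """
    peaks = [max(d) for d in densities]
    overall_peak = max(peaks)
    scaled = []
    for d, peak in zip(densities, peaks):
        divisor = peak if same_width else overall_peak
        scaled.append([max_half_width * v / divisor for v in d])
    return scaled


# Two density traces with the same area but different spread.
grid = [i / 10 for i in range(-100, 101)]
narrow = [normal_pdf(x, 0.0, 1.0) for x in grid]
wide = [normal_pdf(x, 0.0, 2.0) for x in grid]

equal_area = half_widths([narrow, wide], 1.0, same_width=False)
forced_width = half_widths([narrow, wide], 1.0, same_width=True)

print("common scale, peak widths:", max(equal_area[0]), max(equal_area[1]))
print("same width,   peak widths:", max(forced_width[0]), max(forced_width[1]))
```

With the common scale, the more spread-out trace has a smaller peak width, reflecting its lower peak density. With the per-violin scale, both peaks reach the same width, which is exactly the misleading impression described above.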

## Mathematical details

The kernel density estimator is defined as follows:

$$\widehat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big)$$

where K is the kernel function:

$$K\Big(\frac{x-x_i}{h}\Big) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-x_i)^2}{2h^2}}$$

and the parameter h is a smoothing parameter, which in MedCalc is defined as:

$$h = 0.9\, \min\left(\hat{\sigma}, \frac{IQR}{1.34}\right)\, n^{\frac{-1}{5}}$$

where $\hat{\sigma}$ is the standard deviation of the sample, n is the sample size, and IQR is the interquartile range.
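The formulas above can be sketched in plain Python. This is a minimal illustration, not MedCalc's implementation: the function names are hypothetical, the sample values are made up, and two details are assumptions, namely the use of the n - 1 denominator for the standard deviation and the "inclusive" quartile convention for the IQR (MedCalc's percentile method may differ).

```python
import math
import statistics


def silverman_bandwidth(data):
    """h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)  # assumption: sample SD with n - 1 denominator
    # Assumption: "inclusive" quartiles; other percentile conventions give other IQRs.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return 0.9 * min(sd, iqr / 1.34) * n ** (-1 / 5)


def gaussian_kde(data, h):
    """Return a function x -> f_hat(x) built from the Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in data)

    return density


# Made-up sample, for illustration only.
sample = [2.1, 2.4, 2.5, 2.9, 3.0, 3.2, 3.3, 3.8, 4.1, 5.0]
h = silverman_bandwidth(sample)
f_hat = gaussian_kde(sample, h)

print(f"h = {h:.4f}")
print(f"density at 3.0 = {f_hat(3.0):.4f}")
```

Evaluating `f_hat` on a grid of x values and mirroring the result around a vertical axis gives the outline of one violin. Note that the sketch does not guard against a sample whose IQR or standard deviation is zero, in which case h would be zero.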

Note that the area under the kernel density curve for each "violin" is 1, and therefore the surface area of each violin is the same, independent of sample size.
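This can be verified directly from the definition, assuming only that the kernel K integrates to 1 (which holds for the Gaussian kernel used here). Substituting $u = (x - x_i)/h$, so that $dx = h\,du$:

$$\int_{-\infty}^{\infty} \widehat{f}_h(x)\,dx = \frac{1}{nh} \sum_{i=1}^n \int_{-\infty}^{\infty} K\Big(\frac{x-x_i}{h}\Big)\,dx = \frac{1}{nh} \sum_{i=1}^n h \int_{-\infty}^{\infty} K(u)\,du = \frac{1}{nh} \cdot n h = 1$$

Neither n nor h remains in the result, so the area does not depend on the sample size or on the amount of smoothing.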

## Literature

• Hintze JL, Nelson RD (1998) Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician 52:181-184.
• Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.

## MedCalc procedures that offer Violin plots

To obtain a violin plot with the Box-and-Whisker diagram inside it, you have to select both options in the graph's dialog box.