📊 Violins: Plotting Groups and Density

Qual Variables
Quant Variables
Bar Charts
Column Charts
Histograms
Density Plots
Box Plots
Violin Plots
Author

Arvind V.

Published

November 15, 2022

Modified

June 26, 2024

Abstract
Quant and Qual Variable Graphs and their Siblings

Slides and Tutorials

R (Static Viz)   Radiant Tutorial  Datasets

Setting up R Packages

options(paged.print = TRUE)
library(tidyverse)
library(mosaic)
library(ggformula)

#install.packages("remotes")
#library(remotes)
#remotes::install_github("wilkelab/ggridges")
library(ggridges)
library(skimr)

library(palmerpenguins) # Our new favourite dataset

What graphs will we see today?

Variable #1 | Variable #2 | Chart Names | Chart Shape
Quant | (Qual) | Violin Plot |

What kind of Data Variables will we choose?

No | Pronoun | Answer | Variable/Scale | Example | What Operations?
1 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful. | Quantitative/Ratio | Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate | Correlation

Inspiration

Figure 1: Golf Drive Distance over the years

Violin Plots

Often one needs to view multiple densities at the same time. Ridge plots give us one option: densities of a Quant variable split by a Qual variable. Another option is to facet a density plot into small multiples, one panel per level of a Qual variable.
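Both options can be sketched with the ggformula interface loaded in the setup above. This is a minimal sketch, assuming the `diamonds` dataset that ships with ggplot2 (part of tidyverse) and the ggformula functions `gf_density_ridges()` (which needs ggridges installed) and `gf_density()`; the choice of `price` and `cut` is ours, for illustration.

```r
library(ggplot2)    # provides the diamonds dataset
library(ggformula)  # formula interface: gf_*() functions
library(ggridges)   # needed by gf_density_ridges()

# Option 1: ridge plot. One density of price per level of cut, stacked
# vertically; the Qual variable goes on the left side of the formula.
p_ridge <- gf_density_ridges(cut ~ price, data = diamonds)

# Option 2: small multiples. The `|` in the formula facets the density
# plot into one panel per level of cut.
p_facet <- gf_density(~ price | cut, data = diamonds)

p_ridge
p_facet
```

The ridge plot packs all densities onto one shared x-axis, which makes shifts in location easy to see; the facets keep each density in its own panel, which avoids overlap.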

Yet another plot that allows side-by-side comparison of multiple densities is the violin plot. A violin plot combines aspects of a boxplot (ranking of values, median, quantiles…) with a density plot mirrored around the same axis. This lets us look at the medians, quantiles, and densities of a Quant variable across the levels of a Qual variable. Let us see what this looks like!
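A minimal sketch of such violin plots, assuming the `diamonds` data from ggplot2 and ggformula's `gf_violin()` and `gf_boxplot()`. Layering a narrow boxplot over each violin is our illustrative choice for showing the median and quartiles alongside the density.

```r
library(ggplot2)    # diamonds dataset
library(ggformula)

# Violins of price split by cut: each violin is a mirrored density.
# A narrow boxplot on top marks the median and quartiles; outlier
# points are hidden because the violin already shows the tail.
p_violin <- gf_violin(price ~ cut, data = diamonds) |>
  gf_boxplot(price ~ cut, width = 0.1, outlier.shape = NA)

# The same idea with clarity as the Qual variable
p_clarity <- gf_violin(price ~ clarity, data = diamonds)

p_violin
p_clarity
```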

Business Insights from the diamonds Violin Plots

The distribution of price is clearly long-tailed (right-skewed). The distributions also vary considerably with both cut and clarity, so these Qual variables are strongly associated with the prices of individual diamonds.
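The skew can also be checked numerically. In a right-skewed distribution the long upper tail pulls the mean above the median. A minimal sketch, assuming mosaic's `favstats()` formula interface and the `diamonds` data from ggplot2:

```r
library(ggplot2)  # diamonds dataset
library(mosaic)   # favstats()

# Summary statistics of price for each level of cut
price_by_cut <- favstats(price ~ cut, data = diamonds)
price_by_cut[, c("cut", "median", "mean", "sd", "n")]

# Right skew: the mean should exceed the median in every group
with(price_by_cut, all(mean > median))
```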

Z-scores

Often, when we wish to compare distributions that have different means and standard deviations, we rescale (standardize) the variables plotted in each distribution so that they share a common scale.

Although the densities all look the same, they are quite different! The x-axis in each case has two scales: one is the actual value of the x-variable, and the other is the z-score, which is calculated as:

\[ z_x = \frac{x - \mu_{x}}{\sigma_x} \]
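The formula translates directly into R. In this sketch the vector `x` is hypothetical data, and \(\mu_x\) and \(\sigma_x\) are estimated by the sample `mean()` and `sd()`; base R's `scale()` does the same centring and scaling in one step.

```r
# Hypothetical sample of a Quant variable
x <- c(12, 15, 9, 20, 14)

# z-score by the formula: subtract the mean, divide by the standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() centres and scales in one step; it returns a one-column matrix
z_scale <- as.vector(scale(x))

all.equal(z_manual, z_scale)
round(z_manual, 3)
```

Whatever the original units, the resulting z-scores have mean 0 and standard deviation 1.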

With distributions of the same shape (e.g. normal distributions), the density varies in the same way at the same z-score for each variable. However, since \(\mu_x\) and \(\sigma_x\) differ from one variable to the next, the same z-score corresponds to a different change in the raw value of each variable. In the first plot (top left), \(z = 1\) corresponds to an absolute change of \(5\) units; in the plot directly below it, \(z = 1\) corresponds to \(15\) units.
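A small check of this idea, using two hypothetical normal variables: the standard deviations 5 and 15 match the plots described above, while the means 50 and 100 are our own assumption. Moving to \(z = 1\) changes the raw value by \(\sigma_x\), yet the density rescaled by \(\sigma_x\) (i.e. the density on the z scale) is the same standard normal value for both.

```r
# Two hypothetical normal variables (means are illustrative assumptions)
mu    <- c(a = 50, b = 100)
sigma <- c(a = 5,  b = 15)

z <- 1  # one standard deviation above the mean

# Raw values at z = 1 are mu + z * sigma
x_at_z <- mu + z * sigma
x_at_z - mu  # change in raw units: 5 for a, 15 for b

# Densities at those points, multiplied by sigma to express them on the z scale
dens_a <- dnorm(x_at_z["a"], mean = mu["a"], sd = sigma["a"]) * sigma["a"]
dens_b <- dnorm(x_at_z["b"], mean = mu["b"], sd = sigma["b"]) * sigma["b"]

# Both equal the standard normal density at z = 1
c(dens_a, dens_b, standard = dnorm(z))
```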

Comparisons become easy when we compare differences in probabilities at identical z-scores, or differences in z-scores at identical probabilities.
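Both kinds of comparison can be sketched with base R's `pnorm()` (probability up to a value) and `qnorm()` (value at a probability). The two normal variables here are hypothetical, with means 50 and 100 and standard deviations 5 and 15.

```r
# Two hypothetical normal variables
mu    <- c(a = 50, b = 100)
sigma <- c(a = 5,  b = 15)

# Probabilities at identical z-scores: P(-1 < Z < 1) for each variable
p_within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
p_within_1sd  # about 0.683 for both

# z-scores at an identical probability: the 90th percentile of each variable
q90 <- qnorm(0.90, mean = mu, sd = sigma)
(q90 - mu) / sigma  # about 1.28 for both
```

In raw units the 90th percentiles sit at very different places, but on the z scale the two variables line up exactly.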

Conclusion

  • Histograms, Frequency Distributions, and Box Plots are used for Quantitative data variables
  • Histograms “dwell upon” counts, ranges, means and standard deviations
  • Frequency Density plots “dwell upon” probabilities and densities
  • Box Plots “dwell upon” medians and Quartiles
  • Qualitative data variables can be plotted as counts, using Bar Charts, or using Heat Maps
  • Violin Plots help us visualize multiple distributions at the same time, as when we split a Quant variable with respect to the levels of a Qual variable.
  • Ridge Plots are density plots used for describing one Quant and one Qual variable (the split by the Qual variable is built in)
  • We can split all these plots on the basis of another Qualitative variable. (Ridge Plots are already split.)
  • Long-tailed distributions need care in visualization and in inference making!
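Two of these points can be sketched in code, again assuming the `diamonds` data from ggplot2 and the ggformula interface: splitting a violin plot by a second Qual variable through faceting, and putting a long-tailed variable on a log scale. Using `color` as the second Qual variable and a log10 axis are our illustrative choices.

```r
library(ggplot2)    # diamonds dataset, scale_y_log10()
library(ggformula)

# Violins of price by cut, split into small multiples by diamond color
p_split <- gf_violin(price ~ cut | color, data = diamonds)

# Long-tailed price: a log10 y-axis spreads out the crowded low end
# and compresses the long upper tail
p_log <- gf_violin(price ~ cut, data = diamonds) |>
  gf_refine(scale_y_log10())

p_split
p_log
```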

Your Turn

Datasets

  1. Click on the Dataset Icon above and unzip that archive. Try to make distribution plots with each of the three tools.
  2. A dataset from calmcode.io https://calmcode.io/datasets.html
  3. Old Faithful Data in R (Find it!)

Inspect the dataset in each case and develop a set of Questions that can be answered by appropriate stat measures, or by using a chart to show the distribution.

References

  1. See the scrolly animation for a histogram at this website: Exploring Histograms, an essay by Aran Lunzer and Amelia McNamara https://tinlizzie.org/histograms/?s=09
  2. Minimal R using mosaic. https://cran.r-project.org/web/packages/mosaic/vignettes/MinimalRgg.pdf
  3. Sebastian Sauer, Plotting multiple plots using purrr::map and ggplot
R Package Citations
Package | Version | Citation
ggridges | 0.5.6 | Wilke (2024)
NHANES | 2.1.0 | Pruim (2015)
TeachHist | 0.2.1 | Lange (2023)
TeachingDemos | 2.13 | Snow (2024)
visualize | 4.5.0 | Balamuta (2023)
Balamuta, James. 2023. visualize: Graph Probability Distributions with User Supplied Parameters and Statistics. https://CRAN.R-project.org/package=visualize.
Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://CRAN.R-project.org/package=TeachHist.
Pruim, Randall. 2015. NHANES: Data from the US National Health and Nutrition Examination Study. https://CRAN.R-project.org/package=NHANES.
Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://CRAN.R-project.org/package=TeachingDemos.
Wilke, Claus O. 2024. ggridges: Ridgeline Plots in “ggplot2”. https://CRAN.R-project.org/package=ggridges.

Citation

BibTeX citation:
@online{v.2022,
  author = {V., Arvind},
  title = {📊 {Violins:} {Plotting} {Groups} and {Density}},
  date = {2022-11-15},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/28-Violins/},
  langid = {en},
  abstract = {Quant and Qual Variable Graphs and their Siblings}
}
For attribution, please cite this work as:
V., Arvind. 2022. “📊 Violins: Plotting Groups and Density.” November 15, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/28-Violins/.