Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors

Groups and Densities

Qual Variables
Quant Variables
Box Plots
Violin Plots
Author

Arvind V.

Published

November 15, 2022

Modified

June 19, 2025

Abstract
Quant and Qual Variable Graphs and their Siblings

Slides and Tutorials

R (Static Viz)   Radiant Tutorial  Datasets

“Keep away from people who try to belittle your ambitions. Small people always do that, but the really great make you feel that you, too, can become great.”

— Mark Twain

Setting up R Packages

library(tidyverse)
library(mosaic)
library(ggformula)
library(skimr)

Plot Theme

# https://stackoverflow.com/questions/74491138/ggplot-custom-fonts-not-working-in-quarto

# Chunk options
knitr::opts_chunk$set(
  fig.width = 7,
  fig.asp = 0.618, # Golden Ratio
  # out.width = "80%",
  fig.align = "center"
)
### Ggplot Theme
### https://rpubs.com/mclaire19/ggplot2-custom-themes

theme_custom <- function() {
  font <- "Roboto Condensed" # assign font family up front

  theme_classic(base_size = 14) %+replace% # replace elements we want to change

    theme(
      panel.grid.minor = element_blank(), # strip minor gridlines
      text = element_text(family = font),
      # text elements
      plot.title = element_text( # title
        family = font, # set font family
        # size = 20,               #set font size
        face = "bold", # bold typeface
        hjust = 0, # left align
        # vjust = 2                #raise slightly
        margin = margin(0, 0, 10, 0)
      ), plot.title.position = "plot",
      plot.subtitle = element_text( # subtitle
        family = font, # font family
        # size = 14,                #font size
        hjust = 0,
        margin = margin(2, 0, 5, 0)
      ),
      plot.caption = element_text( # caption
        family = font, # font family
        size = 8, # font size
        hjust = 1 # right align
      ),

      axis.title = element_text( # axis titles
        family = font, # font family
        size = 10 # font size
      ),
      axis.text = element_text( # axis text
        family = font, # font family
        size = 8 # font size
      )
    )
}

# Set graph theme
theme_set(new = theme_custom())
#

What graphs will we see today?

Variable #1 | Variable #2 | Chart Names
Quant | (Qual) | Violin Plot

What kind of Data Variables will we choose?

No | Pronoun | Answer | Variable/Scale | Example | What Operations?
1 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful. | Quantitative/Ratio | Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate | Correlation

Inspiration

Which of the plots above is more evocative of the underlying data? The one that looks like a combination of a box plot and a density plot probably gives us a greater sense of the spread of the data than the good old box plot.

How do these Chart(s) Work?

Often one needs to view multiple densities at the same time. Ridge plots of course give us one option, where we get densities of a Quant variable split by a Qual variable. Another option is to generate a density plot facetted into small multiples using a Qual variable.
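The facetted option mentioned above can be sketched as follows. This is a minimal illustration, assuming the diamonds dataset that ships with ggplot2 (the same one used in the case study); the object name p_facets and the fill colour are arbitrary choices, not part of the original material.

```r
library(ggplot2)

# Density of the Quant variable (price), drawn as small multiples,
# one panel per level of the Qual variable (cut).
# diamonds ships with ggplot2.
p_facets <- ggplot(diamonds, aes(x = price)) +
  geom_density(fill = "grey70", alpha = 0.5) +
  facet_wrap(vars(cut)) +
  labs(
    title = "Price densities facetted by Cut",
    x = "price", y = "density"
  )

p_facets
```

Each panel has its own density curve, so shapes are easy to compare, but comparing medians or quartiles across panels takes more effort.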

Yet another plot that allows comparison of multiple densities side by side is a violin plot. The violin plot combines aspects of a boxplot (ranking of values, median, quantiles…) with a superimposed density plot. This allows us to look at medians, densities, and quantiles of a Quant variable with respect to another Qual variable. Let us see what this looks like!

Figure 1: Violin Plots for Normal Variables

In Figure 1, the plots show (very artificial!) distributions of a single Quant variable across levels of another Qual variable. At each level of the Qual variable along the X-axis, we have a violin plot showing the density.
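A figure of this kind can be approximated with simulated data. The sketch below is not the code behind Figure 1; the group names, means, standard deviations, and seed are all hypothetical values chosen for illustration.

```r
library(ggplot2)

set.seed(42) # arbitrary seed, only so the sketch is reproducible

# Three hypothetical groups drawn from Normal distributions
# with different means and spreads
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 500)),
  value = c(
    rnorm(500, mean = 0, sd = 1),
    rnorm(500, mean = 2, sd = 0.5),
    rnorm(500, mean = -1, sd = 2)
  )
)

# One violin per level of the Qual variable on the X-axis
p_sim <- ggplot(sim, aes(x = group, y = value, fill = group)) +
  geom_violin(alpha = 0.4) +
  labs(title = "Violin plots for simulated Normal variables")

p_sim
```

The group with the smallest standard deviation should appear as the shortest and widest violin, and the one with the largest standard deviation as the tallest and thinnest.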

Case Study-1: diamonds dataset

  • Using ggformula
  • Using ggplot
## Set graph theme
theme_set(new = theme_custom())
##

gf_violin(price ~ "All Diamonds",
  data = diamonds,
  draw_quantiles = c(0, .25, .50, .75)
) %>%
  gf_labs(title = "Plot A: Violin plot for Diamond Prices")

## Set graph theme
theme_set(new = theme_custom())
##
diamonds %>%
  gf_violin(price ~ cut,
    draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_labs(title = "Plot B: Price by Cut")

## Set graph theme
theme_set(new = theme_custom())
##
diamonds %>%
  gf_violin(price ~ cut,
    fill = ~cut,
    color = ~cut,
    alpha = 0.3,
    draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_labs(title = "Plot C: Price by Cut")

## Set graph theme
theme_set(new = theme_custom())
##
diamonds %>%
  gf_violin(price ~ cut,
    fill = ~cut,
    colour = ~cut,
    alpha = 0.3, draw_quantiles = c(0, .25, .50, .75)
  ) %>%
  gf_facet_wrap(vars(clarity)) %>%
  gf_labs(title = "Plot D: Price by Cut facetted by Clarity") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))

## Set graph theme
theme_set(new = theme_custom())
##

diamonds %>% ggplot() +
  geom_violin(aes(y = price, x = ""),
    draw_quantiles = c(0, .25, .50, .75)
  ) + # note: y, not x
  labs(title = "Plot A: violin for Diamond Prices")
###
diamonds %>% ggplot() +
  geom_violin(aes(cut, price),
    draw_quantiles = c(0, .25, .50, .75)
  ) +
  labs(title = "Plot B: Price by Cut")
###
diamonds %>% ggplot() +
  geom_violin(
    aes(cut, price,
      color = cut, fill = cut
    ),
    draw_quantiles = c(0, .25, .50, .75),
    alpha = 0.4
  ) +
  labs(title = "Plot C: Price by Cut")
###
diamonds %>% ggplot() +
  geom_violin(
    aes(cut,
      price,
      color = cut, fill = cut
    ),
    draw_quantiles = c(0, .25, .50, .75),
    alpha = 0.4
  ) +
  facet_wrap(vars(clarity)) +
  labs(title = "Plot D: Price by Cut facetted by Clarity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note: Business Insights from diamond Violin Plots

The distribution of price is clearly long-tailed (right-skewed). The distributions also vary considerably with both cut and clarity. These Qual variables are clearly associated with large differences in the prices of individual diamonds.
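The visual impression of skew can be cross-checked with a few group summaries. This is a minimal sketch using dplyr on the same diamonds data; the object name price_by_cut and the column names are illustrative choices.

```r
library(dplyr)
library(ggplot2) # only for the diamonds dataset

# Mean and median price at each level of cut
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n = n(),
    mean_price = mean(price),
    median_price = median(price),
    .groups = "drop"
  )

price_by_cut
```

A mean that sits well above the median within a group is consistent with the long right tail seen in the violins.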

Wait, But Why?

  • Box plots give us an idea of medians, IQR ranges, and outliers. The shape of the density is not apparent from the box.
  • Densities give us shapes of distributions, but do not provide a visual indication of other metrics like means or medians (at least not without some effort).
  • Violins help us do both!
  • Violins can also be cut in half (since they are symmetric, like Buddhist Prayer Wheels), then placed horizontally, and combined with both a boxplot and a dot-plot to give us raincloud plots that look like this. (Yes, there is code over there, which you can reuse.)
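One way to see both kinds of information at once is to draw a narrow box plot inside each violin. The sketch below uses only ggplot2 and is a simpler stand-in for a raincloud plot, not an implementation of one; the box width and transparency values are arbitrary.

```r
library(ggplot2)

p_combo <- ggplot(diamonds, aes(x = cut, y = price)) +
  # the violin shows the shape of the density
  geom_violin(aes(fill = cut), alpha = 0.3, colour = NA) +
  # the narrow box shows the median and quartiles;
  # outlier points are hidden to reduce clutter
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(title = "Price by Cut: violin with a narrow box plot inside")

p_combo
```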

Conclusion

  • Histograms, Frequency Distributions, and Box Plots are used for Quantitative data variables
  • Histograms “dwell upon” counts, ranges, means and standard deviations
  • Frequency Density plots “dwell upon” probabilities and densities
  • Box Plots “dwell upon” medians and Quartiles
  • Qualitative data variables can be plotted as counts, using Bar Charts, or using Heat Maps
  • Violin Plots help us to visualize multiple distributions at the same time, as when we split a Quant variable with respect to the levels of a Qual variable.
  • Ridge Plots are density plots used for describing one Quant and one Qual variable (by inherent splitting)
  • We can split all these plots on the basis of another Qualitative variable. (Ridge Plots are already split.)
  • Long tailed distributions need care in visualization and in inference making!
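For a long-tailed variable such as price, one common option is to plot on a logarithmic axis. The sketch below is one possible approach, not a recommendation from the original material; note that ggplot2 applies the scale transformation before computing the density, so the violin shapes describe log10(price) rather than price itself.

```r
library(ggplot2)

p_log <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_violin(aes(fill = cut), alpha = 0.3) +
  scale_y_log10() + # compresses the long upper tail
  labs(
    title = "Price by Cut, log10 scale",
    y = "price (log10 scale)"
  )

p_log
```

On this scale the violins are far less squashed towards the bottom of the plot, but differences along the Y-axis must be read as ratios rather than absolute amounts.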

Your Turn

Note: Datasets
  1. Click on the Dataset Icon above, and unzip that archive. Try to make distribution plots with each of the three tools.
Note: CalmCode
  1. A dataset from calmcode.io https://calmcode.io/datasets.html
Note: From Groups
  1. Datasets from the earlier module on Groups.

Inspect the dataset in each case and develop a set of Questions that can be answered by appropriate stat measures, or by using a chart to show the distribution.

References

  1. Winston Chang (2024). R Graphics Cookbook. https://r-graphics.org

  2. See the scrolly animation for a histogram at this website: Exploring Histograms, an essay by Aran Lunzer and Amelia McNamara https://tinlizzie.org/histograms/?s=09

  3. Minimal R using mosaic. https://cran.r-project.org/web/packages/mosaic/vignettes/MinimalRgg.pdf

  4. Sebastian Sauer, Plotting multiple plots using purrr::map and ggplot

R Package Citations
Package Version Citation
ggnormalviolin 0.2.1 Schneider (2025)
ggridges 0.5.6 Wilke (2024)
NHANES 2.1.0 Pruim (2015)
TeachHist 0.2.1 Lange (2023)
TeachingDemos 2.13 Snow (2024)
visualize 4.5.0 Balamuta (2023)
Balamuta, James. 2023. visualize: Graph Probability Distributions with User Supplied Parameters and Statistics. https://doi.org/10.32614/CRAN.package.visualize.
Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://doi.org/10.32614/CRAN.package.TeachHist.
Pruim, Randall. 2015. NHANES: Data from the US National Health and Nutrition Examination Study. https://doi.org/10.32614/CRAN.package.NHANES.
Schneider, W. Joel. 2025. ggnormalviolin: A “ggplot2” Extension to Make Normal Violin Plots. https://doi.org/10.32614/CRAN.package.ggnormalviolin.
Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.
Wilke, Claus O. 2024. ggridges: Ridgeline Plots in “ggplot2”. https://doi.org/10.32614/CRAN.package.ggridges.

Citation

BibTeX citation:
@online{v.2022,
  author = {V., Arvind},
  title = {{Groups} and {Densities}},
  date = {2022-11-15},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/28-Violins/},
  langid = {en},
  abstract = {Quant and Qual Variable Graphs and their Siblings}
}
For attribution, please cite this work as:
V., Arvind. 2022. “Groups and Densities.” November 15, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/28-Violins/.

License: CC BY-SA 2.0
