Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors

    Densities

    The Hills are Shadows, said Tennyson

    Quant Variables
    Qual Variables
    Density Plots
    Ridge Plots
    Author

    Arvind V.

    Published

    June 22, 2024

    Modified

    July 6, 2025

    Abstract
    Quant and Qual Variable Graphs and their Siblings

    Slides and Tutorials

    R (Static Viz)   Radiant Tutorial  Datasets

    “Never let the future disturb you. You will meet it, if you have to, with the same weapons of reason which today arm you against the present.”

    — Marcus Aurelius

    Setting up R Packages

    library(tidyverse)
    library(mosaic)
    library(ggformula)
    
    # install.packages("remotes")
    # library(remotes)
    # remotes::install_github("wilkelab/ggridges")
    library(ggridges)
    library(skimr)
    library(palmerpenguins) # Our new favourite dataset
    ##
    library(tidyplots) # Easily Produced Publication-Ready Plots
    library(tinyplot) # Plots with Base R
    library(tinytable) # Elegant Tables for our data
    
    ## ggplot theme
    library(hrbrthemes)
    hrbrthemes::import_roboto_condensed() # Import Roboto Condensed font for use in charts
    hrbrthemes::update_geom_font_defaults() # Update matching font defaults for text geoms
    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme

    What graphs will we see today?

    Variable #1 | Variable #2 | Chart Names
    Quant | None | Density plot, Ridge Density Plot

    What kind of Data Variables will we choose?

    No | Pronoun | Answer | Variable/Scale | Example | What Operations?
    1 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful. | Quantitative/Ratio | Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate | Correlation

    Inspiration

    April is the cruelest month, said T. S. Eliot. But December in Nebraska must be tough.

    What is a “Density Plot”?

    As we saw earlier, Histograms are best to show the distribution of raw Quantitative data, by displaying the number of values that fall within defined ranges, often called buckets or bins.

    Sometimes it is useful to consider a chart where the bucket width shrinks to zero!

    You might imagine a density chart as a histogram where the buckets are infinitesimally small, i.e. of nearly zero width. Think of the frequency density as a differentiation (as in calculus) of the running, cumulative count of observations: by taking the smallest of steps ∼0 along the x-axis, we measure how quickly observations accumulate at each value, i.e. the slope of the cumulative distribution. This may seem counter-intuitive, but densities have their uses in spotting the ranges in the data where values are more frequent. In this, they serve a similar purpose to histograms, but may offer insights not readily apparent with histograms, especially with default bucket widths. The chunkiness that we see in histograms is smoothed away, leaving a curve that shows the ranges in which the data are more frequent.
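
    The idea of shrinking buckets can be tried out in base R. The sketch below is only an illustration, not part of the original lesson: it uses the built-in faithful dataset as a stand-in, and the names x, h_wide, h_narrow and d are made up for this example. It assumes that the smooth curve is a kernel density estimate, which is what density() computes. On the density scale, the bars of a histogram enclose an area of 1 whatever the bucket width, and the smooth curve encloses approximately the same area.

```r
# Sketch: histograms on the density scale vs. a kernel density estimate (KDE).
# `faithful` ships with R; it is used here only as a convenient stand-in.
x <- faithful$waiting

# Two histograms, one with wide buckets and one with narrow buckets
h_wide   <- hist(x, breaks = 10,  plot = FALSE)
h_narrow <- hist(x, breaks = 100, plot = FALSE)

# Area of a histogram = sum of (bar height on density scale * bar width)
area_wide   <- sum(h_wide$density   * diff(h_wide$breaks))
area_narrow <- sum(h_narrow$density * diff(h_narrow$breaks))

# The smooth curve: a kernel density estimate with default settings
d <- density(x)

# Area under the smooth curve, using the trapezoid rule
area_kde <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Overlay the smooth curve on the narrow-bucket histogram
hist(x, breaks = 100, freq = FALSE,
     main = "Narrow buckets vs. density", xlab = "waiting")
lines(d, lwd = 2)

c(wide = area_wide, narrow = area_narrow, kde = area_kde)
```

    Because the total area is fixed, the height of the curve over any range tells us how much of the data lives there, which is why the two kinds of chart can be read in much the same way.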

    Case Study-1: penguins dataset

    We will first look at a dataset that is available in R through the palmerpenguins package, the penguins dataset. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

    Examine the Data

    As per our Workflow, we will look at the data using all three methods we have seen.

    • dplyr: glimpse()
    • skimr: skim()
    • mosaic: inspect()
    glimpse(penguins)
    Rows: 344
    Columns: 8
    $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
    $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
    $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
    $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
    $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
    $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
    $ sex               <fct> male, female, female, NA, female, male, female, male…
    $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
    skim(penguins)
    Data summary
    Name penguins
    Number of rows 344
    Number of columns 8
    _______________________
    Column type frequency:
    factor 3
    numeric 5
    ________________________
    Group variables None

    Variable type: factor

    skim_variable n_missing complete_rate ordered n_unique top_counts
    species 0 1.00 FALSE 3 Ade: 152, Gen: 124, Chi: 68
    island 0 1.00 FALSE 3 Bis: 168, Dre: 124, Tor: 52
    sex 11 0.97 FALSE 2 mal: 168, fem: 165

    Variable type: numeric

    skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
    bill_length_mm 2 0.99 43.92 5.46 32.1 39.23 44.45 48.5 59.6 ▃▇▇▆▁
    bill_depth_mm 2 0.99 17.15 1.97 13.1 15.60 17.30 18.7 21.5 ▅▅▇▇▂
    flipper_length_mm 2 0.99 200.92 14.06 172.0 190.00 197.00 213.0 231.0 ▂▇▃▅▂
    body_mass_g 2 0.99 4201.75 801.95 2700.0 3550.00 4050.00 4750.0 6300.0 ▃▇▆▃▂
    year 0 1.00 2008.03 0.82 2007.0 2007.00 2008.00 2009.0 2009.0 ▇▁▇▁▇
    inspect(penguins)
    
    categorical variables:  
         name  class levels   n missing
    1 species factor      3 344       0
    2  island factor      3 344       0
    3     sex factor      2 333      11
                                       distribution
    1 Adelie (44.2%), Gentoo (36%) ...             
    2 Biscoe (48.8%), Dream (36%) ...              
    3 male (50.5%), female (49.5%)                 
    
    quantitative variables:  
                   name   class    min       Q1  median     Q3    max       mean
    1    bill_length_mm numeric   32.1   39.225   44.45   48.5   59.6   43.92193
    2     bill_depth_mm numeric   13.1   15.600   17.30   18.7   21.5   17.15117
    3 flipper_length_mm integer  172.0  190.000  197.00  213.0  231.0  200.91520
    4       body_mass_g integer 2700.0 3550.000 4050.00 4750.0 6300.0 4201.75439
    5              year integer 2007.0 2007.000 2008.00 2009.0 2009.0 2008.02907
               sd   n missing
    1   5.4595837 342       2
    2   1.9747932 342       2
    3  14.0617137 342       2
    4 801.9545357 342       2
    5   0.8183559 344       0

    Data Dictionary

    Note: Qualitative Data
    • sex: male and female penguins
    • island: they have islands to themselves!!
    • species: Three adorable types!
    Figure 1: Penguin Species
    Note: Quantitative Data
    • bill_length_mm: The length of the penguins’ bills
    • bill_depth_mm: See the picture!!
    • flipper_length_mm: Flippers! Penguins have “hands”!!
    • body_mass_g: Grams? Grams??? Why, these penguins are like human babies!!❤️
    Figure 2: Penguin Features
    Note: Business Insights on Examining the penguins dataset
    • This is a smallish dataset (344 rows, 8 columns).
    • There are a few missing values in sex (11 missing entries) and in the four body-measurement Quant variables (2 missing entries each); year has none.
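
    The missing-value counts can also be checked directly. This is a small sketch with base R functions; the names pp, na_counts and n_complete are made up for this illustration, and the data are fetched as palmerpenguins::penguins so that no other object called penguins gets picked up by accident.

```r
# Sketch: count missing entries per column, and the rows with no NA at all
pp <- palmerpenguins::penguins

# One count per column: how many entries are NA?
na_counts <- colSums(is.na(pp))
na_counts

# Rows that would survive a drop_na(), i.e. rows with no missing entry
n_complete <- sum(complete.cases(pp))
n_complete
```

    The second number is the row count we should expect after the drop_na() step in the plotting code that follows.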

    Plotting Densities

    Using ggformula
    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    penguins <- penguins %>% drop_na()
    
    gf_density(~body_mass_g, data = penguins) %>%
      gf_labs(title = "Plot A: Penguin Masses", caption = "ggformula")

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    penguins %>%
      gf_density(~body_mass_g,
        fill = ~species,
        color = "black"
      ) %>%
      gf_refine(scale_color_viridis_d(
        option = "magma",
        aesthetics = c("colour", "fill")
      )) %>%
      gf_labs(
        title = "Plot B: Penguin Body Mass by Species",
        caption = "ggformula"
      )

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    penguins %>%
      gf_density(
        ~body_mass_g,
        fill = ~species,
        color = "black",
        alpha = 0.3
      ) %>%
      gf_facet_wrap(vars(sex)) %>%
      gf_labs(title = "Plot C: Penguin Body Mass by Species and facetted by Sex", caption = "ggformula")

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    penguins %>%
      gf_density(~body_mass_g, fill = ~species, color = "black") %>%
      gf_facet_wrap(vars(sex), scales = "free_y", nrow = 2) %>%
      gf_labs(
        title = "Plot D: Penguin Body Mass by Species and facetted by Sex",
        subtitle = "Free y-scale",
        caption = "ggformula"
      ) %>%
      gf_refine(scale_fill_brewer(palette = "Set1")) %>%
      gf_theme(theme(axis.text.x = element_text(
        angle = 45,
        hjust = 1
      )))

    Using ggplot

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    ## Remove the rows containing NA (11 rows!)
    penguins <- penguins %>% drop_na()
    
    ggplot(data = penguins) +
      geom_density(aes(x = body_mass_g)) +
      labs(title = "Plot A: Penguin Masses", caption = "ggplot")

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    penguins %>%
      ggplot() +
      geom_density(aes(x = body_mass_g, fill = species),
        alpha = 0.3,
        color = "black"
      ) +
      scale_color_brewer(
        palette = "Set1",
        aesthetics = c("colour", "fill")
      ) +
      labs(
        title = "Plot B: Penguin Body Mass by Species",
        caption = "ggplot"
      )

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    penguins %>% ggplot() +
      geom_density(aes(x = body_mass_g, fill = species),
        color = "black",
        alpha = 0.3
      ) +
      facet_wrap(vars(sex)) +
      labs(title = "Plot C: Penguin Body Mass by Species and facetted by Sex", caption = "ggplot")

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    penguins %>% ggplot() +
      geom_density(aes(x = body_mass_g, fill = species),
        alpha = 0.3,
        color = "black"
      ) +
      facet_wrap(vars(sex), scales = "free_y", nrow = 2) +
      labs(
        title = "Plot D: Penguin Body Mass by Species and facetted by Sex",
        subtitle = "Free y-scale", caption = "ggplot"
      ) +
      scale_fill_brewer(palette = "Set1") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

    Note: Business Insights from penguin Densities

    The conclusions are much the same as with histograms. Although densities may not be used much in business contexts, they are better than histograms when comparing multiple distributions! So you should use them!

    Ridge Plots

    Sometimes we may wish to show the distribution/density of a Quant variable, against several levels of a Qual variable. For instance, the prices of different items of furniture, based on the furniture “style” variable. Or the sales of a particular line of products, across different shops or cities. We did this with both histograms and densities, by colouring based on a Qual variable, and by faceting using a Qual variable. There is a third way, using what is called a ridge plot. ggformula supports this plot by importing/depending upon the ggridges package. ggridges provides direct support for ridge plots, and can be used as an extension to ggplot2 and ggformula.

    Using ggformula
    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    gf_density_ridges(drv ~ hwy,
      fill = ~drv,
      alpha = 0.5, # fill transparency
      rel_min_height = 0.005, # trim tails lower than 0.5% of the tallest peak
      data = mpg
    ) %>%
      gf_refine(
        scale_y_discrete(expand = c(0.01, 0)),
        scale_x_continuous(expand = c(0.01, 0)),
        scale_fill_brewer(
          name = "Drive Type",
          palette = "Spectral"
        )
      ) %>%
      gf_labs(
        title = "Ridge Plot", x = "Highway Mileage",
        y = "Drive Type"
      )

    Using ggplot

    ggplot2::theme_set(new = theme_classic(base_family = "Roboto Condensed")) # Set consistent graph theme
    
    ggplot(data = mpg, aes(x = hwy, y = drv, fill = drv)) +
      geom_density_ridges(
        alpha = 0.5, # fill transparency
        rel_min_height = 0.005 # trim tails lower than 0.5% of the tallest peak
      ) +
      scale_y_discrete(expand = c(0.01, 0)) +
      scale_x_continuous(expand = c(0.01, 0)) +
      scale_fill_brewer(
        name = "Drive Type",
        palette = "Spectral"
      ) +
      labs(
        title = "Ridge Plot", x = "Highway Mileage",
        y = "Drive Type"
      )

    Note: Business Insights from mpg Ridge Plots

    This is another way of visualizing multiple distributions, of a Quant variable at different levels of a Qual variable. We see that the distribution of hwy mileage varies substantially with drv type.
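
    A quick numerical companion to the ridge plot, as a sketch only: the median and spread of hwy for each drv level give a rough check on what the ridges show. The names hwy_median and hwy_iqr are made up for this illustration, and the mpg data are taken from the ggplot2 package.

```r
# Sketch: summarise highway mileage (hwy) for each drive type (drv)
mpg_data <- ggplot2::mpg

# Median and interquartile range of hwy within each level of drv
hwy_median <- tapply(mpg_data$hwy, mpg_data$drv, median)
hwy_iqr    <- tapply(mpg_data$hwy, mpg_data$drv, IQR)

# One row per drive type: "4" = four-wheel, "f" = front-wheel, "r" = rear-wheel
data.frame(drv = names(hwy_median), median = hwy_median, iqr = hwy_iqr)
```

    If the medians sit far apart relative to the spreads, the ridges in the plot will look clearly shifted from one another.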

    Wait, But Why?

    • Densities are sometimes easier to compare side by side. That is what Claus Wilke says, at least. Perhaps because they look less “busy” than histograms.
    • Ridge Density Plots are very cool when it comes to comparing the density of a Quant variable as it varies against the levels of a Qual variable, without having to facet or group.
    • It is possible to plot 2D-densities too, for two Quant variables, which give very evocative contour-like plots. Try to do this with the faithful dataset in R.
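
    For the 2D-density idea in the last point, here is one possible starting sketch, assuming ggplot2 and its geom_density_2d() layer; the object name p and the choice of labels are made up for this illustration, and there are other ways of doing this.

```r
# Sketch: a 2D density (contour) plot for two Quant variables in `faithful`
library(ggplot2)

p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(alpha = 0.3) +             # the raw observations
  geom_density_2d(colour = "black") +   # contours of the estimated 2D density
  labs(
    title = "Old Faithful: eruption length vs waiting time",
    x = "Eruption length (minutes)",
    y = "Waiting time (minutes)"
  )
p
```

    The contours play the role of the hills in a 1D density plot: closely nested rings mark the regions where the observations pile up.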

    Conclusion

    • Histograms and Frequency Density plots are both used for Quantitative data variables
    • Whereas Histograms “dwell upon” counts, ranges, means and standard deviations,
    • Frequency Density plots “dwell upon” probabilities and densities
    • Ridge Plots are density plots used for describing one Quant and one Qual variable (by inherent splitting)
    • We can split all these plots on the basis of another Qualitative variable. (Ridge Plots are already split.)
    • Long-tailed distributions need care in visualization and in inference making!
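
    The difference between “counts” and “probabilities” in the points above can be made concrete with a small base R sketch. It again uses the built-in faithful dataset as a stand-in; the cutoff of 65 minutes and the names p_counts and p_density are arbitrary choices for this illustration. The proportion of observations above the cutoff (a count-based answer) should come out close to the area under the density curve to the right of the cutoff (a density-based answer), though not identical, since the curve is smoothed.

```r
# Sketch: a proportion from counts vs. the matching area under a density curve
x <- faithful$waiting
cutoff <- 65  # arbitrary cutoff, chosen only for illustration

# Count-based answer: fraction of observations above the cutoff
p_counts <- mean(x > cutoff)

# Density-based answer: evaluate the kernel density estimate from the cutoff
# upwards (the bandwidth is still computed from all of x) ...
d <- density(x, from = cutoff)

# ... and add up the area under it with the trapezoid rule
p_density <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

c(from_counts = p_counts, from_density = p_density)
```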

    Your Turn

    Note: Star Trek Books

    Which would be the Group By variables here? And what would you summarize? With which function?

    Note: Math Anxiety! Hah! Peasants.

    References

    1. Winston Chang (2024). R Graphics Cookbook. https://r-graphics.org
    2. See the scrolly animation for a histogram at this website: Exploring Histograms, an essay by Aran Lunzer and Amelia McNamara https://tinlizzie.org/histograms/?s=09
    3. Minimal R using mosaic. https://cran.r-project.org/web/packages/mosaic/vignettes/MinimalRgg.pdf
    4. Sebastian Sauer, Plotting multiple plots using purrr::map and ggplot
    R Package Citations
    Package Version Citation
    ggridges 0.5.6 Wilke (2024)
    NHANES 2.1.0 Pruim (2015)
    TeachHist 0.2.1 Lange (2023)
    TeachingDemos 2.13 Snow (2024)
    tidyplots 0.2.2 Engler (2024)
    tinyplot 0.4.1 McDermott, Arel-Bundock, and Zeileis (2025)
    tinytable 0.9.0 Arel-Bundock (2025)
    visualize 4.5.0 Balamuta (2023)
    Arel-Bundock, Vincent. 2025. tinytable: Simple and Configurable Tables in “HTML,” “LaTeX,” “Markdown,” “Word,” “PNG,” “PDF,” and “Typst” Formats. https://doi.org/10.32614/CRAN.package.tinytable.
    Balamuta, James. 2023. visualize: Graph Probability Distributions with User Supplied Parameters and Statistics. https://doi.org/10.32614/CRAN.package.visualize.
    Engler, Jan Broder. 2024. “Tidyplots Empowers Life Scientists with Easy Code-Based Data Visualization.” bioRxiv. https://doi.org/10.1101/2024.11.08.621836.
    Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://doi.org/10.32614/CRAN.package.TeachHist.
    McDermott, Grant, Vincent Arel-Bundock, and Achim Zeileis. 2025. tinyplot: Lightweight Extension of the Base r Graphics System. https://doi.org/10.32614/CRAN.package.tinyplot.
    Pruim, Randall. 2015. NHANES: Data from the US National Health and Nutrition Examination Study. https://doi.org/10.32614/CRAN.package.NHANES.
    Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.
    Wilke, Claus O. 2024. ggridges: Ridgeline Plots in “ggplot2”. https://doi.org/10.32614/CRAN.package.ggridges.

    Citation

    BibTeX citation:
    @online{v.2024,
      author = {V., Arvind},
      title = {Densities},
      date = {2024-06-22},
      url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/26-Densities/},
      langid = {en},
      abstract = {Quant and Qual Variable Graphs and their Siblings}
    }
    
    For attribution, please cite this work as:
    V., Arvind. 2024. “Densities.” June 22, 2024. https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/26-Densities/.

    License: CC BY-SA 2.0

    Website made with ❤️ and Quarto, by Arvind V.

    Hosted by Netlify .