Violins: Plotting Groups and Density
Slides and Tutorials
R (Static Viz) | Radiant Tutorial | Datasets |
Setting up R Packages
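The setup code itself is not shown in this section. A minimal sketch of what the plotting code in this module appears to need is given here, assuming `ggplot2` (which also supplies the `diamonds` data), `ggformula`, and the `magrittr` pipe. `theme_custom()` is called later but never defined in this section, so the stand-in below is purely hypothetical and should be replaced by the author's own theme if you have it.

```r
# Packages assumed by the plotting code in this module
library(ggplot2)    # geom_violin(), theme_set(), and the diamonds dataset
library(ggformula)  # gf_violin() and the other gf_*() wrappers
library(magrittr)   # the %>% pipe

# HYPOTHETICAL stand-in: theme_custom() is the author's own theme and is
# not defined in this section. Any complete ggplot2 theme would do here.
theme_custom <- function() {
  theme_minimal(base_size = 12)
}

# Make the stand-in theme the default for every plot that follows
theme_set(new = theme_custom())
```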
What graphs will we see today?
Variable #1 | Variable #2 | Chart Names | Chart Shape |
---|---|---|---|
Quant | (Qual) | Violin Plot | |
What kind of Data Variables will we choose?
No | Pronoun | Answer | Variable/Scale | Example | What Operations? |
---|---|---|---|---|---|
1 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful. | Quantitative/Ratio | Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate | Correlation |
Inspiration
Which of the plots above is more evocative of the underlying data?
Violin Plots
Often one needs to view multiple densities at the same time. Ridge plots give us one option: densities of a Quant variable split by a Qual variable. Another option is a density plot faceted into small multiples by a Qual variable.
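As a rough sketch of the small-multiples option, the code below draws one density panel per level of a Qual variable. The choice of `price` and `cut` from the `diamonds` data (the dataset plotted later in this module) is an assumption made for illustration, and `p_facets` is just a placeholder name.

```r
library(ggplot2)

# Density of price, one small-multiple panel per level of cut
p_facets <- ggplot(diamonds, aes(x = price)) +
  geom_density(fill = "grey70", alpha = 0.6) +
  facet_wrap(vars(cut)) +
  labs(title = "Density of Price, faceted by Cut",
       x = "price", y = "density")

p_facets
```

Each panel has its own density curve, so the shapes are easy to read, but comparing the curves across panels is harder than when they sit side by side on one axis.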
Yet another plot that allows comparison of multiple densities side by side is the violin plot. It combines aspects of a box plot (ranking of values, median, quantiles…) with a superimposed density plot. This lets us look at the medians, quantiles, and densities of a Quant variable with respect to a Qual variable. Let us see what this looks like!
## ggformula version
## Set graph theme
theme_set(new = theme_custom())

## Plot A: a single violin for all diamond prices
gf_violin(price ~ "All Diamonds", data = diamonds,
          draw_quantiles = c(0, .25, .50, .75)) %>%
  gf_labs(title = "Plot A: Violin plot for Diamond Prices")

## Plot B: one violin per level of cut
diamonds %>%
  gf_violin(price ~ cut,
            draw_quantiles = c(0, .25, .50, .75)) %>%
  gf_labs(title = "Plot B: Price by Cut")

## Plot C: as Plot B, with fill and outline colour mapped to cut
diamonds %>%
  gf_violin(price ~ cut,
            fill = ~ cut,
            color = ~ cut,
            alpha = 0.3,
            draw_quantiles = c(0, .25, .50, .75)) %>%
  gf_labs(title = "Plot C: Price by Cut")

## Plot D: as Plot C, with one panel per level of clarity
diamonds %>%
  gf_violin(price ~ cut,
            fill = ~ cut,
            color = ~ cut,
            alpha = 0.3,
            draw_quantiles = c(0, .25, .50, .75)) %>%
  gf_facet_wrap(vars(clarity)) %>%
  gf_labs(title = "Plot D: Price by Cut facetted by Clarity") %>%
  gf_theme(theme(axis.text.x = element_text(angle = 45, hjust = 1)))
## ggplot version
## Set graph theme
theme_set(new = theme_custom())

## Plot A: a single violin for all diamond prices
diamonds %>% ggplot() +
  geom_violin(aes(y = price, x = ""),
              draw_quantiles = c(0, .25, .50, .75)) + # note: y, not x
  labs(title = "Plot A: violin for Diamond Prices")

## Plot B: one violin per level of cut
diamonds %>% ggplot() +
  geom_violin(aes(cut, price),
              draw_quantiles = c(0, .25, .50, .75)) +
  labs(title = "Plot B: Price by Cut")

## Plot C: as Plot B, with fill and outline colour mapped to cut
diamonds %>% ggplot() +
  geom_violin(aes(cut, price,
                  color = cut, fill = cut),
              draw_quantiles = c(0, .25, .50, .75),
              alpha = 0.4) +
  labs(title = "Plot C: Price by Cut")

## Plot D: as Plot C, with one panel per level of clarity
diamonds %>% ggplot() +
  geom_violin(aes(cut, price,
                  color = cut, fill = cut),
              draw_quantiles = c(0, .25, .50, .75),
              alpha = 0.4) +
  facet_wrap(vars(clarity)) +
  labs(title = "Plot D: Price by Cut facetted by Clarity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
diamonds Violin Plots
The distribution of price is clearly long-tailed (right-skewed). The distributions also vary considerably with both cut and clarity. These Qual variables clearly have a large effect on the prices of individual diamonds.
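One way to check the skew numerically, sketched here with base R only, is to compare the mean and median of `price` within each level of `cut`: in a right-skewed group the long upper tail should pull the mean above the median. The name `price_by_cut` is a placeholder introduced for this example.

```r
library(ggplot2)  # only for the diamonds dataset

# Median and mean price within each level of cut
price_median <- tapply(diamonds$price, diamonds$cut, median)
price_mean   <- tapply(diamonds$price, diamonds$cut, mean)

price_by_cut <- data.frame(
  cut    = names(price_median),
  median = as.numeric(price_median),
  mean   = as.numeric(price_mean)
)

# A positive gap (mean above median) is a quick sign of a long right tail
price_by_cut$gap <- price_by_cut$mean - price_by_cut$median

price_by_cut
```

The same comparison could be repeated with `clarity` in place of `cut` to look at the second Qual variable.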
Conclusion
- Histograms, Frequency Distributions, and Box Plots are used for Quantitative data variables
- Histograms “dwell upon” counts, ranges, means and standard deviations
- Frequency Density plots “dwell upon” probabilities and densities
- Box Plots “dwell upon” medians and Quartiles
- Qualitative data variables can be plotted as counts, using Bar Charts, or using Heat Maps
- Violin Plots help us to visualize multiple distributions at the same time, as when we split a Quant variable by the levels of a Qual variable.
- Ridge Plots are density plots used for describing one Quant and one Qual variable (by inherent splitting)
- We can split all these plots on the basis of another Qualitative variable. (Ridge Plots are already split.)
- Long tailed distributions need care in visualization and in inference making!
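As one illustration of the care that long-tailed data may need, the sketch below redraws the price-by-cut violins on a log10 axis. This is only one possible treatment, not the module's prescribed method, and `p_log` is a placeholder name.

```r
library(ggplot2)

# Price by cut with price on a log10 axis, so the long upper tail no
# longer squeezes the bulk of the data into the bottom of the panel.
# ggplot2 applies the scale transformation before computing the density,
# so the violin shapes describe log10(price), not price.
p_log <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_violin() +
  scale_y_log10() +
  labs(title = "Price by Cut (log10 scale)",
       y = "price (log10 scale)")

p_log
```

Because the shapes now describe the distribution of log-price, statements about skew or modes read from this plot apply to the transformed variable.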
Your Turn
- Click on the Dataset Icon above, and unzip that archive. Try to make distribution plots with each of the three tools.
- A dataset from calmcode.io https://calmcode.io/datasets.html
- Old Faithful Data in R (Find it!)
Inspect the dataset in each case and develop a set of Questions that can be answered by appropriate stat measures, or by using a chart to show the distribution.
References
- See the scrolly animation for a histogram at this website: Exploring Histograms, an essay by Aran Lunzer and Amelia McNamara https://tinlizzie.org/histograms/?s=09
- Minimal R using mosaic. https://cran.r-project.org/web/packages/mosaic/vignettes/MinimalRgg.pdf
- Sebastian Sauer, Plotting multiple plots using purrr::map and ggplot
Citation
@online{v.2022,
author = {V., Arvind},
title = {{Violins:} {Plotting} {Groups} and {Density}},
date = {2022-11-15},
url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/28-Violins/},
langid = {en},
abstract = {Quant and Qual Variable Graphs and their Siblings}
}