library(tidyverse) # includes ggplot for plotting
library(mosaic)
library(ggformula)
library(RColorBrewer) # colour palettes
library(ggbump) # Bump Charts
library(ggiraphExtra) # Radar, Spine, Donut and Donut-Pie combo charts !!
library(ggalt) # New geometries, coordinate systems, statistical transformations, scales and fonts
# install.packages("devtools")
# devtools::install_github("ricardo-bion/ggradar")
library(ggradar) # Radar Plots
Ratings and Rankings
Better than All the Rest
“I have no respect for people who deliberately try to be weird to attract attention, but if that’s who you honestly are, you shouldn’t try to”normalize” yourself.”
— Alicia Witt, actress, singer-songwriter, and pianist (b. 21 Aug 1975)
Inspiration
What do we see here? From https://www.visualcapitalist.com/sp/americas-cheapest-sources-of-electricity-in-2024/ :
From Figure 1 (a):
-
Onshore wind power
effectively costs USD0 per megawatt-hour (MWh) when subsidies are included!
- Demand for storage solutions is rising quickly. If storage is included, the minimum cost for
onshore wind
increases to $8 per MWh.
- Solar photovoltaics (PV) have similarly attractive economics. With subsidies, the minimum cost is USD6 per MWh. When including storage, USD38 per MWh. Notably, the maximum cost of solar PV with storage has significantly increased from USD102 in 2023 to USD 210 in 2024.
- For gas-combined cycle plants, which combine natural gas and steam turbines for efficient electricity generation, the maximum price has climbed $7 year-over-year to $108 per MWh.
And from From Figure 1 (b)?
- There is a clear difference in the capabilities of the three players compared, though all of them are classified as “5 tools” players.
- Each player is better than the others at one unique skill: Betts at
Throwing
, Judge atHit_power
, and Trout atHit_avg
.
Setting up R Packages
Plot Theme
Show the Code
# https://stackoverflow.com/questions/74491138/ggplot-custom-fonts-not-working-in-quarto
# Chunk options
knitr::opts_chunk$set(
fig.width = 7,
fig.asp = 0.618, # Golden Ratio
# out.width = "80%",
fig.align = "center"
)
### Ggplot Theme
### https://rpubs.com/mclaire19/ggplot2-custom-themes
theme_custom <- function() {
font <- "Roboto Condensed" # assign font family up front
theme_classic(base_size = 14) %+replace% # replace elements we want to change
theme(
panel.grid.minor = element_blank(), # strip minor gridlines
text = element_text(family = font),
# text elements
plot.title = element_text( # title
family = font, # set font family
# size = 20, #set font size
face = "bold", # bold typeface
hjust = 0, # left align
# vjust = 2 #raise slightly
margin = margin(0, 0, 10, 0)
),
plot.subtitle = element_text( # subtitle
family = font, # font family
# size = 14, #font size
hjust = 0,
margin = margin(2, 0, 5, 0)
),
plot.caption = element_text( # caption
family = font, # font family
size = 8, # font size
hjust = 1
), # right align
axis.title = element_text( # axis titles
family = font, # font family
size = 10
), # font size
axis.text = element_text( # axis text
family = font, # axis family
size = 8
), # font size
legend.text = element_text(
family = font,
size = 8
)
)
}
# Set graph theme
theme_set(new = theme_custom())
What graphs are we going to see today?
When we wish to compare the size of things and rank them, there are quite a few ways to do it.
Bar Charts and Lollipop Charts are immediately obvious when we wish to rank things on one aspect or parameter, e.g. mean income vs education. We can also put two lollipop charts back-to-back to make a Dumbbell Chart to show comparisons/ranks across two datasets based on one aspect, e.g change in mean income over two years, across gender.
When we wish to rank the multiple objects against multiple aspects or parameters, then we can use Bump Charts and Radar Charts, e.g performance of one or more products against multiple criteria (cost, size, performance…)s.
Lollipop Charts
Let’s make a toy dataset of Products and Ratings:
# Sample data set
set.seed(1)
df1 <- tibble(
product = LETTERS[1:10],
rank = sample(20:35, 10, replace = TRUE)
)
df1
# Set graph theme
theme_set(new = theme_custom())
###
gf_segment(0 + rank ~ product + product, data = df1) %>%
# A formula with shape y + yend ~ x + xend.
gf_point(rank ~ product,
colour = ~product,
size = 5,
ylab = "Rank",
xlab = "Product"
)
###
gf_segment(
0 + rank ~ fct_reorder(product, -rank) +
fct_reorder(product, -rank),
data = df1
) %>%
# A formula with shape y + yend ~ x + xend.
gf_point(rank ~ product, colour = ~product, size = 5) %>%
gf_refine(coord_flip()) %>%
gf_labs(x = "Product", y = "Rank")
We have flipped the chart horizontally and reordered the \(x\) categories in order of decreasing ( or increasing ) \(y\), using forcats::fct_reorder
.
# Set graph theme
theme_set(new = theme_custom())
###
ggplot(df1) +
geom_segment(aes(
y = 0, yend = rank,
x = product,
xend = product
)) +
geom_point(aes(y = rank, x = product, colour = product), size = 5) +
labs(title = "Product Ratings", x = "Product", y = "Rank")
###
ggplot(df1) +
geom_segment(aes(
y = 0, yend = rank,
x = fct_reorder(product, -rank),
xend = fct_reorder(product, -rank)
)) +
geom_point(aes(x = product, y = rank, colour = product), size = 5) +
labs(title = "Product Ratings", x = "Product", y = "Rank") +
coord_flip()
Yes, R has ( nearly) everything, including a geom_lollipop
command: Here!
# library(ggalt)
# Set graph theme
theme_set(new = theme_custom())
###
ggplot(df1) +
geom_lollipop(aes(x = rank, y = product),
point.size = 3, horizontal = F
) +
labs(title = "What is this BS chart?")
Warning: Using the `size` aesthetic with geom_segment was deprecated in ggplot2 3.4.0.
ℹ Please use the `linewidth` aesthetic instead.
##
ggplot(df1) +
geom_lollipop(aes(y = rank, x = product),
point.size = 3, , horizontal = F
) +
labs(title = "Yeah, but I want this horizontal...")
##
ggplot(df1) +
geom_lollipop(aes(y = rank, x = product),
point.size = 3, horizontal = T
) +
labs(title = "This also looks like BS")
##
ggplot(df1) +
geom_lollipop(
aes(
x = rank,
y = reorder(product, rank),
colour = product
),
stroke = 2,
point.size = 3, horizontal = T
) +
labs(
title = "Now you're talking",
x = "Rank", y = "Product"
)
- Very simple chart, almost like a bar chart
- Differences between the same set of data across one aspect (i.e. rank) is very quickly apparent
- Ordering the dataset by the attribute (i.e ordering product by rank) makes the message very clear.
- Even a large number of data can safely be visualized and understood
Dumbbell Charts
A lollipop chart compares a set of data against one aspect. What if we have more than one? Say sales in many product lines across two years?
Let us once again construct a very similar looking toy dataset, but with two columns for ratings, one for each of two years:
# Sample data set
# Wide Format data!
set.seed(2)
df2 <- tibble(
product = LETTERS[1:10],
rank_year1 = sample(20:35, 10, replace = TRUE),
rank_year2 = sample(15:45, 10, replace = TRUE)
)
df2
A short diversion: we can also make this data into long form: this will become useful very shortly!
Look at the data: this is wide form data. The columns pertaining to each of the Product-Features would normally be stacked into two columns, one with the Feature and the other with the score. Note the trio: Qual(product) + Qual(year) + Quant(scores):
# With Long Format Data
df2_long <- df2 %>%
pivot_longer(
cols = c(dplyr::starts_with("rank")),
names_to = "year", values_to = "scores"
)
df2_long
A cool visualization of this operation was created by Garrick Aden-Buie:
# Set graph theme
theme_set(new = theme_custom())
## With Wide Form Data
##
df2 %>%
gf_segment(product + product ~ rank_year1 + rank_year2,
size = 3, color = "#e3e2e1",
arrow = arrow(
angle = 30,
length = unit(0.25, "inches"),
ends = "last", type = "open"
)
) %>%
gf_point(product ~ rank_year1,
size = 3,
colour = "#123456"
) %>%
gf_point(product ~ rank_year2,
size = 3,
colour = "#bad744"
) %>%
gf_labs(x = "Rank", y = "Product")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
## Rearranging `product` in order of rank_year2
df2 %>%
gf_segment(
reorder(product, rank_year2) +
reorder(product, rank_year2) ~
rank_year1 + rank_year2,
size = 3, color = "#e3e2e1",
arrow = arrow(
angle = 30,
length = unit(0.25, "inches")
)
) %>%
gf_point(product ~ rank_year1,
size = 3,
colour = "#123456"
) %>%
gf_point(product ~ rank_year2,
size = 3,
colour = "#bad744"
) %>%
gf_labs(
x = "Rank", y = "Product",
title = "In Decreasing order of Year2 Rank"
)
# Set graph theme
theme_set(new = theme_custom())
## With Wide Format Data
ggplot(df2, aes(y = product, yend = product, x = rank_year1, xend = rank_year2)) +
geom_segment(
size = 3, color = "#e3e2e1",
arrow = arrow(
angle = 30,
length = unit(0.25, "inches")
)
) +
geom_point(aes(rank_year1, product),
colour = "#5b8124", size = 3
) +
geom_point(aes(rank_year2, product),
colour = "#bad744", size = 3
) +
labs(x = "Rank", y = "Product")
## Rearranging `product` in order of rank_year2
ggplot(df2, aes(y = reorder(product, rank_year2), yend = reorder(product, rank_year2), x = rank_year1, xend = rank_year2)) +
geom_segment(
size = 3, color = "#e3e2e1",
arrow = arrow(
angle = 30,
length = unit(0.25, "inches")
)
) +
geom_point(aes(rank_year1, product),
colour = "#5b8124", size = 3
) +
geom_point(aes(rank_year2, product),
colour = "#bad744", size = 3
) +
labs(
x = "Rank", y = "Product",
title = "In Decreasing order of Year2 Rank"
)
# Set graph theme
theme_set(new = theme_custom())
df2 %>% ggplot() +
geom_dumbbell(
aes(
y = reorder(product, rank_year2),
x = rank_year1,
xend = rank_year2
),
size = 3, color = "#e3e2e1",
colour_x = "#5b8124",
colour_xend = "#bad744",
dot_guide = TRUE, # Try FALSE
dot_guide_size = 0.25
) +
labs(
x = NULL, y = NULL,
title = "ggplot2 geom_dumbbell with dot guide",
subtitle = "Products in Decreasing order of Year2 Rank",
caption = "Made with ggalt"
) +
# theme_minimal() +
theme(panel.grid.major.x = element_line(size = 0.05)) +
theme(panel.grid.major.y = element_blank())
- Dumbbell Plots are clearly they are more intuitive and clear than the bar chart
- Differences between the same set of data at two different aspects is very quickly apparent
- Differences in differences(DID) are also quite easily apparent. Experiments do use these metrics and these plots would be very useful there.
-
ggalt
works nicely with additional visible guides rendered in the chart
Bump Charts
Bump Charts track the ranking of several objects based on other parameters, such as time/month or even category. For instance, what is the opinion score of a set of products across various categories of users?
year <- rep(2019:2021, 4)
position <- c(4, 2, 2, 3, 1, 4, 2, 3, 1, 1, 4, 3)
product <- c(
"A", "A", "A",
"B", "B", "B",
"C", "C", "C",
"D", "D", "D"
)
df3 <- tibble(year, position, product)
df3
ggbump
uses ggplot
syntax
We need to use a new package called, what else, ggbump
to create our Bump Charts
: Here again we do not yet have a ggformula
equivalent. ( Though it may be possible with a combination of gf_point
and gf_polygon
, and pre-computing the coordinates. Seems long-winded.)
Note the +
syntax with ggplot
code!!
# library(ggbump)
# Set graph theme
theme_set(new = theme_custom())
###
df3 %>%
ggplot() +
geom_bump(aes(x = year, y = position, color = product)) +
geom_point(aes(x = year, y = position, color = product),
size = 6
) +
xlab("Year") +
ylab("Rank") +
scale_color_brewer(palette = "RdBu") + # Change Colour Scale
scale_x_discrete(limits = c(2019, 2020, 2021)) # Check warning here...
We can add labels along the “bump lines” and remove the legend altogether:
# Set graph theme
theme_set(new = theme_custom())
###
ggplot(df3) +
geom_bump(aes(x = year, y = position, color = product)) +
geom_point(aes(x = year, y = position, color = product),
size = 6
) +
scale_color_brewer(palette = "RdBu") + # Change Colour Scale
# Same as before up to here
# Add the labels at start and finish
geom_text(
data = df3 %>% filter(year == min(year)),
aes(
x = year - 0.1, label = product,
y = position
),
size = 5, hjust = 1
) +
geom_text(
data = df3 %>% filter(year == max(year)),
aes(
x = year + 0.1, label = product,
y = position
),
size = 5, hjust = 0
) +
xlab("Year") +
ylab("Rank") +
scale_x_discrete(limits = c(2019:2021)) +
theme(legend.position = "")
- Bump charts are good for depicting Ranks/Scores pertaining to a set of data, as they vary over another aspect, for a set of products
- Cannot have too many levels in the aspect parameter, else the graph gets too hard to make sense with.
- For instance if we had 10 years in the data above, we would have lost the plot, literally! Perhaps better to use a Sankey in that case!!
Radar Charts
What if your marketing folks had rated some products along several different desirable criteria? Such data, where a certain set of items (Qualitative!!) are rated (Quantitative!) against another set (Qualitative again!!) can be plotted on a roughly circular set of axes, with the radial distance defining the rank against each axes. Such a plot is called a radar plot.
Of course, we will use the aptly named ggradar
, which is at this time (Feb 2023) a development version and not yet part of CRAN. We will still try it, and another package ggiraphExtra
which IS a part of CRAN (and has some other capabilities too, which are worth exploring!)
# library(ggradar)
set.seed(4)
df4 <- tibble(
Product = c("G1", "G2", "G3"),
Power = runif(3),
Cost = runif(3),
Harmony = runif(3),
Style = runif(3),
Size = runif(3),
Manufacturability = runif(3),
Durability = runif(3),
Universality = runif(3)
)
df4
ggradar
ggradar::ggradar(
plot.data = df4,
axis.label.size = 3, # Titles of Params
grid.label.size = 4, # Score Values/Circles
group.point.size = 3, # Product Points Sizes
group.line.width = 1, # Product Line Widths
fill = TRUE, # fill the radar polygons
fill.alpha = 0.3, # Not too dark, Arvind
legend.title = "Product"
)
Using ggiraphExtra
From the ggiraphExtra
website:
Package
ggiraphExtra
contains many useful functions for exploratory plots. These functions are made by both ‘ggplot2’ and ‘ggiraph’ packages. You can make a static ggplot or an interactive ggplot by setting the parameter interactive=TRUE.
# library(ggiraphExtra)
ggiraphExtra::ggRadar(
data = df4,
aes(colour = Product),
interactive = FALSE, # try TRUE
rescale = FALSE,
title = "Using ggiraphExtra"
) + # recale = TRUE makes it look different...try!!
theme_minimal()
- Differences in scores for a given item across several aspect or parameters are readily apparent.
- These can also be compared, parameter for parameter, with more than one item
- the same set of data at two different aspects is very quickly apparent
- Data is clearly in wide form
- Both
ggradar
andggiraphExtra
render very similar-looking radar charts and the syntax is not too intimidating!!
Wait, But Why?
- Bump Charts can show changes in Rating and Ranking over time, or some other Qual variable too!
- Lollipop Charts are useful in comparing multiple say products or services, with only one aspect for comparison, or which defines the rank
- Radar Charts are also useful in comparing multiple say products or services, but against several aspects or parameters for simultaneous comparisons.
Conclusion
- These are easy and simple charts to use and are easily understood too
- Bear in mind the data structure requirements for different charts/packages: Wide vs Long.
Your Turn
- Take the
HELPrct
dataset from our well usedmosaicData
package. Plot ranking charts using each of the public health issues that you can see in that dataset. What choice will you make for the the axes? - Try the
SaratogaHouses
dataset also frommosaicData
.
References
Highcharts Blog. Why you need to start using dumbbell charts
https://github.com/hrbrmstr/ggalt#lollipop-chartsSee this use of Radar Charts in Education. Choose the country/countries of choice and plot their ranks on various educational parameters in a radar chart. https://gpseducation.oecd.org/Home
R Package Citations
Citation
@online{v.2023,
author = {V., Arvind},
title = {\textless Iconify-Icon Icon=“ph:ranking-Bold” Width=“1.2em”
Height=“1.2em”\textgreater\textless/Iconify-Icon\textgreater{}
{Ratings} and {Rankings}},
date = {2023-02-10},
url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/80-Ranking/},
langid = {en},
abstract = {Comparisons between observations and between variables}
}