Flow

Yes…On the River, In the River, With the River, By the River…

Qual Variables and Quant Variables
Author

Arvind V

Published

April 22, 2024

Modified

May 27, 2024

Abstract
Changes in Information over Space and Time

What graphs will we see today?

Variable #1 | Variable #2 | Chart Names | Chart Shape
Quant | None | Sankey Plot |

No | Pronoun | Answer | Variable/Scale | Example | What Operations?
2 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities with Scale. Differences are meaningful, but not products or ratios | Quantitative/Interval | pH, SAT score (200-800), Credit score (300-850), Year of Starting College | Mean, Standard Deviation
3 | How, What Kind, What Sort | A Manner / Method, Type or Attribute from a list, with list items in some "order" (e.g. good, better, improved, best...) | Qualitative/Ordinal | Socioeconomic status (Low income, Middle income, High income), Education level (High School, BS, MS, PhD), Satisfaction rating (Very Much Dislike, Dislike, Neutral, Like, Very Much Like) | Median, Percentile

How do these Chart(s) Work?

Sometimes Qual data can itself vary over, or depend upon, a set of independent Qual categories. For instance, we can consider enrollment at a University and show how students move from course to course. Or how customers drift from one category of products or brands to another... or the movement of cricket players from one IPL team to another!

As can be surmised, the independent categories can be interpreted both as time (e.g. semesters / cycles / years) and as space (teams / courses / departments). And we can chart another Quant or Qual variable that moves across the levels of the first chosen Qual variable.

  • The Qualitative variables being connected are mapped to stages / axes;
  • Each level within a Qual variable is mapped to nodes / strata / lodes;
  • And the connections between the strata of the axes are called flows / links / alluvia.
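
The three mappings above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the axis names (semester_1, semester_2) and the course names are all made up, and a real plotting tool would do this bookkeeping for you.

```python
from collections import Counter

# Hypothetical records: one row per student, one column per stage (axis).
records = [
    {"semester_1": "Drawing", "semester_2": "Film"},
    {"semester_1": "Drawing", "semester_2": "Film"},
    {"semester_1": "Drawing", "semester_2": "Coding"},
    {"semester_1": "Coding", "semester_2": "Coding"},
    {"semester_1": "Coding", "semester_2": "Film"},
]
axes = ["semester_1", "semester_2"]  # the Qual variables, in display order

# Strata: how many observations sit at each level of each axis.
strata = {axis: Counter(row[axis] for row in records) for axis in axes}

# Flows: how many observations connect each pair of levels on the two axes.
# In a Sankey/Alluvial chart this count sets the thickness of the link.
flows = Counter((row[axes[0]], row[axes[1]]) for row in records)

for (source, target), count in sorted(flows.items()):
    print(f"{source} -> {target}: {count}")
```

Every observation lands in exactly one flow, so the flow counts add up to the number of records; that is why the pillars at each stage have the same total height.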

Such diagrams are best used when you want to show a many-to-many mapping between two domains, or multiple paths through a set of stages, e.g. students going through multiple courses during a semester of study.

Here is an example of a Sankey Diagram for the Titanic dataset:

Figure 1: Titanic Sankey Plot

It is seen from Figure 1 that the x-axis has the Qual variables (stages) shown as “pillars”, and each pillar is split into nodes based on the levels of that Qual variable. Flows of variable thickness connect a node at one stage to a node at another stage.

Sankey, Parallel sets, and Alluvial Charts

Here is what Thomas Lin Pedersen says:

A parallel sets diagram is a type of visualisation showing the interaction between multiple categorical variables.

If the variables have an intrinsic order the representation can be thought of as a Sankey Diagram.

If each variable is a point in time it will resemble an Alluvial diagram.

Creating Sankey Plots

Dataset: Course Allocations

Let us try one more dataset:

Examine the Data

Let us still import into Orange and see the data anyway!

Figure 4: Course Allocation Data

From Figure 4, we see that we have 300 students and their course allocations over one Foundation year at SMI. (The data is anonymized but accurate; no staff or students were harmed in the collection of this data!)

Data Dictionary

Quantitative Data
  • None!!
Qualitative Data
  • Major(chr): Student Major

  • All other columns: Courses they were allocated to during the course of the year.

  • MM = Mark Making; DM = Digital Making; FM = Form Making;

  • DTT = Digital Thinking Tools; VTT = Visual Thinking Tools;

  • O&C = Order and Chaos; P&I = Play and Invent; S&P = Space and Place; C&P = Communities and Practices; F&S = Form and Structure; B&C = Body and Context; L&L = Layers and Lenses; E&C = Everything’s Connected; P&P = Patterns and Paradigms (Oh all right)
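
A table like this one is in "wide" form: one row per student, one column per Qual variable. Some alluvial tools prefer a "long" form instead, with one row per student per axis. The sketch below shows that reshaping in plain Python. The slot column names (Slot1, Slot2, Slot3) and the two sample rows are assumptions for illustration; the real column names are the ones in the dataset.

```python
# Hypothetical rows in the wide layout described above: a Major column
# plus one column per allocation slot (the slot names are made up).
wide_rows = [
    {"Major": "Film", "Slot1": "MM", "Slot2": "B&C", "Slot3": "VTT"},
    {"Major": "HCD", "Slot1": "DM", "Slot2": "P&I", "Slot3": "DTT"},
]


def to_long(rows):
    """Return one record per (student, axis) cell of the wide table."""
    long_rows = []
    for student_id, row in enumerate(rows, start=1):
        for axis, stratum in row.items():
            long_rows.append(
                {"student": student_id, "axis": axis, "stratum": stratum}
            )
    return long_rows


long_rows = to_long(wide_rows)
for record in long_rows[:4]:
    print(record)
```

In the long form, each axis name becomes a stage on the x-axis and each stratum value becomes a node within that stage; the student id is what ties the rows of one flow together.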

Research Questions

Let’s try a few questions and see if they are answerable with Sankey/Alluvial Plots

Question

Q1. Do all DMA/Film/CAP students take one B&C course?

Figure 5

Question

Q2. Do all IADP, HCD, and PSD students take one P&I course?
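
Questions of this "do all students in group X take exactly one course Y" kind can also be cross-checked directly against the table, alongside reading the chart. Here is a minimal sketch, assuming each row has a Major column and that every other column holds a course code; the four sample rows and the slot names are invented, not taken from the real data.

```python
def takes_exactly_one(rows, majors, course):
    """For each row whose Major is in `majors`, report whether `course`
    appears exactly once among that row's allocations."""
    result = {}
    for index, row in enumerate(rows):
        if row["Major"] in majors:
            allocations = [v for k, v in row.items() if k != "Major"]
            result[index] = allocations.count(course) == 1
    return result


# Hypothetical sample rows (slot names and allocations are made up).
rows = [
    {"Major": "Film", "Slot1": "MM", "Slot2": "B&C", "Slot3": "VTT"},
    {"Major": "DMA", "Slot1": "DM", "Slot2": "B&C", "Slot3": "B&C"},
    {"Major": "CAP", "Slot1": "FM", "Slot2": "O&C", "Slot3": "DTT"},
    {"Major": "HCD", "Slot1": "DM", "Slot2": "P&I", "Slot3": "DTT"},
]

q1 = takes_exactly_one(rows, {"DMA", "Film", "CAP"}, "B&C")
q2 = takes_exactly_one(rows, {"IADP", "HCD", "PSD"}, "P&I")
print("Q1 holds for all:", all(q1.values()))
print("Q2 holds for all:", all(q2.values()))
```

On the chart, the same check amounts to asking whether every flow leaving the chosen Major nodes passes through the B&C (or P&I) node exactly once.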

What is the Story Here?

Write in!!

Your Turn

  1. The ggalluvial package in R includes two datasets, majors and vaccinations. Plot alluvial charts for both of these, and write their stories.

  2. Go to the American Life Panel website, where you will find many public datasets. Take one and make charts from it of the kinds we have learned in this Module.

  3. Try this from Vincent Arel-Bundock’s website: Cybersecurity breaches reported to the US Department of Health and Human Services. The dataset is downloadable as a CSV. NOTE: the data may require some cleaning beforehand in Excel!

Wait, But Why?

  • Mosaic Charts gave us a view of counts of data across level-combinations of Qual variables.
  • Sankey/Alluvial Charts give us a sense of flow: how did different observations flow from one Qual level to another?
  • This is very valuable if these Qual variables and their levels have a natural sequence, e.g. choices made in purchases, attitudes over time and situation, affiliations and friendships over time, etc.
  • The sequence may even be conceptualized as a consequence, provided you have adequate insight into the situations involved.
  • You get a sense of the sub-populations in each combo of Qual variables and can decide what to do about both plenitude and rarity!
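
The last point, spotting plenitude and rarity, boils down to counting level-combinations and comparing their shares. A small sketch, where the (Major, Course) pairs and the 15% cut-off are both assumptions chosen only for illustration:

```python
from collections import Counter

# Hypothetical (Major, Course) observations.
observations = (
    [("Film", "B&C")] * 3
    + [("Film", "P&I")] * 1
    + [("HCD", "P&I")] * 4
    + [("HCD", "B&C")] * 2
)

combo_counts = Counter(observations)
total = sum(combo_counts.values())

# Share of all observations that falls in each combination.
shares = {combo: count / total for combo, count in combo_counts.items()}

# Hypothetical cut-off: call a combination "rare" below 15% of observations.
threshold = 0.15
rare = sorted(combo for combo, share in shares.items() if share < threshold)
largest = combo_counts.most_common(1)[0]

print("Rare combinations:", rare)
print("Largest combination:", largest)
```

The thick flows in the chart are the combinations with large shares, and the hairline flows are the rare ones; the cut-off between them is a judgment call that depends on the situation.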

References

  1. A good pictorial introduction to different parts of a Sankey Chart. https://github.com/davidsjoberg/ggsankey

  2. Minard’s famous Alluvial Plot of Napoleon’s Invasion of Russia. https://www.andrewheiss.com/blog/2017/08/10/exploring-minards-1812-plot-with-ggplot2/

  3. Minard Revisited. https://www.masswerk.at/nowgobang/2020/minard-revisited

  4. 100+ years of the Titanic data. https://www.datavis.ca/papers/titanic/
