No | Pronoun | Answer | Variable/Scale | Example | What Operations? |
---|---|---|---|---|---|
1 | How Many / Much / Heavy? Few? Seldom? Often? When? | Quantities, with Scale and a Zero Value. Differences and Ratios/Products are meaningful. | Quantitative/Ratio | Length, Height, Temperature in Kelvin, Activity, Dose Amount, Reaction Rate, Flow Rate, Concentration, Pulse, Survival Rate | Correlation |
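As a quick check of the claim that Ratios are meaningful for Temperature in Kelvin, here is a small worked example. It assumes the standard conversion K = °C + 273.15. The two temperatures are arbitrary illustrative values.

```python
def celsius_to_kelvin(t_celsius):
    """Shift a Celsius reading onto the Kelvin scale, which has a true zero."""
    return t_celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # illustrative readings in degrees Celsius

# Ratio on the Celsius scale: looks like "twice as hot", but 0 °C is an arbitrary zero.
naive_ratio = warm_c / cold_c

# Ratio on the Kelvin scale: meaningful, because 0 K is a true zero.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

# Differences are unchanged by the shift, so they agree on both scales.
kelvin_diff = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

print(f"Celsius ratio: {naive_ratio:.3f}")    # 2.000
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")   # 1.035
print(f"Difference:    {kelvin_diff:.1f} K")  # 10.0
```

Both scales give the same 10-degree difference, but only the Kelvin ratio is physically meaningful. That is why only Kelvin appears in the Ratio row.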
Rhythm
Ups and Downs, Rhymes and Reasons, Seasons and Rhythms
What graphs will we see today?
Variable #1 | Variable #2 | Chart Names | Chart Shape |
---|---|---|---|
Quant | Quant | Line Plot | |
What kind of Data Variables will we choose?
Inspiration
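A Ledecky, all drenched and dashing, is it? (A play on the Hindi film song "Ek Ladki Bheegi Bhaagi Si", "A Girl, Drenched and Running".)
Is this Ledecky, or a mermaid?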
In Figure 1, the black line is the average of the 50 best times at each distance since 2000. The top 200 times for each distance since 2000 are also plotted, with light orange lines each representing one swimmer.
Her races and her career essentially follow the same pattern: the more she swims, the more she separates from the field.
Her 1500 metres record is even further ahead of the field than her best 800 m time!! 😱
How do these Chart(s) Work?
Line Plots take two separate Quant variables as inputs. Each variable is mapped to a position, or coordinate: one to the X-axis and the other to the Y-axis. Each pair of observations from the two Quant variables (which would be in one row!) gives us a point. Up to here, everything is identical to the Scatter Plot.
In a Line Plot, however, the points are connected together. The point markers are sometimes thrown away altogether, leaving just the line.
Looking at the lines, we get a very function-al sense of the variation. Is it upward or downward? Is it linear or nonlinear? Is it periodic or seasonal? Line Charts can answer all these questions.
Line charts often have a time variable as one of their two variables. In such cases, the data is said to be a time series. We might deal with Time Series later.
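To make the "connect the points" step concrete, here is a minimal Python sketch. It assumes the points are joined in order of increasing X, as they would be for a time series. The function names and sample numbers are hypothetical. The code is not part of Orange or any charting library.

```python
def line_segments(xs, ys):
    """Pair two Quant variables row by row, sort by X, and join neighbours."""
    if len(xs) != len(ys):
        raise ValueError("each row needs one value for each variable")
    # One (x, y) point per row: up to here this is just a Scatter Plot.
    points = sorted(zip(xs, ys))
    # The Line Plot step: join every point to the next one along the X-axis.
    return list(zip(points, points[1:]))


def directions(segments):
    """Say whether each segment of the line goes up, down, or stays flat."""
    labels = []
    for (_, y0), (_, y1) in segments:
        if y1 > y0:
            labels.append("up")
        elif y1 < y0:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


# Hypothetical data: the X values are deliberately out of order.
years = [2003, 2001, 2002, 2004]
values = [5, 3, 7, 5]
segments = line_segments(years, values)
print(segments)
print(directions(segments))  # ['up', 'down', 'flat']
```

The `directions` helper is a crude version of the "upward or downward?" reading we do by eye on a Line Chart.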
Plotting a Scatter Plot
Although Orange has a Line Plot widget, it does not provide quite the kind of Line Chart that is considered standard in the data viz industry.
Nonetheless, let us at least look at this data in Orange before plotting it elsewhere. Yes, peasants.
Import this into Orange and see… (sigh)
Figure 2 shows that the dataset has 432 data points and 7 variables, with some missing data.
For now, the variables we need are:
- Year: (int) Year in which RIAA revenue was logged
- Value (For Charting): (int) Revenue in million USD
- Category: (chr) Form of the music released (CD, etc.)
We can ignore the rest for now, unless we plan to work more with this data and need to know more. The other numerical columns, which appear to show billions of USD, are not easily decipherable. They are an example of data that is not documented well…
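The selection step above, keeping three variables and dropping rows with missing values, can be sketched with Python's standard `csv` module. This is a minimal sketch that makes several assumptions. The column names are taken from the data dictionary, but the real file's header may differ. The `Extra` column stands in for the ignored variables. Every number is a made-up placeholder, not an RIAA figure. `select_columns` is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV excerpt. Column names follow the data dictionary above;
# "Extra" stands in for the variables we ignore; every number is a made-up
# placeholder, not an RIAA figure.
RAW = """Year,Category,Value (For Charting),Extra
1990,CD,100,x
1990,Cassette,80,y
1991,CD,,z
1991,Cassette,70,w
"""


def select_columns(text):
    """Keep Year, Category and Value; skip rows whose Value is missing."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = record["Value (For Charting)"]
        if not value:
            continue  # missing data: leave this row out of the chart
        rows.append({
            "Year": int(record["Year"]),
            "Category": record["Category"],
            "Value": int(value),  # revenue in million USD, per the dictionary
        })
    return rows


for row in select_columns(RAW):
    print(row)
```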
https://academy.datawrapper.de/article/23-how-to-create-a-line-chart
Here be dragons: DataWrapper wants the data in wide format, where each Format of music has its figures in a separate column! 🤦 And this is not a data transformation that we can achieve within DataWrapper. Bah.
We are probably better off plotting a regular scatter plot. Here too there seem to be limitations, because we are not able to colour the series by type of music Format.
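Outside DataWrapper, the long-to-wide pivot it asks for takes only a few lines of code. Below is a minimal Python sketch. It assumes the long data has Year, Category (the music Format) and Value columns, as in the data dictionary. The rows are made-up placeholders, not RIAA figures, and `long_to_wide` is a hypothetical helper. In pandas, `DataFrame.pivot(index="Year", columns="Category", values="Value")` does the same job.

```python
# Long form: one row per (Year, Category) pair. Placeholder numbers only.
long_rows = [
    {"Year": 1990, "Category": "CD", "Value": 100},
    {"Year": 1990, "Category": "Cassette", "Value": 80},
    {"Year": 1991, "Category": "CD", "Value": 120},
    {"Year": 1991, "Category": "Cassette", "Value": 70},
    {"Year": 1992, "Category": "CD", "Value": 150},
]


def long_to_wide(rows, index="Year", columns="Category", values="Value"):
    """One output row per Year, one column per Category (music Format)."""
    formats = sorted({r[columns] for r in rows})
    cells = {}
    for r in rows:
        cells.setdefault(r[index], {})[r[columns]] = r[values]
    wide = []
    for key in sorted(cells):
        row = {index: key}
        for fmt in formats:
            row[fmt] = cells[key].get(fmt)  # None marks a missing Year/Format pair
        wide.append(row)
    return wide


for row in long_to_wide(long_rows):
    print(row)
```

Each Format ends up in its own column. This is also the layout a chart needs to draw and colour one series per Format.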
What is the Story here?
- Over the years, different music formats have had their place in the sun.
- All physical forms are on the wane; streaming music is the current mode of music consumption.
The Shape of You Data
Never mind that silly song now.
As mentioned above, data can be in wide or long form. How does one picture this shape-shifting that seems to be needed now and then? Let's see.
Several tools such as DataWrapper (and others, yes, I admit, even with code, as we will see) need data transformed into a specific shape. We should now look at this idea of shape in data. Consider the two tables below. The first is in wide form, and the second holds the same data in long form:
Product | Power | Cost | Harmony | Style | Size | Manufacturability | Durability | Universality |
---|---|---|---|---|---|---|---|---|
G1 | 0.5858003 | 0.2773750 | 0.7244059 | 0.0731445 | 0.1000535 | 0.4551024 | 0.9622046 | 0.9966129 |
G2 | 0.0089458 | 0.8135742 | 0.9060922 | 0.7546750 | 0.9540688 | 0.9710557 | 0.7617024 | 0.5062709 |
G3 | 0.2937396 | 0.2604278 | 0.9490402 | 0.2860006 | 0.4156071 | 0.5839880 | 0.7145085 | 0.4899432 |
Product | Parameter | Rating |
---|---|---|
G1 | Power | 0.5858003 |
G1 | Cost | 0.2773750 |
G1 | Harmony | 0.7244059 |
G1 | Style | 0.0731445 |
G1 | Size | 0.1000535 |
G1 | Manufacturability | 0.4551024 |
G1 | Durability | 0.9622046 |
G1 | Universality | 0.9966129 |
G2 | Power | 0.0089458 |
G2 | Cost | 0.8135742 |
G2 | Harmony | 0.9060922 |
G2 | Style | 0.7546750 |
G2 | Size | 0.9540688 |
G2 | Manufacturability | 0.9710557 |
G2 | Durability | 0.7617024 |
G2 | Universality | 0.5062709 |
G3 | Power | 0.2937396 |
G3 | Cost | 0.2604278 |
G3 | Harmony | 0.9490402 |
G3 | Style | 0.2860006 |
G3 | Size | 0.4156071 |
G3 | Manufacturability | 0.5839880 |
G3 | Durability | 0.7145085 |
G3 | Universality | 0.4899432 |
What we have done is:
- converted all the variable names into a stacked column, Parameter
- put all the numbers into another column, Rating
- repeated the Product column values as many times as needed to cover all the Parameters (8 times).
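The three steps above are what pandas calls a "melt" (`DataFrame.melt`). Here is a dependency-free Python sketch of the same idea, using the G1 to G3 numbers from the wide table. The helper name `wide_to_long` is our own. It is not part of any library.

```python
# The wide table from above: one row per Product, one column per Parameter.
PARAMETERS = ["Power", "Cost", "Harmony", "Style", "Size",
              "Manufacturability", "Durability", "Universality"]
WIDE = {
    "G1": [0.5858003, 0.2773750, 0.7244059, 0.0731445,
           0.1000535, 0.4551024, 0.9622046, 0.9966129],
    "G2": [0.0089458, 0.8135742, 0.9060922, 0.7546750,
           0.9540688, 0.9710557, 0.7617024, 0.5062709],
    "G3": [0.2937396, 0.2604278, 0.9490402, 0.2860006,
           0.4156071, 0.5839880, 0.7145085, 0.4899432],
}


def wide_to_long(wide, parameters):
    """Stack the column names into Parameter and the numbers into Rating."""
    long_rows = []
    for product, ratings in wide.items():
        for parameter, rating in zip(parameters, ratings):
            # The Product value is repeated once per Parameter.
            long_rows.append((product, parameter, rating))
    return long_rows


long_rows = wide_to_long(WIDE, PARAMETERS)
print(len(long_rows))  # 24
print(long_rows[:2])
```

Three Products times eight Parameters gives the 24 rows of the long table.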
See the gif below to get an idea of how this transformation can be worked in both directions. (Yeah, never mind the code in it, too.)
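To check that the transformation really is reversible, here is a small round trip. It melts the data to long form, pivots it back to wide, and compares the result with the original. `melt` and `pivot` are hypothetical helper names chosen to echo pandas' `DataFrame.melt` and `DataFrame.pivot`. This is not the code shown in the gif. To keep the sketch short, only G1 and G2 with their first three Parameters are used.

```python
def melt(wide_rows, id_col):
    """Wide -> long: one {id, Parameter, Rating} row per non-id cell."""
    return [
        {id_col: row[id_col], "Parameter": col, "Rating": val}
        for row in wide_rows
        for col, val in row.items()
        if col != id_col
    ]


def pivot(long_rows, id_col):
    """Long -> wide: gather Ratings back into one row per id, one column per Parameter."""
    wide = {}
    for r in long_rows:
        wide.setdefault(r[id_col], {id_col: r[id_col]})[r["Parameter"]] = r["Rating"]
    return list(wide.values())


# Two rows of the Product table above (first three Parameters only, for brevity).
original = [
    {"Product": "G1", "Power": 0.5858003, "Cost": 0.2773750, "Harmony": 0.7244059},
    {"Product": "G2", "Power": 0.0089458, "Cost": 0.8135742, "Harmony": 0.9060922},
]

stacked = melt(original, "Product")
restored = pivot(stacked, "Product")
print(len(stacked))          # 6
print(restored == original)  # True
```

No information is lost in either direction. Every Rating keeps its Product and Parameter labels, which is what makes the reshape reversible.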
So how can we actually do this? It turns out that some nice people at UC San Diego have built an R-based app called Radiant for Business Analytics. It can do this pretty much point-and-click style, though it is nowhere near as much fun as Orange. Head over there:
https://vnijs.shinyapps.io/radiant
We upload our original data, pivot it, and download the pivoted data. Peasants. Now the pivoted wide-form data should work in DataWrapper.
Whatever.
Dataset: Cancer
Examine the Data
Data Dictionary
- Diagnosis: (text) (B)enign or (M)alignant
Research Questions
Q1.
Q2.
What is the Story Here?
Your Turn
Wait, But Why?
References
Chambliss, D. F. (1989). The Mundanity of Excellence: An Ethnographic Report on Stratification and Olympic Swimmers. Sociological Theory, 7(1), 70-86.
Nijs, V. (2023). radiant: Business Analytics using R and Shiny. R package version 1.6.0. https://github.com/radiant-rstats/radiant