Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors

Surveys

Extra Cheese with my 5-insect burger, please!

Published: April 30, 2024

Modified: August 16, 2024

What graphs will we see today?

| Variable #1 | Variable #2 | Chart Names | Chart Shape |
|---|---|---|---|
| Qual | Qual | Likert Plots | Bipolar Scale icon by Aenne Brielmann from Noun Project (CC BY 3.0) |

What kind of Data Variables will we choose?

| No | Pronoun | Answer | Variable/Scale | Example | What Operations? |
|---|---|---|---|---|---|
| 3 | How, What Kind, What Sort | A Manner / Method, Type or Attribute from a list, with list items in some "order" (e.g. good, better, improved, best...) | Qualitative/Ordinal | Socioeconomic status (Low income, Middle income, High income); Education level (High School, BS, MS, PhD); Satisfaction rating (Very much Dislike, Dislike, Neutral, Like, Very Much Like) | Median, Percentile |
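A minimal R sketch of the Median and Percentile operations on ordinal data. The nine ratings are hypothetical, and the choice of `quantile(type = 1)` is our assumption: it always returns an observed rank, so each result maps back onto a real category rather than an average of two categories.

```r
# Ordered levels of the satisfaction scale from the table above
sat_levels <- c("Very much Dislike", "Dislike", "Neutral", "Like", "Very Much Like")

# Hypothetical responses from nine people
ratings <- factor(c("Like", "Neutral", "Very Much Like", "Dislike", "Like",
                    "Like", "Neutral", "Very much Dislike", "Like"),
                  levels = sat_levels, ordered = TRUE)

# Work with the ranks 1..5; type = 1 quantiles always return an observed rank,
# so each result can be mapped back onto a category label
ranks       <- as.integer(ratings)
median_rank <- quantile(ranks, probs = 0.5, type = 1, names = FALSE)
pct_ranks   <- quantile(ranks, probs = c(0.25, 0.75), type = 1, names = FALSE)

sat_levels[median_rank]   # median category
sat_levels[pct_ranks]     # 25th and 75th percentile categories
```

A mean, by contrast, would treat the gaps between categories as equal, which an ordinal scale does not promise.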

How do these Chart(s) Work?

In many design projects, we run surveys of, say, a target audience to collect Likert Scale data, where several respondents rate a product or a service on a scale such as Very much like, Somewhat like, Neutral, Dislike, and Very much dislike.

Some examples of Likert Scales are shown below.

Figure 1: Likert Scale Questionnaire Samples

As seen, Likert-scale questions can be used for a variety of aspects in our survey instruments.
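To see how a diverging (bipolar) Likert bar is laid out, here is a minimal R sketch for a single survey item on the five-level scale described above. The percentages are hypothetical, and centring the Neutral segment on zero is one common convention, not the only one.

```r
# Hypothetical percentages of respondents for one item (they sum to 100),
# ordered from the most negative to the most positive level
pct <- c("Very much dislike" = 10, "Dislike" = 15, "Neutral" = 20,
         "Somewhat like" = 35, "Very much like" = 20)

# Shift the stacked bar left so that the negative levels lie left of zero,
# the positive levels lie right of zero, and Neutral straddles zero
left_edge <- -(pct[["Very much dislike"]] + pct[["Dislike"]] + pct[["Neutral"]] / 2)

ends   <- left_edge + cumsum(pct)   # right edge of each segment
starts <- ends - pct                # left edge of each segment

segments <- data.frame(level = names(pct),
                       start = unname(starts),
                       end   = unname(ends))
segments
```

A plotting tool then just draws one rectangle per level, from `start` to `end`. With an even-numbered scale there is no neutral level to split.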

Note: Variable Labels and Value Labels

A variable label is a human-readable description of a variable. R supports rather long variable names, which can even contain spaces and punctuation, but short variable names make coding easier. A variable label can then give a nice, long description of the variable, so that it is easier to remember what the short name refers to.
Value labels are similar to variable labels, but they describe the values a variable can take. Labelling values means we don’t have to remember whether 1 = Extremely poor and 7 = Excellent, or vice versa. We can easily get a description of the dataset and a summary of its variables with the info function.
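A minimal sketch of how both kinds of label can live on an R vector as attributes. The vector is built by hand, and it assumes the attribute names `label` and `labels`, which match the `str()` output of the efc coping items later on this page. The ten values are the first ten responses to c82cop1 shown there.

```r
# Ten coded responses (1 = never, 2 = sometimes, 3 = often, 4 = always),
# hand-built to mimic the labelled c82cop1 variable in the efc data
cop1 <- c(3, 3, 2, 4, 3, 2, 4, 3, 3, 3)
attr(cop1, "label")  <- "do you feel you cope well as caregiver?"        # variable label
attr(cop1, "labels") <- c(never = 1, sometimes = 2, often = 3, always = 4)  # value labels

# The variable label usually holds the survey question itself
attr(cop1, "label")

# The value labels let us turn the codes into an ordered factor with readable levels
value_labels <- attr(cop1, "labels")
cop1_f <- factor(cop1, levels = value_labels, labels = names(value_labels),
                 ordered = TRUE)
table(cop1_f)
```

Converting to an ordered factor keeps the ranking of the responses while showing the words instead of the codes.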

Plotting Likert Charts

Using Orange: The description of the Orange widget for mosaic charts is here. Let us take a very sadly famous dataset (no, not iris again 🙀), the titanic dataset, and examine it in Orange.

Using RAWGraphs: Not a mosaic plot, but a Matrix Plot. Download this RAWGraphs workflow file, import it there, and see.

Using Datawrapper: Datawrapper does not seem to have a mosaic diagram capability.

Dataset: CareGivers

Here is another example of Likert data from the healthcare industry.

efc is a German dataset from a European study, the EUROFAM study, on family care of older people. Following a common protocol, data were collected from national samples of approximately 1,000 family carers (i.e. caregivers) per country and clustered into comparable subgroups to facilitate cross-national analysis. The research questions in the EUROFAM study were:

  1. To what extent do family carers of older people use support services or receive financial allowances across Europe? What kind of supports and allowances do they mainly use?

  2. What are the main difficulties carers experience accessing the services used? What prevents carers from accessing unused supports that they need? What causes them to stop using still-needed services?

  3. In order to improve support provision, what can be understood about the service characteristics considered crucial by carers, and how far are these needs met? and,

  4. Which channels or actors can provide the greatest help in underpinning future policy efforts to improve access to services/supports?

We will select the variables in the efc dataset that relate to coping (on the part of caregivers), and inspect them before plotting their responses:

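```{r}
#| label: efc_data
#| layout-nrow: 2
#| column: body-outset-right
library(dplyr)  # provides %>% and select()

# Load the efc dataset shipped with the sjPlot package
data(efc, package = "sjPlot")

# First 20 rows of the coping-related ("cop") variables
efc %>%
  select(dplyr::contains("cop")) %>%
  head(20)

# Structure of the same variables, including their label attributes
efc %>%
  select(dplyr::contains("cop")) %>%
  str()
```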
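First 10 of the 20 rows returned by head(20) (all columns are <dbl>):

|    | c82cop1 | c83cop2 | c84cop3 | c85cop4 | c86cop5 | c87cop6 | c88cop7 | c89cop8 | c90cop9 |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 3 | 2 | 2 | 2 | 1 | 1 | 2 | 3 | 3 |
| 2 | 3 | 3 | 3 | 3 | 4 | 1 | 3 | 2 | 2 |
| 3 | 2 | 2 | 1 | 4 | 1 | 1 | 1 | 4 | 3 |
| 4 | 4 | 1 | 3 | 1 | 1 | 1 | 1 | 2 | 4 |
| 5 | 3 | 2 | 1 | 2 | 2 | 2 | 1 | 4 | 4 |
| 6 | 2 | 2 | 3 | 3 | 3 | 2 | 2 | 1 | 1 |
| 7 | 4 | 2 | 4 | 1 | 1 | 2 | 4 | 1 | 4 |
| 8 | 3 | 2 | 2 | 1 | 1 | 1 | 2 | 3 | 3 |
| 9 | 3 | 2 | 3 | 2 | 2 | 1 | 3 | 1 | 3 |
| 10 | 3 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 3 |
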
'data.frame':   908 obs. of  9 variables:
 $ c82cop1: num  3 3 2 4 3 2 4 3 3 3 ...
  ..- attr(*, "label")= chr "do you feel you cope well as caregiver?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ c83cop2: num  2 3 2 1 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "do you find caregiving too demanding?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c84cop3: num  2 3 1 3 1 3 4 2 3 1 ...
  ..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your friends?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c85cop4: num  2 3 4 1 2 3 1 1 2 2 ...
  ..- attr(*, "label")= chr "does caregiving have negative effect on your physical health?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c86cop5: num  1 4 1 1 2 3 1 1 2 1 ...
  ..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your family?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c87cop6: num  1 1 1 1 2 2 2 1 1 1 ...
  ..- attr(*, "label")= chr "does caregiving cause financial difficulties?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c88cop7: num  2 3 1 1 1 2 4 2 3 1 ...
  ..- attr(*, "label")= chr "do you feel trapped in your role as caregiver?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c89cop8: num  3 2 4 2 4 1 1 3 1 1 ...
  ..- attr(*, "label")= chr "do you feel supported by friends/neighbours?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ c90cop9: num  3 2 3 4 4 1 4 3 3 3 ...
  ..- attr(*, "label")= chr "do you feel caregiving worthwhile?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"

The coping-related variables record responses on a four-point Likert scale (1, 2, 3, 4), corresponding to (never, sometimes, often, always), and each variable also carries a label describing it. These labels are actually (and perhaps usually) the questions in the survey.
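Before plotting, a Likert chart needs the share of respondents choosing each level of each item. Here is a minimal base-R sketch that uses only the first ten responses to three of the coping items, copied from the `str()` output above. The percentages are therefore illustrative, not full-sample figures.

```r
# First ten responses to three coping items (1 = never ... 4 = always),
# copied from the str() output above
cop <- data.frame(
  c82cop1 = c(3, 3, 2, 4, 3, 2, 4, 3, 3, 3),  # cope well as caregiver?
  c83cop2 = c(2, 3, 2, 1, 2, 2, 2, 2, 2, 2),  # caregiving too demanding?
  c90cop9 = c(3, 2, 3, 4, 4, 1, 4, 3, 3, 3)   # caregiving worthwhile?
)
likert_levels <- c("never", "sometimes", "often", "always")

# Percentage choosing each level, one column per item;
# factor() with explicit levels keeps levels nobody chose (count 0)
likert_pct <- sapply(cop, function(item) {
  counts <- table(factor(item, levels = 1:4, labels = likert_levels))
  100 * counts / sum(counts)
})
likert_pct

# Split each item into its lower and upper halves for a diverging bar
negative <- colSums(likert_pct[c("never", "sometimes"), ])
positive <- colSums(likert_pct[c("often", "always"), ])
```

Because this scale has four levels, there is no neutral category: every response falls on one side of the centre line.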

Examine the Data

Figure 2: (a) Titanic Data Table; (b) Titanic Data Table

Data Dictionary

Note: Quantitative Data

None.

Note: Qualitative Data
  • survived: (chr) yes or no
  • status: (chr) Class of Travel, else “crew”
  • age: (chr) Adult, Child
  • sex: (chr) Male / Female

Research Questions

Note: Q.1. How does survived depend upon sex?
Figure 3: Titanic Mosaic Chart

Note the huge imbalance in survived with sex: men clearly perished in larger numbers than women. This is why the colouring by Pearson residuals shows large positive residuals for men who died, and large negative residuals for women who died.

Note: Q.2. How does survived depend upon status?
Figure 4: Titanic Mosaic Chart

The crew saw deaths in large numbers, as shown by the large negative residual for crew survivals. First-class passengers had speedier access to the boats and survived in larger proportions than, say, second- or third-class passengers; hence the large positive residual for first-class survivals.

What is the Story Here?

In Figure 4, we have plotted sex vs status, coloured by whether each (subset of) people survived or not (Red is YES, Blue is NO!). As can be seen, the tile areas are very dissimilar across both variables. More deaths occurred among the crew than among the passengers, and first-class passengers survived more than third-class passengers. And of course, more men died than women.

So we can state that:

  • Status and Survived are not independent (they are associated)
  • Sex and Survived are not independent (they are associated)
  • Does ticking the Compare with Total box in Orange help to arrive at this inference? How so?

It remains to figure out just how strong this association is.

Important: Actual and “Expected” Counts

In a mosaic chart, the area of each tile is proportional to the observed (actual) count for that combination of categories.

It is also possible to compute a per-cell expected count, if the categorical variables are assumed to be independent, that is, not associated. This is the NULL Hypothesis. The test for whether they are independent or not, like any inferential test, is based on comparing the observed counts with these expected counts under the null hypothesis. So, what might the expected frequency of cell (i, j) in a cross-tabulation table be, given no relationship between the variables of interest?

Represent the sum of row i by $n_{i+}$, the sum of column j by $n_{+j}$, and the grand total of all the observations by $n$. Independence of the variables means that their joint probability is the product of their individual (marginal) probabilities, $P(i, j) = \frac{n_{i+}}{n} \times \frac{n_{+j}}{n}$. Multiplying this probability by $n$, the Expected Cell Frequency/Count is:

$$
\text{Expected Count } e_{i,j} = \frac{\text{rowSum} \times \text{colSum}}{n} = \frac{n_{i+} \, n_{+j}}{n}
$$

The comparison of what occurred with what is expected is based on their difference, scaled by the square root of the expected count; this is the Pearson Residual:

$$
r_{i,j} = \frac{\text{Actual} - \text{Expected}}{\sqrt{\text{Expected}}} = \frac{o_{i,j} - e_{i,j}}{\sqrt{e_{i,j}}}
$$

The sum of all the squared Pearson residuals is the chi-square statistic, $\chi^2 = \sum_{i,j} r_{i,j}^2$, upon which the inferential analysis follows.
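The expected counts, Pearson residuals and χ² statistic can be computed by hand in a few lines of R and cross-checked against `chisq.test()`. This minimal sketch uses R's built-in `Titanic` table collapsed to Sex × Survived; we assume its counts match the titanic data used in Orange. `correct = FALSE` turns off the Yates continuity correction that `chisq.test()` otherwise applies to 2 × 2 tables, so that its statistic equals the plain sum of squared residuals.

```r
# Sex x Survived counts, collapsed from R's built-in 4-way Titanic table
# (dimensions: 1 = Class, 2 = Sex, 3 = Age, 4 = Survived)
tab <- margin.table(Titanic, margin = c(2, 4))
n <- sum(tab)

# Expected counts under independence: e_ij = (row total i) * (column total j) / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (tab - expected) / sqrt(expected)

# Chi-square statistic: the sum of the squared Pearson residuals
chi_sq <- sum(pearson^2)
round(pearson, 2)
chi_sq

# Cross-check with base R, without the continuity correction
check <- chisq.test(tab, correct = FALSE)
all.equal(unname(expected), unname(check$expected))
all.equal(unname(unclass(pearson)), unname(unclass(check$residuals)))
all.equal(chi_sq, unname(check$statistic))
```

The residual for Male/No comes out positive and the one for Female/No negative, which is the pattern the mosaic colouring shows.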

Note: χ² For the Cat-egorically Curious

For the intrepid and insatiably curious, there is an intuitive explanation, and some hand-calculations and a walk-through of the Contingency table and the χ²-test here.

Dataset: Who Does the Housework?

Let us take this dataset on household tasks and who does them. Download this dataset and import it into your Mosaic Chart workflow.

Examine the Data

Figure 5: Household Tasks Distribution Raw Data

Data Dictionary

Note: Quantitative Data
  • Freq: (int) Number of times a task was carried out (in different ways)
Note: Qualitative Data
  • Who: (chr) Who carried out the task?
  • Task: (chr) Task? Which task? Can’t you see I’m tired?
Figure 6: Household Tasks Distribution Raw Data

This data looks fine, all right, but the mosaic plot looks bewildering and is, of course, wrong. The reason is that the basic HouseTasks.csv data is pre-aggregated: we already have a neat column of counts in Freq. And why is this a problem? Orange expects purely categorical data and does its own counting, so it cannot sensibly use this Freq column. Orange simply counts rows for each combination of categories, and since each Who × Task combination appears exactly once in this file, every tile gets the same count.

Note: Stat Figures and Stats

Most, if not all, statistical graphs do some internal computation. For instance, a bar chart counts the entries for each level of a Qual variable; a histogram both bins a Quant variable and counts the entries in each bin. This is a good thing, people, but it does mean that the data needs to be in a specific format before we use it for plots.
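To see this internal computation explicitly, here is a small R sketch using the built-in `mtcars` data (our choice of example, not a dataset from this page). `table()` does the counting behind a bar chart, and `hist(..., plot = FALSE)` returns the binning and counting behind a histogram without drawing anything.

```r
# Qual variable: number of cylinders (only a few distinct values)
cyl_counts <- table(mtcars$cyl)   # the counts a bar chart would draw
cyl_counts

# Quant variable: miles per gallon, binned into 5-mpg-wide intervals
mpg_hist <- hist(mtcars$mpg, breaks = seq(10, 35, by = 5), plot = FALSE)
mpg_hist$breaks                   # bin edges
mpg_hist$counts                   # the counts a histogram would draw
```

Both functions expect one row per observation; a column of pre-computed counts would be ignored.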

So now what? We need to (wait for it):

  • uncount the data 🙀
  • Take each combination of the Quals Who and Task
  • Repeat (i.e. copy-paste) that combo line as many times as the value in Freq
  • (optionally) Delete the Freq column, or at least stop using it further

All this is (to the best of my knowledge) not possible in any of the trifling tools we are using here, but it can be done in a jiffy in R or Python. Didn’t I tell you coding was far, far, far, far simpler? Peasants.
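A minimal base-R sketch of the uncounting steps above. The HouseTasks.csv counts are not reproduced on this page, so the small aggregated data frame below is hypothetical. If the tidyr package is installed, `tidyr::uncount()` does the same job in one call.

```r
# Hypothetical pre-aggregated data, shaped like HouseTasks.csv:
# one row per Who x Task combination, plus a Freq column of counts
agg <- data.frame(
  Who  = c("Wife", "Husband", "Wife", "Husband"),
  Task = c("Laundry", "Laundry", "Repairs", "Repairs"),
  Freq = c(5, 1, 1, 4)
)

# Uncount: repeat each row Freq times, keeping only the Qual columns
long <- agg[rep(seq_len(nrow(agg)), times = agg$Freq), c("Who", "Task")]
rownames(long) <- NULL

# Equivalent with tidyr, if installed: tidyr::uncount(agg, Freq)

nrow(long)                   # one row per individual occurrence
table(long$Who, long$Task)   # counting again recovers the Freq values
```

The long table is exactly what a counting tool like Orange expects: it can now do its own counting and get the right tile sizes.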

Research Questions

Note: Q.1. Is there a correlation between Who carries out the task and the Task itself?
Figure 7: Household Tasks Mosaic

What is the Story Here?

Your Turn

  1. Clothing and Intelligence Rating of Children!! Are well-dressed children actually smarter? Is it the exact reverse with SMI faculty?
  2. Pre-marital Sex and Divorce

Wait, But Why?

Dataset: Edible Insects

  1. GBIF.org (26 April 2024). GBIF Occurrence Download. https://doi.org/10.15468/dl.texc32

  2. Shelomi (2022). Dataset for: Factors Affecting Willingness and Future Intention to Eat Insects in Students of an Edible Insect Course [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7379294

References

  1. Piping Hot Data: Leveraging Labelled Data in R, https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/


License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.