Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors

Coffee Flavours

Coffee with Hansel and Gretel

Setting up R Packages

```{r}
library(tidyverse)
library(mosaic)
library(skimr)
library(ggformula)
library(ggbump)
```

```{r}
extrafont::loadfonts(quiet = TRUE)
font <- "Roboto Condensed"
theme_set(new = theme_classic(base_size = 14))

theme_update(
  panel.grid.minor = element_blank(),
  text = element_text(family = font),
  # text elements
  plot.title = element_text( # title
    family = font, # font family
    size = 20, # font size
    face = "bold", # bold typeface
    hjust = 0, # left align
    # vjust = 2                # raise slightly
    margin = margin(0, 0, 10, 0)
  ),
  plot.subtitle = element_text( # subtitle
    family = font, # font family
    size = 14, # font size
    hjust = 0, # left align
    margin = margin(2, 0, 5, 0)
  ),
  plot.caption = element_text( # caption
    family = font, # font family
    size = 8, # font size
    hjust = 1 # right align
  ),
  axis.title = element_text( # axis titles
    family = font, # font family
    size = 10 # font size
  ),
  axis.text = element_text( # axis text
    family = font, # font family
    size = 8 # font size
  )
)
```

Introduction

This dataset contains scores for various types of coffee on parameters such as aroma, flavour, and aftertaste.

Note: Breadcrumbs

Since this data needs some interesting pre-processing, and there are some choices to be made as well, I will leave some breadcrumbs and some intermediate results for you to look at, so that you can figure out the analysis/EDA path you might take! Once you have gained a measure of confidence, you can vary these at will!

Read the Data

Rows: 1,339
Columns: 43
$ total_cup_points      <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
$ species               <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
$ owner                 <chr> "metad plc", "metad plc", "grounds for health ad…
$ country_of_origin     <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
$ farm_name             <chr> "metad plc", "metad plc", "san marcos barrancas …
$ lot_number            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mill                  <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
$ ico_number            <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
$ company               <chr> "metad agricultural developmet plc", "metad agri…
$ altitude              <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
$ region                <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
$ producer              <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
$ number_of_bags        <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
$ bag_weight            <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
$ in_country_partner    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ harvest_year          <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
$ grading_date          <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
$ owner_1               <chr> "metad plc", "metad plc", "Grounds for Health Ad…
$ variety               <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
$ processing_method     <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
$ aroma                 <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
$ flavor                <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
$ aftertaste            <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
$ acidity               <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
$ body                  <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
$ balance               <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
$ uniformity            <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ clean_cup             <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
$ sweetness             <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ cupper_points         <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
$ moisture              <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
$ category_one_defects  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ quakers               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color                 <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
$ category_two_defects  <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
$ expiration            <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
$ certification_body    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
$ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
$ unit_of_measurement   <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
$ altitude_low_meters   <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
$ altitude_high_meters  <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
$ altitude_mean_meters  <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …

Inspect, Clean the Data

What are the non-numeric, or Qualitative variables here?

| name | class | levels | n | missing | distribution |
|---|---|---|---|---|---|
| species | character | 2 | 1339 | 0 | Arabica (97.9%), Robusta (2.1%) |
| owner | character | 315 | 1332 | 7 | juan luis alvarado romero (11.6%) ... |
| country_of_origin | character | 36 | 1338 | 1 | Mexico (17.6%), Colombia (13.7%) ... |
| farm_name | character | 571 | 980 | 359 | various (4.8%), rio verde (2.3%) ... |
| lot_number | character | 227 | 276 | 1063 | 1 (6.5%), 020/17 (2.2%) ... |
| mill | character | 460 | 1024 | 315 | beneficio ixchel (8.8%) ... |
| ico_number | character | 847 | 1188 | 151 | 0 (6.5%), Taiwan (2.6%) ... |
| company | character | 281 | 1130 | 209 | unex guatemala, s.a. (7.6%) ... |
| altitude | character | 396 | 1113 | 226 | 1100 (3.9%), 1200 (3.8%) ... |
| region | character | 356 | 1280 | 59 | huila (8.8%), oriente (6.2%) ... |

(Rows 1-10 of 24 shown.)

Look at the number of levels in those Qual variables!! Some have too many and some have so few… Suppose we count the data on the basis of a few of them?

| processing_method | n |
|---|---|
| Natural / Dry | 258 |
| Other | 26 |
| Pulped natural / honey | 14 |
| Semi-washed / Semi-pulped | 56 |
| Washed / Wet | 815 |
| NA | 170 |

(6 rows.)

| country_of_origin | n |
|---|---|
| Brazil | 132 |
| Burundi | 2 |
| China | 16 |
| Colombia | 183 |
| Costa Rica | 51 |
| Cote d?Ivoire | 1 |
| Ecuador | 3 |
| El Salvador | 21 |
| Ethiopia | 44 |
| Guatemala | 181 |

(Rows 1-10 of 37 shown.)
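
Counts like the ones above can be produced in more than one way. The following is a minimal sketch of one option, using `dplyr::count()`. The data frame `coffee_toy` is a hypothetical, hand-typed stand-in (not the real dataset), so the numbers will not match the tables on this page.

```r
library(dplyr)

# Hypothetical toy data: a few hand-typed rows, NOT the real coffee dataset
coffee_toy <- tibble(
  country_of_origin = c("Ethiopia", "Ethiopia", "Guatemala", "Brazil", "Ethiopia"),
  processing_method = c("Washed / Wet", "Washed / Wet", NA, "Natural / Dry", "Natural / Dry")
)

# One row per level of the Qual variable, with the number of observations in n.
# Missing values (NA) are counted as their own group.
counts_by_method <- coffee_toy %>%
  count(processing_method)

counts_by_method
```

Swapping `processing_method` for another Qual variable, such as `country_of_origin`, gives the corresponding count table.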
Note: Breadcrumb 1

Why did I choose these Qual factors to count by?

Data Dictionary

Note: Quantitative Variables

Write in.

Note: Qualitative Variables

Write in.

Note: Observations

Write in.

Research Question

Note: Breadcrumb 2

Among the five values of country_of_origin with the highest average total_cup_points, how do the ranks of the average ratings vary across the other coffee parameters?

Why this somewhat long-winded question? Why all this average stuff??

Why did I choose country_of_origin? Are there any other options?

Analyse/Transform the Data

```{r}
#| label: data-preprocessing
#
# Write in your code here
# to prepare this data as shown below
# to generate the plot that follows
```
| country_of_origin | total_cup_points | aroma | flavor | aftertaste | acidity | body |
|---|---|---|---|---|---|---|
| Ethiopia | 90.58 | 8.67 | 8.83 | 8.67 | 8.75 | 8.50 |
| Ethiopia | 89.92 | 8.75 | 8.67 | 8.50 | 8.58 | 8.42 |
| Guatemala | 89.75 | 8.42 | 8.50 | 8.42 | 8.42 | 8.33 |
| Ethiopia | 89.00 | 8.17 | 8.58 | 8.42 | 8.42 | 8.50 |
| Ethiopia | 88.83 | 8.25 | 8.50 | 8.25 | 8.50 | 8.42 |
| Brazil | 88.83 | 8.58 | 8.42 | 8.42 | 8.50 | 8.25 |
| Peru | 88.75 | 8.42 | 8.50 | 8.33 | 8.50 | 8.25 |
| Ethiopia | 88.67 | 8.25 | 8.33 | 8.50 | 8.42 | 8.33 |
| Ethiopia | 88.42 | 8.67 | 8.67 | 8.58 | 8.42 | 8.33 |
| Ethiopia | 88.25 | 8.08 | 8.58 | 8.50 | 8.50 | 7.67 |

(Rows 1-10 of 1,339 shown.)
Note: Breadcrumb 3

We have too much coffee here! We need to compress this data!

What??? Why? How? Where???

| country_of_origin | total_cup_points_mean | aroma_mean | flavor_mean | aftertaste_mean | acidity_mean | body_mean |
|---|---|---|---|---|---|---|
| Papua New Guinea | 85.75000 | 8.330000 | 8.420000 | 7.830000 | 8.330000 | 8.000000 |
| Ethiopia | 85.48409 | 7.896364 | 8.009091 | 7.893864 | 8.043636 | 7.924091 |
| Japan | 84.67000 | 7.750000 | 7.750000 | 7.750000 | 7.420000 | 8.080000 |
| United States | 84.43300 | 7.834000 | 7.992000 | 7.850000 | 7.935000 | 7.842000 |
| Kenya | 84.30960 | 7.786800 | 7.782400 | 7.713200 | 7.866000 | 7.726400 |

(5 rows.)
Note: Breadcrumb 4

Where did all that coffee go??? Why are there only 5 rows in the data? Why do the names of the columns take on a surname, `_mean`??
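
As a hint for this breadcrumb, here is a minimal sketch of one way to compress many rows into a few: group by a Qual variable, average the Quant variables, and keep only the groups with the highest average. The toy data frame `coffee_toy`, its made-up countries "A", "B", "C", and the choice of `n = 2` are all assumptions for illustration. The real analysis may well have been done differently.

```r
library(dplyr)

# Hypothetical toy data (made-up countries and scores), NOT the real dataset
coffee_toy <- tibble(
  country_of_origin = c("A", "A", "B", "B", "C", "C"),
  total_cup_points  = c(90, 88, 85, 83, 80, 82),
  aroma             = c(8.5, 8.3, 8.0, 7.8, 7.5, 7.7)
)

top_means <- coffee_toy %>%
  group_by(country_of_origin) %>%
  # One row per country; .names appends the "_mean" surname to each column
  summarise(
    across(
      c(total_cup_points, aroma),
      ~ mean(.x, na.rm = TRUE),
      .names = "{.col}_mean"
    )
  ) %>%
  # Keep the 2 countries with the highest average total score
  slice_max(total_cup_points_mean, n = 2)

top_means
```

With the full dataset, the same pattern with more columns inside `across()` and `n = 5` would give a table of the same shape as the one above.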

| country_of_origin | total_rank | aroma_rank | aftertaste_rank | body_rank | acidity_rank |
|---|---|---|---|---|---|
| Papua New Guinea | 1 | 1 | 3 | 2 | 1 |
| Ethiopia | 2 | 2 | 1 | 3 | 2 |
| Japan | 3 | 5 | 4 | 1 | 5 |
| United States | 4 | 3 | 2 | 4 | 3 |
| Kenya | 5 | 4 | 5 | 5 | 4 |

(5 rows.)
Note: Breadcrumb 5

What just happened? How did we convert those mean numbers to ranks?
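
One possible way to turn means into ranks is `dplyr::min_rank()` applied to each column in descending order, so that the highest mean gets rank 1. The sketch below types in some of the mean values from the five-row table above (rounded as displayed there). The name `coffee_means` is a hypothetical label, and this is only one of several ranking approaches that would produce such a table.

```r
library(dplyr)

# Mean values copied from the five-row table on this page
coffee_means <- tibble(
  country_of_origin     = c("Papua New Guinea", "Ethiopia", "Japan", "United States", "Kenya"),
  total_cup_points_mean = c(85.75000, 85.48409, 84.67000, 84.43300, 84.30960),
  aroma_mean            = c(8.330000, 7.896364, 7.750000, 7.834000, 7.786800),
  aftertaste_mean       = c(7.830000, 7.893864, 7.750000, 7.850000, 7.713200),
  body_mean             = c(8.000000, 7.924091, 8.080000, 7.842000, 7.726400),
  acidity_mean          = c(8.330000, 8.043636, 7.420000, 7.935000, 7.866000)
)

coffee_ranks <- coffee_means %>%
  mutate(
    # desc() flips the order so the largest mean is ranked 1
    total_rank      = min_rank(desc(total_cup_points_mean)),
    aroma_rank      = min_rank(desc(aroma_mean)),
    aftertaste_rank = min_rank(desc(aftertaste_mean)),
    body_rank       = min_rank(desc(body_mean)),
    acidity_rank    = min_rank(desc(acidity_mean))
  ) %>%
  select(country_of_origin, ends_with("_rank"))

coffee_ranks
```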

| country_of_origin | coffee_parameter | ranks |
|---|---|---|
| Papua New Guinea | acidity_rank | 1 |
| Ethiopia | acidity_rank | 2 |
| Japan | acidity_rank | 5 |
| United States | acidity_rank | 3 |
| Kenya | acidity_rank | 4 |
| Papua New Guinea | aftertaste_rank | 3 |
| Ethiopia | aftertaste_rank | 1 |
| Japan | aftertaste_rank | 4 |
| United States | aftertaste_rank | 2 |
| Kenya | aftertaste_rank | 5 |

(Rows 1-10 of 25 shown.)
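
A long table like this one, with one row per country and parameter, is the kind of shape that `tidyr::pivot_longer()` produces from a wide table. The sketch below starts from the rank values shown earlier on this page. The name `coffee_ranks` is a hypothetical label, and the final `arrange()` is a guess at how the rows came to be ordered by parameter.

```r
library(dplyr)
library(tidyr)

# Rank values copied from the wide rank table on this page
coffee_ranks <- tibble(
  country_of_origin = c("Papua New Guinea", "Ethiopia", "Japan", "United States", "Kenya"),
  total_rank        = c(1L, 2L, 3L, 4L, 5L),
  aroma_rank        = c(1L, 2L, 5L, 3L, 4L),
  aftertaste_rank   = c(3L, 1L, 4L, 2L, 5L),
  body_rank         = c(2L, 3L, 1L, 4L, 5L),
  acidity_rank      = c(1L, 2L, 5L, 3L, 4L)
)

coffee_long <- coffee_ranks %>%
  # Stack the five rank columns into name/value pairs: 5 countries x 5 parameters
  pivot_longer(
    cols      = ends_with("_rank"),
    names_to  = "coffee_parameter",
    values_to = "ranks"
  ) %>%
  # Assumed step: sort by parameter name, as the table above appears to be
  arrange(coffee_parameter)

coffee_long
```

The long format puts the parameter name in a single column, which makes it straightforward to map it to an aesthetic when plotting.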

Plot the Data

Discussion

Complete the Data Dictionary. Select and Transform the variables as shown. Create the graphs shown below and discuss the following questions:

  • Identify the type of charts
  • Identify the variables used for various geometrical aspects (x, y, fill…). Name the variables appropriately.
  • What research activity might have been carried out to obtain the data graphed here? Provide some details.
  • What might have been the Hypothesis/Research Question to which the chart was the response?
  • Write a 2-line story based on the chart, describing your inference/surprise.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.