Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors
  1. Teaching
  2. Data Analytics for Managers and Creators
  3. Statistical Inference
  4. 🧭 Basics of Statistical Inference
  • Teaching
    • Data Analytics for Managers and Creators
      • Tools
        • Introduction to R and RStudio
        • Introduction to Radiant
        • Introduction to Orange
      • Descriptive Analytics
        • Data
        • Summaries
        • Counts
        • Quantities
        • Groups
        • Densities
        • Groups and Densities
        • Change
        • Proportions
        • Parts of a Whole
        • Evolution and Flow
        • Ratings and Rankings
        • Surveys
        • Time
        • Space
        • Networks
        • Experiments
        • Miscellaneous Graphing Tools, and References
      • Statistical Inference
        • 🧭 Basics of Statistical Inference
        • šŸŽ² Samples, Populations, Statistics and Inference
        • Basics of Randomization Tests
        • šŸƒ Inference for a Single Mean
        • šŸƒ Inference for Two Independent Means
        • šŸƒ Inference for Comparing Two Paired Means
        • Comparing Multiple Means with ANOVA
        • Inference for Correlation
        • šŸƒ Testing a Single Proportion
        • šŸƒ Inference Test for Two Proportions
      • Inferential Modelling
        • Modelling with Linear Regression
        • Modelling with Logistic Regression
        • šŸ•” Modelling and Predicting Time Series
      • Predictive Modelling
        • šŸ‰ Intro to Orange
        • ML - Regression
        • ML - Classification
        • ML - Clustering
      • Prescriptive Modelling
        • šŸ“ Intro to Linear Programming
        • šŸ’­ The Simplex Method - Intuitively
        • šŸ“… The Simplex Method - In Excel
      • Workflow
        • Facing the Abyss
        • I Publish, therefore I Am
      • Case Studies
        • Demo:Product Packaging and Elderly People
        • Ikea Furniture
        • Movie Profits
        • Gender at the Work Place
        • Heptathlon
        • School Scores
        • Children’s Games
        • Valentine’s Day Spending
        • Women Live Longer?
        • Hearing Loss in Children
        • California Transit Payments
        • Seaweed Nutrients
        • Coffee Flavours
        • Legionnaire’s Disease in the USA
        • Antarctic Sea ice
        • William Farr’s Observations on Cholera in London
    • R for Artists and Managers
      • šŸ•¶ Lab-1: Science, Human Experience, Experiments, and Data
      • Lab-2: Down the R-abbit Hole…
      • Lab-3: Drink Me!
      • Lab-4: I say what I mean and I mean what I say
      • Lab-5: Twas brillig, and the slithy toves…
      • Lab-6: These Roses have been Painted !!
      • Lab-7: The Lobster Quadrille
      • Lab-8: Did you ever see such a thing as a drawing of a muchness?
      • Lab-9: If you please sir…which way to the Secret Garden?
      • Lab-10: An Invitation from the Queen…to play Croquet
      • Lab-11: The Queen of Hearts, She Made some Tarts
      • Lab-12: Time is a Him!!
      • Iteration: Learning to purrr
      • Lab-13: Old Tortoise Taught Us
      • Lab-14: You’re are Nothing but a Pack of Cards!!
    • ML for Artists and Managers
      • šŸ‰ Intro to Orange
      • ML - Regression
      • ML - Classification
      • ML - Clustering
      • šŸ•” Modelling Time Series
    • TRIZ for Problem Solvers
      • I am Water
      • I am What I yam
      • Birds of Different Feathers
      • I Connect therefore I am
      • I Think, Therefore I am
      • The Art of Parallel Thinking
      • A Year of Metaphoric Thinking
      • TRIZ - Problems and Contradictions
      • TRIZ - The Unreasonable Effectiveness of Available Resources
      • TRIZ - The Ideal Final Result
      • TRIZ - A Contradictory Language
      • TRIZ - The Contradiction Matrix Workflow
      • TRIZ - The Laws of Evolution
      • TRIZ - Substance Field Analysis, and ARIZ
    • Math Models for Creative Coders
      • Maths Basics
        • Vectors
        • Matrix Algebra Whirlwind Tour
        • content/courses/MathModelsDesign/Modules/05-Maths/70-MultiDimensionGeometry/index.qmd
      • Tech
        • Tools and Installation
        • Adding Libraries to p5.js
        • Using Constructor Objects in p5.js
      • Geometry
        • Circles
        • Complex Numbers
        • Fractals
        • Affine Transformation Fractals
        • L-Systems
        • Kolams and Lusona
      • Media
        • Fourier Series
        • Additive Sound Synthesis
        • Making Noise Predictably
        • The Karplus-Strong Guitar Algorithm
      • AI
        • Working with Neural Nets
        • The Perceptron
        • The Multilayer Perceptron
        • MLPs and Backpropagation
        • Gradient Descent
      • Projects
        • Projects
    • Data Science with No Code
      • Data
      • Orange
      • Summaries
      • Counts
      • Quantity
      • šŸ•¶ Happy Data are all Alike
      • Groups
      • Change
      • Rhythm
      • Proportions
      • Flow
      • Structure
      • Ranking
      • Space
      • Time
      • Networks
      • Surveys
      • Experiments
    • Tech for Creative Education
      • 🧭 Using Idyll
      • 🧭 Using Apparatus
      • 🧭 Using g9.js
    • Literary Jukebox: In Short, the World
      • Italy - Dino Buzzati
      • France - Guy de Maupassant
      • Japan - Hisaye Yamamoto
      • Peru - Ventura Garcia Calderon
      • Russia - Maxim Gorky
      • Egypt - Alifa Rifaat
      • Brazil - Clarice Lispector
      • England - V S Pritchett
      • Russia - Ivan Bunin
      • Czechia - Milan Kundera
      • Sweden - Lars Gustaffsson
      • Canada - John Cheever
      • Ireland - William Trevor
      • USA - Raymond Carver
      • Italy - Primo Levi
      • India - Ruth Prawer Jhabvala
      • USA - Carson McCullers
      • Zimbabwe - Petina Gappah
      • India - Bharati Mukherjee
      • USA - Lucia Berlin
      • USA - Grace Paley
      • England - Angela Carter
      • USA - Kurt Vonnegut
      • Spain-Merce Rodoreda
      • Israel - Ruth Calderon
      • Israel - Etgar Keret
  • Posts
  • Blogs and Talks

On this page

  • Introduction
  • The Big Ideas in Stats
    • 1.Aggregation
    • 2.Information
    • 3.Likelihood
    • 4.Intercomparison
    • 5.Regression
    • 6.Design of Experiments and Observations
    • 7.Residuals
  • What is a Statistical Model?
  • Types of Statistical Models Based on Purpose
    • Types of Models Based on Data Variables
  • Linear Models Everywhere
  • A Flowchart of Statistical Inference Tests
  • References
  1. Teaching
  2. Data Analytics for Managers and Creators
  3. Statistical Inference
  4. 🧭 Basics of Statistical Inference

🧭 Basics of Statistical Inference

Published

November 9, 2022

Modified

Invalid Date

Abstract
What Do you want to Find Out Today?

Introduction

In this set of modules we will explore Data, understand what types of data variables there are, and the kinds of statistical tests and visualizations we can create with them.

The Big Ideas in Stats

Steven Stigler(Stigler 2016) is the author of the book ā€œThe Seven Pillars of Statistical Wisdomā€. The Big Ideas in Statistics from that book are:

1.Aggregation

The first pillar I will call Aggregation, although it could just as well be given the nineteenth-century name, ā€œThe Combination of Observations,ā€ or even reduced to the simplest example, taking a mean. Those simple names are misleading, in that I refer to an idea that is now old but was truly revolutionary in an earlier day—and it still is so today, whenever it reaches into a new area of application. How is it revolutionary? By stipulating that, given a number of observations, you can actually gain information by throwing information away! In taking a simple arithmetic mean, we discard the individuality of the measures, subsuming them to one summary.

2.Information

In the early eighteenth century it was discovered that in many situations the amount of information in a set of data was only proportional to the square root of the number n of observations, not the number n itself.

3.Likelihood

By the name I give to the third pillar, Likelihood, I mean the calibration of inferences with the use of probability. The simplest form for this is in significance testing and the common P-value.

4.Intercomparison

It represents what was also once a radical idea and is now commonplace: that statistical comparisons do not need to be made with respect to an exterior standard but can often be made in terms interior to the data themselves. The most commonly encountered examples of intercomparisons are Student’s t-tests and the tests of the analysis of variance.

5.Regression

I call the fifth pillar Regression, after Galton’s revelation of 1885, explained in terms of the bivariate normal distribution. Galton arrived at this by attempting to devise a mathematical framework for Charles Darwin’s theory of natural selection, overcoming what appeared to Galton to be an intrinsic contradiction in the theory: selection required increasing diversity, in contradiction to the appearance of the population stability needed for the definition of species.

6.Design of Experiments and Observations

The sixth pillar is Design, as in ā€œDesign of Experiments,ā€ but conceived of more broadly, as an ideal that can discipline our thinking in even observational settings.Starting in the late nineteenth century, a new understanding of the topic appeared, as Charles S. Peirce and then Fisher discovered the extraordinary role randomization could play in inference.

7.Residuals

The most common appearances in Statistics are our model diagnostics (plotting residuals), but more important is the way we explore high-dimensional spaces by fitting and comparing nested models.

In our work with Statistical Models, we will be working with all except Idea 6 above.

What is a Statistical Model?

From Daniel Kaplan’s book:

ā€œModelingā€ is a process of asking questions. ā€œStatisticalā€ refers in part to data – the statistical models you will construct will be rooted in data. But it refers also to a distinctively modern idea: that you can measure what you don’t know and that doing so contributes to your understanding.

The conclusions you reach from data depend on the specific questions you ask. The word ā€œmodelingā€ highlights that your goals, your beliefs, and your current state of knowledge all influence your analysis of data. You examine your data to see whether they are consistent with the hypotheses that frame your understanding of the system under study.

Types of Statistical Models Based on Purpose

There are three main uses for statistical models. They are closely related, but distinct enough to be worth enumerating.

Description. Sometimes you want to describe the range or typical values of a quantity. For example, what’s a ā€œnormalā€ white blood cell count? Sometimes you want to describe the relationship between things. Example: What’s the relationship between the price of gasoline and consumption by automobiles?

Classification or Prediction. You often have information about some observable traits, qualities, or attributes of a system you observe and want to draw conclusions about other things that you can’t directly observe. For instance, you know a patient’s white blood-cell count and other laboratory measurements and want to diagnose the patient’s illness.

Anticipating the consequences of interventions. Here, you intend to do something: you are not merely an observer but an active participant in the system. For example, people involved in setting or debating public policy have to deal with questions like these: To what extent will increasing the tax on gasoline reduce consumption? To what extent will paying teachers more increase student performance?

The appropriate form of a model depends on the purpose. For example, a model that diagnoses a patient as ill based on an observation of a high number of white blood cells can be sensible and useful. But that same model could give absurd predictions about intervention: Do you really think that lowering the white blood cell count by bleeding a patient will make the patient better?

To anticipate correctly the effects of an intervention you need to get the direction of cause (polarity) and effect (magnitude) correct in your models.

Note

An effect size tells how the output of a model changes when a simple change is made to the input.Effect sizes always involve two variables: a response variable and a single explanatory variable. Effect size is always about a model. The model might have one explanatory variable or many explanatory variables. Each explanatory variable will have its own effect size, so a model with multiple explanatory variables will have multiple effect sizes.

Note

But for a model used for classification or prediction, it may be unnecessary to represent causation correctly. Instead, other issues, e.g., the reliability of data, can be the most important. One of the thorniest issues in statistical modeling – with tremendous consequences for science, medicine, government, and commerce – is how you can legitimately draw conclusions about interventions from models based on data collected without performing these interventions.

Types of Models Based on Data Variables

Let us look at the famous dataset pertaining to Francis Galton’s work on the heights of children and the heights of their parents. We can create 4 kinds of models based on the types of variables in that dataset.

Variables and Models

Variables and Models

Linear Models Everywhere

One method in this set of modules is to take the modern view that all these models can be viewed from a standpoint of the Linear Model, also called Linear Regression y=β1āˆ—x+β0 . For example, it is relatively straightforward to imagine Plot B (Quant vs Quant ) as an example of a Linear Model, with the dependent variable modelled as y and the independent one as x. We will try to work up to the intuition that this model can be used to understand all the models in the Figure.

A Flowchart of Statistical Inference Tests

Means

Check Assumptions

Multiple-Means

p

np indep

np dep

Multiple Means

ANOVA

kruskal.test

friedman.test

Paired-Means

p

np

Paired Means

t.test with pairs

wilcox.test with pairs\n Mann-Whitney U Test

Single-Mean

p

p diff var

np

Single Mean

t.test

t.test \n Welch

wilcox.test

Inference for Means

Normality: Shapiro-Wilk Test shapiro.test\n Variances: Fisher F-test var.test\n Outliers: Box Plots

Means

References

  1. Chester Ismay and Albert Y. Kim. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. Available Online https://moderndive.com/index.html

  2. http://drafts.jsvine.com/the-magic-criteria/

  3. TihamƩr von Ghyczy, The Fruitful Flaws of Strategy Metaphors. Harvard Business Review, 2003. https://hbr.org/2003/09/the-fruitful-flaws-of-strategy-metaphors

  4. Daniel T. Kaplan, Statistical Models (second edition). Available online. https://dtkaplan.github.io/SM2-bookdown/

  5. Daniel T. Kaplan, Compact Introduction to Classical Inference, 2020. Available Online. https://dtkaplan.github.io/CompactInference/

  6. Daniel T. Kaplan and Frank Shaw, Statistical Modeling: Computational Technique. Available online https://www.mosaic-web.org/go/SM2-technique/

  7. Jonas Kristoffer LindelĆøv, Common statistical tests are linear models (or: how to teach stats) https://lindeloev.github.io/tests-as-linear/

Back to top

References

Stigler, Stephen M. 2016. ā€œThe Seven Pillars of Statistical Wisdom,ā€ March. https://doi.org/10.4159/9780674970199.
Statistical Inference
šŸŽ² Samples, Populations, Statistics and Inference

License: CC BY-SA 2.0

Website made with ā¤ļø and Quarto, by Arvind V.

Hosted by Netlify .