2025-06-28
Orange is a visual drag-and-drop tool for
and much more. You can download and install Orange from here:
Let us create some simple visualizations using Orange.
- File Widget to import the iris dataset into your session
- Data Table Widget to look at the data, and note its variable names
- Visualization Widgets (Scatter Plot, Bar Plot, and Distributions) to look at the properties of the variables, and examine relationships between them.

Stephen Stigler (2016) in “The Seven Pillars of Statistical Wisdom”:
Before we plot a single chart, it is wise to take a look at several numbers that summarize the dataset under consideration. What might these be? Some obviously useful numbers are:
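As a minimal sketch of what such summary numbers look like outside Orange, the snippet below uses Python's standard `statistics` module. The `values` list is made-up illustrative data, not taken from any dataset in this document, and the choice of which numbers to report is an assumption.

```python
import statistics

# Hypothetical sample, for illustration only (not from any dataset used here)
values = [18.0, 15.0, 16.0, 24.0, 22.0, 31.0, 27.0]

summary = {
    "n": len(values),                     # how many observations
    "mean": statistics.mean(values),      # arithmetic average
    "median": statistics.median(values),  # middle value when sorted
    "sd": statistics.stdev(values),       # sample standard deviation
    "min": min(values),
    "max": max(values),
}

for name, number in summary.items():
    print(f"{name}: {number:.2f}")
```

Orange's Feature Statistics widget reports numbers of this kind for every variable at once, without any code.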
- Auto MPG
- Feature Statistics widget
- Datasets widget to the Feature Statistics widget

In the Sherlock Holmes story, The Adventure of the Dancing Men, a criminal known to one of the characters communicates with her using a childish/child-like drawing which looks like this:
| Variable #1 | Variable #2 | Chart Names | Chart Shape |
|---|---|---|---|
| Qual | None | Bar Chart | |
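A bar chart of a single Qual variable is just a picture of counts per level. A minimal sketch of that counting step in Python follows; the `observations` list is hypothetical, and the text "bars" only stand in for what Orange's Bar Plot widget draws.

```python
from collections import Counter

# Hypothetical values of one Qual variable (illustration only)
observations = ["red", "blue", "red", "green", "red", "blue"]

# Count how many times each level occurs
counts = Counter(observations)

# Crude text "bar chart": one row per level, bar length = count
for level, count in counts.most_common():
    print(f"{level:<6} {'#' * count} ({count})")
```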
OK, let’s get some data to count:
And for now, let’s use a pre-set Workflow in Orange.
- Author: Author of the book (Qual)
- Title: Title of the book (Qual)
- Origin: Origin of the Challenge (Qual)
- Type of Ban: Type of ban on the book (Qual)
- State: State in which the book was banned (Qual)
- District: District in which the book was banned (Qual)

Note
Tourist: Any famous people born around here?
Guide: No sir, best we can do is babies.
OK, let’s get some data to chart:
And for now, let’s use a pre-set Workflow in Orange.
Qualitative Variables
- year: Year of birth (Qual)
- month: Month of the year (Qual)
- date_of_month: Day of the month (Qual)
- day_of_week: Day of the week (Qual)

Quantitative Variables
- births: Number of births on that day (Quant)

| year `<dbl>` | month `<dbl>` | date_of_month `<dbl>` | day_of_week `<dbl>` | births `<dbl>` |
|---|---|---|---|---|
| 2000 | 1 | 1 | 6 | 9083 |
| 2000 | 1 | 2 | 7 | 8006 |
| 2000 | 1 | 3 | 1 | 11363 |
| 2000 | 1 | 4 | 2 | 13032 |
| 2000 | 1 | 5 | 3 | 12558 |
| 2000 | 1 | 6 | 4 | 12466 |
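To make the idea of comparing a Quant variable across the levels of a Qual variable concrete, here is a small sketch that uses only the six rows shown above. It assumes that `day_of_week` values 6 and 7 are Saturday and Sunday (1 January 2000 was a Saturday, which is consistent with that coding); six rows are far too few to conclude anything, so treat the output as an illustration of the grouping step only.

```python
from statistics import mean

# The six rows shown in the table above: (day_of_week, births)
rows = [(6, 9083), (7, 8006), (1, 11363), (2, 13032), (3, 12558), (4, 12466)]

# Assumption: day_of_week 6 and 7 are Saturday and Sunday
WEEKEND = {6, 7}

# Split the Quant variable (births) by a Qual grouping (weekend or not)
weekend = [births for day, births in rows if day in WEEKEND]
weekday = [births for day, births in rows if day not in WEEKEND]

print("mean weekend births:", mean(weekend))
print("mean weekday births:", mean(weekday))
```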
day_of_week, we can see how the number of births varies by day of the week.

Qualitative Variables
- rank: Rank of the academic (Qual)
- discipline: Discipline of the academic (Qual)
- sex: Male / Female

Quantitative Variables
- yrs.since.phd: Years since PhD (Quant). Can be Qual??
- salary: Salary of the academic (Quant)

t-test / ANOVA would tell us if that is true.

t-test and ANOVA report at the bottom.

Let’s get the titanic data, using the Datasets widget in Orange.
There were 2201 passengers, as per this dataset.
And let’s use a pre-set Workflow in Orange
titanic

| Variable #1 | Variable #2 | Chart Names | Chart Shape |
|---|---|---|---|
| Qual | Qual | | |
Here, area∼count, so the area of the tile is proportional to the count of observations in that tile.
Note
survived with sex

So sadly Jack is far more likely to have died than Rose.
Note
Rose was first class and Jack was third class. So again the odds are stacked against him.

When differences between the actual and expected counts are large, we deduce that one Qual variable has an effect on the other Qual variable (speaking counts-wise or ratio-wise).
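The "expected" count for a tile is what we would see if the two Qual variables were unrelated: row total times column total, divided by the grand total. The sketch below works this out for `sex` against `survived`. The four observed counts are an assumption: they are the figures commonly reported for the 2201-passenger Titanic table and should be checked against the dataset loaded in Orange.

```python
# Assumed sex x survived counts for the 2201-passenger Titanic table
# (commonly reported figures; verify against the dataset loaded in Orange)
observed = {
    ("male", "no"): 1364, ("male", "yes"): 367,
    ("female", "no"): 126, ("female", "yes"): 344,
}

total = sum(observed.values())

# Row totals (per sex) and column totals (per survival outcome)
row_totals, col_totals = {}, {}
for (sex, survived), n in observed.items():
    row_totals[sex] = row_totals.get(sex, 0) + n
    col_totals[survived] = col_totals.get(survived, 0) + n

# Expected count if sex and survival were unrelated:
# (row total * column total) / grand total
expected = {
    (sex, survived): row_totals[sex] * col_totals[survived] / total
    for (sex, survived) in observed
}

for cell, n in observed.items():
    print(cell, "observed:", n, "expected:", round(expected[cell], 1))
```

Under these assumed counts, far fewer men survived than independence would predict, which is the gap the mosaic chart makes visible.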
| Chamber | Bullet |
|---|---|
| 1 | Y / N |
| 2 | Y / N |
| .. | … |
| 6 | Y / N |
And Gabbar’s Hypothesis?
Is ethnicity (as revealed by first names) a basis for racial discrimination, in the US?
This dataset was generated as part of a landmark research study done by Marianne Bertrand and Senthil Mullainathan.
Read the description therein to really understand how you can prove causality with a well-crafted research experiment.
| name `<chr>` | ethnicity `<chr>` | call `<fct>` |
|---|---|---|
| Allison | cauc | no |
| Kristen | cauc | no |
| Lakisha | afam | no |
| Latonya | afam | no |
| Carrie | cauc | no |
| Jay | cauc | no |
| ethnicity `<chr>` | call `<fct>` | n `<int>` |
|---|---|---|
| afam | yes | 157 |
| afam | no | 2278 |
| cauc | yes | 235 |
| cauc | no | 2200 |
| ethnicity `<chr>` | call_prop `<dbl>` |
|---|---|
| afam | 6.447639 |
| cauc | 9.650924 |
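The `call_prop` percentages above follow directly from the counts in the preceding table. A short sketch of that arithmetic in Python (the variable names are my own, not from the original analysis):

```python
# Counts from the ethnicity x call table above
counts = {
    ("afam", "yes"): 157, ("afam", "no"): 2278,
    ("cauc", "yes"): 235, ("cauc", "no"): 2200,
}

call_prop = {}
for group in ("afam", "cauc"):
    n_yes = counts[(group, "yes")]
    n_total = n_yes + counts[(group, "no")]
    # Percentage of resumes in this group that received a callback
    call_prop[group] = 100 * n_yes / n_total

for group, pct in call_prop.items():
    print(f"{group}: {pct:.6f}")
```

Both groups have the same number of resumes (2435), so the callback percentages can be compared directly.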
diffprop
0.108199
ethnicity does not matter.

afam candidates were discriminated against.

| Gabbar | Stats Teacher |
|---|---|
| “Kitne aadmi thay?” | How many observations do you have? n < 30 is a joke. |
| Kya Samajh kar aaye thay? Gabbar khus hoga? Sabaasi dega kya? | What are the levels in your Factors? Are they binary? Don’t do ANOVA just yet! |
| (Fires off three rounds) Haan, ab theek hai! | Yes, now the dataset is balanced wrt the factor (Treatment and Control). |
| Is pistol mein teen zindagi aur teen maut bandh hai. Dekhte hain kisko kya milega. | This is our Research Question, for which we will Design an Experiment. |
| Hume kuchh nahi pataa! | Let us perform a non-parametric Permutation Test for this Factor! |
| Kamaal ho gaya! | Fantastic! Our p-value is so small that we can reject the NULL Hypothesis!! |
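A permutation test of the kind the Stats Teacher proposes can be sketched in a few lines of plain Python, using the callback counts from the `ethnicity` table earlier. This is an illustration under assumptions, not the analysis actually run for this talk: the test statistic (difference in callback proportion), the two-sided comparison, the seed, and the number of shuffles are all my own choices.

```python
import random

# Counts from the ethnicity x call table earlier in this document
afam = [1] * 157 + [0] * 2278   # 1 = got a call, 0 = no call
cauc = [1] * 235 + [0] * 2200

def diff_in_proportion(group_a, group_b):
    """Callback proportion of group_a minus that of group_b."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

observed = diff_in_proportion(cauc, afam)

rng = random.Random(42)          # fixed seed so the sketch is repeatable
pooled = cauc + afam
n_permutations = 1000            # arbitrary choice; more gives a finer p-value
at_least_as_extreme = 0

for _ in range(n_permutations):
    rng.shuffle(pooled)          # break any link between ethnicity and call
    shuffled_diff = diff_in_proportion(pooled[:len(cauc)], pooled[len(cauc):])
    if abs(shuffled_diff) >= abs(observed):
        at_least_as_extreme += 1

# Share of shuffles that produce a gap at least as large as the real one
p_value = at_least_as_extreme / n_permutations
print(f"observed difference: {observed:.4f}, p-value: {p_value:.3f}")
```

If shuffling the labels almost never reproduces a gap as large as the observed one, the NULL Hypothesis that ethnicity does not matter becomes hard to defend.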
Arvind V. | VizChitra2025 | June 2025