
The Multilayer Perceptron

Published: November 23, 2024

Modified: May 17, 2025

What is a Multilayer Perceptron?

This was our bare bones Perceptron, or neuron as we will refer to it henceforth:

$$
y = \text{sign}\left( \sum_{k=1}^{n} W_k x_k + b \right)
$$
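As a quick concrete rendering of this formula, here is a minimal sketch in plain JavaScript (the same language as the course's p5.js examples); the weights, bias, and inputs are made-up values for illustration:

```js
// Bare-bones perceptron: weighted sum of inputs plus a bias,
// passed through a hard sign threshold.
function perceptron(inputs, weights, bias) {
  let sum = bias;
  for (let k = 0; k < inputs.length; k++) {
    sum += weights[k] * inputs[k]; // W_k * x_k
  }
  return sum >= 0 ? 1 : -1; // sign() activation
}

// Made-up example: 2 inputs, 2 weights, 1 bias
console.log(perceptron([0.5, -1.2], [2.0, 1.0], 0.3)); // -> 1 (sum = 0.1)
```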

For the multilayer perceptron, two changes were made:

  • The hard-threshold activation was changed into a softer sigmoid activation.

  • One or more hidden layers were added.

Let us discuss these changes in detail.

What is the Activation Block?

  • We said earlier that the weighting and adding is a linear operation.
  • While this is great, simple linear transformations of data are not capable of generating what we might call learning or generalization ability.
  • The output of the perceptron is a “learning decision” that is made by deciding whether the combined output is greater or smaller than a threshold.
  • We need some non-linear block that allows the data to create nonlinear transformations of the data space, such as curving it, or folding it, or creating bumps, depressions, twists, and so on.

Figure: The Activation Block
  • This nonlinear function needs to be chosen with care so that it is both differentiable and keeps the math analysis tractable. (More later)
  • Such a nonlinear mathematical function is implemented in the Activation Block.
  • See this example: the red and blue areas, which we wish to separate and classify with our DLNN, are not separable unless we fold and curve our 2D data space.
  • After that folding and curving, the separation is achieved using a linear operation, i.e. a LINE!!
Figure 1: From Colah Blog, used sadly without permission
  • For instance in Figure 1, no amount of stretching or compressing of the surface can separate the two sets (blue and red) using a line or plane, unless the surface can be warped into another dimension by folding. (A small numerical sketch of this idea follows below.)
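Here is a tiny numerical sketch of this folding idea (my own illustration, not from the original figure): two classes lying on concentric circles are not separable by any line in the original $(x, y)$ plane, but after a nonlinear lift into a third dimension $x^2 + y^2$, an ordinary plane splits them.

```js
// Nonlinear lift: map (x, y) to (x, y, x^2 + y^2).
// In the lifted space, the plane z = 2.5 separates the two rings.
function lift(x, y) {
  return [x, y, x * x + y * y];
}

const innerPoint = [0.5, 0.5]; // "blue" class, squared radius 0.5
const outerPoint = [2.0, 1.0]; // "red" class,  squared radius 5.0

console.log(lift(...innerPoint)[2] < 2.5); // true  -> blue side of the plane
console.log(lift(...outerPoint)[2] < 2.5); // false -> red side of the plane
```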

What is the Sigmoid Function?

The hard threshold used in the Perceptron allowed us to make certain decisions based on linear combinations of the input data. But what if the dataset possesses classes that are not separable in a linear way? What if different categories of points are intertwined, with a curved boundary between classes?

Once again, we need a non-linear block that can transform the data space; that is precisely what the Activation Block provides.

So how do we implement this nonlinear Activation Block?

  • One of the popular functions used in the Activation Block is based on the exponential function $e^x$.
  • Why? Because this function retains its identity when differentiated! This is a very convenient property!

Figure: Sigmoid Activation
Note: Remembering Logistic Regression

Recall your study of Logistic Regression. There, the Sigmoid function was used to model the odds of the (Qualitative) target variable against the (Quantitative) predictor.

Note: But Why Sigmoid?

Because the Sigmoid function is differentiable. And nearly linear in the mid-ranges. Oh, and remember the Chain Rule?

$$
\begin{aligned}
\frac{df(x)}{dx} &= \frac{d}{dx}\,\frac{1}{1+e^{-x}} \\
&= -(1+e^{-x})^{-2} \cdot \frac{d}{dx}\left(1+e^{-x}\right) \quad \text{(using the Chain Rule)} \\
&= -(1+e^{-x})^{-2} \cdot (-e^{-x}) \\
&= \frac{e^{-x}}{(1+e^{-x})^2} \\
&= \frac{(1+e^{-x}) - 1}{(1+e^{-x})^2} \\
&= \frac{1}{1+e^{-x}} \cdot \left(\frac{1+e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right) \\
\text{and therefore}\quad \frac{df(x)}{dx} &= f(x)\,\bigl(1-f(x)\bigr)
\end{aligned}
$$
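As a sanity check (an added sketch, not part of the original derivation), we can verify the identity $f'(x) = f(x)\,(1-f(x))$ numerically in JavaScript:

```js
// Sigmoid and its derivative, using the identity f'(x) = f(x) * (1 - f(x))
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}
function sigmoidDerivative(x) {
  const f = sigmoid(x);
  return f * (1 - f);
}

// Compare against a central finite-difference approximation
const x = 0.7, h = 1e-6;
const numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h);
console.log(sigmoidDerivative(x), numerical); // both ≈ 0.2217
```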

What are Hidden Layers?

The MLP stacks several perceptrons in layers, as shown below:


  • Here, i1, i2, and i3 are input neurons: they are simply inputs, and are drawn as circles in the literature.
  • h1, h2, and h3 are neurons in the so-called hidden layer; hidden because they are neither inputs nor outputs!
  • The neurons o1, o2, and o3 are output neurons.
  • The signals/information flow from left to right in the diagram. And we have shown every neuron connected to every neuron in the next layer downstream.

How do we mathematically, and concisely, express the operation of the MLP? Let us set up a notation for the MLP weights.

  • $l$: layer index;
  • $j, k$: neuron indices in two adjacent layers;
  • $W_{jk}^{l}$ (i.e. $W_{\text{source},\,\text{destn}}^{\text{layer}}$): weight from the $j$-th neuron in the $(l-1)$-th layer to the $k$-th neuron in the $l$-th layer;
  • $b_k^{l}$: bias of the $k$-th neuron in the $l$-th layer;
  • $a_k^{l}$: activation (output) of the $k$-th neuron in the $l$-th layer.


We can write the outputs of layer 2 as:

$$
\begin{aligned}
(k=1):\quad a_1^2 &= \text{sigmoid}\left( W_{11}^2 a_1^1 + W_{21}^2 a_2^1 + W_{31}^2 a_3^1 + b_1^2 \right) \\
(k=2):\quad a_2^2 &= \text{sigmoid}\left( W_{12}^2 a_1^1 + W_{22}^2 a_2^1 + W_{32}^2 a_3^1 + b_2^2 \right) \\
(k=3):\quad a_3^2 &= \text{sigmoid}\left( W_{13}^2 a_1^1 + W_{23}^2 a_2^1 + W_{33}^2 a_3^1 + b_3^2 \right)
\end{aligned}
$$

In (dreaded?) matrix notation:

$$
\begin{bmatrix} a_1^2 \\ a_2^2 \\ a_3^2 \end{bmatrix}
= \text{sigmoid}\left(
\begin{bmatrix}
W_{11}^2 & W_{21}^2 & W_{31}^2 \\
W_{12}^2 & W_{22}^2 & W_{32}^2 \\
W_{13}^2 & W_{23}^2 & W_{33}^2
\end{bmatrix}
\begin{bmatrix} a_1^1 \\ a_2^1 \\ a_3^1 \end{bmatrix}
+
\begin{bmatrix} b_1^2 \\ b_2^2 \\ b_3^2 \end{bmatrix}
\right)
$$

In compact notation we write, in general:

$$
A^l = \sigma\left( W^l A^{l-1} + B^l \right)
$$

$$
a_k^l = \sigma\left( \sum_j W_{jk}^l \, a_j^{l-1} + b_k^l \right) \tag{1}
$$
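To make Equation 1 concrete, here is a minimal forward-pass sketch in plain JavaScript; the layer sizes, weights, and biases are made-up values, chosen only to show the indexing convention $W_{jk}^l$ (source $j$, destination $k$):

```js
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One fully connected layer. W[j][k] is the weight from neuron j in the
// previous layer to neuron k in this layer, matching the notation above.
function layerForward(W, b, aPrev) {
  return b.map((bk, k) => {
    let sum = bk;
    for (let j = 0; j < aPrev.length; j++) {
      sum += W[j][k] * aPrev[j]; // sum_j W_jk * a_j^(l-1)
    }
    return sigmoid(sum); // a_k^l
  });
}

// A made-up 3-3-3 MLP, like the diagram above: hidden layer, then output layer
const a1 = [0.5, -0.2, 0.8]; // input activations a^1
const W2 = [
  [0.1, 0.4, -0.3],
  [0.7, -0.6, 0.5],
  [-0.2, 0.3, 0.8],
];
const b2 = [0.0, 0.1, -0.1];
const W3 = [
  [0.3, -0.4, 0.2],
  [0.1, 0.6, -0.5],
  [-0.7, 0.2, 0.4],
];
const b3 = [0.05, -0.05, 0.0];

const a2 = layerForward(W2, b2, a1); // hidden activations a^2
const a3 = layerForward(W3, b3, a2); // output activations a^3
console.log(a3);
```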

Wait, But Why?

  • The “vanilla” perceptron was a big advance in AI and learning. However, it was soon realized that it can make classification decisions only with data that are linearly separable.
  • Including a differentiable non-linearity in the activation block allows us to deform the coordinate space in which the data points are mapped.
  • This deformation may permit unique views of the data wherein the categories of data are separable by an $n$-dimensional plane, i.e. a hyperplane.
  • This idea is also used in a machine learning algorithm called Support Vector Machines.

MLPs in Code

Worked examples are provided in two flavours:

  • Using p5.js
  • Using R, with torch

References

  1. Tariq Rashid. Make Your Own Neural Network. PDF Online.
  2. Mathoverflow. Intuitive Crutches for Higher Dimensional Thinking. https://mathoverflow.net/questions/25983/intuitive-crutches-for-higher-dimensional-thinking
  3. 3D MatMul Visualizer. https://bhosmer.github.io/mm/ref.html