MLPs and Backpropagation

Published: November 23, 2024
Modified: May 17, 2025

How does an MLP Learn?

We saw how each layer works:

$$
\begin{bmatrix} a_1^2 \\ a_2^2 \\ a_3^2 \end{bmatrix}
= \text{sigmoid}\left(
\begin{bmatrix}
W_{11}^2 & W_{21}^2 & W_{31}^2 \\
W_{12}^2 & W_{22}^2 & W_{32}^2 \\
W_{13}^2 & W_{23}^2 & W_{33}^2
\end{bmatrix}
*
\begin{bmatrix} a_1^1 \\ a_2^1 \\ a_3^1 \end{bmatrix}
+
\begin{bmatrix} b_1^2 \\ b_2^2 \\ b_3^2 \end{bmatrix}
\right) \tag{1}
$$

and:

$$
A^l = \sigma\left(W^l A^{l-1} + B^l\right) \tag{2}
$$

See how the connections between neurons are marked by weights: these multiply the signal from the previous neuron. The multiplied/weighted products are added up in the neuron, and the sum is given to the activation block therein.
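To make Equation 1 and Equation 2 concrete, here is a minimal sketch in base R (the function name `layer_forward` and the numbers are ours, purely for illustration): a layer multiplies the incoming activations by its weight matrix, adds the biases, and passes the sums through the sigmoid.

```r
# Sigmoid activation, applied element-wise
sigmoid <- function(x) 1 / (1 + exp(-x))

# One layer of an MLP:  A^l = sigmoid(W^l %*% A^{l-1} + B^l)   (Equation 2)
layer_forward <- function(W, A_prev, b) {
  sigmoid(W %*% A_prev + b)
}

# A 3-neuron layer fed by a 3-neuron layer (illustrative random values)
W      <- matrix(runif(9, -1, 1), nrow = 3)    # weight matrix W^l (3 x 3)
b      <- matrix(runif(3, -1, 1), nrow = 3)    # bias vector B^l
A_prev <- matrix(c(0.5, 0.1, 0.9), nrow = 3)   # activations A^{l-1}
A      <- layer_forward(W, A_prev, b)          # activations A^l
```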

So learning?

The only controllable variables in a neural network are these weights (and the biases)! So learning involves adapting them so that the network performs a useful function.

What is the Learning Process?

The process of adapting the weights of a neural network can be described in the following steps:

  • Training Set: Training happens over several known input-output pairs (“training data”)
  • Training Epoch: For each input, the signals propagate forward until we have an output
  • Error Calculation: The output is compared with the desired output, to calculate the error
  • Backpropagation: Each neuron (and its weights) needs to be told its share of the error! Errors therefore need to be sent backward from the output to the input, unravelling them from layer $l$ to layer $l-1$ (like apportioning blame!!).
  • Error-to-Cost: How does the error at any given neuron relate to the idea of an overall Cost function? Is the Cost function also apportioned in the same way?
  • Differentiate: Evaluate the effect of each weight/bias on its (apportioned) share of the error, and hence on the overall Cost. (Slope!!)
  • Gradient Descent: Adapt the weights/biases with a small step in the direction opposite to the slope.

There.
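Putting these steps together gives the usual training loop. The skeleton below is only a sketch in base R: `forward_pass`, `backprop_errors`, and `gradient_step` are hypothetical helper names standing in for the pieces developed in the rest of this page.

```r
# One pass ("epoch") over the training set -- a sketch of the steps listed above
train_epoch <- function(network, inputs, targets, learning_rate) {
  for (k in seq_along(inputs)) {
    a       <- forward_pass(network, inputs[[k]])              # 1. propagate forward
    e       <- a - targets[[k]]                                # 2. output error (Equation 3)
    deltas  <- backprop_errors(network, e)                     # 3. apportion error backwards
    network <- gradient_step(network, deltas, learning_rate)   # 4. nudge weights downhill
  }
  network
}
```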

What is the Output Error?

If d(k) are the desired outputs of the NN (over an entire training set), and a(k) are the actual outputs of the output layer, then we calculate the error at the outputs of the NN as:

$$
e(k) = a(k) - d(k) \tag{3}
$$

This error is calculated at each output for each training epoch/sample/batch. (More about the batch-mode in a bit.)

What is the Cost Function?

We define the cost or objective function as the squared error averaged over all neurons:

$$
C(W,b) = \frac{1}{2n}\sum_{i=1}^{n\ \text{neurons}} e^2(i)
       = \frac{1}{2n}\sum_{i=1}^{n\ \text{neurons}} \left(a_i - d_i\right)^2 \tag{4}
$$

The $a_i$ are the outputs of the $n$ output neurons and the $d_i$ are the corresponding desired outputs, for each of the training samples.

The Cost Function is of course dependent upon the Weights and the biases, and is to be minimized by adapting them. Using the sum of squared errors, along with the linear operations in the NN, (usually) gives the Cost Function a single global minimum.
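In code, Equation 4 is just half the mean of the squared output errors. A one-liner in base R (again, the names and numbers are ours, for illustration):

```r
# Cost C(W, b): half the mean squared error over the n output neurons
cost <- function(a, d) sum((a - d)^2) / (2 * length(a))

cost(a = c(0.8, 0.2, 0.6), d = c(1, 0, 1))   # 0.04 for this illustrative output
```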

What is Backpropagation of Error?

As we stated earlier, error is calculated at the output. In order to adapt all weights, we need to send error proportionately back along the network, towards the input. This proportional error will give us a basis to adapt the individual weights anywhere in the network.

What does “proportional” mean here? Consider the diagram below:

$$
\begin{aligned}
e_1^1 &\approx e_1^2 * \frac{W_{11}}{\text{Sum of Weights to } h_1} \\
e_2^1 &\approx e_1^2 * \frac{W_{21}}{\text{Sum of Weights to } h_1} \\
e_3^1 &\approx e_1^2 * \frac{W_{31}}{\text{Sum of Weights to } h_1}
\end{aligned}
$$

$$
\begin{aligned}
e_1^1 &\approx e_1^2 * \frac{W_{11}}{W_{11}+W_{21}+W_{31}} \\
e_2^1 &\approx e_1^2 * \frac{W_{21}}{W_{11}+W_{21}+W_{31}} \\
e_3^1 &\approx e_1^2 * \frac{W_{31}}{W_{11}+W_{21}+W_{31}}
\end{aligned}
$$

These are the contributions of the error $e_1^2$ to each of the previous-layer neurons.

Another way of looking at this:

$$
e_1^1 = e_1^2 * \frac{W_{11}}{\text{Sum of Weights to } h_1}
      + e_2^2 * \frac{W_{12}}{\text{Sum of Weights to } h_2}
      + e_3^2 * \frac{W_{13}}{\text{Sum of Weights to } h_3}
$$

$$
e_1^1 = e_1^2 * \frac{W_{11}}{W_{11}+W_{21}+W_{31}}
      + e_2^2 * \frac{W_{12}}{W_{12}+W_{22}+W_{32}}
      + e_3^2 * \frac{W_{13}}{W_{13}+W_{23}+W_{33}}
$$

$$
e_2^1 = \text{similar expression!!} \qquad e_3^1 = \text{similar expression!!}
$$

Equation corrected by Gayatri Jadhav, April 2025

This is the total error at $e_1^1$ from all the three output errors. So:

  • We have taken each output error, $e_*^2$, and parcelled it back to the preceding neurons in proportion to the connecting Weight. This makes intuitive sense; we are making those neurons put their money where their mouth is. As Nassim Nicholas Taleb says, people (and neurons!) need to pay for their opinions, especially when things go wrong!
  • The accumulated error at each neuron in layer $l-1$ is the weighted sum of back-propagated error contributions from all the layer $l$ neurons to which it is connected.
  • So we can compactly write the relationships above as:

$$
\begin{bmatrix} e_1^1 \\ e_2^1 \\ e_3^1 \end{bmatrix}
=
\begin{bmatrix}
\dfrac{W_{11}}{D_1} & \dfrac{W_{12}}{D_2} & \dfrac{W_{13}}{D_3} \\
\dfrac{W_{21}}{D_1} & \dfrac{W_{22}}{D_2} & \dfrac{W_{23}}{D_3} \\
\dfrac{W_{31}}{D_1} & \dfrac{W_{32}}{D_2} & \dfrac{W_{33}}{D_3}
\end{bmatrix}
*
\begin{bmatrix} e_1^2 \\ e_2^2 \\ e_3^2 \end{bmatrix}
$$

where $D_j = W_{1j} + W_{2j} + W_{3j}$ is the sum of the weights feeding output neuron $j$.

The denominators make things look complicated! But if we are able to simply ignore them for a moment, then we see a very interesting thing:

$$
\begin{bmatrix} e_1^1 \\ e_2^1 \\ e_3^1 \end{bmatrix}
\approx
\begin{bmatrix}
W_{11} & W_{12} & W_{13} \\
W_{21} & W_{22} & W_{23} \\
W_{31} & W_{32} & W_{33}
\end{bmatrix}
*
\begin{bmatrix} e_1^2 \\ e_2^2 \\ e_3^2 \end{bmatrix}
$$

This new approximate matrix is the transpose of our original Weight matrix from Equation 1! The rows there have become columns here!! That makes intuitive sense: in the forward information direction, we were accounting for information from the point of view of the destinations; in the reverse error backpropagation direction, we are accounting for information from the point of view of the sources.

Writing this equation in a compact way:

$$
e^{l-1} \approx \left(W^l\right)^T * e^l \tag{5}
$$

This is our equation for backpropagation of error.
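Equation 5 is a single matrix multiplication. A minimal sketch in base R, with an illustrative 3×3 weight matrix (names and numbers are ours):

```r
# Backpropagate errors one layer back:  e^{l-1} ~ t(W^l) %*% e^l   (Equation 5)
backprop_errors <- function(W, e_next) {
  t(W) %*% e_next
}

W      <- matrix(runif(9, -1, 1), nrow = 3)       # weights from layer l-1 to layer l
e_next <- matrix(c(0.30, -0.10, 0.05), nrow = 3)  # errors at the layer-l neurons
e_prev <- backprop_errors(W, e_next)              # apportioned errors at layer l-1
```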

Why is ignoring all those individual denominators justified? Let us park that question until we have understood the last step in NN training: Gradient Descent.

Here Comes the Rain Maths Again!

Now, we are ready (maybe?) to watch these two beautifully made videos on Backpropagation. One is, of course, from Dan Shiffman, and the other from Grant Sanderson, a.k.a. 3Blue1Brown.

Backpropagation in Code

  • Using p5.js
  • Using R (with torch)
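As a complement to the p5.js and torch versions, here is a small, self-contained sketch in base R of the whole recipe: a 2-3-1 MLP learning XOR, using the simplified transpose-based error apportioning of Equation 5 (in the spirit of Tariq Rashid's book) rather than the full chain-rule gradient. All names and numbers are ours, for illustration only.

```r
sigmoid   <- function(x) 1 / (1 + exp(-x))
d_sigmoid <- function(y) y * (1 - y)   # slope of the sigmoid, written in terms of its output y

# XOR training data: four input pairs and their desired outputs
X <- matrix(c(0, 0,  0, 1,  1, 0,  1, 1), ncol = 2, byrow = TRUE)
D <- c(0, 1, 1, 0)

set.seed(42)
W1 <- matrix(runif(3 * 2, -1, 1), nrow = 3)   # hidden-layer weights (3 x 2)
b1 <- matrix(runif(3, -1, 1), nrow = 3)       # hidden-layer biases
W2 <- matrix(runif(1 * 3, -1, 1), nrow = 1)   # output-layer weights (1 x 3)
b2 <- matrix(runif(1, -1, 1), nrow = 1)       # output-layer bias
lr <- 0.5                                     # learning rate

for (epoch in 1:5000) {
  for (k in 1:nrow(X)) {
    a0 <- matrix(X[k, ], nrow = 2)            # input activations
    a1 <- sigmoid(W1 %*% a0 + b1)             # hidden activations   (Equation 2)
    a2 <- sigmoid(W2 %*% a1 + b2)             # output activation    (Equation 2)

    e2 <- a2 - D[k]                           # output error         (Equation 3)
    e1 <- t(W2) %*% e2                        # backpropagated error (Equation 5)

    # Gradient-descent updates: error x activation slope x input, scaled by lr
    d2 <- e2 * d_sigmoid(a2)
    d1 <- e1 * d_sigmoid(a1)
    W2 <- W2 - lr * d2 %*% t(a1);  b2 <- b2 - lr * d2
    W1 <- W1 - lr * d1 %*% t(a0);  b1 <- b1 - lr * d1
  }
}

# After training, the four outputs are typically close to 0, 1, 1, 0 (results vary with the seed)
round(sapply(1:4, function(k) {
  a1 <- sigmoid(W1 %*% matrix(X[k, ], nrow = 2) + b1)
  sigmoid(W2 %*% a1 + b2)
}), 2)
```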

References

  1. Tariq Rashid. Make your own Neural Network. PDF Online
  2. Mathoverflow. Intuitive Crutches for Higher Dimensional Thinking. https://mathoverflow.net/questions/25983/intuitive-crutches-for-higher-dimensional-thinking
  3. Interactive Backpropagation Explainer https://xnought.github.io/backprop-explainer/