The Multilayer Perceptron
What is a Multilayer Perceptron?
This was our bare-bones Perceptron, or neuron as we will refer to it henceforth:
We add (one or more) hidden layers to set up a Multilayer Perceptron (MLP):
- Here, \(i_1\), \(i_2\), and \(i_3\) are input neurons: they are simply inputs and are drawn as circles in the literature.
- \(h_1\), \(h_2\), and \(h_3\) are neurons in the so-called hidden layer; hidden because they are not inputs!
- The neurons \(o_1\), \(o_2\), and \(o_3\) are output neurons.
- The signals/information flow from left to right in the diagram. We have shown every neuron connected to every neuron in the next layer downstream; that is, the layers are fully connected.
How do we express the operation of the MLP mathematically, and concisely? Let us set up notation for the MLP weights.
- \(l\) : layer index;
- \(j\), \(k\) : neuron indices in two adjacent layers;
- \(W^l_{jk}\) (i.e. \(W^{layer}_{{source}~{destn}}\)) : weight from the \(j\)th neuron in the \((l-1)\)th layer to the \(k\)th neuron in the \(l\)th layer;
- \(b^l_k\) : bias of the \(k\)th neuron in the \(l\)th layer;
- \(a^l_k\) : activation (output) of the \(k\)th neuron in the \(l\)th layer.
We can write the outputs of layer 2 as:
\[ \begin{align} (k = 1): ~ a^2_1 = sigmoid~(~\color{red}{W^2_{11}*a^1_1} + \color{skyblue}{W^2_{21}*a^1_2} + \color{forestgreen}{W^2_{31}*a^1_3} ~ + b^2_1)\\ (k = 2): ~ a^2_2 = sigmoid~(~W^2_{12}*a^1_1 + W^2_{22}*a^1_2 + W^2_{32}*a^1_3~ + b^2_2 )\\ (k = 3): ~ a^2_3 = sigmoid~(~W^2_{13}*a^1_1 + W^2_{23}*a^1_2 + W^2_{33}*a^1_3~ + b^2_3)\\ \end{align} \]
In (dreaded?) matrix notation:
\[ \begin{bmatrix} a^2_1\\ a^2_2\\ a^2_3\\ \end{bmatrix} = sigmoid~\Bigg( \begin{bmatrix} \color{red}{W^2_{11}} & \color{skyblue}{W^2_{21}} & \color{forestgreen}{W^2_{31}}\\ W^2_{12} & W^2_{22} & W^2_{32}\\ W^2_{13} & W^2_{23} & W^2_{33}\\ \end{bmatrix} * \begin{bmatrix} \color{red}{a^1_1}\\ \color{skyblue}{a^1_2}\\ \color{forestgreen}{a^1_3}\\ \end{bmatrix} + \begin{bmatrix} b^2_1\\ b^2_2\\ b^2_3\\ \end{bmatrix} \Bigg) \]
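To make the matrix form concrete, here is a quick numeric check in torch. The weight, bias, and activation values below are made up purely for illustration:

```python
import torch

# Layer-1 activations, layer-2 weights and biases; values are arbitrary.
# Row k of W2 holds the weights feeding neuron k of layer 2.
a1 = torch.tensor([0.9, 0.1, 0.8])
W2 = torch.tensor([[0.9, 0.3, 0.4],
                   [0.2, 0.8, 0.2],
                   [0.1, 0.5, 0.6]])
b2 = torch.tensor([0.1, 0.1, 0.1])

a2 = torch.sigmoid(W2 @ a1 + b2)  # A^2 = sigmoid(W^2 A^1 + B^2)
print(a2)  # roughly tensor([0.779, 0.627, 0.673]) -- one activation per layer-2 neuron
```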
In compact notation we write, in general:
\[ A^l = \sigma\Bigg(W^lA^{l-1} + B^l\Bigg) \]
Element-wise, for each neuron \(k\) in layer \(l\):
\[ a^l_k=\sigma\Big(\sum_j W^l_{jk} * a^{l-1}_j+b^l_k\Big) \tag{1}\]
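Equation (1) turns into code almost line for line. The sketch below (an illustration of mine, not from any particular library) implements the compact form \(A^l = \sigma(W^l A^{l-1} + B^l)\) as a loop over layers, assuming each weight matrix is arranged so that row \(k\) holds the weights feeding neuron \(k\):

```python
import torch

def forward(a, weights, biases):
    # One matrix-vector product and one sigmoid per layer:
    # A^l = sigmoid(W^l A^{l-1} + B^l)
    for W, b in zip(weights, biases):
        a = torch.sigmoid(W @ a + b)
    return a

# The 3-3-3 network from the diagram needs two weight matrices
# (inputs -> hidden, hidden -> outputs); random values for illustration.
weights = [torch.randn(3, 3), torch.randn(3, 3)]
biases = [torch.randn(3), torch.randn(3)]
a_out = forward(torch.tensor([0.9, 0.1, 0.8]), weights, biases)
print(a_out)  # activations of the three output neurons
```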
MLPs in Code
Using torch
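A minimal sketch using torch.nn, assuming the 3-3-3 architecture drawn above. `nn.Linear` computes the affine map \(W A^{l-1} + B\) and `nn.Sigmoid` applies \(\sigma\) element-wise, so each `Linear`/`Sigmoid` pair is one application of Equation (1):

```python
import torch
import torch.nn as nn

# The 3-3-3 MLP from the diagram: 3 inputs -> 3 hidden neurons -> 3 outputs.
mlp = nn.Sequential(
    nn.Linear(3, 3),   # input layer -> hidden layer
    nn.Sigmoid(),
    nn.Linear(3, 3),   # hidden layer -> output layer
    nn.Sigmoid(),
)

x = torch.tensor([0.9, 0.1, 0.8])  # one input vector (i1, i2, i3)
y = mlp(x)                         # activations of the output neurons (o1, o2, o3)
print(y.shape)                     # torch.Size([3])
```

In practice the hidden-layer sigmoid is often swapped for other activations such as ReLU, but the layered structure stays exactly the same.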