The Multilayer Perceptron
What is a Multilayer Perceptron?
This was our bare-bones Perceptron, or neuron as we will refer to it henceforth:
We add (one or more) hidden layers to set up a Multilayer Perceptron (MLP):
- Here, \(i_1\), \(i_2\), and \(i_3\) are input neurons: they are simply inputs and are drawn as circles in the literature.
- \(h_1\), \(h_2\), and \(h_3\) are neurons in the so-called hidden layer; hidden because they are not inputs!
- The neurons \(o_1\), \(o_2\), and \(o_3\) are output neurons.
- The signals/information flow from left to right in the diagram. We have shown every neuron connected to every neuron in the next layer downstream; that is, the layers are fully connected.
How do we express the operation of the MLP mathematically, and concisely? Let us set up notation for the MLP weights.
- \(l\) : layer index;
- \(j\), \(k\) : neuron indices in two adjacent layers;
- \(W^l_{jk}\) (i.e. \(W^{layer}_{{source}~{destn}}\)) : weight from the \(j\)th neuron in the \((l-1)\)th layer to the \(k\)th neuron in the \(l\)th layer;
- \(b^l_k\) : bias of the \(k\)th neuron in the \(l\)th layer;
- \(a^l_k\) : activation (output) of the \(k\)th neuron in the \(l\)th layer.
We can write the outputs of layer 2 as:
\[ \begin{align} (k = 1): ~ a^2_1 = sigmoid~(~\color{red}{W^2_{11}*a^1_1} + \color{skyblue}{W^2_{21}*a^1_2} + \color{forestgreen}{W^2_{31}*a^1_3} ~ + b^2_1)\\ (k = 2): ~ a^2_2 = sigmoid~(~W^2_{12}*a^1_1 + W^2_{22}*a^1_2 + W^2_{32}*a^1_3~ + b^2_2 )\\ (k = 3): ~ a^2_3 = sigmoid~(~W^2_{13}*a^1_1 + W^2_{23}*a^1_2 + W^2_{33}*a^1_3~ + b^2_3)\\ \end{align} \]
In (dreaded?) matrix notation:
\[ \begin{bmatrix} a^2_1\\ a^2_2\\ a^2_3\\ \end{bmatrix} = sigmoid~\Bigg( \begin{bmatrix} \color{red}{W^2_{11}} & \color{skyblue}{W^2_{21}} & \color{forestgreen}{W^2_{31}}\\ W^2_{12} & W^2_{22} & W^2_{32}\\ W^2_{13} & W^2_{23} & W^2_{33}\\ \end{bmatrix} * \begin{bmatrix} \color{red}{a^1_1}\\ \color{skyblue}{a^1_2}\\ \color{forestgreen}{a^1_3}\\ \end{bmatrix} + \begin{bmatrix} b^2_1\\ b^2_2\\ b^2_3\\ \end{bmatrix} \Bigg) \]
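To make the matrix form concrete, here is a quick numeric check in torch. The weight, bias, and activation values below are made up purely for illustration:

```python
import torch

# Layer-1 activations, layer-2 weights and biases; values are arbitrary.
# Row k of W2 holds the weights feeding neuron k of layer 2.
a1 = torch.tensor([0.9, 0.1, 0.8])
W2 = torch.tensor([[0.9, 0.3, 0.4],
                   [0.2, 0.8, 0.2],
                   [0.1, 0.5, 0.6]])
b2 = torch.tensor([0.1, 0.1, 0.1])

a2 = torch.sigmoid(W2 @ a1 + b2)  # A^2 = sigmoid(W^2 A^1 + B^2)
print(a2)  # roughly tensor([0.779, 0.627, 0.673]) -- one activation per layer-2 neuron
```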
In compact notation we write, in general:
\[ A^l = \sigma\Bigg(W^lA^{l-1} + B^l\Bigg) \]
Element-wise, for each neuron \(k\) in layer \(l\):
\[ a^l_k=\sigma\Big(\sum_j W^l_{jk} * a^{l-1}_j+b^l_k\Big) \tag{1}\]
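Equation (1) turns into code almost line for line. The sketch below (an illustration of mine, not from any particular library) implements the compact form \(A^l = \sigma(W^l A^{l-1} + B^l)\) as a loop over layers, assuming each weight matrix is arranged so that row \(k\) holds the weights feeding neuron \(k\):

```python
import torch

def forward(a, weights, biases):
    # One matrix-vector product and one sigmoid per layer:
    # A^l = sigmoid(W^l A^{l-1} + B^l)
    for W, b in zip(weights, biases):
        a = torch.sigmoid(W @ a + b)
    return a

# The 3-3-3 network from the diagram needs two weight matrices
# (inputs -> hidden, hidden -> outputs); random values for illustration.
weights = [torch.randn(3, 3), torch.randn(3, 3)]
biases = [torch.randn(3), torch.randn(3)]
a_out = forward(torch.tensor([0.9, 0.1, 0.8]), weights, biases)
print(a_out)  # activations of the three output neurons
```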
MLPs in Code
Using torch
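A minimal sketch using torch.nn, assuming the 3-3-3 architecture drawn above. `nn.Linear` computes the affine map \(W A^{l-1} + B\) and `nn.Sigmoid` applies \(\sigma\) element-wise, so each `Linear`/`Sigmoid` pair is one application of Equation (1):

```python
import torch
import torch.nn as nn

# The 3-3-3 MLP from the diagram: 3 inputs -> 3 hidden neurons -> 3 outputs.
mlp = nn.Sequential(
    nn.Linear(3, 3),   # input layer -> hidden layer
    nn.Sigmoid(),
    nn.Linear(3, 3),   # hidden layer -> output layer
    nn.Sigmoid(),
)

x = torch.tensor([0.9, 0.1, 0.8])  # one input vector (i1, i2, i3)
y = mlp(x)                         # activations of the output neurons (o1, o2, o3)
print(y.shape)                     # torch.Size([3])
```

In practice the hidden-layer sigmoid is often swapped for other activations such as ReLU, but the layered structure stays exactly the same.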