The Perceptron

Published

November 20, 2024

Modified

January 6, 2025

Inspiration

What is a Perceptron?

The perceptron was invented by Frank Rosenblatt is considered one of the foundational pieces of neural network structures. The output is viewed as a decision from the neuron and is usually propagated as an input to other neurons inside the neural network.

Math Intuition

We can imagine this as a set of inputs that averaged in weighted fashion.

\[ y_k = sigmoid~(~\sum_{k=1}^n W_k*x_k + b~) \]

Since the inputs are added with linear weighting, this effectively acts like a linear transformation of the input data.
- A linear equation of this sort is the equation of an n-dimensional plane.
- If we imagine the input as representing the n-coordinates in a plane, then the multiplications scale/stretch/compress the plane, like a rubber sheet. (But do not fold it.)
- If there were only 2 inputs, we could mentally picture this.
More metaphorically, it seems like the neuron is consulting each of the inputs, asking for their opinion, and then making a decision by attaching different amounts of significance to each opinion.
The Structure should remind you of Linear Regression !

Why “Linear”?

Why are (almost) all operations linear operations in a NN?

We said that the weighted sums are a linear operation, but why is this so?
We wish to be able to set-up analytic functions for performance of the NN, and be able to differentiate them to be able to optimize them.
Non-linear blocks, such as threshold blocks/signum-function based slicers are not differentiable and we are unable to set up such analysis.
Note the title of this reference.

Why is there a Bias input?

We want the weighted sum of the inputs to mean something significant, before we accept it.
The bias is subtracted from the weighted sum of inputs, and the bias input could also (notionally) have a weight.
The bias is like a threshold which the weighted sum has to exceed; if it does, the neuron is said to fire.

What is the Activation Block?

We said earlier that the weighting and adding is a linear operation.
While this is great, simple linear translations of data are not capable of generating what we might call learning or generalization ability.
We need to have some non-linear block to allow the data to create nonlinear transformations of the data space, such as curving it, or folding it, or creating bumps, depressions, twists, and so on.

This nonlinear function needs to be chosen with care so that it is both differentiable and keeps the math analysis tractable. (More later)
Such a nonlinear mathematical function is implemented in the Activation Block.
See this example: red and blue areas, which we wish to separate and classify these with our DLNN, are not separable unless we fold and curve our 2D data space.
The separation is achieved using a linear operation, i.e. a LINE!!

Figure 1: From Colah Blog, used sadly without permission

For instance in Figure 1, no amount of stretching or compressing of the surface can separate the two sets ( blue and red ) using a line or plane, unless the surface can be warped into another dimension by folding.

What is the Sigmoid Function?

So how do we implement this nonlinear Activation Block?

One of the popular functions used in the Activation Block is a function based on the exponential function \(e^x\).
Why? Because this function retains is identity when differentiated! This is a very convenient property!

Remembering Logistic Regression

Recall your study of Logistic Regression. There, the Sigmoid function was used to model the odds of the (Qualitative) target variable against the (Quantitative) predictor.

But Why Sigmoid?

Because the Sigmoid function is differentiable. And linear in the mid ranges. Oh, and remember the Chain Rule?

\[ \begin{align} \frac{df(x)}{dx} &= \frac{d}{dx} * \frac{1}{1 + e^{-x}} \\\ &= -(1 + e^{-x})^{-2} * \frac{d}{dx}(1 + e^{-x})~~\text{(Using Chain Rule)}\\ &= -(1 + e^{-x})^{-2} * (-e^{-x})\\ &= \frac{(1 + e^{-x}) -1}{(1 + e^{-x})^{2}}\\ &= \frac{1}{1 + e^{-x}} * \Bigg({\frac{1 + e^{-x}}{1 + e^{-x}}} - \frac{1}{1 + e^{-x}}\Bigg)\\ &= f(x) * (1 - f(x))\\ \end{align} \]

So with all that vocabulary, we might want to watch this longish video by the great Dan Shiffman:

Let us try a simple single layer NN in R. We will use the R package neuralnet.

Show the Code

# Load the package
# library(neuralnet)

# Use iris
# Create Training and Testing Datasets
df_train <- iris %>% slice_sample(n = 100)
df_test <- iris %>% anti_join(df_train)
head(iris)

Show the Code

# Create a simle Neural Net
nn <- neuralnet(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = df_train,
  hidden = 0,
  # act.fct = "logistic", # Sigmoid
  linear.output = TRUE
) # TRUE to ignore activation function

# str(nn)

# Plot
plot(nn)

# Predictions
# Predict <- compute(nn, df_test)
# Predict
# cat("Predicted values:\n")
# print(Predict$net.result)
#
# probability <- Predict$net.result
# pred <- ifelse(probability > 0.5, 1, 0)
# cat("Result in binary values:\n")
# pred %>% as_tibble()

To Be Written Up.

References

The Neural Network Zoo - The Asimov Institute. http://www.asimovinstitute.org/neural-network-zoo/
It’s just a linear model: neural networks edition. https://lucy.shinyapps.io/neural-net-linear/
Neural Network Playground. https://playground.tensorflow.org/
Rohit Patel (20 Oct 2024). Understanding LLMs from Scratch Using Middle School Math: A self-contained, full explanation to inner workings of an LLM. https://towardsdatascience.com/understanding-llms-from-scratch-using-middle-school-math-e602d27ec876
Machine Learning Tokyo: Interactive Tools for ML/DL, and Math. https://github.com/Machine-Learning-Tokyo/Interactive_Tool
Anyone Can Learn AI Using This Blog. https://colab.research.google.com/drive/1g5fj7W6QMER4-03jtou7k1t7zMVE9TVt#scrollTo=V8Vq_6Q3zivl
Neural Networks Visual with vcubingx
- Part 1. https://youtu.be/UOvPeC8WOt8
- Part 2. https://www.youtube.com/watch?v=-at7SLoVK_I
Practical Deep Learning for Coders: An Online Free Course.https://course.fast.ai

Text Books

Michael Nielsen. Neural Networks and Deep Learning, a free online book. http://neuralnetworksanddeeplearning.com/index.html
Simone Scardapane. (2024) Alice’s Adventures in a differentiable Wonderland.https://www.sscardapane.it/alice-book/

Using R for DL

David Selby (9 January 2018). Tea and Stats Blog. Building a neural network from scratch in R. https://selbydavid.com/2018/01/09/neural-network/
torch for R: An open source machine learning framework based on PyTorch. https://torch.mlverse.org
Akshaj Verma. (2020-07-24). Building A Neural Net from Scratch Using R - Part 1 and Part 2. https://rviews.rstudio.com/2020/07/20/shallow-neural-net-from-scratch-using-r-part-1/ and https://rviews.rstudio.com/2020/07/24/building-a-neural-net-from-scratch-using-r-part-2/

Maths

Parr and Howard (2018). The Matrix Calculus You Need for Deep Learning.https://arxiv.org/abs/1802.01528

R Package Citations

Package	Version	Citation
neuralnet	1.44.2	@neuralnet