Why do we need activation functions? What's the difference between different activation functions?
motoole2
Good question. Activation functions are actually essential for the perceptron to do anything interesting.
Consider the case where you're dealing with an MLP (multi-layer perceptron) whose activation functions are linear. That is, the output of a particular layer can be written in matrix form: $y = W x$, where the vector $x$ of length $N$ is the input, the vector $y$ of length $M$ is the output of the layer's $M$ perceptrons, and $W$ is a matrix of weights. One row of $W$ corresponds to the weights used by one perceptron.
Now, if we were to stack $K$ such layers together, the output of your MLP would be $y = W_K W_{K-1} \cdots W_2 W_1 x$, where $W_k$ is the weight matrix of layer $k$. This product collapses into a single matrix, giving $y = \hat{W} x$ with $\hat{W} = W_K \cdots W_1$. In other words, irrespective of the number of layers used, you would end up with a linear function for your MLP!
The activation functions are therefore essential to prevent this from happening.
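Here's a minimal NumPy sketch of that argument, using hypothetical layer sizes and random weight matrices `W1`, `W2`, `W3` purely for illustration. It checks numerically that stacking linear layers is the same as applying one matrix $\hat{W}$, and that inserting a nonlinearity (ReLU here) breaks that collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer "MLP" with purely linear layers:
W1 = rng.standard_normal((8, 4))   # maps input of length 4 -> 8
W2 = rng.standard_normal((8, 8))   # maps length 8 -> 8
W3 = rng.standard_normal((2, 8))   # maps length 8 -> output of length 2
x = rng.standard_normal(4)

# Forward pass with linear (identity) activations: y = W3 W2 W1 x
y_stacked = W3 @ (W2 @ (W1 @ x))

# The same map collapses into a single matrix W_hat = W3 W2 W1
W_hat = W3 @ W2 @ W1
y_single = W_hat @ x

print(np.allclose(y_stacked, y_single))  # True: depth adds no expressive power

# With a nonlinearity (e.g., ReLU) applied between layers,
# the composition is no longer equivalent to one matrix multiply.
relu = lambda z: np.maximum(z, 0.0)
y_nonlinear = W3 @ relu(W2 @ relu(W1 @ x))
print(np.allclose(y_stacked, y_nonlinear))  # generally False
```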