Wednesday, August 3, 2016

An Introduction to Convolutional Neural Networks


In one of my previous posts I showed how to build a neural network from scratch. Convolutional Neural Networks (CNNs) are a little more complex than that basic batch gradient descent model, but the core ideas are largely the same. I will not be building a CNN from scratch here (I would like to in the near future); instead, I want to go over the basics of how a CNN works, how it differs from a fully connected network, and why it performs so well on computer vision tasks. In a follow-up post I will show how to code up a CNN that scores well over 99% accuracy on MNIST, and we can try it out on some more interesting data sets as well.

In order to understand why CNNs are so powerful, let's first look at some of the problems that undermine the fully connected network. In a fully connected network, every neuron in layer $l$ has a connection to every neuron in layer $l+1$. To create a flexible network that adapts well to new information we want to include many hidden layers: a deeper network means more neurons, which in turn give us the ability to learn more precise features. The problem arises when we realize that each successive layer adds more and more free parameters. A layer of $n_l$ neurons feeding a layer of $n_{l+1}$ neurons contributes $n_l \cdot n_{l+1}$ weights plus $n_{l+1}$ biases, and the network must learn every one of them, so deeper and deeper networks quickly become computationally expensive: learning slows as all the extra free parameters must be updated on each iteration.

A second problem is that backpropagation with certain activation functions has a tendency to update weights near the output much faster than weights near the inputs. This is what is known as the vanishing gradient problem. Both of these problems contribute to deep, fully connected nets learning rather slowly and struggling to converge to a good solution. The CNN provides a solution to both of these issues, and the short sketches below make each problem concrete.
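To see how quickly the free parameters pile up, here is a minimal sketch in Python. The layer sizes are hypothetical, chosen only to show the scale: a 28x28 grayscale image flattened to 784 inputs, two hidden layers of 1,000 neurons each, and a 10-class output.

```python
# Count the free parameters in a fully connected network.
# Layer sizes are hypothetical: a 28x28 image flattened to
# 784 inputs, two hidden layers, and 10 output classes.
layer_sizes = [784, 1000, 1000, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    # every neuron in layer l connects to every neuron in layer l+1,
    # plus one bias per neuron in layer l+1
    total_params += n_in * n_out + n_out

print(total_params)  # 1796010 -- nearly 1.8 million weights and biases
```

Even this modest three-weight-layer network has almost 1.8 million parameters to update on every iteration, and every extra hidden layer of that width adds another million.

The vanishing gradient problem can be illustrated just as briefly, assuming sigmoid activations (the usual culprit). Backpropagation multiplies one activation-derivative factor per layer on the way back to the inputs, and the sigmoid's derivative never exceeds 0.25, so with weights of moderate size the gradient reaching the early layers shrinks geometrically with depth:

```python
import numpy as np

def sigmoid_prime(z):
    """Derivative of the sigmoid activation 1 / (1 + e^-z)."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Even at the sigmoid's steepest point (z = 0, where the derivative
# peaks at 0.25), with weights near 1 the gradient that reaches a
# layer `depth` steps from the output shrinks roughly like 0.25**depth.
for depth in (1, 5, 10):
    print(depth, sigmoid_prime(0.0) ** depth)
# 1 0.25
# 5 0.0009765625
# 10 9.5367431640625e-07
```

This is only a back-of-the-envelope picture (real gradients also depend on the weights and the inputs at each layer), but it shows why the layers nearest the input of a deep sigmoid network receive updates millions of times smaller than the layers nearest the output.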