Wednesday, August 3, 2016

An Introduction to Convolutional Neural Networks


In one of my previous posts I showed how to build a neural network from scratch. Convolutional Neural Networks (CNNs) are a little more complex than the basic batch gradient descent model, but their characteristics are largely the same. I will not be building a CNN from scratch here (though I would like to in the near future); instead, I want to go over the basics of how a CNN works, how it differs from a fully connected network, and why it performs so well at computer vision tasks. In a follow-up post I will show how to code up a CNN which will get us a score of well over 99% accuracy on the MNIST training set. We can try it out on some more interesting data sets as well.

In order to understand why CNNs are so powerful, let's first look at some of the problems that plague the fully connected network. In a fully connected neural network, every neuron in layer $l$ has a connection to every neuron in layer $l+1$. To create a flexible network that adapts well to new information, we want to include many hidden layers. A deeper network means more neurons, which in turn gives us the ability to learn more precise features. The problem arises, however, when we realize that each successive layer adds more and more free parameters. The network must learn each of these free parameters, so deeper networks quickly become computationally expensive: as the network grows deeper, learning slows because all of the extra free parameters must be updated on each iteration. Another problem is that backpropagation with certain activation functions tends to update weights closer to the output much faster than weights nearer the inputs. This is what is known as the vanishing gradient problem. Both of these problems contribute to deep, fully connected nets learning rather slowly and failing to converge to a good minimum. The CNN provides a solution to both of these issues.
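To make the free parameter explosion concrete, here is a quick back-of-the-envelope comparison (a minimal sketch; the layer sizes are hypothetical, chosen only for illustration). A fully connected layer needs a weight for every input-output pair, while a convolutional layer shares one small filter across the entire image:

```python
# Hypothetical sizes for illustration only.
input_pixels = 28 * 28      # e.g. a flattened MNIST image: 784 inputs
hidden_neurons = 100

# Fully connected: every input connects to every hidden neuron.
fc_weights = input_pixels * hidden_neurons   # 78,400 weights
fc_biases = hidden_neurons                   # 100 biases
print("fully connected parameters:", fc_weights + fc_biases)  # 78,500

# Convolutional: 20 feature maps, each defined by one shared 5x5 filter.
num_filters = 20
filter_size = 5 * 5
conv_weights = num_filters * filter_size     # 500 weights
conv_biases = num_filters                    # 20 biases
print("convolutional parameters:", conv_weights + conv_biases)  # 520
```

Weight sharing is exactly how the convolutional layer sidesteps the parameter explosion: the same small filter is reused at every position in the image rather than learning a separate weight for each pixel.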

Sunday, July 24, 2016

Cost Functions and the Backpropagation Derivation



There are dozens of ways to construct neural networks, and it can be very difficult to decipher which are best suited to which problems and why. However, while there are stark differences between neural network architectures, there are elements common to all of them. We will call these the vital elements. They are the DNA of the neural network: each network is different, but each adheres to the same general principles. These vital elements are the cost function and the backpropagation calculation (the activation function is another vital element; we will touch on it here, but I will go into more depth about activation functions in future posts). In this post I want to provide the general derivation of the backpropagation algorithm. The principles from this derivation are used to compute the gradients of any neural network. I will also provide a few of the more popular cost functions and their gradients at the conclusion of this post. We will not go into great detail about the benefits of each, but my hope is that by understanding the cost function and backpropagation in general, we can better understand their applications in specific cases.
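As a preview of where the post is headed, here is a minimal sketch in NumPy of two popular cost functions, the quadratic cost and the cross-entropy cost, along with their gradients with respect to the network output $a$ (the function names are my own, chosen for illustration, not from any library):

```python
import numpy as np

# a: network output activations, y: target values (both NumPy arrays).

def quadratic_cost(a, y):
    # C = (1/2) * sum((a - y)^2)
    return 0.5 * np.sum((a - y) ** 2)

def quadratic_cost_gradient(a, y):
    # dC/da = a - y
    return a - y

def cross_entropy_cost(a, y):
    # C = -sum(y*ln(a) + (1 - y)*ln(1 - a)), for a in (0, 1)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

def cross_entropy_cost_gradient(a, y):
    # dC/da = (a - y) / (a * (1 - a))
    return (a - y) / (a * (1 - a))
```

Backpropagation then takes one of these gradients at the output layer and propagates it backward, layer by layer, using the chain rule.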

Thursday, July 14, 2016

Building a Neural Network from Scratch

Code to follow along with

Neural networks are one of the most powerful learning techniques, and with Python libraries such as TensorFlow, Theano, and Torch their implementation is only getting easier. Machine learning and neural networks are buzzwords thrown around a lot in business and computer science circles. People seem to think these ideas will unlock the big breakthrough in AI we have been waiting 60 years for. They may be right, but before we crown machine learning as the prized jewel of 21st century ingenuity, let's first figure out how it works. This will hopefully give us some intuition about what is and is not possible with machine learning. To a majority of the public, neural networks are a black box. We want to get past that and come to understand some of the practical uses of neural networks, as well as some of the potential drawbacks. I believe that the best way to learn about something is to build it yourself, test it, break it, and test it again. This really challenges our understanding of the algorithm. In this post my goal is to show you how I built a neural network from scratch.
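Before diving into the full implementation, here is the smallest sketch of the core idea (a toy forward pass in NumPy; the layer sizes and random weights are made up for illustration and this is not the code from the post itself): a neural network is, at bottom, repeated matrix multiplication followed by a nonlinearity.

```python
import numpy as np

np.random.seed(0)

def sigmoid(z):
    # Squashes each value into (0, 1); the classic activation function.
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 4 hidden neurons -> 2 outputs.
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(2, 4), np.random.randn(2)

def forward(x):
    # Each layer computes a weighted sum of its inputs plus a bias,
    # then passes the result through the activation function.
    hidden = sigmoid(W1.dot(x) + b1)
    return sigmoid(W2.dot(hidden) + b2)

print(forward(np.array([0.5, -1.2, 3.0])))
```

Training is the process of adjusting the weights and biases so that outputs like these match the targets, which is where gradient descent and backpropagation come in.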