Convolutional, Recurrent Neural Networks

Abhijeet Kamble
13 min read · Apr 22, 2019

Automated recommendations are everywhere: Netflix, Amazon, YouTube, and more. The other day I asked my Google Assistant to play some music for me, and I was surprised to hear it playing my favourite tracks from YouTube. Recommender systems learn about our unique interests and show the products or content they think we’ll like best. Here, we can go through the basics of how to build your own recommender systems. We will explore recommendation algorithms based on neighborhood-based collaborative filtering and more modern techniques, including matrix factorization and even deep learning with artificial neural networks. We will also take into account the real-world challenges of applying these algorithms at a large scale with real-world data, testing algorithms, and building your own neural networks using technologies like Amazon DSSTNE, AWS SageMaker, and TensorFlow.

We have all heard about AI, machine learning and deep learning. At this point we know roughly what they mean, but how would you go about applying some of this knowledge in reality? We will get a broad perspective on these artificial intelligence terms and on how you could conceptually design a neural network.

Now, let’s walk through the process of designing a neural network: what the inputs and outputs are, and what neurons and synapses do. Finally, we’ll apply this knowledge by putting pen to paper and taking on the challenge of designing our own. So, if you’re ready to explore how to design a neural network, get your pen and paper ready and let’s get started.

Now that we have a good understanding of what AI is, let’s explore what machine learning is specifically. As mentioned before, it is a branch of AI; more precisely, machine learning is the ability of a computer to learn without being explicitly programmed. What does that mean? In essence, computers or CPUs are pretty dumb if we don’t program them. Without AI, a computer usually works in a simple input-to-output paradigm. We input a command, either through programming or a UI, to open Word for example, and the computer responds by opening the program, which is the output.

Another example is your calculator: we input two times five, and the calculator outputs 10 as the response. This simple input-to-output process is how computers worked before AI started to permeate every piece of software. In a machine learning paradigm, another factor is added to this input-to-output equation: the learning part. We have an input, a learning model, and then the output. In this paradigm, the machine learns from the inputs it is given and produces better output over time.

Let me demonstrate with a simple Google search. Suppose you make a mistake and type into Google’s search box “what is the biggest dessert in the world?” when you meant desert. In a simple input-to-output paradigm, Google might show you the biggest cookie or cake in the world. But because Google is built with AI at its core, it can infer that someone asking for the biggest dessert in the world is probably looking for a desert, not food.

And this is where machine learning comes into play. Over years of gathering data and training its machine learning (ML) model, Google has learned that in most cases when someone searches for the biggest dessert in the world, they truly mean desert. That is the important point about machine learning: the model needs to be trained to be efficient and accurate, or in other words intelligent. A machine learning model needs to be fed hundreds, thousands, or more examples to work.

A great example is when DeepMind trained their machine learning system to play Go, a Chinese board game. They spent hundreds of hours feeding their machine all types of plays before it was able to predict which move to make next. So this is what machine learning is.

Deep learning is one approach to machine learning, and one of the most popular. It was inspired by the structure of the brain, connecting many artificial neurons to mimic the way the brain is composed, hence the commonly used term neural network. Depth is achieved by stacking layers of neurons, with each layer focusing on a specific part of the learning. For example, a set of neurons might focus on one aspect of handwriting recognition. In summary, a neural network is made of an input layer, one or more hidden layers, and an output layer.

Each layer includes multiple nodes, or neurons. The input layer takes in the inputs, the hidden layers make inferences from those inputs, and the output layer outputs the results. The synapses are the connections between all these neurons, pretty much like in the brain. Compare this with the typical way a computer works: we give it an input and an output is generated. But if you’d like to estimate how many pounds you could lose based on how many hours you sleep and how many hours you work out, you need a better way to handle this type of problem.

This is where deep learning, or a neural network, would be useful. The input nodes would be the hours slept and the hours spent in the gym. Inside our neural network, we would have many nodes, each applying weights to the values it receives and passing its result on to other nodes, until we reach the output layer, which would give us the number of pounds we could lose based on the inputs we fed to the network. As with any machine learning, we would need to feed our neural network with data so it can learn from previous examples.

Then, through the weights applied at each node, the neural network would calculate the impact of each input node and render an output: an estimate of how many pounds we could lose based on the inputs we placed into the neural network.
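To make the idea of inputs, weighted nodes, and an output concrete, here is a minimal sketch of a single pass through a tiny network with two inputs, two hidden neurons, and one output. The weights, the biases, the ReLU activation, and the function names are all made up for illustration; a real network would learn its weights from data, so the number it prints means nothing as a weight-loss estimate.

```python
def relu(x):
    """A common activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def forward(hours_slept, hours_gym, params):
    """One forward pass through a 2-input, 2-hidden-neuron, 1-output network."""
    inputs = [hours_slept, hours_gym]
    hidden = []
    for weights, bias in zip(params["hidden_weights"], params["hidden_biases"]):
        # Each hidden neuron computes a weighted sum of the inputs plus a bias.
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(relu(total))
    # The output neuron combines the hidden neurons the same way.
    return sum(w * h for w, h in zip(params["output_weights"], hidden)) + params["output_bias"]

# Hypothetical, hand-picked weights (a trained network would learn these).
params = {
    "hidden_weights": [[0.5, 1.0], [0.25, 0.5]],
    "hidden_biases": [0.0, -1.0],
    "output_weights": [0.5, 1.0],
    "output_bias": 0.0,
}

print(forward(8, 2, params))  # 8 hours slept, 2 hours in the gym -> 5.0
```

Training is the process of adjusting the numbers in `params` until the outputs match the examples we have collected.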

As with any application you build, it is as important, if not more so, to design or sketch your neural network or machine learning model on paper before writing a single line of code or feeding your network with data. Truly understanding your problem, the type of data you have access to, and the type of neural network you’ll use to get the proper results will go a long way and can avoid many accuracy problems down the road.

First things first: feeding your model with data as the first step is the same as starting to code an application without a good idea of the features you’d like it to have, or without designing and prototyping it first. Planning ahead prevents many hours of painful rework after you’ve started developing your application or neural network.

Brainstorming is important: how do we think about the problem we want to solve, which data sets will allow us to get good results, which features are likely to carry the most weight, is a feed-forward or a recurrent network the better fit, et cetera. You need to think about all these items before a single line of code is written, or at least have a rough idea, and this is what this post is about.

Before we start designing our neural network, let’s take an overview of the process. First and most importantly, we’ll start with a problem. What is it that we want this neural network to answer? Then we’ll continue by defining specifically what the output of this neural network should be. What is the most accurate data output to answer our problem? Next, we’ll explore data sets, which determine how successful you’ll be at getting your answers. What you feed into your neural network, or any AI-based computational model, will determine the success of your results.

Then we’ll go over the neurons and synapses and how they impact each other through their weights. We’ll explore how the hidden layers impact your final result. Finally, we’ll explore the types of neural networks, the RNN and the CNN, what other types of networks you can leverage, which ones make the most sense for your specific needs, et cetera.

In machine learning, before we even decide which data sets will be fed into our network, we first need to determine the problem or question we’d like to have answered. It needs to be clear and precise, so that everything else we put into our AI model is aligned with this question. In machine learning terms we call this the label, or the Y variable: what we are trying to predict. In a spam filtering program, we would define this as spam or not spam.

In an image recognition program, we would try to determine what the image shows. You get the idea. So, as you start designing your neural network, first make sure you have a precise idea of the Y variable, or label, or the question you’re trying to answer. In complex neural networks, you might have multiple labels. For example, in an image recognition program, we would try to determine whether the image contains a bridge, a road, a sign, etc. These are multiple labels.

For this entire post, I will continue with this example of an image recognition neural network, as it helps us visualize all the steps. In this case, the specific question we’ll ask ourselves is: what are we seeing in this image? As human beings, we know we are seeing a road, a sign, and a bridge, not a dog or a cat. So the Y variables, or labels, are a bridge, a sign, a road, a cat, and a dog, and the network needs to determine whether each of them is in this image.

This is an oversimplification of the labels or Y variables. A real image recognition program would need to be trained to recognize the proper elements in the image and none of the hundreds of thousands of other items that could be in the picture.
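One common way to hand multiple labels to a network is a vector with one slot per label, holding 1 if the item is in the image and 0 if not. The sketch below assumes the five labels from the example above; the `LABELS` list and the `encode_labels` helper are names I made up for illustration.

```python
LABELS = ["bridge", "sign", "road", "cat", "dog"]

def encode_labels(present):
    """Turn the set of things seen in an image into a 0/1 vector, one slot per label."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]

# The image from our example contains a road, a sign, and a bridge.
y = encode_labels({"road", "sign", "bridge"})
print(dict(zip(LABELS, y)))
# {'bridge': 1, 'sign': 1, 'road': 1, 'cat': 0, 'dog': 0}
```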

Now that we have the labels, or the output that we seek from our program, what about the input? What do we feed our neural network with? Data sets are crucial to the success of your network: in general, the more relevant data you feed it, the better the results. That’s what we call training our network. Like our brains, the network needs to be trained to recognize patterns before it can make the right assumptions. When we were born, we weren’t able to talk right away. We had to learn to associate specific sounds with specific meanings.

A neural network works in much the same way. We feed it data sets, and over time it learns to recognize specific patterns that mean specific things. Because computers think in numbers, that’s what we need to feed it with. So if you want to feed a computer something like an image, it needs to be converted into data. In our image example, the image would be converted into values, typically pixel values.

Figure: Pixels being converted to data

Each pixel becomes a value. For example, the numbers shown here represent the color of each pixel. As we feed the neural network thousands or millions of images, over time it will recognize that when a block of thousands of pixels has a specific pattern, for example lots of grey with some yellow or white close together, the probability of it being a road is high, say 80%, compared with a cat or a dog.
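As a rough illustration of that conversion, the sketch below flattens a made-up 2x3 image into the list of numbers a network would actually receive. The pixel colors, the RGB layout, and the scaling to the 0 to 1 range are assumptions chosen for the example, not a description of any particular image library.

```python
# A hypothetical 2x3 image: each pixel is an (R, G, B) tuple with values 0-255.
GREY, YELLOW, WHITE = (128, 128, 128), (255, 255, 0), (255, 255, 255)
image = [
    [GREY, YELLOW, GREY],
    [GREY, WHITE, GREY],
]

def to_input_vector(img):
    """Flatten rows of RGB pixels into one list of floats scaled to 0.0-1.0."""
    return [channel / 255 for row in img for pixel in row for channel in pixel]

x = to_input_vector(image)
print(len(x))   # 2 rows * 3 columns * 3 channels = 18
print(x[3:6])   # the yellow pixel: [1.0, 1.0, 0.0]
```

Each of those 18 numbers would feed one input neuron; a real photo simply has many more of them.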

Hence the importance of feeding the neural network thousands or millions of examples, so that it learns to predict better whether a specific image contains a bridge or a road rather than a cat or a dog.

The neurons and synapses are what will make your machine successful or not in giving accurate responses to your questions. Each neuron in a network has an impact on the result, and that impact is determined by the training of the network. The synapses are the connections between neurons. If you have ever looked into medical references, this is like a brain. In machine learning terms, each neuron in the input layer corresponds to a feature.

A feature is simply an item that impacts the evaluation of the data towards the end result. In machine learning formulas it is written as the X variable. Here’s an example to make things a bit clearer. Consider the email spam example. Its features would be the sender’s address, the time of day received, the words in the subject, the words in the email, and whether the email contains specific words. Each of these would be considered a feature and, over time, would gain a weight reflecting how much it impacts the email being spam or not.

For example, an email containing the word “sale” could have a bigger impact on an email being spam than the time the email was sent, and would therefore weigh more heavily in the probability the network assigns when it classifies emails as spam or not. Now let’s introduce the model. The model is the relationship between our labels, Y, and the features, X.

So the model is the entire formula that is being trained to become better at learning whether the X features indicate a Y label or not. If we look at our spam model, all the X features will have an impact on Y, whether the email is spam or not. The same goes for the image recognition program. The Y labels would be cat, dog, road, sign, and bridge, and the X would be the combination of pixel colors; the model estimates the probability that a given combination corresponds to each of the Y labels.
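Here is a minimal sketch of what "training the formula" can look like for the spam example. It is a single-neuron model (logistic regression) fitted with plain gradient descent, which is my simplification rather than a full neural network, and the two features and four training emails are invented for illustration.

```python
import math

def sigmoid(z):
    """Squash any number into the 0-1 range so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability that the email described by features x is spam."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, epochs=2000, lr=0.5):
    """Fit the weights with gradient descent on the log loss."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = predict(weights, bias, x) - y  # how far off the guess was
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Made-up data. Features X: [contains "sale", sent at night]; label Y: 1 = spam.
samples = [([1, 0], 1), ([1, 1], 1), ([0, 1], 0), ([0, 0], 0)]
weights, bias = train(samples)

print(predict(weights, bias, [1, 0]) > 0.5)  # "sale" email is flagged: True
print(predict(weights, bias, [0, 1]) > 0.5)  # night-time email without "sale": False
```

In this toy data only the word "sale" separates spam from non-spam, so training pushes its weight up while the time-of-day feature stays comparatively unimportant.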

Let us familiarise ourselves with another concept of neural networks: the direction in which information flows. Does the network pass information only forward, from the inputs to the output, or does it also loop its results back in? Let’s take the forward-only case first. Here the output depends only on the inputs the network is given at that moment. Such networks are called feed-forward networks, and the convolutional neural network, or CNN, is a well-known member of this family. (Forward and backward propagation, strictly speaking, are the two passes of training: the forward pass computes an output, and backpropagation sends the error back through the network to adjust the weights. Both kinds of network are normally trained this way.) CNNs are typically used in pattern recognition, like the image recognition example we’ve been using since the beginning.

Figure: A convolutional neural network (CNN)

In this type of scenario, data flows through the network in a forward direction as we train it, and over time the network gets better at recognizing patterns in the data we feed it and better at predicting whether an image is more likely a dog or a road.
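The pattern recognition in a CNN comes from small filters that slide over the image. Below is a minimal sketch of that sliding operation in plain Python. The 4x4 greyscale patch and the edge filter are hypothetical, and, as in most deep learning libraries, the operation shown is technically cross-correlation (the filter is not flipped).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 greyscale patch: dark on the left, bright on the right.
patch = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# A 2x2 filter that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
for row in convolve2d(patch, edge_kernel):
    print(row)  # each row is [0, 18, 0]: the filter fires only where the edge is
```

In a real CNN the filter values are not hand-written like this; they are weights that the network learns during training.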

Next, you have networks with a feedback loop, called recurrent neural networks, or RNNs. In this type of network, information flows in a loop: the result of one step is fed back in as part of the input to the next step.

Figure: A recurrent neural network (RNN)

At each step, the network uses not only the current input but also what it computed at the previous steps. This is typically a great approach for sequential data such as speech or handwriting, and is applied in tools like Google Assistant or Siri. It is especially useful when the order of the inputs matters, for example the words in a sentence. Needless to say, a recurrent network could be useful in many situations but might be a bit over the top when the inputs have no sequence to them. The key difference between a CNN and an RNN is that the recurrent network has an index of time to it.
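To show what that index of time means in practice, here is a minimal sketch of a recurrent cell reduced to single numbers. The weights `w_x`, `w_h`, and `b` are arbitrary values I picked for illustration; a trained RNN would learn them and would use vectors and matrices instead of scalars.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Feed a sequence through the cell one time step at a time."""
    h = 0.0  # the hidden state starts empty
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# The same values in a different order end in a different final state,
# because each step depends on what came before it.
print(run_rnn([1.0, 0.0, 0.0])[-1])
print(run_rnn([0.0, 0.0, 1.0])[-1])
```

A feed-forward network given the same three numbers as one fixed input has no such notion of "earlier" and "later" steps.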

Okay, now that we’ve explored some theory around neural networks, let’s brainstorm our own. I’ll sketch my own neural network here. If there is a specific problem your company is facing, or a type of information you’d like a neural network to deliver to you, then before coding you should brainstorm it too.

Let’s think of a quick scenario. A company has created a social network around its products on its website, where anyone can register and post product comments or reviews. Of late, this section has become a ground for bad behaviour, with people hurling profanities at each other. So in this first step of the challenge, I would determine what the problem is and what labels we should assign to our neural network.

So we have a situation where the social section of our product site is being filled with profanities or bad language. We want to use a neural network to clean this up. Our problem, or the Y output label, is fairly simple: does the text entered contain profane words or not? As we enter data into our network, it will learn to determine whether Y is true or not, in other words whether the text entered into the review passes our filters. Okay, so now that we have our Y label defined, next come the X variables, or the features of the network.

We also want to determine whether this problem is better served by a feed-forward or a recurrent network.

The final step is to determine the features, or X: what exactly would impact the Y label of the problem. For the features or X variables, I define the following as things that would impact my Y label (is this text using profanities or not): use of specific words, combinations of a few words, use of specific emojis, and so on. This list could be made more exhaustive. As we train our model to classify these words, emojis, and combinations of words as profane or not, it will start to filter out the bad comments or reviews.
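As a sketch of how those X features might be pulled out of a review before it reaches the network, the snippet below counts flagged words, flagged word pairs, and flagged emojis. The word lists are mild placeholders I invented, and `extract_features` is a hypothetical helper; a real filter would need carefully curated lists.

```python
import re

# Placeholder lists for illustration only.
FLAGGED_WORDS = {"darn", "heck"}
FLAGGED_PHRASES = [("shut", "up")]
FLAGGED_EMOJIS = {"\U0001F621"}  # pouting face

def extract_features(text):
    """Map a review to the X features: flagged words, word pairs, and emojis."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = list(zip(words, words[1:]))  # every two neighbouring words
    return {
        "flagged_word_count": sum(1 for w in words if w in FLAGGED_WORDS),
        "flagged_phrase_count": sum(1 for p in pairs if p in FLAGGED_PHRASES),
        "flagged_emoji_count": sum(1 for ch in text if ch in FLAGGED_EMOJIS),
    }

print(extract_features("Darn, just shut up about the battery!"))
# {'flagged_word_count': 1, 'flagged_phrase_count': 1, 'flagged_emoji_count': 0}
```

These counts would be the inputs to the network, and the Y label for each training review would be whether a human marked it as profane.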

So which type of network would I use in this case? I think either a CNN or an RNN could work here, but if you’d like the neural network to read the text word by word and take the order of the words into account, then the recurrent neural network might be a great choice.

The next thing we should look at is the Long Short-Term Memory (LSTM) network, a type of RNN.

This is an advanced version of an RNN in which a small memory is added to each cell.
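To give a feel for that added memory, here is a minimal sketch of one LSTM cell reduced to single numbers, following the standard gate equations (forget, input, candidate, output). The parameter values are untrained and chosen only so the example runs; a real LSTM would learn them and would work on vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p holds (input weight, recurrent weight, bias) per gate."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    forget = gate("forget", sigmoid)          # how much of the old memory to keep
    write = gate("input", sigmoid)            # how much new information to store
    candidate = gate("candidate", math.tanh)  # the new information itself
    output = gate("output", sigmoid)          # how much of the memory to expose
    c_t = forget * c_prev + write * candidate  # updated memory cell
    h_t = output * math.tanh(c_t)              # updated hidden state
    return h_t, c_t

# Hypothetical, untrained parameters.
params = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = 0.0, 0.0
for x_t in [1.0, 0.0, 0.0]:
    h, c = lstm_step(x_t, h, c, params)
    print(round(h, 3), round(c, 3))
```

The memory cell `c` still holds part of the first input after the later, empty inputs, which is the extra memory a plain RNN step does not have.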
