Thursday, June 18, 2015

Why there is a big unsupervised-learning-shaped hole in the universe

The problem I had when I first started reading about unsupervised learning was that I didn't understand why it needed to exist. Supervised learning makes sense: you give a neural net an input and an output and tell it "find a way to get from one to the other". Unsupervised learning doesn't have that. You are just giving it some data and saying "do... something". Even the diagrams were confusing. I see input nodes, I see hidden nodes, but what's the output?

The best way to get across the need for unsupervised learning is to talk about the end goal. Across the first year of a baby's life it learns a huge number of things: how to focus its eyes, how to coordinate its limbs, how to distinguish its parents from other adults, that objects are permanent things. Until it learns to talk it is getting almost no feedback about any of this, yet it still learns. Getting good training data is hard, so the idea is: wouldn't it be great if we could set up a machine and it would just go off and learn on its own?

Unfortunately we are still a long way from that, and the technique shown here seems trivial compared to that goal. But the goal is interesting enough that it is worth pursuing. The answer to the question "what am I asking an unsupervised network to do?" is "learn the data". The output will be a representation of the data that is simpler than the original. If the input is the 10,000 pixels of an image, the output can be any smaller set of numbers. What a lot of the simpler unsupervised nets do is reduce the input to a single number that identifies a group of similar inputs. These groups are called clusters.

An example competitive learning neural net

A competitive learning neural net attempts to group its inputs into clusters. The code for it is really very simple. Here is all that is needed in the constructor (if you don't like Python it is also available in C#, Java and F#):

from random import uniform

class UnsupervisedNN(object):
   def __init__(self, size_of_input_arrays, number_of_clusters_to_group_data_into):
     #we have 1 hidden node for each cluster
     self.__connections = [[uniform(-0.5, 0.5) for j in range(number_of_clusters_to_group_data_into)] for i in range(size_of_input_arrays)]  
     self.__hidden_nodes = [0.0]*number_of_clusters_to_group_data_into  

When we give it an input, it activates each hidden node with the sum of every input value multiplied by the connection between that input and the hidden node. It makes more sense in code, like so:

def feed_forward(self, inputs):  
     #We expect inputs to be an array of floats of length size_of_input_arrays.
     for hidden_node_index in range(len(self.__hidden_nodes)):  
       activation = 0.0
       #each hidden node will be activated from the inputs.  
       for input_index in range(len(self.__connections)):  
         activation += inputs[input_index]*self.__connections[input_index][hidden_node_index]  

       self.__hidden_nodes[hidden_node_index] = activation

     #now we have activated all the hidden nodes we check which has the highest activation
     #this node is the winner and so the cluster we think this input belongs to
     return self.__hidden_nodes.index(max(self.__hidden_nodes))  
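To see the winner selection on its own, here is a minimal standalone sketch. The function name pick_winner and the hand-picked weights are made up for illustration and are not part of the class above; the calculation itself is the same weighted sum followed by picking the highest activation.

```python
def pick_winner(inputs, connections):
    """Return (winning node index, list of activations).

    connections[i][j] is the weight from input i to hidden node j.
    """
    number_of_hidden_nodes = len(connections[0])
    activations = [
        sum(inputs[i] * connections[i][j] for i in range(len(inputs)))
        for j in range(number_of_hidden_nodes)
    ]
    return activations.index(max(activations)), activations


# Hand-picked (hypothetical) weights: node 0 favours red and green, node 1 favours blue.
connections = [
    [0.5, -0.5],  # red
    [0.5, -0.5],  # green
    [-0.5, 0.5],  # blue
]
yellow = [1.0, 1.0, 0.0]
winner, activations = pick_winner(yellow, connections)
print(winner, activations)
```

With these weights the red and green inputs both push node 0 up and node 1 down, so node 0 wins for yellow.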

So as it stands we have a method for randomly assigning data to clusters. To make it something useful we need to improve the connections. There are many ways this can be done. In competitive learning, after you have selected a winner you make its connections more like that input. A good analogy: imagine we have 3 inputs, one for each of the colors red, green and blue. If we get the color yellow, the red and green inputs are on and the blue input is off. So after a winning node is selected, its connections to red and green are increased, so that future red and green items are more likely to be considered part of the same cluster. But because there is no blue, the connection to blue is weakened:

def train(self, inputs):
     winning_cluster_index = self.feed_forward(inputs)
     learning_rate = 0.1
     for input_index in range(len(self.__connections)):
       weight = self.__connections[input_index][winning_cluster_index]
       #move the winner's weight a fraction of the way towards the input value
       self.__connections[input_index][winning_cluster_index] = weight + learning_rate*(inputs[input_index]-weight)
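The update rule is easier to follow with numbers. Below is a small sketch that applies the same formula to the yellow example. The helper name update_winner and the starting weights of 0.2 are assumptions picked for illustration only.

```python
def update_winner(weights, inputs, learning_rate=0.1):
    """Move each of the winner's weights a fraction of the way towards the input."""
    return [w + learning_rate * (x - w) for w, x in zip(weights, inputs)]


weights = [0.2, 0.2, 0.2]  # made-up starting weights for red, green, blue
yellow = [1.0, 1.0, 0.0]   # red and green on, blue off
for step in range(3):
    weights = update_winner(weights, yellow)
    print(step, [round(w, 4) for w in weights])
```

Each step closes 10% of the remaining gap between a weight and its input, so the red and green weights climb towards 1.0 while the blue weight shrinks towards 0.0.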

A problem we can have here, though, is that a cluster can be initialized with terrible weights, such that nothing is ever assigned to it. To fix this, a penalty is added to each hidden node. Whenever a hidden node is selected, its penalty is increased, so that over time, if one node keeps winning, the other nodes will eventually start getting selected. This penalty is also known as a conscience or bias.

To add a bias we just need to initialize an array in the constructor with one entry for each cluster:
     self.__conscience = [0.0]*number_of_clusters_to_group_data_into

Change our feed forward to:
def feed_forward(self, inputs):
     for hidden_node_index in range(len(self.__hidden_nodes)):
       #start from this node's conscience instead of 0.0
       activation = self.__conscience[hidden_node_index]
       for input_index in range(len(self.__connections)):
         activation += inputs[input_index]*self.__connections[input_index][hidden_node_index]

       self.__hidden_nodes[hidden_node_index] = activation

     return self.__hidden_nodes.index(max(self.__hidden_nodes))

Then in training we just make a small subtraction every time a cluster wins (conscience_learning_rate is a small positive number that also needs to be set in the constructor):
     self.__conscience[winning_cluster_index] -= self.conscience_learning_rate
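Putting the pieces together, a self-contained version might look something like the sketch below. This is not the code from the repository: the class name CompetitiveNet, the seed parameter, the public attribute names and the default conscience_learning_rate of 0.01 are all assumptions made so the example can run on its own.

```python
from random import Random


class CompetitiveNet(object):
    def __init__(self, size_of_input_arrays, number_of_clusters,
                 learning_rate=0.1, conscience_learning_rate=0.01, seed=None):
        rng = Random(seed)  # seed is only here to make runs repeatable
        # connections[i][j] is the weight from input i to cluster node j
        self.connections = [[rng.uniform(-0.5, 0.5) for _ in range(number_of_clusters)]
                            for _ in range(size_of_input_arrays)]
        self.conscience = [0.0] * number_of_clusters
        self.learning_rate = learning_rate
        self.conscience_learning_rate = conscience_learning_rate

    def feed_forward(self, inputs):
        activations = []
        for node in range(len(self.conscience)):
            # start from the conscience so frequent winners are penalized
            activation = self.conscience[node]
            for input_index, value in enumerate(inputs):
                activation += value * self.connections[input_index][node]
            activations.append(activation)
        return activations.index(max(activations))

    def train(self, inputs):
        winner = self.feed_forward(inputs)
        for input_index, value in enumerate(inputs):
            weight = self.connections[input_index][winner]
            # move the winner's weight a fraction of the way towards the input
            self.connections[input_index][winner] = weight + self.learning_rate * (value - weight)
        # every win makes the node slightly less likely to win next time
        self.conscience[winner] -= self.conscience_learning_rate
        return winner


net = CompetitiveNet(3, 2, seed=1)
colors = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # yellow, blue
for _ in range(200):
    for color in colors:
        net.train(color)
print([net.feed_forward(color) for color in colors])
```

The printed list is the cluster index the net assigns to each color after training. Which index ends up meaning which color depends on the random starting weights.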

Competitive learning nets are nice but are still a long way from the goal of full unsupervised learning. In a future post I'm going to do a Restricted Boltzmann Machine, which is used in deep learning for the shallow layers to give us a simpler representation of an image to work with.

Full code is available on GitHub in Python, C#, Java and F#.