Project 4: Supervised Learning Networks
This project will focus on supervised learning networks of multiple forms.
You will build a supervised learning network trained with backpropagation and then extend it to a self-supervised variant.
We will write our code as an ODE formulation, building on the framework from previous projects.
Developing your Backpropagation Network
First, you will need to develop a backpropagation algorithm.
Develop a two-layer backpropagation code that can have an arbitrary number of
- Inputs
- Hidden-Layer Neurons
- Outputs
Having these items as parameters will be essential for using and reusing the code you develop.
You will want to submit your code as a separate file.
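As a rough illustration of what such a parameterized code might look like (the class name, tanh-style sigmoid, gain, initialization, and learning rate below are placeholder choices rather than requirements, and the weight update is written as an Euler step of a weight-adaptation ODE), a minimal numpy sketch is:

```python
import numpy as np

class TwoLayerBackprop:
    """Minimal two-layer backpropagation network with arbitrary sizes.

    n_in, n_hidden, and n_out are the only structural parameters, so the
    same code can be reused for small test cases and larger problems.
    """
    def __init__(self, n_in, n_hidden, n_out, gain=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.gain = gain                                   # sigmoid gain (an "artistic" parameter)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))     # input -> hidden weights
        self.W2 = rng.normal(0, 0.1, (n_out, n_hidden))    # hidden -> output weights

    def sigmoid(self, v):
        return np.tanh(self.gain * v)                      # symmetric sigmoid, range (-1, +1)

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = self.sigmoid(self.W1 @ self.x)            # hidden-layer activations
        self.y = self.sigmoid(self.W2 @ self.h)            # network outputs
        return self.y

    def backward(self, target, eta=0.01):
        """One backpropagation step; returns the output error vector."""
        e = np.asarray(target, dtype=float) - self.y        # output error
        # derivative of tanh(g*v) with respect to v is g*(1 - tanh(g*v)**2)
        d_out = e * self.gain * (1.0 - self.y ** 2)
        d_hid = (self.W2.T @ d_out) * self.gain * (1.0 - self.h ** 2)
        # Euler step of the weight-adaptation ODE: dW/dt proportional to delta * activation^T
        self.W2 += eta * np.outer(d_out, self.h)
        self.W1 += eta * np.outer(d_hid, self.x)
        return e
```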
A test dataset:
Google has a site for a straightforward, two-input, one-output classification network
(NN Playground)
that has a graphical interface for trying many different cases.
It is a fun application to experiment with as you work through the supervised learning concepts.
It is suggested that you train your network using this data,
particularly the two-spiral dataset.
You would still have only a single hidden layer in your case.
Their output sigmoid ranges from -1 to +1.
They have the code for their networks on GitHub.
Their training set is in their code
(they generate a random dataset for the particular 2-D function).
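Since the playground's data is generated inside its own code, a rough stand-in generator for two-spiral training data (the constants below are guesses at a similar distribution, not their exact generator) might be:

```python
import numpy as np

def two_spirals(n_per_class=100, noise=0.05, seed=0):
    """Generate a two-spiral classification set roughly like the playground's.

    Returns inputs of shape (2*n_per_class, 2) and labels in {-1, +1},
    matching a symmetric output sigmoid that ranges from -1 to +1.
    """
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0, 1, n_per_class)) * 3 * np.pi   # angle along the spiral
    r = t / (3 * np.pi)                                        # radius grows with angle
    spiral_a = np.column_stack((r * np.cos(t), r * np.sin(t)))
    spiral_b = -spiral_a                                       # second spiral, rotated 180 degrees
    x = np.vstack((spiral_a, spiral_b)) + rng.normal(0, noise, (2 * n_per_class, 2))
    labels = np.hstack((np.ones(n_per_class), -np.ones(n_per_class)))
    return x, labels
```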
Items for Backpropagation Learning:
- Start with a simple dataset and a small network size.
If the code is properly parameterized, a larger network should act correctly once the code has been tested on the smaller one.
- To develop the ODEs for training, start by developing a simpler network training rule,
such as LMS (see the sketch following this list).
Remember your multiple timescales:
one timescale for the signals (e.g. x(t)),
one timescale for your weight-matrix adaptation (e.g. W(t)),
and the gap between these two timescales.
- After successfully developing an LMS network,
move to a one-layer network with sigmoid outputs.
- After successfully developing your one-layer network with sigmoid outputs,
move to building a two-layer network from your one-layer network components.
Make sure to start with a small network size with a known number of hidden-layer nodes.
- Debug any time-delay input to the network separately first to make sure you are getting the desired
functionality.
- For your larger problem of two NN layers,
you will likely have to experiment with the number of hidden-layer nodes for your solution.
- Utilize knowledge about the input problem wherever possible to
simplify your network.
- If one can generate a good initial guess on the starting weights (e.g. W),
use it.
- More nodes mean more opportunities to minimize the error;
fewer nodes lead to better generalization if the problem can still be solved.
- Training multiple layers (more than two) often requires training layers independently and iterating.
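As a sketch of the LMS starting point mentioned in the list above (the time constants, step size, and toy two-tap teacher signal below are arbitrary illustrative choices), the two-timescale ODE formulation might look like:

```python
import numpy as np

def lms_ode_step(w, x, d, dt, tau_w, eta=1.0):
    """One Euler step of the LMS weight-adaptation ODE.

    The signal x(t) and desired output d(t) evolve on a fast timescale
    (they change every dt), while the weights w(t) evolve on a much
    slower timescale set by tau_w >> dt, preserving the timescale gap.
    """
    y = w @ x                      # fast path: instantaneous network output
    e = d - y                      # instantaneous error
    dw_dt = (eta / tau_w) * e * x  # slow path: dW/dt proportional to error * input
    return w + dt * dw_dt, e

# Example usage with a toy signal: learn a two-tap filter.
rng = np.random.default_rng(1)
w_true = np.array([0.5, -0.3])     # "teacher" weights the network should recover
w = np.zeros(2)
dt, tau_w = 1e-3, 0.5              # weight timescale roughly 500x slower than the signal step
for k in range(20000):
    x = rng.normal(size=2)         # fast input signal sample
    d = w_true @ x                 # desired (teacher) output
    w, e = lms_ode_step(w, x, d, dt, tau_w)
print("learned weights:", w)
```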
Items that require an artistic touch:
- Number of hidden nodes
- Sigmoid gain
- Sigmoids on the last layer
Self-Supervised Backpropagation Network with Sound Input
Train your network using a self-supervised algorithm.
In this particular self-supervised learning case,
use low-gain sigmoids (e.g. a gain of 2 to 4),
and you can use linear elements for the final layer.
Self-supervised networks do not require a labeled dataset, as the input itself already provides the
desired training signal.
You will train your network using a piece of music (voice, instruments, etc.) as your training set.
Likely you will need to repeat your piece of music multiple times for the weights to converge.
You should identify the piece of music, the type of music, etc., in your report.
If you can submit an mp4 or similar version of the music, that would be appreciated.
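As a sketch of preparing the training signal (assuming, purely for illustration, that the music is available as a WAV file; the filename is a placeholder):

```python
import numpy as np
from scipy.io import wavfile

# "my_music.wav" is a placeholder filename; identify your actual piece in the report.
rate, samples = wavfile.read("my_music.wav")
if samples.ndim > 1:                        # stereo recording: average the channels to mono
    samples = samples.mean(axis=1)
samples = samples.astype(float)
samples /= np.max(np.abs(samples))          # scale into [-1, +1] to match the sigmoids

# When training, sweep through `samples` repeatedly (multiple epochs) so that the
# slow weight dynamics have time to converge.
```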
Train this network (remember to use your ODE formulation) for the following cases.
- For a single input epoch of your data, construct a covariance matrix and
report the eigenvalue spread of the input data (see the sketch following this list).
How will this spread affect your convergence time (e.g. iterations) for the training?
- Train your network to reproduce the given input assuming you
have 64 input samples (or more).
You can think of this as a classical delay line where the 64 samples shift by one position
for each network computation.
You will want to use the minimum number of hidden-layer nodes.
Make a plot of the average error as a function of iteration (e.g. per epoch) for the training.
- Train your network to predict 8 samples in the future for this music with the same number of hidden-layer nodes.
Start by training only the output layer, freezing the weights for the hidden layer.
Then, using the original weights for the hidden layer,
train the entire network for this sample-prediction problem (a sketch of this two-phase schedule follows the list).
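For the covariance analysis and the 64-sample delay-line input in the items above, one possible sketch (the window construction shown is one reasonable choice, not a prescribed method) is:

```python
import numpy as np

def delay_line_windows(samples, n_taps=64):
    """Stack sliding delay-line windows: each row is n_taps consecutive samples,
    and successive rows shift by one sample, as in a classical delay line.
    For a long recording you may want to subsample the rows to save memory."""
    n = len(samples) - n_taps + 1
    return np.stack([samples[i:i + n_taps] for i in range(n)])

def eigenvalue_spread(windows):
    """Covariance matrix of the delay-line inputs and its eigenvalue spread
    (ratio of largest to smallest eigenvalue).  A large spread implies slow
    convergence along the weakly excited directions of the input."""
    cov = np.cov(windows, rowvar=False)          # 64 x 64 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)
    return cov, eigvals.max() / eigvals.min()
```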
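For the two-phase prediction training in the last item, one possible way to organize the freeze-then-train schedule, reusing the hypothetical TwoLayerBackprop sketch from earlier, is:

```python
import numpy as np

def train_prediction_epoch(net, windows, targets, eta=0.01, freeze_hidden=True):
    """One epoch of training to predict a future sample.

    Phase 1 (freeze_hidden=True): keep the hidden-layer weights and adapt
    only the output layer.  Phase 2 (freeze_hidden=False): starting from
    those weights, adapt the whole network.  Returns the average squared error.
    """
    total_err = 0.0
    for x, d in zip(windows, targets):
        frozen_W1 = net.W1.copy()
        net.forward(x)
        e = net.backward(d, eta=eta)
        if freeze_hidden:
            net.W1 = frozen_W1           # undo the hidden-layer update
        total_err += float(np.mean(e ** 2))
    return total_err / len(windows)

# Target alignment for predicting 8 samples ahead of a 64-sample window:
# window i covers samples[i : i+64], so its target is samples[i + 64 + 8 - 1].
# windows = delay_line_windows(samples, 64)[:-8]
# targets = samples[64 + 8 - 1:]
```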
You will want to submit your code for the self-supervised algorithm (first part) as a separate file.