1. Neural Networks, Intro and Intuition#

Some resources to learn about Neural Networks:

But what is a Neural Network? - 3Blue1Brown

Aurélien Géron, Hands-On Machine Learning, Chapter 10

The lecture below is courtesy of Prof Ryan Johnson.

1.1. A long data story using NNs (10-15 mins)#

USGS NN data story

Lecture slides on NNs (10-15 mins)

https://www.yout-ube.com/watch?v=aircAruvnKk (Visualizing NNs - first 13 mins)

1.2. The TensorFlow NN Playground#

TensorFlow NN Playground

The TensorFlow Playground (http://playground.tensorflow.org/) is a network simulator useful for building intuition about how it all works. Take some time to explore this tool until you have a better idea of how NNs work.

  • First, the top-bar options:
      ◦ Reset / Run / Step
      ◦ Epoch counter (the solver uses gradient descent)
      ◦ Learning rate (a higher value means faster convergence but potentially more error)
      ◦ Activation function
      ◦ Regularization type and regularization rate (alpha)
      ◦ Problem type (classification or regression)

  • Second, choose your dataset and features:
      ◦ you can modify the train/test split, how noisy/spread-out the data is, and the batch size (the number of samples used for each gradient-descent step)

  • Third, build your network: choose the number of hidden layers and how many neurons per layer; since this is a binary classifier, there are 2 output neurons.

  • Examine the loss as a function of epoch; the goal is to minimize the loss over the epochs when arriving at a final solution.

  • Examine the connections between neurons and the resulting output boundaries: blue connections are positive weights, orange are negative weights, and the magnitude of a weight is indicated by the thickness of its connection. (A minimal Keras sketch of such a network follows this list.)
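As referenced above, here is a minimal Keras sketch of a comparable network. This is illustrative only: `make_circles` stands in for a Playground dataset, and every hyperparameter value here is an assumption rather than anything prescribed by this page.

```python
# A minimal Keras sketch of one Playground-style configuration.
# Assumptions: make_circles stands in for a Playground dataset, and the
# hyperparameter values are illustrative, not prescribed by this page.
import tensorflow as tf
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Two-feature binary classification data with a 70/30 train/test split
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two hidden layers of 4 tanh neurons with L2 regularization,
# mirroring the Playground's architecture and regularization controls
l2 = tf.keras.regularizers.l2(0.001)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="tanh", kernel_regularizer=l2),
    tf.keras.layers.Dense(4, activation="tanh", kernel_regularizer=l2),
    # a single sigmoid output (equivalent to a 2-neuron softmax output
    # for a binary classifier)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The learning rate and batch size map onto the Playground sliders;
# plain SGD plays the role of the Playground's gradient-descent solver
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.03),
              loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=100, batch_size=10,
                    validation_data=(X_test, y_test), verbose=0)

# Loss as a function of epoch, like the curve plotted in the Playground
print("final train loss:", round(history.history["loss"][-1], 3))
print("final test loss: ", round(history.history["val_loss"][-1], 3))
```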

To explore, try the following:

  • Click the Run button (top left), and notice how quickly it finds a good solution. Try replacing the tanh activation function with the ReLU. What do you notice about the boundaries of the final solution? (See the sketch below these notes.)

Run it several times; you’ll notice it can get stuck in a local minimum.

Keep only 2 hidden layers of neurons. You’ll notice that for some datasets, the model is not capable of finding a good solution. The model has too few parameters and underfits the training set.

Select a different dataset and repeat the above.
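As referenced above, here is a rough offline way to compare tanh and ReLU decision boundaries, using scikit-learn's `MLPClassifier` as a stand-in for the Playground; the dataset and settings are illustrative assumptions. The tanh boundary comes out smooth, while the ReLU boundary is piecewise-linear.

```python
# A rough offline comparison of tanh vs. ReLU decision boundaries.
# Assumption: scikit-learn's MLPClassifier stands in for the Playground.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, act in zip(axes, ["tanh", "relu"]):
    clf = MLPClassifier(hidden_layer_sizes=(4, 4), activation=act,
                        solver="sgd", learning_rate_init=0.03,
                        max_iter=2000, random_state=0).fit(X, y)
    # Evaluate the classifier on a grid to draw its decision boundary
    xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200),
                         np.linspace(-1.5, 1.5, 200))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)
    ax.set_title(f"activation = {act}")
plt.show()
```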

Your turn! Try playing with some of the model parameters:

  • number of features

  • number of hidden-layer neurons

  • activation function

  • regularization

  • try adjusting individual neuron weights and see how the model responds

1.3. Playground Exercises (due at the beginning of next class)#

Instructions

Our goal with the following exercises will be to create networks that can completely separate the data with the least complexity possible. In addition to the number of features, layers, and neurons, you should feel free to modify the activation function, learning rate, and regularization in your experimentation.

The purpose of this activity is to explore NNs and get an intuitive feeling for how a NN works. Complete this activity by submitting the details of your solutions to the following questions to Moodle by the beginning of our next class period. For each exercise and model, record the learning rate, activation function, regularization penalty, input features, and network architecture (number of neurons in each layer), along with the network complexity defined below. Presenting your data in tabular form is preferred.

For each of the splits in exercises 1 and 2 (8 total), the fewest epochs for the least complex model that results in a cleanly separated (2D) dataset earns +1 Mastery Q point. Mastery points are added onto your homework score at the end of the term. (To qualify, your model and solution must be reproducible, so be sure to screenshot your network parameters and solution.)


Let us define the complexity (Z) of a playground network with d hidden layers and n_i neurons in the i-th layer as:
\begin{gather} Z = (n_{feat} \times n_1) + \sum_{i=2}^{d}(n_{i-1} \times n_i) \end{gather}

This definition requires only 1 hidden layer, which covers the simplest models, though you may find it difficult to solve each problem with d = 1. (A short helper for computing Z is sketched below.)
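As referenced above, here is a short helper for computing Z; the function name `complexity` is ours and purely illustrative.

```python
# A minimal sketch of the complexity Z defined above; the helper name
# `complexity` is illustrative, not part of the assignment.
def complexity(n_feat, hidden):
    """Z = n_feat*n_1 plus the products of adjacent hidden-layer sizes."""
    z = n_feat * hidden[0]
    for prev, cur in zip(hidden, hidden[1:]):
        z += prev * cur
    return z

print(complexity(2, [3, 3]))  # 2*3 + 3*3 = 15, the worked example below
print(complexity(2, [1]))     # a single hidden layer (d = 1): Z = 2
```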

Hint: In optimizing your network architecture, start with more layers and neurons than you think you need and examine the weights between layers. Weights that are low represent less important connections. As your network’s complexity exceeds what is necessary, some neurons may become inactive, a sign that there may be too many neurons in that layer. (A sketch of this kind of weight inspection follows.)
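One illustrative way to act on this hint outside the Playground, using the same scikit-learn stand-in as before; the architecture and dataset here are assumptions, not part of the assignment.

```python
# A sketch of the pruning idea in the hint: train a deliberately
# oversized network, then inspect the trained weight magnitudes.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(6, 6), activation="tanh",
                    solver="sgd", learning_rate_init=0.03,
                    max_iter=2000, random_state=0).fit(X, y)

# coefs_[i] is the weight matrix between layer i and layer i+1; a neuron
# whose outgoing weights are all near zero is a candidate for removal
for i, W in enumerate(clf.coefs_):
    print(f"layer {i} -> {i+1}: mean |w| per outgoing neuron:",
          np.abs(W).mean(axis=1).round(2))
```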

Exercises

  1. Using a 70/30 train/test split, what is the simplest network architecture (in terms of the complexity defined above) that can completely separate each dataset? Record the number of features, neurons, and layers (the network architecture) for each model, and compute your model’s complexity. At what epoch does the model completely separate the data?

  2. Repeat exercise 1 with a 50/50 train/test split, and again with a 90/10 train/test split. Specifically discuss any significant differences in resulting model complexity and epoch of solution.

  3. Using a 70/30 split, explore bivariate models only to find the least complex model that can classify each data set (2 inputs, 2 outputs). Which 2 features and which architecture produce the best model for each data set?

Example: start with dataset 2 (4-corners), x1 and x2 as inputs, and (3, 3) hidden layers.
  1. In dataset 2 with a 70% ratio of training to test data, I was able to separate the dataset in about 250 epochs. This used features X1 and X2, with 2 hidden layers of 3 neurons each. This model had a complexity of 15, ((2 × 3) + (3 × 3)).

  2. Using a 50% split, the model still worked but took a little longer than 250 epochs, closer to 500. The 90% split was different: it was only able to separate the data about 50% of the time, and when it did, it took only about 200 epochs. Adding an extra neuron to the first layer made the model much more consistent, and it split the data in fewer than 200 epochs; this raises the complexity to 20.

  3. Dataset 1: using X1^2 and X2^2 and 1 neuron in each layer sorts the data in 100 epochs; complexity of 4. Dataset 2: using X1X2 and sin(X1) and 1 neuron in each layer sorts the data in 150 epochs; complexity of 4. Dataset 3: using X1 and X2 and 1 neuron in each layer sorts the data in 150 epochs; complexity of 4. Dataset 4: I was unable to find a solution that correctly separates the data. All the solutions above used tanh activation and no regularization; modifying these did not help me solve the 4th set. By using more than 2 features (all except X1^2 and X2^2) and 6 neurons in each layer, I could get it to split fairly well, but there was still about 0.2 test loss.
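As a sanity check, the complexities quoted in answers 1 and 2 agree with the illustrative `complexity` helper sketched earlier, redefined here so the snippet stands alone.

```python
# Re-checking the complexities quoted above; `complexity` is the same
# illustrative helper as before, redefined so this snippet runs on its own.
def complexity(n_feat, hidden):
    z = n_feat * hidden[0]
    for prev, cur in zip(hidden, hidden[1:]):
        z += prev * cur
    return z

print(complexity(2, [3, 3]))  # answer 1: (2*3) + (3*3) = 15
print(complexity(2, [4, 3]))  # answer 2: one extra neuron in layer 1 -> 20
```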