CNN Assignment - Milestone 2

Overview

The purpose of this assignment is to learn about the inner workings of CNNs. You are asked to implement LeNet5, a pioneering CNN for MNIST digit recognition.

Assignment

This assignment is worth 125 points. Please work on it by yourself. Please do not use GenAI tools to develop and test the code.
  1. For this milestone, you will implement and test the backpropagation pass and experiment with various hyperparameters and with the network architecture. The latter is for extra credit.
  2. Continue working with the two files from milestone 1: "LeNet5.java" and "MNISTCNN.java".
  3. For milestone 1, we made some simplifying assumptions, and I asked you to initialize the weights in a way that suited testing. Initially, we will keep some of those assumptions:
    1. Data curation. I leave it up to you to use black-and-white or grayscale images; I get pretty much the same accuracy with either. Recall that after S2, due to tanh, everything is squashed into the range (-1, 1).
    2. Initial weights: We initialize the weights for C1 as follows:
      
      double std = Math.sqrt(2.0 / (C1_FILTER_SIZE * C1_FILTER_SIZE));
      c1Filters[f][i][j] = random.nextGaussian() * std;
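
      A minimal sketch of the full initialization loop (C1_NUM_FILTERS is a hypothetical constant standing in for however many C1 filters your code uses; C1_FILTER_SIZE, c1Filters, and random are as above):

      // Initialize every C1 filter weight with a Gaussian scaled by sqrt(2 / fan-in).
      double std = Math.sqrt(2.0 / (C1_FILTER_SIZE * C1_FILTER_SIZE));
      for (int f = 0; f < C1_NUM_FILTERS; f++) {
          for (int i = 0; i < C1_FILTER_SIZE; i++) {
              for (int j = 0; j < C1_FILTER_SIZE; j++) {
                  c1Filters[f][i][j] = random.nextGaussian() * std;
              }
          }
      }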
      
    3. Activation function: For now, keep the plain tanh without adding the A and S coefficients.
    4. Pooling layers: For now, simply perform average pooling without weights and biases.
    5. Please start with a learning rate of 0.01.
    6. For training purposes, please use MSE (Mean Squared Error) as discussed in class.
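
      As a reference point, a minimal sketch of the MSE for one example, useful for monitoring training (predicted and target are the 10-element arrays of network outputs and desired outputs used in item 7 below):

      // Mean squared error over the 10 output units for a single example.
      double mse = 0.0;
      for (int i = 0; i < predicted.length; i++) {
          double diff = predicted[i] - target[i];
          mse += diff * diff;
      }
      mse /= predicted.length;
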
    7. The error at the output layer should be calculated as follows:
      
      2 * (predicted[i] - target[i]) * tanhDerivative(predicted[i]);
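
      Putting this into a loop over all 10 output units (outputDelta is a hypothetical name for the array of output-layer errors; predicted, target, and tanhDerivative are as elsewhere in this assignment):

      // Output-layer error: MSE derivative times tanh derivative, where
      // predicted[i] is already the tanh activation of output unit i.
      double[] outputDelta = new double[predicted.length];
      for (int i = 0; i < predicted.length; i++) {
          outputDelta[i] = 2 * (predicted[i] - target[i]) * tanhDerivative(predicted[i]);
      }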
      
    8. Afterwards, when you adjust a weight, you need to subtract the update (the learning rate times the gradient) from the current weight.
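
      For example, a sketch of the update for a weight from F6 unit j to output unit i (outputWeights, outputBias, f6Output, and learningRate are hypothetical names; outputDelta is from the sketch under item 7; adapt all of them to your own code):

      // Gradient descent: subtract learningRate * gradient from each parameter.
      outputWeights[i][j] -= learningRate * outputDelta[i] * f6Output[j];
      outputBias[i] -= learningRate * outputDelta[i];
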
    9. Here is the derivative for tanh:
      
      private double tanhDerivative(double x) {
          // x is assumed to already be the tanh output; d/dz tanh(z) = 1 - tanh(z)^2
          return 1 - x * x;
      }
      
    10. Here is the "derivative" for the pooling layer:
      
      private double poolDerivative(double output) {
          // For 2x2 average pooling, the gradient is distributed evenly
          // over the four inputs; the output argument is not needed here.
          return 1.0 / 4.0;
      }
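
      For example, a sketch of pushing the error at one S2 unit back to the four C1 positions in its 2x2 window (f, row, and col index the S2 map and position being processed; c1Delta, s2Delta, and s2Output are hypothetical array names for the error and activation maps; any activation derivative at C1 is applied separately):

      // Each S2 unit averages a 2x2 window of C1, so its error is shared
      // equally (factor 1/4) among the four C1 positions it covers.
      for (int r = 0; r < 2; r++) {
          for (int c = 0; c < 2; c++) {
              c1Delta[f][2 * row + r][2 * col + c] +=
                  s2Delta[f][row][col] * poolDerivative(s2Output[f][row][col]);
          }
      }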
      
  4. The key challenge of the backprop pass is to determine the errors at each layer. Use the information we worked out in class to implement this pass. For the feedforward layers, you may reuse your code from the FFnet assignment.
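
     As one illustration of the convolutional part (a sketch only, not the required structure): once the error map for a C1 filter is known, the gradient for each filter weight is a sum over all output positions. Here f indexes the filter, and c1Delta, input, C1_OUTPUT_SIZE, and learningRate are hypothetical names for the error map, the input image, the size of the C1 feature map, and the learning rate:

     // The gradient of filter weight (i, j) is the sum, over every output
     // position, of the error there times the input pixel that weight
     // touched during the forward pass.
     for (int i = 0; i < C1_FILTER_SIZE; i++) {
         for (int j = 0; j < C1_FILTER_SIZE; j++) {
             double grad = 0.0;
             for (int row = 0; row < C1_OUTPUT_SIZE; row++) {
                 for (int col = 0; col < C1_OUTPUT_SIZE; col++) {
                     grad += c1Delta[f][row][col] * input[row + i][col + j];
                 }
             }
             c1Filters[f][i][j] -= learningRate * grad;
         }
     }
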
  5. Once you understand backprop at the convolutional and pooling layers, the rest of the assignment is largely repetitive.
  6. You may wish to implement backpropagation beginning at the output layer and working backwards towards the input layer.
  7. Test your network. You may wish to begin with just the first example of the training set, to debug any errors. Then run the network for one epoch. For additional debugging, consider running five epochs. Your CNN should get above 80% accuracy on the testing set. As you will find out, the CNN can be quite fickle. If your code does not reach the desired performance, please ask for help.
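
     A sketch of the accuracy count on the test set (predict stands for whatever method runs your forward pass and returns the 10 output activations; testImages and testLabels are hypothetical names for your test data):

     // Count an example as correct when the index of the largest output
     // matches the expected digit.
     int correct = 0;
     for (int n = 0; n < testImages.length; n++) {
         double[] output = predict(testImages[n]); // forward pass only
         int best = 0;
         for (int i = 1; i < output.length; i++) {
             if (output[i] > output[best]) {
                 best = i;
             }
         }
         if (best == testLabels[n]) {
             correct++;
         }
     }
     double accuracy = 100.0 * correct / testImages.length;
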
  8. To fill in the lab manual, please do the following:
    1. Run it through the training and test sets and collect the same data as for the FFnet MNIST assignment:
      1. For testing, count the number of times the index of the highest output value matches the desired digit.
      2. List the learning rate, the number of epochs needed to reach 90% accuracy on the test set, and the actual accuracy achieved. Additionally, list the accuracy on the training set. For your reference, given the hyperparameters and network architecture mentioned above, we consistently get over 90% test accuracy after 10 epochs of training.
    2. Display the following images, ensuring they are properly labeled (simply pass in the layer name when creating a new JFrame): Input, C1, S2, C3, S4, C5, F6, Output. Please display only the first feature map and the first pooling map at each level, and only for the first training example. Also, print the activation values at the output layer.
  9. Extra credit. For extra credit, consider the following modifications. In all cases, please submit your modified code and discuss the effect the modification has on performance, as measured by:
    1. Accuracy of training and testing sets.
    2. Number of epochs
    3. Training time. You can calculate it as follows:
      
      long startTime = System.currentTimeMillis();
      // code to time, ideally a single procedure call
      long endTime = System.currentTimeMillis();
      long elapsedMillis = endTime - startTime; // training time in milliseconds
      
    You may wish to run the network a few times to check that the performance is consistent.
    1. [10+/- pts] ReLU activation function. Ensure you use the appropriate derivative during training.
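
      A minimal sketch of ReLU and its derivative (here the argument is the pre-activation value, unlike tanhDerivative above, which takes the activation itself):

      // ReLU activation.
      private double relu(double x) {
          return Math.max(0.0, x);
      }

      // Derivative of ReLU with respect to its pre-activation input.
      private double reluDerivative(double x) {
          return x > 0 ? 1.0 : 0.0;
      }
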
    2. [10+/- pts] Batch processing. Experiment with different batch sizes, say 10, 100, 1000.
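
      One possible organization (a sketch only; forward, accumulateGradients, and applyGradients are hypothetical methods you would have to write so that per-example gradients are summed and the averaged update is applied once per batch; trainImages and trainLabels are your training data):

      // Mini-batch loop: sum gradients over batchSize examples, then update once.
      int batchSize = 100; // try 10, 100, 1000
      for (int start = 0; start < trainImages.length; start += batchSize) {
          int end = Math.min(start + batchSize, trainImages.length);
          for (int n = start; n < end; n++) {
              forward(trainImages[n]);
              accumulateGradients(trainLabels[n]); // backprop, but only accumulate
          }
          applyGradients(learningRate, end - start); // average and apply the update
      }
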
    3. [10+/- pts] Different weight initializations, including small positive/negative random values and the "fan-in" rule as described in LeCun's paper.
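
      For the fan-in variant, one common reading of LeCun's rule is to draw each weight uniformly from [-2.4/F, 2.4/F], where F is the number of inputs feeding the unit; a sketch (fanIn and random are placeholders):

      // Fan-in initialization: uniform in [-2.4/fanIn, +2.4/fanIn].
      double limit = 2.4 / fanIn;
      double weight = (random.nextDouble() * 2.0 - 1.0) * limit;
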
    4. [Negotiable] Other experiments. Just ask me first.

Submission

Please submit "LeNet5.java" and "MNISTCNN.java" files as well as the lab manual to the appropriate drop-box on Moodle.