- Data curation. I leave it up to you whether to use black-and-white
(binary) images or grayscale images; I get pretty much the same accuracy
either way. Recall that after S2, due to tanh, everything is squashed
into the range (-1, 1).
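Whichever you pick, it helps to scale the raw pixels into a comparable range
before feeding C1. A minimal sketch, assuming pixels arrive as ints in
[0, 255] (the method name normalizePixel is mine):
private double normalizePixel(int pixel) {
    // Map [0, 255] linearly onto [-1, 1] so inputs match the tanh output range.
    return (pixel / 255.0) * 2.0 - 1.0;
}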
- Initial weights: We initialize the weights for C1 as follows:
// Std of the Gaussian scales with the fan-in of the filter (He-style init).
double std = Math.sqrt(2.0 / (C1_FILTER_SIZE * C1_FILTER_SIZE));
c1Filters[f][i][j] = random.nextGaussian() * std;
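For context, a sketch of the full initialization loop; it assumes c1Filters is
indexed [filter][row][col] and that a C1_NUM_FILTERS constant exists (both are
my assumptions):
java.util.Random random = new java.util.Random();
double std = Math.sqrt(2.0 / (C1_FILTER_SIZE * C1_FILTER_SIZE));
for (int f = 0; f < C1_NUM_FILTERS; f++) {
    for (int i = 0; i < C1_FILTER_SIZE; i++) {
        for (int j = 0; j < C1_FILTER_SIZE; j++) {
            c1Filters[f][i][j] = random.nextGaussian() * std; // zero-mean Gaussian
        }
    }
}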
- Activation function: For now, keep the plain tanh without adding
the A and S coefficients.
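For reference, a sketch of the plain activation (the method name activate is
mine; the commented constants are the usual values from LeCun's scaled tanh,
A * tanh(S * x)):
private double activate(double x) {
    // Plain tanh for now. The scaled variant would be A * Math.tanh(S * x),
    // with A = 1.7159 and S = 2.0 / 3.0 in LeCun's formulation.
    return Math.tanh(x);
}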
- Pooling layers: For now, simply perform average pooling without
weights and biases.
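A minimal sketch of such a pooling pass, assuming square feature maps, a 2x2
window, and stride 2 (the method name averagePool is mine):
private double[][] averagePool(double[][] input) {
    int outSize = input.length / 2; // assumes an even-sized, square input map
    double[][] output = new double[outSize][outSize];
    for (int i = 0; i < outSize; i++) {
        for (int j = 0; j < outSize; j++) {
            // Each output cell is the mean of a non-overlapping 2x2 block.
            output[i][j] = (input[2 * i][2 * j] + input[2 * i][2 * j + 1]
                          + input[2 * i + 1][2 * j] + input[2 * i + 1][2 * j + 1]) / 4.0;
        }
    }
    return output;
}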
- Please start with a learning rate of 0.01
- For training purposes, please use MSE (Mean Squared Error) as
discussed in class.
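For monitoring training, a sketch of the loss computation (the method name
mse is mine):
private double mse(double[] predicted, double[] target) {
    // Average of squared differences across the output vector.
    double sum = 0.0;
    for (int i = 0; i < predicted.length; i++) {
        double diff = predicted[i] - target[i];
        sum += diff * diff;
    }
    return sum / predicted.length;
}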
- The error at each output unit i should be calculated as follows:
2 * (predicted[i] - target[i]) * tanhDerivative(predicted[i]);
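Put together, computing all output deltas looks roughly like this (the array
name outputDelta is my assumption):
// predicted[] holds the post-tanh outputs, so tanhDerivative receives the
// already-activated value (see the helper below).
double[] outputDelta = new double[predicted.length];
for (int i = 0; i < predicted.length; i++) {
    outputDelta[i] = 2 * (predicted[i] - target[i]) * tanhDerivative(predicted[i]);
}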
- Afterwards, when you adjust the weights, subtract the update from the
current weight (gradient descent steps against the gradient).
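In code, a single weight update is a sketch like this (the weights, gradient,
and learningRate names are mine):
double learningRate = 0.01; // as suggested above
// Subtracting steps against the gradient, which decreases the MSE.
weights[i][j] -= learningRate * gradient[i][j];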
- Here is the derivative for tanh:
private double tanhDerivative(double x) {
    // Derivative of tanh is 1 - tanh^2(x). Here x is expected to already be
    // the tanh output (the activation), so no extra tanh call is needed.
    return 1 - x * x;
}
- Here is the "derivative" for the pooling layer:
private double poolDerivative(double output) {
return 1.0 / 4.0; // For average pooling, the gradient is distributed evenly
}
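To route gradients back through a pooling layer, each input cell of a 2x2
block receives the pooled output's gradient scaled by poolDerivative; a sketch
(the method name poolBackward is mine):
private double[][] poolBackward(double[][] gradOutput) {
    int inSize = gradOutput.length * 2;
    double[][] gradInput = new double[inSize][inSize];
    for (int i = 0; i < gradOutput.length; i++) {
        for (int j = 0; j < gradOutput.length; j++) {
            // Every input in the 2x2 block contributed equally, so each
            // receives gradOutput * 1/4.
            double g = gradOutput[i][j] * poolDerivative(gradOutput[i][j]);
            gradInput[2 * i][2 * j] = g;
            gradInput[2 * i][2 * j + 1] = g;
            gradInput[2 * i + 1][2 * j] = g;
            gradInput[2 * i + 1][2 * j + 1] = g;
        }
    }
    return gradInput;
}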