Log
- Ran many experiments
- Maybe we just need more data to obtain better performance
- Try InceptionV3 (see the sketch below)
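A minimal sketch, assuming Keras' applications module, of how InceptionV3 could be pulled in as a pretrained base; the input shape, global pooling, and single sigmoid output for the binary task are assumptions rather than settled choices.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# ImageNet-pretrained convolutional base, classifier head removed
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(299, 299, 3), pooling='avg')

# Assumed head: a single sigmoid unit for the benign/malignant decision
output = Dense(1, activation='sigmoid')(base.output)
model = Model(inputs=base.input, outputs=output)
```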
- Improved CLI
- Normalized luminance and applied contrast stretching, which seemed to improve performance (a stretching sketch follows this entry)
- Try more color correction techniques
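A rough sketch of the kind of per-image contrast stretching mentioned above, using NumPy percentile-based rescaling; the percentile cutoffs and operating on the raw RGB values instead of a separate luminance channel are assumptions.

```python
import numpy as np

def contrast_stretch(img, low_pct=2, high_pct=98):
    """Rescale pixel intensities so the chosen percentiles map to [0, 255]."""
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, (low_pct, high_pct))
    stretched = np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)
```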
- Refactored data pipeline to do offline data augmentation (an offline pre-generation sketch follows this entry)
- Online data augmentation changes the input 𝑥 at each iteration, and therefore changes the loss surface and the location of the minimizer 𝜃* (which makes optimization more difficult)
- Empirically verify online vs offline data augmentation
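A sketch of one way the offline augmentation could be pre-generated with Keras' ImageDataGenerator, writing augmented copies to disk once before training instead of transforming images on the fly; the random placeholder data, output directory, and augmentation parameters are illustrative assumptions.

```python
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder data standing in for the real training set
x_train = np.random.rand(64, 224, 224, 3)
y_train = np.random.randint(0, 2, size=64)

datagen = ImageDataGenerator(rotation_range=90, horizontal_flip=True,
                             vertical_flip=True, zoom_range=0.1)

os.makedirs('data/augmented', exist_ok=True)
flow = datagen.flow(x_train, y_train, batch_size=32,
                    save_to_dir='data/augmented', save_prefix='aug',
                    save_format='jpeg')

# Drawing batches once, before training, materializes the augmented copies on disk
for _ in range(5 * len(x_train) // 32):
    next(flow)
```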
- Class-balanced the dataset by oversampling the minority class (an oversampling sketch follows this entry)
- A weighted loss function is not an equivalent fix when training with mini-batches https://datascience.stackexchange.com/questions/44755/why-doesnt-class-weight-resolve-the-imbalanced-classification-problem?newreg=4985fd7dac8543a5a29355c9f6686d5a
- The oversampling reuses the same data augmentation procedure
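A minimal sketch of balancing the two classes by oversampling the minority class; in the actual pipeline the extra minority samples come from the augmentation procedure mentioned above, so plain duplication with replacement here is only a stand-in, and the variable names are assumptions.

```python
import numpy as np

def oversample_minority(x, y):
    """Duplicate minority-class samples (with replacement) until both classes have equal counts."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    extra = np.random.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.random.permutation(np.concatenate([majority, minority, extra]))
    return x[idx], y[idx]
```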
- Empirically justify why we need more training data than the provided 2k samples
- There is potentially no need for validation data if we don't do early stopping
- Improve the scripts' CLI (an argparse sketch follows this list):
  - Validate file paths
  - Write help menus
  - Mark all required arguments as required
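A sketch of what the improved CLI could look like with argparse: required arguments marked as such, help strings, and a path-validating type; the specific argument names and choices are assumptions.

```python
import argparse
import os

def existing_path(p):
    """argparse type that rejects paths that do not exist."""
    if not os.path.exists(p):
        raise argparse.ArgumentTypeError('no such file or directory: %s' % p)
    return p

parser = argparse.ArgumentParser(description='Train a CNN on the ISIC 2017 dataset.')
parser.add_argument('--data-dir', type=existing_path, required=True,
                    help='directory containing the preprocessed images')
parser.add_argument('--model', choices=['vgg16', 'inceptionv3', 'custom'], required=True,
                    help='network architecture to train')
parser.add_argument('--epochs', type=int, default=50, help='number of training epochs')
args = parser.parse_args()
```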
- The layers which are cut off from the pretrained model are still present in TensorFlow's computational graph, which means it is doing unnecessary computation (see the sketch below)
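One way to avoid that wasted computation, sketched below on VGG16, is to rebuild the model from the pretrained network's input up to the last layer that is actually kept, so the discarded layers never enter the graph; the 'block4_pool' cutoff is an assumed example.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

full = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Only layers up to (and including) the chosen cutoff remain in the new graph
truncated = Model(inputs=full.input,
                  outputs=full.get_layer('block4_pool').output)
```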
- Empirically verify if regularization is necessary
- Refactored data pipeline:
  - Use the ISIC 2017 dataset, which allows for performance comparison with the top competitors
  - Crop a square from the center of each image so that resizing it to fit the CNN input tensor preserves the aspect ratio (alternatively, https://github.com/keras-team/keras-preprocessing/pull/81 looks promising; a crop-and-resize sketch follows this list)
  - Data augmentation is performed on the CPU in parallel with training on the GPU
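A sketch of the center square crop followed by a resize, using Pillow and NumPy; the 224×224 target size is an assumption.

```python
import numpy as np
from PIL import Image

def center_crop_resize(path, size=224):
    """Crop the largest centered square from an image and resize it to size x size."""
    img = Image.open(path)
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return np.asarray(square.resize((size, size), Image.BILINEAR))
```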
- Introduced elastic (L1 and L2) weight regularization (see the sketch after this entry)
- Disregarded dropout because it is "too magic"
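A sketch of how the elastic (L1 + L2) penalty might be attached to the trainable classifier head in Keras; the layer sizes and penalty coefficients are assumptions.

```python
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Elastic net penalty: L1 encourages sparsity, L2 keeps weights small
elastic = regularizers.l1_l2(l1=1e-5, l2=1e-4)

head = Sequential([
    Dense(256, activation='relu', kernel_regularizer=elastic, input_shape=(2048,)),
    Dense(1, activation='sigmoid', kernel_regularizer=elastic),
])
```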
- Started defining experiments (a grid-enumeration sketch follows this list)
  - There is an allowed range of values for each hyperparameter, and their Cartesian product gives all combinations of values we want to try
  - The ranges must be kept small to limit the number of combinations and hence the computation required
  - There is no need for cross-validation during hyperparameter search: I am not interested in finding the optimal hyperparameter values, but in trying them all to see how they affect performance and to study transfer learning by drawing conclusions from plots
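A sketch of how the Cartesian product of allowed hyperparameter values could be enumerated with itertools.product; the particular hyperparameters and ranges are illustrative assumptions.

```python
from itertools import product

grid = {
    'architecture': ['vgg16', 'inceptionv3'],
    'frozen_blocks': [0, 2, 4],
    'learning_rate': [1e-2, 1e-3],
    'l2': [0.0, 1e-4],
}

# Every combination of the allowed values, one dict per experiment
experiments = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(experiments))  # 2 * 3 * 2 * 2 = 24 runs
```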
- Settle on a simple cyclical learning rate schedule, potentially one for each network architecture
- Switch out the Adam optimizer for plain SGD with Nesterov momentum (a sketch combining this with the cyclical schedule follows)
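A sketch of a simple triangular cyclical learning rate implemented as a per-epoch schedule, paired with SGD with Nesterov momentum; the base/max rates and cycle length are assumptions.

```python
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD

BASE_LR, MAX_LR, CYCLE = 1e-4, 1e-2, 10  # assumed values

def triangular_clr(epoch):
    """Ramp the learning rate linearly up and back down over each cycle of CYCLE epochs."""
    half = CYCLE / 2.0
    phase = epoch % CYCLE
    scale = phase / half if phase < half else (CYCLE - phase) / half
    return BASE_LR + (MAX_LR - BASE_LR) * scale

optimizer = SGD(lr=1e-2, momentum=0.9, nesterov=True)
callbacks = [LearningRateScheduler(triangular_clr)]
# model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=50, callbacks=callbacks)
```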
- Start working on the custom CNN
- Validation F1-score is always zero and I am not sure why
- Images from the ISIC challenges do not have a fixed size
- Very few papers describe exactly how they process ISIC data, but the best way seems to be to crop the center into a square and resize to fit the CNN input tensor
- It does not make sense to freeze layers in the middle of convolutional blocks, because each block progressively builds up a higher-level feature, so it only makes sense to freeze entire blocks (see the sketch below)
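A sketch of freezing whole convolutional blocks rather than individual layers, relying on VGG16's block-prefixed layer names; freezing the first three blocks is just an assumed example.

```python
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

FROZEN_BLOCKS = ('block1', 'block2', 'block3')  # assumed cutoff

# Freeze every layer belonging to the chosen blocks, leave the rest trainable
for layer in base.layers:
    layer.trainable = not layer.name.startswith(FROZEN_BLOCKS)
```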
- Transfer learning definition, strategies and taxonomy
- In general most strategies take some higher-level representation of the input features and build a classifier on top of that, but I think we should only consider NN classifiers because, e.g., SVMs with the kernel trick are equivalent to feed-forward neural networks with a non-linear activation function (citation needed)
- Validation F1-score is frequently zero because FN and TN are zero, likely because of a problem in splitting the data into train, validation, and test
- Transfer learning from models trained on ImageNet may not be adequate for binary skin cancer classification, because ImageNet is a very diverse dataset whereas the ISIC-Archive is a very restricted subset of skin images; very likely only the initial layers of any model trained on ImageNet are (theoretically) useful for transfer learning. Experimentally visualize each considered network's blocks with https://github.com/philipperemy/keract (see the sketch below)
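A rough sketch of how keract might be used to look at intermediate activations for one input; the get_activations/display_activations calls are assumed from keract's README, and the random array stands in for a preprocessed skin image.

```python
import numpy as np
from keract import get_activations, display_activations
from tensorflow.keras.applications import VGG16

model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Placeholder input with shape (1, 224, 224, 3) standing in for a preprocessed lesion image
x = np.random.rand(1, 224, 224, 3).astype(np.float32)

activations = get_activations(model, x)   # assumed: maps layer names to activation arrays
display_activations(activations)          # assumed: renders one figure per layer
```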