Log
- Ran many experiments
- Maybe we just need more data to obtain better performance
- Try InceptionV3 (see the sketch below)
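A minimal sketch, assuming Keras' applications module, of how InceptionV3 could be pulled in as a pretrained base; the input shape, global pooling, and single sigmoid output for the binary task are assumptions rather than settled choices.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# ImageNet-pretrained convolutional base, classifier head removed
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(299, 299, 3), pooling='avg')

# Assumed head: a single sigmoid unit for the benign/malignant decision
output = Dense(1, activation='sigmoid')(base.output)
model = Model(inputs=base.input, outputs=output)
```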
- Improved CLI
- Normalized luminance and applied contrast stretching, which seemed to improve performance (a stretching sketch follows this entry)
- Try more color correction techniques
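A rough sketch of the kind of per-image contrast stretching mentioned above, using NumPy percentile-based rescaling; the percentile cutoffs and operating on the raw RGB values instead of a separate luminance channel are assumptions.

```python
import numpy as np

def contrast_stretch(img, low_pct=2, high_pct=98):
    """Rescale pixel intensities so the chosen percentiles map to [0, 255]."""
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, (low_pct, high_pct))
    stretched = np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)
```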
- Refactored data pipeline to do offline data augmentation (an offline pre-generation sketch follows this entry)
- Online data augmentation changes the input 𝑥 at each iteration, and therefore changes the loss surface and the location of the minimizer 𝜃* (which makes optimization more difficult)
- Empirically verify online vs offline data augmentation
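A sketch of one way the offline augmentation could be pre-generated with Keras' ImageDataGenerator, writing augmented copies to disk once before training instead of transforming images on the fly; the random placeholder data, output directory, and augmentation parameters are illustrative assumptions.

```python
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder data standing in for the real training set
x_train = np.random.rand(64, 224, 224, 3)
y_train = np.random.randint(0, 2, size=64)

datagen = ImageDataGenerator(rotation_range=90, horizontal_flip=True,
                             vertical_flip=True, zoom_range=0.1)

os.makedirs('data/augmented', exist_ok=True)
flow = datagen.flow(x_train, y_train, batch_size=32,
                    save_to_dir='data/augmented', save_prefix='aug',
                    save_format='jpeg')

# Drawing batches once, before training, materializes the augmented copies on disk
for _ in range(5 * len(x_train) // 32):
    next(flow)
```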
- Class-balanced the dataset by oversampling the minority class (an oversampling sketch follows this entry)
- A weighted loss function is not an equivalent fix when training with mini-batches https://datascience.stackexchange.com/questions/44755/why-doesnt-class-weight-resolve-the-imbalanced-classification-problem?newreg=4985fd7dac8543a5a29355c9f6686d5a
- The oversampling reuses the same data augmentation procedure
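A minimal sketch of balancing the two classes by oversampling the minority class; in the actual pipeline the extra minority samples come from the augmentation procedure mentioned above, so plain duplication with replacement here is only a stand-in, and the variable names are assumptions.

```python
import numpy as np

def oversample_minority(x, y):
    """Duplicate minority-class samples (with replacement) until both classes have equal counts."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    extra = np.random.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.random.permutation(np.concatenate([majority, minority, extra]))
    return x[idx], y[idx]
```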
- Empirically justify why we need more training data than the provided 2k samples
- There is potentially no need for validation data if we don't do early stopping
- Improve the scripts' CLI (an argparse sketch follows this list):
  - Validate file paths
  - Write help menus
  - Mark all required arguments as required
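A sketch of what the improved CLI could look like with argparse: required arguments marked as such, help strings, and a path-validating type; the specific argument names and choices are assumptions.

```python
import argparse
import os

def existing_path(p):
    """argparse type that rejects paths that do not exist."""
    if not os.path.exists(p):
        raise argparse.ArgumentTypeError('no such file or directory: %s' % p)
    return p

parser = argparse.ArgumentParser(description='Train a CNN on the ISIC 2017 dataset.')
parser.add_argument('--data-dir', type=existing_path, required=True,
                    help='directory containing the preprocessed images')
parser.add_argument('--model', choices=['vgg16', 'inceptionv3', 'custom'], required=True,
                    help='network architecture to train')
parser.add_argument('--epochs', type=int, default=50, help='number of training epochs')
args = parser.parse_args()
```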
- The layers which are cut off from the pretrained model are still present in TensorFlow's computational graph, which means it is doing unnecessary computation (see the sketch below)
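One way to avoid that wasted computation, sketched below on VGG16, is to rebuild the model from the pretrained network's input up to the last layer that is actually kept, so the discarded layers never enter the graph; the 'block4_pool' cutoff is an assumed example.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

full = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Only layers up to (and including) the chosen cutoff remain in the new graph
truncated = Model(inputs=full.input,
                  outputs=full.get_layer('block4_pool').output)
```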
- Empirically verify if regularization is necessary
- Refactored data pipeline:
  - Use the ISIC 2017 dataset, which allows for performance comparison with the top competitors
  - Crop a square from the center of each image so that resizing it to fit the CNN input tensor preserves the aspect ratio (alternatively, https://github.com/keras-team/keras-preprocessing/pull/81 looks promising; a crop-and-resize sketch follows this list)
  - Data augmentation is performed on the CPU in parallel with training on the GPU
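A sketch of the center square crop followed by a resize, using Pillow and NumPy; the 224×224 target size is an assumption.

```python
import numpy as np
from PIL import Image

def center_crop_resize(path, size=224):
    """Crop the largest centered square from an image and resize it to size x size."""
    img = Image.open(path)
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return np.asarray(square.resize((size, size), Image.BILINEAR))
```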
- Introduced elastic (L1 and L2) weight regularization (see the sketch after this entry)
- Disregarded dropout because it is "too magic"
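A sketch of how the elastic (L1 + L2) penalty might be attached to the trainable classifier head in Keras; the layer sizes and penalty coefficients are assumptions.

```python
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Elastic net penalty: L1 encourages sparsity, L2 keeps weights small
elastic = regularizers.l1_l2(l1=1e-5, l2=1e-4)

head = Sequential([
    Dense(256, activation='relu', kernel_regularizer=elastic, input_shape=(2048,)),
    Dense(1, activation='sigmoid', kernel_regularizer=elastic),
])
```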
- Started defining experiments (a grid-enumeration sketch follows this list)
  - There is an allowed range of values for each hyperparameter, and their Cartesian product gives all combinations of values we want to try
  - The ranges must be kept small to limit the number of combinations and hence the computation required
  - There is no need for cross-validation during hyperparameter search: I am not interested in finding the optimal hyperparameter values, but in trying them all to see how they affect performance and to study transfer learning by drawing conclusions from plots
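A sketch of how the Cartesian product of allowed hyperparameter values could be enumerated with itertools.product; the particular hyperparameters and ranges are illustrative assumptions.

```python
from itertools import product

grid = {
    'architecture': ['vgg16', 'inceptionv3'],
    'frozen_blocks': [0, 2, 4],
    'learning_rate': [1e-2, 1e-3],
    'l2': [0.0, 1e-4],
}

# Every combination of the allowed values, one dict per experiment
experiments = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(experiments))  # 2 * 3 * 2 * 2 = 24 runs
```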
- Settle on a simple cyclical learning rate schedule, potentially one for each network architecture
- Switch out the Adam optimizer for plain SGD with Nesterov momentum (a sketch combining this with the cyclical schedule follows)
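A sketch of a simple triangular cyclical learning rate implemented as a per-epoch schedule, paired with SGD with Nesterov momentum; the base/max rates and cycle length are assumptions.

```python
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD

BASE_LR, MAX_LR, CYCLE = 1e-4, 1e-2, 10  # assumed values

def triangular_clr(epoch):
    """Ramp the learning rate linearly up and back down over each cycle of CYCLE epochs."""
    half = CYCLE / 2.0
    phase = epoch % CYCLE
    scale = phase / half if phase < half else (CYCLE - phase) / half
    return BASE_LR + (MAX_LR - BASE_LR) * scale

optimizer = SGD(lr=1e-2, momentum=0.9, nesterov=True)
callbacks = [LearningRateScheduler(triangular_clr)]
# model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=50, callbacks=callbacks)
```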
- Start working on the custom CNN
- Validation F1-score is always zero and I am not sure why
- Images from the ISIC challenges do not have a fixed size
- Very few papers describe exactly how they process ISIC data, but the best way seems to be to crop the center into a square and resize to fit the CNN input tensor
- It does not make sense to freeze layers in the middle of convolutional blocks, because each block progressively builds up a higher-level feature, so it only makes sense to freeze entire blocks (see the sketch below)
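A sketch of freezing whole convolutional blocks rather than individual layers, relying on VGG16's block-prefixed layer names; freezing the first three blocks is just an assumed example.

```python
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

FROZEN_BLOCKS = ('block1', 'block2', 'block3')  # assumed cutoff

# Freeze every layer belonging to the chosen blocks, leave the rest trainable
for layer in base.layers:
    layer.trainable = not layer.name.startswith(FROZEN_BLOCKS)
```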
- Transfer learning definition, strategies and taxonomy
- In general most strategies take some higher-level representation of the input features and build a classifier on top of that, but I think we should only consider NN classifiers because, e.g., SVMs with the kernel trick are equivalent to feed-forward neural networks with a non-linear activation function (citation needed)
- Validation F1-score is frequently zero because FN and TN are zero, likely because of a problem in splitting the data into train, validation, and test
- Transfer learning from models trained on ImageNet may not be adequate for binary skin cancer classification, because ImageNet is a very diverse dataset whereas the ISIC-Archive is a very restricted subset of skin images; very likely only the initial layers of any model trained on ImageNet are (theoretically) useful for transfer learning. Experimentally visualize each considered network's blocks with https://github.com/philipperemy/keract (see the sketch below)
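A rough sketch of how keract might be used to look at intermediate activations for one input; the get_activations/display_activations calls are assumed from keract's README, and the random array stands in for a preprocessed skin image.

```python
import numpy as np
from keract import get_activations, display_activations
from tensorflow.keras.applications import VGG16

model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Placeholder input with shape (1, 224, 224, 3) standing in for a preprocessed lesion image
x = np.random.rand(1, 224, 224, 3).astype(np.float32)

activations = get_activations(model, x)   # assumed: maps layer names to activation arrays
display_activations(activations)          # assumed: renders one figure per layer
```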