Walkthrough: AlexNet
Contents:
- About Minerva owl.net
- About ImageNet
- About AlexNet
- Training AlexNet using Minerva
- Multi-view classification using AlexNet
- Using AlexNet to extract features
owl.net is a DNN training framework built on owl, Minerva's Python interface. The main purposes of this package are:
- Provide a simple way for Minerva users to train deep neural networks for computer vision problems.
- Provide a prototype showing how to build user applications that take advantage of Minerva.
We borrow Caffe's well-defined network and solver configuration file formats, but execution is carried out by the Minerva engine. It showcases Minerva's flexible interface (Caffe's main functionality is rebuilt in several hundred lines) and its computational efficiency (multi-GPU training).
See also: https://github.com/dmlc/minerva/tree/master/owl/owl/net and the API documentation
If you are not familiar with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), please see here. The classification task contains 1.28 million images belonging to 1000 classes.
To make I/O efficient, we recommend converting the original images to LMDB after you download the dataset. The tool provided by Caffe can do the conversion. After converting the images, compute the mean value of each pixel over the dataset. During training, the mean values are subtracted from each image to produce zero-mean input. The mean file for ILSVRC12 can be downloaded with the script provided by Caffe.
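The mean-subtraction step can be sketched in NumPy. This is only an illustration on a tiny synthetic array, not the Caffe tool or the owl data provider; the function names and the (N, H, W, C) layout are assumptions made for the example.

```python
import numpy as np

def compute_mean_image(images):
    """Per-pixel mean over a stack of images shaped (N, H, W, C)."""
    return images.astype(np.float64).mean(axis=0)

def subtract_mean(image, mean_image):
    """Center one image by subtracting the dataset mean pixel by pixel."""
    return image.astype(np.float64) - mean_image

# Tiny synthetic "dataset": 4 images of 2x2 pixels with 3 channels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 2, 2, 3))

mean_image = compute_mean_image(images)
centered = np.stack([subtract_mean(img, mean_image) for img in images])

# After centering, the per-pixel mean over the dataset is zero.
print(np.allclose(centered.mean(axis=0), 0.0))
```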
AlexNet was proposed for ILSVRC2012. It won the ILSVRC2012 classification task, beating the non-DNN methods by a large accuracy margin. It contains 5 convolutional layers and 3 fully connected layers. During training, randomness is introduced by the data augmentation process and the dropout layers. The details are shown below and should be defined in the configuration file provided by Caffe (modify the data path and mean_file path to match your own). Note that we currently do not support convolutions with more than one group, because a single recent GPU has enough RAM to hold the whole model.
Layer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Type | conv+max+norm | conv+max+norm | conv | conv | conv+max | full | full | full |
Channels | 96 | 256 | 384 | 384 | 256 | 4096 | 4096 | 1000 |
Filter Size | 11*11 | 5*5 | 3*3 | 3*3 | 3*3 | - | - | - |
Convolution Stride | 4*4 | 1*1 | 1*1 | 1*1 | 1*1 | - | - | - |
Pooling Size | 3*3 | 3*3 | - | - | 3*3 | - | - | - |
Pooling Stride | 2*2 | 2*2 | - | - | 2*2 | - | - | - |
Padding Size | 2*2 | 1*1 | 1*1 | 1*1 | 1*1 | - | - | - |
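To read the table, it can help to compute how a convolution's filter size, stride, and padding determine its output width. The sketch below uses the usual rounding-down formula; the function name and the 224-pixel input width are assumptions for illustration and are not stated in this walkthrough.

```python
def conv_output_size(size, kernel, stride, pad):
    """Spatial output size of a convolution, rounding down."""
    return (size + 2 * pad - kernel) // stride + 1

# Layer 1 from the table: 11*11 filter, stride 4, padding 2.
# The 224-pixel input width is an assumption, not taken from the text.
layer1 = conv_output_size(224, kernel=11, stride=4, pad=2)
print(layer1)  # 55

# A 3*3 filter with stride 1 and padding 1 (layers 3 to 5) keeps the size.
print(conv_output_size(20, kernel=3, stride=1, pad=1))  # 20
```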
We implemented the DNN training logic in trainer.py. The main body of the training and update code is below; it shows explicitly how to train and update using multiple GPUs:
# wunits, wgrad, and bgrad are defined outside this excerpt
for iteridx in range(self.snapshot * self.owl_net.solver.snapshot, self.owl_net.solver.max_iter):
    # train on multi-gpu: each GPU runs forward/backward and records its gradients
    for gpuid in range(self.num_gpu):
        owl.set_device(self.gpu[gpuid])
        self.owl_net.forward('TRAIN')
        self.owl_net.backward('TRAIN')
        for wid in wunits:
            wgrad[gpuid].append(self.owl_net.units[wid].weightgrad)
            bgrad[gpuid].append(self.owl_net.units[wid].biasgrad)
    # weight update
    for i in range(len(wunits)):
        wid = wunits[i]
        # pick the GPU that aggregates this unit (integer division)
        upd_gpu = i * self.num_gpu // len(wunits)
        owl.set_device(self.gpu[upd_gpu])
        # sum the gradients from all other GPUs onto upd_gpu
        for gid in range(self.num_gpu):
            if gid == upd_gpu:
                continue
            wgrad[upd_gpu][i] += wgrad[gid][i]
            bgrad[upd_gpu][i] += bgrad[gid][i]
        self.owl_net.units[wid].weightgrad = wgrad[upd_gpu][i]
        self.owl_net.units[wid].biasgrad = bgrad[upd_gpu][i]
        self.owl_net.update(wid)
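The index arithmetic in the update loop assigns each weight unit to one GPU, which then sums the gradients from the others. The standalone NumPy sketch below mimics that pattern without Minerva; `owner_gpu`, the count of eight units, and the fake gradient values are hypothetical and chosen only to make the behaviour visible.

```python
import numpy as np

def owner_gpu(unit_index, num_units, num_gpu):
    """GPU that aggregates the gradient of a unit (same arithmetic as upd_gpu)."""
    return unit_index * num_gpu // num_units

num_units, num_gpu = 8, 2  # hypothetical sizes
owners = [owner_gpu(i, num_units, num_gpu) for i in range(num_units)]
print(owners)  # [0, 0, 0, 0, 1, 1, 1, 1]

# Fake per-GPU gradients: GPU g holds a vector filled with g + 1 for every unit.
grads = [[np.full(2, float(g + 1)) for _ in range(num_units)]
         for g in range(num_gpu)]

# Each owner GPU accumulates the gradients held by the other GPUs.
for i in range(num_units):
    dst = owners[i]
    for g in range(num_gpu):
        if g != dst:
            grads[dst][i] += grads[g][i]

print(grads[owners[0]][0])  # [3. 3.]
```

Spreading the units over the GPUs this way means no single device has to perform every aggregation.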
The training procedure is controlled by the solver file. The solver must define the following information:
- network configuration file
- snapshot saving directory
- max iteration
- testing interval
- test iteration
- snapshot saving interval
- learning rate tuning strategy
- momentum
- weight decay
AlexNet usually needs to traverse the training set 70-90 times before converging, and the learning rate should be lowered several times along the way. The standard solver for AlexNet can be found here. Note that you should change the network configuration file path and the snapshot saving path to your own paths.
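One common way to lower the learning rate several times is Caffe's "step" policy, which multiplies the rate by a fixed factor every fixed number of iterations. The sketch below assumes that policy is the one selected in the solver; the function name and the numeric values are illustrative, not read from the solver file linked above.

```python
def step_learning_rate(base_lr, gamma, stepsize, iteration):
    """Caffe-style "step" policy: base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Illustrative values only (assumed, not taken from this walkthrough).
base_lr, gamma, stepsize = 0.01, 0.1, 100000

for iteration in (0, 100000, 250000):
    print(iteration, step_learning_rate(base_lr, gamma, stepsize, iteration))
```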
Given a solver file in Caffe's format, use the following command under the scripts folder to start training:
./net_trainer.py <solver_file> <SNAPSHOT> <NUM_GPU>
- solver_file is the file name in Caffe's solver format.
- SNAPSHOT is the index of the snapshot to start with (default: 0).
- NUM_GPU is the number of GPUs to use.
If NUM_GPU is greater than 1, the code slices each training batch into NUM_GPU pieces and runs the forward and backward passes (FF/BP) on them in parallel. The update is executed synchronously, so training with one GPU or with NUM_GPU GPUs produces the same result.
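The reason a synchronous update gives the same result can be checked numerically: when the loss is a sum over examples, the gradients of the slices add up to the gradient of the whole batch. In the sketch below a small least-squares model stands in for the network; the model, the names, and the sizes are all assumptions made for illustration.

```python
import numpy as np

def grad_sum(w, x, y):
    """Gradient of 0.5 * sum((x @ w - y) ** 2) with respect to w."""
    return x.T @ (x @ w - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # one batch of 8 examples
y = rng.normal(size=8)
w = rng.normal(size=3)

num_gpu = 4                   # hypothetical number of slices
full = grad_sum(w, x, y)

# Slice the batch, compute one gradient per slice, then sum them.
parts = zip(np.array_split(x, num_gpu), np.array_split(y, num_gpu))
sliced = sum(grad_sum(w, xs, ys) for xs, ys in parts)

print(np.allclose(full, sliced))
```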
The SNAPSHOT parameter tells the system to look for the saved model with that index under the "snapshot saving directory". If the model is found and its weight dimensions match the network configuration file, OwlNet loads the model and continues training. Otherwise, it initializes the weights according to the weight_filler parameter in the configuration file.
An example call:
./net_trainer.py /path/to/solver_file 0 4
owl supports top-1 and top-5 testing in both single-view and multi-view modes.
In multi-view classification, five patches are cropped from each image (upper-left, lower-left, upper-right, lower-right, and center), and a horizontally flipped version of each patch is also generated. Our data provider handles this augmentation for you when you set the MULTIVIEW flag to 1 when calling the script.
Each data batch is turned into ten batches, one per view. The final prediction is made after summing the softmax values of the different views. The main body of the code is below:
for testiteridx in range(self.owl_net.solver.test_iter[0]):
    # accumulate the softmax output over the ten views
    for i in range(10):
        self.owl_net.forward('TEST')
        if i == 0:
            softmax_val = loss_unit.ff_y
            batch_size = softmax_val.shape[1]
            softmax_label = loss_unit.y
        else:
            softmax_val = softmax_val + loss_unit.ff_y
    # predict from the summed softmax and count correct answers
    test_num += batch_size
    predict = softmax_val.argmax(0)
    truth = softmax_label.argmax(0)
    correct = (predict - truth).count_zero()
    acc_num += correct
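The ten views themselves can be illustrated with plain NumPy slicing. This is a sketch of the cropping scheme described above, not the owl data provider; `ten_views`, the (H, W, C) layout, and the toy sizes are assumptions.

```python
import numpy as np

def ten_views(image, crop):
    """Return 10 views of an (H, W, C) image: 4 corners + center, plus flips."""
    h, w = image.shape[:2]
    top, left = (h - crop) // 2, (w - crop) // 2
    # Order: upper-left, lower-left, upper-right, lower-right, center.
    offsets = [(0, 0), (h - crop, 0), (0, w - crop), (h - crop, w - crop), (top, left)]
    patches = [image[r:r + crop, c:c + crop] for r, c in offsets]
    flipped = [p[:, ::-1] for p in patches]  # horizontal flip of each patch
    return np.stack(patches + flipped)

image = np.arange(6 * 6 * 3).reshape(6, 6, 3)  # toy 6x6 image with 3 channels
views = ten_views(image, crop=4)
print(views.shape)  # (10, 4, 4, 3)
```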
For top-5 prediction, set the top_k parameter to 5 in the accuracy layer of the network configuration file:
accuracy_param {
  top_k: 5
}
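What top_k changes is the rule for counting a prediction as correct: the true class only has to appear among the k highest-scoring classes. A minimal NumPy sketch of that rule follows; the (num_classes, batch) layout mirrors the argmax(0) convention used in the test loop, while the function name and the toy scores are made up for illustration.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """scores: (num_classes, batch); labels: (batch,) integer class ids."""
    top_k = np.argsort(-scores, axis=0)[:k]              # k best classes per sample
    hits = (top_k == labels[np.newaxis, :]).any(axis=0)  # true class among them?
    return hits.mean()

# Three classes (rows) and three samples (columns).
scores = np.array([[0.1, 0.6, 0.2],
                   [0.7, 0.3, 0.3],
                   [0.2, 0.1, 0.5]])
labels = np.array([1, 1, 2])

print(top_k_accuracy(scores, labels, k=1))  # 2 of 3 samples correct
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```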
Use the following command to test a trained network:
./net_tester.py <solver_file> <softmax_layer_name> <accuracy_layer_name> <SNAPSHOT> <GPU_IDX> <MULTIVIEW>
- solver_file is the file name in Caffe's solver format.
- softmax_layer_name indicates the layer that produces the softmax distribution.
- accuracy_layer_name indicates the layer that produces the accuracy. To get top-5 accuracy, make sure to declare accuracy_param with top_k: 5 in the network configuration file.
- SNAPSHOT is the index of the snapshot to test with (default: 0).
- GPU_IDX is the id of the GPU on which to run the test (default: 0).
- MULTIVIEW indicates whether to use multi-view testing (default: 0).
An example that runs multi-view testing:
./net_tester.py /path/to/solver.txt loss3/loss3 loss3/top-5 0 1 1
When AlexNet finishes training, it can be used as a feature extractor for other computer vision applications. The features are stored in a txt file as a matrix of size [num_img, feature_dimension]. Use the following command to extract the features of a given layer from the trained network:
./feature_extractor.py <solver_file> <layer_name> <feature_path> <SNAPSHOT> <GPU_IDX>
- layer_name is the name of the layer to extract features from.
- The features are written to feature_path in readable float format (not binary).
- SNAPSHOT is the index of the snapshot to use (default: 0).
- GPU_IDX is the id of the GPU on which to run the extraction (default: 0).
Example:
./feature_extractor.py /path/to/solver.txt fc6 /path/to/save/feature.txt 60 1
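Once the features are on disk, they can be read back for downstream use. The sketch below assumes the file holds one row per image with whitespace-separated floats, which is a guess about the exact layout; it writes a small stand-in file first so that it runs on its own.

```python
import os
import tempfile
import numpy as np

# Stand-in for the extractor's output: 3 images with 4-dimensional features.
# Whitespace-separated values are an assumption about the file layout.
features = np.arange(12, dtype=np.float64).reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "feature.txt")
np.savetxt(path, features)

# Load the matrix back as [num_img, feature_dimension].
loaded = np.loadtxt(path)
num_img, feature_dimension = loaded.shape
print(num_img, feature_dimension)  # 3 4
```

If the real file uses a different separator, passing `delimiter=` to `np.loadtxt` would be the place to adjust.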