
Knowledge distillation methods implemented with Tensorflow (currently 8 methods; more will be added).


Knowledge Distillation Methods with Tensorflow

Knowledge distillation is a method for enhancing a student network with a teacher network's knowledge. New knowledge distillation methods are proposed every year, but each paper runs experiments with different networks and compares against different baselines, and each method is implemented by its own author. A new researcher who wants to study knowledge distillation therefore has to find or re-implement all of the methods, which is hard work. To reduce this burden, I publish code adapted from my research code. I will keep adding code and knowledge distillation algorithms, and everything will be implemented in Tensorflow.

If you want a new method added, please let me know :)

Implemented Knowledge Distillation Methods

The methods below are implemented, and based on the insight of TAKD I have grouped them into categories. I think these categories are meaningful, but if you see a problem with them, please let me know :)

Response-based Knowledge

Defines knowledge by the neural response of a hidden layer or the output layer of the network
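The classic response-based method is the soft-logits loss of Hinton et al.: the student matches the teacher's temperature-softened output distribution. A minimal NumPy sketch (function names are my own, not this repository's API):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-softened softmax; higher T gives a softer distribution
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_logits_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between softened teacher and student predictions.
    # The T^2 factor keeps the gradient scale comparable to the hard-label loss.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return (T ** 2) * np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1))
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise.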

Multi-connection Knowledge

Increases the amount of knowledge by sensing several points of the teacher network
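Attention transfer (AT) is one example of this category: at several connection points, the student matches normalized spatial attention maps derived from the teacher's feature maps. An illustrative NumPy sketch (not the repository's implementation):

```python
import numpy as np

def attention_map(feature):
    # feature: (H, W, C) activation; attention = per-location sum of squared channels
    a = np.sum(feature ** 2, axis=-1).ravel()
    return a / (np.linalg.norm(a) + 1e-8)  # L2-normalize the flattened map

def at_loss(student_feats, teacher_feats):
    # Sum of squared differences between attention maps at each sensed point
    return sum(np.sum((attention_map(s) - attention_map(t)) ** 2)
               for s, t in zip(student_feats, teacher_feats))
```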

Shared-representation Knowledge

Defines knowledge by the relation between two feature maps
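FSP is the representative method here: knowledge is the Gram-style "flow" matrix between two feature maps of the same network, and the student matches the teacher's flow. A NumPy sketch under the assumption of channels-last `(H, W, C)` features (function names are my own):

```python
import numpy as np

def fsp_matrix(f1, f2):
    # f1: (H, W, C1), f2: (H, W, C2) -> (C1, C2) relation ("flow") matrix
    h, w = f1.shape[:2]
    return f1.reshape(h * w, -1).T @ f2.reshape(h * w, -1) / (h * w)

def fsp_loss(student_f1, student_f2, teacher_f1, teacher_f2):
    # Mean squared error between student and teacher flow matrices
    return np.mean((fsp_matrix(student_f1, student_f2)
                    - fsp_matrix(teacher_f1, teacher_f2)) ** 2)
```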

Relational Knowledge

Defines knowledge by intra-data relations
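RKD illustrates this category: instead of matching individual outputs, the student matches the teacher's pairwise distance structure within a batch. A sketch of RKD's distance-wise loss in NumPy (assumes at least two distinct embeddings per batch; names are my own):

```python
import numpy as np

def normalized_pairwise_dist(x):
    # x: (N, D) batch of embeddings -> (N, N) distances, scaled by their mean
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d / d[d > 0].mean()

def rkd_distance_loss(student_emb, teacher_emb):
    # Huber (smooth L1) penalty on the difference of distance structures
    diff = np.abs(normalized_pairwise_dist(student_emb)
                  - normalized_pairwise_dist(teacher_emb))
    huber = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return huber.mean()
```

Because distances are normalized by their batch mean, the loss is invariant to a uniform rescaling of the embedding space.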

Experimental Results

The table and plots below are sample results using ResNet.

I used the same hyper-parameters to train each network and tuned only the hyper-parameters of each distillation algorithm, so the results may not be optimal. All numerical values and plots are averages over five trials.

Network architecture

The teacher network is ResNet32 and the student is ResNet8. The student network is well converged (neither over- nor under-fit) so that each distillation algorithm's performance can be evaluated precisely.

Training/Validation plots

| Methods     | Last Accuracy (%) | Best Accuracy (%) |
|-------------|-------------------|-------------------|
| Student     | 71.76             | 71.92             |
| Teacher     | 78.96             | 79.08             |
| Soft-logits | 71.79             | 72.08             |
| FitNet      | 72.74             | 72.96             |
| AT          | 72.31             | 72.60             |
| FSP         | 72.65             | 72.91             |
| DML         | 73.27             | 73.47             |
| KD-SVD      | 73.68             | 73.78             |
| AB          | 72.80             | 73.10             |
| RKD         | 73.40             | 73.48             |

Plan to do

  • Implement the Jacobian matching
