Here is a brief list of existing model speedup techniques, with links:
- Compression
- Quantization
- Hashing
- Pruning
- Vector quantization
- Huffman coding
- Factorizations
- Knowledge distillation
- Low-bit networks
- Other (not sure how to categorize)
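To make two of the listed techniques concrete, here is a minimal pure-Python sketch of uniform 8-bit (affine) quantization and magnitude pruning applied to a flat list of weights. The function names `quantize_uint8`, `dequantize` and `magnitude_prune` are hypothetical and chosen for illustration only. Real frameworks such as TensorFlow and PyTorch ship their own quantization and pruning tooling, which this sketch does not reproduce.

```python
def quantize_uint8(weights):
    """Affine 8-bit quantization (hypothetical helper): map [min, max] onto 0..255."""
    lo, hi = min(weights), max(weights)
    # Step size between neighbouring integer levels; guard against a constant tensor.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # Integer code that represents the real value 0.0.
    zero_point = round(-lo / scale)
    # Round each weight to its nearest level and clamp into the uint8 range.
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from their 8-bit codes."""
    return [(v - zero_point) * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    w = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale, zp = quantize_uint8(w)
    print("codes:", q, "scale:", scale, "zero point:", zp)
    print("dequantized:", dequantize(q, scale, zp))
    print("pruned 50%:", magnitude_prune([0.1, -2.0, 0.05, 3.0], 0.5))
```

Quantization stores each weight in one byte instead of four (float32) at the cost of a rounding error of at most about one `scale` step. Pruning produces zeros that sparse storage formats or sparse kernels can then exploit.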
Blog posts:
- How to Quantize Neural Networks with TensorFlow
- Compression of neural networks
- Compressing and regularizing deep neural networks
- Deep Compression and EIE slides and video
- Pruning deep neural networks to make them fast and small (in PyTorch)
Comments and contributions are welcome!