## Roadmap

### Functionality

- Batched inference
- Fine-grained KV cache management
- Explore tree sparsity
- Fine-tune Medusa heads together with the LM head from scratch
- Distill from any model without access to the original training data

### Integration

- Local deployment
  - mlc-llm
  - exllama
  - llama.cpp
- Serving
  - vllm
  - lightllm
  - TGI
  - TensorRT