Bug fixes and Llama 2 inference support
This release:
- adds grouped-query attention (GQA) support
- changes the inference activation memory calculation to assume only the maximum-size tensor buffer is held at a time (see the activation sketch after this list)
- fixes the KV cache size calculation (see the KV cache sketch after this list)
- adds a GPU cost analysis to the inference analysis (see the cost sketch after this list)
- adds a Llama 2 inference case study
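
The KV cache fix and the GQA support interact: with grouped-query attention, the cache stores keys and values for the (smaller) number of KV heads rather than the full number of query heads. Below is a minimal sketch of the standard size formula; the function name and signature are illustrative, not this release's actual API.

```python
def kv_cache_bytes(
    batch_size: int,
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,  # with GQA, smaller than the number of query heads
    head_dim: int,
    bytes_per_element: int = 2,  # fp16/bf16
) -> int:
    """Total KV cache size in bytes; the leading 2 covers keys and values."""
    return (
        2 * num_layers * batch_size * seq_len
        * num_kv_heads * head_dim * bytes_per_element
    )
```

For Llama 2 70B (80 layers, 8 KV heads, head dim 128) at batch size 1 and sequence length 4096 in fp16, this gives about 1.25 GiB; with full multi-head attention (64 KV heads) the same cache would be 8x larger, roughly 10 GiB.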
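
One way to read the new activation assumption: inference keeps no activations for a backward pass, so peak activation memory is bounded by the largest single intermediate tensor rather than the sum over all layers. A sketch under that assumption, with a hypothetical helper name:

```python
def inference_activation_memory_bytes(tensor_buffer_bytes: list[int]) -> int:
    """Peak inference activation memory under the maximum-buffer assumption.

    No activations are stored for backward during inference, so only the
    largest intermediate tensor buffer is assumed to be live at one time.
    """
    return max(tensor_buffer_bytes)
```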
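
The GPU cost analysis presumably converts inference latency into dollars; here is a minimal sketch assuming cost scales with GPU-seconds at an hourly price. The function name, parameters, and pricing model are assumptions, not the release's actual implementation.

```python
def gpu_cost_per_1k_tokens(
    latency_s: float,           # end-to-end latency of the request
    num_gpus: int,              # GPUs used to serve it
    gpu_price_per_hour: float,  # e.g. cloud on-demand price per GPU
    output_tokens: int,         # tokens generated within that latency
) -> float:
    """Dollar cost per 1k generated tokens under a GPU-seconds pricing model."""
    gpu_hours = latency_s * num_gpus / 3600.0
    return gpu_hours * gpu_price_per_hour * 1000.0 / output_tokens
```

For example, at $2/GPU-hour on 8 GPUs, generating 512 tokens in 10 s costs about $0.044 per request, or roughly $0.087 per 1k tokens.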