Use a local LLM with a custom Gradio chat UI.
You can also check the containerize_mode directory for a similar, more complex example.
The model is downloaded from Hugging Face. Create a Hugging Face account and configure your API token to avoid being throttled during the download.
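One way to make the token available, assuming the `huggingface_hub` package; the `hf_...` value is a placeholder for a real token from your Hugging Face account settings:

```python
import os

# Option 1: set the standard environment variable before any download
# (huggingface_hub picks up HF_TOKEN automatically).
os.environ["HF_TOKEN"] = "hf_your_token_here"  # placeholder, not a real token

# Option 2: log in programmatically, which persists the token locally.
# from huggingface_hub import login
# login(token=os.environ["HF_TOKEN"])
```

Setting the environment variable in your shell profile works just as well and keeps the token out of the source code.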
See the parent README.md for the generic requirements and setup instructions.