This project is a template for streaming llama.cpp output through a Gradio interface. Three usage examples are included (the TEXT mode is sketched below):
- TEXT generation mode.
- JSON generation mode.
- Non-native function calling via llama.cpp (lightweight models struggle with this kind of generation).
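As an illustration of the TEXT mode, here is a minimal sketch of streaming llama.cpp output into a Gradio chat, assuming the `llama-cpp-python` bindings; the model path, context size and generation setup are placeholders, not values taken from this repository:

```python
# Minimal streaming sketch; paths and parameters are illustrative only.
import gradio as gr
from llama_cpp import Llama

llm = Llama(model_path="src/models/model-q4_K.gguf", n_ctx=4096, verbose=False)

def stream_reply(message, history):
    """Yield the growing reply so Gradio renders tokens as they arrive."""
    partial = ""
    for chunk in llm.create_chat_completion(
        messages=[{"role": "user", "content": message}],
        stream=True,
    ):
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            partial += delta["content"]
            yield partial

gr.ChatInterface(stream_reply).launch()
```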
To install all dependencies and run the project locally, follow these steps:
- Create a virtual environment and activate it:

  ```bash
  conda create -n fourm python=3.10 -y
  conda activate fourm
  pip install --upgrade pip  # enable PEP 660 support
  pip install -e .
  ```
- Install the required Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the model: ensure you have `wget` installed. You can download the model using:

  ```bash
  wget -P src/models/ https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf/resolve/main/model-q4_K.gguf
  ```

  Alternatively, download any model in GGUF format and place it in the `src/models` directory. Don't forget to change the `MODEL_PATH` variable in the `.env` file to specify which model you want to use (a minimal loading sketch appears after these steps).
- Run the Gradio app: Navigate to the `src` directory and run the application:

  ```bash
  python3 src/ text
  ```

  The `text` argument can also be replaced with `json` or `function`.
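How that mode argument might be dispatched: a hypothetical sketch of an entry point like `src/__main__.py`, assuming each example module exposes a Gradio `demo` object (module and attribute names are assumptions; the repository's actual code may differ):

```python
# Hypothetical entry point; the repository's src/__main__.py may differ.
import sys

def build_demo(mode: str):
    """Return the Gradio app for the requested example mode."""
    if mode == "text":
        from examples.text_chat import demo      # plain text chat
    elif mode == "json":
        from examples.json_chat import demo      # JSON-constrained output
    elif mode == "function":
        from examples.function_chat import demo  # non-native function calling
    else:
        raise SystemExit(f"Unknown mode: {mode!r} (expected text, json or function)")
    return demo

if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "text"
    build_demo(mode).launch(server_name="127.0.0.1", server_port=8000)
```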
Once the server is running, open your web browser and navigate to http://127.0.0.1:8000 to interact with the Gradio interface. You can input text and get responses generated by the LLaMA model in real time.
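As noted in the model-download step, the model file is selected through the `MODEL_PATH` variable in `.env`. A minimal sketch of reading it, assuming `python-dotenv` is used; this loader is an illustration, and the repository's `src/env.py` may differ:

```python
# .env (project root), hypothetical contents:
# MODEL_PATH=src/models/model-q4_K.gguf

# Illustrative loader; the real src/env.py may differ.
import os

from dotenv import load_dotenv  # pip install python-dotenv
from llama_cpp import Llama

load_dotenv()  # read variables from .env into the process environment
MODEL_PATH = os.getenv("MODEL_PATH", "src/models/model-q4_K.gguf")

llm = Llama(model_path=MODEL_PATH, n_ctx=4096, verbose=False)
```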
```text
LLAMA-CPP-WITH-GRADIO
├── Dockerfile
├── assets
├── LICENSE
├── README.md
├── requirements.txt
├── src
│   ├── examples
│   │   ├── function_chat.py
│   │   ├── json_chat.py
│   │   └── text_chat.py
│   ├── __main__.py
│   ├── env.py
│   ├── llama_inference.py
│   └── utils.py
└── weights
    └── download_gguf.py
```
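For reference, `weights/download_gguf.py` suggests the model can also be fetched programmatically. A hypothetical sketch of such a downloader using `huggingface_hub` (this is not the script's actual contents):

```python
# Hypothetical GGUF downloader; weights/download_gguf.py may differ.
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Same repository and quantization as the wget example above.
path = hf_hub_download(
    repo_id="IlyaGusev/saiga_llama3_8b_gguf",
    filename="model-q4_K.gguf",
    local_dir="src/models",
)
print(f"Model saved to {path}")
```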
- Add llama text chat.
- Add llama JSON output example.
- Add llama function usage example.
- Add native function calling.
- Add an example with a multimodal model (llama3.2-vision-instruct).
Feel free to open an issue or submit a pull request. Contributions are welcome!