
LLAMA-CPP-WITH-GRADIO


This project is a template for streaming llama.cpp output through a Gradio interface. Three usage examples are included:

  • TEXT generation mode.
  • JSON generation mode.
  • Non-native function calling via llama.cpp (lightweight models struggle with this kind of generation).
Example
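
As a rough sketch of the core pattern (not the repo's exact code; the model path, context size, and port are assumptions taken from the installation and usage sections below), streaming tokens from llama-cpp-python into a Gradio chat looks roughly like this:

    from llama_cpp import Llama
    import gradio as gr

    # Assumed model path, matching the download step below.
    llm = Llama(model_path="src/models/model-q4_K.gguf", n_ctx=2048)

    def chat(message, history):
        # stream=True makes llama.cpp yield completion chunks as they are generated.
        stream = llm.create_chat_completion(
            messages=[{"role": "user", "content": message}],
            stream=True,
        )
        answer = ""
        for chunk in stream:
            delta = chunk["choices"][0]["delta"]
            answer += delta.get("content", "")
            yield answer  # Gradio re-renders the partial answer on every yield

    gr.ChatInterface(chat).launch(server_port=8000)  # port from the Usage section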


Local Installation

To install all dependencies and run the project locally, follow these steps:

  1. Create a virtual environment and activate it:

    conda create -n fourm python=3.10 -y
    conda activate fourm
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
  2. Install the required Python dependencies:

    pip install -r requirements.txt
  3. Download the model: Ensure you have wget installed. You can download the model using:

    wget -P src/models/ https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf/resolve/main/model-q4_K.gguf

    Alternatively, download any model in GGUF format and place it in the src/models directory. Don't forget to update the MODEL_PATH variable in the .env file to point at the model you want to use.
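
    For example, a minimal .env matching the download command above could look like this (the variable name comes from this README; the exact format depends on how src/env.py reads it):

        # .env — hypothetical example
        MODEL_PATH=src/models/model-q4_K.gguf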

  4. Run the Gradio app: From the repository root, run the application:

    python3 src/ text

    The text argument can also be replaced with json or function, selecting one of the three modes:
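
    Each argument launches one of the example scripts (this mapping is inferred from the project structure below, not stated explicitly):

        python3 src/ text      # streaming text chat (src/examples/text_chat.py)
        python3 src/ json      # JSON generation (src/examples/json_chat.py)
        python3 src/ function  # function-calling demo (src/examples/function_chat.py)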

Usage

Once the server is running, open your web browser and navigate to http://127.0.0.1:8000 to interact with the Gradio interface. You can input text and get responses generated by the Llama model in real time.
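
For the JSON mode, llama-cpp-python can constrain chat output to valid JSON via its response_format parameter. A minimal sketch (the prompt and model path are placeholders, and src/examples/json_chat.py may do this differently):

    from llama_cpp import Llama

    llm = Llama(model_path="src/models/model-q4_K.gguf", n_ctx=2048)

    # response_format={"type": "json_object"} forces the reply to be valid JSON.
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Answer with a single JSON object."},
            {"role": "user", "content": "List three colors."},
        ],
        response_format={"type": "json_object"},
    )
    print(response["choices"][0]["message"]["content"])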

Project Structure

LLAMA-CPP-WITH-GRADIO/
├── Dockerfile
├── assets
├── LICENSE
├── README.md
├── requirements.txt
├── src
│   ├── examples
│   │   ├── function_chat.py
│   │   ├── json_chat.py
│   │   └── text_chat.py
│   ├── __main__.py
│   ├── env.py
│   ├── llama_inference.py
│   └── utils.py
└── weights
    └── download_gguf.py

Tasks

  • Add llama text chat.
  • Add llama JSON output example.
  • Add llama function usage example.
  • Add native function calling.
  • Add an example with a multimodal model (llama3.2-vision-instruct).
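
Because llama.cpp has no built-in tool calling for arbitrary GGUF models, the "non-native" function mode mentioned above generally means prompting the model to emit a JSON call and dispatching it in Python. A rough sketch of that pattern (the tool, prompt, and parsing below are illustrative assumptions, not the actual code in src/examples/function_chat.py):

    import json
    from llama_cpp import Llama

    llm = Llama(model_path="src/models/model-q4_K.gguf", n_ctx=2048)

    # A toy tool the model is allowed to "call" (illustrative only).
    def get_weather(city: str) -> str:
        return f"Sunny in {city}"

    TOOLS = {"get_weather": get_weather}

    SYSTEM = (
        'You can call one tool. Reply ONLY with JSON like '
        '{"name": "get_weather", "arguments": {"city": "..."}}.'
    )

    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What is the weather in Paris?"},
        ],
        response_format={"type": "json_object"},  # keep the reply parseable
    )
    call = json.loads(out["choices"][0]["message"]["content"])
    result = TOOLS[call["name"]](**call["arguments"])
    print(result)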

Contribution

Feel free to open an issue or submit a pull request. Contributions are welcome!
