An easy-to-use tool made with Gradio, LangChain, and Torch.
- Context-aware Streaming Chatbot.
- Inbuilt code syntax highlighting.
- Load any LLM repo directly from HuggingFace.
- Supports both CPU & CUDA modes.
- Enable LLM inference with llama.cpp using llama-cpp-python.
- Convert models (Safetensors, pt to gguf, etc.).
- Customize LLM inference parameters (`n_gpu_layers`, `temperature`, `max_tokens`, etc.).
- Real-time text generation via websockets, enabling seamless integration with different frontend frameworks.
To use LLMinator, follow these simple steps:
```
git clone https://github.com/Aesthisia/LLMinator.git
cd LLMinator
pip install -r requirements.txt
```
Build LLMinator with llama.cpp:
- Using `make`:
  - On Linux or macOS: run `make`.
  - On Windows:
    - Download the latest fortran version of w64devkit.
    - Extract `w64devkit` on your PC.
    - Run `w64devkit.exe`.
    - Use the `cd` command to reach the `LLMinator` folder.
    - From here you can run `make`.
- Using `CMake`: run `mkdir build`, then `cd build`, then `cmake ..`.
- Run the LLMinator tool using the command `python webui.py`.
- Access the web interface by opening http://127.0.0.1:7860 in your browser.
- Start interacting with the chatbot and experimenting with LLMs!
Check out this YouTube video to follow the installation steps.
Argument Command | Default | Description |
---|---|---|
--host | 127.0.0.1 | Host or IP address on which the server will listen for incoming connections |
--port | 7860 | Launch gradio with given server port |
--share | False | This generates a public shareable link that you can send to anybody |
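
To make the defaults in the table concrete, here is a minimal sketch of a parser that accepts the same three flags. It is an illustration only, not the code inside `webui.py`: the `build_parser` name is hypothetical, and treating `--share` as an on/off switch is an assumption based on its `False` default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; not LLMinator's actual code."""
    parser = argparse.ArgumentParser(description="Launch the web UI (illustrative sketch)")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Host or IP address the server listens on")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port for the Gradio server")
    # Assumption: --share is a boolean switch that stays off unless passed.
    parser.add_argument("--share", action="store_true",
                        help="Generate a public shareable link")
    return parser


# No flags: every option falls back to the default listed in the table.
defaults = build_parser().parse_args([])
print(defaults.host, defaults.port, defaults.share)

# Overriding the defaults on the command line.
custom = build_parser().parse_args(["--host", "0.0.0.0", "--port", "8080", "--share"])
print(custom.host, custom.port, custom.share)
```

Under the same assumption, a launch such as `python webui.py --host 0.0.0.0 --port 8080 --share` would listen on all interfaces, use port 8080, and request a public link.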
Connect to ws://localhost:7861/ for real-time text generation. Submit prompts and receive responses through the websocket connection.
Integration with Frontends:
The provided `example/index.html` demonstrates basic usage of text generation through a websocket connection. You can integrate it with any frontend framework, such as React.js.
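
The same endpoint can also be reached from a script. The sketch below uses the third-party `websockets` package and rests on two assumptions this README does not confirm: that the server accepts the prompt as a plain-text message, and that it closes the connection once generation finishes. The names `build_ws_url`, `stream_completion`, and `main` are hypothetical helpers for illustration.

```python
import asyncio


def build_ws_url(host: str = "localhost", port: int = 7861) -> str:
    """Return the websocket endpoint URL for the given host and port."""
    return f"ws://{host}:{port}/"


async def stream_completion(prompt: str, url: str = build_ws_url()) -> str:
    """Send one prompt and collect streamed text until the server closes the socket."""
    # Imported here so this module still loads when `websockets` is not installed.
    import websockets

    chunks = []
    async with websockets.connect(url) as ws:
        # Assumption: the server accepts the prompt as a plain-text message.
        await ws.send(prompt)
        # Assumption: the server streams chunks, then closes the connection.
        async for message in ws:
            text = message if isinstance(message, str) else message.decode("utf-8")
            print(text, end="", flush=True)
            chunks.append(text)
    return "".join(chunks)


def main() -> None:
    # Requires the LLMinator server to be running locally.
    asyncio.run(stream_completion("Explain what a GGUF file is."))
```

Call `main()` once the server is up. If the server keeps the socket open after a response, the loop will wait for further messages, so adapt the termination condition to the actual message format used in `example/index.html`.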
- Compatible Versions: This project is compatible with Python versions 3.8 through 3.11. Ensure you have one of these versions installed on your system. You can check your Python version by running `python --version` or `python3 --version` in your terminal.
- CMake Dependency: If you plan to build the project using CMake, make sure you have CMake installed.
- C Compiler: You'll also need a C compiler such as GCC, which is typically included with most Linux distributions. You can check this by running `gcc --version` in your terminal. Installation instructions for your specific operating system can be found online.
- Visual Studio Installer: If you're using Visual Studio Code for development, you'll need the C++ development workload installed. You can install it through the Visual Studio Installer.
- CUDA Installation: To leverage GPU acceleration, you'll need CUDA installed on your system. Download instructions are available on the NVIDIA website.
- Torch Compatibility: After installing CUDA, confirm CUDA availability with `torch.cuda.is_available()`. When using a GPU, ensure you follow the project's specific `llama-cpp-python` installation configuration for CUDA support.
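
The checks in the list above can be collected into one short script. This is a sketch under stated assumptions, not a tool shipped with LLMinator: the function names are hypothetical, `gcc` and `cmake` are assumed to be the executable names on your PATH (a Windows setup using a different compiler will report `gcc` as missing), and CUDA is reported as unavailable when `torch` cannot be imported.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=None) -> bool:
    """True if the (major, minor) version is within the documented 3.8 to 3.11 range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def tool_available(name: str) -> bool:
    """True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def cuda_available() -> bool:
    """True if torch can be imported and reports a usable CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return False
    return torch.cuda.is_available()


def report() -> dict:
    """Collect all prerequisite checks into one dictionary."""
    return {
        "python": python_supported(),
        "cmake": tool_available("cmake"),
        "gcc": tool_available("gcc"),
        "cuda": cuda_available(),
    }


for name, ok in report().items():
    print(f"{name:8s} {'ok' if ok else 'missing or unsupported'}")
```

A `cuda` result of "missing or unsupported" is fine for CPU-only use; it only matters if you intend to run inference on a GPU.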
If you encounter any errors or issues, feel free to file a detailed report in the project's repository. We're always happy to help! When reporting an issue, please provide as much information as possible, including the error message, logs, the steps you took, and your system configuration. This makes it easier for us to diagnose and fix the problem quickly.
We welcome contributions from the community to enhance LLMinator further. If you'd like to contribute, please follow these guidelines:
- Fork the LLMinator repository on GitHub.
- Create a new branch for your feature or bug fix.
- Test your changes thoroughly.
- Submit a pull request, providing a clear description of the changes you've made.
Reach out to us: [email protected]