# AI Autocompletion from locally hosted llama.cpp

Albus is a locally hosted AI code completion plugin for Visual Studio Code, designed to work seamlessly with the 🦙 llama.cpp Python API.

Albus is aptly named: he is your "wizard" programmer, not casting spells but definitely brewing up some magical AI code completion!


Our goal? To democratize the development of AI tools and make them enchanting for everybody.

Accio, llamas! 🧙‍♀️✨

## 🚀 Getting Started

### Prerequisites

To make proper use of Albus, you will need a running 🦙 llama.cpp Python API server.

### Installation & Setup

1. Create a local folder for your server and models:

   ```sh
   mkdir service
   cd service
   mkdir models
   ```

2. Create and activate a virtual environment:

   ```sh
   python3 -m venv .env
   source .env/bin/activate
   ```

3. Install the API server with 🦙 llama.cpp Python (inside the activated virtual environment):

   ```sh
   pip3 install "llama-cpp-python[server]"
   ```

4. Download a suitable model from 🤗 Hugging Face into the `models` folder.

   Some good models:

   - https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF
   - https://huggingface.co/TheBloke/WizardCoder-Python-7B-V1.0-GGUF

5. Run the server (a quick sanity check with `curl` is sketched right after this list):

   ```sh
   python3 -m llama_cpp.server --model models/deepseek-coder-6.7b-instruct.Q5_K_M.gguf --n_ctx 8192
   ```

6. Install the verified extension from within VS Code, or visit this link for more information on how to install it.
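
Before wiring up the extension, you can sanity-check the server from the command line. This is a minimal sketch assuming the default host and port (`localhost:8000`) and the OpenAI-compatible completions endpoint exposed by llama-cpp-python; the prompt is just an arbitrary example:

```sh
# Ask the freshly started server for a short completion.
# Assumes defaults: localhost:8000 and the /v1/completions route.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def fibonacci(n):", "max_tokens": 20}'
```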

Enjoy enhanced code completions with Albus! 🎉

## Configuration

General settings:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `albus.general.contextLength` | number | 500 | Number of characters to include in the completion context |
| `albus.general.debounceWait` | number | 500 | Time to wait (in ms) before sending a request to the server |
| `albus.general.enabled` | boolean | true | Enable or disable the extension |

Settings for llama.cpp server:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `albus.llama.cpp.host` | string | localhost | Host of the llama.cpp model server |
| `albus.llama.cpp.port` | number | 8000 | Port of the llama.cpp model server |
| `albus.llama.cpp.stream` | boolean | true | Stream completions from the server |
| `albus.llama.cpp.temperature` | number | 0.7 | Randomness of the generated text |
| `albus.llama.cpp.max_tokens` | number | 20 | Number of tokens to predict when generating text |
| `albus.llama.cpp.repeat_penalty` | number | 1.1 | Penalty for repeating tokens |
| `albus.llama.cpp.seed` | number | -1 | Seed for the random number generator |
| `albus.llama.cpp.top_p` | number | 0.9 | Limit next-token selection to the subset of tokens with cumulative probability above the threshold P |
| `albus.llama.cpp.top_k` | number | 40 | Limit next-token selection to the K most probable tokens |
| `albus.llama.cpp.stop_strings` | array | `["### "]` | List of strings that stop the output of the model |
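
For convenience, here is how these options might look in your VS Code `settings.json` (User or Workspace). This is an illustrative sketch that simply restates the defaults from the tables above; in practice you only need to set the values you want to change:

```jsonc
{
  // General Albus behaviour
  "albus.general.enabled": true,
  "albus.general.contextLength": 500,
  "albus.general.debounceWait": 500,

  // Connection and sampling parameters for the llama.cpp server
  "albus.llama.cpp.host": "localhost",
  "albus.llama.cpp.port": 8000,
  "albus.llama.cpp.stream": true,
  "albus.llama.cpp.temperature": 0.7,
  "albus.llama.cpp.max_tokens": 20,
  "albus.llama.cpp.repeat_penalty": 1.1,
  "albus.llama.cpp.seed": -1,
  "albus.llama.cpp.top_p": 0.9,
  "albus.llama.cpp.top_k": 40,
  "albus.llama.cpp.stop_strings": ["### "]
}
```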

## Features

✅ Autocompletion (duh)

✅ Configuration of llama.cpp parameters

## Upcoming

- Integrate other local servers such as Ollama, KoboldCpp, etc.
- Selecting and refactoring code
- Code selection and automatic documentation
- Optimization of selected code
- RAG over code and chat