This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

feat(code): switch fully to ollama as LLM provider (#101)
* feat: Adds script to evaluate ollama latency

* refactor: Remove legacy comment

* feat: Adds route to chat with code llm

* docs: Updates copyright notice

* docs: Updates copyright notice

* build(docker): bump ollama to 0.1.23

* feat(ollama): unload model after 30 sec

* build(docker): bump ollama to 0.1.25

* feat(code): add chat endpoint

* refactor(docker): set ollama as default llm provider

* test(code): add unit tests for code router

* docs(readme): update the env configuration description

* docs(swagger): improve the example for Chatrole

* docs(readme): add latency benchmark in the readme

* fix(ollama): fix typo in ollama client

* fix(openai): fix import

* ci(github): remove nvidia driver requirement for docker orchestration

* fix(docker): relax healthcheck on test environment

* fix(docker): update docker healthcheck for tests

* fix(docker): fix docker command for test env

* fix(docker): fix healthcheck

* fix(docker): fix healthcheck

* fix(docker): fix healthcheck of ollama containers

* test(repos): disable repo parsing test
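
Of the changes above, "feat(ollama): unload model after 30 sec" maps onto Ollama's `keep_alive` option (available as of the 0.1.23 image this PR bumps to). A rough sketch of the mechanism — the exact wiring in this repo's client is an assumption, not taken from the diff:

```bash
# Hypothetical request: Ollama's generate API accepts a keep_alive duration,
# after which the model is unloaded from memory (here, 30 s after the last call).
curl http://localhost:11434/api/generate -d '{
  "model": "tinydolphin:1.1b-v2.8-q4_0",
  "prompt": "def fibonacci(n):",
  "keep_alive": "30s"
}'
```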
frgfm authored Feb 20, 2024
1 parent 6bc7921 commit 2aaa508
Showing 19 changed files with 622 additions and 423 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/builds.yml
@@ -32,7 +32,7 @@ jobs:
           POSTGRES_USER: postgres
           POSTGRES_PASSWORD: pg_pwd
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-        run: docker-compose up -d --build
+        run: docker-compose -f docker-compose.test.yml up -d --build
       - name: Docker sanity check
         run: sleep 20 && nc -vz localhost 8050
       - name: Debug
34 changes: 29 additions & 5 deletions README.md
@@ -62,6 +62,27 @@ In order to stop the service, run:
 make stop
 ```
 
+### Latency benchmark
+
+Do you crave perfect code suggestions, but wonder whether a model fits your latency needs?
+In the table below, you will find a latency benchmark for all the Ollama LLMs we tested:
+
+| Model | Ingestion mean (std) | Generation mean (std) |
+| ------------------------------------------------------------ | ---------------------- | --------------------- |
+| [tinyllama:1.1b-chat-v1-q4_0](https://ollama.com/library/tinyllama:1.1b-chat-v1-q4_0) | 2014.63 tok/s (±12.62) | 227.13 tok/s (±2.26) |
+| [dolphin-phi:2.7b-v2.6-q4_0](https://ollama.com/library/dolphin-phi:2.7b-v2.6-q4_0) | 684.07 tok/s (±3.85) | 122.25 tok/s (±0.87) |
+| [dolphin-mistral:7b-v2.6](https://ollama.com/library/dolphin-mistral:7b-v2.6) | 291.94 tok/s (±0.40) | 60.56 tok/s (±0.15) |
+
+This benchmark was run over 20 iterations on the same input sequence, on a **laptop** to better reflect the performance common users can expect. As a rule of thumb, at ~60 tok/s a 300-token completion takes roughly 5 seconds of generation, on top of prompt ingestion. The hardware setup includes an [Intel(R) Core(TM) i7-12700H](https://ark.intel.com/content/www/us/en/ark/products/132228/intel-core-i7-12700h-processor-24m-cache-up-to-4-70-ghz.html) CPU and an [NVIDIA GeForce RTX 3060](https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3060-3060ti/) laptop GPU.
+
+You can run this latency benchmark for any Ollama model on your hardware as follows:
+```bash
+python scripts/evaluate_ollama_latency.py dolphin-mistral:7b-v2.6-dpo-laser-q4_0 --endpoint http://localhost:3000
+```
+
+*All script arguments can be checked using `python scripts/evaluate_ollama_latency.py --help`*
+
 ### How is the database organized
 
@@ -88,30 +109,33 @@ The back-end core feature is to interact with the metadata tables. For the servi
 
 The project was designed so that everything runs with Docker orchestration (standalone virtual environment), so you won't need to install any additional libraries.
 
-## Configuration
+### Configuration
 
 In order to run the project, you will need to specify some information, which can be done using a `.env` file.
 This file will have to hold the following information:
-- `POSTGRES_DB`*: a name for the [PostgreSQL](https://www.postgresql.org/) database that will be created
-- `POSTGRES_USER`*: a login for the PostgreSQL database
-- `POSTGRES_PASSWORD`*: a password for the PostgreSQL database
 - `SUPERADMIN_GH_PAT`: the GitHub token of the initial admin access (Generate a new token on [GitHub](https://github.com/settings/tokens?type=beta), with no extra permissions = read-only)
 - `SUPERADMIN_PWD`*: the password of the initial admin access
 - `GH_OAUTH_ID`: the Client ID of the GitHub OAuth app (Create an OAuth app on [GitHub](https://github.com/settings/applications/new), pointing to your Quack dashboard w/ callback URL)
 - `GH_OAUTH_SECRET`: the secret of the GitHub OAuth app (Generate a new client secret on the created OAuth app)
+- `POSTGRES_DB`*: a name for the [PostgreSQL](https://www.postgresql.org/) database that will be created
+- `POSTGRES_USER`*: a login for the PostgreSQL database
+- `POSTGRES_PASSWORD`*: a password for the PostgreSQL database
-- `OPENAI_API_KEY`: your API key for OpenAI (Create a new secret key on [OpenAI](https://platform.openai.com/api-keys))
 
 _* marks the values where you can pick what you want._
 
 Optionally, the following information can be added:
 - `SECRET_KEY`*: if set, tokens can be reused between sessions. All instances sharing the same secret key can use the same token.
+- `OLLAMA_MODEL`: the model tag in the [Ollama library](https://ollama.com/library) that will be used for the API.
 - `SENTRY_DSN`: the DSN for your [Sentry](https://sentry.io/) project, which monitors back-end errors and reports them back.
 - `SERVER_NAME`*: the server tag that will be used to report events to Sentry.
 - `POSTHOG_KEY`: the project API key for [PostHog](https://eu.posthog.com/settings/project-details).
 - `SLACK_API_TOKEN`: the App key for your Slack bot (Create New App on [Slack](https://api.slack.com/apps), go to OAuth & Permissions and generate a bot User OAuth Token).
 - `SLACK_CHANNEL`: the Slack channel where your bot will post events (defaults to `#general`; you have to invite the App to your channel).
 - `SUPPORT_EMAIL`: the email used for support of your API.
 - `DEBUG`: if set to false, silences debug logs.
+- `OPENAI_API_KEY`**: your API key for OpenAI (Create a new secret key on [OpenAI](https://platform.openai.com/api-keys))
+
+_** marks the deprecated values._
 
 So your `.env` file should look something like [`.env.example`](.env.example).
 The file should be placed in the folder of your `./docker-compose.yml`.
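
For illustration, a minimal `.env` covering the values described above could look like the following — every value here is a placeholder you pick yourself, not a real credential:

```bash
# Required
SUPERADMIN_GH_PAT=github_pat_xxxx   # read-only fine-grained GitHub token
SUPERADMIN_PWD=pick-a-password
GH_OAUTH_ID=your_oauth_app_client_id
GH_OAUTH_SECRET=your_oauth_app_secret
POSTGRES_DB=quack
POSTGRES_USER=quack
POSTGRES_PASSWORD=pick-another-password

# Optional
OLLAMA_MODEL=dolphin-mistral:7b-v2.6-dpo-laser-q4_0
```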
91 changes: 0 additions & 91 deletions docker-compose.ollama.yml

This file was deleted.

34 changes: 30 additions & 4 deletions docker-compose.test.yml
@@ -9,21 +9,24 @@ services:
     ports:
       - "8050:8050"
     environment:
+      - POSTGRES_URL=postgresql+asyncpg://dummy_login:dummy_pwd@test_db/dummy_db
+      - OLLAMA_ENDPOINT=http://ollama:11434
+      - OLLAMA_MODEL=tinydolphin:1.1b-v2.8-q4_0
       - SUPERADMIN_GH_PAT=${SUPERADMIN_GH_PAT}
       - SUPERADMIN_PWD=superadmin_pwd
       - GH_OAUTH_ID=${GH_OAUTH_ID}
       - GH_OAUTH_SECRET=${GH_OAUTH_SECRET}
-      - POSTGRES_URL=postgresql+asyncpg://dummy_login:dummy_pwd@test_db/dummy_db
-      - OPENAI_API_KEY=${OPENAI_API_KEY}
       - DEBUG=true
     depends_on:
       test_db:
         condition: service_healthy
+      ollama:
+        condition: service_healthy
 
   test_db:
     image: postgres:15-alpine
-    ports:
-      - "5432:5432"
+    expose:
+      - 5432
     environment:
       - POSTGRES_USER=dummy_login
       - POSTGRES_PASSWORD=dummy_pwd
@@ -33,3 +36,26 @@
       interval: 10s
       timeout: 3s
       retries: 3
+
+  ollama:
+    image: ollama/ollama:0.1.25
+    command: serve
+    volumes:
+      - "$HOME/.ollama:/root/.ollama"
+    expose:
+      - 11434
+    healthcheck:
+      test: ["CMD-SHELL", "ollama pull 'tinydolphin:1.1b-v2.8-q4_0'"]
+      interval: 5s
+      timeout: 1m
+      retries: 3
+    # deploy:
+    #   resources:
+    #     reservations:
+    #       devices:
+    #         - driver: nvidia
+    #           count: 1
+    #           capabilities: [gpu]
+
+volumes:
+  ollama:
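
To sanity-check this test stack locally the way CI now does (the port probe mirrors the workflow change above; using `ollama list` to confirm the pulled model is an assumption, not part of the test suite):

```bash
docker-compose -f docker-compose.test.yml up -d --build
sleep 20 && nc -vz localhost 8050   # same probe as the CI job
docker-compose -f docker-compose.test.yml exec ollama ollama list   # tinydolphin should be listed
```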
35 changes: 27 additions & 8 deletions docker-compose.yml
@@ -9,23 +9,41 @@ services:
     ports:
       - "8050:8050"
     environment:
+      - POSTGRES_URL=postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db/${POSTGRES_DB}
+      - OLLAMA_ENDPOINT=http://ollama:11434
+      - OLLAMA_MODEL=${OLLAMA_MODEL}
+      - SECRET_KEY=${SECRET_KEY}
       - SUPERADMIN_GH_PAT=${SUPERADMIN_GH_PAT}
       - SUPERADMIN_PWD=${SUPERADMIN_PWD}
       - GH_OAUTH_ID=${GH_OAUTH_ID}
       - GH_OAUTH_SECRET=${GH_OAUTH_SECRET}
-      - POSTGRES_URL=postgresql+asyncpg://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db/${POSTGRES_DB}
-      - OPENAI_API_KEY=${OPENAI_API_KEY}
-      - SECRET_KEY=${SECRET_KEY}
       - SENTRY_DSN=${SENTRY_DSN}
       - SERVER_NAME=${SERVER_NAME}
       - POSTHOG_KEY=${POSTHOG_KEY}
       - SLACK_API_TOKEN=${SLACK_API_TOKEN}
       - SLACK_CHANNEL=${SLACK_CHANNEL}
       - SUPPORT_EMAIL=${SUPPORT_EMAIL}
       - DEBUG=true
     depends_on:
       db:
         condition: service_healthy
+      ollama:
+        condition: service_healthy
+
+  ollama:
+    image: ollama/ollama:0.1.25
+    command: serve
+    volumes:
+      - "$HOME/.ollama:/root/.ollama"
+    expose:
+      - 11434
+    healthcheck:
+      test: ["CMD-SHELL", "ollama pull '${OLLAMA_MODEL}'"]
+      interval: 5s
+      timeout: 1m
+      retries: 3
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
 
   db:
     image: postgres:15-alpine
@@ -71,3 +89,4 @@
 
 volumes:
   postgres_data:
+  ollama:
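
With `OLLAMA_MODEL` set in your `.env`, the healthcheck above makes Compose pull the model before the backend starts. A quick way to bring the stack up and verify this (a sketch, assuming you run Compose directly rather than through the Makefile):

```bash
# assumes .env sits next to docker-compose.yml and defines OLLAMA_MODEL,
# e.g. OLLAMA_MODEL=dolphin-mistral:7b-v2.6-dpo-laser-q4_0
docker-compose up -d --build
docker-compose ps   # the ollama service reports "healthy" once the model is pulled
```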