Update chat docs & resource limits
DavidMStraub committed Oct 21, 2024
1 parent 2a7e677 commit c7bc9e4
Showing 8 changed files with 56 additions and 18 deletions.
3 changes: 3 additions & 0 deletions docs/install_setup/chat.md
@@ -52,6 +52,9 @@ If the model is not present in the local cache, it will be downloaded when Gramp

Please share learnings about different models with the community!

!!! info
    The sentence transformers library consumes a significant amount of memory, which might cause worker processes to be killed. As a rule of thumb, with semantic search enabled, each Gunicorn worker consumes around 200 MB of memory and each Celery worker around 500 MB, even when idle, and up to 1 GB when computing embeddings. See [Limit CPU and memory usage](cpu-limited.md) for settings that limit memory usage. In addition, it is advisable to provision a sufficiently large swap partition to prevent OOM errors due to transient memory usage spikes.
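On a Linux host without a dedicated swap partition, a swap file can serve the same purpose. This is a generic sketch, not specific to Gramps Web; the 2 GB size is an example, so adjust it to your RAM and disk space:

```shell
# Create and activate a 2 GB swap file (run as root).
fallocate -l 2G /swapfile   # or: dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile         # swap files must not be world-readable
mkswap /swapfile            # format the file as swap space
swapon /swapfile            # enable it immediately
# To make it permanent across reboots, add this line to /etc/fstab:
# /swapfile none swap sw 0 0
```

Verify the result with `free -h`.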

## Setting up an LLM provider

Communication with the LLM uses an OpenAI compatible API using the `openai-python` library. This allows using a locally deployed LLM via Ollama (see [Ollama OpenAI compatibility](https://ollama.com/blog/openai-compatibility)) or an API like OpenAI or Huggingface TGI. The LLM is configured via the configuration parameters `LLM_MODEL` and `LLM_BASE_URL`.
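For example, pointing Gramps Web at a local Ollama instance could look like the following compose snippet. This is a sketch: the model name `tinyllama` and the hostname `ollama` are placeholders for your own setup, and `/v1` is the path of Ollama's OpenAI-compatible endpoint.

```yaml
services:
  grampsweb:
    environment:
      LLM_MODEL: tinyllama                  # example model; use any model pulled into Ollama
      LLM_BASE_URL: http://ollama:11434/v1  # Ollama's OpenAI-compatible API endpoint
```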
57 changes: 44 additions & 13 deletions docs/install_setup/cpu-limited.md
@@ -1,27 +1,58 @@
# Limit CPU and memory usage

In the recommended Docker-based setup, Gramps Web uses [Gunicorn](https://gunicorn.org/) to serve the
backend and [Celery](https://docs.celeryq.dev) for background tasks. In both cases, several worker
processes can run in parallel, which makes the application more responsive from a user perspective.
However, increasing the number of workers also increases the amount of RAM used (even when the application is idle),
and allowing requests to be processed in parallel can lead to high CPU usage (in particular when many users
are using the application simultaneously). Both Gunicorn and Celery allow limiting the number of parallel workers.

## Get information about your system

On Linux, you can check the number of cores available on your system with the following command:

```bash
lscpu | grep CPU
```

To see how much memory and swap space you have available, use

```bash
free -h
```

## Limiting the number of Gunicorn workers

The easiest way to set the number of Gunicorn workers when using the default Gramps Web
Docker image is to set the environment variable `GUNICORN_NUM_WORKERS`, e.g. by declaring it
in the `docker-compose.yml` file under the `environment` key:

```yaml
services:
  grampsweb:
    environment:
      GUNICORN_NUM_WORKERS: 2
```

See [the Gunicorn documentation](https://docs.gunicorn.org/en/stable/design.html#how-many-workers) to decide
on the ideal number of workers.

## Limiting the number of Celery workers

To set the number of Celery workers, adapt the `concurrency` setting in the Docker Compose file:

```yaml
grampsweb_celery:
  command: celery -A gramps_webapi.celery worker --loglevel=INFO --concurrency=2
```

See [the Celery documentation](https://docs.celeryq.dev/en/stable/userguide/workers.html#concurrency) to decide
on the ideal number of workers.

!!! info
    If the `concurrency` flag is omitted (which was the case in the Gramps Web documentation until v2.5.0), it
    defaults to the number of CPU cores available on the system, which might consume a substantial amount of memory.
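As a quick starting point, the Gunicorn documentation suggests a rule of thumb of `(2 × cores) + 1` workers, which can be computed directly from `nproc`. This is only a sketch of an upper bound; on memory-constrained hosts you will likely want a lower value:

```shell
#!/bin/sh
# Suggest a Gunicorn worker count using the (2 x cores) + 1 rule of thumb.
cores=$(nproc)
workers=$((2 * cores + 1))
echo "GUNICORN_NUM_WORKERS=$workers"
```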
4 changes: 4 additions & 0 deletions docs/user-guide/chat.md
@@ -1,5 +1,9 @@
# Using AI chat

!!! info
    AI chat requires Gramps Web API version 2.5.0 or higher and Gramps Web version 24.10.0 or higher.

The chat view in Gramps Web (if available in your installation) gives access to an AI assistant that can answer questions about your family tree.

!!! warning
2 changes: 1 addition & 1 deletion examples/caprover-one-click-app.yml
@@ -49,7 +49,7 @@ services:
notExposeAsWebApp: 'true'
dockerfileLines:
- FROM ghcr.io/gramps-project/grampsweb:$$cap_version
- CMD exec celery -A gramps_webapi.celery worker --loglevel=INFO --concurrency=2
volumes:
$$cap_appname-users:
$$cap_appname-index:
2 changes: 1 addition & 1 deletion examples/digitalocean-1click/docker-compose.yml
@@ -36,7 +36,7 @@ services:
VIRTUAL_HOST: ""
LETSENCRYPT_HOST: ""
LETSENCRYPT_EMAIL: ""
command: celery -A gramps_webapi.celery worker --loglevel=INFO --concurrency=2

grampsweb_redis:
image: docker.io/library/redis:7.2.4-alpine
2 changes: 1 addition & 1 deletion examples/docker-compose-base/docker-compose.yml
@@ -27,7 +27,7 @@ services:
container_name: grampsweb_celery
depends_on:
- grampsweb_redis
command: celery -A gramps_webapi.celery worker --loglevel=INFO --concurrency=2

grampsweb_redis:
image: docker.io/library/redis:7.2.4-alpine
2 changes: 1 addition & 1 deletion examples/docker-compose-letsencrypt/docker-compose.yml
@@ -37,7 +37,7 @@ services:
VIRTUAL_HOST: ""
LETSENCRYPT_HOST: ""
LETSENCRYPT_EMAIL: ""
command: celery -A gramps_webapi.celery worker --loglevel=INFO --concurrency=2

grampsweb_redis:
image: docker.io/library/redis:7.2.4-alpine
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -18,7 +18,7 @@ nav:
- Update: install_setup/update.md
- Using PostgreSQL: install_setup/postgres.md
- Hosting media on S3: install_setup/s3.md
- Limit CPU & memory usage: install_setup/cpu-limited.md
- 2.0 upgrade guide: install_setup/v2.md
- Administration:
- Introduction: administration/admin.md
