llamaspeak

CONTAINERS IMAGES RUN BUILD

Note

For llamaspeak version 2 with multimodal support, see the local_llm container

Start Riva

First, follow the steps from the riva-client:python package to run and test the Riva server:

  1. Start the Riva server on your Jetson by following riva_quickstart_arm64
  2. Run some of the Riva ASR examples to confirm that ASR is working: https://github.com/nvidia-riva/python-clients#asr
  3. Run some of the Riva TTS examples to confirm that TTS is working: https://github.com/nvidia-riva/python-clients#tts

You can also see this helpful video and guide from JetsonHacks for setting up Riva: Speech AI on Jetson Tutorial

Load LLM

Next, start text-generation-webui (version 1.7) with the --api flag and load your chat model of choice through its web UI on port 7860:

./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui

Note: launch the text-generation-webui:1.7 container specifically, to maintain API compatibility with llamaspeak.

Alternatively, you can manually specify the model that you want to load without needing to use the web UI:

./run.sh --workdir /opt/text-generation-webui $(./autotag text-generation-webui:1.7) \
  python3 server.py --listen --verbose --api \
    --model-dir=/data/models/text-generation-webui \
    --model=llama-2-13b-chat.Q4_K_M.gguf \
    --loader=llamacpp \
    --n-gpu-layers=128 \
    --n_ctx=4096 \
    --n_batch=4096 \
    --threads=$(($(nproc) - 2))

See here for command-line arguments: https://github.com/oobabooga/text-generation-webui/tree/main#basic-settings
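
The --threads value above subtracts two from the core count, presumably to leave some CPU headroom for other processes. As a minimal sketch (the helper name llm_threads and the clamp to one thread are assumptions for illustration, not part of text-generation-webui), the same arithmetic can be wrapped so it never yields zero or a negative number on a machine with few cores:

```shell
# Hypothetical helper: compute a thread count for the LLM loader.
#   $1 = total cores (defaults to the output of nproc)
#   $2 = cores to leave free (defaults to 2, matching the command above)
llm_threads() {
  local total reserve n
  total=${1:-$(nproc)}
  reserve=${2:-2}
  n=$(( total - reserve ))
  # Never hand the loader fewer than one thread.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Example (not executed here):
#   python3 server.py ... --threads="$(llm_threads)"
```

With this in place, --threads=$(llm_threads) could stand in for --threads=$(($(nproc) - 2)) in the command above.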

Enabling HTTPS/SSL

Browsers only allow a page to access the client's microphone when it is served over HTTPS, so you'll need to create a self-signed SSL certificate and key:

cd /path/to/your/jetson-containers/data
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365 -nodes -subj '/CN=localhost'

Place these in your jetson-containers/data directory, because it is automatically mounted into the containers under /data, which keeps your SSL certificate persistent across container runs. When you first navigate your browser to a page that uses these self-signed certificates, it will show a warning, since they don't originate from a trusted authority.

You can choose to override this warning, and it won't reappear until you change certificates or your device's hostname/IP changes.
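
Before starting the server, it can be worth confirming that the certificate and key actually belong together. The following is a sketch using standard openssl subcommands; the function name cert_matches_key is hypothetical, and the paths in the usage comment assume the files were generated in jetson-containers/data as described above:

```shell
# Hypothetical helper: succeed only if the certificate's public key
# matches the public key derived from the private key.
#   $1 = path to certificate (PEM), $2 = path to private key (PEM)
cert_matches_key() {
  local cert key cert_pub key_pub
  cert=$1
  key=$2
  # Extract the public key embedded in the certificate.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 2
  # Derive the public key from the (unencrypted) private key.
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 2
  [ "$cert_pub" = "$key_pub" ]
}

# Example (not executed here):
#   cert_matches_key /path/to/your/jetson-containers/data/cert.pem \
#                    /path/to/your/jetson-containers/data/key.pem \
#     && echo "certificate and key match"
```

A return status of 2 means one of the files could not be read or parsed; a status of 1 means both parsed but the keys differ.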

Run Llamaspeak

To run the llamaspeak chat server with its default arguments and the SSL keys you generated, start it like this:

./run.sh --env SSL_CERT=/data/cert.pem --env SSL_KEY=/data/key.pem $(./autotag llamaspeak)

See chat.py for command-line options that can be changed. For example, to enable --verbose or --debug logging:

./run.sh --workdir=/opt/llamaspeak \
  --env SSL_CERT=/data/cert.pem \
  --env SSL_KEY=/data/key.pem \
  $(./autotag llamaspeak) \
  python3 chat.py --verbose

If you're having issues getting audio or responses from the web client, enable debug logging to check the message traffic.

The default port is 8050, but that can be changed with the --port argument. You can then navigate your browser to https://HOSTNAME:8050, where HOSTNAME is the hostname or IP address of your Jetson.
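
If the page does not load, one quick check is whether anything is listening on the port yet. A rough sketch follows, assuming bash and the coreutils timeout command are available on the machine you run it from; the helper names port_open and wait_for_port are made up for this example and are not part of llamaspeak:

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's built-in /dev/tcp redirection, wrapped in a 2-second timeout.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Hypothetical helper: poll once per second until the port opens.
#   $1 = host, $2 = port, $3 = number of attempts (defaults to 30)
wait_for_port() {
  local host port tries i
  host=$1
  port=$2
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if port_open "$host" "$port"; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep 1
  done
  return 1
}

# Example (not executed here):
#   wait_for_port HOSTNAME 8050 && echo "llamaspeak is listening"
```

This only confirms that the port accepts connections; it says nothing about whether the SSL certificate is accepted or whether Riva and the LLM are reachable from the chat server.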

CONTAINERS
llamaspeak
   Builds llamaspeak_jp51 llamaspeak_jp60
   Requires L4T ['>=34.1.0']
   Dependencies build-essential python riva-client:python numpy
   Dockerfile Dockerfile
   Images dustynv/llamaspeak:r35.2.1 (2023-09-07, 5.0GB)
          dustynv/llamaspeak:r35.3.1 (2023-08-29, 5.0GB)
          dustynv/llamaspeak:r35.4.1 (2023-12-05, 5.0GB)
CONTAINER IMAGES
Repository/Tag Date Arch Size
  dustynv/llamaspeak:r35.2.1 2023-09-07 arm64 5.0GB
  dustynv/llamaspeak:r35.3.1 2023-08-29 arm64 5.0GB
  dustynv/llamaspeak:r35.4.1 2023-12-05 arm64 5.0GB

Container images are compatible with other minor versions of JetPack/L4T:
    • L4T R32.7 containers can run on other versions of L4T R32.7 (JetPack 4.6+)
    • L4T R35.x containers can run on other versions of L4T R35.x (JetPack 5.1+)
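
The two rules above can be written down as a small check. This is only a sketch of the stated rules, not how autotag decides; the function name l4t_compatible is hypothetical, and any combination not covered by the rules (other than an exact version match) is treated as incompatible here by assumption:

```shell
# Hypothetical helper: succeed if an image built for one L4T version can run
# on a host with another, per the two rules listed above.
#   $1 = host L4T version (e.g. 35.3.1), $2 = image tag version (e.g. r35.4.1)
l4t_compatible() {
  local host image host_major image_major host_rest image_rest host_minor image_minor
  # Strip an optional leading "r"/"R", as used in image tags like r35.4.1.
  host=${1#[rR]}
  image=${2#[rR]}
  # An image built for exactly this version is always fine.
  if [ "$host" = "$image" ]; then
    return 0
  fi
  host_major=${host%%.*}
  image_major=${image%%.*}
  host_rest=${host#*.}
  image_rest=${image#*.}
  host_minor=${host_rest%%.*}
  image_minor=${image_rest%%.*}
  case "$host_major" in
    # R35.x containers can run on other versions of R35.x.
    35) [ "$image_major" = "35" ] ;;
    # R32.7 containers can run on other versions of R32.7.
    32) [ "$image_major" = "32" ] && [ "$host_minor" = "7" ] && [ "$image_minor" = "7" ] ;;
    # Anything else is not covered by the rules above (assumption: reject).
    *) return 1 ;;
  esac
}

# Example (not executed here):
#   l4t_compatible 35.3.1 r35.4.1 && echo "can run dustynv/llamaspeak:r35.4.1"
```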

RUN CONTAINER

To start the container, you can use jetson-containers run and autotag, or manually put together a docker run command:

# automatically pull or build a compatible container image
jetson-containers run $(autotag llamaspeak)

# or explicitly specify one of the container images above
jetson-containers run dustynv/llamaspeak:r35.4.1

# or if using 'docker run' (specify image and mounts/etc)
sudo docker run --runtime nvidia -it --rm --network=host dustynv/llamaspeak:r35.4.1

jetson-containers run forwards arguments to docker run with some defaults added (such as --runtime nvidia, a mounted /data cache, and device detection).
autotag finds a container image that's compatible with your version of JetPack/L4T, either locally, pulled from a registry, or by building it.

To mount your own directories into the container, use the -v or --volume flags:

jetson-containers run -v /path/on/host:/path/in/container $(autotag llamaspeak)

To launch the container running a command, as opposed to an interactive shell:

jetson-containers run $(autotag llamaspeak) my_app --abc xyz

You can pass any options to it that you would to docker run, and it'll print out the full command that it constructs before executing it.
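
If you use docker run directly, the /data mount and the SSL environment variables from the earlier sections have to be supplied by hand. The sketch below only prints a candidate command so it can be inspected before running; the function name llamaspeak_docker_cmd is hypothetical, and the flags are simply the ones already shown in this document combined with a -v mount of the data directory:

```shell
# Hypothetical helper: print (not run) a docker command that mounts the
# jetson-containers data directory at /data and passes the SSL cert/key.
#   $1 = host path to jetson-containers/data
#   $2 = image (defaults to one of the images listed above)
llamaspeak_docker_cmd() {
  local data_dir image
  data_dir=$1
  image=${2:-dustynv/llamaspeak:r35.4.1}
  echo sudo docker run --runtime nvidia -it --rm --network=host \
    -v "${data_dir}:/data" \
    --env SSL_CERT=/data/cert.pem \
    --env SSL_KEY=/data/key.pem \
    "$image"
}

# Example (not executed here):
#   llamaspeak_docker_cmd /path/to/your/jetson-containers/data
```

Because the command is only echoed, a data path containing spaces would need quoting before the printed line is pasted into a shell.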

BUILD CONTAINER

If you use autotag as shown above, it'll ask to build the container for you if needed. To manually build it, first do the system setup, then run:

jetson-containers build llamaspeak

The dependencies from above will be built into the container, and it'll be tested during the build. Run it with --help for build options.