
[Bug] OPEA docker container and habana runtime issue #1164

Open
1 of 6 tasks
anatu-git opened this issue Nov 19, 2024 · 3 comments

anatu-git commented Nov 19, 2024

Priority

P3-Medium

OS type

Ubuntu

Hardware type

Xeon-ICX

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

commit ID: 179fd84
https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/hpu/gaudi

Description

  • The system hits an error when running docker compose
  • habana-container-runtime is already installed on the system
  • Docker daemon.json also points to the habana-container-runtime (see Excerpt-1)
  • Error response below:
docker compose up -d

[+] Running 0/4
 ⠋ Container tei-reranking-gaudi-server  Creating    0.1s
 ⠋ Container tei-embedding-gaudi-server  Creating    0.1s
 ⠋ Container tgi-gaudi-server            Creating    0.1s
 ⠋ Container redis-vector-db             Creating    0.1s
Error response from daemon: unknown or invalid runtime name: habana
  • Excerpt-1
cat /etc/docker/daemon.json
{
   "default-runtime": "habana",
   "runtimes": {
      "habana": {
         "path": "/usr/bin/habana-container-runtime",
         "runtimeArgs": []
      }
   }
}
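
Note (not part of the original report): a quick way to confirm that the daemon serving /var/run/docker.sock actually loaded this configuration, assuming a systemd-managed Docker service, is:

# The habana runtime should appear in the "Runtimes:" line of docker info
docker info | grep -i runtime

# If it does not, restart the daemon so it re-reads /etc/docker/daemon.json, then re-check
sudo systemctl restart docker
docker info | grep -i runtime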


Reproduce steps

Steps to reproduce can be found here: Link

  • export variables
export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"

export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPs_Proxy"
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
export no_proxy="Your_No_Proxy",chatqna-gaudi-ui-server,chatqna-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm_service,guardrails


  • source environment file
source ./set_env.sh


  • Run docker compose
docker compose up -d
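
(A sanity check not in the original steps.) Assuming the command is run from the same compose directory, the state of the stack can be confirmed with:

# Show the services of this compose project and whether they are running
docker compose ps

# Same check against the daemon directly, filtered on the Gaudi containers
docker ps --filter "name=gaudi"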

Raw log

N/A
wangkl2 self-assigned this Nov 19, 2024

wangkl2 (Collaborator) commented Nov 19, 2024

@anatu-git Could you please first try to create and run the base Docker container for Gaudi? Note: adjust the docker image tag according to your OS version.
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest hl-smi
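
If that command fails with the same "unknown or invalid runtime name: habana" error, the problem is on the host side rather than in the compose files. A minimal shell sketch (not from the original thread) for checking which Docker binary and daemon are actually in use:

# Show every docker client binary on the PATH (duplicates can mean competing installs)
type -a docker

# Confirm which client/daemon pair is actually answering
docker version

# If the host might also have a snap-packaged Docker (a common source of a second
# daemon that reads its own config instead of /etc/docker/daemon.json), list it:
snap list docker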

anatu-git (Author) commented

Hi @wangkl2,

It looks like I had multiple versions of Docker running on the system. After disabling the stale Docker installations and enabling the correct one, that error message goes away.

However, I am now facing a different issue: the container deployment doesn't download any model or data under the ./data folder.

# du -sh ./data/
8.0K    ./data/

wangkl2 (Collaborator) commented Nov 22, 2024

@anatu-git Please check the container logs of the TGI backend, embedding, and reranking services for error messages during model download. If there are connection issues reaching Hugging Face, check the network connection and whether proxy settings are needed, and be sure to set HUGGINGFACEHUB_API_TOKEN.
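
For example, a shell sketch along these lines (container names taken from the compose output above; the grep pattern is only illustrative) surfaces download or authentication errors:

# Scan the model-serving containers for download/authentication problems
for c in tgi-gaudi-server tei-embedding-gaudi-server tei-reranking-gaudi-server; do
    echo "=== $c ==="
    docker logs "$c" 2>&1 | grep -i -E "error|denied|unauthorized|proxy|download" | tail -n 20
done

# Confirm the token was exported in the shell that ran docker compose up -d
echo "${HUGGINGFACEHUB_API_TOKEN:+token is set}"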
