Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docker_compose example for AMD ROCm deployment #1053

Closed

Conversation

astafevav
Copy link

@astafevav astafevav commented Nov 1, 2024

Type of change

  • Added example for docker compose deployment for AMD ROCm GPU accelerated devices.
  • Added Readme for docker compose deployment for AMD ROCm GPU accelerated devices.
  • Added tests for compose deployment on AMD ROCm systems

@astafevav astafevav force-pushed the add-faqgen-docker-compose-example branch 2 times, most recently from 0254696 to 80820e4 Compare November 1, 2024 12:06
@chensuyue
Copy link
Collaborator

Please do the same test like this #1054

@astafevav astafevav marked this pull request as draft November 5, 2024 12:23
@astafevav astafevav force-pushed the add-faqgen-docker-compose-example branch from 5ca7097 to b40c970 Compare November 6, 2024 06:06
@astafevav astafevav marked this pull request as ready for review November 6, 2024 06:18
@chensuyue chensuyue added the WIP label Nov 6, 2024
@chensuyue
Copy link
Collaborator

chensuyue commented Nov 6, 2024

Something wrong with this PR? The changed files are not related to ROCm.

@astafevav
Copy link
Author

Something wrong with this PR? The changed files are not related to ROCm.

Probably changes was pulled from "main" because of git re-base,

Let's close this one and #1054, and I'll prepare new

@chensuyue
Copy link
Collaborator

Something wrong with this PR? The changed files are not related to ROCm.

Probably changes was pulled from "main" because of git re-base,

Let's close this one and #1054, and I'll prepare new

OK, please paste the new PR link and close the old one.

@chensuyue
Copy link
Collaborator

I suggest to use separate PRs for different examples, it's more efficiency for CI test.

@astafevav astafevav force-pushed the add-faqgen-docker-compose-example branch from b40c970 to ec1f0ab Compare November 6, 2024 13:02
@astafevav
Copy link
Author

Looks like i've succeed to reset and re-commit my branch in forked repo.

Also resolved comments regarding test and test file name mentioned in DocSum PR(#1054)

@chensuyue can we continue working on this PR ?

@chensuyue
Copy link
Collaborator

Looks like i've succeed to reset and re-commit my branch in forked repo.

Also resolved comments regarding test and test file name mentioned in DocSum PR(#1054)

@chensuyue can we continue working on this PR ?

Yes, we can continue use this PR. Let's confirm the connection of the test machine and run the test.

FaqGen/docker_compose/amd/gpu/rocm/README.md Outdated Show resolved Hide resolved
FaqGen/docker_compose/amd/gpu/rocm/README.md Outdated Show resolved Hide resolved
Comment on lines +23 to +28
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intended for development, not production, as it disables security measures instead of adding them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD KB recommend to use this option for HPC environments to enable enables memory mapping,
Probably will change in the future when GPU path-through method will be changed to use "--gpus" option for docker and docker compose.

Copy link
Contributor

@eero-t eero-t Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the PTRACE? Surely that's needed only for debugging, and better done with separate container spec adding just that capability, and other extra tooling?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TGI app is crashing when trying to run rocm image without PTRACE, so this capability is still needed,

All docker options were taken from PyTorch installation for ROCm manual

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TGI app is crashing when trying to run rocm image without PTRACE, so this capability is still needed,

Ok.

That's very odd though. Any idea why it requires that, or do you have a backtrace where it's crashing?


stop_docker

if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to make all scripts shellcheck clean (apt install shellscheck; shellcheck *.sh).

@astafevav astafevav force-pushed the add-faqgen-docker-compose-example branch from 11efeda to 03564bc Compare November 8, 2024 05:36
@chensuyue
Copy link
Collaborator

Please fix the CI issue.

@chensuyue
Copy link
Collaborator

@chensuyue
Copy link
Collaborator

Please make sure each commit has been signed with "git commit -s", or the DCO check will failed.

@chensuyue
Copy link
Collaborator

Will this PR also replaced by a new one? like #1054

@astafevav
Copy link
Author

Will this PR also replaced by a new one? like #1054

yes, I will create new separate PR for FaqGen and put link to new PR here.

@astafevav
Copy link
Author

Re-submitted PR: #1126

@astafevav astafevav closed this Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants