2 hardware agnostic front and backend #5

Closed
wants to merge 36 commits into from
Commits
070db5c
'Add AMD support for TorchServe'
smedegaard Nov 1, 2024
ce19723
Update README.md with rocm flags
smedegaard Nov 11, 2024
0fad8e2
add rocm to CONTRIBUTING.md
smedegaard Nov 11, 2024
3247498
WorkerLifeCycle uses SystemInfo to get X_VISIBLE_DEVICES
smedegaard Nov 12, 2024
bae9b2c
AppleUtil adds Accelerator `number_of_cores` times
smedegaard Nov 12, 2024
88f3cb8
fix typo in README.md
smedegaard Nov 13, 2024
8e4d24c
remove mention of java version from README.md
smedegaard Nov 13, 2024
ff4daa8
revert unnecessary changes
samutamm Nov 14, 2024
0bc3e3c
Fix import errors in AppleUtils
jakki-amd Nov 14, 2024
1e635e1
remove rocm support from dockerfile.dev to simplify
samutamm Nov 14, 2024
1647826
fix missing newline
samutamm Nov 14, 2024
0dc5145
revert unnecessary changes
samutamm Nov 14, 2024
f905d0e
'improve formatting for amd_support.md'
Nov 14, 2024
9a515b8
Fix AppleUtils tests
jakki-amd Nov 18, 2024
9d30159
fixes 11. parse-metrics-failed-collecting-amd-gpu-metrics (#24)
smedegaard Nov 20, 2024
8cdf54b
extend testMetricManager
Nov 20, 2024
bd95835
Merge pull request #25 from nod-ai/9-extend-java-testmetricmanager
eppane Nov 21, 2024
e5d382f
Add latest ROCM support
Nov 14, 2024
607d836
Merge pull request #26 from nod-ai/19-add-support-for-latest-torch-rocm
jakki-amd Nov 21, 2024
f2d17d5
PR 24 system_metrics bugfix
Nov 22, 2024
49bc051
Format files
jakki-amd Nov 22, 2024
4bff6d3
Update docs/hardware_support/amd_support.md
smedegaard Nov 26, 2024
b9a1627
typo in docs/hardware_support/amd_support.md
smedegaard Nov 26, 2024
964e5f1
Update docs/hardware_support/amd_support.md
smedegaard Nov 26, 2024
61da32e
Update docs/hardware_support/amd_support.md
smedegaard Nov 26, 2024
0a4d628
remove pyrsmi and nvgpu deps
Nov 26, 2024
aa96f2f
metric collector revert gpu arg name
Nov 26, 2024
a26eefb
fix number of metrics assertion in testMetricManager
Nov 26, 2024
f0b1dfb
'move Intel docs under Hardware Support' (#31)
smedegaard Nov 27, 2024
d330494
Fix docstring
jakki-amd Nov 27, 2024
cbdfe25
Add Dockerfile.rocm
jakki-amd Nov 28, 2024
8330233
Remove sharing lock from bind mounts
jakki-amd Nov 28, 2024
9e5afd0
Update Dockerfile.rocm
jakki-amd Nov 29, 2024
8f35524
Revert Dockerfile changes
jakki-amd Nov 29, 2024
f5ce2ec
Update documentation for Docker support
jakki-amd Nov 29, 2024
f03d0fd
Merge branch 'master' into 2-hardware-agnostic-front-and-backend
jakki-amd Nov 29, 2024
6 changes: 6 additions & 0 deletions .gitignore
@@ -45,3 +45,9 @@ instances.yaml.backup
# cpp
cpp/_build
cpp/third-party

# projects
.tool-versions
**/*/.classpath
**/*/.settings
**/*/.project
57 changes: 25 additions & 32 deletions CONTRIBUTING.md
@@ -11,18 +11,7 @@ Your contributions will fall into two categories:
- Search for your issue here: https://github.com/pytorch/serve/issues (look for the "good first issue" tag if you're a first time contributor)
- Pick an issue and comment on it to say that you want to work on it.
- To ensure your changes don't break any of the existing features, run the sanity suite as follows from the serve directory:
- Install dependencies (if not already installed)
For CPU

```bash
python ts_scripts/install_dependencies.py --environment=dev
```

For GPU
```bash
python ts_scripts/install_dependencies.py --environment=dev --cuda=cu121
```
> Supported cuda versions as cu121, cu118, cu117, cu116, cu113, cu111, cu102, cu101, cu92
- [Install dependencies](#Install-TorchServe-for-development) (if not already installed)
- Install `pre-commit` to your Git flow:
```bash
pre-commit install
@@ -60,26 +49,30 @@ pytest -k test/pytest/test_mnist_template.py

If you plan to develop with TorchServe and change some source code, you must install it from source code.

Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.

Run the following script from the top of the source directory.

NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found

#### For Debian Based Systems/ MacOS

```
python ./ts_scripts/install_dependencies.py --environment=dev
python ./ts_scripts/install_from_src.py --environment=dev
```

Use `--cuda` flag with `install_dependencies.py` for installing cuda version specific dependencies. Possible values are `cu111`, `cu102`, `cu101`, `cu92`

#### For Windows

Refer to the documentation [here](docs/torchserve_on_win_native.md).

For information about the model archiver, see [detailed documentation](model-archiver/README.md).
1. Clone the repository, including third-party modules, with `git clone --recurse-submodules --remote-submodules git@github.com:pytorch/serve.git`
2. Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
3. Run the following script from the top of the source directory. NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found

#### For Debian Based Systems/MacOS

```
python ./ts_scripts/install_dependencies.py --environment=dev
python ./ts_scripts/install_from_src.py --environment=dev
```
##### Installing Dependencies for Accelerator Support
Use the optional `--rocm` or `--cuda` flag with `install_dependencies.py` to install accelerator-specific dependencies.

Possible values are:
- rocm: `rocm61`, `rocm60`
- cuda: `cu111`, `cu102`, `cu101`, `cu92`

For example: `python ./ts_scripts/install_dependencies.py --environment=dev --rocm=rocm61`

#### For Windows

Refer to the documentation [here](docs/torchserve_on_win_native.md).

For information about the model archiver, see [detailed documentation](model-archiver/README.md).

### What to Contribute?

14 changes: 11 additions & 3 deletions README.md
@@ -13,7 +13,9 @@ TorchServe now enforces token authorization enabled and model API control disabl

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires python >= 3.8
Requires:
- python >= 3.8
- Java >= 17

```bash
curl http://127.0.0.1:8080/predictions/bert -T input.txt
@@ -22,7 +24,10 @@ curl http://127.0.0.1:8080/predictions/bert -T input.txt

```bash
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
@@ -36,7 +41,10 @@ pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archi

```bash
# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
37 changes: 32 additions & 5 deletions docker/Dockerfile.dev
@@ -10,7 +10,7 @@
# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/

ARG BASE_IMAGE=ubuntu:rolling
ARG BASE_IMAGE=ubuntu:24.04
ARG BUILD_TYPE=dev
FROM ${BASE_IMAGE} AS compile-image

@@ -19,6 +19,7 @@ ARG BRANCH_NAME=master
ARG REPO_URL=https://github.com/pytorch/serve.git
ARG MACHINE_TYPE=cpu
ARG CUDA_VERSION
ARG ROCM_VERSION

ARG BUILD_WITH_IPEX
ARG IPEX_VERSION=1.11.0
@@ -41,14 +42,16 @@ RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
git \
python$PYTHON_VERSION \
python$PYTHON_VERSION-dev \
python3-distutils \
python3-setuptools \
python$PYTHON_VERSION-venv \
python3-venv \
build-essential \
openjdk-17-jdk \
curl \
vim \
numactl \
zip \
wget \
&& if [ "$BUILD_WITH_IPEX" = "true" ]; then apt-get update && apt-get install -y libjemalloc-dev libgoogle-perftools-dev libomp-dev && ln -s /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/libjemalloc.so && ln -s /usr/lib/x86_64-linux-gnu/libtcmalloc.so /usr/lib/libtcmalloc.so && ln -s /usr/lib/x86_64-linux-gnu/libiomp5.so /usr/lib/libiomp5.so; fi \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
@@ -58,19 +61,43 @@ RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
RUN update-alternatives --install /usr/bin/python python /usr/bin/python$PYTHON_VERSION 1 \
&& update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
if [ -n "$ROCM_VERSION" ]; then \
apt-get update \
&& wget https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/noble/amdgpu-install_6.2.60202-1_all.deb \
&& DEBIAN_FRONTEND=noninteractive sudo apt-get install -y ./amdgpu-install_6.2.60202-1_all.deb \
&& sudo apt-get update \
&& sudo apt-get install --no-install-recommends -y amdgpu-dkms rocm \
&& cd /home/; \
else \
echo "Skip ROCm installation"; \
fi

# Build Dev Image
FROM compile-image AS dev-image
ARG MACHINE_TYPE=cpu
ARG CUDA_VERSION
RUN if [ "$MACHINE_TYPE" = "gpu" ]; then export USE_CUDA=1; fi \
RUN if [ "$MACHINE_TYPE" = "nvidia_gpu" ]; then export USE_CUDA=1; fi \
&& git clone $REPO_URL \
&& cd serve \
&& git checkout ${BRANCH_NAME} \
&& python$PYTHON_VERSION -m venv /home/venv
ENV PATH="/home/venv/bin:$PATH"
WORKDIR serve

COPY . .

RUN python -m pip install -U pip setuptools \
&& if [ -z "$CUDA_VERSION" ]; then python ts_scripts/install_dependencies.py --environment=dev; else python ts_scripts/install_dependencies.py --environment=dev --cuda $CUDA_VERSION; fi \
&& if ([ -z "$CUDA_VERSION" ] && [ -z "$ROCM_VERSION" ]); then \
python ts_scripts/install_dependencies.py --environment=dev; \
elif [ -n "$ROCM_VERSION" ]; then \
python ts_scripts/install_dependencies.py --environment=dev --rocm $ROCM_VERSION \
&& cd /opt/rocm/share/amd_smi \
&& pip install . \
&& cd /serve/; \
else \
python ts_scripts/install_dependencies.py --environment=dev --cuda $CUDA_VERSION; \
fi \
&& if [ "$BUILD_WITH_IPEX" = "true" ]; then python -m pip install --no-cache-dir intel_extension_for_pytorch==${IPEX_VERSION} -f ${IPEX_URL}; fi \
&& python ts_scripts/install_from_src.py \
&& useradd -m model-server \
@@ -83,7 +110,6 @@ RUN python -m pip install -U pip setuptools \
&& chown -R model-server /home/venv

EXPOSE 8080 8081 8082 7070 7071
USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
@@ -112,4 +138,5 @@ RUN set -ex \

FROM ${BUILD_TYPE}-image AS final-image
ARG BUILD_TYPE
ENV CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
RUN echo "${BUILD_TYPE} image creation completed"
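
Review note: the dependency-install step in the `RUN` above picks its flags in a fixed order. With neither version set it runs the plain dev install. If `ROCM_VERSION` is set it takes the ROCm branch, even when `CUDA_VERSION` is also set. Otherwise it falls back to CUDA. The sketch below restates that branch order in Java, under the assumption that an unset build argument arrives as an empty string. `InstallArgsDemo` and `installArgs` are hypothetical names used only for illustration and are not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the flag selection in docker/Dockerfile.dev.
class InstallArgsDemo {
    // Treats null and "" alike, mirroring the shell's `-z` / `-n` tests.
    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    // Returns the arguments passed to ts_scripts/install_dependencies.py.
    static List<String> installArgs(String cudaVersion, String rocmVersion) {
        List<String> args = new ArrayList<>();
        args.add("--environment=dev");
        if (!isSet(cudaVersion) && !isSet(rocmVersion)) {
            return args; // CPU-only install
        }
        if (isSet(rocmVersion)) {
            // ROCm is checked first, so it wins when both versions are given.
            args.add("--rocm");
            args.add(rocmVersion);
        } else {
            args.add("--cuda");
            args.add(cudaVersion);
        }
        return args;
    }

    public static void main(String[] argv) {
        System.out.println(installArgs("", ""));
        System.out.println(installArgs("cu121", ""));
        System.out.println(installArgs("cu121", "rocm61"));
    }
}
```

If passing both versions is meant to be an error rather than a silent ROCm preference, an explicit check in the Dockerfile would make that visible at build time.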
4 changes: 2 additions & 2 deletions frontend/build.gradle
@@ -37,8 +37,8 @@ def javaProjects() {

configure(javaProjects()) {
apply plugin: 'java-library'
sourceCompatibility = 1.8
targetCompatibility = 1.8
sourceCompatibility = JavaVersion.VERSION_17
targetCompatibility = JavaVersion.VERSION_17

defaultTasks 'jar'

@@ -0,0 +1,90 @@
package org.pytorch.serve.device;

import java.text.MessageFormat;
import org.pytorch.serve.device.interfaces.IAcceleratorUtility;

public class Accelerator {
    public final Integer id;
    public final AcceleratorVendor vendor;
    public final String model;
    public IAcceleratorUtility acceleratorUtility;
    public Float usagePercentage;
    public Float memoryUtilizationPercentage;
    public Integer memoryAvailableMegabytes;
    public Integer memoryUtilizationMegabytes;

    public Accelerator(String acceleratorName, AcceleratorVendor vendor, Integer gpuId) {
        this.model = acceleratorName;
        this.vendor = vendor;
        this.id = gpuId;
        this.usagePercentage = 0.0f;
        this.memoryUtilizationPercentage = 0.0f;
        this.memoryAvailableMegabytes = 0;
        this.memoryUtilizationMegabytes = 0;
    }

    // Getters
    public Integer getMemoryAvailableMegaBytes() {
        return memoryAvailableMegabytes;
    }

    public AcceleratorVendor getVendor() {
        return vendor;
    }

    public String getAcceleratorModel() {
        return model;
    }

    public Integer getAcceleratorId() {
        return id;
    }

    public Float getUsagePercentage() {
        return usagePercentage;
    }

    public Float getMemoryUtilizationPercentage() {
        return memoryUtilizationPercentage;
    }

    public Integer getMemoryUtilizationMegabytes() {
        return memoryUtilizationMegabytes;
    }

    // Setters
    public void setMemoryAvailableMegaBytes(Integer memoryAvailable) {
        this.memoryAvailableMegabytes = memoryAvailable;
    }

    public void setUsagePercentage(Float acceleratorUtilization) {
        this.usagePercentage = acceleratorUtilization;
    }

    public void setMemoryUtilizationPercentage(Float memoryUtilizationPercentage) {
        this.memoryUtilizationPercentage = memoryUtilizationPercentage;
    }

    public void setMemoryUtilizationMegabytes(Integer memoryUtilizationMegabytes) {
        this.memoryUtilizationMegabytes = memoryUtilizationMegabytes;
    }

    // Other Methods
    public String utilizationToString() {
        final String message =
                MessageFormat.format(
                        "gpuId::{0} utilization.gpu::{1} % utilization.memory::{2} % memory.used::{3} MiB",
                        id,
                        usagePercentage,
                        memoryUtilizationPercentage,
                        memoryUtilizationMegabytes);

        return message;
    }

    public void updateDynamicAttributes(Accelerator updated) {
        this.usagePercentage = updated.usagePercentage;
        this.memoryUtilizationPercentage = updated.memoryUtilizationPercentage;
        this.memoryUtilizationMegabytes = updated.memoryUtilizationMegabytes;
    }
}
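
Review note: `utilizationToString()` builds its string with `MessageFormat`, which formats numeric arguments using locale-aware number formatting. That means values of 1000 or more get grouping separators, and the decimal mark follows the locale. Whether this matters depends on how the string is consumed downstream, which this diff does not show. The sketch below reuses the pattern from the class above but passes an explicit locale so the effect is reproducible. `UtilizationFormatDemo` and its `format` helper are hypothetical names, not code from this PR.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical stand-in for Accelerator.utilizationToString(); only the pattern is from the diff.
class UtilizationFormatDemo {
    static final String PATTERN =
            "gpuId::{0} utilization.gpu::{1} % utilization.memory::{2} % memory.used::{3} MiB";

    // Uses an explicit locale so the result does not depend on the JVM default locale.
    static String format(Locale locale, Integer id, Float usage, Float memPct, Integer memMiB) {
        return new MessageFormat(PATTERN, locale)
                .format(new Object[] {id, usage, memPct, memMiB});
    }

    public static void main(String[] args) {
        // Under Locale.US the memory value is rendered as "2,048", not "2048".
        System.out.println(format(Locale.US, 0, 37.5f, 12.0f, 2048));
        // Other locales may use different grouping and decimal symbols.
        System.out.println(format(Locale.GERMANY, 0, 37.5f, 12.0f, 2048));
    }
}
```

If a consumer expects plain digits, one option would be to convert the numbers with `String.valueOf` before handing them to `MessageFormat`, or to build the string with `String.format(Locale.ROOT, ...)` instead.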
@@ -0,0 +1,9 @@
package org.pytorch.serve.device;

public enum AcceleratorVendor {
    AMD,
    NVIDIA,
    INTEL,
    APPLE,
    UNKNOWN
}
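
Review note: the commit list mentions that `WorkerLifeCycle` uses `SystemInfo` to get `X_VISIBLE_DEVICES`, but that code is not part of the files shown here. As a rough sketch of how a vendor enum like this one could drive that lookup, the example below maps a vendor to the environment variable that restricts visible devices. The helper name `visibleDevicesEnv` and the mapping itself are assumptions for illustration. `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` are the variables documented by CUDA and ROCm respectively, and the remaining vendors are deliberately left unmapped.

```java
import java.util.Optional;

// Hypothetical illustration; the real lookup in this PR may differ.
class VisibleDevicesDemo {
    // Local copy of the enum from the diff so the sketch compiles on its own.
    enum AcceleratorVendor {
        AMD,
        NVIDIA,
        INTEL,
        APPLE,
        UNKNOWN
    }

    // Returns the environment variable name for a vendor, or empty if none is assumed here.
    static Optional<String> visibleDevicesEnv(AcceleratorVendor vendor) {
        switch (vendor) {
            case NVIDIA:
                return Optional.of("CUDA_VISIBLE_DEVICES");
            case AMD:
                return Optional.of("HIP_VISIBLE_DEVICES");
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (AcceleratorVendor vendor : AcceleratorVendor.values()) {
            System.out.println(vendor + " -> " + visibleDevicesEnv(vendor).orElse("(none)"));
        }
    }
}
```

Returning `Optional` rather than `null` keeps the unmapped vendors explicit at the call site.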