deploy: f54a9f1
garrett4wade committed Sep 5, 2024
1 parent 9570131 commit 2595e4d
Showing 18 changed files with 243 additions and 364 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 6464afc302baa1a42bda9f4a4714c561
config: 6f9697955cd160a7b5971ef4a95c4f01
tags: 645f666f9bcd5a90fca523b33c5a78b7
11 changes: 11 additions & 0 deletions _sources/contributing.rst.txt
@@ -77,3 +77,14 @@ The GitHub Pages will be updated automatically after the PR is merged.
pytest -m "not distributed"
# On a node with multiple GPUs, run all tests
pytest
************************
Building Docker Images
************************

.. code:: bash
# Build the GPU image
docker build -t real-gpu:24.03-0.3.0 -f Dockerfile --target gpu --build-arg REAL_GPU_BASE_IMAGE=nvcr.io/nvidia/pytorch:24.03-py3 --build-arg REAL_CPU_BASE_IMAGE=ubuntu:22.04 .
# Build the CPU image
docker build -t real-cpu:22.04-0.3.0 -f Dockerfile --target cpu --build-arg REAL_GPU_BASE_IMAGE=nvcr.io/nvidia/pytorch:24.03-py3 --build-arg REAL_CPU_BASE_IMAGE=ubuntu:22.04 .
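The two build commands above differ only in the ``--target`` and the tag, so a small wrapper can keep the base images and version in one place. The following is a minimal sketch, not part of the repository: the helper ``build_cmd`` and the variables ``GPU_BASE`` and ``CPU_BASE`` are hypothetical names, and the ``0.3.0`` default is simply copied from the tags above. It only prints the commands; pipe the output to ``sh`` to actually run them.

```shell
# Hypothetical helper (an assumption, not shipped with ReaL).
# Shared settings; override them from the environment if needed.
REAL_VERSION="${REAL_VERSION:-0.3.0}"
GPU_BASE="${GPU_BASE:-nvcr.io/nvidia/pytorch:24.03-py3}"
CPU_BASE="${CPU_BASE:-ubuntu:22.04}"

build_cmd() {
    # $1 = Dockerfile target (gpu or cpu), $2 = image tag
    printf 'docker build -t %s -f Dockerfile --target %s --build-arg REAL_GPU_BASE_IMAGE=%s --build-arg REAL_CPU_BASE_IMAGE=%s .\n' \
        "$2" "$1" "$GPU_BASE" "$CPU_BASE"
}

# Print (do not execute) the two build commands.
build_cmd gpu "real-gpu:24.03-${REAL_VERSION}"
build_cmd cpu "real-cpu:22.04-${REAL_VERSION}"
```

Both build arguments are passed to both targets, mirroring the original commands.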
43 changes: 12 additions & 31 deletions _sources/install.rst.txt
@@ -15,51 +15,32 @@ To pull the images, run:

.. code:: console
$ docker pull docker.io/garrett4wade/real-cpu:22.04-${REAL_VERSION}
$ docker pull docker.io/garrett4wade/real-gpu:23.10-py3-${REAL_VERSION}
$ docker pull docker.io/garrett4wade/real-cpu:22.04-0.3.0
$ docker pull docker.io/garrett4wade/real-gpu:24.03-py3-0.3.0
The CPU image is built from "ubuntu:22.04" and the GPU image is built
from "nvcr.io/nvidia/pytorch:23.10-py3". You can check the latest
package version `here
<https://github.com/openpsi-project/ReaLHF/releases>`_.
from "nvcr.io/nvidia/pytorch:24.03-py3". You can check the latest docker
image version `here
<https://hub.docker.com/r/garrett4wade/real-gpu/tags>`_.

After pulling the Docker images, run your Docker container locally on a
GPU node with the following command:

.. code:: console
$ docker run -it --rm --gpus all garrett4wade/real-gpu:23.10-py3-${REAL_VERSION} bash
$ docker run -it --rm --gpus all --mount type=bind,src=/path/outside/container,dst=/realhf garrett4wade/real-gpu:24.03-py3-0.3.0 bash
The source code is available at ``/realhf`` inside the container. This
is an editable installation, so you can modify the code or run
experiments directly.

If you want to develop the code outside a Docker container, you should
mount the code directory to the container, e.g.,

.. code:: console
$ docker run -it --rm --gpus all --mount type=bind,src=/path/outside/container,dst=/realhf garrett4wade/real-gpu:23.10-py3-${REAL_VERSION} bash
If your destination path is not ``/realhf``, remember to rerun the
editable installation command after mounting:

.. code:: console
$ REAL_CUDA=1 pip install -e /your/mounted/code/path --no-build-isolation
.. note::

The ``REAL_CUDA`` environment variable is used to install the CUDA
extension.
There is an editable installation at ``/realhf`` inside the container,
so changes to the code outside the container should automatically
take effect.
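The bind mount only works if the host path exists, and giving ``src=`` an absolute path is the safe choice. The following is a minimal sketch of composing the ``docker run`` command from a local checkout; the helper ``run_cmd`` is a hypothetical name, not part of ReaL, and it prints the command rather than executing it.

```shell
# Hypothetical helper (an assumption, not shipped with ReaL).
run_cmd() {
    # $1 = path to the code checkout on the host, $2 = image tag
    # Resolve to an absolute path; this fails if the directory is missing.
    src="$(cd "$1" && pwd)" || return 1
    printf 'docker run -it --rm --gpus all --mount type=bind,src=%s,dst=/realhf %s bash\n' \
        "$src" "$2"
}

# Print the command for the current directory.
run_cmd . garrett4wade/real-gpu:24.03-py3-0.3.0
```

Keeping ``dst=/realhf`` matches the location of the editable installation described above.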

*****************************
Install From PyPI or Source
*****************************

If you prefer not to use the provided Docker image, you can also start
with an image provided by NVIDIA (e.g.,
``nvcr.io/nvidia/pytorch:23.10-py3``) and install ReaL from PyPI or from
``nvcr.io/nvidia/pytorch:24.03-py3``) and install ReaL from PyPI or from
the source.

.. note::
@@ -89,9 +70,9 @@ On a GPU machine, also install the required runtime packages:
.. code:: console
$ export MAX_JOBS=8 # Set the number of parallel jobs for compilation.
$ pip install git+https://github.com/NVIDIA/TransformerEngine.git@v1.4 --no-deps --no-build-isolation
$ pip install git+https://github.com/NVIDIA/TransformerEngine.git@v1.8 --no-deps --no-build-isolation
$ pip install flash_attn==2.4.2 --no-build-isolation
$ pip install grouped_gemm # For MoE
$ pip3 install git+https://github.com/tgale96/grouped_gemm[email protected] --no-build-isolation --no-deps # For MoE
.. note::
