Commit

update docs
garrett4wade committed Jun 20, 2024
1 parent 363c10f commit 9ecfe02
Showing 11 changed files with 179 additions and 208 deletions.
7 changes: 5 additions & 2 deletions Dockerfile
@@ -1,5 +1,8 @@
ARG REAL_CPU_BASE_IMAGE
ARG REAL_GPU_BASE_IMAGE

# >>>>>> CPU image
FROM ubuntu:22.04 as cpu
FROM ${REAL_CPU_BASE_IMAGE} as cpu

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update
@@ -31,7 +34,7 @@ EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

# >>>>>> GPU image
FROM nvcr.io/nvidia/pytorch:23.10-py3 AS gpu
FROM ${REAL_GPU_BASE_IMAGE} AS gpu

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -5,5 +5,8 @@ services:
context: .
dockerfile: Dockerfile
target: docs
args:
REAL_CPU_BASE_IMAGE: ubuntu:22.04
REAL_GPU_BASE_IMAGE: nvcr.io/nvidia/pytorch:23.10-py3
ports:
- "7780:80"
24 changes: 13 additions & 11 deletions docs/source/contributing.rst
@@ -1,18 +1,20 @@
Contributing
###############

This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
.. This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
.. and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
.. PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
.. advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
We acknowledge that due to limited time and resources,
the quality of the documentation and code in this repository is not very high.
As a result, it can be quite challenging for potential developers to
read the code and contribute new features.
If you wish to contribute to this repository and have any questions about the code,
please do not hesitate to contact us.
.. We acknowledge that due to limited time and resources,
.. the quality of the documentation and code in this repository is not very high.
.. As a result, it can be quite challenging for potential developers to
.. read the code and contribute new features.
If you wish to contribute to this repository or have any questions about the code,
please do not hesitate to raise issues or contact us directly.
We will do our best to assist you.
Currently, there is no template for issues or pull requests.

We hope the open-source community can help improve this repository
and enable the RLHF technology to truly empower the applications of LLM.
and enable RLHF technology to truly empower the applications of LLM.
204 changes: 103 additions & 101 deletions docs/source/customization.rst

Large diffs are not rendered by default.

28 changes: 14 additions & 14 deletions docs/source/distributed.rst
@@ -1,35 +1,35 @@
Set Up Distributed Experiments
==================================

Currently, ReaL supports launching distrbited experiments using
Currently, ReaL supports launching distributed experiments using
`SLURM <https://slurm.schedmd.com/documentation.html>`_
with the `Pyxis <https://github.com/NVIDIA/pyxis>`_ plugin.
This plugin allows for launching enroot containers with the
``srun`` command.

To set up distributed experiments, you should write a JSON
cluster configuration as the example in ``examples/cluster_config.json``.
To set up distributed experiments, you need to create a JSON
cluster configuration file, as shown in the example in ``examples/cluster_config.json``.

- ``cluster_type``: The type of cluster. Currently, only "slurm" is supported.
- ``cluster_type``: The type of the cluster. Currently, only "slurm" is supported.
- ``cluster_name``: The name of the cluster. Arbitrary.
- ``fileroot``: An NFS path that all nodes can access. This is where the log and checkpoints will be stored.
- ``default_mount``: Comma separated list of paths to mount on all nodes. This should include the above ``fileroot``.
- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Any host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers low-end GPUs in the cluster.
- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. GPU type is used by SLURM.
- ``cpu_image``: The docker image of the controller and the master worker.
- ``gpu_image``: The docker image of the model worker.
- ``node_name_prefix``: The prefix of the host names. We assume host names in the cluster is prefixed by a string followed by some integer, e.g., "com-01", where "com-" is the prefix.
- ``fileroot``: An NFS path accessible by all nodes. This is where logs and checkpoints will be stored.
- ``default_mount``: A comma-separated list of paths to mount on all nodes. This should include the ``fileroot`` mentioned above.
- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Every host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers to low-end GPUs in the cluster.
- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. The GPU type is used by SLURM.
- ``cpu_image``: The Docker image for the controller and the master worker.
- ``gpu_image``: The Docker image for the model worker.
- ``node_name_prefix``: The prefix of the host names. We assume that host names in the cluster are prefixed by a string followed by an integer, e.g., "com-01", where "com-" is the prefix.
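The fields listed above can be sketched as a Python dictionary that is then serialized to JSON. Only the key names and the two image tags come from this documentation; every other value (cluster name, paths, regular expressions, the ``"tesla"`` GPU type) is a hypothetical placeholder. The helpers ``node_type`` and ``check_config`` are illustrative and not part of ReaL, and the use of ``re.match`` and of plain paths in ``default_mount`` are assumptions about how the fields are interpreted.

```python
import json
import re

# Hypothetical cluster description; adapt every value to your own cluster.
cluster_config = {
    "cluster_type": "slurm",  # the only supported type, per the docs
    "cluster_name": "my_cluster",  # arbitrary
    "fileroot": "/mnt/nfs/real",  # placeholder NFS path
    "default_mount": "/mnt/nfs/real,/mnt/nfs/datasets",  # assumed plain paths
    "node_type_from_node_name": {r"com-\d+": "a100"},
    "gpu_type_from_node_name": {r"com-\d+": "tesla"},  # placeholder GPU type
    "cpu_image": "garrett4wade/real-cpu:22.04-0.1.0",
    "gpu_image": "garrett4wade/real-gpu:23.10-py3-0.1.0",
    "node_name_prefix": "com-",
}


def node_type(host, config):
    """Return the node type of the first pattern that matches the host name."""
    for pattern, ntype in config["node_type_from_node_name"].items():
        if re.match(pattern, host):  # assumption: patterns anchor at the start
            return ntype
    raise ValueError(f"host {host!r} matches no node type pattern")


def check_config(config):
    """Run two sanity checks derived from the field descriptions."""
    if config["cluster_type"] != "slurm":
        raise ValueError("only 'slurm' is supported")
    if config["fileroot"] not in config["default_mount"].split(","):
        raise ValueError("default_mount should include fileroot")


check_config(cluster_config)
# The serialized text is what would be saved as the cluster configuration file.
serialized = json.dumps(cluster_config, indent=2)
```

Checking a few host names with ``node_type`` before launching is a cheap way to catch a host that matches none of the regular expressions.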

The path of this file should be specified in the ``CLUSTER_SPEC_PATH`` environment variable
inside the used docker images and when launching the experiment. For example,
inside the Docker images used and when launching the experiment. For example:

.. code-block:: console
CLUSTER_SPEC_PATH=/tmp/my-cluster.json python3 -m realhf.apps.quickstart ppo ...
You also need to add an additional layer in the docker images like the following:
You also need to add an additional layer in the Docker images as shown below:

.. code-block:: dockerfile
FROM docker.io/garrett4wade/real-cpu
FROM garrett4wade/real-cpu:22.04-0.1.0
ENV CLUSTER_SPEC_PATH=/tmp/my-cluster.json
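As a rough illustration of what setting ``CLUSTER_SPEC_PATH`` enables, the sketch below reads the JSON file named by that variable. The function ``load_cluster_spec`` is hypothetical and is not ReaL's actual loader; the temporary file only stands in for a path such as ``/tmp/my-cluster.json``.

```python
import json
import os
import tempfile


def load_cluster_spec(environ=os.environ):
    """Load the cluster configuration named by CLUSTER_SPEC_PATH (hypothetical helper)."""
    path = environ.get("CLUSTER_SPEC_PATH")
    if not path:
        raise RuntimeError("CLUSTER_SPEC_PATH is not set")
    with open(path) as f:
        return json.load(f)


# Demo with a throwaway file in place of the real configuration.
with tempfile.TemporaryDirectory() as tmp:
    spec_path = os.path.join(tmp, "my-cluster.json")
    with open(spec_path, "w") as f:
        json.dump({"cluster_type": "slurm", "cluster_name": "my_cluster"}, f)
    spec = load_cluster_spec({"CLUSTER_SPEC_PATH": spec_path})
```

Because both the launcher and the containers read the same variable, the file has to exist at the same path in both places, which is why the extra image layer sets ``ENV CLUSTER_SPEC_PATH``.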
6 changes: 3 additions & 3 deletions docs/source/expconfig.rst
@@ -3,7 +3,7 @@ Configurations

We illustrate configurations for quickstart experiments in this page.
Each type of experiment (e.g., SFT, PPO) corresponds to a specific
configuration class (e.g., :class:`realhf.SFTConfig` for SFT).
configuration object (e.g., :class:`realhf.SFTConfig` for SFT).

Since ReaL uses `Hydra <https://hydra.cc/>`_ for configuration management,
users can override these options provided by the class recursively
@@ -57,15 +57,15 @@ Dataset Configurations
``NamedArray``
-----------------------

``NamedArray``` is an object we use in model function calls.
``NamedArray`` is an object we use in model function calls.
It is inherited from the previous SRL project.

Named array extends plain arrays/tensors in the following ways.

1. NamedArray aggregates multiple arrays, possibly of different shapes.
2. Each array is given a name, providing a user-friendly way of indexing to the corresponding data.
3. NamedArrays can be nested. (Although it should *not* be nested in this system.)
4. NamedArray can store metadata such as sequence length, which is useful for padding and masking without causing CUDA synchronization.
4. NamedArray can store metadata such as sequence lengths, which is useful for padding and masking without causing CUDA synchronization.

Users can regard it as a nested dictionary of arrays, except that indexing a ``NamedArray`` results in *slicing every hosted array* (again, we don't use this feature in this project).
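The behavior described above can be illustrated with a toy stand-in. ``ToyNamedArray`` is a hypothetical class written only for this sketch. It is not the real ``realhf.NamedArray`` API, and it uses plain Python lists instead of tensors.

```python
class ToyNamedArray:
    """Toy model of a named collection of arrays plus metadata (not the real class)."""

    def __init__(self, metadata=None, **arrays):
        self._arrays = dict(arrays)  # name -> array, shapes may differ
        self.metadata = dict(metadata or {})  # e.g. sequence lengths

    def keys(self):
        return list(self._arrays)

    def __getitem__(self, key):
        if isinstance(key, str):
            # Indexing by name returns the corresponding array.
            return self._arrays[key]
        if isinstance(key, slice):
            # Indexing by slice applies the slice to every hosted array.
            sliced = {name: arr[key] for name, arr in self._arrays.items()}
            # Metadata is copied unchanged here; a real implementation
            # would have to keep it consistent with the sliced data.
            return ToyNamedArray(metadata=self.metadata, **sliced)
        raise TypeError("index with a name (str) or a slice")


batch = ToyNamedArray(
    metadata={"seqlens": [3, 2]},
    input_ids=[[1, 2, 3], [4, 5]],  # ragged: one token list per sequence
    rewards=[0.5, -1.0],  # one scalar per sequence
)
first = batch[:1]  # slices both "input_ids" and "rewards"
```

Here ``batch["rewards"]`` looks up one array by name, while ``batch[:1]`` returns a new object whose arrays each hold only the first entry.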

38 changes: 3 additions & 35 deletions docs/source/index.rst
@@ -6,39 +6,6 @@
Welcome to ReaL's documentation!
====================================

Highlights of ReaL
-----------

**Super-Efficient**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ReaL introduces a novel *parameter reallocation* technique. It dynamically shifts parameters and
adjusts parallel strategies of LLMs during training. This technique significantly reduces communication
overhead and improves GPU utilization for RLHF.

Combined with advanced techniques for LLM training, such as 3D parallelism, ZeRO optimization, and offloading,
ReaL can scale RLHF training to hundreds or thousands of GPUs, maintaining high throughput and efficiency.

Beyond large-scale training, ReaL is also memory-efficient with limited resources. For example, ReaL can
train 70B LLMs with offloading on a single node.

For more details, check our `introduction page <intro>`_.

**Easy to use**
~~~~~~~~~~~~~~~~~~~~~~~

Install with PyPI or use our Docker image, then run your experiment with a single command!

Check our `quickstart guide <quickstart>`_ for more details.

**Flexible**
~~~~~~~~~~~~~~~~~~~~~~~

ReaL's system implementations are fully decoupled from algorithm interfaces. Achieve optimal performance
for your customized application within 100 lines of code!

Please refer to our `customization guide <customization>`_ for more details.

Contents
----------------

@@ -47,10 +14,11 @@ Contents

intro
install
quickstart
expconfig
quickstart
customization
arch
.. arch

distributed
contributing

58 changes: 26 additions & 32 deletions docs/source/install.rst
@@ -4,62 +4,56 @@ Installation
Docker Images
--------------

The easiest way to run ReaL is to use the provided Docker images.
We provide a CPU-only image to launch experiments and a runtime GPU
image to be deployed in the cluster.
The Dockerfile has been provided in the repository as well.
The easiest way to run ReaL is by using the provided Docker images.
We offer a CPU-only image for launching experiments and a runtime GPU
image for deployment in a cluster. The Dockerfile is also available in the repository.

To pull the images, run:

.. code-block:: console
$ docker pull docker.io/garrett4wade/real-cpu
$ docker pull docker.io/garrett4wade/real-gpu
$ docker pull docker.io/garrett4wade/real-cpu:22.04-0.1.0
$ docker pull docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0
.. warning::
The CPU image is built from "ubuntu:22.04" and the GPU image is built from "nvcr.io/nvidia/pytorch:23.10-py3". The current package version is "0.1.0".

when using these docker images locally, the user should mount the user code directory
to path ``/realhf`` in the container. This is because the image shifts an editable
installation at ``/realhf``. When the user code overwrites this path, the change of user
code will take effect without re-installing this ``realhf`` PyPI package.
After pulling the Docker images, run your Docker container locally on a GPU node with the following command:

It's also okay to mount to another location and re-install the package in the container.
.. code-block:: console
$ docker run -it --gpus all garrett4wade/real-gpu:23.10-py3-0.1.0 bash
The source code is available at /realhf inside the container. This is an editable installation, so you can modify the code or run experiments directly.

To build the images from scratch, run:
If you want to develop the code outside a Docker container,
remember to rerun the editable installation command after mounting:

.. code-block:: console
$ docker build --target=cpu -t real-cpu .
$ docker build --target=gpu -t real-gpu .
$ pip install -e /your/mounted/code/path --no-build-isolation
Install From PyPI or Source
----------------------------

If you don't want to use docker, you can also install ReaL from PyPI
or from source.
If you prefer not to use Docker, you can also install ReaL from PyPI or from the source.

Install from PyPI:
.. note::

.. code-block:: console
We don't upload a pre-built wheel to PyPI, so the installation will require compiling the C++ and CUDA extensions. If CUDA is not available on your machine, only the C++ extension will be installed.

$ pip install realhf --no-build-isolation
Install from PyPI:

.. note::
.. code-block:: console
Installing from the PyPI wheel still requires the user to clone the
source code to launch experiments.
$ python3 -m pip install realhf --no-build-isolation
Install from source:
The PyPI package allows you to launch existing experiments with the quickstart command. If you want to modify the code, you should clone the source code and install it from the source:

.. code-block:: console
$ $ git clone https://github.com/openpsi-project/ReaLHF
$ git clone https://github.com/openpsi-project/ReaLHF
$ cd ReaLHF
$ pip install -e . --no-build-isolation
.. note::
$ python3 -m pip install -e . --no-build-isolation
In an environment without CUDA, ReaL will only
install necessary Python modules for launching distributed experiments.
That's why we have two different docker images for
launching and deploying ReaL.
Next, check :doc:`quickstart`` for instructions on running experiments.

Check warning on line 59 in docs/source/install.rst (GitHub Actions / build): unknown document: 'quickstart`'
13 changes: 4 additions & 9 deletions docs/source/intro.rst
@@ -1,13 +1,6 @@
Introduction
----------------

ReaL introduces a novel technique called *Parameter Reallocation*
(the name *ReaL* is the abbreviation for *ReaLlocation*), which dynamically
shifts model parameters and changes the parallelization strategy during training.
This technique can significantly reduce the communication overhead and improve
GPU utilization in RLHF training, leading to a substantial speedup over the state-of-the-art
open-source systems.

We observe two major limitations based on our profiling
of the previous RLHF systems, as shown in the :ref:`timeline`.

@@ -39,8 +32,8 @@ The key idea of ReaL is to enable dynamic **reallocation of
model parameters** between GPUs to improve the efficiency of
the entire RLHF training process.
By first choosing a parallelization strategy tailored for
each model function call
(e.g., use pipelining for Generation, while tensor parallelism for Training)
each computation workload
(e.g., pipelining for Generation and tensor parallelism for Training)
and then executing these calls concurrently with a smaller
parallelization degree (e.g., Actor and Critic in Training),
we can eliminate redundant communication while maximizing GPU utilization,
@@ -51,6 +44,8 @@ prior solutions.
We show throughput comparison with the state-of-the-art open-source systems
in the following figure.

(In the following figure, as the number of GPUs increases, the model size scales up from LLaMA 7B, LLaMA 13B, and CodeLLaMA 34B, to the largest LLaMA 70B.)

.. image:: images/vws.svg

.. "Scale Actor" maintains the sizes
5 changes: 4 additions & 1 deletion docs/source/quickstart.rst
@@ -10,6 +10,7 @@ First, clone the ReaL repository from GitHub:
$ git clone https://github.com/openpsi-project/ReaLHF
$ cd ReaLHF
$ pip3 install -e . --no-build-isolation
RLHF with 4x LLaMA-7B in 30min
------------------------------------------------
@@ -170,7 +171,7 @@ Run the following command to train the reward model:
dataset.train_bs_n_seqs=512 \
dataset.valid_bs_n_seqs=512
It's common practice to use the SFT model to initialize the reward model.
It's a common practice to use the SFT model to initialize the reward model.
Therefore, we can pass the path of the saved SFT model as the ``model.path`` option.
Using the pre-trained LLaMA checkpoint is also feasible, but it may not perform as well.

@@ -325,7 +326,9 @@ Each GPU can accommodate parameter shards of multiple models (e.g., both the Act
Between two function calls upon the same model, ReaL will automatically re-allocate
model parameters between source and destination locations and properly remap
parallel strategies.

.. The reallocation also includes GPU-to-CPU reallocation, referred to as *offloading*.
This technique can substantially reduce communication overhead caused by parallelization
and improve GPU utilization.
Please check :doc:`intro` for more details.
1 change: 1 addition & 0 deletions realhf/__init__.py
@@ -19,5 +19,6 @@
from .experiments.common.ppo_exp import PPOConfig, PPOHyperparameters
from .experiments.common.rw_exp import RWConfig
from .experiments.common.sft_exp import SFTConfig
from .base.namedarray import NamedArray

__version__ = "0.1.0"
