update docs

openpsi-project · Jun 20, 2024 · 9ecfe02
1 parent 363c10f
commit 9ecfe02
Showing 11 changed files with 179 additions and 208 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -1,5 +1,8 @@
+ARG REAL_CPU_BASE_IMAGE
+ARG REAL_GPU_BASE_IMAGE
+
 # >>>>>> CPU image
-FROM ubuntu:22.04 as cpu
+FROM ${REAL_CPU_BASE_IMAGE} as cpu
 
 ENV DEBIAN_FRONTEND=noninteractive
 RUN apt update
@@ -31,7 +34,7 @@ EXPOSE 80
 CMD ["nginx", "-g", "daemon off;"]
 
 # >>>>>> GPU image
-FROM nvcr.io/nvidia/pytorch:23.10-py3 AS gpu
+FROM ${REAL_GPU_BASE_IMAGE} AS gpu
 
 ENV DEBIAN_FRONTEND=noninteractive
 RUN apt update

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -5,5 +5,8 @@ services:
       context: .
       dockerfile: Dockerfile
       target: docs
+      args:
+        REAL_CPU_BASE_IMAGE: ubuntu:22.04
+        REAL_GPU_BASE_IMAGE: nvcr.io/nvidia/pytorch:23.10-py3
     ports:
       - "7780:80"
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
@@ -1,18 +1,20 @@
 Contributing
 ###############
 
-This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
-and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
-PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
-advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
+.. This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
+.. and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
+.. PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
+.. advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
 
-We acknowledge that due to limited time and resources, 
-the quality of the documentation and code in this repository is not very high. 
-As a result, it can be quite challenging for potential developers to 
-read the code and contribute new features. 
-If you wish to contribute to this repository and have any questions about the code, 
-please do not hesitate to contact us. 
+.. We acknowledge that due to limited time and resources, 
+.. the quality of the documentation and code in this repository is not very high. 
+.. As a result, it can be quite challenging for potential developers to 
+.. read the code and contribute new features. 
+
+If you wish to contribute to this repository or have any questions about the code, 
+please do not hesitate to raise issues or contact us directly. 
 We will do our best to assist you. 
+Currently, there is no template for issues or pull requests.
 
 We hope the open-source community can help improve this repository 
-and enable the RLHF technology to truly empower the applications of LLM.
+and enable RLHF technology to truly empower the applications of LLM.
diff --git a/docs/source/customization.rst b/docs/source/customization.rst
diff --git a/docs/source/distributed.rst b/docs/source/distributed.rst
@@ -1,35 +1,35 @@
 Set Up Distributed Experiments
 ==================================
 
-Currently, ReaL supports launching distrbited experiments using 
+Currently, ReaL supports launching distributed experiments using
 `SLURM <https://slurm.schedmd.com/documentation.html>`_
 with the `Pyxis <https://github.com/NVIDIA/pyxis>`_ plugin.
 This plugin allows for launching enroot containers with the
 ``srun`` command.
 
-To set up distributed experiments, you should write a JSON
-cluster configuration as the example in ``examples/cluster_config.json``.
+To set up distributed experiments, you need to create a JSON
+cluster configuration file, as shown in the example in  ``examples/cluster_config.json``.
 
-- ``cluster_type``: The type of cluster. Currently, only "slurm" is supported.
+- ``cluster_type``: The type of the cluster. Currently, only "slurm" is supported.
 - ``cluster_name``: The name of the cluster. Arbitrary.
-- ``fileroot``: An NFS path that all nodes can access. This is where the log and checkpoints will be stored.
-- ``default_mount``: Comma separated list of paths to mount on all nodes. This should include the above ``fileroot``.
-- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Any host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers low-end GPUs in the cluster.
-- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. GPU type is used by SLURM.
-- ``cpu_image``: The docker image of the controller and the master worker.
-- ``gpu_image``: The docker image of the model worker.
-- ``node_name_prefix``: The prefix of the host names. We assume host names in the cluster is prefixed by a string followed by some integer, e.g., "com-01", where "com-" is the prefix.
+- ``fileroot``: An NFS path accessible by all nodes. This is where logs and checkpoints will be stored.
+- ``default_mount``: A comma-separated list of paths to mount on all nodes. This should include the ``fileroot`` mentioned above.
+- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Every host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers to low-end GPUs in the cluster.
+- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. The GPU type is used by SLURM.
+- ``cpu_image``: The Docker image for the controller and the master worker.
+- ``gpu_image``: The Docker image for the model worker.
+- ``node_name_prefix``: The prefix of the host names. We assume that host names in the cluster are prefixed by a string followed by an integer, e.g., "com-01", where "com-" is the prefix.
 
 The path of this file should be specified in the ``CLUSTER_SPEC_PATH`` environment variable
-inside the used docker images and when launching the experiment. For example,
+inside the Docker images used and when launching the experiment. For example:
 
 .. code-block:: console
 
     CLUSTER_SPEC_PATH=/tmp/my-cluster.json python3 -m realhf.apps.quickstart ppo ...
 
-You also need to add an additional layer in the docker images like the following:
+You also need to add an additional layer in the Docker images as shown below:
 
 .. code-block:: dockerfile
 
-    FROM docker.io/garrett4wade/real-cpu
+    FROM garrett4wade/real-cpu:22.04-0.1.0
     ENV CLUSTER_SPEC_PATH=/tmp/my-cluster.json
diff --git a/docs/source/expconfig.rst b/docs/source/expconfig.rst
@@ -3,7 +3,7 @@ Configurations
 
 We illustrate configurations for quickstart experiments in this page.
 Each type of experiment (e.g., SFT, PPO) corresponds to a specific 
-configuration class (e.g., :class:`realhf.SFTConfig` for SFT).
+configuration object (e.g., :class:`realhf.SFTConfig` for SFT).
 
 Since ReaL uses `Hydra <https://hydra.cc/>`_ for configuration management,
 users can override these options provided by the class recursively
@@ -57,15 +57,15 @@ Dataset Configurations
 ``NamedArray``
 -----------------------
 
-``NamedArray``` is an object we use in model function calls.
+``NamedArray`` is an object we use in model function calls.
 It is inherited from the previous SRL project.
 
 Named array extends plain arrays/tensors in the following ways.
 
 1. NamedArray aggregates multiple arrays, possibly of different shapes.
 2. Each array is given a name, providing a user-friendly way of indexing to the corresponding data.
 3. NamedArrays can be nested. (Although it should *not* be nested in this system.)
-4. NamedArray can store metadata such as sequence length, which is useful for padding and masking without causing CUDA synchronization.
+4. NamedArray can store metadata such as sequence lengths, which is useful for padding and masking without causing CUDA synchronization.
 
 Users can regard it as a nested dictionary of arrays, except that indexing a ``NamedArray`` results in *slicing every hosted arrays* (again, we don't use this feature in this project).
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -6,39 +6,6 @@
 Welcome to ReaL's documentation!
 ====================================
 
-Highlights of ReaL
------------
-
-**Super-Efficient**
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-ReaL introduces a novel *parameter reallocation* technique. It dynamically shifts parameters and 
-adjusts parallel strategies of LLMs during training. This technique significantly reduces communication 
-overhead and improves GPU utilization for RLHF.
-
-Combined with advanced techniques for LLM training, such as 3D parallelism, ZeRO optimization, and offloading, 
-ReaL can scale RLHF training to hundreds or thousands of GPUs, maintaining high throughput and efficiency.
-
-Beyond large-scale training, ReaL is also memory-efficient with limited resources. For example, ReaL can 
-train 70B LLMs with offloading on a single node.
-
-For more details, check our `introduction page <intro>`_.
-
-**Easy to use**
-~~~~~~~~~~~~~~~~~~~~~~~
-
-Install with PyPI or use our Docker image, then run your experiment with a single command!
-
-Check our `quickstart guide <quickstart>`_ for more details.
-
-**Flexible**
-~~~~~~~~~~~~~~~~~~~~~~~
-
-ReaL's system implementations are fully decoupled from algorithm interfaces. Achieve optimal performance 
-for your customized application within 100 lines of code!
-
-Please refer to our `customization guide <customization>`_ for more details.
-
 Contents
 ----------------
 
@@ -47,10 +14,11 @@ Contents
 
    intro
    install
-   quickstart
    expconfig
+   quickstart
    customization
-   arch
+   .. arch
+
    distributed
    contributing
 

diff --git a/docs/source/install.rst b/docs/source/install.rst
@@ -4,62 +4,56 @@ Installation
 Docker Images
 --------------
 
-The easiest way to run ReaL is to use the provided Docker images.
-We provide a CPU-only image to launch experiments and a runtime GPU
-image to be deployed in the cluster.
-The Dockerfile has been provided in the repository as well.
+The easiest way to run ReaL is by using the provided Docker images.
+We offer a CPU-only image for launching experiments and a runtime GPU
+image for deployment in a cluster. The Dockerfile is also available in the repository.
 
 To pull the images, run:
 
 .. code-block:: console
 
-   $ docker pull docker.io/garrett4wade/real-cpu
-   $ docker pull docker.io/garrett4wade/real-gpu
+   $ docker pull docker.io/garrett4wade/real-cpu:22.04-0.1.0
+   $ docker pull docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0
 
-.. warning::
+The CPU image is built from "ubuntu:22.04" and the GPU image is built from "nvcr.io/nvidia/pytorch:23.10-py3". The current package version is "0.1.0".
 
-   when using these docker images locally, the user should mount the user code directory
-   to path ``/realhf`` in the container. This is because the image shifts an editable
-   installation at ``/realhf``. When the user code overwrites this path, the change of user
-   code will take effect without re-installing this ``realhf`` PyPI package.
+After pulling the Docker images, run your Docker container locally on a GPU node with the following command:
 
-   It's also okay to mount to another location and re-install the package in the container.
+.. code-block:: console
+
+   $ docker run -it --gpus all garrett4wade/real-gpu:23.10-py3-0.1.0 bash
+
+The source code is available at /realhf inside the container. This is an editable installation, so you can modify the code or run experiments directly.
 
-To build the images from scratch, run:
+If you want to develop the code outside a Docker container,
+remember to rerun the editable installation command after mounting:
 
 .. code-block:: console
 
-   $ docker build --target=cpu -t real-cpu .
-   $ docker build --target=gpu -t real-gpu .
+   $ pip install -e /your/mounted/code/path --no-build-isolation
+
 
 Install From PyPI or Source
 ----------------------------
 
-If you don't want to use docker, you can also install ReaL from PyPI
-or from source.
+If you prefer not to use Docker, you can also install ReaL from PyPI or from the source.
 
-Install from PyPI:
+.. note::
 
-.. code-block:: console
+   We don't upload a pre-built wheel to PyPI, so the installation will require compiling the C++ and CUDA extensions. If CUDA is not available on your machine, only the C++ extension will be installed.
 
-   $ pip install realhf --no-build-isolation
+Install from PyPI:
 
-.. note::
+.. code-block:: console
 
-   Installing from the PyPI wheel still requires the user to clone the
-   source code to launch experiments.
+   $ python3 -m pip install realhf --no-build-isolation
 
-Install from source:
+The PyPI package allows you to launch existing experiments with the quickstart command. If you want to modify the code, you should clone the source code and install it from the source:
 
 .. code-block:: console
 
-   $ $ git clone https://github.com/openpsi-project/ReaLHF
+   $ git clone https://github.com/openpsi-project/ReaLHF
    $ cd ReaLHF
-   $ pip install -e . --no-build-isolation
-
-.. note::
+   $ python3 -m pip install -e . --no-build-isolation
 
-   In an environment without CUDA, ReaL will only
-   install necessary Python modules for launching distributed experiments.
-   That's why we have two different docker images for
-   launching and deploying ReaL.
+Next, check :doc:`quickstart` for instructions on running experiments.
diff --git a/docs/source/intro.rst b/docs/source/intro.rst
@@ -1,13 +1,6 @@
 Introduction
 ----------------
 
-ReaL introduces a novel technique called *Parameter Reallocation*
-(the name *ReaL* is the abbreviation for *ReaLlocation*), which dynamically
-shifts model parameters and changes the parallelization strategy during training.
-This technique can significantly reduce the communication overhead and improve
-GPU utilization in RLHF training, leading to a substantial speedup over the state-of-the-art
-open-source systems.
-
 We observe two major limitations based on our profiling
 of the previous RLHF systems, as shown in the :ref:`timeline`.
 
@@ -39,8 +32,8 @@ The key idea of ReaL is to enable dynamic **reallocation of
 model parameters** between GPUs to improve the efficiency of
 the entire RLHF training process.
 By first choosing a parallelization strategy tailored for
-each model function call
-(e.g., use pipelining for Generation, while tensor parallelism for Training)
+each computation workload
+(e.g., pipelining for Generation and tensor parallelism for Training)
 and then executing these calls concurrently with a smaller
 parallelization degree (e.g., Actor and Critic in Training),
 we can eliminate redundant communication while maximizing GPU utilization,
@@ -51,6 +44,8 @@ prior solutions.
 We show throughput comparison with the state-of-the-art open-source systems
 in the following figure.
 
+(In the following figure, as the number of GPUs increases, the model size scales up from LLaMA 7B, LLaMA 13B, and CodeLLaMA 34B, to the largest LLaMA 70B.)
+
 .. image:: images/vws.svg
 
 .. "Scale Actor" maintains the sizes

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -10,6 +10,7 @@ First, clone the ReaL repository from GitHub:
 
     $ git clone https://github.com/openpsi-project/ReaLHF
     $ cd ReaLHF
+    $ pip3 install -e . --no-build-isolation
 
 RLHF with 4x LLaMA-7B in 30min
 ------------------------------------------------
@@ -170,7 +171,7 @@ Run the following command to train the reward model:
         dataset.train_bs_n_seqs=512 \
         dataset.valid_bs_n_seqs=512
 
-It's common practice to use the SFT model to initialize the reward model.
+It's a common practice to use the SFT model to initialize the reward model.
 Therefore, we can pass the path of the saved SFT model as the ``model.path`` option.
 Using the pre-trained LLaMA checkpoint is also feasible, but it may not perform as well.
 
@@ -325,7 +326,9 @@ Each GPU can accommodate parameter shards of multiple models (e.g., both the Act
 Between two function calls upon the same model, ReaL will automatically re-allocate
 model parameters between source and destination locations and properly remap
 parallel strategies.
+
 .. The reallocation also includes GPU-to-CPU reallocation, referred to as *offloading*.
+
 This technique can substantially reduce communication overhead caused by parallelization
 and improve GPU utilization.
 Please check :doc:`intro` for more details.

diff --git a/realhf/__init__.py b/realhf/__init__.py
@@ -19,5 +19,6 @@
 from .experiments.common.ppo_exp import PPOConfig, PPOHyperparameters
 from .experiments.common.rw_exp import RWConfig
 from .experiments.common.sft_exp import SFTConfig
+from .base.namedarray import NamedArray
 
 __version__ = "0.1.0"
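
Note on the cluster configuration described in ``docs/source/distributed.rst``: the following is a minimal sketch of what such a file might contain and how it could be sanity-checked before launching. Only the key names come from the documentation in this commit. Every value, the helper functions ``unmatched_hosts`` and ``mounts_fileroot``, the use of ``re.fullmatch``, and the exact format of ``default_mount`` entries are assumptions made for illustration, not ReaL's actual behavior.

```python
import json
import re

# Key names follow distributed.rst; all values below are hypothetical.
cluster_config = {
    "cluster_type": "slurm",            # the only type the docs list as supported
    "cluster_name": "my-cluster",       # arbitrary
    "fileroot": "/nfs/realhf",          # assumed NFS path visible to all nodes
    "default_mount": "/nfs/realhf,/data",  # comma-separated; entry format assumed
    "node_type_from_node_name": {r"com-\d+": "a100"},
    "gpu_type_from_node_name": {r"com-\d+": "a100"},  # value is a placeholder
    "cpu_image": "garrett4wade/real-cpu:22.04-0.1.0",
    "gpu_image": "garrett4wade/real-gpu:23.10-py3-0.1.0",
    "node_name_prefix": "com-",
}


def unmatched_hosts(config, hostnames):
    """Return the host names that match none of the node-type patterns.

    The docs say every host should match one of the regular expressions;
    full-string matching is an assumption of this sketch.
    """
    patterns = [re.compile(p) for p in config["node_type_from_node_name"]]
    return [h for h in hostnames if not any(p.fullmatch(h) for p in patterns)]


def mounts_fileroot(config):
    """Check that ``fileroot`` appears among the ``default_mount`` entries."""
    entries = [e.strip() for e in config["default_mount"].split(",")]
    return config["fileroot"] in entries


def dump_config(config, path):
    """Write the configuration as JSON, e.g. to the path in CLUSTER_SPEC_PATH."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

A check such as ``unmatched_hosts(cluster_config, ["com-01", "gpu-07"])`` would flag ``gpu-07`` under these assumptions. The written file is what ``CLUSTER_SPEC_PATH`` would point to.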