diff --git a/_sources/contributing.rst.txt b/_sources/contributing.rst.txt index 92323658..9547bba2 100644 --- a/_sources/contributing.rst.txt +++ b/_sources/contributing.rst.txt @@ -1,18 +1,20 @@ Contributing ############### -This repository is developed and maintained by `Wei Fu `_ -and `Zhiyu Mei `_, both of whom are -PhD students at `IIIS, Tsinghua University `_ -advised by Professor `Yi Wu `_. +.. This repository is developed and maintained by `Wei Fu `_ +.. and `Zhiyu Mei `_, both of whom are +.. PhD students at `IIIS, Tsinghua University `_ +.. advised by Professor `Yi Wu `_. -We acknowledge that due to limited time and resources, -the quality of the documentation and code in this repository is not very high. -As a result, it can be quite challenging for potential developers to -read the code and contribute new features. -If you wish to contribute to this repository and have any questions about the code, -please do not hesitate to contact us. +.. We acknowledge that due to limited time and resources, +.. the quality of the documentation and code in this repository is not very high. +.. As a result, it can be quite challenging for potential developers to +.. read the code and contribute new features. + +If you wish to contribute to this repository or have any questions about the code, +please do not hesitate to raise issues or contact us directly. We will do our best to assist you. +Currently, there is no template for issues or pull requests. We hope the open-source community can help improve this repository -and enable the RLHF technology to truly empower the applications of LLM. +and enable RLHF technology to truly empower the applications of LLM. 
diff --git a/_sources/customization.rst.txt b/_sources/customization.rst.txt index 669a652b..ceff3661 100644 --- a/_sources/customization.rst.txt +++ b/_sources/customization.rst.txt @@ -2,62 +2,74 @@ Customization ################ - Customizing Datasets ----------------------------------- Overview ~~~~~~~~~~ -We provide three types of datasets implementation in ``realhf/impl/dataset/``, -with corresponding configurations +We provide three types of dataset implementations in ``realhf/impl/dataset/`` with the following configurations: - :class:`realhf.PromptAnswerDatasetConfig` - :class:`realhf.PairedComparisonDatasetConfig` -- :class:`realhf.PromptOnlyDatasetConfig`. +- :class:`realhf.PromptOnlyDatasetConfig` -Please check the corresponding configurations for more details -about how to use or change these implemented datasets. +Please refer to the respective configuration documentation for detailed instructions on how to use or modify these datasets. -Datasets in ReaL are the commonly used +Datasets in ReaL are commonly used `PyTorch map-style datasets `_. -Users are required to implement a ``__getitem__`` method in the dataset class, -which returns an :class:`realhf.NamedArray` object containing the data of a single sample and its sequence length. -The sequence length is required because ReaL uses variable-length inputs without padding to save GPU memory. +Users need to implement a ``__getitem__`` method in the dataset class, +which returns a :class:`realhf.NamedArray` object containing the data of a single sample and its sequence length. +The sequence length is necessary because ReaL uses variable-length inputs without padding to save GPU memory. -How dataset configuration is parsed +How Dataset Configuration is Parsed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We take the SFT experiment as an example. 
-The :class:`realhf.PromptAnswerDatasetConfig` class will be converted to a dataset config -under the system API, i.e., ``realhf.api.core.system_api.Dataset``. -Please check the ``datasets`` method of :class:`realhf.SFTConfig` for more details. -This object has a dataset name (in this case, "prompt_answer") and corresponding arguments -that are passed to the dataset class's constructor. +We will use the SFT experiment as an example. + +The :class:`realhf.PromptAnswerDatasetConfig` object will be converted to a dataset configuration +under the system API, specifically ``realhf.api.core.system_api.Dataset``. +Refer to the ``datasets`` method of :class:`realhf.SFTConfig` for more details. +This object includes a dataset name (in this case, "prompt_answer") and corresponding arguments +that are passed to the dataset class's constructor: + +.. code-block:: python + + @property + def datasets(self): + return [ + Dataset( + "prompt_answer", + args=dict( + max_length=self.dataset.max_seqlen, + dataset_path=self.dataset.train_path, + ), + ) + ] -At the end of ``realhf.impl.dataset.prompt_answer_dataset``, we can see a line: +At the end of ``realhf.impl.dataset.prompt_answer_dataset``, we find the following line: .. code-block:: python data_api.register_dataset("prompt_answer", PromptAnswerDataset) -This line properly registers the dataset class with the system API, so that when this name -is given to system API, ReaL can find this dataset implementation and construct it. +This line registers the dataset class with the system API. When this name is provided to the system API, +ReaL can locate this dataset implementation and construct it. The ``args`` field in ``realhf.api.core.system_api.Dataset`` will be passed to the ``__init__`` -method of the dataset class, except that ReaL preserves a ``util`` field to store some utility objects. +method of the dataset class, except that ReaL reserves a ``util`` field to store some utility objects. 
-Steps for implementing a new dataset +Steps for Implementing a New Dataset ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Create a new dataset file under ``realhf/impl/dataset/``. +1. Create a new dataset file under ``realhf/impl/dataset/``. -- Implement a map-style PyTorch dataset class with a ``__getitem__`` method. This method returns an :class:`realhf.NamedArray` object containing the sequence length as metadata. +2. Implement a map-style PyTorch dataset class with a ``__getitem__`` method. This method should return a :class:`realhf.NamedArray` object containing the sequence length as metadata. -- Register the class with ``data_api.register_dataset`` at the end of this file, with the name "my-dataset". +3. Register the class with ``data_api.register_dataset`` at the end of the file, using the name "my-dataset". -- Change the name of the used dataset in experiment configurations, e.g., in the ``datasets`` method of ``realhf.SFTConfig``, to "my-dataset". +4. Update the name of the dataset in experiment configurations, for example, in the ``datasets`` method of ``realhf.SFTConfig``, to "my-dataset". -- If you would like to pass in more arguments to construct the dataset class, change the quickstart configuration class (in this case, ``realhf.PromptAnswerDatasetConfig``) as well as the ``args`` field in the system API dataset object. +5. If you need to pass additional arguments to construct the dataset class, modify the quickstart configuration class (in this case, ``realhf.PromptAnswerDatasetConfig``) as well as the ``args`` field in the system API dataset object. Customizing Models @@ -66,36 +78,38 @@ Customizing Models Overview ~~~~~~~~~~ -For efficiency reasons, ReaL does not support every transformer model from the HuggingFace model hub. -In ReaL, we implement a :class:`realhf.impl.model.nn.real_llm_api.ReaLModel` class that wraps the HuggingFace model and provides -additional offload and parameter reallocation APIs. 
+For efficiency reasons, ReaL does not support every transformer +model from the HuggingFace model hub. +In ReaL, we implement the :class:`realhf.impl.model.nn.real_llm_api.ReaLModel` +class that wraps the HuggingFace model and provides micro-batched pipelining, +offload, and parameter reallocation functionalities. +There are helper functions in the model API used to convert HuggingFace models back and forth, +such as ``from_llama`` and ``to_llama``. +These helper functions are generated automatically by registering conversion functions in the ``api/from_hf/`` folder. -Note that there are some helper functions in the model API that are used to convert HuggingFace models back-and-forth, -e.g., ``from_llama``, ``config_to_codellama``, etc. -These helper functions are generated *automatically* by registering converting functions in the -``api/from_hf/`` folder. +For example, consider ``api/from_hf/llama.py``. +To register a convertible HuggingFace model, the user should implement: -We take ``api/from_hf/llama.py`` as an example. -To register a convertable HuggingFace model, the user should implement\: - -- Two functions that convert model configs between HuggingFace and :class:`realhf.ReaLModelConfig`. -- Two functions that convert model state dicts between HuggingFace and ReaL, basically key remap. +- Two functions to convert model configs between HuggingFace and :class:`realhf.ReaLModelConfig`. +- Two functions to convert model state dicts between HuggingFace and ReaL, primarily involving key remapping. - Three functions specifying the names of parameters in the embedding layer, transformer blocks, and the output layer, respectively. -Steps to support a new HuggingFace model +Steps to Support a New HuggingFace Model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Create a new model file under ``api/from_hf/``. - Implement the required helper functions as described above. -- Register the model with ``register_hf_family`` at the end of this file. 
+- Register the model with ``register_hf_family`` at the end of the file. - (Optional) Test the consistency of the implemented model with scripts in ``tests/``. -We acknowledge that the current config and implementation of ``ReaLModel`` does not support -all the features of HuggingFace models, e.g., MoE, shared embeddings, etc. -As a result, supporting a new HF model usually requires to modify files in ``impl/model/nn/``, -which can be a terrible experience to users that are not familar with the code architecture. -If you have any questions or want to request a new model feature, +We acknowledge that the current configuration and implementation of ``ReaLModel`` +do not support all features of HuggingFace models, +such as MoE and shared embeddings. +As a result, supporting a new HuggingFace model +often requires modifications to files in ``impl/model/nn/``, +which can be a challenging experience for users unfamiliar with the code architecture. +If you have any questions or wish to request a new model feature, please feel free to raise an issue on our GitHub repository. @@ -105,22 +119,23 @@ Customizing Algorithms Overview ~~~~~~~~~~~ -Algorithms in ReaL are represented as dataflow graphs. -Each node in the graph is a model function call (MFC), which is one of -the generate, inference, or train requests applied to a specific model (e.g., Actor or Critic). -Edges in the graph denote the data or parameter version dependencies -between nodes. +In ReaL, algorithms are represented as dataflow graphs. +Each node in the graph corresponds to a model function call (MFC), +which can be a generate, inference, or train request applied to a +specific model (e.g., Actor or Critic). +The edges in the graph indicate data or +parameter version dependencies between nodes. -We show the dataflow graph of PPO in the following figure: +The following figure illustrates the dataflow graph of PPO: .. image:: images/rlhf_dfg.svg :alt: Dataflow graph of RLHF. 
:align: center A node is represented by a :class:`realhf.MFCDef` object. -We can see that the node has a ``model_name`` field and a ``interface_type`` field, -which specifies what this node should conceptually do during exection. -The ``interface_impl`` field specifies an actual implementation of the model interface. +Each node has a ``model_name`` field and an ``interface_type`` +field, which specify what the node should conceptually do during execution. +The ``interface_impl`` field specifies the actual implementation of the model interface. The interface class has the following signature: @@ -128,40 +143,31 @@ The interface class has the following signature: :members: :undoc-members: -During the execution of an MFC node, the model with ``model_name`` will be passed -into this interface object together with the data specified in the MFC node. +During the execution of an MFC node, +the model identified by ``model_name`` is passed into this interface object, +along with the data specified in the MFC node. .. note:: - Similar to datasets, model interfaces are also registered and constructed by the system API. - Please check ``impl/model/interface/sft_interface.py`` for an example. - The ``SFTInterface`` is registered at the end of this file and constructed by :class:`realhf.SFTConfig` - (see the ``rpcs`` method). - -Running algorithms in ReaL is exactly running a large dataflow graph that -concatenates all the training iterations. -The *MasterWorker* monitors the running state of this graph and issues MFC requests -to *ModelWorkers* once the dependencies are satisfied. -For more details about the code architecture, please refer to the :doc:`arch` page. - -.. To implement a new algorithm in ReaL, -.. the user should first figure out whether the new dataflow can be unified into -.. existing dataflow graphs (i.e., SFT/RW, DPO, PPO), as defined in the ``experiments/common/`` folder. -.. If the new algorithm has a completely new dataflow, the user should modify -.. 
the experiment configuration class. -.. Otherwise, e.g., if the user just want to add an additional loss term to the existing algorithm, -.. the user can just modify the interface implementation in the ``impl/model/interface`` folder. - -Example 1: Replace the interface + + Similar to datasets, model interfaces are registered and constructed by the system API. Please check ``impl/model/interface/sft_interface.py`` for an example. The ``SFTInterface`` is registered at the end of this file and constructed by :class:`realhf.SFTConfig` (see the ``rpcs`` method). + +Running algorithms in ReaL involves executing a large dataflow +graph that concatenates all the training iterations. +The *MasterWorker* monitors the state of this graph and +issues MFC requests to *ModelWorkers* once the dependencies are satisfied. + + +.. For more details about the code architecture, please refer to the :doc:`arch` page. + +Example: A Customized Reward Function for PPO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Say, in the PPO experiment, we want to use a customized reward model -from HuggingFace. How should we do if this model is not supported by ``ReaLModel``? -We provide the example code in ``examples/ppo_sentiment.py``, where we replace the -trained reward model for sentiment generation with a BERT-like sentiment analysis model -from HuggingFace. +In this example, we demonstrate how to use a customized reward model from HuggingFace in a PPO experiment when the model is not supported by ``ReaLModel``. -First, we should implement a new model interface class for our customized usage: +The example code can be found in ``examples/ppo_sentiment.py``, where we replace the trained reward model for sentiment generation with a BERT-like sentiment analysis model from HuggingFace. + +First, we need to implement a new model interface class for our customized use: .. 
code-block:: python @@ -182,8 +188,7 @@ First, we should implement a new model interface class for our customized usage: @torch.no_grad() def inference(self, model: model_api.Model, data: NamedArray) -> NamedArray: - ... - # Re-tokenize. + # Re-tokenize the texts. texts = model.tokenizer.batch_decode( input_ids, skip_special_tokens=True ) @@ -191,31 +196,29 @@ First, we should implement a new model interface class for our customized usage: texts, return_tensors="pt", padding=True, truncation=True ) - # Inference to get the score. - # For IMDB, 0 is negative and 1 is positive. We record the logits of positive. + # Perform inference to get the score. + # For IMDB, 0 is negative and 1 is positive. We record the logits of the positive class. scores = self.score_model( input_ids=encoding["input_ids"].cuda(), attention_mask=encoding["attention_mask"].cuda(), ).logits[..., -1].contiguous().float() - scores = logits[..., -1].contiguous().float() res = NamedArray(scores=scores) res.register_metadata(**data.metadata) return res -Here are two key points in this code: +Key points in this code: - During interface initialization, we load a HuggingFace model and its tokenizer. - During inference, we re-tokenize the generated output from the Actor, compute the score, and return it. -That's easy, right? Now we should register this interface in the system API: +Now, we need to register this interface in the system API: .. code-block:: python model_api.register_interface("sentiment_scoring", SentimentScoringInterface) -Then, to use our customized interface implementation in PPO, we should change -the ``interface_impl`` field of the reward model in the MFC nodes of PPO: +To use our customized interface implementation in PPO, we need to change the ``interface_impl`` field of the reward model in the MFC nodes of PPO: .. 
code-block:: python @@ -226,7 +229,7 @@ the ``interface_impl`` field of the reward model in the MFC nodes of PPO: for mw in cfg.model_worker: for s in mw.shards: if s.id.model_name.role == "reward": - # Remove the original reward model because we use the customized one. + # Remove the original reward model because we are using a customized one. s.model = config_api.Model( "tokenizer", args=dict( @@ -245,14 +248,13 @@ the ``interface_impl`` field of the reward model in the MFC nodes of PPO: inf_reward_rpc.post_hooks = [] return cfg -Don't forget the register your customized experiment configuration -such that ReaL can launch it with the quickstart command line options: +Don't forget to register your customized experiment configuration so that ReaL can launch it with the quickstart command line options: .. code-block:: python register_quickstart_exp("my-ppo", MyPPOConfig) -Done! Let's run the customized experiment with the quickstart command: +Finally, let's run the customized experiment with the quickstart command: .. code-block:: console @@ -264,10 +266,10 @@ Done! Let's run the customized experiment with the quickstart command: ppo.top_p=0.9 ppo.top_k=1000 \ ... -This example also applies for scenarios when you want to use an external reward, -like the signal from compiler or other online automatic evaluations. +This example is also applicable for scenarios where you want to use an external reward, such as a signal from a compiler or other online automatic evaluations. -Example 2: Develop a new dataflow -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TODO \ No newline at end of file +.. Example 2: Develop a new dataflow +.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
TODO \ No newline at end of file diff --git a/_sources/distributed.rst.txt b/_sources/distributed.rst.txt index 99ab61e9..27e43a3e 100644 --- a/_sources/distributed.rst.txt +++ b/_sources/distributed.rst.txt @@ -1,35 +1,35 @@ Set Up Distributed Experiments ================================== -Currently, ReaL supports launching distrbited experiments using +Currently, ReaL supports launching distributed experiments using `SLURM `_ with the `Pyxis `_ plugin. This plugin allows for launching enroot containers with the ``srun`` command. -To set up distributed experiments, you should write a JSON -cluster configuration as the example in ``examples/cluster_config.json``. +To set up distributed experiments, you need to create a JSON +cluster configuration file, as shown in the example in ``examples/cluster_config.json``. -- ``cluster_type``: The type of cluster. Currently, only "slurm" is supported. +- ``cluster_type``: The type of the cluster. Currently, only "slurm" is supported. - ``cluster_name``: The name of the cluster. Arbitrary. -- ``fileroot``: An NFS path that all nodes can access. This is where the log and checkpoints will be stored. -- ``default_mount``: Comma separated list of paths to mount on all nodes. This should include the above ``fileroot``. -- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Any host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers low-end GPUs in the cluster. -- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. GPU type is used by SLURM. -- ``cpu_image``: The docker image of the controller and the master worker. -- ``gpu_image``: The docker image of the model worker. -- ``node_name_prefix``: The prefix of the host names. We assume host names in the cluster is prefixed by a string followed by some integer, e.g., "com-01", where "com-" is the prefix. 
+- ``fileroot``: An NFS path accessible by all nodes. This is where logs and checkpoints will be stored. +- ``default_mount``: A comma-separated list of paths to mount on all nodes. This should include the ``fileroot`` mentioned above. +- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Every host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers to low-end GPUs in the cluster. +- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. The GPU type is used by SLURM. +- ``cpu_image``: The Docker image for the controller and the master worker. +- ``gpu_image``: The Docker image for the model worker. +- ``node_name_prefix``: The prefix of the host names. We assume that host names in the cluster are prefixed by a string followed by an integer, e.g., "com-01", where "com-" is the prefix. The path of this file should be specified in the ``CLUSTER_SPEC_PATH`` environment variable -inside the used docker images and when launching the experiment. For example, +inside the Docker images used and when launching the experiment. For example: .. code-block:: console CLUSTER_SPEC_PATH=/tmp/my-cluster.json python3 -m realhf.apps.quickstart ppo ... -You also need to add an additional layer in the docker images like the following: +You also need to add an additional layer in the Docker images as shown below: .. code-block:: dockerfile - FROM docker.io/garrett4wade/real-cpu + FROM garrett4wade/real-cpu:22.04-0.1.0 ENV CLUSTER_SPEC_PATH=/tmp/my-cluster.json \ No newline at end of file diff --git a/_sources/expconfig.rst.txt b/_sources/expconfig.rst.txt index 8aa7f16a..e0b6084d 100644 --- a/_sources/expconfig.rst.txt +++ b/_sources/expconfig.rst.txt @@ -3,7 +3,7 @@ Configurations We illustrate configurations for quickstart experiments in this page. 
Each type of experiment (e.g., SFT, PPO) corresponds to a specific -configuration class (e.g., :class:`realhf.SFTConfig` for SFT). +configuration object (e.g., :class:`realhf.SFTConfig` for SFT). Since ReaL uses `Hydra `_ for configuration management, users can override these options provided by the class recursively @@ -57,7 +57,7 @@ Dataset Configurations ``NamedArray`` ----------------------- -``NamedArray``` is an object we use in model function calls. +``NamedArray`` is an object we use in model function calls. It is inherited from the previous SRL project. Named array extends plain arrays/tensors in the following ways. @@ -65,7 +65,7 @@ Named array extends plain arrays/tensors in the following ways. 1. NamedArray aggregates multiple arrays, possibly of different shapes. 2. Each array is given a name, providing a user-friendly way of indexing to the corresponding data. 3. NamedArrays can be nested. (Although it should *not* be nested in this system.) -4. NamedArray can store metadata such as sequence length, which is useful for padding and masking without causing CUDA synchronization. +4. NamedArray can store metadata such as sequence lengths, which is useful for padding and masking without causing CUDA synchronization. Users can regard it as a nested dictionary of arrays, except that indexing a ``NamedArray`` results in *slicing every hosted arrays* (again, we don't use this feature in this project). diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt index 9f604707..b60a7817 100644 --- a/_sources/index.rst.txt +++ b/_sources/index.rst.txt @@ -6,39 +6,6 @@ Welcome to ReaL's documentation! ==================================== -Highlights of ReaL ------------ - -**Super-Efficient** -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -ReaL introduces a novel *parameter reallocation* technique. It dynamically shifts parameters and -adjusts parallel strategies of LLMs during training. 
This technique significantly reduces communication -overhead and improves GPU utilization for RLHF. - -Combined with advanced techniques for LLM training, such as 3D parallelism, ZeRO optimization, and offloading, -ReaL can scale RLHF training to hundreds or thousands of GPUs, maintaining high throughput and efficiency. - -Beyond large-scale training, ReaL is also memory-efficient with limited resources. For example, ReaL can -train 70B LLMs with offloading on a single node. - -For more details, check our `introduction page `_. - -**Easy to use** -~~~~~~~~~~~~~~~~~~~~~~~ - -Install with PyPI or use our Docker image, then run your experiment with a single command! - -Check our `quickstart guide `_ for more details. - -**Flexible** -~~~~~~~~~~~~~~~~~~~~~~~ - -ReaL's system implementations are fully decoupled from algorithm interfaces. Achieve optimal performance -for your customized application within 100 lines of code! - -Please refer to our `customization guide `_ for more details. - Contents ---------------- @@ -47,10 +14,11 @@ Contents intro install - quickstart expconfig + quickstart customization - arch + .. arch + distributed contributing diff --git a/_sources/install.rst.txt b/_sources/install.rst.txt index 4680d8be..0b7867d5 100644 --- a/_sources/install.rst.txt +++ b/_sources/install.rst.txt @@ -4,62 +4,56 @@ Installation Docker Images -------------- -The easiest way to run ReaL is to use the provided Docker images. -We provide a CPU-only image to launch experiments and a runtime GPU -image to be deployed in the cluster. -The Dockerfile has been provided in the repository as well. +The easiest way to run ReaL is by using the provided Docker images. +We offer a CPU-only image for launching experiments and a runtime GPU +image for deployment in a cluster. The Dockerfile is also available in the repository. To pull the images, run: .. 
code-block:: console - $ docker pull docker.io/garrett4wade/real-cpu - $ docker pull docker.io/garrett4wade/real-gpu + $ docker pull docker.io/garrett4wade/real-cpu:22.04-0.1.0 + $ docker pull docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0 -.. warning:: +The CPU image is built from "ubuntu:22.04" and the GPU image is built from "nvcr.io/nvidia/pytorch:23.10-py3". The current package version is "0.1.0". - when using these docker images locally, the user should mount the user code directory - to path ``/realhf`` in the container. This is because the image shifts an editable - installation at ``/realhf``. When the user code overwrites this path, the change of user - code will take effect without re-installing this ``realhf`` PyPI package. +After pulling the Docker images, run your Docker container locally on a GPU node with the following command: - It's also okay to mount to another location and re-install the package in the container. +.. code-block:: console + + $ docker run -it --gpus all garrett4wade/real-gpu:23.10-py3-0.1.0 bash + +The source code is available at /realhf inside the container. This is an editable installation, so you can modify the code or run experiments directly. -To build the images from scratch, run: +If you want to develop the code outside a Docker container, +remember to rerun the editable installation command after mounting: .. code-block:: console - $ docker build --target=cpu -t real-cpu . - $ docker build --target=gpu -t real-gpu . + $ pip install -e /your/mounted/code/path --no-build-isolation + Install From PyPI or Source ---------------------------- -If you don't want to use docker, you can also install ReaL from PyPI -or from source. +If you prefer not to use Docker, you can also install ReaL from PyPI or from the source. -Install from PyPI: +.. note:: -.. code-block:: console + We don't upload a pre-built wheel to PyPI, so the installation will require compiling the C++ and CUDA extensions. 
If CUDA is not available on your machine, only the C++ extension will be installed. - $ pip install realhf --no-build-isolation +Install from PyPI: -.. note:: +.. code-block:: console - Installing from the PyPI wheel still requires the user to clone the - source code to launch experiments. + $ python3 -m pip install realhf --no-build-isolation -Install from source: +The PyPI package allows you to launch existing experiments with the quickstart command. If you want to modify the code, you should clone the source code and install it from the source: .. code-block:: console - $ $ git clone https://github.com/openpsi-project/ReaLHF + $ git clone https://github.com/openpsi-project/ReaLHF $ cd ReaLHF - $ pip install -e . --no-build-isolation - -.. note:: + $ python3 -m pip install -e . --no-build-isolation - In an environment without CUDA, ReaL will only - install necessary Python modules for launching distributed experiments. - That's why we have two different docker images for - launching and deploying ReaL. +Next, check :doc:`quickstart` for instructions on running experiments. diff --git a/_sources/intro.rst.txt b/_sources/intro.rst.txt index 8a94e511..2988f5c2 100644 --- a/_sources/intro.rst.txt +++ b/_sources/intro.rst.txt @@ -1,13 +1,6 @@ Introduction ---------------- -ReaL introduces a novel technique called *Parameter Reallocation* -(the name *ReaL* is the abbreviation for *ReaLlocation*), which dynamically -shifts model parameters and changes the parallelization strategy during training. -This technique can significantly reduce the communication overhead and improve -GPU utilization in RLHF training, leading to a substantial speedup over the state-of-the-art -open-source systems. - We observe two major limitations based on our profiling of the previous RLHF systems, as shown in the :ref:`timeline`. 
@@ -39,8 +32,8 @@ The key idea of ReaL is to enable dynamic **reallocation of model parameters** between GPUs to improve the efficiency of the entire RLHF training process. By first choosing a parallelization strategy tailored for -each model function call -(e.g., use pipelining for Generation, while tensor parallelism for Training) +each computation workload +(e.g., pipelining for Generation and tensor parallelism for Training) and then executing these calls concurrently with a smaller parallelization degree (e.g., Actor and Critic in Training), we can eliminate redundant communication while maximizing GPU utilization, @@ -51,6 +44,8 @@ prior solutions. We show throughput comparison with the state-of-the-art open-source systems in the following figure. +(In the following figure, as the number of GPUs increases, the model size scales up from LLaMA 7B, LLaMA 13B, and CodeLLaMA 34B, to the largest LLaMA 70B.) + .. image:: images/vws.svg .. "Scale Actor" maintains the sizes diff --git a/_sources/quickstart.rst.txt b/_sources/quickstart.rst.txt index 5489d81e..833e12bb 100644 --- a/_sources/quickstart.rst.txt +++ b/_sources/quickstart.rst.txt @@ -10,6 +10,7 @@ First, clone the ReaL repository from GitHub: $ git clone https://github.com/openpsi-project/ReaLHF $ cd ReaLHF + $ pip3 install -e . --no-build-isolation RLHF with 4x LLaMA-7B in 30min ------------------------------------------------ @@ -170,7 +171,7 @@ Run the following command to train the reward model: dataset.train_bs_n_seqs=512 \ dataset.valid_bs_n_seqs=512 -It's common practice to use the SFT model to initialize the reward model. +It's a common practice to use the SFT model to initialize the reward model. Therefore, we can pass the path of the saved SFT model as the ``model.path`` option. Using the pre-trained LLaMA checkpoint is also feasible, but it may not perform as well. 
@@ -325,7 +326,9 @@ Each GPU can accommodate parameter shards of multiple models (e.g., both the Act Between two function calls upon the same model, ReaL will automatically re-allocate model parameters between source and destination locations and properly remap parallel strategies. + .. The reallocation also includes GPU-to-CPU reallocation, referred to as *offloading*. + This technique can substantially reduce communication overhead caused by parallelization and improve GPU utilization. Please check :doc:`intro` for more details. diff --git a/arch.html b/arch.html index 16ad9481..ac699a77 100644 --- a/arch.html +++ b/arch.html @@ -30,8 +30,6 @@ - -