diff --git a/_sources/contributing.rst.txt b/_sources/contributing.rst.txt
index 92323658..9547bba2 100644
--- a/_sources/contributing.rst.txt
+++ b/_sources/contributing.rst.txt
@@ -1,18 +1,20 @@
 Contributing
 ###############
 
-This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
-and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
-PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
-advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
+.. This repository is developed and maintained by `Wei Fu <garrett4wade.github.io>`_
+.. and `Zhiyu Mei <https://openreview.net/profile?id=~Zhiyu_Mei1>`_, both of whom are
+.. PhD students at `IIIS, Tsinghua University <https://iiis.tsinghua.edu.cn/en/>`_
+.. advised by Professor `Yi Wu <https://jxwuyi.weebly.com/>`_.
 
-We acknowledge that due to limited time and resources, 
-the quality of the documentation and code in this repository is not very high. 
-As a result, it can be quite challenging for potential developers to 
-read the code and contribute new features. 
-If you wish to contribute to this repository and have any questions about the code, 
-please do not hesitate to contact us. 
+.. We acknowledge that due to limited time and resources, 
+.. the quality of the documentation and code in this repository is not very high. 
+.. As a result, it can be quite challenging for potential developers to 
+.. read the code and contribute new features. 
+
+If you wish to contribute to this repository or have any questions about the code, 
+please do not hesitate to raise issues or contact us directly. 
 We will do our best to assist you. 
+Currently, there is no template for issues or pull requests.
 
 We hope the open-source community can help improve this repository 
-and enable the RLHF technology to truly empower the applications of LLM.
+and enable RLHF technology to truly empower the applications of LLMs.
diff --git a/_sources/customization.rst.txt b/_sources/customization.rst.txt
index 669a652b..ceff3661 100644
--- a/_sources/customization.rst.txt
+++ b/_sources/customization.rst.txt
@@ -2,62 +2,74 @@
 Customization
 ################
 
-
 Customizing Datasets
 -----------------------------------
 
 Overview
 ~~~~~~~~~~
 
-We provide three types of datasets implementation in ``realhf/impl/dataset/``,
-with corresponding configurations
+We provide three types of dataset implementations in ``realhf/impl/dataset/`` with the following configurations:
 
 - :class:`realhf.PromptAnswerDatasetConfig`
 - :class:`realhf.PairedComparisonDatasetConfig`
-- :class:`realhf.PromptOnlyDatasetConfig`.
+- :class:`realhf.PromptOnlyDatasetConfig`
 
-Please check the corresponding configurations for more details
-about how to use or change these implemented datasets.
+Please refer to the respective configuration documentation for detailed instructions on how to use or modify these datasets.
 
-Datasets in ReaL are the commonly used
+Datasets in ReaL are standard
 `PyTorch map-style datasets <https://pytorch.org/docs/stable/data.html#map-style-datasets>`_.
-Users are required to implement a ``__getitem__`` method in the dataset class,
-which returns an :class:`realhf.NamedArray` object containing the data of a single sample and its sequence length.
-The sequence length is required because ReaL uses variable-length inputs without padding to save GPU memory.
+Users need to implement a ``__getitem__`` method in the dataset class,
+which returns a :class:`realhf.NamedArray` object containing the data of a single sample and its sequence length.
+The sequence length is necessary because ReaL uses variable-length inputs without padding to save GPU memory.
 
-How dataset configuration is parsed
+How Dataset Configuration Is Parsed
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We take the SFT experiment as an example.
-The :class:`realhf.PromptAnswerDatasetConfig` class will be converted to a dataset config
-under the system API, i.e., ``realhf.api.core.system_api.Dataset``.
-Please check the ``datasets`` method of :class:`realhf.SFTConfig` for more details.
-This object has a dataset name (in this case, "prompt_answer") and corresponding arguments
-that are passed to the dataset class's constructor.
+We will use the SFT experiment as an example.
+
+The :class:`realhf.PromptAnswerDatasetConfig` object will be converted to a dataset configuration
+under the system API, specifically ``realhf.api.core.system_api.Dataset``.
+Refer to the ``datasets`` method of :class:`realhf.SFTConfig` for more details.
+This object includes a dataset name (in this case, "prompt_answer") and corresponding arguments
+that are passed to the dataset class's constructor:
+
+.. code-block:: python
+
+    @property
+    def datasets(self):
+        return [
+            Dataset(
+                "prompt_answer",
+                args=dict(
+                    max_length=self.dataset.max_seqlen,
+                    dataset_path=self.dataset.train_path,
+                ),
+            )
+        ]
 
-At the end of ``realhf.impl.dataset.prompt_answer_dataset``, we can see a line:
+At the end of ``realhf.impl.dataset.prompt_answer_dataset``, we find the following line:
 
 .. code-block:: python
 
     data_api.register_dataset("prompt_answer", PromptAnswerDataset)
 
-This line properly registers the dataset class with the system API, so that when this name
-is given to system API, ReaL can find this dataset implementation and construct it.
+This line registers the dataset class with the system API. When this name is provided to the system API,
+ReaL can locate this dataset implementation and construct it.
 The ``args`` field in ``realhf.api.core.system_api.Dataset`` will be passed to the ``__init__``
-method of the dataset class, except that ReaL preserves a ``util`` field to store some utility objects.
+method of the dataset class, except that ReaL reserves a ``util`` field to store some utility objects.
 
-Steps for implementing a new dataset
+Steps for Implementing a New Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-- Create a new dataset file under ``realhf/impl/dataset/``.
+1. Create a new dataset file under ``realhf/impl/dataset/``.
 
-- Implement a map-style PyTorch dataset class with a ``__getitem__`` method. This method returns an :class:`realhf.NamedArray` object containing the sequence length as metadata.
+2. Implement a map-style PyTorch dataset class with a ``__getitem__`` method. This method should return a :class:`realhf.NamedArray` object containing the sequence length as metadata.
 
-- Register the class with ``data_api.register_dataset`` at the end of this file, with the name "my-dataset".
+3. Register the class with ``data_api.register_dataset`` at the end of the file, using the name "my-dataset".
 
-- Change the name of the used dataset in experiment configurations, e.g., in the ``datasets`` method of ``realhf.SFTConfig``, to "my-dataset".
+4. Update the name of the dataset in experiment configurations, for example, in the ``datasets`` method of ``realhf.SFTConfig``, to "my-dataset".
 
-- If you would like to pass in more arguments to construct the dataset class, change the quickstart configuration class (in this case, ``realhf.PromptAnswerDatasetConfig``) as well as the ``args`` field in the system API dataset object.
+5. If you need to pass additional arguments to construct the dataset class, modify the quickstart configuration class (in this case, ``realhf.PromptAnswerDatasetConfig``) as well as the ``args`` field in the system API dataset object.
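
The steps above follow a register-then-construct-by-name pattern, which can be sketched as below. This is a minimal illustration only: ``register_dataset`` and ``make_dataset`` here are hypothetical stand-ins written for this sketch, not ReaL's ``data_api``, and a plain dict stands in for :class:`realhf.NamedArray`. In a real implementation the class would typically subclass ``torch.utils.data.Dataset``.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the registry behind ``data_api.register_dataset``.
_DATASET_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_dataset(name: str, cls: Callable[..., Any]) -> None:
    """Associate a dataset name with the class that implements it."""
    if name in _DATASET_REGISTRY:
        raise ValueError(f"Dataset '{name}' is already registered.")
    _DATASET_REGISTRY[name] = cls


def make_dataset(name: str, args: Dict[str, Any]) -> Any:
    """Look up a dataset by name and pass ``args`` to its constructor."""
    return _DATASET_REGISTRY[name](**args)


class MyDataset:
    """A map-style dataset: it only needs ``__len__`` and ``__getitem__``."""

    def __init__(self, samples: List[List[int]], max_length: int):
        # Truncate every token sequence to at most ``max_length`` tokens.
        self.samples = [s[:max_length] for s in samples]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        tokens = self.samples[idx]
        # The sequence length travels with the sample because inputs are
        # variable-length and unpadded. A dict stands in for NamedArray here.
        return {"input_ids": tokens, "seqlen": len(tokens)}


# Step 3: register the class under the name used in the configuration.
register_dataset("my-dataset", MyDataset)

# Step 4: the configuration only carries the name and constructor arguments.
dataset = make_dataset(
    "my-dataset",
    dict(samples=[[1, 2, 3], [4, 5, 6, 7, 8]], max_length=4),
)
```

Because the experiment configuration refers to the dataset only by name, swapping implementations does not require touching the code that constructs it.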
 
 
 Customizing Models
@@ -66,36 +78,38 @@ Customizing Models
 Overview
 ~~~~~~~~~~
 
-For efficiency reasons, ReaL does not support every transformer model from the HuggingFace model hub.
-In ReaL, we implement a :class:`realhf.impl.model.nn.real_llm_api.ReaLModel` class that wraps the HuggingFace model and provides
-additional offload and parameter reallocation APIs.
+For efficiency reasons, ReaL does not support every transformer
+model from the HuggingFace model hub.
+In ReaL, we implement the :class:`realhf.impl.model.nn.real_llm_api.ReaLModel`
+class that wraps the HuggingFace model and provides micro-batched pipelining,
+offload, and parameter reallocation functionalities.
 
+There are helper functions in the model API used to convert HuggingFace models back and forth,
+such as ``from_llama`` and ``to_llama``. 
+These helper functions are generated automatically by registering conversion functions in the ``api/from_hf/`` folder.
 
-Note that there are some helper functions in the model API that are used to convert HuggingFace models back-and-forth,
-e.g., ``from_llama``, ``config_to_codellama``, etc.
-These helper functions are generated *automatically* by registering converting functions in the
-``api/from_hf/`` folder.
+For example, consider ``api/from_hf/llama.py``.
+To register a convertible HuggingFace model, the user should implement:
 
-We take ``api/from_hf/llama.py`` as an example.
-To register a convertable HuggingFace model, the user should implement\:
-
-- Two functions that convert model configs between HuggingFace and :class:`realhf.ReaLModelConfig`.
-- Two functions that convert model state dicts between HuggingFace and ReaL, basically key remap.
+- Two functions to convert model configs between HuggingFace and :class:`realhf.ReaLModelConfig`.
+- Two functions to convert model state dicts between HuggingFace and ReaL, primarily involving key remapping.
 - Three functions specifying the names of parameters in the embedding layer, transformer blocks, and the output layer, respectively.
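
To make "key remapping" concrete, the pair of state-dict conversion functions can be sketched as follows. All key names in the mapping are hypothetical and chosen only for illustration; the real names and any tensor reshaping live in ``api/from_hf/llama.py``.

```python
from typing import Any, Dict

# Hypothetical key names on both sides; the actual names are defined in the
# conversion files and are not reproduced here.
HF_TO_REAL_KEYS: Dict[str, str] = {
    "model.embed_tokens.weight": "embedding.weight",
    "model.norm.weight": "final_norm.weight",
    "lm_head.weight": "head.weight",
}


def invert_mapping(mapping: Dict[str, str]) -> Dict[str, str]:
    """Build the reverse mapping, refusing mappings that are not one-to-one."""
    inverse = {dst: src for src, dst in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("Key mapping is not one-to-one; it cannot be inverted.")
    return inverse


def remap_keys(state_dict: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return a new state dict with every key renamed according to ``mapping``."""
    unknown = [k for k in state_dict if k not in mapping]
    if unknown:
        raise KeyError(f"No mapping for keys: {unknown}")
    return {mapping[k]: v for k, v in state_dict.items()}


def sd_from_hf(hf_state_dict: Dict[str, Any]) -> Dict[str, Any]:
    return remap_keys(hf_state_dict, HF_TO_REAL_KEYS)


def sd_to_hf(real_state_dict: Dict[str, Any]) -> Dict[str, Any]:
    return remap_keys(real_state_dict, invert_mapping(HF_TO_REAL_KEYS))
```

A round trip through both functions should return the original keys, which is a cheap consistency check to run before the heavier tests in ``tests/``.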
 
-Steps to support a new HuggingFace model
+Steps to Support a New HuggingFace Model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 - Create a new model file under ``api/from_hf/``.
 - Implement the required helper functions as described above.
-- Register the model with ``register_hf_family`` at the end of this file.
+- Register the model with ``register_hf_family`` at the end of the file.
 - (Optional) Test the consistency of the implemented model with scripts in ``tests/``.
 
-We acknowledge that the current config and implementation of ``ReaLModel`` does not support
-all the features of HuggingFace models, e.g., MoE, shared embeddings, etc.
-As a result, supporting a new HF model usually requires to modify files in ``impl/model/nn/``,
-which can be a terrible experience to users that are not familar with the code architecture.
-If you have any questions or want to request a new model feature,
+We acknowledge that the current configuration and implementation of ``ReaLModel``
+do not support all features of HuggingFace models,
+such as MoE and shared embeddings.
+As a result, supporting a new HuggingFace model
+often requires modifications to files in ``impl/model/nn/``,
+which can be a challenging experience for users unfamiliar with the code architecture.
+If you have any questions or wish to request a new model feature,
 please feel free to raise an issue on our GitHub repository.
 
 
@@ -105,22 +119,23 @@ Customizing Algorithms
 Overview
 ~~~~~~~~~~~
 
-Algorithms in ReaL are represented as dataflow graphs.
-Each node in the graph is a model function call (MFC), which is one of
-the generate, inference, or train requests applied to a specific model (e.g., Actor or Critic).
-Edges in the graph denote the data or parameter version dependencies
-between nodes.
+In ReaL, algorithms are represented as dataflow graphs.
+Each node in the graph corresponds to a model function call (MFC),
+which can be a generate, inference, or train request applied to a
+specific model (e.g., Actor or Critic).
+The edges in the graph indicate data or
+parameter version dependencies between nodes.
 
-We show the dataflow graph of PPO in the following figure:
+The following figure illustrates the dataflow graph of PPO:
 
 .. image:: images/rlhf_dfg.svg
     :alt: Dataflow graph of RLHF.
     :align: center
 
 A node is represented by a :class:`realhf.MFCDef` object.
-We can see that the node has a ``model_name`` field and a ``interface_type`` field,
-which specifies what this node should conceptually do during exection.
-The ``interface_impl`` field specifies an actual implementation of the model interface.
+Each node has a ``model_name`` field and an ``interface_type``
+field, which specify what the node should conceptually do during execution.
+The ``interface_impl`` field specifies the actual implementation of the model interface.
 
 The interface class has the following signature:
 
@@ -128,40 +143,31 @@ The interface class has the following signature:
     :members:
     :undoc-members:
 
-During the execution of an MFC node, the model with ``model_name`` will be passed
-into this interface object together with the data specified in the MFC node.
+During the execution of an MFC node,
+the model identified by ``model_name`` is passed into this interface object,
+along with the data specified in the MFC node.
 
 .. note::
-    Similar to datasets, model interfaces are also registered and constructed by the system API.
-    Please check ``impl/model/interface/sft_interface.py`` for an example.
-    The ``SFTInterface`` is registered at the end of this file and constructed by :class:`realhf.SFTConfig`
-    (see the ``rpcs`` method).
-
-Running algorithms in ReaL is exactly running a large dataflow graph that
-concatenates all the training iterations.
-The *MasterWorker* monitors the running state of this graph and issues MFC requests
-to *ModelWorkers* once the dependencies are satisfied.
-For more details about the code architecture, please refer to the :doc:`arch` page.
-
-.. To implement a new algorithm in ReaL,
-.. the user should first figure out whether the new dataflow can be unified into
-.. existing dataflow graphs (i.e., SFT/RW, DPO, PPO), as defined in the ``experiments/common/`` folder.
-.. If the new algorithm has a completely new dataflow, the user should modify
-.. the experiment configuration class.
-.. Otherwise, e.g., if the user just want to add an additional loss term to the existing algorithm,
-.. the user can just modify the interface implementation in the ``impl/model/interface`` folder.
-
-Example 1: Replace the interface
+
+    Similar to datasets, model interfaces are registered and constructed by the system API. Please check ``impl/model/interface/sft_interface.py`` for an example. The ``SFTInterface`` is registered at the end of this file and constructed by :class:`realhf.SFTConfig` (see the ``rpcs`` method).
+
+Running algorithms in ReaL involves executing a large dataflow
+graph that concatenates all the training iterations.
+The *MasterWorker* monitors the state of this graph and
+issues MFC requests to *ModelWorkers* once the dependencies are satisfied.
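
The dependency-driven issuing described above can be illustrated with a small sketch. The node names and edges below are assumptions made for this example (a PPO-like graph), and the lock-step "waves" are a simplification: this is not the MasterWorker's actual scheduling code, which issues requests asynchronously.

```python
from typing import Dict, List, Set

# Hypothetical PPO-like dataflow graph: each node maps to the set of nodes
# whose outputs it depends on.
PPO_DEPS: Dict[str, Set[str]] = {
    "actor_gen": set(),
    "rew_inf": {"actor_gen"},
    "ref_inf": {"actor_gen"},
    "critic_inf": {"actor_gen"},
    "actor_train": {"rew_inf", "ref_inf", "critic_inf"},
    "critic_train": {"rew_inf", "ref_inf", "critic_inf"},
}


def issue_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; a node is issued once all its dependencies are done."""
    done: Set[str] = set()
    waves: List[List[str]] = []
    while len(done) < len(deps):
        # A node is ready when it has not run yet and all dependencies finished.
        ready = sorted(
            name for name, needs in deps.items()
            if name not in done and needs <= done
        )
        if not ready:
            raise ValueError("Dependency cycle detected in the dataflow graph.")
        waves.append(ready)
        done.update(ready)
    return waves


waves = issue_waves(PPO_DEPS)
```

Nodes that land in the same wave have no dependencies on each other, so they are candidates for concurrent execution.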
+
+
+.. For more details about the code architecture, please refer to the :doc:`arch` page.
+
+Example: A Customized Reward Function for PPO
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Say, in the PPO experiment, we want to use a customized reward model
-from HuggingFace. How should we do if this model is not supported by ``ReaLModel``?
 
-We provide the example code in ``examples/ppo_sentiment.py``, where we replace the
-trained reward model for sentiment generation with a BERT-like sentiment analysis model
-from HuggingFace.
+In this example, we demonstrate how to use a customized reward model from HuggingFace in a PPO experiment when the model is not supported by ``ReaLModel``.
 
-First, we should implement a new model interface class for our customized usage:
+The example code can be found in ``examples/ppo_sentiment.py``, where we replace the trained reward model for sentiment generation with a BERT-like sentiment analysis model from HuggingFace.
+
+First, we need to implement a new model interface class for our customized use:
 
 .. code-block:: python
 
@@ -182,8 +188,7 @@ First, we should implement a new model interface class for our customized usage:
 
         @torch.no_grad()
         def inference(self, model: model_api.Model, data: NamedArray) -> NamedArray:
-            ...
-            # Re-tokenize.
+            # Re-tokenize the texts.
             texts = model.tokenizer.batch_decode(
                 input_ids, skip_special_tokens=True
             )
@@ -191,31 +196,29 @@ First, we should implement a new model interface class for our customized usage:
                 texts, return_tensors="pt", padding=True, truncation=True
             )
 
-            # Inference to get the score.
-            # For IMDB, 0 is negative and 1 is positive. We record the logits of positive.
+            # Perform inference to get the score.
+            # For IMDB, 0 is negative and 1 is positive. We record the logits of the positive class.
             scores = self.score_model(
                 input_ids=encoding["input_ids"].cuda(),
                 attention_mask=encoding["attention_mask"].cuda(),
             ).logits[..., -1].contiguous().float()
-            scores = logits[..., -1].contiguous().float()
 
             res = NamedArray(scores=scores)
             res.register_metadata(**data.metadata)
             return res
 
-Here are two key points in this code:
+Key points in this code:
 
 - During interface initialization, we load a HuggingFace model and its tokenizer.
 - During inference, we re-tokenize the generated output from the Actor, compute the score, and return it.
 
-That's easy, right? Now we should register this interface in the system API:
+Now, we need to register this interface in the system API:
 
 .. code-block:: python
 
     model_api.register_interface("sentiment_scoring", SentimentScoringInterface)
 
-Then, to use our customized interface implementation in PPO, we should change
-the ``interface_impl`` field of the reward model in the MFC nodes of PPO:
+To use our customized interface implementation in PPO, we need to change the ``interface_impl`` field of the reward model in the MFC nodes of PPO:
 
 .. code-block:: python
 
@@ -226,7 +229,7 @@ the ``interface_impl`` field of the reward model in the MFC nodes of PPO:
             for mw in cfg.model_worker:
                 for s in mw.shards:
                     if s.id.model_name.role == "reward":
-                        # Remove the original reward model because we use the customized one.
+                        # Remove the original reward model because we are using a customized one.
                         s.model = config_api.Model(
                             "tokenizer",
                             args=dict(
@@ -245,14 +248,13 @@ the ``interface_impl`` field of the reward model in the MFC nodes of PPO:
             inf_reward_rpc.post_hooks = []
             return cfg
 
-Don't forget the register your customized experiment configuration
-such that ReaL can launch it with the quickstart command line options:
+Don't forget to register your customized experiment configuration so that ReaL can launch it with the quickstart command line options:
 
 .. code-block:: python
 
     register_quickstart_exp("my-ppo", MyPPOConfig)
 
-Done! Let's run the customized experiment with the quickstart command:
+Finally, let's run the customized experiment with the quickstart command:
 
 .. code-block:: console
 
@@ -264,10 +266,10 @@ Done! Let's run the customized experiment with the quickstart command:
         ppo.top_p=0.9 ppo.top_k=1000 \
         ...
 
-This example also applies for scenarios when you want to use an external reward,
-like the signal from compiler or other online automatic evaluations.
+This example also applies to scenarios where you want to use an external reward, such as a signal from a compiler or another online automatic evaluation.
 
-Example 2: Develop a new dataflow
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-TODO
\ No newline at end of file
+.. Example 2: Develop a new dataflow
+.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. TODO
\ No newline at end of file
diff --git a/_sources/distributed.rst.txt b/_sources/distributed.rst.txt
index 99ab61e9..27e43a3e 100644
--- a/_sources/distributed.rst.txt
+++ b/_sources/distributed.rst.txt
@@ -1,35 +1,35 @@
 Set Up Distributed Experiments
 ==================================
 
-Currently, ReaL supports launching distrbited experiments using 
+Currently, ReaL supports launching distributed experiments using
 `SLURM <https://slurm.schedmd.com/documentation.html>`_
 with the `Pyxis <https://github.com/NVIDIA/pyxis>`_ plugin.
 This plugin allows for launching enroot containers with the
 ``srun`` command.
 
-To set up distributed experiments, you should write a JSON
-cluster configuration as the example in ``examples/cluster_config.json``.
+To set up distributed experiments, you need to create a JSON
+cluster configuration file, as shown in ``examples/cluster_config.json``.
 
-- ``cluster_type``: The type of cluster. Currently, only "slurm" is supported.
+- ``cluster_type``: The type of the cluster. Currently, only "slurm" is supported.
 - ``cluster_name``: The name of the cluster. Arbitrary.
-- ``fileroot``: An NFS path that all nodes can access. This is where the log and checkpoints will be stored.
-- ``default_mount``: Comma separated list of paths to mount on all nodes. This should include the above ``fileroot``.
-- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Any host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers low-end GPUs in the cluster.
-- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. GPU type is used by SLURM.
-- ``cpu_image``: The docker image of the controller and the master worker.
-- ``gpu_image``: The docker image of the model worker.
-- ``node_name_prefix``: The prefix of the host names. We assume host names in the cluster is prefixed by a string followed by some integer, e.g., "com-01", where "com-" is the prefix.
+- ``fileroot``: An NFS path accessible by all nodes. This is where logs and checkpoints will be stored.
+- ``default_mount``: A comma-separated list of paths to mount on all nodes. This should include the ``fileroot`` mentioned above.
+- ``node_type_from_node_name``: A dictionary mapping a regular expression to a node type. Every host in this cluster should match one of these regular expressions. Node types include ["g1", "g2", "g8", "a100"]. "g" refers to low-end GPUs in the cluster.
+- ``gpu_type_from_node_name``: A dictionary mapping a regular expression to a GPU type. The GPU type is used by SLURM.
+- ``cpu_image``: The Docker image for the controller and the master worker.
+- ``gpu_image``: The Docker image for the model worker.
+- ``node_name_prefix``: The prefix of the host names. We assume that host names in the cluster are prefixed by a string followed by an integer, e.g., "com-01", where "com-" is the prefix.
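
As a rough illustration of how the fields above fit together, the sketch below builds a configuration in Python and checks it before serializing to JSON. Every value is a hypothetical placeholder (paths, host name pattern, GPU type), and matching host names with ``re.match`` is an assumption of this sketch; consult ``examples/cluster_config.json`` for the authoritative layout.

```python
import json
import re
from typing import Any, Dict

# All values below are placeholders for illustration only.
cluster_config: Dict[str, Any] = {
    "cluster_type": "slurm",
    "cluster_name": "my-cluster",
    "fileroot": "/mnt/nfs/realhf",
    "default_mount": "/mnt/nfs/realhf,/data",
    "node_type_from_node_name": {r"com-\d+": "a100"},
    "gpu_type_from_node_name": {r"com-\d+": "tesla"},
    "cpu_image": "docker.io/garrett4wade/real-cpu:22.04-0.1.0",
    "gpu_image": "docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0",
    "node_name_prefix": "com-",
}


def node_type_of(hostname: str, config: Dict[str, Any]) -> str:
    """Return the node type of the first pattern that matches ``hostname``."""
    for pattern, node_type in config["node_type_from_node_name"].items():
        if re.match(pattern, hostname):
            return node_type
    raise ValueError(f"Host '{hostname}' matches no node type pattern.")


def check_config(config: Dict[str, Any]) -> None:
    """Sanity-check two constraints stated in the field descriptions."""
    if config["cluster_type"] != "slurm":
        raise ValueError("Only 'slurm' is currently supported.")
    if config["fileroot"] not in config["default_mount"].split(","):
        raise ValueError("'default_mount' should include 'fileroot'.")


check_config(cluster_config)
serialized = json.dumps(cluster_config, indent=4)
```

Running such a check before submitting a job catches a missing mount or an unmatched host name early, rather than after the SLURM allocation has started.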
 
 The path of this file should be specified in the ``CLUSTER_SPEC_PATH`` environment variable
-inside the used docker images and when launching the experiment. For example,
+inside the Docker images used and when launching the experiment. For example:
 
 .. code-block:: console
 
     CLUSTER_SPEC_PATH=/tmp/my-cluster.json python3 -m realhf.apps.quickstart ppo ...
 
-You also need to add an additional layer in the docker images like the following:
+You also need to add an additional layer in the Docker images as shown below:
 
 .. code-block:: dockerfile
 
-    FROM docker.io/garrett4wade/real-cpu
+    FROM garrett4wade/real-cpu:22.04-0.1.0
     ENV CLUSTER_SPEC_PATH=/tmp/my-cluster.json
\ No newline at end of file
diff --git a/_sources/expconfig.rst.txt b/_sources/expconfig.rst.txt
index 8aa7f16a..e0b6084d 100644
--- a/_sources/expconfig.rst.txt
+++ b/_sources/expconfig.rst.txt
@@ -3,7 +3,7 @@ Configurations
 
 We illustrate configurations for quickstart experiments in this page.
 Each type of experiment (e.g., SFT, PPO) corresponds to a specific 
-configuration class (e.g., :class:`realhf.SFTConfig` for SFT).
+configuration object (e.g., :class:`realhf.SFTConfig` for SFT).
 
 Since ReaL uses `Hydra <https://hydra.cc/>`_ for configuration management,
 users can override these options provided by the class recursively
@@ -57,7 +57,7 @@ Dataset Configurations
 ``NamedArray``
 -----------------------
 
-``NamedArray``` is an object we use in model function calls.
+``NamedArray`` is an object we use in model function calls.
 It is inherited from the previous SRL project.
 
 Named array extends plain arrays/tensors in the following ways.
@@ -65,7 +65,7 @@ Named array extends plain arrays/tensors in the following ways.
 1. NamedArray aggregates multiple arrays, possibly of different shapes.
 2. Each array is given a name, providing a user-friendly way of indexing to the corresponding data.
 3. NamedArrays can be nested. (Although it should *not* be nested in this system.)
-4. NamedArray can store metadata such as sequence length, which is useful for padding and masking without causing CUDA synchronization.
+4. NamedArray can store metadata such as sequence lengths, which is useful for padding and masking without causing CUDA synchronization.
 
 Users can regard it as a nested dictionary of arrays, except that indexing a ``NamedArray`` results in *slicing every hosted arrays* (again, we don't use this feature in this project).
 
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
index 9f604707..b60a7817 100644
--- a/_sources/index.rst.txt
+++ b/_sources/index.rst.txt
@@ -6,39 +6,6 @@
 Welcome to ReaL's documentation!
 ====================================
 
-Highlights of ReaL
------------
-
-**Super-Efficient**
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-ReaL introduces a novel *parameter reallocation* technique. It dynamically shifts parameters and 
-adjusts parallel strategies of LLMs during training. This technique significantly reduces communication 
-overhead and improves GPU utilization for RLHF.
-
-Combined with advanced techniques for LLM training, such as 3D parallelism, ZeRO optimization, and offloading, 
-ReaL can scale RLHF training to hundreds or thousands of GPUs, maintaining high throughput and efficiency.
-
-Beyond large-scale training, ReaL is also memory-efficient with limited resources. For example, ReaL can 
-train 70B LLMs with offloading on a single node.
-
-For more details, check our `introduction page <intro>`_.
-
-**Easy to use**
-~~~~~~~~~~~~~~~~~~~~~~~
-
-Install with PyPI or use our Docker image, then run your experiment with a single command!
-
-Check our `quickstart guide <quickstart>`_ for more details.
-
-**Flexible**
-~~~~~~~~~~~~~~~~~~~~~~~
-
-ReaL's system implementations are fully decoupled from algorithm interfaces. Achieve optimal performance 
-for your customized application within 100 lines of code!
-
-Please refer to our `customization guide <customization>`_ for more details.
-
 Contents
 ----------------
 
@@ -47,10 +14,11 @@ Contents
 
    intro
    install
-   quickstart
    expconfig
+   quickstart
    customization
-   arch
+   .. arch
+
    distributed
    contributing
 
diff --git a/_sources/install.rst.txt b/_sources/install.rst.txt
index 4680d8be..0b7867d5 100644
--- a/_sources/install.rst.txt
+++ b/_sources/install.rst.txt
@@ -4,62 +4,56 @@ Installation
 Docker Images
 --------------
 
-The easiest way to run ReaL is to use the provided Docker images.
-We provide a CPU-only image to launch experiments and a runtime GPU
-image to be deployed in the cluster.
-The Dockerfile has been provided in the repository as well.
+The easiest way to run ReaL is by using the provided Docker images.
+We offer a CPU-only image for launching experiments and a runtime GPU
+image for deployment in a cluster. The Dockerfile is also available in the repository.
 
 To pull the images, run:
 
 .. code-block:: console
 
-   $ docker pull docker.io/garrett4wade/real-cpu
-   $ docker pull docker.io/garrett4wade/real-gpu
+   $ docker pull docker.io/garrett4wade/real-cpu:22.04-0.1.0
+   $ docker pull docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0
 
-.. warning::
+The CPU image is built from "ubuntu:22.04" and the GPU image is built from "nvcr.io/nvidia/pytorch:23.10-py3". The current package version is "0.1.0".
 
-   when using these docker images locally, the user should mount the user code directory
-   to path ``/realhf`` in the container. This is because the image shifts an editable
-   installation at ``/realhf``. When the user code overwrites this path, the change of user
-   code will take effect without re-installing this ``realhf`` PyPI package.
+After pulling the Docker images, run your Docker container locally on a GPU node with the following command:
 
-   It's also okay to mount to another location and re-install the package in the container.
+.. code-block:: console
+
+   $ docker run -it --gpus all garrett4wade/real-gpu:23.10-py3-0.1.0 bash
+
+The source code is available at ``/realhf`` inside the container. This is an editable installation, so you can modify the code or run experiments directly.
 
-To build the images from scratch, run:
+If you want to develop the code outside a Docker container,
+remember to rerun the editable installation command after mounting:
 
 .. code-block:: console
 
-   $ docker build --target=cpu -t real-cpu .
-   $ docker build --target=gpu -t real-gpu .
+   $ pip install -e /your/mounted/code/path --no-build-isolation
+
 
 Install From PyPI or Source
 ----------------------------
 
-If you don't want to use docker, you can also install ReaL from PyPI
-or from source.
+If you prefer not to use Docker, you can also install ReaL from PyPI or from source.
 
-Install from PyPI:
+.. note::
 
-.. code-block:: console
+   We don't upload a pre-built wheel to PyPI, so the installation will require compiling the C++ and CUDA extensions. If CUDA is not available on your machine, only the C++ extension will be installed.
 
-   $ pip install realhf --no-build-isolation
+Install from PyPI:
 
-.. note::
+.. code-block:: console
 
-   Installing from the PyPI wheel still requires the user to clone the
-   source code to launch experiments.
+   $ python3 -m pip install realhf --no-build-isolation
 
-Install from source:
+The PyPI package allows you to launch existing experiments with the quickstart command. If you want to modify the code, clone the repository and install it from source:
 
 .. code-block:: console
 
-   $ $ git clone https://github.com/openpsi-project/ReaLHF
+   $ git clone https://github.com/openpsi-project/ReaLHF
    $ cd ReaLHF
-   $ pip install -e . --no-build-isolation
-
-.. note::
+   $ python3 -m pip install -e . --no-build-isolation
 
-   In an environment without CUDA, ReaL will only
-   install necessary Python modules for launching distributed experiments.
-   That's why we have two different docker images for
-   launching and deploying ReaL.
+Next, check :doc:`quickstart` for instructions on running experiments.
diff --git a/_sources/intro.rst.txt b/_sources/intro.rst.txt
index 8a94e511..2988f5c2 100644
--- a/_sources/intro.rst.txt
+++ b/_sources/intro.rst.txt
@@ -1,13 +1,6 @@
 Introduction
 ----------------
 
-ReaL introduces a novel technique called *Parameter Reallocation*
-(the name *ReaL* is the abbreviation for *ReaLlocation*), which dynamically
-shifts model parameters and changes the parallelization strategy during training.
-This technique can significantly reduce the communication overhead and improve
-GPU utilization in RLHF training, leading to a substantial speedup over the state-of-the-art
-open-source systems.
-
 We observe two major limitations based on our profiling
 of the previous RLHF systems, as shown in the :ref:`timeline`.
 
@@ -39,8 +32,8 @@ The key idea of ReaL is to enable dynamic **reallocation of
 model parameters** between GPUs to improve the efficiency of
 the entire RLHF training process.
 By first choosing a parallelization strategy tailored for
-each model function call
-(e.g., use pipelining for Generation, while tensor parallelism for Training)
+each computation workload
+(e.g., pipelining for Generation and tensor parallelism for Training)
 and then executing these calls concurrently with a smaller
 parallelization degree (e.g., Actor and Critic in Training),
 we can eliminate redundant communication while maximizing GPU utilization,
@@ -51,6 +44,8 @@ prior solutions.
 We show throughput comparison with the state-of-the-art open-source systems
 in the following figure.
 
+(In the following figure, the model size grows with the number of GPUs, from LLaMA 7B through LLaMA 13B and CodeLLaMA 34B to LLaMA 70B.)
+
 .. image:: images/vws.svg
 
 .. "Scale Actor" maintains the sizes
diff --git a/_sources/quickstart.rst.txt b/_sources/quickstart.rst.txt
index 5489d81e..833e12bb 100644
--- a/_sources/quickstart.rst.txt
+++ b/_sources/quickstart.rst.txt
@@ -10,6 +10,7 @@ First, clone the ReaL repository from GitHub:
 
     $ git clone https://github.com/openpsi-project/ReaLHF
     $ cd ReaLHF
+    $ pip3 install -e . --no-build-isolation
 
 RLHF with 4x LLaMA-7B in 30min
 ------------------------------------------------
@@ -170,7 +171,7 @@ Run the following command to train the reward model:
         dataset.train_bs_n_seqs=512 \
         dataset.valid_bs_n_seqs=512
 
-It's common practice to use the SFT model to initialize the reward model.
+It's a common practice to use the SFT model to initialize the reward model.
 Therefore, we can pass the path of the saved SFT model as the ``model.path`` option.
 Using the pre-trained LLaMA checkpoint is also feasible, but it may not perform as well.
 
@@ -325,7 +326,9 @@ Each GPU can accommodate parameter shards of multiple models (e.g., both the Act
 Between two function calls upon the same model, ReaL will automatically re-allocate
 model parameters between source and destination locations and properly remap
 parallel strategies.
+
 .. The reallocation also includes GPU-to-CPU reallocation, referred to as *offloading*.
+
 This technique can substantially reduce communication overhead caused by parallelization
 and improve GPU utilization.
 Please check :doc:`intro` for more details.
diff --git a/arch.html b/arch.html
index 16ad9481..ac699a77 100644
--- a/arch.html
+++ b/arch.html
@@ -30,8 +30,6 @@
         <link rel="index" title="Index" href="genindex.html" />
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
-        <link rel="next" title="Set Up Distributed Experiments" href="distributed.html" />
-        <link rel="prev" title="Customization" href="customization.html" />
     <style>
       :root {
         --nftt-body-font-family: "Nunito", var(--nftt-font-sans-serif) !important;
@@ -170,13 +168,12 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <div class="mb-3 p-1 pt-3 pb-4 border-bottom">
       <input id="sidebar-filter" type="text" name="filter" class="form-control form-control-sm" placeholder="filter" aria-label="filter">
     </div>
-    <ul class="current">
+    <ul>
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1 current"><a class="current reference internal" href="#">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -206,30 +203,8 @@ <h1>Code Architecture<a class="headerlink" href="#code-architecture" title="Link
     </div>
 
     <footer class="nftt-footer">
-      <nav id="paginator" class="py-4" aria-label="Documentation navigation">
-    <div class="container">
-      <ul class="pagination justify-content-between mb-0"><li class="page-item">
-            <a href="customization.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Customization">
-              <span class="prev-page"><i class="bi bi-caret-left"></i></span>
-              <div class="d-flex flex-column">
-                <span class="text-small text-start text-muted">Previous</span>
-                <span class="underline">Customization</span>
-              </div>
-            </a>
-          </li>
-        <li class="page-item ms-auto">
-            <a href="distributed.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Set Up Distributed Experiments">
-              <div class="d-flex flex-column">
-                <span class="text-small text-end text-start text-muted">Next</span>
-                <span class="underline">Set Up Distributed Experiments</span>
-              </div>
-              <span class="next-page"><i class="bi bi-caret-right"></i></span>
-            </a>
-          </li>
-        
-      </ul>
-    </div>
-  </nav>
+      
+<nav id="paginator"></nav>
 
       <div class="py-5 px-4 px-md-3">
   <div class="container">
diff --git a/contributing.html b/contributing.html
index 97cd54e7..e661b2f3 100644
--- a/contributing.html
+++ b/contributing.html
@@ -172,10 +172,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Contributing</a></li>
 </ul>
@@ -195,19 +194,12 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
         <article id="content" class="nftt-content" role="main">
     <section id="contributing">
 <h1>Contributing<a class="headerlink" href="#contributing" title="Link to this heading">¶</a></h1>
-<p>This repository is developed and maintained by <a class="reference external" href="garrett4wade.github.io">Wei Fu</a>
-and <a class="reference external" href="https://openreview.net/profile?id=~Zhiyu_Mei1">Zhiyu Mei</a>, both of whom are
-PhD students at <a class="reference external" href="https://iiis.tsinghua.edu.cn/en/">IIIS, Tsinghua University</a>
-advised by Professor <a class="reference external" href="https://jxwuyi.weebly.com/">Yi Wu</a>.</p>
-<p>We acknowledge that due to limited time and resources,
-the quality of the documentation and code in this repository is not very high.
-As a result, it can be quite challenging for potential developers to
-read the code and contribute new features.
-If you wish to contribute to this repository and have any questions about the code,
-please do not hesitate to contact us.
-We will do our best to assist you.</p>
+<p>If you wish to contribute to this repository or have any questions about the code,
+please do not hesitate to raise issues or contact us directly.
+We will do our best to assist you.
+Currently, there is no template for issues or pull requests.</p>
 <p>We hope the open-source community can help improve this repository
-and enable the RLHF technology to truly empower the applications of LLM.</p>
+and enable RLHF technology to truly empower the applications of LLM.</p>
 </section>
 
 </article>
diff --git a/customization.html b/customization.html
index 70b2ab9c..bb0ec923 100644
--- a/customization.html
+++ b/customization.html
@@ -30,8 +30,8 @@
         <link rel="index" title="Index" href="genindex.html" />
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
-        <link rel="next" title="Code Architecture" href="arch.html" />
-        <link rel="prev" title="Configurations" href="expconfig.html" />
+        <link rel="next" title="Set Up Distributed Experiments" href="distributed.html" />
+        <link rel="prev" title="Quickstart" href="quickstart.html" />
     <style>
       :root {
         --nftt-body-font-family: "Nunito", var(--nftt-font-sans-serif) !important;
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -200,81 +199,95 @@ <h1>Customization<a class="headerlink" href="#customization" title="Link to this
 <h2>Customizing Datasets<a class="headerlink" href="#customizing-datasets" title="Link to this heading">¶</a></h2>
 <section id="overview">
 <h3>Overview<a class="headerlink" href="#overview" title="Link to this heading">¶</a></h3>
-<p>We provide three types of datasets implementation in <code class="docutils literal notranslate"><span class="pre">realhf/impl/dataset/</span></code>,
-with corresponding configurations</p>
+<p>We provide three types of dataset implementations in <code class="docutils literal notranslate"><span class="pre">realhf/impl/dataset/</span></code> with the following configurations:</p>
 <ul class="simple">
 <li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PromptAnswerDatasetConfig</span></code></p></li>
 <li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PairedComparisonDatasetConfig</span></code></p></li>
-<li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PromptOnlyDatasetConfig</span></code>.</p></li>
+<li><p><code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PromptOnlyDatasetConfig</span></code></p></li>
 </ul>
-<p>Please check the corresponding configurations for more details
-about how to use or change these implemented datasets.</p>
-<p>Datasets in ReaL are the commonly used
+<p>Please refer to the respective configuration documentation for detailed instructions on how to use or modify these datasets.</p>
+<p>Datasets in ReaL are standard
 <a class="reference external" href="https://pytorch.org/docs/stable/data.html#map-style-datasets">PyTorch map-style datasets</a>.
-Users are required to implement a <code class="docutils literal notranslate"><span class="pre">__getitem__</span></code> method in the dataset class,
-which returns an <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.NamedArray</span></code> object containing the data of a single sample and its sequence length.
-The sequence length is required because ReaL uses variable-length inputs without padding to save GPU memory.</p>
+Users need to implement a <code class="docutils literal notranslate"><span class="pre">__getitem__</span></code> method in the dataset class,
+which returns a <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.NamedArray</span></code> object containing the data of a single sample and its sequence length.
+The sequence length is necessary because ReaL uses variable-length inputs without padding to save GPU memory.</p>
 </section>
 <section id="how-dataset-configuration-is-parsed">
-<h3>How dataset configuration is parsed<a class="headerlink" href="#how-dataset-configuration-is-parsed" title="Link to this heading">¶</a></h3>
-<p>We take the SFT experiment as an example.
-The <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PromptAnswerDatasetConfig</span></code> class will be converted to a dataset config
-under the system API, i.e., <code class="docutils literal notranslate"><span class="pre">realhf.api.core.system_api.Dataset</span></code>.
-Please check the <code class="docutils literal notranslate"><span class="pre">datasets</span></code> method of <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code> for more details.
-This object has a dataset name (in this case, “prompt_answer”) and corresponding arguments
-that are passed to the dataset class’s constructor.</p>
-<p>At the end of <code class="docutils literal notranslate"><span class="pre">realhf.impl.dataset.prompt_answer_dataset</span></code>, we can see a line:</p>
+<h3>How Dataset Configuration is Parsed<a class="headerlink" href="#how-dataset-configuration-is-parsed" title="Link to this heading">¶</a></h3>
+<p>We will use the SFT experiment as an example.</p>
+<p>The <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.PromptAnswerDatasetConfig</span></code> object will be converted to a dataset configuration
+under the system API, specifically <code class="docutils literal notranslate"><span class="pre">realhf.api.core.system_api.Dataset</span></code>.
+Refer to the <code class="docutils literal notranslate"><span class="pre">datasets</span></code> method of <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code> for more details.
+This object includes a dataset name (in this case, “prompt_answer”) and corresponding arguments
+that are passed to the dataset class’s constructor:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@property</span>
+<span class="k">def</span> <span class="nf">datasets</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+    <span class="k">return</span> <span class="p">[</span>
+        <span class="n">Dataset</span><span class="p">(</span>
+            <span class="s2">&quot;prompt_answer&quot;</span><span class="p">,</span>
+            <span class="n">args</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
+                <span class="n">max_length</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">max_seqlen</span><span class="p">,</span>
+                <span class="n">dataset_path</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">dataset</span><span class="o">.</span><span class="n">train_path</span><span class="p">,</span>
+            <span class="p">),</span>
+        <span class="p">)</span>
+    <span class="p">]</span>
+</pre></div>
+</div>
+<p>At the end of <code class="docutils literal notranslate"><span class="pre">realhf.impl.dataset.prompt_answer_dataset</span></code>, we find the following line:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">data_api</span><span class="o">.</span><span class="n">register_dataset</span><span class="p">(</span><span class="s2">&quot;prompt_answer&quot;</span><span class="p">,</span> <span class="n">PromptAnswerDataset</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>This line properly registers the dataset class with the system API, so that when this name
-is given to system API, ReaL can find this dataset implementation and construct it.
+<p>This line registers the dataset class with the system API. When this name is provided to the system API,
+ReaL can locate this dataset implementation and construct it.
 The <code class="docutils literal notranslate"><span class="pre">args</span></code> field in <code class="docutils literal notranslate"><span class="pre">realhf.api.core.system_api.Dataset</span></code> will be passed to the <code class="docutils literal notranslate"><span class="pre">__init__</span></code>
-method of the dataset class, except that ReaL preserves a <code class="docutils literal notranslate"><span class="pre">util</span></code> field to store some utility objects.</p>
+method of the dataset class, except that ReaL reserves a <code class="docutils literal notranslate"><span class="pre">util</span></code> field to store some utility objects.</p>
 </section>
 <section id="steps-for-implementing-a-new-dataset">
-<h3>Steps for implementing a new dataset<a class="headerlink" href="#steps-for-implementing-a-new-dataset" title="Link to this heading">¶</a></h3>
-<ul class="simple">
+<h3>Steps for Implementing a New Dataset<a class="headerlink" href="#steps-for-implementing-a-new-dataset" title="Link to this heading">¶</a></h3>
+<ol class="arabic simple">
 <li><p>Create a new dataset file under <code class="docutils literal notranslate"><span class="pre">realhf/impl/dataset/</span></code>.</p></li>
-<li><p>Implement a map-style PyTorch dataset class with a <code class="docutils literal notranslate"><span class="pre">__getitem__</span></code> method. This method returns an <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.NamedArray</span></code> object containing the sequence length as metadata.</p></li>
-<li><p>Register the class with <code class="docutils literal notranslate"><span class="pre">data_api.register_dataset</span></code> at the end of this file, with the name “my-dataset”.</p></li>
-<li><p>Change the name of the used dataset in experiment configurations, e.g., in the <code class="docutils literal notranslate"><span class="pre">datasets</span></code> method of <code class="docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code>, to “my-dataset”.</p></li>
-<li><p>If you would like to pass in more arguments to construct the dataset class, change the quickstart configuration class (in this case, <code class="docutils literal notranslate"><span class="pre">realhf.PromptAnswerDatasetConfig</span></code>) as well as the <code class="docutils literal notranslate"><span class="pre">args</span></code> field in the system API dataset object.</p></li>
-</ul>
+<li><p>Implement a map-style PyTorch dataset class with a <code class="docutils literal notranslate"><span class="pre">__getitem__</span></code> method. This method should return a <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.NamedArray</span></code> object containing the sequence length as metadata.</p></li>
+<li><p>Register the class with <code class="docutils literal notranslate"><span class="pre">data_api.register_dataset</span></code> at the end of the file, using the name “my-dataset”.</p></li>
+<li><p>Update the name of the dataset in experiment configurations, for example, in the <code class="docutils literal notranslate"><span class="pre">datasets</span></code> method of <code class="docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code>, to “my-dataset”.</p></li>
+<li><p>If you need to pass additional arguments to construct the dataset class, modify the quickstart configuration class (in this case, <code class="docutils literal notranslate"><span class="pre">realhf.PromptAnswerDatasetConfig</span></code>) as well as the <code class="docutils literal notranslate"><span class="pre">args</span></code> field in the system API dataset object.</p></li>
+</ol>
 </section>
 </section>
 <section id="customizing-models">
 <h2>Customizing Models<a class="headerlink" href="#customizing-models" title="Link to this heading">¶</a></h2>
 <section id="id1">
 <h3>Overview<a class="headerlink" href="#id1" title="Link to this heading">¶</a></h3>
-<p>For efficiency reasons, ReaL does not support every transformer model from the HuggingFace model hub.
-In ReaL, we implement a <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.impl.model.nn.real_llm_api.ReaLModel</span></code> class that wraps the HuggingFace model and provides
-additional offload and parameter reallocation APIs.</p>
-<p>Note that there are some helper functions in the model API that are used to convert HuggingFace models back-and-forth,
-e.g., <code class="docutils literal notranslate"><span class="pre">from_llama</span></code>, <code class="docutils literal notranslate"><span class="pre">config_to_codellama</span></code>, etc.
-These helper functions are generated <em>automatically</em> by registering converting functions in the
-<code class="docutils literal notranslate"><span class="pre">api/from_hf/</span></code> folder.</p>
-<p>We take <code class="docutils literal notranslate"><span class="pre">api/from_hf/llama.py</span></code> as an example.
-To register a convertable HuggingFace model, the user should implement:</p>
+<p>For efficiency reasons, ReaL does not support every transformer
+model from the HuggingFace model hub.
+In ReaL, we implement the <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.impl.model.nn.real_llm_api.ReaLModel</span></code>
+class that wraps the HuggingFace model and provides micro-batched pipelining,
+offload, and parameter reallocation functionalities.</p>
+<p>There are helper functions in the model API used to convert HuggingFace models back and forth,
+such as <code class="docutils literal notranslate"><span class="pre">from_llama</span></code> and <code class="docutils literal notranslate"><span class="pre">to_llama</span></code>.
+These helper functions are generated automatically by registering conversion functions in the <code class="docutils literal notranslate"><span class="pre">api/from_hf/</span></code> folder.</p>
+<p>For example, consider <code class="docutils literal notranslate"><span class="pre">api/from_hf/llama.py</span></code>.
+To register a convertible HuggingFace model, the user should implement:</p>
 <ul class="simple">
-<li><p>Two functions that convert model configs between HuggingFace and <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.ReaLModelConfig</span></code>.</p></li>
-<li><p>Two functions that convert model state dicts between HuggingFace and ReaL, basically key remap.</p></li>
+<li><p>Two functions to convert model configs between HuggingFace and <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.ReaLModelConfig</span></code>.</p></li>
+<li><p>Two functions to convert model state dicts between HuggingFace and ReaL, primarily involving key remapping.</p></li>
 <li><p>Three functions specifying the names of parameters in the embedding layer, transformer blocks, and the output layer, respectively.</p></li>
 </ul>
 </section>
 <section id="steps-to-support-a-new-huggingface-model">
-<h3>Steps to support a new HuggingFace model<a class="headerlink" href="#steps-to-support-a-new-huggingface-model" title="Link to this heading">¶</a></h3>
+<h3>Steps to Support a New HuggingFace Model<a class="headerlink" href="#steps-to-support-a-new-huggingface-model" title="Link to this heading">¶</a></h3>
 <ul class="simple">
 <li><p>Create a new model file under <code class="docutils literal notranslate"><span class="pre">api/from_hf/</span></code>.</p></li>
 <li><p>Implement the required helper functions as described above.</p></li>
-<li><p>Register the model with <code class="docutils literal notranslate"><span class="pre">register_hf_family</span></code> at the end of this file.</p></li>
+<li><p>Register the model with <code class="docutils literal notranslate"><span class="pre">register_hf_family</span></code> at the end of the file.</p></li>
 <li><p>(Optional) Test the consistency of the implemented model with scripts in <code class="docutils literal notranslate"><span class="pre">tests/</span></code>.</p></li>
 </ul>
-<p>We acknowledge that the current config and implementation of <code class="docutils literal notranslate"><span class="pre">ReaLModel</span></code> does not support
-all the features of HuggingFace models, e.g., MoE, shared embeddings, etc.
-As a result, supporting a new HF model usually requires to modify files in <code class="docutils literal notranslate"><span class="pre">impl/model/nn/</span></code>,
-which can be a terrible experience to users that are not familar with the code architecture.
-If you have any questions or want to request a new model feature,
+<p>We acknowledge that the current configuration and implementation of <code class="docutils literal notranslate"><span class="pre">ReaLModel</span></code>
+do not support all features of HuggingFace models,
+such as MoE and shared embeddings.
+As a result, supporting a new HuggingFace model
+often requires modifications to files in <code class="docutils literal notranslate"><span class="pre">impl/model/nn/</span></code>,
+which can be challenging for users unfamiliar with the code architecture.
+If you have any questions or wish to request a new model feature,
 please feel free to raise an issue on our GitHub repository.</p>
 </section>
 </section>
@@ -282,40 +295,35 @@ <h3>Steps to support a new HuggingFace model<a class="headerlink" href="#steps-t
 <h2>Customizing Algorithms<a class="headerlink" href="#customizing-algorithms" title="Link to this heading">¶</a></h2>
 <section id="id2">
 <h3>Overview<a class="headerlink" href="#id2" title="Link to this heading">¶</a></h3>
-<p>Algorithms in ReaL are represented as dataflow graphs.
-Each node in the graph is a model function call (MFC), which is one of
-the generate, inference, or train requests applied to a specific model (e.g., Actor or Critic).
-Edges in the graph denote the data or parameter version dependencies
-between nodes.</p>
-<p>We show the dataflow graph of PPO in the following figure:</p>
+<p>In ReaL, algorithms are represented as dataflow graphs.
+Each node in the graph corresponds to a model function call (MFC),
+which can be a generate, inference, or train request applied to a
+specific model (e.g., Actor or Critic).
+The edges in the graph indicate data or
+parameter version dependencies between nodes.</p>
+<p>The following figure illustrates the dataflow graph of PPO:</p>
 <img alt="Dataflow graph of RLHF." class="align-center" src="_images/rlhf_dfg.svg" /><p>A node is represented by a <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.MFCDef</span></code> object.
-We can see that the node has a <code class="docutils literal notranslate"><span class="pre">model_name</span></code> field and a <code class="docutils literal notranslate"><span class="pre">interface_type</span></code> field,
-which specifies what this node should conceptually do during exection.
-The <code class="docutils literal notranslate"><span class="pre">interface_impl</span></code> field specifies an actual implementation of the model interface.</p>
+Each node has a <code class="docutils literal notranslate"><span class="pre">model_name</span></code> field and an <code class="docutils literal notranslate"><span class="pre">interface_type</span></code>
+field, which specify what the node should conceptually do during execution.
+The <code class="docutils literal notranslate"><span class="pre">interface_impl</span></code> field specifies the actual implementation of the model interface.</p>
 <p>The interface class has the following signature:</p>
-<p>During the execution of an MFC node, the model with <code class="docutils literal notranslate"><span class="pre">model_name</span></code> will be passed
-into this interface object together with the data specified in the MFC node.</p>
+<p>During the execution of an MFC node,
+the model identified by <code class="docutils literal notranslate"><span class="pre">model_name</span></code> is passed into this interface object,
+along with the data specified in the MFC node.</p>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
-<p>Similar to datasets, model interfaces are also registered and constructed by the system API.
-Please check <code class="docutils literal notranslate"><span class="pre">impl/model/interface/sft_interface.py</span></code> for an example.
-The <code class="docutils literal notranslate"><span class="pre">SFTInterface</span></code> is registered at the end of this file and constructed by <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code>
-(see the <code class="docutils literal notranslate"><span class="pre">rpcs</span></code> method).</p>
+<p>Similar to datasets, model interfaces are registered and constructed by the system API. Please check <code class="docutils literal notranslate"><span class="pre">impl/model/interface/sft_interface.py</span></code> for an example. The <code class="docutils literal notranslate"><span class="pre">SFTInterface</span></code> is registered at the end of this file and constructed by <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code> (see the <code class="docutils literal notranslate"><span class="pre">rpcs</span></code> method).</p>
 </div>
-<p>Running algorithms in ReaL is exactly running a large dataflow graph that
-concatenates all the training iterations.
-The <em>MasterWorker</em> monitors the running state of this graph and issues MFC requests
-to <em>ModelWorkers</em> once the dependencies are satisfied.
-For more details about the code architecture, please refer to the <a class="reference internal" href="arch.html"><span class="doc">Code Architecture</span></a> page.</p>
+<p>Running algorithms in ReaL involves executing a large dataflow
+graph that concatenates all the training iterations.
+The <em>MasterWorker</em> monitors the state of this graph and
+issues MFC requests to <em>ModelWorkers</em> once the dependencies are satisfied.</p>
 </section>
-<section id="example-1-replace-the-interface">
-<h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-replace-the-interface" title="Link to this heading">¶</a></h3>
-<p>Say, in the PPO experiment, we want to use a customized reward model
-from HuggingFace. How should we do if this model is not supported by <code class="docutils literal notranslate"><span class="pre">ReaLModel</span></code>?</p>
-<p>We provide the example code in <code class="docutils literal notranslate"><span class="pre">examples/ppo_sentiment.py</span></code>, where we replace the
-trained reward model for sentiment generation with a BERT-like sentiment analysis model
-from HuggingFace.</p>
-<p>First, we should implement a new model interface class for our customized usage:</p>
+<section id="example-a-customized-reward-function-for-ppo">
+<h3>Example: A Customized Reward Function for PPO<a class="headerlink" href="#example-a-customized-reward-function-for-ppo" title="Link to this heading">¶</a></h3>
+<p>In this example, we demonstrate how to use a customized reward model from HuggingFace in a PPO experiment when the model is not supported by <code class="docutils literal notranslate"><span class="pre">ReaLModel</span></code>.</p>
+<p>The example code can be found in <code class="docutils literal notranslate"><span class="pre">examples/ppo_sentiment.py</span></code>, where we replace the trained reward model for sentiment generation with a BERT-like sentiment analysis model from HuggingFace.</p>
+<p>First, we need to implement a new model interface class for our customized use:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="nd">@dataclasses</span><span class="o">.</span><span class="n">dataclass</span>
 <span class="k">class</span> <span class="nc">SentimentScoringInterface</span><span class="p">(</span><span class="n">model_api</span><span class="o">.</span><span class="n">ModelInterface</span><span class="p">):</span>
 
@@ -333,8 +341,7 @@ <h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-repla
 
     <span class="nd">@torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">()</span>
     <span class="k">def</span> <span class="nf">inference</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">model</span><span class="p">:</span> <span class="n">model_api</span><span class="o">.</span><span class="n">Model</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">NamedArray</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">NamedArray</span><span class="p">:</span>
-        <span class="o">...</span>
-        <span class="c1"># Re-tokenize.</span>
+        <span class="c1"># Re-tokenize the texts.</span>
         <span class="n">texts</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">batch_decode</span><span class="p">(</span>
             <span class="n">input_ids</span><span class="p">,</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">True</span>
         <span class="p">)</span>
@@ -342,30 +349,28 @@ <h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-repla
             <span class="n">texts</span><span class="p">,</span> <span class="n">return_tensors</span><span class="o">=</span><span class="s2">&quot;pt&quot;</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">truncation</span><span class="o">=</span><span class="kc">True</span>
         <span class="p">)</span>
 
-        <span class="c1"># Inference to get the score.</span>
-        <span class="c1"># For IMDB, 0 is negative and 1 is positive. We record the logits of positive.</span>
+        <span class="c1"># Perform inference to get the score.</span>
+        <span class="c1"># For IMDB, 0 is negative and 1 is positive. We record the logits of the positive class.</span>
         <span class="n">scores</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">score_model</span><span class="p">(</span>
             <span class="n">input_ids</span><span class="o">=</span><span class="n">encoding</span><span class="p">[</span><span class="s2">&quot;input_ids&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span>
             <span class="n">attention_mask</span><span class="o">=</span><span class="n">encoding</span><span class="p">[</span><span class="s2">&quot;attention_mask&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">cuda</span><span class="p">(),</span>
         <span class="p">)</span><span class="o">.</span><span class="n">logits</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span><span class="o">.</span><span class="n">float</span><span class="p">()</span>
-        <span class="n">scores</span> <span class="o">=</span> <span class="n">logits</span><span class="p">[</span><span class="o">...</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">contiguous</span><span class="p">()</span><span class="o">.</span><span class="n">float</span><span class="p">()</span>
 
         <span class="n">res</span> <span class="o">=</span> <span class="n">NamedArray</span><span class="p">(</span><span class="n">scores</span><span class="o">=</span><span class="n">scores</span><span class="p">)</span>
         <span class="n">res</span><span class="o">.</span><span class="n">register_metadata</span><span class="p">(</span><span class="o">**</span><span class="n">data</span><span class="o">.</span><span class="n">metadata</span><span class="p">)</span>
         <span class="k">return</span> <span class="n">res</span>
 </pre></div>
 </div>
-<p>Here are two key points in this code:</p>
+<p>Key points in this code:</p>
 <ul class="simple">
 <li><p>During interface initialization, we load a HuggingFace model and its tokenizer.</p></li>
 <li><p>During inference, we re-tokenize the generated output from the Actor, compute the score, and return it.</p></li>
 </ul>
-<p>That’s easy, right? Now we should register this interface in the system API:</p>
+<p>Now, we need to register this interface in the system API:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">model_api</span><span class="o">.</span><span class="n">register_interface</span><span class="p">(</span><span class="s2">&quot;sentiment_scoring&quot;</span><span class="p">,</span> <span class="n">SentimentScoringInterface</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>Then, to use our customized interface implementation in PPO, we should change
-the <code class="docutils literal notranslate"><span class="pre">interface_impl</span></code> field of the reward model in the MFC nodes of PPO:</p>
+<p>To use our customized interface implementation in PPO, we need to change the <code class="docutils literal notranslate"><span class="pre">interface_impl</span></code> field of the reward model in the MFC nodes of PPO:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">MyPPOConfig</span><span class="p">(</span><span class="n">PPOConfig</span><span class="p">):</span>
 
     <span class="k">def</span> <span class="nf">initial_setup</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">ExperimentConfig</span><span class="p">:</span>
@@ -373,7 +378,7 @@ <h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-repla
         <span class="k">for</span> <span class="n">mw</span> <span class="ow">in</span> <span class="n">cfg</span><span class="o">.</span><span class="n">model_worker</span><span class="p">:</span>
             <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">mw</span><span class="o">.</span><span class="n">shards</span><span class="p">:</span>
                 <span class="k">if</span> <span class="n">s</span><span class="o">.</span><span class="n">id</span><span class="o">.</span><span class="n">model_name</span><span class="o">.</span><span class="n">role</span> <span class="o">==</span> <span class="s2">&quot;reward&quot;</span><span class="p">:</span>
-                    <span class="c1"># Remove the original reward model because we use the customized one.</span>
+                    <span class="c1"># Remove the original reward model because we are using a customized one.</span>
                     <span class="n">s</span><span class="o">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">config_api</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span>
                         <span class="s2">&quot;tokenizer&quot;</span><span class="p">,</span>
                         <span class="n">args</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
@@ -393,12 +398,11 @@ <h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-repla
         <span class="k">return</span> <span class="n">cfg</span>
 </pre></div>
 </div>
-<p>Don’t forget the register your customized experiment configuration
-such that ReaL can launch it with the quickstart command line options:</p>
+<p>Don’t forget to register your customized experiment configuration so that ReaL can launch it with the quickstart command line options:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">register_quickstart_exp</span><span class="p">(</span><span class="s2">&quot;my-ppo&quot;</span><span class="p">,</span> <span class="n">MyPPOConfig</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>Done! Let’s run the customized experiment with the quickstart command:</p>
+<p>Finally, let’s run the customized experiment with the quickstart command:</p>
 <div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>Note<span class="w"> </span>that<span class="w"> </span>we<span class="w"> </span>change<span class="w"> </span>the<span class="w"> </span>name<span class="w"> </span><span class="s2">&quot;ppo&quot;</span><span class="w"> </span>to<span class="w"> </span><span class="s2">&quot;my-ppo&quot;</span>
 <span class="go">python3 -m realhf.apps.quickstart my-ppo \</span>
 <span class="go">    experiment_name=sentiment-ppo \</span>
@@ -408,12 +412,7 @@ <h3>Example 1: Replace the interface<a class="headerlink" href="#example-1-repla
 <span class="go">    ...</span>
 </pre></div>
 </div>
-<p>This example also applies for scenarios when you want to use an external reward,
-like the signal from compiler or other online automatic evaluations.</p>
-</section>
-<section id="example-2-develop-a-new-dataflow">
-<h3>Example 2: Develop a new dataflow<a class="headerlink" href="#example-2-develop-a-new-dataflow" title="Link to this heading">¶</a></h3>
-<p>TODO</p>
+<p>This example also applies to scenarios where you want to use an external reward, such as a signal from a compiler or another online automatic evaluation.</p>
 </section>
 </section>
 </section>
@@ -433,19 +432,18 @@ <h3>Example 2: Develop a new dataflow<a class="headerlink" href="#example-2-deve
 <li><a class="reference internal" href="#">Customization</a><ul>
 <li><a class="reference internal" href="#customizing-datasets">Customizing Datasets</a><ul>
 <li><a class="reference internal" href="#overview">Overview</a></li>
-<li><a class="reference internal" href="#how-dataset-configuration-is-parsed">How dataset configuration is parsed</a></li>
-<li><a class="reference internal" href="#steps-for-implementing-a-new-dataset">Steps for implementing a new dataset</a></li>
+<li><a class="reference internal" href="#how-dataset-configuration-is-parsed">How Dataset Configuration is Parsed</a></li>
+<li><a class="reference internal" href="#steps-for-implementing-a-new-dataset">Steps for Implementing a New Dataset</a></li>
 </ul>
 </li>
 <li><a class="reference internal" href="#customizing-models">Customizing Models</a><ul>
 <li><a class="reference internal" href="#id1">Overview</a></li>
-<li><a class="reference internal" href="#steps-to-support-a-new-huggingface-model">Steps to support a new HuggingFace model</a></li>
+<li><a class="reference internal" href="#steps-to-support-a-new-huggingface-model">Steps to Support a New HuggingFace Model</a></li>
 </ul>
 </li>
 <li><a class="reference internal" href="#customizing-algorithms">Customizing Algorithms</a><ul>
 <li><a class="reference internal" href="#id2">Overview</a></li>
-<li><a class="reference internal" href="#example-1-replace-the-interface">Example 1: Replace the interface</a></li>
-<li><a class="reference internal" href="#example-2-develop-a-new-dataflow">Example 2: Develop a new dataflow</a></li>
+<li><a class="reference internal" href="#example-a-customized-reward-function-for-ppo">Example: A Customized Reward Function for PPO</a></li>
 </ul>
 </li>
 </ul>
@@ -464,19 +462,19 @@ <h3>Example 2: Develop a new dataflow<a class="headerlink" href="#example-2-deve
       <nav id="paginator" class="py-4" aria-label="Documentation navigation">
     <div class="container">
       <ul class="pagination justify-content-between mb-0"><li class="page-item">
-            <a href="expconfig.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Configurations">
+            <a href="quickstart.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Quickstart">
               <span class="prev-page"><i class="bi bi-caret-left"></i></span>
               <div class="d-flex flex-column">
                 <span class="text-small text-start text-muted">Previous</span>
-                <span class="underline">Configurations</span>
+                <span class="underline">Quickstart</span>
               </div>
             </a>
           </li>
         <li class="page-item ms-auto">
-            <a href="arch.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Code Architecture">
+            <a href="distributed.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Set Up Distributed Experiments">
               <div class="d-flex flex-column">
                 <span class="text-small text-end text-start text-muted">Next</span>
-                <span class="underline">Code Architecture</span>
+                <span class="underline">Set Up Distributed Experiments</span>
               </div>
               <span class="next-page"><i class="bi bi-caret-right"></i></span>
             </a>
diff --git a/distributed.html b/distributed.html
index 635766b4..a9ca2444 100644
--- a/distributed.html
+++ b/distributed.html
@@ -31,7 +31,7 @@
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
         <link rel="next" title="Contributing" href="contributing.html" />
-        <link rel="prev" title="Code Architecture" href="arch.html" />
+        <link rel="prev" title="Customization" href="customization.html" />
     <style>
       :root {
         --nftt-body-font-family: "Nunito", var(--nftt-font-sans-serif) !important;
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -196,31 +195,31 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
         <article id="content" class="nftt-content" role="main">
     <section id="set-up-distributed-experiments">
 <h1>Set Up Distributed Experiments<a class="headerlink" href="#set-up-distributed-experiments" title="Link to this heading">¶</a></h1>
-<p>Currently, ReaL supports launching distrbited experiments using
+<p>Currently, ReaL supports launching distributed experiments using
 <a class="reference external" href="https://slurm.schedmd.com/documentation.html">SLURM</a>
 with the <a class="reference external" href="https://github.com/NVIDIA/pyxis">Pyxis</a> plugin.
 This plugin allows for launching enroot containers with the
 <code class="docutils literal notranslate"><span class="pre">srun</span></code> command.</p>
-<p>To set up distributed experiments, you should write a JSON
-cluster configuration as the example in <code class="docutils literal notranslate"><span class="pre">examples/cluster_config.json</span></code>.</p>
+<p>To set up distributed experiments, you need to create a JSON
+cluster configuration file, following the example in <code class="docutils literal notranslate"><span class="pre">examples/cluster_config.json</span></code>. The file contains the following fields:</p>
 <ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">cluster_type</span></code>: The type of cluster. Currently, only “slurm” is supported.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">cluster_type</span></code>: The type of the cluster. Currently, only “slurm” is supported.</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">cluster_name</span></code>: The name of the cluster. Arbitrary.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">fileroot</span></code>: An NFS path that all nodes can access. This is where the log and checkpoints will be stored.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">default_mount</span></code>: Comma separated list of paths to mount on all nodes. This should include the above <code class="docutils literal notranslate"><span class="pre">fileroot</span></code>.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">node_type_from_node_name</span></code>: A dictionary mapping a regular expression to a node type. Any host in this cluster should match one of these regular expressions. Node types include [“g1”, “g2”, “g8”, “a100”]. “g” refers low-end GPUs in the cluster.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">gpu_type_from_node_name</span></code>: A dictionary mapping a regular expression to a GPU type. GPU type is used by SLURM.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">cpu_image</span></code>: The docker image of the controller and the master worker.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">gpu_image</span></code>: The docker image of the model worker.</p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">node_name_prefix</span></code>: The prefix of the host names. We assume host names in the cluster is prefixed by a string followed by some integer, e.g., “com-01”, where “com-” is the prefix.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">fileroot</span></code>: An NFS path accessible by all nodes. This is where logs and checkpoints will be stored.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">default_mount</span></code>: A comma-separated list of paths to mount on all nodes. This should include the <code class="docutils literal notranslate"><span class="pre">fileroot</span></code> mentioned above.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">node_type_from_node_name</span></code>: A dictionary mapping a regular expression to a node type. Every host in this cluster should match one of these regular expressions. Node types include [“g1”, “g2”, “g8”, “a100”]. “g” refers to low-end GPUs in the cluster.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">gpu_type_from_node_name</span></code>: A dictionary mapping a regular expression to a GPU type. The GPU type is used by SLURM.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">cpu_image</span></code>: The Docker image for the controller and the master worker.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">gpu_image</span></code>: The Docker image for the model worker.</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">node_name_prefix</span></code>: The prefix of the host names. We assume that host names in the cluster are prefixed by a string followed by an integer, e.g., “com-01”, where “com-” is the prefix.</p></li>
 </ul>
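As a rough illustration of the fields listed above, the following sketch builds a configuration dictionary in Python, serializes it to JSON, and resolves a host name to its node type. Every value (paths, regular expressions, GPU type, the GPU image name) is a placeholder, and the `node_type_for` helper is hypothetical. It assumes patterns are matched from the start of the host name, which may not be how ReaL matches them; `examples/cluster_config.json` remains the authoritative reference.

```python
import json
import re

# Placeholder values only; adapt them to your own cluster.
cluster_config = {
    "cluster_type": "slurm",                    # only "slurm" is supported
    "cluster_name": "my-cluster",               # arbitrary
    "fileroot": "/mnt/nfs/real",                # NFS path visible to all nodes
    "default_mount": "/mnt/nfs/real,/mnt/data", # must include fileroot
    "node_type_from_node_name": {r"com-\d+": "g8"},
    "gpu_type_from_node_name": {r"com-\d+": "placeholder-gpu"},
    "cpu_image": "garrett4wade/real-cpu:22.04-0.1.0",
    "gpu_image": "placeholder/real-gpu:tag",    # hypothetical image name
    "node_name_prefix": "com-",
}


def node_type_for(host: str, config: dict) -> str:
    """Return the node type of the first pattern matching `host`.

    Hypothetical helper: assumes prefix matching via `re.match`.
    """
    for pattern, node_type in config["node_type_from_node_name"].items():
        if re.match(pattern, host):
            return node_type
    raise KeyError(f"Host '{host}' matches no node-type pattern.")


# Serialize to the JSON text that would be saved as the cluster spec file.
config_text = json.dumps(cluster_config, indent=2)
node_type = node_type_for("com-01", cluster_config)
```

Running a check like `node_type_for` over every host name before launching is a cheap way to confirm that each host matches one of the regular expressions, as the field description requires.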
 <p>The path of this file should be specified in the <code class="docutils literal notranslate"><span class="pre">CLUSTER_SPEC_PATH</span></code> environment variable
-inside the used docker images and when launching the experiment. For example,</p>
+inside the Docker images used and when launching the experiment. For example:</p>
 <div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">CLUSTER_SPEC_PATH=/tmp/my-cluster.json python3 -m realhf.apps.quickstart ppo ...</span>
 </pre></div>
 </div>
-<p>You also need to add an additional layer in the docker images like the following:</p>
-<div class="highlight-dockerfile notranslate"><div class="highlight"><pre><span></span><span class="k">FROM</span><span class="w"> </span><span class="s">docker.io/garrett4wade/real-cpu</span>
+<p>You also need to add a layer to the Docker images, as shown below:</p>
+<div class="highlight-dockerfile notranslate"><div class="highlight"><pre><span></span><span class="k">FROM</span><span class="w"> </span><span class="s">garrett4wade/real-cpu:22.04-0.1.0</span>
 <span class="k">ENV</span><span class="w"> </span><span class="nv">CLUSTER_SPEC_PATH</span><span class="o">=</span>/tmp/my-cluster.json
 </pre></div>
 </div>
@@ -237,11 +236,11 @@ <h1>Set Up Distributed Experiments<a class="headerlink" href="#set-up-distribute
       <nav id="paginator" class="py-4" aria-label="Documentation navigation">
     <div class="container">
       <ul class="pagination justify-content-between mb-0"><li class="page-item">
-            <a href="arch.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Code Architecture">
+            <a href="customization.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Customization">
               <span class="prev-page"><i class="bi bi-caret-left"></i></span>
               <div class="d-flex flex-column">
                 <span class="text-small text-start text-muted">Previous</span>
-                <span class="underline">Code Architecture</span>
+                <span class="underline">Customization</span>
               </div>
             </a>
           </li>
diff --git a/expconfig.html b/expconfig.html
index 017061e5..f322b540 100644
--- a/expconfig.html
+++ b/expconfig.html
@@ -30,8 +30,8 @@
         <link rel="index" title="Index" href="genindex.html" />
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
-        <link rel="next" title="Customization" href="customization.html" />
-        <link rel="prev" title="Quickstart" href="quickstart.html" />
+        <link rel="next" title="Quickstart" href="quickstart.html" />
+        <link rel="prev" title="Installation" href="install.html" />
     <style>
       :root {
         --nftt-body-font-family: "Nunito", var(--nftt-font-sans-serif) !important;
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -198,7 +197,7 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
 <h1>Configurations<a class="headerlink" href="#configurations" title="Link to this heading">¶</a></h1>
 <p>We illustrate configurations for quickstart experiments on this page.
 Each type of experiment (e.g., SFT, PPO) corresponds to a specific
-configuration class (e.g., <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code> for SFT).</p>
+configuration object (e.g., <code class="xref py py-class docutils literal notranslate"><span class="pre">realhf.SFTConfig</span></code> for SFT).</p>
 <p>Since ReaL uses <a class="reference external" href="https://hydra.cc/">Hydra</a> for configuration management,
 users can override these options provided by the class recursively
 with command line arguments.
@@ -214,14 +213,14 @@ <h2>Dataset Configurations<a class="headerlink" href="#dataset-configurations" t
 </section>
 <section id="namedarray">
 <h2><code class="docutils literal notranslate"><span class="pre">NamedArray</span></code><a class="headerlink" href="#namedarray" title="Link to this heading">¶</a></h2>
-<p><code class="docutils literal notranslate"><span class="pre">NamedArray`</span></code> is an object we use in model function calls.
+<p><code class="docutils literal notranslate"><span class="pre">NamedArray</span></code> is an object we use in model function calls.
 It is inherited from the previous SRL project.</p>
 <p>Named array extends plain arrays/tensors in the following ways.</p>
 <ol class="arabic simple">
 <li><p>NamedArray aggregates multiple arrays, possibly of different shapes.</p></li>
 <li><p>Each array is given a name, providing a user-friendly way of indexing to the corresponding data.</p></li>
 <li><p>NamedArrays can be nested. (Although it should <em>not</em> be nested in this system.)</p></li>
-<li><p>NamedArray can store metadata such as sequence length, which is useful for padding and masking without causing CUDA synchronization.</p></li>
+<li><p>NamedArray can store metadata such as sequence lengths, which is useful for padding and masking without causing CUDA synchronization.</p></li>
 </ol>
 <p>Users can regard it as a nested dictionary of arrays, except that indexing a <code class="docutils literal notranslate"><span class="pre">NamedArray</span></code> results in <em>slicing every hosted array</em> (again, we don’t use this feature in this project).</p>
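To make the description above concrete, here is a toy container that mimics the listed behaviors using plain Python lists. It is only a sketch: `ToyNamedArray` is a hypothetical class, not the real `NamedArray`, and carrying metadata over unchanged when slicing is an assumption made for this example.

```python
from typing import Any, Dict, List


class ToyNamedArray:
    """Toy stand-in: named arrays plus free-form metadata."""

    def __init__(self, **arrays: List[Any]):
        self._arrays: Dict[str, List[Any]] = dict(arrays)
        self.metadata: Dict[str, Any] = {}

    def register_metadata(self, **metadata: Any) -> None:
        """Attach metadata such as sequence lengths."""
        self.metadata.update(metadata)

    def keys(self) -> List[str]:
        return list(self._arrays)

    def __getitem__(self, key):
        # A string key returns the array stored under that name.
        if isinstance(key, str):
            return self._arrays[key]
        # Any other index is applied to every hosted array.
        sliced = ToyNamedArray(**{k: v[key] for k, v in self._arrays.items()})
        # Assumption of this sketch: metadata is carried over unchanged.
        sliced.register_metadata(**self.metadata)
        return sliced


# Arrays of different lengths can live in the same container.
batch = ToyNamedArray(input_ids=[11, 12, 13, 14], scores=[0.1, 0.9])
batch.register_metadata(seqlens=[3, 1])
head = batch[:1]  # slices both "input_ids" and "scores"
```

Because the sequence lengths live in plain Python metadata rather than in a device tensor, reading them does not require a device-to-host copy, which matches the motivation given in the list above.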
 </section>
@@ -264,19 +263,19 @@ <h2>Dataflow Graph<a class="headerlink" href="#dataflow-graph" title="Link to th
       <nav id="paginator" class="py-4" aria-label="Documentation navigation">
     <div class="container">
       <ul class="pagination justify-content-between mb-0"><li class="page-item">
-            <a href="quickstart.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Quickstart">
+            <a href="install.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Installation">
               <span class="prev-page"><i class="bi bi-caret-left"></i></span>
               <div class="d-flex flex-column">
                 <span class="text-small text-start text-muted">Previous</span>
-                <span class="underline">Quickstart</span>
+                <span class="underline">Installation</span>
               </div>
             </a>
           </li>
         <li class="page-item ms-auto">
-            <a href="customization.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Customization">
+            <a href="quickstart.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Quickstart">
               <div class="d-flex flex-column">
                 <span class="text-small text-end text-start text-muted">Next</span>
-                <span class="underline">Customization</span>
+                <span class="underline">Quickstart</span>
               </div>
               <span class="next-page"><i class="bi bi-caret-right"></i></span>
             </a>
diff --git a/genindex.html b/genindex.html
index ccb9f228..115e5c3a 100644
--- a/genindex.html
+++ b/genindex.html
@@ -169,10 +169,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul>
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
diff --git a/index.html b/index.html
index 1a30832d..0c738dbd 100644
--- a/index.html
+++ b/index.html
@@ -172,10 +172,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul>
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -195,31 +194,6 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
         <article id="content" class="nftt-content" role="main">
     <section id="welcome-to-real-s-documentation">
 <h1>Welcome to ReaL’s documentation!<a class="headerlink" href="#welcome-to-real-s-documentation" title="Link to this heading">¶</a></h1>
-<section id="highlights-of-real">
-<h2>Highlights of ReaL<a class="headerlink" href="#highlights-of-real" title="Link to this heading">¶</a></h2>
-<section id="super-efficient">
-<h3><strong>Super-Efficient</strong><a class="headerlink" href="#super-efficient" title="Link to this heading">¶</a></h3>
-<p>ReaL introduces a novel <em>parameter reallocation</em> technique. It dynamically shifts parameters and
-adjusts parallel strategies of LLMs during training. This technique significantly reduces communication
-overhead and improves GPU utilization for RLHF.</p>
-<p>Combined with advanced techniques for LLM training, such as 3D parallelism, ZeRO optimization, and offloading,
-ReaL can scale RLHF training to hundreds or thousands of GPUs, maintaining high throughput and efficiency.</p>
-<p>Beyond large-scale training, ReaL is also memory-efficient with limited resources. For example, ReaL can
-train 70B LLMs with offloading on a single node.</p>
-<p>For more details, check our <a class="reference external" href="intro">introduction page</a>.</p>
-</section>
-<section id="easy-to-use">
-<h3><strong>Easy to use</strong><a class="headerlink" href="#easy-to-use" title="Link to this heading">¶</a></h3>
-<p>Install with PyPI or use our Docker image, then run your experiment with a single command!</p>
-<p>Check our <a class="reference external" href="quickstart">quickstart guide</a> for more details.</p>
-</section>
-<section id="flexible">
-<h3><strong>Flexible</strong><a class="headerlink" href="#flexible" title="Link to this heading">¶</a></h3>
-<p>ReaL’s system implementations are fully decoupled from algorithm interfaces. Achieve optimal performance
-for your customized application within 100 lines of code!</p>
-<p>Please refer to our <a class="reference external" href="customization">customization guide</a> for more details.</p>
-</section>
-</section>
 <section id="contents">
 <h2>Contents<a class="headerlink" href="#contents" title="Link to this heading">¶</a></h2>
 <div class="toctree-wrapper compound">
@@ -230,6 +204,14 @@ <h2>Contents<a class="headerlink" href="#contents" title="Link to this heading">
 <li class="toctree-l2"><a class="reference internal" href="install.html#install-from-pypi-or-source">Install From PyPI or Source</a></li>
 </ul>
 </li>
+<li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="expconfig.html#experiment-configurations">Experiment Configurations</a></li>
+<li class="toctree-l2"><a class="reference internal" href="expconfig.html#model-configurations">Model Configurations</a></li>
+<li class="toctree-l2"><a class="reference internal" href="expconfig.html#dataset-configurations">Dataset Configurations</a></li>
+<li class="toctree-l2"><a class="reference internal" href="expconfig.html#namedarray"><code class="docutils literal notranslate"><span class="pre">NamedArray</span></code></a></li>
+<li class="toctree-l2"><a class="reference internal" href="expconfig.html#dataflow-graph">Dataflow Graph</a></li>
+</ul>
+</li>
 <li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="quickstart.html#installation">Installation</a></li>
 <li class="toctree-l2"><a class="reference internal" href="quickstart.html#rlhf-with-4x-llama-7b-in-30min">RLHF with 4x LLaMA-7B in 30min</a><ul>
@@ -241,35 +223,25 @@ <h2>Contents<a class="headerlink" href="#contents" title="Link to this heading">
 </li>
 </ul>
 </li>
-<li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a><ul>
-<li class="toctree-l2"><a class="reference internal" href="expconfig.html#experiment-configurations">Experiment Configurations</a></li>
-<li class="toctree-l2"><a class="reference internal" href="expconfig.html#model-configurations">Model Configurations</a></li>
-<li class="toctree-l2"><a class="reference internal" href="expconfig.html#dataset-configurations">Dataset Configurations</a></li>
-<li class="toctree-l2"><a class="reference internal" href="expconfig.html#namedarray"><code class="docutils literal notranslate"><span class="pre">NamedArray</span></code></a></li>
-<li class="toctree-l2"><a class="reference internal" href="expconfig.html#dataflow-graph">Dataflow Graph</a></li>
-</ul>
-</li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="customization.html#customizing-datasets">Customizing Datasets</a><ul>
 <li class="toctree-l3"><a class="reference internal" href="customization.html#overview">Overview</a></li>
-<li class="toctree-l3"><a class="reference internal" href="customization.html#how-dataset-configuration-is-parsed">How dataset configuration is parsed</a></li>
-<li class="toctree-l3"><a class="reference internal" href="customization.html#steps-for-implementing-a-new-dataset">Steps for implementing a new dataset</a></li>
+<li class="toctree-l3"><a class="reference internal" href="customization.html#how-dataset-configuration-is-parsed">How Dataset Configuration is Parsed</a></li>
+<li class="toctree-l3"><a class="reference internal" href="customization.html#steps-for-implementing-a-new-dataset">Steps for Implementing a New Dataset</a></li>
 </ul>
 </li>
 <li class="toctree-l2"><a class="reference internal" href="customization.html#customizing-models">Customizing Models</a><ul>
 <li class="toctree-l3"><a class="reference internal" href="customization.html#id1">Overview</a></li>
-<li class="toctree-l3"><a class="reference internal" href="customization.html#steps-to-support-a-new-huggingface-model">Steps to support a new HuggingFace model</a></li>
+<li class="toctree-l3"><a class="reference internal" href="customization.html#steps-to-support-a-new-huggingface-model">Steps to Support a New HuggingFace Model</a></li>
 </ul>
 </li>
 <li class="toctree-l2"><a class="reference internal" href="customization.html#customizing-algorithms">Customizing Algorithms</a><ul>
 <li class="toctree-l3"><a class="reference internal" href="customization.html#id2">Overview</a></li>
-<li class="toctree-l3"><a class="reference internal" href="customization.html#example-1-replace-the-interface">Example 1: Replace the interface</a></li>
-<li class="toctree-l3"><a class="reference internal" href="customization.html#example-2-develop-a-new-dataflow">Example 2: Develop a new dataflow</a></li>
+<li class="toctree-l3"><a class="reference internal" href="customization.html#example-a-customized-reward-function-for-ppo">Example: A Customized Reward Function for PPO</a></li>
 </ul>
 </li>
 </ul>
 </li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -290,12 +262,6 @@ <h2>Contents<a class="headerlink" href="#contents" title="Link to this heading">
               <nav id="TableOfContents">
                 <ul>
 <li><a class="reference internal" href="#">Welcome to ReaL’s documentation!</a><ul>
-<li><a class="reference internal" href="#highlights-of-real">Highlights of ReaL</a><ul>
-<li><a class="reference internal" href="#super-efficient"><strong>Super-Efficient</strong></a></li>
-<li><a class="reference internal" href="#easy-to-use"><strong>Easy to use</strong></a></li>
-<li><a class="reference internal" href="#flexible"><strong>Flexible</strong></a></li>
-</ul>
-</li>
 <li><a class="reference internal" href="#contents">Contents</a></li>
 </ul>
 </li>
diff --git a/install.html b/install.html
index cfe418a2..015e12b5 100644
--- a/install.html
+++ b/install.html
@@ -30,7 +30,7 @@
         <link rel="index" title="Index" href="genindex.html" />
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
-        <link rel="next" title="Quickstart" href="quickstart.html" />
+        <link rel="next" title="Configurations" href="expconfig.html" />
         <link rel="prev" title="Introduction" href="intro.html" />
     <style>
       :root {
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -198,55 +197,44 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
 <h1>Installation<a class="headerlink" href="#installation" title="Link to this heading">¶</a></h1>
 <section id="docker-images">
 <h2>Docker Images<a class="headerlink" href="#docker-images" title="Link to this heading">¶</a></h2>
-<p>The easiest way to run ReaL is to use the provided Docker images.
-We provide a CPU-only image to launch experiments and a runtime GPU
-image to be deployed in the cluster.
-The Dockerfile has been provided in the repository as well.</p>
+<p>The easiest way to run ReaL is by using the provided Docker images.
+We offer a CPU-only image for launching experiments and a runtime GPU
+image for deployment in a cluster. The Dockerfile is also available in the repository.</p>
 <p>To pull the images, run:</p>
-<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>docker<span class="w"> </span>pull<span class="w"> </span>docker.io/garrett4wade/real-cpu
-<span class="gp">$ </span>docker<span class="w"> </span>pull<span class="w"> </span>docker.io/garrett4wade/real-gpu
+<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>docker<span class="w"> </span>pull<span class="w"> </span>docker.io/garrett4wade/real-cpu:22.04-0.1.0
+<span class="gp">$ </span>docker<span class="w"> </span>pull<span class="w"> </span>docker.io/garrett4wade/real-gpu:23.10-py3-0.1.0
 </pre></div>
 </div>
-<div class="admonition warning">
-<p class="admonition-title">Warning</p>
-<p>when using these docker images locally, the user should mount the user code directory
-to path <code class="docutils literal notranslate"><span class="pre">/realhf</span></code> in the container. This is because the image shifts an editable
-installation at <code class="docutils literal notranslate"><span class="pre">/realhf</span></code>. When the user code overwrites this path, the change of user
-code will take effect without re-installing this <code class="docutils literal notranslate"><span class="pre">realhf</span></code> PyPI package.</p>
-<p>It’s also okay to mount to another location and re-install the package in the container.</p>
+<p>The CPU image is built from “ubuntu:22.04” and the GPU image is built from “nvcr.io/nvidia/pytorch:23.10-py3”. The current package version is “0.1.0”.</p>
+<p>After pulling the Docker images, run your Docker container locally on a GPU node with the following command:</p>
+<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>docker<span class="w"> </span>run<span class="w"> </span>-it<span class="w"> </span>--gpus<span class="w"> </span>all<span class="w"> </span>garrett4wade/real-gpu:23.10-py3-0.1.0<span class="w"> </span>bash
+</pre></div>
 </div>
-<p>To build the images from scratch, run:</p>
-<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>docker<span class="w"> </span>build<span class="w"> </span>--target<span class="o">=</span>cpu<span class="w"> </span>-t<span class="w"> </span>real-cpu<span class="w"> </span>.
-<span class="gp">$ </span>docker<span class="w"> </span>build<span class="w"> </span>--target<span class="o">=</span>gpu<span class="w"> </span>-t<span class="w"> </span>real-gpu<span class="w"> </span>.
+<p>The source code is available at /realhf inside the container. This is an editable installation, so you can modify the code or run experiments directly.</p>
+<p>If you prefer to develop the code outside the Docker container and mount it into the container,
+remember to rerun the editable installation command inside the container after mounting:</p>
+<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>/your/mounted/code/path<span class="w"> </span>--no-build-isolation
 </pre></div>
 </div>
 </section>
 <section id="install-from-pypi-or-source">
 <h2>Install From PyPI or Source<a class="headerlink" href="#install-from-pypi-or-source" title="Link to this heading">¶</a></h2>
-<p>If you don’t want to use docker, you can also install ReaL from PyPI
-or from source.</p>
-<p>Install from PyPI:</p>
-<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>realhf<span class="w"> </span>--no-build-isolation
-</pre></div>
-</div>
+<p>If you prefer not to use Docker, you can also install ReaL from PyPI or from source.</p>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
-<p>Installing from the PyPI wheel still requires the user to clone the
-source code to launch experiments.</p>
+<p>We don’t upload a pre-built wheel to PyPI, so the installation will require compiling the C++ and CUDA extensions. If CUDA is not available on your machine, only the C++ extension will be installed.</p>
 </div>
-<p>Install from source:</p>
-<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/openpsi-project/ReaLHF
-<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>ReaLHF
-<span class="gp">$ </span>pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.<span class="w"> </span>--no-build-isolation
+<p>Install from PyPI:</p>
+<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>python3<span class="w"> </span>-m<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>realhf<span class="w"> </span>--no-build-isolation
 </pre></div>
 </div>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>In an environment without CUDA, ReaL will only
-install necessary Python modules for launching distributed experiments.
-That’s why we have two different docker images for
-launching and deploying ReaL.</p>
+<p>The PyPI package allows you to launch existing experiments with the quickstart command. If you want to modify the code, clone the repository and install it from source:</p>
+<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp">$ </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/openpsi-project/ReaLHF
+<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>ReaLHF
+<span class="gp">$ </span>python3<span class="w"> </span>-m<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.<span class="w"> </span>--no-build-isolation
+</pre></div>
 </div>
+<p>Next, check <span class="xref std std-doc">quickstart</span> for instructions on running experiments.</p>
 </section>
 </section>
 
@@ -290,10 +278,10 @@ <h2>Install From PyPI or Source<a class="headerlink" href="#install-from-pypi-or
             </a>
           </li>
         <li class="page-item ms-auto">
-            <a href="quickstart.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Quickstart">
+            <a href="expconfig.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Configurations">
               <div class="d-flex flex-column">
                 <span class="text-small text-end text-start text-muted">Next</span>
-                <span class="underline">Quickstart</span>
+                <span class="underline">Configurations</span>
               </div>
               <span class="next-page"><i class="bi bi-caret-right"></i></span>
             </a>
diff --git a/intro.html b/intro.html
index 64428236..55e55dfd 100644
--- a/intro.html
+++ b/intro.html
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1 current"><a class="current reference internal" href="#">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -196,12 +195,6 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
         <article id="content" class="nftt-content" role="main">
     <section id="introduction">
 <h1>Introduction<a class="headerlink" href="#introduction" title="Link to this heading">¶</a></h1>
-<p>ReaL introduces a novel technique called <em>Parameter Reallocation</em>
-(the name <em>ReaL</em> is the abbreviation for <em>ReaLlocation</em>), which dynamically
-shifts model parameters and changes the parallelization strategy during training.
-This technique can significantly reduce the communication overhead and improve
-GPU utilization in RLHF training, leading to a substantial speedup over the state-of-the-art
-open-source systems.</p>
 <p>We observe two major limitations based on our profiling
 of the previous RLHF systems, as shown in the <a class="reference internal" href="#timeline"><span class="std std-ref">Timeline Figure</span></a>.</p>
 <figure class="align-default" id="id1">
@@ -229,8 +222,8 @@ <h1>Introduction<a class="headerlink" href="#introduction" title="Link to this h
 model parameters</strong> between GPUs to improve the efficiency of
 the entire RLHF training process.
 By first choosing a parallelization strategy tailored for
-each model function call
-(e.g., use pipelining for Generation, while tensor parallelism for Training)
+each computation workload
+(e.g., pipelining for Generation and tensor parallelism for Training)
 and then executing these calls concurrently with a smaller
 parallelization degree (e.g., Actor and Critic in Training),
 we can eliminate redundant communication while maximizing GPU utilization,
@@ -238,6 +231,7 @@ <h1>Introduction<a class="headerlink" href="#introduction" title="Link to this h
 prior solutions.</p>
 <p>We show throughput comparison with the state-of-the-art open-source systems
 in the following figure.</p>
+<p>(In the following figure, as the number of GPUs increases, the model size scales up from LLaMA 7B through LLaMA 13B and CodeLLaMA 34B to the largest, LLaMA 70B.)</p>
 <img alt="_images/vws.svg" src="_images/vws.svg" /><p>We also show the estimated time for
 completing the entire full-scale 4*70B RLHF training process,
 composed of 4 iterations with 400 steps for each iteration as for LLaMA-2.</p>
diff --git a/quickstart.html b/quickstart.html
index b0016a59..3ffa2eb0 100644
--- a/quickstart.html
+++ b/quickstart.html
@@ -30,8 +30,8 @@
         <link rel="index" title="Index" href="genindex.html" />
         <link rel="search" title="Search" href="search.html" />
         <link rel="top" title="ReaL 0.1.0 documentation" href="#" />
-        <link rel="next" title="Configurations" href="expconfig.html" />
-        <link rel="prev" title="Installation" href="install.html" />
+        <link rel="next" title="Customization" href="customization.html" />
+        <link rel="prev" title="Configurations" href="expconfig.html" />
     <style>
       :root {
         --nftt-body-font-family: "Nunito", var(--nftt-font-sans-serif) !important;
@@ -173,10 +173,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1 current"><a class="current reference internal" href="#">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1 current"><a class="current reference internal" href="#">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
@@ -201,6 +200,7 @@ <h2>Installation<a class="headerlink" href="#installation" title="Link to this h
 <p>First, clone the ReaL repository from GitHub:</p>
 <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/openpsi-project/ReaLHF
 $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>ReaLHF
+$<span class="w"> </span>pip3<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </span>.<span class="w"> </span>--no-build-isolation
 </pre></div>
 </div>
 </section>
@@ -343,7 +343,7 @@ <h3>Stage 2.1: Reward Modeling (RM)<a class="headerlink" href="#stage-2-1-reward
 <span class="w">    </span>dataset.valid_bs_n_seqs<span class="o">=</span><span class="m">512</span>
 </pre></div>
 </div>
-<p>It’s common practice to use the SFT model to initialize the reward model.
+<p>It’s a common practice to use the SFT model to initialize the reward model.
 Therefore, we can pass the path of the saved SFT model as the <code class="docutils literal notranslate"><span class="pre">model.path</span></code> option.
 Using the pre-trained LLaMA checkpoint is also feasible, but it may not perform as well.</p>
 <p>In reward modeling, the batch size is the number of paired comparisons.
@@ -473,9 +473,8 @@ <h3>Stage 3: PPO<a class="headerlink" href="#stage-3-ppo" title="Link to this he
 Each GPU can accommodate parameter shards of multiple models (e.g., both the Actor and the Reward).
 Between two function calls upon the same model, ReaL will automatically re-allocate
 model parameters between source and destination locations and properly remap
-parallel strategies.
-.. The reallocation also includes GPU-to-CPU reallocation, referred to as <em>offloading</em>.
-This technique can substantially reduce communication overhead caused by parallelization
+parallel strategies.</p>
+<p>This technique can substantially reduce communication overhead caused by parallelization
 and improve GPU utilization.
 Please check <a class="reference internal" href="intro.html"><span class="doc">Introduction</span></a> for more details.</p>
 <p>In the above command, fields <code class="docutils literal notranslate"><span class="pre">actor</span></code>, <code class="docutils literal notranslate"><span class="pre">critic</span></code>, <code class="docutils literal notranslate"><span class="pre">ref</span></code>, and <code class="docutils literal notranslate"><span class="pre">rew</span></code>
@@ -536,19 +535,19 @@ <h3>Stage 3: PPO<a class="headerlink" href="#stage-3-ppo" title="Link to this he
       <nav id="paginator" class="py-4" aria-label="Documentation navigation">
     <div class="container">
       <ul class="pagination justify-content-between mb-0"><li class="page-item">
-            <a href="install.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Installation">
+            <a href="expconfig.html" class="d-flex px-5 align-items-end" rel="prev" aria-label="Previous page: Configurations">
               <span class="prev-page"><i class="bi bi-caret-left"></i></span>
               <div class="d-flex flex-column">
                 <span class="text-small text-start text-muted">Previous</span>
-                <span class="underline">Installation</span>
+                <span class="underline">Configurations</span>
               </div>
             </a>
           </li>
         <li class="page-item ms-auto">
-            <a href="expconfig.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Configurations">
+            <a href="customization.html" class="d-flex px-5 align-items-end" rel="next" aria-label="Next page: Customization">
               <div class="d-flex flex-column">
                 <span class="text-small text-end text-start text-muted">Next</span>
-                <span class="underline">Configurations</span>
+                <span class="underline">Customization</span>
               </div>
               <span class="next-page"><i class="bi bi-caret-right"></i></span>
             </a>
diff --git a/search.html b/search.html
index 044bea03..5e12a3cf 100644
--- a/search.html
+++ b/search.html
@@ -151,10 +151,9 @@ <h5 class="offcanvas-title fw-bold" id="nfttSidebarOffcanvasLabel">
     <ul>
 <li class="toctree-l1"><a class="reference internal" href="intro.html">Introduction</a></li>
 <li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
-<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="expconfig.html">Configurations</a></li>
+<li class="toctree-l1"><a class="reference internal" href="quickstart.html">Quickstart</a></li>
 <li class="toctree-l1"><a class="reference internal" href="customization.html">Customization</a></li>
-<li class="toctree-l1"><a class="reference internal" href="arch.html">Code Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="distributed.html">Set Up Distributed Experiments</a></li>
 <li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
 </ul>
diff --git a/searchindex.js b/searchindex.js
index 077fe82d..b38093e0 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"Code Architecture": [[0, "code-architecture"]], "Configurations": [[4, "configurations"]], "Contents": [[5, "contents"]], "Contributing": [[1, "contributing"]], "Customization": [[2, "customization"]], "Customizing Algorithms": [[2, "customizing-algorithms"]], "Customizing Datasets": [[2, "customizing-datasets"]], "Customizing Models": [[2, "customizing-models"]], "Dataflow Graph": [[4, "dataflow-graph"]], "Dataset Configurations": [[4, "dataset-configurations"]], "Docker Images": [[6, "docker-images"]], "Easy to use": [[5, "easy-to-use"]], "Example 1: Replace the interface": [[2, "example-1-replace-the-interface"]], "Example 2: Develop a new dataflow": [[2, "example-2-develop-a-new-dataflow"]], "Experiment Configurations": [[4, "experiment-configurations"]], "Flexible": [[5, "flexible"]], "Highlights of ReaL": [[5, "highlights-of-real"]], "How dataset configuration is parsed": [[2, "how-dataset-configuration-is-parsed"]], "Install From PyPI or Source": [[6, "install-from-pypi-or-source"]], "Installation": [[6, "installation"], [8, "installation"]], "Introduction": [[7, "introduction"]], "Model Configurations": [[4, "model-configurations"]], "NamedArray": [[4, "namedarray"]], "Overview": [[2, "overview"], [2, "id1"], [2, "id2"]], "Quickstart": [[8, "quickstart"]], "RLHF with 4x LLaMA-7B in 30min": [[8, "rlhf-with-4x-llama-7b-in-30min"]], "Set Up Distributed Experiments": [[3, "set-up-distributed-experiments"]], "Stage 1: Supervised Fine-Tuning": [[8, "stage-1-supervised-fine-tuning"]], "Stage 2.1: Reward Modeling (RM)": [[8, "stage-2-1-reward-modeling-rm"]], "Stage 2.2: Direct Preference Optimization (DPO)": [[8, "stage-2-2-direct-preference-optimization-dpo"]], "Stage 3: PPO": [[8, "stage-3-ppo"]], "Steps for implementing a new dataset": [[2, "steps-for-implementing-a-new-dataset"]], "Steps to support a new HuggingFace model": [[2, "steps-to-support-a-new-huggingface-model"]], "Super-Efficient": [[5, "super-efficient"]], "Welcome 
to ReaL\u2019s documentation!": [[5, "welcome-to-real-s-documentation"]]}, "docnames": ["arch", "contributing", "customization", "distributed", "expconfig", "index", "install", "intro", "quickstart"], "envversion": {"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["arch.rst", "contributing.rst", "customization.rst", "distributed.rst", "expconfig.rst", "index.rst", "install.rst", "intro.rst", "quickstart.rst"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 6, 8], "0": [2, 8], "00": 8, "007": 8, "01": 3, "03": 8, "051": 8, "081": 8, "083": 8, "094": 8, "1": 5, "10": 8, "100": 5, "1000": [2, 8], "1024": 8, "128": 8, "13": 8, "14": 8, "15": 8, "19": 8, "2": [5, 7], "20240618": 8, "216": 8, "256": 8, "26": 8, "294": 8, "3": 5, "30min": 5, "312": 8, "32": 8, "34": 8, "38": 8, "387": 8, "39": 8, "393": 8, "3d": [5, 8], "4": [7, 8], "400": 7, "46": 8, "4x": 5, "5": 8, "50": 8, "5000": 8, "512": 8, "53": 8, "54": 8, "56": 8, "574": 8, "628": 8, "7": 8, "70b": [5, 7], "7b": 5, "8": 8, "9": [2, 8], "906": 8, "A": [2, 3, 8], "As": [1, 2, 8], "At": 2, "By": 7, "For": [2, 3, 5, 8], "If": [1, 2, 6, 8], "In": [2, 6, 8], "It": [4, 5, 6, 8], "That": [2, 6], "The": [2, 3, 6, 7, 8], "Then": 2, "These": [2, 8], "To": [2, 3, 6], "With": 8, "__getitem__": 2, "__init__": 2, "__post_init__": 2, "_class": 8, "a100": 3, "abbrevi": 7, "about": [1, 2, 8], "abov": [2, 3, 8], "access": 3, "accommod": 8, "accord": 8, "achiev": 5, "acknowledg": [1, 2], "across": 8, "actor": [2, 7, 8], "actor_train": 8, "actortrain": 8, "actual": 2, "adam": 8, "add": 3, "addit": [2, 3, 8], "address": 7, "adjust": 5, "adopt": 8, "adv_norm": 8, "advanc": 5, "advis": 1, "after": 8, "again": 4, "aggreg": 4, "aigc": 8, "algorithm": 
[5, 8], "all": [2, 3], "alloc": 8, "allocation_mod": 8, "allow": 3, "also": [2, 3, 5, 6, 7, 8], "altern": 7, "although": 4, "among": 8, "an": [2, 3, 4, 6, 7, 8], "analysi": 2, "ani": [1, 2, 3, 8], "anoth": 6, "answer": 8, "api": 2, "app": [2, 3, 8], "appli": [2, 7], "applic": [1, 5, 8], "approxim": 8, "ar": [1, 2, 5, 7, 8], "arbitrari": 3, "architectur": [2, 5], "area": 7, "arg": 2, "argument": [2, 4], "arrai": 4, "art": 7, "assign": [7, 8], "assist": 1, "assum": [3, 8], "asymmetr": 7, "attention_mask": 2, "attribut": 8, "automat": [2, 8], "automodelforsequenceclassif": 2, "autotoken": 2, "avoid": 8, "back": 2, "backend": 2, "backward": 8, "bar": 7, "base": 7, "basic": 2, "batch": 8, "batch_decod": 2, "becaus": [2, 6, 7], "been": [6, 8], "being": 8, "berlin": 8, "bert": 2, "besid": 8, "best": [1, 8], "between": [2, 7, 8], "beyond": 5, "block": 2, "both": [1, 8], "break": 2, "build": 6, "call": [2, 4, 7, 8], "can": [1, 2, 3, 4, 5, 6, 7, 8], "capit": 8, "case": 2, "caus": [4, 7, 8], "cd": [6, 8], "cfg": 2, "challeng": 1, "chang": [2, 6, 7], "chat": 7, "check": [2, 4, 5, 8], "checkpoint": [3, 8], "choos": 7, "class": [2, 4], "clone": [6, 8], "cluster": [3, 6], "cluster_config": 3, "cluster_nam": 3, "cluster_spec_path": 3, "cluster_typ": 3, "code": [1, 2, 5, 6], "com": [3, 6, 8], "combin": 5, "comma": 3, "command": [2, 3, 4, 5, 8], "comment": 8, "common": 8, "commonli": 2, "commun": [1, 5, 7, 8], "comparison": [7, 8], "compil": 2, "complet": [7, 8], "complex": 8, "compos": [7, 8], "comput": [2, 8], "concaten": 2, "conceptu": 2, "concret": 4, "concurr": 7, "config": 2, "config_api": 2, "config_to_codellama": 2, "configur": [3, 5, 8], "consequ": 8, "consist": 2, "consol": 8, "construct": 2, "constructor": 2, "consum": 8, "consumpt": 8, "contact": 1, "contain": [2, 3, 6], "context": 8, "contigu": 2, "contribut": 5, "control": [3, 8], "conveni": 8, "converg": 8, "convert": 2, "core": 2, "correspond": [2, 4, 8], "cover": 8, "cpu": [3, 6, 8], "cpu_imag": 3, "creat": 2, 
"critic": [2, 7, 8], "cuda": [2, 4, 6], "current": [2, 3], "custom": [5, 8], "data": [2, 4, 8], "data_api": 2, "data_parallel_s": 8, "dataclass": [2, 8], "dataflow": 5, "dataset": [5, 8], "decoupl": 5, "deepspe": [7, 8], "def": 2, "default_mount": 3, "degre": 7, "denot": 2, "depend": [2, 7], "deploi": [6, 8], "describ": 2, "design": 8, "destin": 8, "detail": [2, 5, 8], "develop": [1, 5, 8], "dfg": 2, "dict": 2, "dictionari": [3, 4, 8], "differ": [4, 6, 7, 8], "direct": 5, "directli": 8, "directori": 6, "distinct": 8, "distrbit": 3, "distribut": [5, 6, 7], "do": [1, 2], "docker": [3, 5], "dockerfil": 6, "document": 1, "doe": [2, 8], "don": [2, 4, 6], "done": [2, 8], "download": 8, "dpo": 5, "due": [1, 8], "dump": 8, "dure": [2, 5, 7], "dynam": [5, 7], "e": [2, 3, 4, 6, 7, 8], "each": [2, 4, 7, 8], "easi": 2, "easiest": 6, "edg": 2, "edit": 6, "effect": [6, 7], "effici": [2, 7, 8], "elimin": 7, "embed": 2, "empow": 1, "enabl": [1, 7], "encod": 2, "end": [2, 3], "engin": 8, "enroot": 3, "entir": 7, "entri": 8, "env": 3, "environ": [3, 6], "epoch": 8, "estim": 7, "etc": 2, "eval": 2, "eval_freq_epoch": 8, "evalu": 2, "everi": [2, 4, 7], "exactli": [2, 8], "exampl": [3, 4, 5, 8], "except": [2, 4], "exect": 2, "execut": [2, 7, 8], "exist": 7, "experi": [2, 5, 6, 8], "experiment_nam": [2, 8], "experimentconfig": 2, "explan": 8, "express": 3, "extend": 4, "extern": 2, "fals": 8, "familar": 2, "familiar": 8, "fast": 8, "faster": 8, "favor": 8, "feasibl": 8, "featur": [1, 2, 4], "feel": 2, "field": [2, 8], "figur": [2, 7], "file": [2, 3, 8], "fileroot": 3, "find": 2, "fine": 5, "finish": 8, "first": [2, 7, 8], "float": 2, "folder": 2, "follow": [2, 3, 4, 7, 8], "force_no_logits_mask": 8, "forget": 2, "form": 8, "format": 8, "former": 8, "forth": 2, "forward": 8, "found": 8, "four": 8, "framework": 8, "franc": 8, "free": 2, "friendli": 4, "from": [2, 3, 4, 5, 8], "from_hf": 2, "from_llama": 2, "from_pretrain": 2, "frozen": 8, "fu": 1, "full": 7, "fulli": [5, 8], "function": 
[2, 4, 7, 8], "fw": 8, "g": [2, 3, 4, 7, 8], "g1": 3, "g2": 3, "g8": 3, "garrett4wad": [3, 6], "gener": [2, 7, 8], "get": 2, "git": [6, 8], "github": [2, 6, 8], "given": [2, 4, 8], "global": 8, "gpu": [2, 3, 5, 6, 7, 8], "gpu_imag": 3, "gpu_type_from_node_nam": 3, "gradient_checkpoint": 8, "grai": 7, "graph": [2, 5], "guid": 5, "ha": [2, 6, 8], "half": 8, "handl": 8, "have": [1, 2, 6, 8], "help": [1, 8], "helper": 2, "here": 2, "hesit": 1, "heurist": 8, "hf": [2, 8], "high": [1, 5], "hope": 1, "host": [3, 4], "hour": 8, "how": 5, "howev": 7, "http": [6, 8], "hub": 2, "huggingfac": [5, 8], "hundr": 5, "hydra": [4, 8], "hyperparamet": 8, "i": [1, 3, 4, 5, 6, 7, 8], "id": 2, "idea": 7, "identifi": 8, "idx": 2, "iii": 1, "illustr": [4, 8], "imag": [3, 5], "imdb": 2, "impl": 2, "implement": 5, "improv": [1, 5, 7, 8], "includ": [3, 8], "independ": 8, "index": 4, "individu": 8, "inf_reward_rpc": 2, "infer": [2, 8], "info": 8, "inherit": 4, "initi": [2, 8], "initial_setup": 2, "input": 2, "input_id": 2, "insid": 3, "instal": 5, "instrctgpt": 8, "integ": 3, "interfac": 5, "interface_impl": 2, "interface_typ": 2, "introduc": [5, 7], "introduct": [5, 8], "involv": 8, "io": [3, 6], "is_crit": 8, "isol": 6, "issu": 2, "iter": [2, 7, 8], "its": 2, "json": [3, 8], "jsonl": 8, "kei": [2, 7, 8], "kl": 8, "kl_ctl": [2, 8], "larg": [2, 5, 8], "latter": 8, "launch": [2, 3, 6, 8], "layer": [2, 3], "lead": 7, "learn": 8, "length": [2, 4], "let": [2, 8], "light": 7, "like": [2, 3, 8], "limit": [1, 5, 7], "line": [2, 4, 5, 8], "list": [3, 8], "llama": [2, 5, 7], "llm": [1, 5, 8], "load": [2, 8], "local": [6, 8], "locat": [6, 8], "log": [3, 8], "logit": 2, "loss": 8, "low": 3, "lustr": 8, "m": [2, 3, 8], "mai": 8, "main": 8, "maintain": [1, 5], "major": [7, 8], "make": 8, "manag": [4, 8], "manual": 8, "map": [2, 3], "mask": 4, "master": [3, 8], "masterwork": 2, "match": 3, "max_new_token": 8, "max_pairs_per_prompt": 8, "max_prompt_len": 8, "max_seqlen": 8, "maxim": [7, 8], "mei": 1, 
"memori": [2, 5, 8], "metadata": [2, 4], "method": 2, "mfc": 2, "mfcdef": 2, "min_new_token": 8, "minut": 8, "mode": 8, "model": [3, 5, 7], "model_api": 2, "model_nam": 2, "model_parallel_s": 8, "model_rpc": 2, "model_work": 2, "modelbackend": 2, "modelinterfac": 2, "modelwork": 2, "modifi": 2, "modul": 6, "moe": 2, "monitor": 2, "more": [2, 5, 8], "moreov": 8, "most": 8, "mount": [3, 6], "move": 8, "movi": 8, "multipl": [4, 8], "mw": 2, "my": [2, 3], "myppoconfig": 2, "n_node": 8, "name": [2, 3, 4, 7], "namedarrai": [2, 5], "necessari": 6, "need": 3, "neg": [2, 8], "neg_answ": 8, "nest": 4, "new": [1, 5], "next": 8, "nf": 3, "nn": 2, "no_grad": 2, "node": [2, 3, 5, 7, 8], "node_name_prefix": 3, "node_type_from_node_nam": 3, "none": 8, "note": [2, 8], "novel": [5, 7], "now": 2, "null": [2, 8], "number": 8, "object": [2, 4, 8], "observ": 7, "offload": [2, 5, 8], "often": 7, "okai": 6, "onc": [2, 8], "one": [2, 3], "onli": [3, 6], "onlin": 2, "open": [1, 7], "openpsi": [6, 8], "openrlhf": 7, "optim": 5, "option": [2, 4, 8], "ordinari": 8, "origin": 2, "other": 2, "our": [1, 2, 5, 7, 8], "output": [2, 8], "over": [7, 8], "overhead": [5, 7, 8], "overrid": 4, "overview": 5, "overwrit": 6, "overwritten": 8, "packag": 6, "pad": [2, 4], "page": [2, 4, 5], "pair": 8, "pairedcomparisondatasetconfig": 2, "pairwis": 8, "paper": 8, "parallel": [5, 7, 8], "paramet": [2, 5, 7, 8], "pars": 5, "pass": [2, 8], "path": [2, 3, 6, 8], "perform": [5, 8], "phd": 1, "pip": 6, "pipe": 8, "pipelin": [7, 8], "pipeline_parallel_s": 8, "plain": 4, "plan": 8, "pleas": [1, 2, 4, 5, 8], "plugin": 3, "point": 2, "polici": 8, "pos_answ": 8, "posit": [2, 8], "possibli": 4, "post_hook": 2, "potenti": 1, "ppo": [2, 3, 4, 5], "ppo_n_minibatch": 8, "ppo_prompt": 8, "ppo_senti": 2, "ppoconfig": 2, "ppohyperparamet": 8, "practic": 8, "pre": 8, "prefer": 5, "prefix": 3, "prepar": 8, "preserv": 2, "previou": [4, 7], "primari": 8, "prior": 7, "probabl": 8, "proce": 8, "procedur": 8, "process": [7, 8], 
"professor": 1, "profil": 7, "project": [4, 6, 8], "prompt": 8, "prompt_answ": 2, "prompt_answer_dataset": 2, "promptanswerdataset": 2, "promptanswerdatasetconfig": 2, "promptonlydatasetconfig": 2, "properli": [2, 8], "provid": [2, 4, 6, 8], "pt": 2, "pull": 6, "purpl": 7, "py": 2, "pypi": 5, "python": [6, 8], "python3": [2, 3, 8], "pytorch": 2, "pyxi": 3, "qualiti": 1, "question": [1, 2], "quickstart": [2, 3, 4, 5], "quit": 1, "rais": 2, "rate": 8, "re": [2, 6, 8], "read": 1, "real": [2, 3, 4, 6, 7, 8], "real_llm_api": 2, "realhf": [2, 3, 4, 6, 8], "realloc": [2, 5, 7, 8], "realmodel": 2, "realmodelconfig": 2, "reason": 2, "record": 2, "recurs": [4, 8], "reduc": [5, 7, 8], "redund": 7, "ref": 8, "ref_inf": 8, "refer": [2, 3, 5, 8], "refinf": 8, "regard": 4, "regist": 2, "register_dataset": 2, "register_hf_famili": 2, "register_interfac": 2, "register_metadata": 2, "register_quickstart_exp": 2, "regular": [3, 8], "rel": 8, "releas": 8, "remap": [2, 8], "remind": 8, "remov": 2, "replac": 5, "repositori": [1, 2, 6, 8], "repres": [2, 8], "request": 2, "requir": [2, 6, 8], "resourc": [1, 5], "respect": [2, 8], "respons": 8, "result": [1, 2, 4], "return": 2, "return_tensor": 2, "rew": [2, 8], "reward": [2, 5], "reward_output_sc": 8, "right": 2, "rlhf": [1, 5, 7], "rm": 5, "rm_pair": 8, "role": 2, "rpc": 2, "run": [2, 5, 6, 8], "runtim": 6, "rw": 8, "sai": 2, "same": [7, 8], "sampl": [2, 8], "satisfi": 2, "save": [2, 8], "save_freq_step": 8, "scale": [5, 7], "scenario": 2, "score": 2, "score_model": 2, "score_token": 2, "scratch": 6, "script": 2, "search": 8, "second": 7, "see": [2, 8], "self": 2, "sentiment": [2, 8], "sentiment_scor": 2, "sentimentscoringinterfac": 2, "separ": 3, "sequenc": [2, 4], "set": [5, 8], "sft": [2, 4, 8], "sft_interfac": 2, "sft_po": 8, "sftconfig": [2, 4, 8], "sftinterfac": 2, "shape": 4, "shard": [2, 8], "share": 2, "shift": [5, 6, 7], "should": [2, 3, 4, 6, 8], "show": [2, 7, 8], "shown": 7, "signal": 2, "signatur": 2, "significantli": [5, 
7], "similar": 2, "sinc": 4, "singl": [2, 5, 8], "six": 8, "size": 8, "skip_special_token": 2, "slice": 4, "slurm": 3, "smaller": 7, "so": [2, 8], "solut": 7, "some": [2, 3], "sourc": [1, 5, 7, 8], "specif": [2, 4, 8], "specifi": [2, 3, 8], "speedup": 7, "srl": 4, "srun": 3, "stage": 5, "state": [2, 7], "step": [5, 7, 8], "still": 6, "stop": 8, "store": [2, 3, 4], "strategi": [5, 7, 8], "string": [3, 8], "structur": 8, "student": 1, "style": 2, "substanti": [7, 8], "successfulli": 8, "sum": 8, "super": 2, "supervis": 5, "support": [3, 5, 8], "synchron": [4, 7], "system": [2, 4, 5, 7], "system_api": 2, "t": [2, 4, 6], "tailor": 7, "take": [2, 6, 8], "target": 6, "task": 7, "techniqu": [5, 7, 8], "technologi": 1, "tensor": [4, 7, 8], "terribl": 2, "test": 2, "text": 2, "than": 8, "therefor": 8, "thi": [1, 2, 3, 4, 5, 6, 7, 8], "thousand": 5, "three": [2, 8], "throughput": [5, 7], "time": [1, 7, 8], "timelin": 7, "tmp": 3, "todo": 2, "togeth": 2, "token": 2, "tokenizer_path": 2, "top_k": [2, 8], "top_p": [2, 8], "torch": 2, "total": 8, "total_train_epoch": 8, "train": [2, 5, 7, 8], "train_bs_n_seq": 8, "train_path": 8, "transform": 2, "travel": 8, "trial_nam": [2, 8], "true": [2, 8], "truli": 1, "truncat": 2, "tsinghua": 1, "tune": 5, "tutori": 8, "two": [2, 6, 7, 8], "type": [2, 3, 4, 8], "u": 1, "under": [2, 7], "univers": 1, "up": [5, 8], "updat": 8, "upon": 8, "us": [2, 3, 4, 6, 7, 8], "usag": 2, "use_sequence_parallel": 8, "user": [2, 4, 6, 8], "usual": 2, "util": [2, 5, 7, 8], "valid": 8, "valid_bs_n_seq": 8, "valid_path": 8, "valu": 8, "value_eps_clip": 8, "value_norm": 8, "variabl": [2, 3], "veri": [1, 8], "version": 2, "via": 8, "visit": 8, "vllm": 8, "wai": [4, 6, 7], "want": [2, 6], "we": [1, 2, 3, 4, 6, 7, 8], "wei": 1, "well": [2, 6, 8], "what": [2, 8], "wheel": 6, "when": [2, 3, 6, 7], "where": [2, 3, 7, 8], "which": [2, 4, 7, 8], "while": [7, 8], "whom": 1, "why": 6, "wish": 1, "within": [5, 8], "without": [2, 4, 6], "worker": [3, 8], "workload": 8, 
"would": 2, "wrap": 2, "write": 3, "wu": 1, "yeah": 8, "yi": 1, "you": [1, 2, 3, 6, 8], "your": [2, 5, 8], "zero": [5, 8], "zhiyu": 1}, "titles": ["Code Architecture", "Contributing", "Customization", "Set Up Distributed Experiments", "Configurations", "Welcome to ReaL\u2019s documentation!", "Installation", "Introduction", "Quickstart"], "titleterms": {"": 5, "1": [2, 8], "2": [2, 8], "3": 8, "30min": 8, "4x": 8, "7b": 8, "algorithm": 2, "architectur": 0, "code": 0, "configur": [2, 4], "content": 5, "contribut": 1, "custom": 2, "dataflow": [2, 4], "dataset": [2, 4], "develop": 2, "direct": 8, "distribut": 3, "docker": 6, "document": 5, "dpo": 8, "easi": 5, "effici": 5, "exampl": 2, "experi": [3, 4], "fine": 8, "flexibl": 5, "from": 6, "graph": 4, "highlight": 5, "how": 2, "huggingfac": 2, "i": 2, "imag": 6, "implement": 2, "instal": [6, 8], "interfac": 2, "introduct": 7, "llama": 8, "model": [2, 4, 8], "namedarrai": 4, "new": 2, "optim": 8, "overview": 2, "pars": 2, "ppo": 8, "prefer": 8, "pypi": 6, "quickstart": 8, "real": 5, "replac": 2, "reward": 8, "rlhf": 8, "rm": 8, "set": 3, "sourc": 6, "stage": 8, "step": 2, "super": 5, "supervis": 8, "support": 2, "tune": 8, "up": 3, "us": 5, "welcom": 5}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"Code Architecture": [[0, "code-architecture"]], "Configurations": [[4, "configurations"]], "Contents": [[5, "contents"]], "Contributing": [[1, "contributing"]], "Customization": [[2, "customization"]], "Customizing Algorithms": [[2, "customizing-algorithms"]], "Customizing Datasets": [[2, "customizing-datasets"]], "Customizing Models": [[2, "customizing-models"]], "Dataflow Graph": [[4, "dataflow-graph"]], "Dataset Configurations": [[4, "dataset-configurations"]], "Docker Images": [[6, "docker-images"]], "Example: A Customized Reward Function for PPO": [[2, "example-a-customized-reward-function-for-ppo"]], "Experiment Configurations": [[4, "experiment-configurations"]], "How Dataset Configuration is Parsed": [[2, "how-dataset-configuration-is-parsed"]], "Install From PyPI or Source": [[6, "install-from-pypi-or-source"]], "Installation": [[6, "installation"], [8, "installation"]], "Introduction": [[7, "introduction"]], "Model Configurations": [[4, "model-configurations"]], "NamedArray": [[4, "namedarray"]], "Overview": [[2, "overview"], [2, "id1"], [2, "id2"]], "Quickstart": [[8, "quickstart"]], "RLHF with 4x LLaMA-7B in 30min": [[8, "rlhf-with-4x-llama-7b-in-30min"]], "Set Up Distributed Experiments": [[3, "set-up-distributed-experiments"]], "Stage 1: Supervised Fine-Tuning": [[8, "stage-1-supervised-fine-tuning"]], "Stage 2.1: Reward Modeling (RM)": [[8, "stage-2-1-reward-modeling-rm"]], "Stage 2.2: Direct Preference Optimization (DPO)": [[8, "stage-2-2-direct-preference-optimization-dpo"]], "Stage 3: PPO": [[8, "stage-3-ppo"]], "Steps for Implementing a New Dataset": [[2, "steps-for-implementing-a-new-dataset"]], "Steps to Support a New HuggingFace Model": [[2, "steps-to-support-a-new-huggingface-model"]], "Welcome to ReaL\u2019s documentation!": [[5, "welcome-to-real-s-documentation"]]}, "docnames": ["arch", "contributing", "customization", "distributed", "expconfig", "index", "install", "intro", "quickstart"], "envversion": 
{"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["arch.rst", "contributing.rst", "customization.rst", "distributed.rst", "expconfig.rst", "index.rst", "install.rst", "intro.rst", "quickstart.rst"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 8], "0": [2, 3, 6, 8], "00": 8, "007": 8, "01": 3, "03": 8, "04": [3, 6], "051": 8, "081": 8, "083": 8, "094": 8, "1": [2, 3, 5, 6], "10": [6, 8], "1000": [2, 8], "1024": 8, "128": 8, "13": 8, "13b": 7, "14": 8, "15": 8, "19": 8, "2": [5, 7], "20240618": 8, "216": 8, "22": [3, 6], "23": 6, "256": 8, "26": 8, "294": 8, "3": 5, "30min": 5, "312": 8, "32": 8, "34": 8, "34b": 7, "38": 8, "387": 8, "39": 8, "393": 8, "3d": 8, "4": [7, 8], "400": 7, "46": 8, "4x": 5, "5": 8, "50": 8, "5000": 8, "512": 8, "53": 8, "54": 8, "56": 8, "574": 8, "628": 8, "7": 8, "70b": 7, "7b": [5, 7], "8": 8, "9": [2, 8], "906": 8, "A": [3, 5, 8], "As": [2, 8], "At": 2, "By": 7, "For": [2, 3, 8], "If": [1, 2, 6, 8], "In": [2, 7, 8], "It": [4, 8], "The": [2, 3, 6, 7, 8], "There": 2, "These": [2, 8], "To": [2, 3, 6], "With": 8, "__getitem__": 2, "__init__": 2, "__post_init__": 2, "_class": 8, "a100": 3, "about": [1, 8], "abov": [2, 3, 8], "access": 3, "accommod": 8, "accord": 8, "acknowledg": 2, "across": 8, "actor": [2, 7, 8], "actor_train": 8, "actortrain": 8, "actual": 2, "adam": 8, "add": 3, "addit": [2, 3, 8], "address": 7, "adopt": 8, "adv_norm": 8, "after": [6, 8], "again": 4, "aggreg": 4, "aigc": 8, "algorithm": [5, 8], "all": [2, 3, 6], "alloc": 8, "allocation_mod": 8, "allow": [3, 6], "along": 2, "also": [2, 3, 6, 7, 8], "altern": 7, "although": 4, "among": 8, "an": [2, 3, 4, 6, 7, 8], "analysi": 2, "ani": [1, 2, 8], "answer": 8, "api": 2, "app": [2, 3, 8], 
"appli": [2, 7], "applic": [1, 2, 8], "approxim": 8, "ar": [2, 3, 7, 8], "arbitrari": 3, "architectur": 2, "area": 7, "arg": 2, "argument": [2, 4], "arrai": 4, "art": 7, "assign": [7, 8], "assist": 1, "assum": [3, 8], "asymmetr": 7, "attention_mask": 2, "attribut": 8, "automat": [2, 8], "automodelforsequenceclassif": 2, "autotoken": 2, "avail": 6, "avoid": 8, "back": 2, "backend": 2, "backward": 8, "bar": 7, "base": 7, "bash": 6, "batch": [2, 8], "batch_decod": 2, "becaus": [2, 7], "been": 8, "being": 8, "below": 3, "berlin": 8, "bert": 2, "besid": 8, "best": [1, 8], "between": [2, 7, 8], "block": 2, "both": 8, "break": 2, "build": [6, 8], "built": 6, "c": 6, "call": [2, 4, 7, 8], "can": [1, 2, 4, 6, 7, 8], "capit": 8, "case": 2, "caus": [4, 7, 8], "cd": [6, 8], "cfg": 2, "challeng": 2, "chang": 2, "chat": 7, "check": [2, 4, 6, 8], "checkpoint": [3, 8], "choos": 7, "class": [2, 4], "clone": [6, 8], "cluster": [3, 6], "cluster_config": 3, "cluster_nam": 3, "cluster_spec_path": 3, "cluster_typ": 3, "code": [1, 2, 6], "codellama": 7, "com": [3, 6, 8], "comma": 3, "command": [2, 3, 4, 6, 8], "comment": 8, "common": 8, "commonli": 2, "commun": [1, 7, 8], "comparison": [7, 8], "compil": [2, 6], "complet": [7, 8], "complex": 8, "compos": [7, 8], "comput": [2, 7, 8], "concaten": 2, "conceptu": 2, "concret": 4, "concurr": 7, "config": 2, "config_api": 2, "configur": [3, 5, 8], "consequ": 8, "consid": 2, "consist": 2, "consol": 8, "construct": 2, "constructor": 2, "consum": 8, "consumpt": 8, "contact": 1, "contain": [2, 3, 6], "context": 8, "contigu": 2, "contribut": 5, "control": [3, 8], "conveni": 8, "converg": 8, "convers": 2, "convert": 2, "core": 2, "correspond": [2, 4, 8], "cover": 8, "cpu": [3, 6], "cpu_imag": 3, "creat": [2, 3], "critic": [2, 7, 8], "cuda": [2, 4, 6], "current": [1, 2, 3, 6], "custom": [5, 8], "data": [2, 4, 8], "data_api": 2, "data_parallel_s": 8, "dataclass": [2, 8], "dataflow": [2, 5], "dataset": [5, 8], "dataset_path": 2, "deepspe": [7, 8], 
"def": 2, "default_mount": 3, "degre": 7, "demonstr": 2, "depend": [2, 7], "deploi": 8, "deploy": 6, "describ": 2, "design": 8, "destin": 8, "detail": [2, 8], "develop": [6, 8], "dfg": 2, "dict": 2, "dictionari": [3, 4, 8], "differ": [4, 7, 8], "direct": 5, "directli": [1, 6, 8], "distinct": 8, "distribut": [5, 7], "do": [1, 2], "docker": [3, 5], "dockerfil": 6, "document": 2, "doe": [2, 8], "don": [2, 4, 6], "done": 8, "download": 8, "dpo": 5, "due": 8, "dump": 8, "dure": 2, "dynam": 7, "e": [2, 3, 4, 6, 7, 8], "each": [2, 4, 7, 8], "easiest": 6, "edg": 2, "edit": 6, "effect": 7, "effici": [2, 7, 8], "elimin": 7, "embed": 2, "empow": 1, "enabl": [1, 7], "encod": 2, "end": [2, 3], "engin": 8, "enroot": 3, "entir": 7, "entri": 8, "env": 3, "environ": 3, "epoch": 8, "estim": 7, "eval": 2, "eval_freq_epoch": 8, "evalu": 2, "everi": [2, 3, 4, 7], "exactli": 8, "exampl": [3, 4, 5, 8], "except": [2, 4], "execut": [2, 7, 8], "exist": [6, 7], "experi": [2, 5, 6, 8], "experiment_nam": [2, 8], "experimentconfig": 2, "explan": 8, "express": 3, "extend": 4, "extens": 6, "extern": 2, "fals": 8, "familiar": 8, "fast": 8, "faster": 8, "favor": 8, "feasibl": 8, "featur": [2, 4], "feel": 2, "field": [2, 8], "figur": [2, 7], "file": [2, 3, 8], "fileroot": 3, "final": 2, "find": 2, "fine": 5, "finish": 8, "first": [2, 7, 8], "float": 2, "folder": 2, "follow": [2, 3, 4, 6, 7, 8], "force_no_logits_mask": 8, "forget": 2, "form": 8, "format": 8, "former": 8, "forth": 2, "forward": 8, "found": [2, 8], "four": 8, "framework": 8, "franc": 8, "free": 2, "friendli": 4, "from": [2, 3, 4, 5, 7, 8], "from_hf": 2, "from_llama": 2, "from_pretrain": 2, "frozen": 8, "full": 7, "fulli": 8, "function": [4, 5, 8], "fw": 8, "g": [2, 3, 4, 7, 8], "g1": 3, "g2": 3, "g8": 3, "garrett4wad": [3, 6], "gener": [2, 7, 8], "get": 2, "git": [6, 8], "github": [2, 6, 8], "given": [4, 8], "global": 8, "gpu": [2, 3, 6, 7, 8], "gpu_imag": 3, "gpu_type_from_node_nam": 3, "gradient_checkpoint": 8, "grai": 7, "graph": 
[2, 5], "ha": [2, 8], "half": 8, "handl": 8, "have": [1, 2, 8], "help": [1, 8], "helper": 2, "hesit": 1, "heurist": 8, "hf": 8, "hope": 1, "host": [3, 4], "hour": 8, "how": 5, "howev": 7, "http": [6, 8], "hub": 2, "huggingfac": [5, 8], "hydra": [4, 8], "hyperparamet": 8, "i": [1, 3, 4, 5, 6, 7, 8], "id": 2, "idea": 7, "identifi": [2, 8], "idx": 2, "illustr": [2, 4, 8], "imag": [3, 5], "imdb": 2, "impl": 2, "implement": 5, "improv": [1, 7, 8], "includ": [2, 3, 8], "increas": 7, "independ": 8, "index": 4, "indic": 2, "individu": 8, "inf_reward_rpc": 2, "infer": [2, 8], "info": 8, "inherit": 4, "initi": [2, 8], "initial_setup": 2, "input": 2, "input_id": 2, "insid": [3, 6], "instal": 5, "instrctgpt": 8, "instruct": [2, 6], "integ": 3, "interfac": 2, "interface_impl": 2, "interface_typ": 2, "introduct": [5, 8], "involv": [2, 8], "io": 6, "is_crit": 8, "isol": [6, 8], "issu": [1, 2], "iter": [2, 7, 8], "its": 2, "json": [3, 8], "jsonl": 8, "kei": [2, 7, 8], "kl": 8, "kl_ctl": [2, 8], "larg": [2, 8], "largest": 7, "latter": 8, "launch": [2, 3, 6, 8], "layer": [2, 3], "lead": 7, "learn": 8, "length": [2, 4], "let": [2, 8], "light": 7, "like": [2, 8], "limit": 7, "line": [2, 4, 8], "list": [3, 8], "llama": [2, 5, 7], "llm": [1, 8], "load": [2, 8], "local": [6, 8], "locat": [2, 8], "log": [3, 8], "logit": 2, "loss": 8, "low": 3, "lustr": 8, "m": [2, 3, 6, 8], "machin": 6, "mai": 8, "main": 8, "major": [7, 8], "make": 8, "manag": [4, 8], "manual": 8, "map": [2, 3], "mask": 4, "master": [3, 8], "masterwork": 2, "match": 3, "max_length": 2, "max_new_token": 8, "max_pairs_per_prompt": 8, "max_prompt_len": 8, "max_seqlen": [2, 8], "maxim": [7, 8], "memori": [2, 8], "mention": 3, "metadata": [2, 4], "method": 2, "mfc": 2, "mfcdef": 2, "micro": 2, "min_new_token": 8, "minut": 8, "mode": 8, "model": [3, 5, 7], "model_api": 2, "model_nam": 2, "model_parallel_s": 8, "model_rpc": 2, "model_work": 2, "modelbackend": 2, "modelinterfac": 2, "modelwork": 2, "modif": 2, "modifi": [2, 6], 
"moe": 2, "monitor": 2, "more": [2, 8], "moreov": 8, "most": 8, "mount": [3, 6], "move": 8, "movi": 8, "multipl": [4, 8], "mw": 2, "my": [2, 3], "myppoconfig": 2, "n_node": 8, "name": [2, 3, 4], "namedarrai": [2, 5], "necessari": 2, "need": [2, 3], "neg": [2, 8], "neg_answ": 8, "nest": 4, "new": 5, "next": [6, 8], "nf": 3, "nn": 2, "no_grad": 2, "node": [2, 3, 6, 7, 8], "node_name_prefix": 3, "node_type_from_node_nam": 3, "none": 8, "note": [2, 8], "now": 2, "null": [2, 8], "number": [7, 8], "nvcr": 6, "nvidia": 6, "object": [2, 4, 8], "observ": 7, "offer": 6, "offload": [2, 8], "often": [2, 7], "onc": [2, 8], "one": [2, 3], "onli": [3, 6], "onlin": 2, "open": [1, 7], "openpsi": [6, 8], "openrlhf": 7, "optim": 5, "option": [2, 4, 8], "ordinari": 8, "origin": 2, "other": 2, "our": [1, 2, 7, 8], "output": [2, 8], "outsid": 6, "over": [7, 8], "overhead": [7, 8], "overrid": 4, "overview": 5, "overwritten": 8, "packag": 6, "pad": [2, 4], "page": 4, "pair": 8, "pairedcomparisondatasetconfig": 2, "pairwis": 8, "paper": 8, "parallel": [7, 8], "paramet": [2, 7, 8], "pars": 5, "pass": [2, 8], "path": [2, 3, 6, 8], "perform": [2, 8], "pip": 6, "pip3": 8, "pipe": 8, "pipelin": [2, 7, 8], "pipeline_parallel_s": 8, "plain": 4, "plan": 8, "pleas": [1, 2, 4, 8], "plugin": 3, "point": 2, "polici": 8, "pos_answ": 8, "posit": [2, 8], "possibli": 4, "post_hook": 2, "ppo": [3, 4, 5], "ppo_n_minibatch": 8, "ppo_prompt": 8, "ppo_senti": 2, "ppoconfig": 2, "ppohyperparamet": 8, "practic": 8, "pre": [6, 8], "prefer": [5, 6], "prefix": 3, "prepar": 8, "previou": [4, 7], "primari": 8, "primarili": 2, "prior": 7, "probabl": 8, "proce": 8, "procedur": 8, "process": [7, 8], "profil": 7, "project": [4, 6, 8], "prompt": 8, "prompt_answ": 2, "prompt_answer_dataset": 2, "promptanswerdataset": 2, "promptanswerdatasetconfig": 2, "promptonlydatasetconfig": 2, "properli": 8, "properti": 2, "provid": [2, 4, 6, 8], "pt": 2, "pull": [1, 6], "purpl": 7, "py": 2, "py3": 6, "pypi": 5, "python": 8, "python3": 
[2, 3, 6, 8], "pytorch": [2, 6], "pyxi": 3, "question": [1, 2], "quickstart": [2, 3, 4, 5, 6], "rais": [1, 2], "rate": 8, "re": [2, 8], "real": [2, 3, 4, 6, 7, 8], "real_llm_api": 2, "realhf": [2, 3, 4, 6, 8], "realloc": [2, 7], "realmodel": 2, "realmodelconfig": 2, "reason": 2, "record": 2, "recurs": [4, 8], "reduc": 8, "redund": 7, "ref": 8, "ref_inf": 8, "refer": [2, 3, 8], "refinf": 8, "regard": 4, "regist": 2, "register_dataset": 2, "register_hf_famili": 2, "register_interfac": 2, "register_metadata": 2, "register_quickstart_exp": 2, "regular": [3, 8], "rel": 8, "releas": 8, "remap": [2, 8], "rememb": 6, "remind": 8, "remov": 2, "replac": 2, "repositori": [1, 2, 6, 8], "repres": [2, 8], "request": [1, 2], "requir": [2, 6, 8], "rerun": 6, "reserv": 2, "respect": [2, 8], "respons": 8, "result": [2, 4], "return": 2, "return_tensor": 2, "rew": [2, 8], "reward": 5, "reward_output_sc": 8, "rlhf": [1, 5, 7], "rm": 5, "rm_pair": 8, "role": 2, "rpc": 2, "run": [2, 6, 8], "runtim": 6, "rw": 8, "same": [7, 8], "sampl": [2, 8], "satisfi": 2, "save": [2, 8], "save_freq_step": 8, "scale": 7, "scenario": 2, "score": 2, "score_model": 2, "score_token": 2, "script": 2, "search": 8, "second": 7, "see": [2, 8], "self": 2, "sentiment": [2, 8], "sentiment_scor": 2, "sentimentscoringinterfac": 2, "separ": 3, "sequenc": [2, 4], "set": [5, 8], "sft": [2, 4, 8], "sft_interfac": 2, "sft_po": 8, "sftconfig": [2, 4, 8], "sftinterfac": 2, "shape": 4, "shard": [2, 8], "share": 2, "should": [2, 3, 4, 6, 8], "show": [7, 8], "shown": [3, 7], "signal": 2, "signatur": 2, "similar": 2, "sinc": 4, "singl": [2, 8], "six": 8, "size": [7, 8], "skip_special_token": 2, "slice": 4, "slurm": 3, "smaller": 7, "so": [2, 6, 8], "solut": 7, "some": 2, "sourc": [1, 5, 7, 8], "specif": [2, 4, 8], "specifi": [2, 3, 8], "srl": 4, "srun": 3, "stage": 5, "state": [2, 7], "step": [5, 7, 8], "stop": 8, "store": [2, 3, 4], "strategi": [7, 8], "string": [3, 8], "structur": 8, "style": 2, "substanti": [7, 8], 
"successfulli": 8, "sum": 8, "super": 2, "supervis": 5, "support": [3, 5, 8], "synchron": [4, 7], "system": [2, 4, 7], "system_api": 2, "t": [2, 4, 6], "tailor": 7, "take": 8, "task": 7, "techniqu": 8, "technologi": 1, "templat": 1, "tensor": [4, 7, 8], "test": 2, "text": 2, "than": 8, "therefor": 8, "thi": [1, 2, 3, 4, 6, 8], "three": [2, 8], "throughput": 7, "time": [7, 8], "timelin": 7, "tmp": 3, "to_llama": 2, "token": 2, "tokenizer_path": 2, "top_k": [2, 8], "top_p": [2, 8], "torch": 2, "total": 8, "total_train_epoch": 8, "train": [2, 7, 8], "train_bs_n_seq": 8, "train_path": [2, 8], "transform": 2, "travel": 8, "trial_nam": [2, 8], "true": [2, 8], "truli": 1, "truncat": 2, "tune": 5, "tutori": 8, "two": [2, 7, 8], "type": [2, 3, 4, 8], "u": 1, "ubuntu": 6, "under": [2, 7], "unfamiliar": 2, "up": [5, 7, 8], "updat": [2, 8], "upload": 6, "upon": 8, "us": [2, 3, 4, 6, 8], "use_sequence_parallel": 8, "user": [2, 4, 8], "util": [2, 7, 8], "valid": 8, "valid_bs_n_seq": 8, "valid_path": 8, "valu": 8, "value_eps_clip": 8, "value_norm": 8, "variabl": [2, 3], "veri": 8, "version": [2, 6], "via": 8, "visit": 8, "vllm": 8, "wai": [4, 6, 7], "want": [2, 6], "we": [1, 2, 3, 4, 6, 7, 8], "well": [2, 8], "what": [2, 8], "wheel": 6, "when": [2, 3, 7], "where": [2, 3, 7, 8], "which": [2, 4, 8], "while": [7, 8], "wish": [1, 2], "within": 8, "without": [2, 4], "worker": [3, 8], "workload": [7, 8], "wrap": 2, "yeah": 8, "you": [1, 2, 3, 6, 8], "your": [2, 6, 8], "zero": 8}, "titles": ["Code Architecture", "Contributing", "Customization", "Set Up Distributed Experiments", "Configurations", "Welcome to ReaL\u2019s documentation!", "Installation", "Introduction", "Quickstart"], "titleterms": {"": 5, "1": 8, "2": 8, "3": 8, "30min": 8, "4x": 8, "7b": 8, "A": 2, "algorithm": 2, "architectur": 0, "code": 0, "configur": [2, 4], "content": 5, "contribut": 1, "custom": 2, "dataflow": 4, "dataset": [2, 4], "direct": 8, "distribut": 3, "docker": 6, "document": 5, "dpo": 8, "exampl": 2, 
"experi": [3, 4], "fine": 8, "from": 6, "function": 2, "graph": 4, "how": 2, "huggingfac": 2, "i": 2, "imag": 6, "implement": 2, "instal": [6, 8], "introduct": 7, "llama": 8, "model": [2, 4, 8], "namedarrai": 4, "new": 2, "optim": 8, "overview": 2, "pars": 2, "ppo": [2, 8], "prefer": 8, "pypi": 6, "quickstart": 8, "real": 5, "reward": [2, 8], "rlhf": 8, "rm": 8, "set": 3, "sourc": 6, "stage": 8, "step": 2, "supervis": 8, "support": 2, "tune": 8, "up": 3, "welcom": 5}})
\ No newline at end of file