Commit
Support Multi-GPU training based on the paper "On Scaling Up 3D Gaussian Splatting Training" (#253)

* checkin the code

* nicer API

* mcmc script now works with multi-GPU

* trainer supports multi gpu

* get rid of redundant code

* func doc

* support packed mode

* format

* more exp

* multi GPU viewer

* optim

* cleanup

* cleanup

* merge main

* MCMC

* doc

* scripts

* scripts and performance

---------

Co-authored-by: Ruilong Li <[email protected]>
liruilong940607 and Ruilong Li authored Aug 3, 2024
1 parent 8a0e500 commit f92fd3f
Showing 17 changed files with 1,364 additions and 425 deletions.
2 changes: 1 addition & 1 deletion EXPLORATION.md
@@ -26,7 +26,7 @@
| `--absgrad --grow_grad2d 2e-4` | 8m30s | 0.018s/im | 2.21 GB | 0.6251 | 20.68 | 0.587 | 0.89M |
| `--absgrad --grow_grad2d 2e-4` (30k) | -- | 0.030s/im | 5.25 GB | 0.7442 | 24.12 | 0.291 | 2.62M |

-Note: default args means running `python simple_trainer.py --data_dir <DATA_DIR>` with:
+Note: default args means running `CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --data_dir <DATA_DIR>` with:

- Garden ([Source](https://jonbarron.info/mipnerf360/)): `--result_dir results/garden`
- U1 (a.k.a University 1 from [Source](https://localrf.github.io/)): `--result_dir results/u1 --data_factor 1 --grow_scale3d 0.001`
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ pip install -r requirements.txt
# download mipnerf_360 benchmark data
python datasets/download_dataset.py
# run batch evaluation
-bash benchmark.sh
+bash benchmarks/basic.sh
```

## Examples
2 changes: 1 addition & 1 deletion docs/source/examples/colmap.rst
@@ -15,7 +15,7 @@ Simply run the script under `examples/`:

.. code-block:: bash
-python simple_trainer.py \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py \
--data_dir data/360_v2/garden/ --data_factor 4 \
--result_dir ./results/garden
2 changes: 1 addition & 1 deletion docs/source/examples/large_scale.rst
@@ -35,7 +35,7 @@ The code for this example can be found under `examples/`:
.. code-block:: bash
# First train a 3DGS model
-python simple_trainer.py \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py \
--data_dir data/360_v2/garden/ --data_factor 4 \
--result_dir ./results/garden
16 changes: 12 additions & 4 deletions docs/source/index.rst
@@ -13,13 +13,21 @@ Overview
Real-Time Rendering of Radiance Fields" :cite:p:`kerbl3Dgaussians`, but we've made *gsplat* even
faster, more memory efficient, and with a growing list of new features!

-* *gsplat* is developed with efficiency in mind. Comparing to the `official implementation <https://github.com/graphdeco-inria/gaussian-splatting>`_, *gsplat* enables up to **4x less training memory footprint**, and up to **15% less training time** on Mip-NeRF 360 captures, and potential more on larger scenes. See :doc:`tests/eval` for details.
+* *gsplat* is developed with efficiency in mind. Compared to the `official implementation <https://github.com/graphdeco-inria/gaussian-splatting>`_,
+  *gsplat* enables up to **4x less training memory footprint** and up to **15% less training time** on Mip-NeRF 360 captures, and potentially more on larger scenes. See :doc:`tests/eval` for details.

-* *gsplat* is designed to **support extremely large scene rendering**, which is magnitudes faster than the official CUDA backend `diff-gaussian-rasterization <https://github.com/graphdeco-inria/diff-gaussian-rasterization>`_. See :doc:`examples/large_scale` for an example.
+* *gsplat* is designed to **support extremely large scene rendering**, which is orders of magnitude
+  faster than the official CUDA backend `diff-gaussian-rasterization <https://github.com/graphdeco-inria/diff-gaussian-rasterization>`_. See :doc:`examples/large_scale` for an example.

-* *gsplat* offers many extra features, including **batch rasterization**, **N-D feature rendering**, **depth rendering**, **sparse gradient** etc. See :doc:`apis/rasterization` for details.
+* *gsplat* offers many extra features, including **batch rasterization**,
+  **N-D feature rendering**, **depth rendering**, **sparse gradient**,
+  **multi-GPU distributed rasterization**,
+  etc. See :doc:`apis/rasterization` for details.

-* *gsplat* is equipped with the **latest and greatest** 3D Gaussian Splatting techniques, including `absgrad <https://ty424.github.io/AbsGS.github.io/>`_, `anti-aliasing <https://niujinshuchong.github.io/mip-splatting/>`_ etc. And more to come!
+* *gsplat* is equipped with the **latest and greatest** 3D Gaussian Splatting techniques,
+  including `absgrad <https://ty424.github.io/AbsGS.github.io/>`_,
+  `anti-aliasing <https://niujinshuchong.github.io/mip-splatting/>`_,
+  `3DGS-MCMC <https://ubc-vision.github.io/3dgs-mcmc/>`_, etc. And more to come!


.. raw:: html
26 changes: 14 additions & 12 deletions docs/source/tests/eval.rst
@@ -3,17 +3,19 @@ Evaluation

.. table:: Performance on `Mip-NeRF 360 Captures <https://jonbarron.info/mipnerf360/>`_ (Averaged Over 7 Scenes)

-+------------+-------+-------+-------+------------------+------------+
-|            | PSNR  | SSIM  | LPIPS | Train Mem        | Train Time |
-+============+=======+=======+=======+==================+============+
-| inria-7k   | 27.23 | 0.829 | 0.204 | 7.7 GB           | 6m05s      |
-+------------+-------+-------+-------+------------------+------------+
-| gsplat-7k  | 27.21 | 0.831 | 0.202 | **4.3GB**        | **5m35s**  |
-+------------+-------+-------+-------+------------------+------------+
-| inria-30k  | 28.95 | 0.870 | 0.138 | 9.0 GB           | 37m13s     |
-+------------+-------+-------+-------+------------------+------------+
-| gsplat-30k | 28.95 | 0.870 | 0.135 | **5.7 GB**       | **35m49s** |
-+------------+-------+-------+-------+------------------+------------+
++---------------------+-------+-------+-------+------------------+------------+
+|                     | PSNR  | SSIM  | LPIPS | Train Mem        | Train Time |
++=====================+=======+=======+=======+==================+============+
+| inria-7k            | 27.23 | 0.829 | 0.204 | 7.7 GB           | 6m05s      |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-7k           | 27.21 | 0.831 | 0.202 | **4.3 GB**       | **5m35s**  |
++---------------------+-------+-------+-------+------------------+------------+
+| inria-30k           | 28.95 | 0.870 | 0.138 | 9.0 GB           | 37m13s     |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-30k (1 GPU)  | 28.95 | 0.870 | 0.135 | **5.7 GB**       | **35m49s** |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-30k (4 GPUs) | 28.91 | 0.871 | 0.135 | **2.0 GB**       | **11m28s** |
++---------------------+-------+-------+-------+------------------+------------+
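As a quick sanity check on the new 4-GPU row, the wall-clock speedup and per-GPU memory saving follow directly from the two gsplat-30k rows of the table (a sketch using only the figures reported above, nothing else):

```python
# Numbers taken from the gsplat-30k rows of the table above.
single_gpu_time = 35 * 60 + 49   # 35m49s, in seconds
four_gpu_time = 11 * 60 + 28     # 11m28s, in seconds
print(f"{single_gpu_time / four_gpu_time:.2f}x faster")   # ~3.12x on 4 GPUs

single_gpu_mem = 5.7  # GB per GPU, 1-GPU run
four_gpu_mem = 2.0    # GB per GPU, 4-GPU run
print(f"{single_gpu_mem / four_gpu_mem:.2f}x less memory per GPU")  # ~2.85x
```

So 4 GPUs yield roughly a 3x wall-clock speedup at essentially unchanged quality (PSNR 28.91 vs 28.95).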

This repo comes with a standalone script (:code:`examples/simple_trainer.py`) that reproduces
the `Gaussian Splatting <https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/>`_ with
@@ -131,7 +133,7 @@ is different from what's reported in the original paper that uses
:code:`from lpipsPyTorch import lpips`.

The evaluation of `gsplat-X` can be reproduced with the command
-:code:`cd examples; bash benchmark.sh`
+:code:`cd examples; bash benchmarks/basic.sh`
within the gsplat repo (commit 6acdce4).

The evaluation of `inria-X` can be
8 changes: 4 additions & 4 deletions examples/benchmark.sh → examples/benchmarks/basic.sh
@@ -11,14 +11,14 @@ do
echo "Running $SCENE"

# train without eval
-python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
--data_dir data/360_v2/$SCENE/ \
--result_dir $RESULT_DIR/$SCENE/

# run eval and render
for CKPT in $RESULT_DIR/$SCENE/ckpts/*;
do
-python simple_trainer.py --disable_viewer --data_factor $DATA_FACTOR \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --disable_viewer --data_factor $DATA_FACTOR \
--data_dir data/360_v2/$SCENE/ \
--result_dir $RESULT_DIR/$SCENE/ \
--ckpt $CKPT
@@ -30,7 +30,7 @@ for SCENE in bicycle bonsai counter garden kitchen room stump;
do
echo "=== Eval Stats ==="

-for STATS in $RESULT_DIR/$SCENE/stats/val*;
+for STATS in $RESULT_DIR/$SCENE/stats/val*.json;
do
echo $STATS
cat $STATS;
@@ -39,7 +39,7 @@ do

echo "=== Train Stats ==="

-for STATS in $RESULT_DIR/$SCENE/stats/train*;
+for STATS in $RESULT_DIR/$SCENE/stats/train*_rank0.json;
do
echo $STATS
cat $STATS;
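The tightened globs in this hunk assume the multi-GPU trainer now writes per-rank training stats (hence `train*_rank0.json`) alongside a single validation stats file per step (`val*.json`). A small sketch of what those patterns match, using hypothetical filenames of that shape:

```python
import fnmatch

# Hypothetical stats filenames left behind by a multi-GPU run; the
# *_rankN suffix is the assumption behind the train*_rank0.json glob.
files = [
    "train_step29999_rank0.json",
    "train_step29999_rank1.json",
    "val_step29999.json",
]
print(fnmatch.filter(files, "train*_rank0.json"))  # rank-0 training stats only
print(fnmatch.filter(files, "val*.json"))          # validation stats only
```

The bare `train*` glob would also match the rank-1 file, which is why the script narrows it.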
43 changes: 43 additions & 0 deletions examples/benchmarks/basic_4gpus.sh
@@ -0,0 +1,43 @@
RESULT_DIR=results/benchmark_4gpus

for SCENE in bicycle bonsai counter garden kitchen room stump;
do
if [ "$SCENE" = "bicycle" ] || [ "$SCENE" = "stump" ] || [ "$SCENE" = "garden" ]; then
DATA_FACTOR=4
else
DATA_FACTOR=2
fi

echo "Running $SCENE"

# train and eval at the last step
# 4 GPUs is effectively 4x batch size so we scale down the steps by 4x as well.
# "--packed" reduces the data transfer between GPUs, which leads to faster training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
    --steps_scaler 0.25 --packed \
    --data_dir data/360_v2/$SCENE/ \
    --result_dir $RESULT_DIR/$SCENE/

done


for SCENE in bicycle bonsai counter garden kitchen room stump;
do
echo "=== Eval Stats ==="

for STATS in $RESULT_DIR/$SCENE/stats/val_step7499.json;
do
echo $STATS
cat $STATS;
echo
done

echo "=== Train Stats ==="

for STATS in $RESULT_DIR/$SCENE/stats/train_step7499_rank0.json;
do
echo $STATS
cat $STATS;
echo
done
done
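The `step7499` filenames hard-coded above follow from the step scaling: with `--steps_scaler 0.25` applied to a 30k-step schedule (the single-GPU default assumed here, matching the 30k runs elsewhere on this page), training runs 7,500 steps and the last zero-indexed step is 7499. A sketch of that arithmetic:

```python
# With 4 GPUs the effective batch size is 4x, so step counts are
# scaled by --steps_scaler 0.25 to keep total images seen constant.
def scaled_steps(base_steps: int, steps_scaler: float) -> int:
    return int(base_steps * steps_scaler)

base_max_steps = 30_000  # assumed single-GPU default schedule
last_step = scaled_steps(base_max_steps, 0.25) - 1  # steps are zero-indexed
print(last_step)  # 7499 -> matches the val_step7499.json checked above
```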
