Commit
Support Multi-GPU training based on the paper "On Scaling Up 3D Gaussian Splatting Training" (#253)

* checkin the code

* nicer API

* mcmc script now works with multi-GPU

* trainer supports multi gpu

* get rid of redundant code

* func doc

* support packed mode

* format

* more exp

* multi GPU viewer

* optim

* cleanup

* cleanup

* merge main

* MCMC

* doc

* scripts

* scripts and performance

---------

Co-authored-by: Ruilong Li <[email protected]>
liruilong940607 and Ruilong Li authored Aug 3, 2024
1 parent 8a0e500 commit f92fd3f
Showing 17 changed files with 1,364 additions and 425 deletions.
2 changes: 1 addition & 1 deletion EXPLORATION.md
@@ -26,7 +26,7 @@
| `--absgrad --grow_grad2d 2e-4` | 8m30s | 0.018s/im | 2.21 GB | 0.6251 | 20.68 | 0.587 | 0.89M |
| `--absgrad --grow_grad2d 2e-4` (30k) | -- | 0.030s/im | 5.25 GB | 0.7442 | 24.12 | 0.291 | 2.62M |

-Note: default args means running `python simple_trainer.py --data_dir <DATA_DIR>` with:
+Note: default args means running `CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --data_dir <DATA_DIR>` with:

- Garden ([Source](https://jonbarron.info/mipnerf360/)): `--result_dir results/garden`
- U1 (a.k.a University 1 from [Source](https://localrf.github.io/)): `--result_dir results/u1 --data_factor 1 --grow_scale3d 0.001`
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ pip install -r requirements.txt
# download mipnerf_360 benchmark data
python datasets/download_dataset.py
# run batch evaluation
-bash benchmark.sh
+bash benchmarks/basic.sh
```

## Examples
2 changes: 1 addition & 1 deletion docs/source/examples/colmap.rst
@@ -15,7 +15,7 @@ Simply run the script under `examples/`:

.. code-block:: bash
-python simple_trainer.py \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py \
--data_dir data/360_v2/garden/ --data_factor 4 \
--result_dir ./results/garden
2 changes: 1 addition & 1 deletion docs/source/examples/large_scale.rst
@@ -35,7 +35,7 @@ The code for this example can be found under `examples/`:
.. code-block:: bash
# First train a 3DGS model
-python simple_trainer.py \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py \
--data_dir data/360_v2/garden/ --data_factor 4 \
--result_dir ./results/garden
16 changes: 12 additions & 4 deletions docs/source/index.rst
@@ -13,13 +13,21 @@ Overview
Real-Time Rendering of Radiance Fields" :cite:p:`kerbl3Dgaussians`, but we've made *gsplat* even
faster, more memory efficient, and with a growing list of new features!

-* *gsplat* is developed with efficiency in mind. Comparing to the `official implementation <https://github.com/graphdeco-inria/gaussian-splatting>`_, *gsplat* enables up to **4x less training memory footprint**, and up to **15% less training time** on Mip-NeRF 360 captures, and potential more on larger scenes. See :doc:`tests/eval` for details.
+* *gsplat* is developed with efficiency in mind. Compared to the `official implementation <https://github.com/graphdeco-inria/gaussian-splatting>`_,
+  *gsplat* enables up to **4x less training memory footprint** and up to **15% less training time** on Mip-NeRF 360 captures, and potentially more on larger scenes. See :doc:`tests/eval` for details.

-* *gsplat* is designed to **support extremely large scene rendering**, which is magnitudes faster than the official CUDA backend `diff-gaussian-rasterization <https://github.com/graphdeco-inria/diff-gaussian-rasterization>`_. See :doc:`examples/large_scale` for an example.
+* *gsplat* is designed to **support extremely large scene rendering**, which is orders of magnitude
+  faster than the official CUDA backend `diff-gaussian-rasterization <https://github.com/graphdeco-inria/diff-gaussian-rasterization>`_. See :doc:`examples/large_scale` for an example.

-* *gsplat* offers many extra features, including **batch rasterization**, **N-D feature rendering**, **depth rendering**, **sparse gradient** etc. See :doc:`apis/rasterization` for details.
+* *gsplat* offers many extra features, including **batch rasterization**,
+  **N-D feature rendering**, **depth rendering**, **sparse gradient**,
+  **multi-GPU distributed rasterization**,
+  etc. See :doc:`apis/rasterization` for details.

-* *gsplat* is equipped with the **latest and greatest** 3D Gaussian Splatting techniques, including `absgrad <https://ty424.github.io/AbsGS.github.io/>`_, `anti-aliasing <https://niujinshuchong.github.io/mip-splatting/>`_ etc. And more to come!
+* *gsplat* is equipped with the **latest and greatest** 3D Gaussian Splatting techniques,
+  including `absgrad <https://ty424.github.io/AbsGS.github.io/>`_,
+  `anti-aliasing <https://niujinshuchong.github.io/mip-splatting/>`_,
+  `3DGS-MCMC <https://ubc-vision.github.io/3dgs-mcmc/>`_, etc. And more to come!


.. raw:: html
26 changes: 14 additions & 12 deletions docs/source/tests/eval.rst
@@ -3,17 +3,19 @@ Evaluation

.. table:: Performance on `Mip-NeRF 360 Captures <https://jonbarron.info/mipnerf360/>`_ (Averaged Over 7 Scenes)

-+------------+-------+-------+-------+------------------+------------+
-|            | PSNR  | SSIM  | LPIPS | Train Mem        | Train Time |
-+============+=======+=======+=======+==================+============+
-| inria-7k   | 27.23 | 0.829 | 0.204 | 7.7 GB           | 6m05s      |
-+------------+-------+-------+-------+------------------+------------+
-| gsplat-7k  | 27.21 | 0.831 | 0.202 | **4.3GB**        | **5m35s**  |
-+------------+-------+-------+-------+------------------+------------+
-| inria-30k  | 28.95 | 0.870 | 0.138 | 9.0 GB           | 37m13s     |
-+------------+-------+-------+-------+------------------+------------+
-| gsplat-30k | 28.95 | 0.870 | 0.135 | **5.7 GB**       | **35m49s** |
-+------------+-------+-------+-------+------------------+------------+
++---------------------+-------+-------+-------+------------------+------------+
+|                     | PSNR  | SSIM  | LPIPS | Train Mem        | Train Time |
++=====================+=======+=======+=======+==================+============+
+| inria-7k            | 27.23 | 0.829 | 0.204 | 7.7 GB           | 6m05s      |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-7k           | 27.21 | 0.831 | 0.202 | **4.3 GB**       | **5m35s**  |
++---------------------+-------+-------+-------+------------------+------------+
+| inria-30k           | 28.95 | 0.870 | 0.138 | 9.0 GB           | 37m13s     |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-30k (1 GPU)  | 28.95 | 0.870 | 0.135 | **5.7 GB**       | **35m49s** |
++---------------------+-------+-------+-------+------------------+------------+
+| gsplat-30k (4 GPUs) | 28.91 | 0.871 | 0.135 | **2.0 GB**       | **11m28s** |
++---------------------+-------+-------+-------+------------------+------------+
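As a quick sanity check on the new 4-GPU row, the wall-clock speedup and per-GPU memory saving follow directly from the two gsplat-30k rows of the table (a sketch using only the figures reported above, nothing else):

```python
# Numbers taken from the gsplat-30k rows of the table above.
single_gpu_time = 35 * 60 + 49   # 35m49s, in seconds
four_gpu_time = 11 * 60 + 28     # 11m28s, in seconds
print(f"{single_gpu_time / four_gpu_time:.2f}x faster")   # ~3.12x on 4 GPUs

single_gpu_mem = 5.7  # GB per GPU, 1-GPU run
four_gpu_mem = 2.0    # GB per GPU, 4-GPU run
print(f"{single_gpu_mem / four_gpu_mem:.2f}x less memory per GPU")  # ~2.85x
```

So 4 GPUs yield roughly a 3x wall-clock speedup at essentially unchanged quality (PSNR 28.91 vs 28.95).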

This repo comes with a standalone script (:code:`examples/simple_trainer.py`) that reproduces
the `Gaussian Splatting <https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/>`_ with
@@ -131,7 +133,7 @@ is different from what's reported in the original paper that uses
:code:`from lpipsPyTorch import lpips`.

The evaluation of `gsplat-X` can be reproduced with the command
-:code:`cd examples; bash benchmark.sh`
+:code:`cd examples; bash benchmarks/basic.sh`
within the gsplat repo (commit 6acdce4).

The evaluation of `inria-X` can be
8 changes: 4 additions & 4 deletions examples/benchmark.sh → examples/benchmarks/basic.sh
@@ -11,14 +11,14 @@ do
echo "Running $SCENE"

# train without eval
-python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
--data_dir data/360_v2/$SCENE/ \
--result_dir $RESULT_DIR/$SCENE/

# run eval and render
for CKPT in $RESULT_DIR/$SCENE/ckpts/*;
do
-python simple_trainer.py --disable_viewer --data_factor $DATA_FACTOR \
+CUDA_VISIBLE_DEVICES=0 python simple_trainer.py --disable_viewer --data_factor $DATA_FACTOR \
--data_dir data/360_v2/$SCENE/ \
--result_dir $RESULT_DIR/$SCENE/ \
--ckpt $CKPT
@@ -30,7 +30,7 @@ for SCENE in bicycle bonsai counter garden kitchen room stump;
do
echo "=== Eval Stats ==="

-for STATS in $RESULT_DIR/$SCENE/stats/val*;
+for STATS in $RESULT_DIR/$SCENE/stats/val*.json;
do
echo $STATS
cat $STATS;
@@ -39,7 +39,7 @@ do

echo "=== Train Stats ==="

-for STATS in $RESULT_DIR/$SCENE/stats/train*;
+for STATS in $RESULT_DIR/$SCENE/stats/train*_rank0.json;
do
echo $STATS
cat $STATS;
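The tightened globs in this hunk assume the multi-GPU trainer now writes per-rank training stats (hence `train*_rank0.json`) alongside a single validation stats file per step (`val*.json`). A small sketch of what those patterns match, using hypothetical filenames of that shape:

```python
import fnmatch

# Hypothetical stats filenames left behind by a multi-GPU run; the
# *_rankN suffix is the assumption behind the train*_rank0.json glob.
files = [
    "train_step29999_rank0.json",
    "train_step29999_rank1.json",
    "val_step29999.json",
]
print(fnmatch.filter(files, "train*_rank0.json"))  # rank-0 training stats only
print(fnmatch.filter(files, "val*.json"))          # validation stats only
```

The bare `train*` glob would also match the rank-1 file, which is why the script narrows it.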
43 changes: 43 additions & 0 deletions examples/benchmarks/basic_4gpus.sh
@@ -0,0 +1,43 @@
RESULT_DIR=results/benchmark_4gpus

for SCENE in bicycle bonsai counter garden kitchen room stump;
do
if [ "$SCENE" = "bicycle" ] || [ "$SCENE" = "stump" ] || [ "$SCENE" = "garden" ]; then
DATA_FACTOR=4
else
DATA_FACTOR=2
fi

echo "Running $SCENE"

# train and eval at the last step
# 4 GPUs is effectively 4x batch size so we scale down the steps by 4x as well.
# "--packed" reduces the data transfer between GPUs, which leads to faster training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python simple_trainer.py --eval_steps -1 --disable_viewer --data_factor $DATA_FACTOR \
    --steps_scaler 0.25 --packed \
    --data_dir data/360_v2/$SCENE/ \
    --result_dir $RESULT_DIR/$SCENE/

done


for SCENE in bicycle bonsai counter garden kitchen room stump;
do
echo "=== Eval Stats ==="

for STATS in $RESULT_DIR/$SCENE/stats/val_step7499.json;
do
echo $STATS
cat $STATS;
echo
done

echo "=== Train Stats ==="

for STATS in $RESULT_DIR/$SCENE/stats/train_step7499_rank0.json;
do
echo $STATS
cat $STATS;
echo
done
done
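The `step7499` filenames hard-coded above follow from the step scaling: with `--steps_scaler 0.25` applied to a 30k-step schedule (the single-GPU default assumed here, matching the 30k runs elsewhere on this page), training runs 7,500 steps and the last zero-indexed step is 7499. A sketch of that arithmetic:

```python
# With 4 GPUs the effective batch size is 4x, so step counts are
# scaled by --steps_scaler 0.25 to keep total images seen constant.
def scaled_steps(base_steps: int, steps_scaler: float) -> int:
    return int(base_steps * steps_scaler)

base_max_steps = 30_000  # assumed single-GPU default schedule
last_step = scaled_steps(base_max_steps, 0.25) - 1  # steps are zero-indexed
print(last_step)  # 7499 -> matches the val_step7499.json checked above
```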
