Commit: Initial release
HaoyiZhu committed Oct 11, 2024
1 parent b711dd4 commit 8f0e46a
Showing 116 changed files with 14,361 additions and 16 deletions.
343 changes: 341 additions & 2 deletions README.md
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![license](https://img.shields.io/badge/License-MIT-green.svg?labelColor=gray)](https://github.com/ashleve/lightning-hydra-template#license)

[**Project Page**](https://haoyizhu.github.io/spa/) | [**Paper**](https://haoyizhu.github.io/spa/static/images/paper.pdf) | [**arXiv**](https://arxiv.org/abs/2410.08208) | [**HuggingFace Model**](https://huggingface.co/HaoyiZhu/SPA) | [**Real-World Codebase**](https://github.com/HaoyiZhu/RealRobot)

[Haoyi Zhu](https://www.haoyizhu.site/), [Honghui Yang](https://hhyangcs.github.io/), [Yating Wang](https://scholar.google.com/citations?hl=zh-CN&user=5SuBWh0AAAAJ), [Jiange Yang](https://yangjiangeyjg.github.io/), [Limin Wang](https://wanglimin.github.io/), [Tong He](http://tonghe90.github.io/)
</div>

**SPA** is a novel representation learning framework that emphasizes the importance of **3D spatial awareness in embodied AI**. It leverages **differentiable neural rendering** on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, covering **268 tasks** across **8 simulators** with diverse policies in both single-task and language-conditioned multi-task scenarios.

:partying_face: **NEWS**:

- *Oct. 2024:* Codebase and pre-trained checkpoints are released! Paper is available on [arXiv](https://arxiv.org/abs/2410.08208).

## :clipboard: Contents

- [Project Structure](#telescope-project-structure)
- [Installation](#hammer-installation)
- [Usage](#star2-usage)
- [Pre-Training](#rocket-pre-training)
- [SPA Large-Scale Evaluation](#bulb-spa-large-scale-evaluation)
- [Gotchas](#tada-gotchas)
- [License](#books-license)
- [Acknowledgement](#sparkles-acknowledgement)
- [Citation](#pencil-citation)

## :telescope: Project Structure

Our codebase draws significant inspiration from the excellent [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template). The directory structure of this project is organized as follows:

<details>
<summary><b>Show directory structure</b></summary>

```
├── .github                <- Github Actions workflows
├── configs                <- Hydra configs
│   ├── callbacks          <- Callbacks configs
│   ├── data               <- Data configs
│   ├── debug              <- Debugging configs
│   ├── experiment         <- Experiment configs
│   ├── extras             <- Extra utilities configs
│   ├── hydra              <- Hydra configs
│   ├── local              <- Local configs
│   ├── logger             <- Logger configs
│   ├── model              <- Model configs
│   ├── paths              <- Project paths configs
│   ├── trainer            <- Trainer configs
│   │
│   └── train.yaml         <- Main config for training
├── data                   <- Project data
├── logs                   <- Logs generated by hydra and lightning loggers
├── scripts                <- Shell or Python scripts
│
├── spa                    <- Source code of SPA
│   ├── data               <- Data scripts
│   ├── models             <- Model scripts
│   ├── utils              <- Utility scripts
│   │
│   └── train.py           <- Run SPA pre-training
├── .gitignore             <- List of files ignored by git
├── .project-root          <- File for inferring the position of project root directory
├── requirements.txt       <- File for installing python dependencies
├── setup.py               <- File for installing project as a package
└── README.md
```

</details>

## :hammer: Installation
<details>
<summary><b>Basics</b></summary>

```bash
# clone project
git clone https://github.com/HaoyiZhu/SPA.git
cd SPA

# create conda environment
conda create -n spa python=3.11 -y
conda activate spa

# install PyTorch; refer to https://pytorch.org/ for other CUDA versions
# e.g. CUDA 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install basic packages
pip3 install -r requirements.txt
```
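
A quick optional sanity check (plain PyTorch, nothing SPA-specific) to confirm the installed build matches your CUDA setup:

```python
# optional sanity check for the PyTorch installation
import torch

print(torch.__version__)          # e.g. a +cu118 build if the CUDA 11.8 wheels were installed
print(torch.cuda.is_available())  # should be True on a machine with a working CUDA setup
```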
</details>

<details>
<summary><b>SPA</b></summary>

```bash
# (optional) if you want to use SPA's volume decoder
cd libs/spa-ops
pip install -e .
cd ../..

# install SPA, so that you can import from anywhere
pip install -e .
```
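
After the editable install, `spa` should be importable from anywhere. A minimal check (the imports below mirror the Usage section):

```python
# minimal import check after `pip install -e .`
import spa
from spa.models import spa_vit_base_patch16, spa_vit_large_patch16

print(spa.__file__)  # should point into your local SPA checkout
```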
</details>

## :star2: Usage

<details open>
<summary><b>Example of Using SPA Pre-trained Encoder </b></summary>

We provide pre-trained SPA weights for feature extraction. The checkpoints are available on [🤗Hugging Face](https://huggingface.co/HaoyiZhu/SPA). You don't need to manually download the weights, as SPA will automatically handle this if needed.

```python
import torch

from spa.models import spa_vit_base_patch16, spa_vit_large_patch16

image = torch.rand((1, 3, 224, 224)) # range in [0, 1]

# Example usage of SPA-Large (recommended)
# or you can use `spa_vit_base_patch16` for SPA-base
model = spa_vit_large_patch16(pretrained=True)
model.eval()

# Freeze the model
model.freeze()

# (Recommended) move to CUDA
image = image.cuda()
model = model.cuda()

# Obtain the [CLS] token
cls_token = model(image) # torch.Size([1, 1024])

# Obtain the reshaped feature map concatenated with [CLS] token
feature_map_cat_cls = model(
    image, feature_map=True, cat_cls=True
)  # torch.Size([1, 2048, 14, 14])

# Obtain the reshaped feature map without [CLS] token
feature_map_wo_cls = model(
    image, feature_map=True, cat_cls=False
)  # torch.Size([1, 1024, 14, 14])
```

> **Note:** The inputs will be automatically resized to `224 x 224` and normalized within the [SPA ViT encoder](spa/models/components/img_backbones/vit.py#L69).
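
As a follow-up, here is a minimal sketch of plugging the frozen SPA encoder into a downstream head. The linear probe and its output dimension are illustrative assumptions, not part of SPA:

```python
import torch
import torch.nn as nn

from spa.models import spa_vit_large_patch16

# Frozen SPA encoder used as a feature extractor
encoder = spa_vit_large_patch16(pretrained=True)
encoder.eval()
encoder.freeze()

# Hypothetical downstream head: a linear probe on the [CLS] token
probe = nn.Linear(1024, 7)  # e.g. a 7-dim action head (illustrative)

# Inputs can be any resolution with values in [0, 1]; they are resized to 224 x 224 internally
images = torch.rand((4, 3, 256, 256))

with torch.no_grad():
    cls_tokens = encoder(images)  # torch.Size([4, 1024])

actions = probe(cls_tokens)  # torch.Size([4, 7])
```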
</details>



## :rocket: Pre-Training

<details>
<summary><b>Example of Pre-Training on ScanNet </b></summary>

We give an example of pre-training SPA on the [ScanNet](http://www.scan-net.org/) v2 dataset.

1) Prepare the dataset:
   - Download the [ScanNet](http://www.scan-net.org/) v2 dataset.
   - Pre-process and extract RGB-D images following [PonderV2](https://github.com/OpenGVLab/PonderV2/blob/main/docs/data_preparation.md#scannet-v2). The preprocessed data should be put under `data/scannet/`.
   - Pre-generate metadata for fast data loading. The following command will generate metadata under `data/scannet/metadata`:
     ```bash
     python scripts/generate_scannet_metadata.py
     ```

2) Run the following command for pre-training. Remember to adjust hyper-parameters such as the number of nodes and GPU devices to match your machines (a single-GPU smoke test is sketched after this list):
   ```bash
   python spa/train.py experiment=spa_pretrain_vitl trainer.num_nodes=5 trainer.devices=8
   ```
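
Before launching the full multi-node job, it can be handy to run a quick single-GPU smoke test. The command below is a sketch that combines the same experiment config with the debug options described in the Gotchas section; adjust it to your setup:

```bash
# hypothetical smoke test: 1 node, 1 GPU, debug-friendly settings
python spa/train.py experiment=spa_pretrain_vitl trainer.num_nodes=1 trainer.devices=1 debug=default
```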

</details>

## :bulb: SPA Large-Scale Evaluation

<details>
<summary><b>TBD</b></summary>

</details>

## :tada: Gotchas

<details>
<summary><b>Override any config parameter from the command line</b></summary>

This codebase is based on [Hydra](https://github.com/facebookresearch/hydra), which allows for convenient configuration overriding:
```bash
python spa/train.py trainer.max_epochs=20 seed=300
```

> **Note**: You can also add new parameters with the `+` sign.

```bash
python spa/train.py +some_new_param=some_new_value
```

</details>

<details>
<summary><b>Train on CPU, GPU, multi-GPU and TPU</b></summary>

```bash
# train on CPU
python spa/train.py trainer=cpu
# train on 1 GPU
python spa/train.py trainer=gpu
# train on TPU
python spa/train.py +trainer.tpu_cores=8
# train with DDP (Distributed Data Parallel) (4 GPUs)
python spa/train.py trainer=ddp trainer.devices=4
# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
python spa/train.py trainer=ddp trainer.devices=4 trainer.num_nodes=2
# simulate DDP on CPU processes
python spa/train.py trainer=ddp_sim trainer.devices=2
# accelerate training on Mac
python spa/train.py trainer=mps
```

</details>

<details>
<summary><b>Train with mixed precision</b></summary>

```bash
# train with PyTorch native automatic mixed precision (AMP)
python spa/train.py trainer=gpu +trainer.precision=16
```

</details>

<details>
<summary><b>Use different tricks available in Pytorch Lightning</b></summary>

```bash
# gradient clipping may be enabled to avoid exploding gradients
python spa/train.py trainer.gradient_clip_val=0.5
# run the validation loop 4 times during a training epoch
python spa/train.py +trainer.val_check_interval=0.25
# accumulate gradients
python spa/train.py trainer.accumulate_grad_batches=10
# terminate training after 12 hours
python spa/train.py +trainer.max_time="00:12:00:00"
```

> **Note**: PyTorch Lightning provides more than [40 useful trainer flags](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-flags).

</details>

<details>
<summary><b>Easily debug</b></summary>

```bash
# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
python spa/train.py debug=default
# run 1 train, val and test loop, using only 1 batch
python spa/train.py debug=fdr
# print execution time profiling
python spa/train.py debug=profiler
# try overfitting to 1 batch
python spa/train.py debug=overfit
# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
python spa/train.py +trainer.detect_anomaly=true
# use only 20% of the data
python spa/train.py +trainer.limit_train_batches=0.2 \
+trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2
```

> **Note**: Visit [configs/debug/](configs/debug/) for different debugging configs.

</details>

<details>
<summary><b>Resume training from checkpoint</b></summary>

```bash
python spa/train.py ckpt_path="/path/to/ckpt/name.ckpt"
```

> **Note**: The checkpoint can be either a path or a URL.

> **Note**: Currently, loading a checkpoint doesn't resume the logger experiment, but this will be supported in a future Lightning release.
</details>
<details>
<summary><b>Create a sweep over hyperparameters</b></summary>

```bash
# this will run 9 experiments one after the other,
# each with a different combination of seed and learning rate
python spa/train.py -m seed=100,200,300 model.optimizer.lr=0.0001,0.00005,0.00001
```

> **Note**: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
</details>
<details>
<summary><b>Execute all experiments from a folder</b></summary>

```bash
python spa/train.py -m 'exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=glob(*)'
```

> **Note**: Hydra provides special syntax for controlling the behavior of multiruns. Learn more [here](https://hydra.cc/docs/next/tutorials/basic/running_your_app/multi-run). The command above executes all task experiments from [configs/exp_maniskill2_act_policy/maniskill2_task](configs/experiment/).
</details>
<details>
<summary><b>Execute a run with multiple different seeds</b></summary>

```bash
python spa/train.py -m seed=100,200,300 trainer.deterministic=True
```

> **Note**: `trainer.deterministic=True` makes PyTorch more deterministic but impacts performance.
</details>
For more instructions, refer to the official documentation of [Pytorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), [Hydra](https://github.com/facebookresearch/hydra), and the [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template).

## :books: License

This repository is released under the [MIT license](LICENSE).

## :sparkles: Acknowledgement

Our work is primarily built upon [PonderV2](https://github.com/OpenGVLab/PonderV2), [UniPAD](https://github.com/Nightmare-n/UniPAD), [Pytorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), [Hydra](https://github.com/facebookresearch/hydra), [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template), [RLBench](https://github.com/stepjam/RLBench), [PerAct](https://github.com/peract/peract), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), [Meta-World](https://github.com/Farama-Foundation/Metaworld), [ACT](https://github.com/tonyzhaozh/act), [Diffusion Policy](https://github.com/real-stanford/diffusion_policy), [DP3](https://github.com/YanjieZe/3D-Diffusion-Policy), [TIMM](https://github.com/huggingface/pytorch-image-models), [VC1](https://github.com/facebookresearch/eai-vc), and [R3M](https://github.com/facebookresearch/r3m). We extend our gratitude to all these authors for generously open-sourcing their code and for their significant contributions to the community.

Contact [Haoyi Zhu](https://www.haoyizhu.site/) if you have any questions or suggestions.

## :pencil: Citation

```bib
@article{zhu2024spa,
  title   = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
  author  = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
  journal = {arXiv preprint arXiv:2410.08208},
  year    = {2024},
}
```
1 change: 1 addition & 0 deletions configs/__init__.py
# this file is needed here to include configs when building project as a package
26 changes: 26 additions & 0 deletions configs/callbacks/default.yaml
defaults:
  - model_checkpoint
  - early_stopping
  - model_summary
  - rich_progress_bar
  - lr_monitor
  - device_stats_monitor
  # - stochastic_weight_averaging
  - _self_

model_checkpoint:
  dirpath: ${paths.output_dir}/checkpoints
  filename: "epoch_{epoch:03d}"
  monitor: "val/loss"
  mode: "min"
  save_last: true
  auto_insert_metric_name: false
  save_top_k: 3

early_stopping:
  monitor: "val/loss"
  patience: 100
  mode: "min"

model_summary:
  max_depth: -1