Commit: Initial release
HaoyiZhu committed Oct 11, 2024
1 parent b711dd4 commit 8f0e46a
Showing 116 changed files with 14,361 additions and 16 deletions.
343 changes: 341 additions & 2 deletions README.md
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![license](https://img.shields.io/badge/License-MIT-green.svg?labelColor=gray)](https://github.com/ashleve/lightning-hydra-template#license)

[**Project Page**](https://haoyizhu.github.io/spa/) | [**Paper**](https://haoyizhu.github.io/spa/static/images/paper.pdf) | [**arXiv**](https://arxiv.org/abs/2410.08208) | [**HuggingFace Model**](https://huggingface.co/HaoyiZhu/SPA) | [**Real-World Codebase**](https://github.com/HaoyiZhu/RealRobot)

[Haoyi Zhu](https://www.haoyizhu.site/), [Honghui Yang](https://hhyangcs.github.io/), [Yating Wang](https://scholar.google.com/citations?hl=zh-CN&user=5SuBWh0AAAAJ), [Jiange Yang](https://yangjiangeyjg.github.io/), [Limin Wang](https://wanglimin.github.io/), [Tong He](http://tonghe90.github.io/)
</div>

**SPA** is a novel representation learning framework that emphasizes the importance of **3D spatial awareness in embodied AI**. It leverages **differentiable neural rendering** on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, covering **268 tasks** across **8 simulators** with diverse policies in both single-task and language-conditioned multi-task scenarios.

:partying_face: **NEWS**:

- *Oct. 2024:* Codebase and pre-trained checkpoints are released! Paper is available on [arXiv](https://arxiv.org/abs/2410.08208).

## :clipboard: Contents

- [Project Structure](#telescope-project-structure)
- [Installation](#hammer-installation)
- [Usage](#star2-usage)
- [Pre-Training](#rocket-pre-training)
- [SPA Large-Scale Evaluation](#bulb-spa-large-scale-evaluation)
- [Gotchas](#tada-gotchas)
- [License](#books-license)
- [Acknowledgement](#sparkles-acknowledgement)
- [Citation](#pencil-citation)

## :telescope: Project Structure

Our codebase draws significant inspiration from the excellent [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template). The directory structure of this project is organized as follows:

<details>
<summary><b>Show directory structure</b></summary>

```
├── .github                <- Github Actions workflows
├── configs                <- Hydra configs
│   ├── callbacks          <- Callbacks configs
│   ├── data               <- Data configs
│   ├── debug              <- Debugging configs
│   ├── experiment         <- Experiment configs
│   ├── extras             <- Extra utilities configs
│   ├── hydra              <- Hydra configs
│   ├── local              <- Local configs
│   ├── logger             <- Logger configs
│   ├── model              <- Model configs
│   ├── paths              <- Project paths configs
│   ├── trainer            <- Trainer configs
│   │
│   └── train.yaml         <- Main config for training
├── data                   <- Project data
├── logs                   <- Logs generated by hydra and lightning loggers
├── scripts                <- Shell or Python scripts
│
├── spa                    <- Source code of SPA
│   ├── data               <- Data scripts
│   ├── models             <- Model scripts
│   ├── utils              <- Utility scripts
│   │
│   └── train.py           <- Run SPA pre-training
├── .gitignore             <- List of files ignored by git
├── .project-root          <- File for inferring the position of project root directory
├── requirements.txt       <- File for installing python dependencies
├── setup.py               <- File for installing project as a package
└── README.md
```

</details>

## :hammer: Installation
<details>
<summary><b>Basics</b></summary>

```bash
# clone project
git clone https://github.com/HaoyiZhu/SPA.git
cd SPA

# create conda environment
conda create -n spa python=3.11 -y
conda activate spa

# install PyTorch; refer to https://pytorch.org/ for other CUDA versions
# e.g. CUDA 11.8:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# install basic packages
pip3 install -r requirements.txt
```
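
A quick optional sanity check (plain PyTorch, nothing SPA-specific) to confirm the installed build matches your CUDA setup:

```python
# optional sanity check for the PyTorch installation
import torch

print(torch.__version__)          # e.g. a +cu118 build if the CUDA 11.8 wheels were installed
print(torch.cuda.is_available())  # should be True on a machine with a working CUDA setup
```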
</details>

<details>
<summary><b>SPA</b></summary>

```bash
# (optional) if you want to use SPA's volume decoder
cd libs/spa-ops
pip install -e .
cd ../..

# install SPA, so that you can import from anywhere
pip install -e .
```
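
After the editable install, `spa` should be importable from anywhere. A minimal check (the imports below mirror the Usage section):

```python
# minimal import check after `pip install -e .`
import spa
from spa.models import spa_vit_base_patch16, spa_vit_large_patch16

print(spa.__file__)  # should point into your local SPA checkout
```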
</details>

## :star2: Usage

<details open>
<summary><b>Example of Using SPA Pre-trained Encoder </b></summary>

We provide pre-trained SPA weights for feature extraction. The checkpoints are available on [🤗Hugging Face](https://huggingface.co/HaoyiZhu/SPA). You don't need to manually download the weights, as SPA will automatically handle this if needed.

```python
import torch

from spa.models import spa_vit_base_patch16, spa_vit_large_patch16

image = torch.rand((1, 3, 224, 224)) # range in [0, 1]

# Example usage of SPA-Large (recommended)
# or you can use `spa_vit_base_patch16` for SPA-base
model = spa_vit_large_patch16(pretrained=True)
model.eval()

# Freeze the model
model.freeze()

# (Recommended) move to CUDA
image = image.cuda()
model = model.cuda()

# Obtain the [CLS] token
cls_token = model(image) # torch.Size([1, 1024])

# Obtain the reshaped feature map concatenated with [CLS] token
feature_map_cat_cls = model(
    image, feature_map=True, cat_cls=True
)  # torch.Size([1, 2048, 14, 14])

# Obtain the reshaped feature map without [CLS] token
feature_map_wo_cls = model(
    image, feature_map=True, cat_cls=False
)  # torch.Size([1, 1024, 14, 14])
```

> **Note:** The inputs will be automatically resized to `224 x 224` and normalized within the [SPA ViT encoder](spa/models/components/img_backbones/vit.py#L69).
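
As a follow-up, here is a minimal sketch of plugging the frozen SPA encoder into a downstream head. The linear probe and its output dimension are illustrative assumptions, not part of SPA:

```python
import torch
import torch.nn as nn

from spa.models import spa_vit_large_patch16

# Frozen SPA encoder used as a feature extractor
encoder = spa_vit_large_patch16(pretrained=True)
encoder.eval()
encoder.freeze()

# Hypothetical downstream head: a linear probe on the [CLS] token
probe = nn.Linear(1024, 7)  # e.g. a 7-dim action head (illustrative)

# Inputs can be any resolution with values in [0, 1]; they are resized to 224 x 224 internally
images = torch.rand((4, 3, 256, 256))

with torch.no_grad():
    cls_tokens = encoder(images)  # torch.Size([4, 1024])

actions = probe(cls_tokens)  # torch.Size([4, 7])
```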
</details>



## :rocket: Pre-Training

<details>
<summary><b>Example of Pre-Training on ScanNet </b></summary>

We give an example of pre-training SPA on the [ScanNet](http://www.scan-net.org/) v2 dataset.

1) Prepare the dataset:
   - Download the [ScanNet](http://www.scan-net.org/) v2 dataset.
   - Pre-process and extract RGB-D images following [PonderV2](https://github.com/OpenGVLab/PonderV2/blob/main/docs/data_preparation.md#scannet-v2). The preprocessed data should be put under `data/scannet/`.
   - Pre-generate metadata for fast data loading. The following command will generate metadata under `data/scannet/metadata`:
     ```bash
     python scripts/generate_scannet_metadata.py
     ```

2) Run the following command for pre-training. Remember to adjust hyper-parameters such as the number of nodes and GPU devices to match your machines (a single-GPU smoke test is sketched after this list):
   ```bash
   python spa/train.py experiment=spa_pretrain_vitl trainer.num_nodes=5 trainer.devices=8
   ```
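
Before launching the full multi-node job, it can be handy to run a quick single-GPU smoke test. The command below is a sketch that combines the same experiment config with the debug options described in the Gotchas section; adjust it to your setup:

```bash
# hypothetical smoke test: 1 node, 1 GPU, debug-friendly settings
python spa/train.py experiment=spa_pretrain_vitl trainer.num_nodes=1 trainer.devices=1 debug=default
```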

</details>

## :bulb: SPA Large-Scale Evaluation

<details>
<summary><b>TBD</b></summary>

</details>

## :tada: Gotchas

<details>
<summary><b>Override any config parameter from the command line</b></summary>

This codebase is based on [Hydra](https://github.com/facebookresearch/hydra), which allows for convenient configuration overriding:
```bash
python spa/train.py trainer.max_epochs=20 seed=300
```

> **Note**: You can also add new parameters with the `+` sign.

```bash
python spa/train.py +some_new_param=some_new_value
```

</details>

<details>
<summary><b>Train on CPU, GPU, multi-GPU and TPU</b></summary>

```bash
# train on CPU
python spa/train.py trainer=cpu
# train on 1 GPU
python spa/train.py trainer=gpu
# train on TPU
python spa/train.py +trainer.tpu_cores=8
# train with DDP (Distributed Data Parallel) (4 GPUs)
python spa/train.py trainer=ddp trainer.devices=4
# train with DDP (Distributed Data Parallel) (8 GPUs, 2 nodes)
python spa/train.py trainer=ddp trainer.devices=4 trainer.num_nodes=2
# simulate DDP on CPU processes
python spa/train.py trainer=ddp_sim trainer.devices=2
# accelerate training on Mac
python spa/train.py trainer=mps
```

</details>

<details>
<summary><b>Train with mixed precision</b></summary>

```bash
# train with PyTorch native automatic mixed precision (AMP)
python spa/train.py trainer=gpu +trainer.precision=16
```

</details>

<details>
<summary><b>Use different tricks available in Pytorch Lightning</b></summary>

```bash
# gradient clipping may be enabled to avoid exploding gradients
python spa/train.py trainer.gradient_clip_val=0.5
# run the validation loop 4 times during a training epoch
python spa/train.py +trainer.val_check_interval=0.25
# accumulate gradients
python spa/train.py trainer.accumulate_grad_batches=10
# terminate training after 12 hours
python spa/train.py +trainer.max_time="00:12:00:00"
```

> **Note**: PyTorch Lightning provides more than [40 useful trainer flags](https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-flags).

</details>

<details>
<summary><b>Easily debug</b></summary>

```bash
# runs 1 epoch in default debugging mode
# changes logging directory to `logs/debugs/...`
# sets level of all command line loggers to 'DEBUG'
# enforces debug-friendly configuration
python spa/train.py debug=default
# run 1 train, val and test loop, using only 1 batch
python spa/train.py debug=fdr
# print execution time profiling
python spa/train.py debug=profiler
# try overfitting to 1 batch
python spa/train.py debug=overfit
# raise exception if there are any numerical anomalies in tensors, like NaN or +/-inf
python spa/train.py +trainer.detect_anomaly=true
# use only 20% of the data
python spa/train.py +trainer.limit_train_batches=0.2 \
+trainer.limit_val_batches=0.2 +trainer.limit_test_batches=0.2
```

> **Note**: Visit [configs/debug/](configs/debug/) for different debugging configs.

</details>

<details>
<summary><b>Resume training from checkpoint</b></summary>

```bash
python spa/train.py ckpt_path="/path/to/ckpt/name.ckpt"
```

> **Note**: The checkpoint can be either a path or a URL.

> **Note**: Currently, loading a checkpoint doesn't resume the logger experiment, but this will be supported in a future Lightning release.
</details>
<details>
<summary><b>Create a sweep over hyperparameters</b></summary>

```bash
# this will run 9 experiments one after the other,
# each with a different combination of seed and learning rate
python spa/train.py -m seed=100,200,300 model.optimizer.lr=0.0001,0.00005,0.00001
```

> **Note**: Hydra composes configs lazily at job launch time. If you change code or configs after launching a job/sweep, the final composed configs might be impacted.
</details>
<details>
<summary><b>Execute all experiments from a folder</b></summary>

```bash
python spa/train.py -m 'exp_maniskill2_act_policy/maniskill2_task@maniskill2_task=glob(*)'
```

> **Note**: Hydra provides special syntax for controlling the behavior of multiruns. Learn more [here](https://hydra.cc/docs/next/tutorials/basic/running_your_app/multi-run). The command above executes all task experiments from [configs/exp_maniskill2_act_policy/maniskill2_task](configs/experiment/).
</details>
<details>
<summary><b>Execute a run with multiple different seeds</b></summary>

```bash
python spa/train.py -m seed=100,200,300 trainer.deterministic=True
```

> **Note**: `trainer.deterministic=True` makes PyTorch more deterministic but impacts performance.
</details>
For more instructions, refer to the official documentation of [Pytorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), [Hydra](https://github.com/facebookresearch/hydra), and the [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template).

## :books: License

This repository is released under the [MIT license](LICENSE).

## :sparkles: Acknowledgement

Our work is primarily built upon [PonderV2](https://github.com/OpenGVLab/PonderV2), [UniPAD](https://github.com/Nightmare-n/UniPAD), [Pytorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), [Hydra](https://github.com/facebookresearch/hydra), [Lightning Hydra Template](https://github.com/ashleve/lightning-hydra-template), [RLBench](https://github.com/stepjam/RLBench), [PerAct](https://github.com/peract/peract), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), [Meta-World](https://github.com/Farama-Foundation/Metaworld), [ACT](https://github.com/tonyzhaozh/act), [Diffusion Policy](https://github.com/real-stanford/diffusion_policy), [DP3](https://github.com/YanjieZe/3D-Diffusion-Policy), [TIMM](https://github.com/huggingface/pytorch-image-models), [VC1](https://github.com/facebookresearch/eai-vc), and [R3M](https://github.com/facebookresearch/r3m). We extend our gratitude to all these authors for generously open-sourcing their code and for their significant contributions to the community.

Contact [Haoyi Zhu](https://www.haoyizhu.site/) if you have any questions or suggestions.

## :pencil: Citation

```bib
@article{zhu2024spa,
  title   = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
  author  = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
  journal = {arXiv preprint arXiv:2410.08208},
  year    = {2024},
}
```
1 change: 1 addition & 0 deletions configs/__init__.py
# this file is needed here to include configs when building project as a package
26 changes: 26 additions & 0 deletions configs/callbacks/default.yaml
defaults:
  - model_checkpoint
  - early_stopping
  - model_summary
  - rich_progress_bar
  - lr_monitor
  - device_stats_monitor
  # - stochastic_weight_averaging
  - _self_

model_checkpoint:
  dirpath: ${paths.output_dir}/checkpoints
  filename: "epoch_{epoch:03d}"
  monitor: "val/loss"
  mode: "min"
  save_last: true
  auto_insert_metric_name: false
  save_top_k: 3

early_stopping:
  monitor: "val/loss"
  patience: 100
  mode: "min"

model_summary:
  max_depth: -1