Sapiens-Lite

⚡ Introduction

Sapiens-Lite is our optimized "inference-only" solution, offering:

  • Up to 4x faster inference
  • Minimal dependencies
  • Negligible accuracy loss

🚀 Getting Started

  • Set the sapiens_lite code root.

    export SAPIENS_LITE_ROOT=$SAPIENS_ROOT/lite
  • We support lite-inference for multiple GPU architectures, primarily in two modes.

    • MODE=torchscript: All GPUs with PyTorch 2.2+. Inference runs in float32; slower, but closest to the original model's performance.
    • MODE=bfloat16: Optimized mode for A100 GPUs with PyTorch 2.2 or 2.3. Uses FlashAttention for accelerated inference. Coming soon!
  • Note to Windows users: Please use the Python scripts in ./demo instead of ./scripts.

  • Download the checkpoints from Hugging Face.
    Checkpoints are suffixed with "_$MODE.pt2".
    You can download only the checkpoints you need.
    Set $SAPIENS_LITE_CHECKPOINT_ROOT to the path of sapiens_lite_host/$MODE. Checkpoint directory structure:

    sapiens_lite_host/
    ├── torchscript/
    │   ├── pretrain/
    │   │   └── checkpoints/
    │   │       ├── sapiens_0.3b/
    │   │       ├── sapiens_0.6b/
    │   │       ├── sapiens_1b/
    │   │       └── sapiens_2b/
    │   ├── pose/
    │   ├── seg/
    │   ├── depth/
    │   └── normal/
    └── bfloat16/
        ├── pretrain/
        ├── pose/
        ├── seg/
        ├── depth/
        └── normal/
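
To check which checkpoints you have actually downloaded for a given mode, a small helper along these lines may be useful. This is a minimal sketch, not part of the repository: `find_checkpoints` is a hypothetical name, and the only things it assumes are the "_$MODE.pt2" suffix described above and that the checkpoint files live somewhere under the root you pass in.

```python
from pathlib import Path


def find_checkpoints(root, mode):
    """Return every file under root whose name ends with '_<mode>.pt2'.

    Hypothetical helper: it searches recursively, so it makes no
    assumption about how deep the files sit inside each task folder.
    """
    suffix = f"_{mode}.pt2"
    return sorted(p for p in Path(root).rglob("*.pt2") if p.name.endswith(suffix))
```

For example, `find_checkpoints(os.environ["SAPIENS_LITE_CHECKPOINT_ROOT"], "torchscript")` would list the torchscript checkpoints found under the configured root (with `import os` added).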
    

🔧 Installation

Set up the minimal sapiens_lite conda environment (PyTorch >= 2.2):

conda create -n sapiens_lite python=3.10
conda activate sapiens_lite
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install opencv-python tqdm json-tricks

🌟 Sapiens-Lite Inference

Note: For inference in bfloat16 mode:

  • Outputs may differ slightly from the original float32 predictions.
  • The first model run will autotune the model and print the log. Subsequent runs automatically load the tuned model.
  • Because torch.compile needs warmup iterations, speedups improve as the number of images grows, since the one-time warmup cost is amortized over more images.
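
The amortization effect in the last point can be illustrated with simple arithmetic. The sketch below is not a benchmark: the function name and the warmup and per-image timings are made-up values, chosen only to show how the average cost per image falls as the batch of images grows.

```python
def mean_seconds_per_image(num_images, warmup_seconds, steady_seconds_per_image):
    """Average wall-clock time per image when a one-time warmup cost is included."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return (warmup_seconds + num_images * steady_seconds_per_image) / num_images


# Hypothetical timings, for illustration only (not measured on any GPU).
for n in (10, 100, 1000):
    print(n, mean_seconds_per_image(n, warmup_seconds=60.0, steady_seconds_per_image=0.25))
```

With these illustrative numbers, the average drops from 6.25 s per image at 10 images to 0.31 s per image at 1000 images, approaching the steady-state 0.25 s.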

Available tasks:

⚙️ Converting Models to Lite

Obtain a torch.export.ExportedProgram or a TorchScript module from the existing sapiens model checkpoint. Note that this requires the full-install sapiens conda env.

cd $SAPIENS_ROOT/scripts/[pretrain,pose,seg]/optimize/local
./[feature_extracter,keypoints*,seg,depth,normal]_optimizer.sh

For inference:

  • Use demo.AdhocImageDataset wrapped with a DataLoader for image fetching and preprocessing.
  • Use the WorkerPool class to run tasks such as saving predictions and visualizations in parallel worker processes.
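
The idea behind handing the saving step to a pool of workers can be sketched with the standard library alone. The code below is only a stand-in for illustration: it does not use the repository's WorkerPool (whose interface is not shown here), it uses threads from `concurrent.futures` rather than processes, and `save_prediction` and `save_all` are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_prediction(out_dir, image_name, prediction):
    """Write one prediction to <out_dir>/<image stem>.json and return the path."""
    out_path = Path(out_dir) / (Path(image_name).stem + ".json")
    out_path.write_text(json.dumps(prediction))
    return out_path


def save_all(out_dir, named_predictions, max_workers=4):
    """Submit every save to the pool, then wait for all of them to finish."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(save_prediction, out_dir, name, pred)
                   for name, pred in named_predictions]
        # Results come back in submission order, one path per prediction.
        return [f.result() for f in futures]
```

In a real inference loop the predictions would come from the model's output for each batch; here they are any JSON-serializable values paired with an image name.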