Sapiens-Lite is our optimized "inference-only" solution, offering:
- Up to 4x faster inference
- Minimal dependencies
- Negligible accuracy loss
- Set the sapiens_lite code root.
  export SAPIENS_LITE_ROOT=$SAPIENS_ROOT/lite
- We support lite-inference for multiple GPU architectures, primarily in two modes.
  - MODE=torchscript: All GPUs with PyTorch 2.2+. Inference at float32, slower but closest to original model performance.
  - MODE=bfloat16: Optimized mode for A100 GPUs with PyTorch 2.2 or 2.3. Uses FlashAttention for accelerated inference. Coming Soon!
- Note to Windows users: Please use the python scripts in ./demo instead of ./scripts.
- Please download the checkpoints from hugging-face.
Checkpoints are suffixed with "_$MODE.pt2".
You can be selective about only downloading the checkpoints of interest.
Set $SAPIENS_LITE_CHECKPOINT_ROOT to the path of sapiens_lite_host/$MODE.
Checkpoint directory structure:
    sapiens_lite_host/
    ├── torchscript/
    │   ├── pretrain/
    │   │   └── checkpoints/
    │   │       ├── sapiens_0.3b/
    │   │       ├── sapiens_0.6b/
    │   │       ├── sapiens_1b/
    │   │       └── sapiens_2b/
    │   ├── pose/
    │   ├── seg/
    │   ├── depth/
    │   └── normal/
    └── bfloat16/
        ├── pretrain/
        ├── pose/
        ├── seg/
        ├── depth/
        └── normal/
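To make the layout above concrete, here is a minimal sketch of how checkpoints for one mode could be located under $SAPIENS_LITE_CHECKPOINT_ROOT. The helper name find_checkpoints and the fallback path are illustrative assumptions and not part of sapiens_lite; only the "_$MODE.pt2" suffix and the top-level task directories come from the description above. Because the nesting below each task directory is not spelled out for every task, the sketch searches recursively instead of assuming a fixed depth.

```python
import os
from pathlib import Path

# The two modes named in this README.
VALID_MODES = ("torchscript", "bfloat16")


def find_checkpoints(checkpoint_root, mode, task=None):
    """Return sorted paths of files ending in '_<mode>.pt2'.

    checkpoint_root: directory that $SAPIENS_LITE_CHECKPOINT_ROOT points to,
                     i.e. sapiens_lite_host/<mode>.
    task:            optional sub-directory such as 'pose' or 'seg'.
    (Hypothetical helper, shown for illustration only.)
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    root = Path(checkpoint_root)
    if task is not None:
        root = root / task
    if not root.is_dir():
        return []
    # Search recursively, since the depth below each task may differ.
    return sorted(root.rglob(f"*_{mode}.pt2"))


if __name__ == "__main__":
    mode = os.environ.get("MODE", "torchscript")
    # Fallback path is an assumption used when the variable is unset.
    root = os.environ.get(
        "SAPIENS_LITE_CHECKPOINT_ROOT", f"sapiens_lite_host/{mode}"
    )
    for path in find_checkpoints(root, mode):
        print(path)
```

Listing what is on disk this way is a quick check that $SAPIENS_LITE_CHECKPOINT_ROOT and $MODE agree before running any of the scripts.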
Set up the minimal sapiens_lite conda environment (pytorch >= 2.2):
conda create -n sapiens_lite python=3.10
conda activate sapiens_lite
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install opencv-python tqdm json-tricks
Note: For inference in bfloat16 mode:
- Outputs may vary slightly from the original float32 predictions.
- The first model run will autotune the model and print the log. Subsequent runs automatically load the tuned model.
- Due to torch.compile warmup iterations, you'll observe better speedups with a larger number of images, thanks to amortization.
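The amortization point can be illustrated with simple arithmetic. The sketch below treats warmup as a single fixed cost followed by a constant per-image time, which is a simplifying assumption. The numbers (60 s of warmup, 0.05 s per image) are made up for illustration and are not measurements of Sapiens-Lite.

```python
def mean_time_per_image(num_images, warmup_s, steady_s_per_image):
    """Average wall-clock seconds per image when a one-off warmup cost
    is spread over num_images images (simplified, hypothetical model)."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    total = warmup_s + num_images * steady_s_per_image
    return total / num_images


if __name__ == "__main__":
    # Illustrative values only; real warmup and per-image times will differ.
    for n in (10, 100, 10_000):
        avg = mean_time_per_image(n, warmup_s=60.0, steady_s_per_image=0.05)
        print(f"{n:>6} images: {avg:.3f} s/image on average")
```

Under this model the average approaches the steady per-image time as the number of images grows, which is why small batches of images understate the speedup.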
Available tasks:
Obtain a torch.export.ExportedProgram or torchscript from the existing sapiens model checkpoint. Note that this requires the full-install sapiens conda env.
cd $SAPIENS_ROOT/scripts/[pretrain,pose,seg]/optimize/local
./[feature_extracter,keypoints*,seg,depth,normal]_optimizer.sh
For inference:
- Use demo.AdhocImageDataset wrapped with a DataLoader for image fetching and preprocessing.
- Utilize the WorkerPool class for multiprocessing capabilities in tasks like saving predictions and visualizations.
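The division of labor described above, where the main loop produces predictions while a pool of workers handles saving, can be sketched with the standard library alone. This is a stand-in built on concurrent.futures.ThreadPoolExecutor, not the repository's WorkerPool or AdhocImageDataset, whose exact interfaces are not shown here. The names run_inference, save_prediction and process_batches, and the JSON output format, are hypothetical placeholders.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_inference(batch):
    """Placeholder for a model forward pass (hypothetical).
    Returns one small dict per image name in the batch."""
    return [{"image": name, "num_chars": len(name)} for name in batch]


def save_prediction(out_dir, name, prediction):
    """Write one prediction to <out_dir>/<name>.json and return the path."""
    path = Path(out_dir) / f"{name}.json"
    path.write_text(json.dumps(prediction))
    return path


def process_batches(batches, out_dir, num_workers=4):
    """Run inference in the calling thread and hand saving to a pool,
    so disk writes overlap with the next batch's inference."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    futures = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for batch in batches:
            predictions = run_inference(batch)
            for name, pred in zip(batch, predictions):
                futures.append(pool.submit(save_prediction, out_dir, name, pred))
    # Leaving the 'with' block waits for all pending saves to finish.
    return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = process_batches([["img_a", "img_b"], ["img_c"]], tmp)
        print(f"saved {len(saved)} predictions")
```

Threads are used here because writing files is I/O-bound. For CPU-heavy work such as rendering visualizations, a process pool is the closer analogue to what the text describes.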