This document outlines how to use the code in this repository on a general dataset (which could be your own). For an example, see the additional code and documentation for the addition of the CMU Panoptic Studio dataset at `mvn/datasets/cmu_preprocessing/README.md`.
Note that this document only covers the setup for a general dataset. The next steps are likely to test or train on your dataset; consult the respective documents for testing and training.
There are 4 main parts that you need to complete before you can fully do testing/training:

1. Generate a labels file (`.npy`) containing all the necessary data the algorithm needs, as listed in the requirements section below. This is done using a `generate-labels-npy.py` script under `mvn/datasets/<your_dataset>`, specific to the dataset and how the data is organised. Part of this label file generation also includes generating a consolidated `npy` file with the BBOX data; this may be done separately using another Python script.
2. Create a subclass of the PyTorch `Dataset` class that loads information specific to your dataset, as organised in your `npy` labels file. This should be in `mvn/datasets/`.
3. Create config files under the `experiments` folder that tell the algorithm how to handle your data.
4. Update the `train.py` (or `demo.py`) file.
- Setup for a General Dataset
  - 1. Generating the Labels
  - 2. Dataset Subclass
  - 3. Config Files
  - 4. Modifying main algorithm files
## 1. Generating the Labels

### Requirements

For testing (and training), you will need the following data:
Preferably, the data should be organised similarly to the CMU Panoptic Studio dataset, where the data is grouped by action/scene > camera > person.
Specifically, it would be good if the data is organised as below. Of course, the data does not necessarily have to be in the exact format; you would just need to make the appropriate changes to the respective label generation and dataset subclass files.
```
$DIR_ROOT/[ACTION_NAME]/hdImgs/[CAMERA_ID]/[FRAME_ID].jpg
```
```
$DIR_ROOT/[ACTION_NAME]/calibration_[ACTION_NAME].json
```

The JSON data should have this format, with the camera IDs in their appropriate order, or labelled accordingly:
```
[
    {
        'id': 0, // optional
        'R': [ /* 3x3 rotation matrix */ ],
        'k': [ /* 3x3 calibration/intrinsics matrix */ ],
        't': [ /* 3x1 translation matrix */ ],
        'dist': [ /* 5x1 distortion coefficients */ ]
    },
    {
        'id': 1, // optional
        'R': [ /* 3x3 rotation matrix */ ],
        'k': [ /* 3x3 calibration/intrinsics matrix */ ],
        't': [ /* 3x1 translation matrix */ ],
        'dist': [ /* 5x1 distortion coefficients */ ]
    },
    {
        // ...
    }
]
```
More information on distortion coefficients here.
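As a quick sanity check, the calibration file can be loaded and converted to numpy arrays along these lines (a minimal sketch; the file path is a placeholder and the shape checks are only illustrative, not part of the repository's API):

```python
import json
import numpy as np

# Hypothetical path, following the layout described above
calib_path = "path/to/[ACTION_NAME]/calibration_[ACTION_NAME].json"

with open(calib_path) as f:
    cameras = json.load(f)

for cam in cameras:
    R = np.array(cam['R'])        # 3x3 rotation
    K = np.array(cam['k'])        # 3x3 intrinsics
    t = np.array(cam['t'])        # 3x1 translation
    dist = np.array(cam['dist'])  # 5 distortion coefficients
    assert R.shape == (3, 3) and K.shape == (3, 3)
    print(cam.get('id', '?'), R.shape, K.shape, t.shape, dist.shape)
```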
```
$DIR_ROOT/[ACTION_NAME]/3DKeypoints_[FRAME_ID].json
```

The JSON data should have the following format:
```
[
    {
        'id': [ /* PERSON_ID */ ],
        'joints': [ /* ARRAY OF JOINT COORDINATES IN COCO 19 FORMAT */ ]
    },
    {
        // ...
    }
]
```
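Similarly, a per-frame keypoints file can be inspected with a few lines of Python (a sketch only; the path is a placeholder, and the reshape to 19 rows simply assumes the COCO 19 layout mentioned above):

```python
import json
import numpy as np

# Hypothetical path following the layout described above
kp_path = "path/to/[ACTION_NAME]/3DKeypoints_00000000.json"

with open(kp_path) as f:
    people = json.load(f)

for person in people:
    person_id = person['id']
    # COCO 19 format: 19 joints; one row per joint, however many values per joint
    joints = np.array(person['joints'], dtype=np.float32).reshape(19, -1)
    print(person_id, joints.shape)
```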
For the BBOX data, there are 2 inherent parts: the algorithm (MRCNN or SSD) that figures out the actual bounding boxes, and a script that consolidates the said data into a single labels file.
This repository does not contain any algorithm to detect persons in the scene; for now, you need to find your own. Popular algorithms include Mask-RCNN (MRCNN) and Single Shot Detectors (SSD). Current SOTA frameworks include Detectron2 and MMDetection.
UPDATE: I have used Detectron to generate BBOXes for the CMU Panoptic Studio dance dataset here.
The data should ideally be organised by action/scene > camera ID > person ID with a JSON file containing an array of BBOXes in order of frame number.
A Python script is needed to consolidate the bounding box labels. More information is given below.
I have included template Python scripts for a dataset called `ExampleDataset`, which can be modified accordingly for your use. Parts which require attention have been marked with `TODO` statements. The scripts are modified directly from the relevant files for the CMU Panoptic dataset; you can reference those scripts too.
Modify the `./mvn/datasets/example_preprocessing/collect-bboxes-npy.py` script. This script is used to generate an `npy` file that consolidates the BBOX data needed for the labels generation script. In the file are `TODO` statements which will point out what needs to be changed, and where.
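For orientation, the consolidation step conceptually does something like the following. This is a simplified single-person sketch, not the actual script: the directory layout, JSON file name and output structure are illustrative assumptions, so follow the `TODO`s in `collect-bboxes-npy.py` for the real format.

```python
import json
import os
import numpy as np

bbox_root = "path/to/bboxes"  # hypothetical: per-action / per-camera folders of BBOX JSONs

# nested dict: action -> camera -> array of per-frame boxes [x1, y1, x2, y2]
bboxes = {}

for action in sorted(os.listdir(bbox_root)):
    action_dir = os.path.join(bbox_root, action)
    bboxes[action] = {}
    for camera in sorted(os.listdir(action_dir)):
        json_path = os.path.join(action_dir, camera, "bboxes.json")  # assumed file name
        with open(json_path) as f:
            per_frame = json.load(f)  # assumed: list of boxes, in frame order
        bboxes[action][camera] = np.array(per_frame, dtype=np.float32)

# np.save pickles the dict; load it later with np.load(..., allow_pickle=True).item()
np.save("example-bboxes.npy", bboxes)
```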
Modify the `./mvn/datasets/example_preprocessing/generate-labels-npy.py` script. This script is used to generate an `npy` file containing all the information needed for the dataset subclass Python file to parse. In the file are `TODO` statements which will point out what needs to be changed, and where.

In particular, note that if you only want to do testing (no training) and have no ground truth keypoint data, you have to remove the `keypoints` field accordingly.
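Once generated, it can be useful to sanity-check the labels file before writing the dataset subclass. A minimal sketch, assuming the labels are stored as a pickled dictionary (the file name here is hypothetical and the exact keys depend on your `generate-labels-npy.py`):

```python
import numpy as np

# Hypothetical output path of generate-labels-npy.py
labels = np.load("example-multiview-labels-bboxes.npy", allow_pickle=True).item()

# Inspect the top-level structure; the actual keys are defined by your script
print(type(labels), list(labels.keys()) if isinstance(labels, dict) else labels)
```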
## 2. Dataset Subclass

An example of the dataset subclass is found in `./mvn/datasets/example_dataset.py`. Just follow the `TODO` comments in the `example_dataset.py` file and modify accordingly.
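For reference, the general shape of such a subclass is just the standard PyTorch `Dataset` interface. A bare-bones sketch follows; the constructor arguments and label keys are illustrative assumptions, not the repository's exact signature:

```python
import numpy as np
from torch.utils.data import Dataset

class ExampleDataset(Dataset):
    """Bare-bones sketch of a multiview dataset backed by an npy labels file."""

    def __init__(self, example_root, labels_path, image_shape=(256, 256)):
        self.example_root = example_root
        self.image_shape = image_shape
        # assumed: labels saved as a pickled dict by generate-labels-npy.py
        self.labels = np.load(labels_path, allow_pickle=True).item()

    def __len__(self):
        # assumed key; use whatever your labels file actually stores per frame
        return len(self.labels['table'])

    def __getitem__(self, idx):
        # For each camera view: load the image, its bbox crop, and the camera
        # calibration; return them together with any ground-truth keypoints.
        sample = {}
        # ... fill in images, cameras, bboxes and (optionally) keypoints here ...
        return sample
```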
## 3. Config Files

There are also `.yaml` config files, used together with the above subclass file, in the `./experiments/example` folder that require appropriate modification. Again, follow the `TODO` comments in the respective YAML files. In particular, you need to update the file paths in the individual config files.

There is also an `example_frames.yaml` file which allows you to specify the train/val splits by action > person > frames. This is an optional file. The config files that you will most likely use are the train/val ones.
Feel free to add more options to the config files, and then change the `train.py` file accordingly.
## 4. Modifying main algorithm files

After setting up your dataset subclass and config files, you need to let the `train.py` file "know about" your new dataset. If you directly run your dataset config files, you will get a `NotImplementedError`, which you need to fix by implementing your dataset. In this example, we assume that your dataset is named `example`; this name is set in the config files above.
Moreover, if you are testing with weights pretrained on H36M, there may be some differences between your dataset and the pretrained one, which may require modifications to other sections. For example, there may be differences in units (mm vs cm), or you may have different axes. See the issues here and here for reference.
NOTE: If you know that you are only performing testing (i.e. you have no ground truth), then you should modify `demo.py` instead.
The first thing you need to do is to `import` your dataset subclass at the top of `train.py`:
```python
from mvn.datasets import human36m, cmupanoptic, example_dataset  # name of your dataset
```
After that, you need to set up your dataset for loading. Under the `setup_dataloaders` function, you need to add the following:
```python
# Change according to name of dataset
elif config.dataset.kind == 'example':
    train_dataloader, val_dataloader, train_sampler = setup_example_dataloaders(config, is_train, distributed_train)
```
Then, you will need to create the actual `setup_example_dataloaders` function. This is essentially a copy-paste of the `setup_cmu_dataloaders` function, with the names of the dataset subclasses modified:
```python
def setup_example_dataloaders(config, is_train, distributed_train):
    train_dataloader = None
    if is_train:
        # train
        train_dataset = example_dataset.ExampleDataset(
            example_root=config.dataset.train.example_root,
            # ...
        )
        # ...

    # val
    val_dataset = example_dataset.ExampleDataset(
        example_root=config.dataset.val.example_root,
        # ...
    )
    # ...

    return train_dataloader, val_dataloader, train_sampler
```
Note that it does not matter whether or not you have keypoint ground truth data, or whether you intend to use it for training; the code will simply ignore it later where appropriate.
It may be possible that your world coordinate system is different from that of the dataset used for the pretrained weights (Human 3.6M by default). Please refer to these issues here and here to check. If this is the case, you may need to change code in `triangulation.py`.
In `triangulation.py`, search for the comment `# different world coordinates` or the variable `self.transfer_cmu_to_human36m`. You should find code similar to the following:
```python
# transfer
if self.transfer_cmu_to_human36m or self.kind == "cmu":  # different world coordinates
    coord_volume = coord_volume.permute(0, 2, 1, 3)
    inv_idx = torch.arange(coord_volume.shape[1] - 1, -1, -1).long().to(device)
    coord_volume = coord_volume.index_select(1, inv_idx)
    # print("Using different world coordinates")
```
Similarly, you need to write code which changes the world coordinates if necessary, based on the `self.kind` parameter set in the config file.
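For instance, a branch for the hypothetical `example` kind could mirror the CMU case above (a sketch only; whether you need this axis permutation and flip, or some other transform, depends entirely on how your world coordinates differ from Human 3.6M's):

```python
# Hypothetical branch for a dataset whose world coordinates differ from Human 3.6M
elif self.kind == "example":  # different world coordinates
    # swap the x/y axes of the volume and flip one axis, as done for CMU;
    # adjust to whatever transform your coordinate system actually needs
    coord_volume = coord_volume.permute(0, 2, 1, 3)
    inv_idx = torch.arange(coord_volume.shape[1] - 1, -1, -1).long().to(device)
    coord_volume = coord_volume.index_select(1, inv_idx)
```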