CUHK03 is the first person re-identification dataset that is large enough for deep learning. It provides bounding boxes both detected by deformable part models (DPM) and labeled manually. This dataset was introduced in 2014 in this article. The original file of this dataset in .mat format can be downloaded here.
Images of this dataset in .jpg format can be downloaded here.
This dataset was collected on the campus of The Chinese University of Hong Kong (CUHK). The data is stored in a MATLAB file named cuhk03.mat. The CUHK03 dataset includes 14,097 images of 1,467 pedestrians.
Each identity is observed by 2 cameras and has 4.8 images per camera on average. Two types of person images are provided: manually labeled pedestrian bounding boxes (labeled) and bounding boxes automatically detected by the deformable-part-model detector (detected). The manually labeled images are generally of higher quality than the detected ones.
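A quick check of the figures above shows that the 4.8 average only works out per camera view, not per identity:

```python
# Figures quoted in the text for CUHK03.
NUM_IMAGES = 14_097
NUM_IDENTITIES = 1_467
NUM_CAMERAS_PER_IDENTITY = 2

per_identity = NUM_IMAGES / NUM_IDENTITIES
per_camera = per_identity / NUM_CAMERAS_PER_IDENTITY
print(f"{per_identity:.2f} images per identity, {per_camera:.2f} per camera view")
# prints: 9.61 images per identity, 4.80 per camera view
```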
Because we want to use this dataset in TensorFlow/Keras, we need to convert the .mat data to .jpg files.
The following image shows the cells of the cuhk03.mat file:
- detected: bounding boxes automatically detected by the deformable-part-model detector.
- labeled: manually labeled pedestrian bounding boxes.
- testsets: 20 random test settings.
The following image shows the structure of detected:
For example, 843 is the number of identities in that cell, and 10 is the number of frames (image slots) stored for each identity.
The structure of labeled is similar to that of detected, and the difference in image size does not matter here. The following image shows the structure of labeled:
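A minimal conversion sketch for either cell follows. It rests on several assumptions that are not stated in the text. First, cuhk03.mat is taken to be a MATLAB v7.3 (HDF5) file, which is how common loaders treat it, so it is opened with h5py rather than `scipy.io.loadmat`. Second, each camera-pair cell is assumed to be an N × 10 array whose columns 0-4 hold images from one camera and columns 5-9 from the other. Third, Pillow is used to write the JPEGs. The `image_name` naming scheme is hypothetical.

```python
import os
from typing import Optional

import numpy as np


def to_rgb(raw: np.ndarray) -> Optional[np.ndarray]:
    """Turn one image read from the v7.3 .mat file into an H x W x 3 array.

    h5py returns MATLAB arrays with their axes reversed, so an H x W x 3 image
    arrives as 3 x W x H; transposing restores the original order. Empty cells
    (identities with fewer than 10 images) come back as small 1-D arrays and
    are reported as None.
    """
    if raw.ndim != 3:
        return None
    return np.ascontiguousarray(raw.T)


def image_name(pair: int, pid: int, slot: int) -> str:
    """Hypothetical naming scheme: pair_identity_view_slot.jpg (all 1-based).

    Assumes slots 0-4 come from the first camera of the pair and 5-9 from the second.
    """
    view = 1 if slot < 5 else 2
    return f"{pair + 1}_{pid + 1:03d}_{view}_{slot + 1:02d}.jpg"


def convert(mat_path: str, key: str, out_dir: str) -> int:
    """Write every image under `key` ("detected" or "labeled") as a JPEG.

    Returns the number of images written.
    """
    # Imported here so the helpers above can be used without h5py/Pillow.
    import h5py
    from PIL import Image

    os.makedirs(out_dir, exist_ok=True)
    written = 0
    with h5py.File(mat_path, "r") as mat:
        # mat[key] holds one object reference per camera-pair cell.
        for pair, pair_ref in enumerate(mat[key][0]):
            cells = mat[pair_ref][()].T  # N x 10 array of image references
            for pid in range(cells.shape[0]):
                for slot in range(cells.shape[1]):
                    img = to_rgb(mat[cells[pid, slot]][()])
                    if img is None:  # empty slot, nothing to save
                        continue
                    Image.fromarray(img).save(
                        os.path.join(out_dir, image_name(pair, pid, slot)))
                    written += 1
    return written
```

Usage would be, for example, `convert("cuhk03.mat", "labeled", "cuhk03_labeled")`. The camera view is encoded in the file name so that a later TensorFlow/Keras input pipeline can still tell the two views apart.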
The structure of testsets is shown below:
There are 20 different test settings in testsets, each containing 100 randomly selected identities. Each of these 20 settings is saved to a separate .csv file.
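Exporting the 20 settings could look like the sketch below. It again assumes the v7.3 (HDF5) layout read through h5py. It also assumes that each setting is a 100 × 2 matrix whose columns give the camera-pair index and the identity index within that pair. The file names (`testset_01.csv` ... `testset_20.csv`) and the CSV header are hypothetical.

```python
import csv
import os


def write_testset_csv(rows, path):
    """Write one test setting (rows of [pair, identity]) to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["pair", "identity"])  # hypothetical header
        # MATLAB stores the indices as doubles; write them as integers.
        writer.writerows([int(v) for v in row] for row in rows)


def export_testsets(mat_path, out_dir):
    """Save each test setting as testset_01.csv, testset_02.csv, ..."""
    import h5py  # imported here so write_testset_csv works without h5py

    os.makedirs(out_dir, exist_ok=True)
    with h5py.File(mat_path, "r") as mat:
        for i, ref in enumerate(mat["testsets"][0]):
            rows = mat[ref][()].T  # assumed 100 x 2 after undoing h5py's axis order
            write_testset_csv(rows, os.path.join(out_dir, f"testset_{i + 1:02d}.csv"))
```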