This repository is no longer maintained. Please refer to the latest version.
For the RAP dataset, please contact Dangwei Li ([email protected]).
By Ken Yu, under the guidance of Dr. Zhang Zhang and Prof. Kaiqi Huang.
The Weakly-supervised Pedestrian Attribute Localization Network (WPAL-network) is a Convolutional Neural Network (CNN) designed to recognize attributes of objects and to localize them. It is currently developed to recognize attributes of pedestrians only, using the Richly Annotated Pedestrian (RAP) database or the PETA database.
- Clone this repository:

  ```Shell
  # Make sure to clone with --recursive
  git clone --recursive https://github.com/kyu-sz/Weakly-supervised-Pedestrian-Attribute-Localization-Network.git
  ```
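  If the repository was cloned without --recursive, the Caffe submodule can still be fetched afterwards. This is a generic Git sketch; it assumes the project tracks Caffe as a submodule, as the --recursive flag above implies.

  ```Shell
  # Fetch any submodules skipped during a plain clone
  cd Weakly-supervised-Pedestrian-Attribute-Localization-Network
  git submodule update --init --recursive
  ```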
- Build Caffe and pycaffe

  This project uses Python layers for input and other components. When building Caffe, set the WITH_PYTHON_LAYER option to true:

  ```Shell
  WITH_PYTHON_LAYER=1 make all pycaffe -j 8
  ```
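  Alternatively, the same option can be enabled by uncommenting WITH_PYTHON_LAYER := 1 in Caffe's Makefile.config before running make. As a quick sanity check after the build, the sketch below tries to import pycaffe; the submodule directory name caffe-wpal-net is only an assumption for illustration, so adjust the path to wherever Caffe was built.

  ```Shell
  # Make the freshly built pycaffe visible to Python (the path is an assumption)
  export PYTHONPATH=$WPAL_NET_ROOT/caffe-wpal-net/python:$PYTHONPATH
  # A successful import means the Python layers used by this project can be loaded
  python -c "import caffe; print('pycaffe OK')"
  ```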
- Download the RAP database

  To obtain the Richly Annotated Pedestrian (RAP) database, please visit rap.idealtest.org to learn how to request a copy. It consists of two zip files:

  ```Shell
  $RAP/RAP_annotation.zip
  $RAP/RAP_dataset.zip
  ```
- Unzip both archives into the $RAP directory:

  ```Shell
  cd $RAP
  unzip RAP_annotation.zip
  unzip RAP_dataset.zip
  ```
- Create a symlink to the RAP database:

  ```Shell
  cd $WPAL_NET_ROOT/data/dataset/
  ln -s $RAP RAP
  ```
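  A quick, hedged check that the link resolves correctly is sketched below; the exact file names inside the RAP archives are not documented here, so only the directory listing itself is assumed to exist.

  ```Shell
  # The symlink should resolve to the directory holding the unzipped RAP files
  ls -l $WPAL_NET_ROOT/data/dataset/RAP
  ls $WPAL_NET_ROOT/data/dataset/RAP | head
  ```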
To train the model, first fetch a pretrained VGG_CNN_S model:

```Shell
./data/scripts/fetch_pretrained_vgg_cnn_s_model.sh
```
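To confirm the download succeeded, you can search for the fetched weights; where the script stores them is not assumed here, so the sketch below searches rather than hard-coding a path.

```Shell
# Look for the downloaded VGG_CNN_S weights anywhere under data/
find $WPAL_NET_ROOT/data -name '*.caffemodel'
```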
Then run the experiment script for training:

```Shell
./experiments/examples/VGG_CNN_S/train_vgg_s_rap_0.sh
```
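Training can take a long time, so it is often convenient to keep a log of the run. This is plain shell redirection rather than a feature of the script itself:

```Shell
# Save console output to a log file while still printing it to the terminal
./experiments/examples/VGG_CNN_S/train_vgg_s_rap_0.sh 2>&1 | tee train_vgg_s_rap_0.log
```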
An experiment script for testing is also available:

```Shell
./experiments/examples/VGG_CNN_S/test_vgg_s_rap.sh
```
The project layout and some of the code are derived from Ross Girshick's py-faster-rcnn.
We use VGG_CNN_S as the pretrained model. Information about it can be found in K. Simonyan's Gist. It is from the BMVC 2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets":
Return of the Devil in the Details: Delving Deep into Convolutional Nets
K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman
British Machine Vision Conference, 2014 (arXiv:1405.3531)