# Pre-Processing the Original Dataset

## 1. Download the data

|       | train    | validation | test     |
|-------|----------|------------|----------|
| 1 Mpx | download | download   | download |
| crc32 | d677488a | 72f13c3e   | 643e61ef |
| Gen1  | download | download   | download |
| crc32 | 3d23bd30 | cc802022   | cdd4fd69 |
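After downloading, the archives can be checked against the crc32 values in the table. A minimal sketch (the filename `gen1_train.tar` is a placeholder; substitute the actual archive name):

```python
import zlib

def file_crc32(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the crc32 of a file in streaming fashion, returned as 8 hex digits."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"

# Compare against the table above, e.g. (placeholder filename):
# assert file_crc32("gen1_train.tar") == "3d23bd30"
```

Streaming in chunks keeps memory usage flat even for multi-GB archives.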

## 2. Extract the tar files

The following directory structure is assumed:

```
data_dir
├── test
│   ├── ..._bbox.npy
│   ├── ..._td.dat.h5
│   ...
│
├── train
│   ├── ..._bbox.npy
│   ├── ..._td.dat.h5
│   ...
│
└── val
    ├── ..._bbox.npy
    ├── ..._td.dat.h5
    ...
```
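Before running the pre-processing script, it can help to sanity-check that all three splits are in place. A small sketch (`check_layout` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

def check_layout(data_dir: str) -> dict:
    """Count label (.npy) and event (.h5) files in each expected split directory."""
    counts = {}
    for split in ("train", "val", "test"):
        split_dir = Path(data_dir) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = {
            "labels": len(list(split_dir.glob("*.npy"))),
            "events": len(list(split_dir.glob("*.h5"))),
        }
    return counts
```

Each sequence should contribute one `_bbox.npy` label file and one `_td.dat.h5` event file, so the two counts per split should match.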

## 3. Run the pre-processing script

`${DATA_DIR}` should point to the root of the directory structure above. `${DEST_DIR}` should point to the directory to which the pre-processed data will be written.

For the 1 Mpx dataset:

```bash
NUM_PROCESSES=20  # set to the number of parallel processes to use
python preprocess_dataset.py ${DATA_DIR} ${DEST_DIR} conf_preprocess/representation/stacked_hist.yaml \
conf_preprocess/extraction/const_duration.yaml conf_preprocess/filter_gen4.yaml -ds gen4 -np ${NUM_PROCESSES}
```

For the Gen1 dataset:

```bash
NUM_PROCESSES=20  # set to the number of parallel processes to use
python preprocess_dataset.py ${DATA_DIR} ${DEST_DIR} conf_preprocess/representation/stacked_hist.yaml \
conf_preprocess/extraction/const_duration.yaml conf_preprocess/filter_gen1.yaml -ds gen1 -np ${NUM_PROCESSES}
```
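To spot-check the source labels before or after pre-processing, the `_bbox.npy` files can be loaded directly with NumPy. A sketch; the exact field names depend on the dataset's bounding-box format, so treat the ones printed by `labels.dtype.names` as authoritative rather than any list you expect:

```python
import numpy as np

def load_labels(path: str) -> np.ndarray:
    """Load a structured bounding-box label array and report its schema."""
    labels = np.load(path)
    print(f"{len(labels)} boxes, fields: {labels.dtype.names}")
    return labels
```

Structured label arrays like these load without `allow_pickle`, since they contain only plain numeric fields.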