Large-Scale Pre-Training for Dual-Accelerometer Human Activity Recognition

ntnu-ai-lab/SelfPAB

SelfPAB

Implementations of the SelfPAB, MonoSelfPAB, and LTA2V methods presented in our papers: "SelfPAB: large-scale pre-training on accelerometer data for human activity recognition", "Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition", and "Long-term self-supervised learning for accelerometer-based sleep–wake recognition", respectively.

Preface: Access to HUNT4 pre-training data

The HUNT4 subset used for pre-training can be requested by contacting [email protected]. Ask for the data used in the paper "Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition".

Requirements

  • git-lfs 3.4.1
  • Python 3.8.10

pip install -r requirements.txt

Note: By default, the scripts try to run training on a GPU. If no supported GPU can be allocated, a MisconfigurationException is raised. All training scripts can instead be run on the CPU by setting the "NUM_GPUS" parameter in the corresponding config.yml file to an empty list:

NUM_GPUS: []
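
If you would rather switch a config to CPU from a script than edit it by hand, the following is a minimal sketch. It assumes that NUM_GPUS is written as a single top-level line in config.yml, as in the line above. The helper name force_cpu and the BATCH_SIZE key in the example are hypothetical and not part of this repository.

```python
import re


def force_cpu(config_text: str) -> str:
    """Return config_text with the top-level NUM_GPUS entry set to an empty list.

    Assumption: NUM_GPUS sits on one unindented line, e.g. "NUM_GPUS: [0]".
    """
    pattern = re.compile(r"^NUM_GPUS:.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn("NUM_GPUS: []", config_text)
    if count == 0:
        raise ValueError("no top-level NUM_GPUS entry found")
    return new_text


if __name__ == "__main__":
    # BATCH_SIZE is a made-up key, used only to show that other lines are kept.
    example = "BATCH_SIZE: 64\nNUM_GPUS: [0]\n"
    print(force_cpu(example))
```

The function works on the file's text rather than parsing YAML, so comments and key order in the rest of the file are left untouched.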

Download Datasets

Download the required datasets. Currently, the downstream datasets HARTH, HAR70+, PAMAP2, USC-HAD, and DualSleep are supported. The HUNT4 pre-training data can be requested by contacting [email protected] (see Preface).

python download_dataset.py <dataset_name>
# Example (HARTH): python download_dataset.py harth
# Example (HAR70+): python download_dataset.py har70plus
# Example (PAMAP2): python download_dataset.py pamap2
# Example (USC-HAD): python download_dataset.py uschad
# Example (DualSleep): python download_dataset.py dualsleep

This command will download the given dataset into the data/ folder.
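
To fetch several downstream datasets in one go, the command above can be wrapped in a small loop. This is a sketch only: the dataset names are the ones listed in the examples above, while the helper names download_command and download_all and the dry_run flag are hypothetical. It assumes the script is run from the repository root.

```python
import subprocess
import sys

# Dataset names taken from the examples above.
DATASETS = ["harth", "har70plus", "pamap2", "uschad", "dualsleep"]


def download_command(name):
    """Build the argument list for downloading one dataset."""
    if name not in DATASETS:
        raise ValueError(f"unsupported dataset: {name}")
    return [sys.executable, "download_dataset.py", name]


def download_all(names=DATASETS, dry_run=True):
    """Print the download commands, or run them one by one if dry_run is False."""
    for name in names:
        cmd = download_command(name)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failed download.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download_all(dry_run=True)
```

With dry_run=True the loop only prints the commands, which is a cheap way to check them before starting the downloads.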

Upstream Pre-training

After unzipping the pre-training HUNT4 data and storing it in data/hunt4, the upstream training can be started with:

python upstream.py -p <path/to/config.yml> -d <path/to/dataset>
# Example (SelfPAB): python upstream.py -p params/selfPAB_upstream/config.yml -d data/hunt4/
# Example (MonoSelfPAB): python upstream.py -p params/MonoSelfPAB_upstream/config.yml -d data/hunt4/

The params/selfPAB_upstream/config.yml, params/MonoSelfPAB_upstream/config.yml, and params/LTA2V_upstream/config.yml files contain the upstream model hyperparameters presented in our papers for SelfPAB, MonoSelfPAB, and LTA2V, respectively.

After upstream pre-training, the model with the best validation loss is stored as a .ckpt file. This saved model can then be used as a feature extractor for downstream training.

  • params/selfPAB_upstream/upstream_model.ckpt is the transformer encoder that we pre-trained on 100,000 hours of unlabeled HUNT4 data using the "masked reconstruction" auxiliary task, as presented in our paper: Large-Scale Pre-Training for Dual-Accelerometer Human Activity Recognition.
  • params/MonoSelfPAB_upstream/upstream_model.ckpt is the transformer encoder that we pre-trained on 100,000 hours of unlabeled HUNT4 data using the "randomized cross-sensor masked reconstruction (RCSMR)" auxiliary task, as presented in our paper: Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition.
  • params/LTA2V_upstream/upstream_model.ckpt is the transformer encoder that we pre-trained on 821,700 hours of unlabeled HUNT4 data using the "masked reconstruction" auxiliary task with long-term (30 hours) spectrograms and global positional encoding, as presented in our paper: Long-term self-supervised learning for accelerometer-based sleep–wake recognition.
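
Since git-lfs is listed under Requirements, the shipped checkpoints are presumably stored with Git LFS (an assumption, not stated explicitly here). If the repository was cloned without LFS, a .ckpt file may be only a small text pointer instead of the real weights. The sketch below flags such files. The helper names are hypothetical; the pointer prefix is the first line of the Git LFS pointer file format.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_checkpoints(root="params"):
    """Return the .ckpt files under root that are still LFS pointers."""
    root = Path(root)
    if not root.is_dir():
        return []
    return [p for p in sorted(root.rglob("*.ckpt")) if is_lfs_pointer(p)]


if __name__ == "__main__":
    for stub in check_checkpoints():
        print(f"{stub} is a Git LFS pointer; run 'git lfs pull' to fetch it")
```

Running it from the repository root prints nothing when all checkpoints have been fetched.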

Downstream Training

A downstream leave-one-subject-out cross-validation (LOSO) / 5-fold cross-validation can be started with:

python loo_cross_validation.py -p <path/to/config.yml> -d <path/to/dataset>
# Example (SelfPAB, HARTH): python loo_cross_validation.py -p params/selfPAB_downstream_experiments/selfPAB_downstream_harth/config.yml -d data/harth/
# Example (MonoSelfPAB, HAR70+[Back]): python loo_cross_validation.py -p params/MonoSelfPAB_downstream_experiments/MonoSelfPAB_downstream_har70_B/config.yml -d data/har70plus/
# Example (MonoSelfPAB, PAMAP2): python loo_cross_validation.py -p params/MonoSelfPAB_downstream_experiments/MonoSelfPAB_downstream_pamap2/config.yml -d data/pamap2/
# Example (LTA2V, DualSleep): python loo_cross_validation.py -p params/LTA2V_downstream_experiments/LTA2V_downstream_DualSleep/config.yml -d data/dualsleep/

With this command, a LOSO is performed for HARTH, HAR70+, and DualSleep, and a 5-fold cross-validation is performed for PAMAP2 and USC-HAD. The downstream config has an argument called "upstream_model_path", which defines the upstream model to use. In the first example above, the upstream model is our pre-trained SelfPAB: params/selfPAB_upstream/upstream_model.ckpt. In the second example, it is our pre-trained MonoSelfPAB: params/MonoSelfPAB_upstream/upstream_model.ckpt. In the fourth example, it is our pre-trained LTA2V: params/LTA2V_upstream/upstream_model.ckpt.
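
For readers unfamiliar with the two evaluation schemes, the sketch below shows how the splits differ. It is illustrative only and not the code used by loo_cross_validation.py: the function names and subject IDs are hypothetical, and building the 5 folds by subject is an assumption, since the repository may form its folds differently.

```python
def loso_splits(subjects):
    """Yield (train_subjects, test_subject), holding out each subject exactly once."""
    unique = sorted(set(subjects))
    for held_out in unique:
        train = [s for s in unique if s != held_out]
        yield train, held_out


def kfold_splits(subjects, k=5):
    """Yield (train_subjects, test_subjects) for k disjoint, roughly equal folds."""
    unique = sorted(set(subjects))
    if not 2 <= k <= len(unique):
        raise ValueError("k must be between 2 and the number of subjects")
    # Fold i takes every k-th subject starting at position i.
    folds = [unique[i::k] for i in range(k)]
    for test in folds:
        train = [s for s in unique if s not in test]
        yield train, test


if __name__ == "__main__":
    ids = ["S01", "S02", "S03"]  # made-up subject IDs
    for train, test in loso_splits(ids):
        print("train:", train, "test:", test)
```

LOSO yields one split per subject, so its cost grows with the number of subjects, whereas k-fold always yields k splits.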

Results Visualization

Training results and progress can be logged to Weights and Biases by setting the WANDB argument in the config.yml files to True. For downstream training, the WANDB_KEY must also be provided in the config.yml file.

The LOSO results are stored as .pkl files in a params subfolder called loso_cmats. These can be visualized using the loso_viz.ipynb notebook.
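
If you want a single number from a confusion matrix outside the notebook, a macro-averaged F1 score is one common choice. The layout of the objects inside the .pkl files is not described here, so the sketch below works on a plain confusion matrix given as nested lists, with rows as true classes and columns as predicted classes. Both that layout and the function names are assumptions.

```python
def per_class_f1(cmat):
    """Return the F1 score of each class from a square confusion matrix.

    Assumption: cmat[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cmat)
    scores = []
    for c in range(n):
        tp = cmat[c][c]
        fn = sum(cmat[c]) - tp                       # true c, predicted as another class
        fp = sum(cmat[r][c] for r in range(n)) - tp  # another class, predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cmat):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cmat)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toy = [[2, 2], [0, 4]]  # made-up two-class example
    print(per_class_f1(toy), macro_f1(toy))
```

Macro averaging weights every class equally, which matters for activity data where some classes are much rarer than others.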

Embedding Generation

To inspect the latent representations generated by an upstream model, use the feature_extraction.py script:

python feature_extraction.py -p <path/to/downstream_config.yml> -d <path/to/dataset>
# Example (SelfPAB, HARTH): python feature_extraction.py -p params/selfPAB_downstream_experiments/selfPAB_downstream_harth/config.yml -d data/harth/

The script computes the feature vectors of the given dataset batch by batch and writes them to a features.h5 file.

Citation

If you use the pre-trained SelfPAB upstream model for your research, please cite the following paper:

@article{logacjovSelfPABLargescalePretraining2024,
  title = {{{SelfPAB}}: Large-Scale Pre-Training on Accelerometer Data for Human Activity Recognition},
  shorttitle = {{{SelfPAB}}},
  author = {Logacjov, Aleksej and Herland, Sverre and Ustad, Astrid and Bach, Kerstin},
  year = {2024},
  month = mar,
  journal = {Applied Intelligence},
  issn = {1573-7497},
  doi = {10.1007/s10489-024-05322-3},
  urldate = {2024-03-29},
  keywords = {Accelerometer,Human activity recognition,Machine learning,Physical activity behavior,Self-supervised learning,Transformer},
}

If you use the pre-trained MonoSelfPAB upstream model for your research, please cite the following paper:

@article{logacjovSelfsupervisedLearningRandomized2024,
  title = {Self-Supervised Learning with Randomized Cross-Sensor Masked Reconstruction for Human Activity Recognition},
  author = {Logacjov, Aleksej and Bach, Kerstin},
  year = {2024},
  month = feb,
  journal = {Engineering Applications of Artificial Intelligence},
  volume = {128},
  pages = {107478},
  issn = {0952-1976},
  doi = {10.1016/j.engappai.2023.107478},
  urldate = {2023-11-15},
  keywords = {Accelerometer,Human activity recognition,Machine learning,Representation learning,Self-supervised learning,Transformer}
}

If you use the pre-trained LTA2V upstream model for your research, please cite the following paper:

@article{logacjovLongtermSelfsupervisedLearning2025,
  title = {Long-Term Self-Supervised Learning for Accelerometer-Based Sleep--Wake Recognition},
  author = {Logacjov, Aleksej and Bach, Kerstin and Mork, Paul Jarle},
  year = {2025},
  month = feb,
  journal = {Engineering Applications of Artificial Intelligence},
  volume = {141},
  pages = {109758},
  issn = {0952-1976},
  doi = {10.1016/j.engappai.2024.109758},
  urldate = {2024-12-09},
  keywords = {Accelerometer,Machine learning,Representation learning,Self-supervised learning,Sleep-wake recognition,Transformer}
}
