
HPC-Cluster-ML-Workflow

This template provides a structured workflow tailored to audio machine learning research on the HPC Cluster of ZECM at TU Berlin. It was developed for projects that manage many concurrent experiments and depend on highly reproducible, reliable results. By incorporating tools such as DVC, Docker, and TensorBoard, the template not only supports reproducibility but also provides a robust framework for effective collaboration and straightforward sharing of experiments.

Features

  • Reproducible Experiments:
    • Tracks all dependencies, configurations, and artifacts to ensure experiments can be easily reproduced and shared.
    • Uses containerization to maintain consistency across different systems.
  • Resource Optimization:
    • Reuses unchanged stages to avoid redundant computations, speeding up workflows and conserving resources.
  • Automation:
    • Reduces manual tasks through automated builds, data pipelines, and syncing, allowing you to focus on research.
  • HPC Integration:
    • Extends DVC for multi-node parallel experiments, optimizing HPC resource utilization.
    • Supports Docker for development, with automated conversion to Singularity for seamless HPC deployment.
  • TensorBoard Integration:
    • Visualizes and compares DVC experiments using TensorBoard's audio logging support (see the sketch after this list).
    • Enables real-time monitoring and quick decisions on underperforming runs.
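For example, audio predictions logged during training appear in TensorBoard's audio tab and can be compared across DVC experiments. Below is a minimal sketch of such logging with PyTorch's `SummaryWriter`; the log directory, tag names, and the random waveform standing in for a model output are illustrative assumptions, not the template's actual code:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Hypothetical per-run log directory; writing each experiment to its own
# folder lets TensorBoard compare runs side by side.
writer = SummaryWriter(log_dir="logs/example-run")

sample_rate = 44100
# Dummy one-second waveform in [-1, 1], shaped (1, num_samples) as
# add_audio expects; a real run would log the model's predicted audio.
prediction = torch.rand(1, sample_rate) * 2 - 1

# The audio tab makes underperforming runs audible, not just visible.
writer.add_audio("prediction", prediction, global_step=0, sample_rate=sample_rate)
writer.add_scalar("loss/val", 0.042, global_step=0)
writer.close()
```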

Overview

The table below summarizes the key tools involved in the HPC-Cluster-ML-Workflow, detailing their primary roles and providing links to their official documentation for further reference.

| Tool        | Role                                                                 | Documentation    |
| ----------- | -------------------------------------------------------------------- | ---------------- |
| Git         | Version control for code.                                            | Git Docs         |
| DVC         | Data version control and pipeline management.                        | DVC Docs         |
| TensorBoard | DVC experiment visualization and monitoring.                         | TensorBoard Docs |
| Docker      | Containerization for development, converted to Singularity for HPC.  | Docker Docs      |
| Singularity | HPC-compatible containerization tool.                                | Singularity Docs |
| SLURM       | Job scheduling and workload management on the HPC cluster.           | SLURM Docs       |

System Transfer

The figure below offers a simplified overview of how data is transferred between the systems involved. Some of the depicted commands are automated by the provided workflows, so the visualization is meant to aid understanding rather than serve as a direct usage reference.

[Figure: Simplified diagram of dependency transfer between systems]

Prerequisites

  • macOS, Windows, or Linux operating system.
  • Access to an HPC cluster with a SLURM scheduler.
  • Local Python installation.
  • Familiarity with Git, DVC, and Docker.
  • Docker Hub account.

Setup

Follow the setup instructions below for step-by-step guidance on configuring this template repository, which offers a basic PyTorch project that you can customize, reuse, or reference for your pipeline implementation.
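The template's actual stages live in the repository, but the sketch below shows the general pattern a DVC-managed training stage follows: hyperparameters are read from a tracked `params.yaml` file, so DVC can detect which values changed and re-run only the affected stages, and the checkpoint is written to a declared output that DVC caches. The file name, parameter keys, placeholder model, and dummy data are assumptions for illustration:

```python
import torch
import yaml

# Assumed hyperparameter file; DVC conventionally tracks params.yaml,
# so a changed value re-triggers only the stages that depend on it.
with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]  # illustrative section name

model = torch.nn.Linear(1, 1)  # placeholder for the real audio model
optimizer = torch.optim.Adam(model.parameters(), lr=params["learning_rate"])
loss_fn = torch.nn.MSELoss()

x = torch.randn(64, 1)  # dummy data standing in for audio features
y = 2.0 * x

for _ in range(params["epochs"]):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# A checkpoint declared as a DVC output is cached and versioned,
# which is what enables the stage reuse described above.
torch.save(model.state_dict(), "model.pt")
```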

Usage

Once the setup is complete, refer to the provided User Guide, which explains how to develop, launch experiments, and monitor your training processes.

Contributors

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

References

Schulz, F. [faressc]. (n.d.). Guitar LSTM [pytorch-version]. GitHub. Link
