DKRZ-AIM/tutorial-large-datasets

Tutorial for Machine Learning with Large Datasets

Problem

Datasets can easily exceed the available memory, so we cannot load them all at once. Training a machine learning model, however, requires only one batch of data at a time, a tiny fraction of the overall dataset. It is therefore efficient to load data lazily, only when it is needed. This repository collects notebooks that train a demo machine learning model on a large dataset. We cover the following frameworks:

  • PyTorch Lightning 2.4.0
  • TensorFlow
  • Keras
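The lazy-loading idea above can be sketched without any framework. This minimal example assumes a hypothetical raw binary file of float32 rows; the file format and the helper names `write_dataset` and `iter_batches` are illustrative only and are not taken from the notebooks. Only `batch_size` rows are read from disk at a time, so memory use depends on the batch size rather than on the size of the file.

```python
import array
import os
import tempfile
from typing import Iterator, List

# Size in bytes of one float32 value ("f" typecode of the array module).
FLOAT_BYTES = array.array("f").itemsize


def write_dataset(path: str, n_samples: int, n_features: int) -> None:
    """Write a hypothetical toy dataset: row i holds n_features copies of float(i)."""
    with open(path, "wb") as f:
        for i in range(n_samples):
            array.array("f", [float(i)] * n_features).tofile(f)


def iter_batches(path: str, n_features: int, batch_size: int) -> Iterator[List[List[float]]]:
    """Yield batches lazily: each iteration reads at most batch_size rows from disk."""
    row_bytes = n_features * FLOAT_BYTES
    with open(path, "rb") as f:
        while True:
            chunk = f.read(batch_size * row_bytes)  # read one batch only
            if not chunk:  # end of file reached
                break
            values = array.array("f")
            values.frombytes(chunk)
            # Split the flat buffer into rows of n_features values each.
            yield [values[i:i + n_features].tolist() for i in range(0, len(values), n_features)]


# Usage: 10 samples with batch_size=4 give two full batches and one partial batch.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    write_dataset(path, n_samples=10, n_features=3)
    sizes = [len(batch) for batch in iter_batches(path, n_features=3, batch_size=4)]
    print(sizes)  # [4, 4, 2]
```

In the frameworks listed above, the same pattern would typically be wrapped in the framework's own data-loading abstraction (for example a PyTorch `IterableDataset` or `tf.data.Dataset.from_generator`); the notebooks themselves may use a different approach.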

Install

Create a virtual environment

module load python3
python3 -m venv --system-site-packages .venv

Install the machine learning packages via pip

source .venv/bin/activate
pip install -r requirements.txt

Create a JupyterHub kernel

python -m ipykernel install --user --name tutorial_ml --display-name="Tutorial Machine Learning"

Use this kernel ("Tutorial Machine Learning") to run the notebook corresponding to your framework of choice.
