Skip to content

Latest commit

 

History

History
33 lines (22 loc) · 977 Bytes

README.md

File metadata and controls

33 lines (22 loc) · 977 Bytes

Tutorial for Machine Learning with Large Datasets

Problem

Data can easily exceed memory, and we cannot load it all at once. Training a machine learning algorithm requires only one batch of data at a time, a tiny fraction of the overall dataset. Therefore, it can be efficient to load data only when needed. This repository collects notebooks for a demo machine learning algorithm using a large dataset. We cover the following frameworks:

  • Pytorch Lightning 2.4.0
  • Tensorflow
  • Keras

Install

Create a virtual environment

module load python3
python3 -m venv --system-site-packages .venv

Install the machine learning packages via pip

source .venv/bin/activate
pip install -r requirements.txt

Create Jupyterhub kernel

python -m ipykernel install --user --name tutorial_ml --display-name="Tutorial Machine Learning"

Use this kernel ("Tutorial Machine Learning") to run the notebook corresponding to your framework of choice.