
Paige ML SDK (Alpha)

This package provides tools for training supervised machine learning models for computational pathology tasks using tile-level embeddings.

This repository accompanies the publication of the Paige Virchow foundation model suite. If you use this repository for academic research, please cite the following paper: Vorontsov, Eugene et al. “Virchow: A Million-Slide Digital Pathology Foundation Model.” ArXiv abs/2309.07778 (2023).

This repository is intended strictly for non-commercial academic research use. For commercial collaborations, please contact Paige AI. For detailed terms, please refer to our LICENSE.

Installation

cd paige-ml-sdk; pip install -e .
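The command above assumes the repository has already been cloned and that you start from its parent directory. A minimal end-to-end sketch (replace the placeholder URL with the repository you are cloning from):

git clone <repository-url> paige-ml-sdk
cd paige-ml-sdk
pip install -e .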

Getting Started

The SDK is equipped with a CLI that can be used to train models and run inference. Run python -m paige.ml_sdk --help to get started, or refer to the examples directory for a basic tutorial illustrating how to use the SDK and how to organize your data.
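As a sketch of a typical workflow, assuming the CLI exposes Lightning's standard configuration handling: generate a config file populated with defaults, edit it to point at your data, then launch training from it (my_config.yaml is an illustrative file name; the dataset and model arguments themselves are covered in the examples directory):

python -m paige.ml_sdk fit --print_config > my_config.yaml
python -m paige.ml_sdk fit --config my_config.yaml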

Advanced Usage

The SDK and its CLI are powered by PyTorch Lightning, which offers many customizable features.

Changing Loggers

By default, the SDK relies on Lightning's CSV logger, which writes outputs to the lightning_logs folder. Use --trainer.logger to choose from any of Lightning's built-in loggers, such as WandB or TensorBoard.
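For instance, assuming the CLI follows Lightning's standard logger syntax, switching to TensorBoard might look like this (tb_logs is an illustrative output directory; other dataset and model arguments are omitted):

python -m paige.ml_sdk fit --trainer.logger TensorBoardLogger --trainer.logger.save_dir tb_logs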

Changing Optimizers and Adjusting Learning Rate

Use --optimizer to choose from any of the torch.optim optimizers. Similarly, use --lr_scheduler to select a learning rate scheduler.
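For example, assuming the standard Lightning CLI shorthand for optimizers and schedulers, selecting Adam with a custom learning rate and a cosine schedule might look like this (the values are illustrative, and other required arguments are omitted):

python -m paige.ml_sdk fit --optimizer Adam --optimizer.lr 1e-4 --lr_scheduler CosineAnnealingLR --lr_scheduler.T_max 50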

Customizing Callbacks

Per pytorch lightning's documentation:

Callbacks allow you to add arbitrary self-contained programs to your training. At specific points during the flow of execution (hooks), the Callback interface allows you to design programs that encapsulate a full set of functionality. It de-couples functionality that does not need to be in the lightning module and can be shared across projects.

Lightning has a callback system to execute them when needed. Callbacks should capture NON-ESSENTIAL logic that is NOT required for your lightning module to run.

Callbacks can be used for many things, like saving model checkpoints, profiling execution, or early stopping. By default, the CLI only applies a single callback, which saves model checkpoints. Users wishing to configure this callback differently or add additional callbacks may consider the options described here: https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced_3.html#trainer-callbacks-and-arguments-with-class-type
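As an illustration based on the Lightning CLI documentation linked above, adding an early-stopping callback alongside the default checkpointing one might look like this (val_loss is an assumed metric name and must match one that your model actually logs):

python -m paige.ml_sdk fit --trainer.callbacks+=EarlyStopping --trainer.callbacks.monitor val_loss --trainer.callbacks.patience 5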

Changing Arbitrary Trainer Args

Lightning's Trainer class automates most of the complexity surrounding model training. It handles the training strategy (e.g., DDP, FSDP), the choice of hardware (CPU, GPU), training precision, the number of epochs to train for, and much more. Most common tricks in AI engineering can be configured via the Trainer class. All of the available trainer flags are documented in the CLI's help text:

python -m paige.ml_sdk fit --help

For example, to train a model in 16-bit precision, set --trainer.precision 16.
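A few more flags in the same spirit, combining hardware selection, precision, and epoch count (the values are illustrative; all of these are standard Lightning Trainer arguments):

python -m paige.ml_sdk fit --trainer.accelerator gpu --trainer.devices 2 --trainer.precision 16 --trainer.max_epochs 20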
