Fix tutorial links (#5714)
samadejacobs authored Jul 1, 2024
1 parent dd7a5be commit 3d34727
Showing 1 changed file with 4 additions and 6 deletions.
10 changes: 4 additions & 6 deletions docs/_tutorials/universal-checkpointing.md
@@ -1,8 +1,7 @@
---
#title: "Universal Checkpointing with DeepSpeed: A Practical Guide"
title: "Universal Checkpointing with DeepSpeed: A Practical Guide"
tags: checkpointing, training, deepspeed
---
# Universal Checkpointing with DeepSpeed: A Practical Guide

The DeepSpeed Universal Checkpointing feature is a powerful tool for saving and loading model checkpoints efficiently and flexibly, enabling seamless continuation of training and fine-tuning across different model architectures, parallelism techniques, and training configurations. This tutorial, tailored for both beginners and experienced users, provides a step-by-step guide to using Universal Checkpointing in your DeepSpeed-powered applications. It walks through creating ZeRO checkpoints, converting them into the Universal format, and resuming training from these universal checkpoints. This approach is crucial for leveraging pre-trained models and for training across different setups.

@@ -23,7 +22,7 @@ Follow the three simple steps below:

### Step 1: Create ZeRO Checkpoint

The first step in leveraging DeepSpeed Universal Checkpointing is to create a ZeRO checkpoint. [ZeRO]((/tutorials/zero/)) (Zero Redundancy Optimizer) is a memory optimization technology in DeepSpeed that allows for efficient training of large models. To create a ZeRO checkpoint, you'll need to:
The first step in leveraging DeepSpeed Universal Checkpointing is to create a ZeRO checkpoint. [ZeRO](/tutorials/zero/) (Zero Redundancy Optimizer) is a memory optimization technology in DeepSpeed that allows for efficient training of large models. To create a ZeRO checkpoint, you'll need to:

- Initialize your model with DeepSpeed using the ZeRO optimizer.
- Train your model to the desired state (iterations).
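
As a rough illustration of the first bullet, a DeepSpeed configuration that enables ZeRO can be written as a Python dict. This is a minimal sketch: the ZeRO stage and batch size are illustrative assumptions, not values prescribed by the tutorial.

```python
import json

# Illustrative values only (assumptions, not taken from the tutorial).
ds_config = {
    # Batch size processed by each GPU per step.
    "train_micro_batch_size_per_gpu": 1,
    # Enable the ZeRO optimizer; stage 2 is an arbitrary choice here.
    "zero_optimization": {"stage": 2},
}

# Serialize to JSON, the format DeepSpeed configuration files use.
config_text = json.dumps(ds_config, indent=2)

if __name__ == "__main__":
    print(config_text)
```

A config like this is typically passed to `deepspeed.initialize` through its `config` argument, either as a dict or as a path to the saved JSON file.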
@@ -50,9 +49,8 @@ With the Universal checkpoint ready, you can now resume training on potentially
## Conclusion
DeepSpeed Universal Checkpointing simplifies the management of model states, making it easier to save, load, and transfer model states across different training sessions and parallelism techniques. By following the steps outlined in this tutorial, you can integrate Universal Checkpointing into your DeepSpeed applications, enhancing your model training and development workflow.

For more detailed examples and advanced configurations, please refer to the [Megatron-DeepSpeed examples]((https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing)).
For more detailed examples and advanced configurations, please refer to the [Megatron-DeepSpeed examples](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing).

For an in-depth technical description of DeepSpeed Universal Checkpointing, please see the [arXiv manuscript](https://arxiv.org/abs/2406.18820) and the [blog](/blogs/deepspeed-ucp/).
For an in-depth technical description of DeepSpeed Universal Checkpointing, please see the [arXiv manuscript](https://arxiv.org/abs/2406.18820) and the [blog](https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ucp/).

Happy training!
