Commit

bump DSE and doc tweak
jeffra authored and Ubuntu committed Sep 10, 2020
1 parent 240ea97 commit 4b1df25
Showing 3 changed files with 3 additions and 3 deletions.
DeepSpeedExamples: 2 changes (1 addition & 1 deletion)
docs/_posts/2020-09-09-sparse-attention.md: 2 changes (1 addition & 1 deletion)
@@ -25,7 +25,7 @@ To learn more about Sparsity Config, and also how to use this library, please ch
## Performance Results

* **Power over 10x longer sequences**
-In a pre-training experiment, we ran BERT model under three settings: dense, dense with activation checkpoint, and sparse (SA) with activation checkpoint. SA empowers 10x and 16x longer sequences comparing with dense for BERT base and large, respectively. Following figure shows the longest sequence length runnable in BERT base and large model; experiment is performed with batch size 1 on a single Nvidia V100 GPU-32GB memory.
+In a pre-training experiment, we ran BERT model under three settings: dense, dense with activation checkpoint, and sparse (SA) with activation checkpoint. SA empowers 10x and 16x longer sequences comparing with dense for BERT base and large, respectively. Following figure shows the longest sequence length runnable in BERT base and large model; experiment is performed with batch size 1 on a single NVIDIA V100 GPU-32GB memory.

![Maximum sequence runnable on BERT](/assets/images/sa_maximum_sequence_runnable_on_bert.png){: .align-center}

docs/_tutorials/sparse-attention.md: 2 changes (1 addition & 1 deletion)
@@ -4,7 +4,7 @@ title: "DeepSpeed Sparse Attention"

In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through DeepSpeed launcher. We will describe this through an example in [How to use sparse attention with DeepSpeed launcher](/tutorials/sparse-attention/#how-to-use-sparse-attention-with-deepspeed-launcher) section. But before that, we introduce modules provided by DeepSpeed SA in the [next](/tutorials/sparse-attention/#sparse-attention-modules) section.

-**Note:** Currently DeepSpeed Sparse Attention can be used only on Nvidia V100 GPU using Torch >= 1.5 and Cuda 10.1 or 10.2.
+**Note:** Currently DeepSpeed Sparse Attention can be used only on NVIDIA V100 GPU using Torch >= 1.5 and Cuda 10.1 or 10.2.
{: .notice--warning}

## Sparse attention modules
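For context on the tutorial touched by this commit: it describes enabling Sparse Attention through the DeepSpeed launcher, driven by a `sparse_attention` block in the DeepSpeed config JSON. The sketch below is illustrative only and not part of this commit; the key names follow DeepSpeed's fixed-sparsity settings of this period, but the exact values are assumptions and should be checked against the tutorial file changed above.

```python
# Illustrative sketch only (not part of this commit): a DeepSpeed config that
# enables Sparse Attention with a "fixed" block-sparse pattern. Verify key
# names and values against the sparse-attention tutorial before using.
import json

ds_config = {
    "train_batch_size": 8,
    "sparse_attention": {
        "mode": "fixed",               # sparsity structure, e.g. "fixed"
        "block": 16,                   # block size of the block-sparse layout
        "different_layout_per_head": True,
        "num_local_blocks": 4,         # local (sliding-window) blocks per row
        "num_global_blocks": 1,        # blocks every token attends to
        "attention": "bidirectional",  # BERT-style, non-causal attention
        "horizontal_global_attention": False,
        "num_different_global_patterns": 4,
    },
}

# The DeepSpeed launcher reads this file via --deepspeed_config.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

With such a file saved as ds_config.json, training would typically be started through the DeepSpeed launcher with `--deepspeed --deepspeed_config ds_config.json`, per the launcher workflow the tutorial references.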
