From 6b4dd101337a021bf83127e418685d3df7663c3d Mon Sep 17 00:00:00 2001
From: root
Date: Tue, 16 Jul 2024 20:08:17 +0000
Subject: [PATCH] add more docs

---
 docs/ORTModule_Training_Guidelines.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ORTModule_Training_Guidelines.md b/docs/ORTModule_Training_Guidelines.md
index 6ac59a18edee0..6ba77ff8448bf 100644
--- a/docs/ORTModule_Training_Guidelines.md
+++ b/docs/ORTModule_Training_Guidelines.md
@@ -307,7 +307,7 @@ A classical usage of disabling the deep copy: when the deep copy before module e
 #### ORTMODULE_ATEN_SDPA_FALLBACK
 
 - **Feature Area**: *ORTMODULE/Optimizations*
-- **Description**: By default, this is disabled. This env var can be used for enabling pre-export attention fall back to PyTorch's efficient_attention ATen kernel for execution.
+- **Description**: By default, this is disabled. This env var can be used to enable a pre-export fallback of attention to PyTorch's efficient_attention ATen kernel for execution. NOTE: the fallback will not work if the model uses both masked and unmasked attention; it must use only one of the two.
 
 ```bash
 export ORTMODULE_ATEN_SDPA_FALLBACK=1 # ENABLE **WITHOUT** ATTN_MASK INPUT
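
For reviewers, a minimal sketch (not part of the patch) of the kind of attention call the fallback targets: a model that invokes PyTorch's `scaled_dot_product_attention` without an `attn_mask`, matching the "ENABLE **WITHOUT** ATTN_MASK INPUT" case above. Tensor shapes here are illustrative assumptions, not taken from any real model.

```python
# Sketch of an unmasked SDPA call, the pattern ORTMODULE_ATEN_SDPA_FALLBACK=1
# is meant to cover. Shapes (batch, heads, seq_len, head_dim) are illustrative.
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

# No attn_mask argument is passed: per the note above, mixing masked and
# unmasked attention in one model is not supported by the fallback.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # same shape as q: torch.Size([2, 4, 8, 16])
```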