Additional README Changes for PR #57 (#61)
* edits to readme

Signed-off-by: 1000960000 user <[email protected]>

* Apply suggestions from code review

Co-authored-by: Yu Chin Fabian Lim <[email protected]>
Signed-off-by: 1000960000 user <[email protected]>

* more readme changes

Signed-off-by: 1000960000 user <[email protected]>

---------

Signed-off-by: 1000960000 user <[email protected]>
Co-authored-by: Yu Chin Fabian Lim <[email protected]>
achew010 and fabianlim authored Aug 2, 2024
1 parent d510923 commit 0e15e8c
Showing 2 changed files with 8 additions and 6 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -10,6 +10,7 @@ The fms-acceleration framework includes accelerators for Full and Parameter Effi
- Bits-and-Bytes (BNB) quantised LoRA : QLoRA acceleration
- AutoGPTQ quantised LoRA : GPTQ-LoRA acceleration
- Full Fine Tuning acceleration (coming soon)
+- Padding-Free Attention

Our tests show a significant increase in training token throughput using this fms-acceleration framework.

@@ -29,9 +30,10 @@ For example:

Plugin | Description | Depends | License | Status
--|--|--|--|--
-[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Beta
-[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Beta
+[framework](./plugins/framework/README.md) | This acceleration framework for integration with huggingface trainers | | | Alpha
+[accelerated-peft](./plugins/accelerated-peft/README.md) | For PEFT-training, e.g., 4bit QLoRA. | Huggingface<br>AutoGPTQ | Apache 2.0<br>MIT | Alpha
[fused-ops-and-kernels](./plugins/fused-ops-and-kernels/README.md) | Fused LoRA and triton kernels (e.g., fast cross-entropy, rms, rope) | -- | Apache 2.0 [(contains extracted code)](./plugins/fused-ops-and-kernels/README.md#code-extracted-from-unsloth) | Beta
+[instruct-lab](./plugins/instruct-lab/README.md) | Padding-Free Flash Attention Computation | flash-attn | Apache 2.0 | Beta
MOE-training-acceleration | [MegaBlocks](https://github.com/databricks/megablocks) inspired triton kernels and accelerations for Mixture-of-Expert models | | Apache 2.0 | Coming Soon

## Usage with FMS HF Tuning
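The rows added above introduce padding-free attention (the new "Padding-Free Attention" bullet and the instruct-lab plugin). As a rough, hypothetical illustration of the idea, and not the plugin's actual code, the sketch below packs variable-length sequences into one flat token stream and builds the cumulative-length offsets (`cu_seqlens`) that flash-attn's variable-length kernel consumes; the sequence lengths and head dimensions are made up for the example.

```python
# A minimal sketch (not the plugin's code) of padding-free attention: instead of
# padding every sequence in a batch to the same length, concatenate them into one
# flat token stream and record where each sequence starts via cu_seqlens offsets.
import torch

def pack_for_padding_free(seqs):
    """Concatenate variable-length sequences and build cu_seqlens boundaries."""
    lengths = torch.tensor([s.shape[0] for s in seqs], dtype=torch.int32)
    # cu_seqlens[i] is the offset where sequence i starts in the packed stream.
    cu_seqlens = torch.nn.functional.pad(lengths.cumsum(0, dtype=torch.int32), (1, 0))
    packed = torch.cat(seqs, dim=0)  # (total_tokens, n_heads, head_dim)
    return packed, cu_seqlens, int(lengths.max())

# Hypothetical shapes: 3 sequences of different lengths, 8 heads, head_dim 64.
seqs = [torch.randn(n, 8, 64) for n in (5, 11, 3)]
q, cu_seqlens, max_len = pack_for_padding_free(seqs)

# On a CUDA machine with flash-attn installed, the packed tensors (cast to
# fp16/bf16 and moved to the GPU) would be passed to the varlen kernel, e.g.:
#   from flash_attn import flash_attn_varlen_func
#   out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, max_len, max_len, causal=True)
print(q.shape, cu_seqlens)  # torch.Size([19, 8, 64]), tensor([ 0,  5, 16, 19], dtype=torch.int32)
```

Because no padding tokens are ever materialised, attention is computed only over real tokens, which is where the throughput gain comes from.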
8 changes: 4 additions & 4 deletions plugins/instruct-lab/README.md
@@ -9,12 +9,12 @@ This library contains plugins to accelerate finetuning with the following optimi

Plugin | Description | Depends | Loading | Augmentation | Callbacks
--|--|--|--|--|--
-[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | ✅ | ✅
+[padding_free](./src/fms_acceleration_ilab/framework_plugin_padding_free.py) | Padding-Free Flash Attention Computation | flash_attn | | ✅ | ✅


-## Native Transformers Support from V4.44.0
-Transformers natively supports padding-free from v4.44.0. The padding-free plugin will use the transformers library if compatible,
-otherwise if `transformers < V4.44.0` the plugin will use an internal implementation instead.
+## Native Transformers Support from v4.44.0
+Transformers natively supports padding-free from v4.44.0 ([see here](https://github.com/huggingface/transformers/pull/31629)). The padding-free plugin will use the transformers library if compatible;
+otherwise, if `transformers < v4.44.0`, the plugin will fall back to an internal implementation.

## Known Issues

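The paragraph added above under "Native Transformers Support from v4.44.0" describes a compatibility gate: use the native transformers padding-free path when the installed version is new enough, otherwise fall back to an internal implementation. Below is a minimal sketch of that kind of version check; the function name and the returned labels are hypothetical stand-ins, not the plugin's real API.

```python
# A rough sketch of a version-gated fallback, assuming only that `transformers`
# (and its `packaging` dependency) is installed. The labels returned here are
# illustrative placeholders for the two code paths described in the README.
from packaging import version
import transformers

NATIVE_PADDING_FREE_VERSION = version.parse("4.44.0")

def pick_padding_free_path():
    installed = version.parse(transformers.__version__)
    if installed >= NATIVE_PADDING_FREE_VERSION:
        # transformers >= 4.44.0 ships native padding-free support
        # (see https://github.com/huggingface/transformers/pull/31629).
        return "native-transformers"
    # Older transformers: fall back to an internal implementation.
    return "internal-implementation"

print(pick_padding_free_path())
```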
