Releases: foundation-model-stack/fms-hf-tuning
v0.1.0 - First release
Summary of Changes
- Supported and validated tuning technique: full fine-tuning on single-GPU and multi-GPU setups
- Multi-GPU training via the Hugging Face accelerate library, focused on FSDP
- Experimental tuning techniques:
  - Single-GPU prompt tuning
  - Single-GPU LoRA tuning
- Scripts to allow local inference and evaluation of tuned models
- Build scripts for containerization of library
- Initial trainer controller framework for controlling the trainer loop using user-defined rules and metrics
Pip package: `pip install fms-hf-tuning==0.1.0`
What's Changed
- Init by @raghukiran1224 in #1
- allows disable flash attn and torch dtype param by @Ssukriti in #2
- First refactor train by @Ssukriti in #3
- fix : the way args are passed by @Ssukriti in #10
- fix full param tuning by @lchu-ibm in #14
- fix import of aim_loader by @anhuong in #13
- fix: set model max length to either passed in or tokenizer value by @anhuong in #17
- fix: do not set model max length when loading model by @anhuong in #21
- add EOS token to dataset by @Ssukriti in #15
- Local inference by @alex-jw-brooks in #27
- feat: add validation dataset to train by @anhuong in #26
- feat: support str in target_modules for LoraConfig by @VassilisVassiliadis in #39
- Add formatting tools by @hickeyma in #31
- Enable code formatting by @hickeyma in #40
- Enable daily dependabot updates by @hickeyma in #41
- Add file logger callback & export train loss json file by @alex-jw-brooks in #22
- Merge models by @alex-jw-brooks in #32
- Local inference merged models by @alex-jw-brooks in #43
- feat: track validation loss in logs file by @anhuong in #51
- Add linting capability by @hickeyma in #52
- Add PR/Issue templates by @tedhtchang in #65
- Add sample unit tests by @tedhtchang in #61
- Initial commit for trainer image by @tharapalanivel in #69
- Adding copyright notices by @tharapalanivel in #77
- Enable pylint in the github workflow by @tedhtchang in #63
- Bump aim from 3.17.5 to 3.18.1 by @dependabot in #42
- Add Contributing file by @jbusche in #58
- docs: lora and getting modules list by @anhuong in #46
- Allow SFT_TRAINER_CONFIG_JSON_ENV_VAR to be encoded json string by @kellyaa in #82
- Document lint by @tedhtchang in #84
- Let Huggingface Properly Initialize Arguments, and Fix FSDP-LORA Checkpoint-Saves and Resumption by @fabianlim in #53
- Unit tests by @tharapalanivel in #83
- Update CONTRIBUTING.md by @Ssukriti in #86
- Update input args to max_seq_length and training_data_path by @anhuong in #94
- feat: move to accelerate launch for distributed training by @kmehant in #92
- Update README.md by @Ssukriti in #95
- Modify copyright notice by @tharapalanivel in #96
- Switches dependencies from txt file to toml file by @jbusche in #68
- fix: use attn_implementation="flash_attention_2" by @kmehant in #101
- fix: not passing PEFT argument should default to full parameter finetuning by @kmehant in #100
- feat: update launch training with accelerate for multi-gpu by @anhuong in #98
- Setting default values in training job config by @tharapalanivel in #104
- add refactored build utils into docker image by @anhuong in #108
- feat: combine train and eval loss into one file by @anhuong in #109
- docs: add note on ephemeral storage by @anhuong in #106
- Move accelerate launch args parsing by @tharapalanivel in #107
- Docs improvements by @Ssukriti in #111
- feat: add env var SET_NUM_PROCESSES_TO_NUM_GPUS by @anhuong in #110
- feat: Trainer controller framework by @seshapad in #45
- Copying logs file by @tharapalanivel in #113
- Fix copying over logs by @tharapalanivel in #114
- Add eval script by @alex-jw-brooks in #102
- Lint tests by @tharapalanivel in #112
- Move sklearn to optional, install optionals for linting by @alex-jw-brooks in #117
- Build Wheel Action by @jbusche in #105
- rstrip eos in evaluation by @alex-jw-brooks in #121
- Fix eos token suffix removal by @alex-jw-brooks in #125
- Make use of instruction field optional by @alex-jw-brooks in #123
- Deprecating the requirements.txt for dependencies management by @tedhtchang in #116
- Add unit tests for various edge cases by @alex-jw-brooks in #97
- fix typo in build gha by @jbusche in #138
- Install whl in Dockerfile by @tedhtchang in #126
- feat: add flash attn to inference and eval scripts by @anhuong in #132
- OS update in dockerfile by @jbusche in #127
- fix: ignore the build output and auto-generated files by @HarikrishnanBalagopal in #140
- Propose ADR for Training Acceleration by @fabianlim in #119
- feat: new format for the controller metrics and operations by @HarikrishnanBalagopal in #130
- adr: Format change to the trainer controller configuration by @seshapad in #128
- Generic tracker API and implementation of Aimstack tracker by @dushyantbehl in #89
- fix: Allow makefile to run test independent of fmt/lint by @dushyantbehl in #145
- feat: Trainer state as a trainer controller metric by @seshapad in #150
- Bump aim from 3.18.1 to 3.19.0 by @dependabot in #93
- fix: launch_training.py arguments with new tracker api by @dushyantbehl in #153
- feat: Exposed the evaluation metrics for rules within trainer controller by @seshapad in #146
- Comment out aim in dockerfile by @jbusche in #155
- fix: replace eval with a safer alternative by @HarikrishnanBalagopal in #147
- docs: ADR for moving from `eval` to `simpleeval` for evaluating trainer controller rules by @HarikrishnanBalagopal in #151
- Add exception catching / writing to termination log by @kellyaa in #149
- fix: merging of model for multi-gpu by @anhuong in #158
- add .complete file to output dir when done by @kellyaa in #159
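PR #43 in the list above lets the trainer pick up its entire job configuration from the SFT_TRAINER_CONFIG_JSON_ENV_VAR environment variable as an encoded JSON string. A minimal sketch of how a caller might populate that variable, assuming base64 encoding; the config field names shown are illustrative assumptions, not the library's confirmed schema (see the repo README for the supported keys):

```shell
# Illustrative job config; field names are assumptions, not the official schema.
CONFIG='{"model_name_or_path":"bigscience/bloom-560m","training_data_path":"/data/train.jsonl","output_dir":"/tmp/tuned"}'

# Encode the JSON (base64 is assumed here) and export it for the trainer.
SFT_TRAINER_CONFIG_JSON_ENV_VAR=$(printf '%s' "$CONFIG" | base64 | tr -d '\n')
export SFT_TRAINER_CONFIG_JSON_ENV_VAR

# Round trip: decoding the variable recovers the original JSON string.
DECODED=$(printf '%s' "$SFT_TRAINER_CONFIG_JSON_ENV_VAR" | base64 -d)
echo "$DECODED"
```

Passing the config through a single environment variable is convenient for containerized runs (see the build scripts noted in the release summary), since no config file needs to be mounted into the image.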
v0.1.0-rc.1
What's Changed
- fix: replace eval with a safer alternative by @HarikrishnanBalagopal in #147
- docs: ADR for moving from `eval` to `simpleeval` for evaluating trainer controller rules by @HarikrishnanBalagopal in #151
- Add exception catching / writing to termination log by @kellyaa in #149
- fix: merging of model for multi-gpu by @anhuong in #158
- add .complete file to output dir when done by @kellyaa in #159
Full Changelog: v0.0.2rc2...v0.1.0-rc.1
v0.0.2rc2
What's Changed
- fix typo in build gha by @jbusche in #138
- Install whl in Dockerfile by @tedhtchang in #126
- feat: add flash attn to inference and eval scripts by @anhuong in #132
- OS update in dockerfile by @jbusche in #127
- fix: ignore the build output and auto-generated files by @HarikrishnanBalagopal in #140
- Propose ADR for Training Acceleration by @fabianlim in #119
- feat: new format for the controller metrics and operations by @HarikrishnanBalagopal in #130
- adr: Format change to the trainer controller configuration by @seshapad in #128
- Generic tracker API and implementation of Aimstack tracker by @dushyantbehl in #89
- fix: Allow makefile to run test independent of fmt/lint by @dushyantbehl in #145
- feat: Trainer state as a trainer controller metric by @seshapad in #150
- Bump aim from 3.18.1 to 3.19.0 by @dependabot in #93
- fix: launch_training.py arguments with new tracker api by @dushyantbehl in #153
- feat: Exposed the evaluation metrics for rules within trainer controller by @seshapad in #146
- Comment out aim in dockerfile by @jbusche in #155
New Contributors
- @HarikrishnanBalagopal made their first contribution in #140
- @dushyantbehl made their first contribution in #89
Full Changelog: v0.0.2rc1...v0.0.2rc2
v0.0.2rc1
What's Changed
- Init by @raghukiran1224 in #1
- allows disable flash attn and torch dtype param by @Ssukriti in #2
- First refactor train by @Ssukriti in #3
- fix : the way args are passed by @Ssukriti in #10
- fix full param tuning by @lchu-ibm in #14
- fix import of aim_loader by @anhuong in #13
- fix: set model max length to either passed in or tokenizer value by @anhuong in #17
- fix: do not set model max length when loading model by @anhuong in #21
- add EOS token to dataset by @Ssukriti in #15
- Local inference by @alex-jw-brooks in #27
- feat: add validation dataset to train by @anhuong in #26
- feat: support str in target_modules for LoraConfig by @VassilisVassiliadis in #39
- Add formatting tools by @hickeyma in #31
- Enable code formatting by @hickeyma in #40
- Enable daily dependabot updates by @hickeyma in #41
- Add file logger callback & export train loss json file by @alex-jw-brooks in #22
- Merge models by @alex-jw-brooks in #32
- Local inference merged models by @alex-jw-brooks in #43
- feat: track validation loss in logs file by @anhuong in #51
- Add linting capability by @hickeyma in #52
- Add PR/Issue templates by @tedhtchang in #65
- Add sample unit tests by @tedhtchang in #61
- Initial commit for trainer image by @tharapalanivel in #69
- Adding copyright notices by @tharapalanivel in #77
- Enable pylint in the github workflow by @tedhtchang in #63
- Bump aim from 3.17.5 to 3.18.1 by @dependabot in #42
- Add Contributing file by @jbusche in #58
- docs: lora and getting modules list by @anhuong in #46
- Allow SFT_TRAINER_CONFIG_JSON_ENV_VAR to be encoded json string by @kellyaa in #82
- Document lint by @tedhtchang in #84
- Let Huggingface Properly Initialize Arguments, and Fix FSDP-LORA Checkpoint-Saves and Resumption by @fabianlim in #53
- Unit tests by @tharapalanivel in #83
- Update CONTRIBUTING.md by @Ssukriti in #86
- Update input args to max_seq_length and training_data_path by @anhuong in #94
- feat: move to accelerate launch for distributed training by @kmehant in #92
- Update README.md by @Ssukriti in #95
- Modify copyright notice by @tharapalanivel in #96
- Switches dependencies from txt file to toml file by @jbusche in #68
- fix: use attn_implementation="flash_attention_2" by @kmehant in #101
- fix: not passing PEFT argument should default to full parameter finetuning by @kmehant in #100
- feat: update launch training with accelerate for multi-gpu by @anhuong in #98
- Setting default values in training job config by @tharapalanivel in #104
- add refactored build utils into docker image by @anhuong in #108
- feat: combine train and eval loss into one file by @anhuong in #109
- docs: add note on ephemeral storage by @anhuong in #106
- Move accelerate launch args parsing by @tharapalanivel in #107
- Docs improvements by @Ssukriti in #111
- feat: add env var SET_NUM_PROCESSES_TO_NUM_GPUS by @anhuong in #110
- feat: Trainer controller framework by @seshapad in #45
- Copying logs file by @tharapalanivel in #113
- Fix copying over logs by @tharapalanivel in #114
- Add eval script by @alex-jw-brooks in #102
- Lint tests by @tharapalanivel in #112
- Move sklearn to optional, install optionals for linting by @alex-jw-brooks in #117
- Build Wheel Action by @jbusche in #105
- rstrip eos in evaluation by @alex-jw-brooks in #121
- Fix eos token suffix removal by @alex-jw-brooks in #125
- Make use of instruction field optional by @alex-jw-brooks in #123
- Deprecating the requirements.txt for dependencies management by @tedhtchang in #116
- Add unit tests for various edge cases by @alex-jw-brooks in #97
New Contributors
- @raghukiran1224 made their first contribution in #1
- @Ssukriti made their first contribution in #2
- @lchu-ibm made their first contribution in #14
- @anhuong made their first contribution in #13
- @alex-jw-brooks made their first contribution in #27
- @VassilisVassiliadis made their first contribution in #39
- @hickeyma made their first contribution in #31
- @tedhtchang made their first contribution in #65
- @tharapalanivel made their first contribution in #69
- @dependabot made their first contribution in #42
- @jbusche made their first contribution in #58
- @kellyaa made their first contribution in #82
- @fabianlim made their first contribution in #53
- @kmehant made their first contribution in #92
- @seshapad made their first contribution in #45
Full Changelog: https://github.com/foundation-model-stack/fms-hf-tuning/commits/v.0.0.2rc1