v0.19.0
What's New
1. Improved DTensor Support
Composer now supports elastic saving and loading of DTensors at various mesh sizes.
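As a rough illustration of what "elastic" means here, the toy sketch below saves a flat list of weights as 4 shards and reloads it as 2. It is plain Python, not Composer's or PyTorch's DTensor implementation. The helper names `shard` and `reshard` are hypothetical and exist only for this example.

```python
def shard(values, world_size):
    """Split a flat list into `world_size` contiguous shards (later shards may be shorter)."""
    per_rank = -(-len(values) // world_size)  # ceiling division
    return [values[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def reshard(shards, new_world_size):
    """Gather the saved shards into the full list, then split it for the new mesh size."""
    full = [x for s in shards for x in s]
    return shard(full, new_world_size)


weights = list(range(8))
saved = shard(weights, 4)    # stands in for a checkpoint written by 4 ranks
loaded = reshard(saved, 2)   # the same checkpoint read back by 2 ranks
```

The point of the sketch is only that the saved layout and the loaded layout can differ while the underlying values stay the same.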
2. Checkpoint Saving and Loading from Databricks MLFlow
Composer now supports saving and loading checkpoints to Databricks-managed MLFlow.
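from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)
trainer = Trainer(
    model=composer_model,
    # The {mlflow_experiment_id} and {mlflow_run_id} placeholders are left unformatted here.
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)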
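The `save_folder` and `load_path` values above are templates rather than literal paths. A minimal sketch of what a resolved path might look like follows, assuming the two placeholders stand for the IDs of the active MLflow experiment and run. The `resolve` helper and the ID values are hypothetical and are not part of Composer's API.

```python
# Template copied from the release note above.
TEMPLATE = 'dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts'


def resolve(template, experiment_id, run_id):
    """Substitute the two MLflow placeholders in a checkpoint path template."""
    return template.format(mlflow_experiment_id=experiment_id, mlflow_run_id=run_id)


# Made-up IDs, for illustration only.
path = resolve(TEMPLATE, '1234567890', 'abcdef0123456789')
```

In practice the placeholders are passed to the `Trainer` unformatted, as in the snippet above, so manual substitution like this should only be needed when inspecting artifacts outside of training.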
3. Better Communication Computation Overlap in FSDP
Composer now has improved communication/computation overlap in our FSDP code, which should improve model FLOPs utilization (MFU) across several architectures.
4. Python3.11 + Torch2.2 Support
Composer now has initial support for Python 3.11 and Torch 2.2.
5. PEFT LoRA
PEFT LoRA is now supported in the HuggingFaceModel class.
6. Refactored Evaluation
in_context_learning_evaluation.py has a new design with cleaner abstractions and interfaces that are easier to work with.
7. Azure Checkpointing
Composer now supports saving your model in Azure.
8. MLFlow Checkpointing
Composer now supports saving your model in MLFlow.
Bug Fixes
- Fix MLFlowLogger test by @ngcgarcia in #2912
- Fix bug with CoT early stopping and LLama2 tokenizer by @bmosaicml in #2902
- Fix split_batch bug with empty generation_kwargs by @maxisawesome in #2913
- Only load RNG keys that exist by @mvpatel2000 in #2901
- Fix daily tests by @mvpatel2000 in #2891
- Fix seed for FSDP wrap by @mvpatel2000 in #2833
- Fix load_ignore_keys with rng by @mvpatel2000 in #2803
- Fix mosaicml logger on close by @mvpatel2000 in #2816
- Fix torch profiler error on close by @mvpatel2000 in #2818
- Fix import for daily test by @snarayan21 in #2826
- Fix how single value tensors are logged by @aspfohl in #2831
- Fix torch bump by @j316chuck in #2855
- Fix MPS with sequence loss by @JAEarly in #2834
What's Changed
- Bump transformers version by @dakinggg in #2781
- Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in #2784
- Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in #2783
- Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in #2785
- [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in #2789
- Enable system metrics in mosaic mlflow logger by @chenmoneygithub in #2775
- Update parse_uri by @irenedea in #2787
- default to no torch profiler memory timeline by @cli99 in #2790
- Add eot token to ICL generate kwargs by @bmosaicml in #2782
- Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in #2791
- Add torch nightly 12-13 by @j316chuck in #2792
- Add process group as arg to FSDP by @mvpatel2000 in #2794
- Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in #2798
- Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in #2806
- Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in #2805
- Bump pytest from 7.4.3 to 7.4.4 by @dependabot in #2807
- Avoid futures on close for MosaicML logger by @mvpatel2000 in #2804
- Require sync module states with HSDP by @mvpatel2000 in #2812
- Better communication computation overlap by @snarayan21 in #2811
- Improve error message for speed monitor by @mvpatel2000 in #2801
- Bump torch version -- DO NOT RELEASE by @mvpatel2000 in #2814
- Bump torchvision for nightly by @mvpatel2000 in #2815
- Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in #2817
- Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in #2822
- All unshard streams wait on computation every step by @snarayan21 in #2823
- Add encoding=utf-8 by @dakinggg in #2824
- [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in #2802
- Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827
- checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in #2819
- code-quality timeout update by @aspfohl in #2830
- Adds DTensor Support by @mvpatel2000 in #2821
- Remove duplicate checkpoint verifications by @eracah in #2828
- Remove fsdp patch for comm overlap by @mvpatel2000 in #2836
- Allow hsdp by @mvpatel2000 in #2838
- Bump torch 2.1.2 by @mvpatel2000 in #2840
- Upgrade pyright to 1.1.310 by @b-chu in #2841
- [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in #2810
- update nightly to torch 2.3 by @j316chuck in #2842
- Pin sphinxcontrib applehelp by @mvpatel2000 in #2854
- Torch 2.3 patch by @dakinggg in #2849
- Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in #2866
- Rewrite to use individual state functions by @mvpatel2000 in #2860
- Add custom stopping criteria to ICL generate tasks by @bmosaicml in #2800
- Add save_ignore_keys by @mvpatel2000 in #2868
- Remove log debug by @mvpatel2000 in #2871
- Update monkeypatch to put barrier in optim load by @mvpatel2000 in #2874
- Remove toml by @b-chu in #2872
- Update license by @b-chu in #2875
- Add ignore_metrics field to the MLflow logger by @ngcgarcia in #2869
- Convert print to log.info by @mvpatel2000 in #2876
- Bump version to 0.18.0 by @irenedea in #2877
- Removed commented-out unshard streams patching. by @snarayan21 in #2873
- Make code quality workflow reusable by @b-chu in #2878
- Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in #2885
- Bump torchmetrics by @mvpatel2000 in #2890
- Bump transformers to 4.37 by @dakinggg in #2894
- Azure checkpointing support by @mvpatel2000 in #2893
- Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in #2897
- Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in #2899
- Bump version to 0.18.1 by @b-chu in #2905
- Refactor in_context_learning_evaluation.py by @maxisawesome in #2713
- Fix FP8 checkpoint resumption with onnx export flag by @j316chuck in #2907
- Add Python 3.11 + FA 2.5.0 + Torch 2.3.0 Image by @KuuCi in #2898
- Add yamllint to pre commit by @b-chu in #2909
- Add ignore_hyperparameters to MLFlowLogger by @ngcgarcia in #2908
- Bump coverage[toml] from 7.3.4 to 7.4.1 by @dependabot in #2915
- Add checkpoint test for 0.18.1 by @b-chu in #2906
- Integrate PEFT LoRA with HuggingFaceModel by @dakinggg in #2829
New Contributors
- @jerrychen109 made their first contribution in #2802
- @JAEarly made their first contribution in #2834
- @maxisawesome made their first contribution in #2713
Full Changelog: v0.17.2...v0.19.0