Release v0.19.0 · mosaicml/composer

What's New

1. Improved DTensor Support

Composer now supports elastic saving and loading of DTensors at various mesh sizes.

2. Checkpoint Saving and Loading from Databricks MLFlow

Composer now supports saving and loading checkpoints to Databricks-managed MLFlow.

from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)

# Checkpoints are saved to and loaded from the MLFlow run's artifact store.
trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)
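
The curly-brace fields in the paths above read as placeholders rather than literal text. As a rough illustration only, assuming they are expanded with ordinary `str.format`-style substitution, the sketch below shows what a resolved path might look like. The helper name `resolve_mlflow_artifact_path` and the example IDs are hypothetical and are not part of Composer.

```python
# Template copied from the release note; the placeholder names come from the source.
MLFLOW_ARTIFACT_TEMPLATE = (
    'dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts'
)


def resolve_mlflow_artifact_path(template: str, experiment_id: str, run_id: str) -> str:
    """Hypothetical helper: fill the two MLFlow placeholders in a path template."""
    return template.format(mlflow_experiment_id=experiment_id, mlflow_run_id=run_id)


# Made-up IDs, purely for illustration.
example_path = resolve_mlflow_artifact_path(MLFLOW_ARTIFACT_TEMPLATE, '1234', 'abcd5678')
print(example_path)
```

In practice the template string would likely be passed to the Trainer unchanged; the helper only makes the assumed substitution explicit.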

3. Better Communication Computation Overlap in FSDP

Composer's FSDP code now overlaps communication and computation more effectively, which should improve MFU across several architectures.

4. Python3.11 + Torch2.2 Support

Composer now has initial support for Python 3.11 and Torch 2.2.

5. PEFT LoRA

PEFT LoRA is now supported in the HuggingFaceModel class.

6. Refactored Evaluation

in_context_learning_evaluation.py has a new design with cleaner abstractions and interfaces that are easier to work with.

7. Azure Checkpointing

Composer now supports saving your model in Azure.

8. MLFlow Checkpointing

Composer now supports saving your model in MLFlow.

Bug Fixes

Fix MLFlowLogger test by @ngcgarcia in #2912
Fix bug with CoT early stopping and LLama2 tokenizer by @bmosaicml in #2902
Fix split_batch bug with empty generation_kwargs by @maxisawesome in #2913
Only load RNG keys that exist by @mvpatel2000 in #2901
Fix daily tests by @mvpatel2000 in #2891
Fix seed for FSDP wrap by @mvpatel2000 in #2833
Fix load_ignore_keys with rng by @mvpatel2000 in #2803
Fix mosaicml logger on close by @mvpatel2000 in #2816
Fix torch profiler error on close by @mvpatel2000 in #2818
Fix import for daily test by @snarayan21 in #2826
Fix how single value tensors are logged by @aspfohl in #2831
Fix torch bump by @j316chuck in #2855
Fix MPS with sequence loss by @JAEarly in #2834

What's Changed

Bump transformers version by @dakinggg in #2781
Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in #2784
Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in #2783
Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in #2785
[UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in #2789
Enable system metrics in mosaic mlflow logger by @chenmoneygithub in #2775
Update parse_uri by @irenedea in #2787
default to no torch profiler memory timeline by @cli99 in #2790
Add eot token to ICL generate kwargs by @bmosaicml in #2782
Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in #2791
Add torch nightly 12-13 by @j316chuck in #2792
Add process group as arg to FSDP by @mvpatel2000 in #2794
Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in #2798
Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in #2806
Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in #2805
Bump pytest from 7.4.3 to 7.4.4 by @dependabot in #2807
Avoid futures on close for MosaicML logger by @mvpatel2000 in #2804
Require sync module states with HSDP by @mvpatel2000 in #2812
Better communication computation overlap by @snarayan21 in #2811
Improve error message for speed monitor by @mvpatel2000 in #2801
Bump torch version -- DO NOT RELEASE by @mvpatel2000 in #2814
Bump torchvision for nightly by @mvpatel2000 in #2815
Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in #2817
Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in #2822
All unshard streams wait on computation every step by @snarayan21 in #2823
Add encoding=utf-8 by @dakinggg in #2824
[MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in #2802
Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827
checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in #2819
code-quality timeout update by @aspfohl in #2830
Adds DTensor Support by @mvpatel2000 in #2821
Remove duplicate checkpoint verifications by @eracah in #2828
Remove fsdp patch for comm overlap by @mvpatel2000 in #2836
Allow hsdp by @mvpatel2000 in #2838
Bump torch 2.1.2 by @mvpatel2000 in #2840
Upgrade pyright to 1.1.310 by @b-chu in #2841
[MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in #2810
update nightly to torch 2.3 by @j316chuck in #2842
Pin sphinxcontrib applehelp by @mvpatel2000 in #2854
Torch 2.3 patch by @dakinggg in #2849
Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in #2866
Rewrite to use individual state functions by @mvpatel2000 in #2860
Add custom stopping criteria to ICL generate tasks by @bmosaicml in #2800
Add save_ignore_keys by @mvpatel2000 in #2868
Remome log debug by @mvpatel2000 in #2871
Update monkeypatch to put barrier in optim load by @mvpatel2000 in #2874
Remove toml by @b-chu in #2872
Update license by @b-chu in #2875
Add ignore_metrics field to the MLflow logger by @ngcgarcia in #2869
Convert print to log.info by @mvpatel2000 in #2876
Bump version to 0.18.0 by @irenedea in #2877
Removed commented-out unshard streams patching. by @snarayan21 in #2873
Make code quality workflow reusable by @b-chu in #2878
Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in #2885
Bump torchmetrics by @mvpatel2000 in #2890
Bump transformers to 4.37 by @dakinggg in #2894
Azure checkpointing support by @mvpatel2000 in #2893
Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in #2897
Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in #2899
Bump version to 0.18.1 by @b-chu in #2905
Refactor in_context_learning_evaluation.py by @maxisawesome in #2713
Fix FP8 checkpoint resumption with onnx export flag by @j316chuck in #2907
Add Python 3.11 + FA 2.5.0 + Torch 2.3.0 Image by @KuuCi in #2898
Add yamllint to pre commit by @b-chu in #2909
Add ignore_hyperparameters to MLFlowLogger by @ngcgarcia in #2908
Bump coverage[toml] from 7.3.4 to 7.4.1 by @dependabot in #2915
Add checkpoint test for 0.18.1 by @b-chu in #2906
Integrate PEFT LoRA with HuggingFaceModel by @dakinggg in #2829

New Contributors

@jerrychen109 made their first contribution in #2802
@JAEarly made their first contribution in #2834
@maxisawesome made their first contribution in #2713

Full Changelog: v0.17.2...v0.19.0
