v0.18.0
This release has been yanked. Please skip directly to Composer v0.18.1.
New Features
1. Improved DTensor Support
Composer now supports elastic saving and loading of DTensors, so a checkpoint saved at one mesh size can be loaded at a different mesh size.
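As rough intuition for what "elastic" means here, the following pure-Python sketch splits a flat list of values across one world size, stores the shards, and reloads them under a different world size. It is not Composer's or PyTorch's implementation: `shard`, `save_shards`, and `load_resharded` are hypothetical helpers invented for this illustration, and real DTensor checkpoints operate on distributed tensors and device meshes rather than Python lists.

```python
from typing import Dict, List


def shard(values: List[float], world_size: int) -> List[List[float]]:
    """Split values into world_size contiguous chunks (trailing chunks may be shorter)."""
    chunk = -(-len(values) // world_size)  # ceiling division
    return [values[i * chunk:(i + 1) * chunk] for i in range(world_size)]


def save_shards(shards: List[List[float]]) -> Dict[str, object]:
    """Hypothetical checkpoint: each rank's shard plus the global element count."""
    return {
        "shards": [list(s) for s in shards],
        "numel": sum(len(s) for s in shards),
    }


def load_resharded(checkpoint: Dict[str, object], new_world_size: int) -> List[List[float]]:
    """Reassemble the full value list, then re-split it for a different world size."""
    full = [v for s in checkpoint["shards"] for v in s]
    assert len(full) == checkpoint["numel"]
    return shard(full, new_world_size)


params = [float(i) for i in range(10)]
ckpt = save_shards(shard(params, 4))   # saved from 4 ranks
restored = load_resharded(ckpt, 2)     # loaded onto 2 ranks
```

The point of the sketch is only that the saved layout does not have to match the layout at load time, as long as the global contents can be reassembled.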
2. Checkpoint Saving and Loading from Databricks MLflow
Composer now supports saving checkpoints to, and loading them from, Databricks-managed MLflow.
from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)
trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    logger=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)
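If you already know the experiment and run IDs, the path template above can be filled in with ordinary string formatting. This is a minimal sketch: `mlflow_artifacts_uri` is a hypothetical helper, not part of Composer, and these notes do not say whether Composer resolves the `{mlflow_experiment_id}` and `{mlflow_run_id}` placeholders on its own.

```python
# Path template copied from the example above.
MLFLOW_ARTIFACTS_TEMPLATE = (
    'dbfs:/databricks/mlflow-tracking/'
    '{mlflow_experiment_id}/{mlflow_run_id}/artifacts'
)


def mlflow_artifacts_uri(experiment_id: str, run_id: str) -> str:
    """Hypothetical helper: fill the template with concrete MLflow IDs."""
    if not experiment_id or not run_id:
        raise ValueError('experiment_id and run_id must be non-empty')
    return MLFLOW_ARTIFACTS_TEMPLATE.format(
        mlflow_experiment_id=experiment_id,
        mlflow_run_id=run_id,
    )


# Illustrative IDs only; substitute the values from your own MLflow run.
uri = mlflow_artifacts_uri('123', 'abc')
```

The resulting string could then be passed as `save_folder` or `load_path`, assuming the same run holds the checkpoints you want to write or read.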
Bug Fixes
- Fix load_ignore_keys with rng by @mvpatel2000 in #2803
- Fix mosaicml logger on close by @mvpatel2000 in #2816
- Fix torch profiler error on close by @mvpatel2000 in #2818
- Fix import for daily test by @snarayan21 in #2826
- [S] Fix how single value tensors are logged by @aspfohl in #2831
Deprecations
- Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827
What's Changed
- Bump transformers version by @dakinggg in #2781
- Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in #2784
- Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in #2783
- Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in #2785
- [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in #2789
- Enable system metrics in mosaic mlflow logger by @chenmoneygithub in #2775
- Update parse_uri by @irenedea in #2787
- default to no torch profiler memory timeline by @cli99 in #2790
- Add eot token to ICL generate kwargs by @bmosaicml in #2782
- Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in #2791
- Add torch nightly 12-13 by @j316chuck in #2792
- Add process group as arg to FSDP by @mvpatel2000 in #2794
- Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in #2798
- Fix load_ignore_keys with rng by @mvpatel2000 in #2803
- Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in #2806
- Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in #2805
- Bump pytest from 7.4.3 to 7.4.4 by @dependabot in #2807
- Avoid futures on close for MosaicML logger by @mvpatel2000 in #2804
- Require sync module states with HSDP by @mvpatel2000 in #2812
- Better communication computation overlap by @snarayan21 in #2811
- Improve error message for speed monitor by @mvpatel2000 in #2801
- Bump torch version -- DO NOT RELEASE by @mvpatel2000 in #2814
- Bump torchvision for nightly by @mvpatel2000 in #2815
- Fix mosaicml logger on close by @mvpatel2000 in #2816
- Correct multi-unshard stream patching for torch 2.2.0dev and fix stream waiting correctness by @snarayan21 in #2817
- Fix torch profiler error on close by @mvpatel2000 in #2818
- Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in #2822
- All unshard streams wait on computation every step by @snarayan21 in #2823
- Add encoding=utf-8 by @dakinggg in #2824
- Fix import for daily test by @snarayan21 in #2826
- [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in #2802
- Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827
- checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in #2819
- code-quality timeout update by @aspfohl in #2830
- [S] Fix how single value tensors are logged by @aspfohl in #2831
- Adds DTensor Support by @mvpatel2000 in #2821
- Remove duplicate checkpoint verifications by @eracah in #2828
- Fix seed for FSDP wrap by @mvpatel2000 in #2833
- Remove fsdp patch for comm overlap by @mvpatel2000 in #2836
- Allow hsdp by @mvpatel2000 in #2838
- Bump torch 2.1.2 by @mvpatel2000 in #2840
- Upgrade pyright to 1.1.310 by @b-chu in #2841
- [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in #2810
- update nightly to torch 2.3 by @j316chuck in #2842
- Pin sphinxcontrib applehelp by @mvpatel2000 in #2854
- Fix torch bump by @j316chuck in #2855
- Torch 2.3 patch by @dakinggg in #2849
- Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in #2866
- Rewrite to use individual state functions by @mvpatel2000 in #2860
- Add custom stopping criteria to ICL generate tasks by @bmosaicml in #2800
- Add save_ignore_keys by @mvpatel2000 in #2868
- Remove log debug by @mvpatel2000 in #2871
- Update monkeypatch to put barrier in optim load by @mvpatel2000 in #2874
- Remove toml by @b-chu in #2872
- Update license by @b-chu in #2875
- Add ignore_metrics field to the MLflow logger by @ngcgarcia in #2869
- Convert print to log.info by @mvpatel2000 in #2876
New Contributors
- @jerrychen109 made their first contribution in #2802
Full Changelog: v0.17.2...v0.18.0