Releases: mosaicml/composer

v0.19.0

02 Feb 09:07

What's New

1. Improved DTensor Support

Composer now supports elastic saving and loading of DTensors at various mesh sizes.
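
For example, elastic (resharding-friendly) checkpoints can be enabled by using sharded state dicts in the FSDP config. This is a minimal sketch; the save_folder path is a hypothetical destination:

composer_model = MyComposerModel(...)

trainer = Trainer(
    model=composer_model,
    fsdp_config={
        'sharding_strategy': 'FULL_SHARD',
        'state_dict_type': 'sharded',  # save per-rank shards that can be reloaded at a different mesh size
    },
    save_folder='s3://my-bucket/checkpoints',  # hypothetical destination
    ...
)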

2. Checkpoint Saving and Loading from Databricks MLFlow

Composer now supports saving checkpoints to and loading checkpoints from Databricks-managed MLFlow.

composer_model = MyComposerModel(...)

trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)

3. Better Communication Computation Overlap in FSDP

Composer's FSDP code now has improved communication/computation overlap, which should improve MFU across several architectures.

4. Python 3.11 + Torch 2.2 Support

Initial support for Python 3.11 and Torch 2.2 has been added to Composer.

5. PEFT LoRA

PEFT LoRA is now supported in the HuggingFaceModel class.
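
A rough sketch, assuming the peft package is installed and that HuggingFaceModel accepts a peft_config argument; the model name and LoRA hyperparameters are illustrative only:

from composer.models import HuggingFaceModel
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')

# Illustrative LoRA configuration
peft_config = LoraConfig(
    task_type='CAUSAL_LM',
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)

# HuggingFaceModel applies the LoRA adapters to the wrapped model
composer_model = HuggingFaceModel(model, tokenizer=tokenizer, peft_config=peft_config)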

6. Refactored Evaluation

in_context_learning_evaluation.py has a new design with cleaner abstractions and easier interfaces to work with.

7. Azure Checkpointing

Composer now supports saving your model in Azure.
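
As a rough sketch, assuming Azure credentials are configured via environment variables; the container and prefix below are hypothetical:

trainer = Trainer(
    model=composer_model,
    save_folder='azure://my-container/checkpoints',  # hypothetical container and prefix
    save_interval='1ep',
    ...
)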

8. MLFlow Checkpointing

Composer now supports saving your model in MLFlow.

Bug Fixes

What's Changed

New Contributors

v0.18.2

01 Feb 04:48

Bug Fixes

What's Changed

Full Changelog: v0.18.1...v0.18.2

v0.18.1

30 Jan 23:10

Bug Fixes

What's Changed

New Contributors

Full Changelog: v0.18.0...v0.18.1

v0.18.0

25 Jan 20:44

This release has been yanked; please skip directly to Composer v0.18.1.

New Features

1. Improved DTensor Support

Composer now supports elastic saving and loading of DTensors at various mesh sizes.

2. Checkpoint Saving and Loading from Databricks MLFlow

Composer now supports saving checkpoints to and loading checkpoints from Databricks-managed MLFlow.

composer_model = MyComposerModel(...)

trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)

Bug Fixes

Deprecations

  • Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827

What's Changed

New Contributors

Full Changelog: v0.17.2...v0.18.0

v0.17.2

14 Dec 20:02

New Features

1. Torch 2.1.1 Support

Composer now supports torch 2.1.1! This torch release primarily fixes several small bugs that we had previously monkeypatched in Composer.

2. Faster OCI Upload/Download

Composer now supports multi-part upload/download to OCI, which should speed up object store transfer times.
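
No code changes are needed to benefit from this; any checkpoint path pointed at OCI object storage uses the multi-part path. A sketch, assuming an existing bucket and configured OCI credentials:

trainer = Trainer(
    model=composer_model,
    save_folder='oci://my-bucket/checkpoints',  # hypothetical bucket and prefix
    save_interval='1000ba',
    ...
)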

3. Memory Profiling

We've expanded the torch profiler integration to support memory profiling. Now, when the profiler is enabled, you will get a trace showing how memory utilization breaks down across the various components on your GPUs.
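
A minimal sketch of enabling the torch profiler with memory profiling through Composer's Profiler; the torch_prof_profile_memory flag is assumed to forward profile_memory to the torch profiler:

from composer.profiler import JSONTraceHandler, Profiler, cyclic_schedule

trainer = Trainer(
    model=composer_model,
    max_duration='2ep',
    profiler=Profiler(
        schedule=cyclic_schedule(wait=0, warmup=1, active=4, repeat=1),
        trace_handlers=[JSONTraceHandler(folder='traces')],
        torch_prof_profile_memory=True,  # assumed flag: enables the memory trace in the torch profiler
    ),
    ...
)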

Bug Fixes

1. FSDP Initialization with Meta

Previously, our FSDP integration had a bug when initializing weights with device=meta, which resulted in an extra scaling factor being applied. This has now been fixed, so the choice of device and distributed strategy no longer affects weight initialization.

What's Changed

New Contributors

Full Changelog: v0.17.1...v0.17.2

v0.17.1

27 Nov 22:07

Bug Fixes

1. MosaicML Logger Robustness (#2728)

We've improved the MosaicML logger to be more robust to faulty serialization.

What's Changed

Full Changelog: v0.17.0...v0.17.1

v0.17.0

16 Nov 00:23

What's New

1. Hybrid Sharded Data Parallel (HSDP) Integration (#2648)

Composer now supports Hybrid Sharded Data Parallel (HSDP), where a model is both sharded and replicated across blocks of controllable size. By default, this will shard a model within a node and replicate across nodes, but Composer will accept a tuple of process groups to specify custom shard/replicate sizes. This can be specified in the FSDP config.

  composer_model = MyComposerModel(n_layers=3)

  fsdp_config = {
      'sharding_strategy': 'HYBRID_SHARD',
  }

  trainer = Trainer(
      model=composer_model,
      max_duration='4ba',
      fsdp_config=fsdp_config,
      ...
  )

HYBRID_SHARD applies FULL_SHARD within each shard block, whereas _HYBRID_SHARD_ZERO2 applies SHARD_GRAD_OP within each shard block.

2. Train Loss NaN Monitor (#2704)

Composer has a new callback which will raise a ValueError if your loss becomes NaN. This is very useful to avoid wasting compute if your training run diverges or fails for numerical reasons.

  from composer.callbacks import NaNMonitor

  composer_model = MyComposerModel(n_layers=3)

  trainer = Trainer(
      model=composer_model,
      max_duration='4ba',
      callbacks=NaNMonitor(),
      ...
  )

Bug Fixes

What's Changed

New Contributors

Full Changelog: v0.16.4...v0.17.0

v0.16.4

11 Oct 19:49

What's New

1. Torch 2.1 Support

Composer officially supports PyTorch 2.1! We support several new features from 2.1, including CustomPolicy, which enables more granular FSDP wrapping.
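
For reference, CustomPolicy takes a per-module callable that decides whether a module gets wrapped as its own FSDP unit. A rough sketch in raw PyTorch 2.1; MyTransformerBlock is a hypothetical module class, and wiring the policy into Composer's FSDP config is not shown:

from torch.distributed.fsdp.wrap import CustomPolicy

def lambda_fn(module):
    # Wrap every transformer block as its own FSDP unit; everything else stays in the parent.
    return isinstance(module, MyTransformerBlock)  # hypothetical block class

policy = CustomPolicy(lambda_fn)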

What's Changed

New Contributors

Full Changelog: v0.16.3...v0.16.4

v0.16.3

26 Sep 18:07

What's New

1. Add pass@k for HumanEval

HumanEval now supports pass@k. We also support first-class integration with the MosaicML platform for secure code evaluation.

2. log_model with MLFlow

The MLFlow integration now supports log_model at the end of the run.

What's Changed

New Contributors

Full Changelog: v0.16.2...v0.16.3

v0.16.2

14 Sep 16:09

What's New

1. PyTorch Nightly Support

Composer now supports PyTorch Nightly and CUDA 12! Along with new Docker images based on nightly PyTorch versions and release candidates, we've updated our PyTorch monkeypatches to support the latest version of PyTorch. These monkeypatches add finer-grained FSDP wrapping functionality and patch bugs related to sharded checkpoints. We are in the process of upstreaming these changes into PyTorch.

Bug Fixes

1. MosaicML Logger Robustness

The MosaicML logger is now robust to platform timeouts and other errors. Additionally, it can be disabled by setting the environment variable MOSAICML_PLATFORM to 'False' when training on the MosaicML platform.

2. GCS Integration

GCS authentication is now supported with HMAC keys, patching a bug in the previous implementation.
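
A sketch of HMAC-based authentication; the GCS_KEY / GCS_SECRET environment variable names and the bucket path are assumptions, so check the object store documentation for the exact names:

import os

# Assumed environment variable names for HMAC credentials
os.environ['GCS_KEY'] = '<hmac-access-id>'
os.environ['GCS_SECRET'] = '<hmac-secret>'

trainer = Trainer(
    model=composer_model,
    save_folder='gs://my-bucket/checkpoints',  # hypothetical bucket and prefix
    ...
)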

3. Optimizer Monitor Norm Calculation (#2531)

Previously, the optimizer monitor incorrectly reduced norms across GPUs. It now correctly computes norms in a distributed setting.

What's Changed

New Contributors

Full Changelog: v0.16.1...v0.16.2