v0.7.5
🚀 Streaming v0.7.5
Streaming v0.7.5 is released! Install via pip:
pip install --upgrade mosaicml-streaming==0.7.5
💎 New Features
1. Tensor/Sequence Parallelism Support
Using the replication argument, you can easily share data samples across multiple ranks, enabling sequence or tensor parallelism.
- Replicating samples across devices (SP / TP enablement) by @knighton in #597
- Expanded replication testing + documentation by @snarayan21 in #607
- Make streaming use the correct number of unique samples with SP/TP by @snarayan21 in #619
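To make the effect of replication concrete, here is a minimal pure-Python sketch of how ranks could be grouped so that every rank in a group sees the same samples. It does not call the Streaming library. The helper names and the assumption that consecutive ranks form a group are illustrative only and are not confirmed by these notes.

```python
from typing import List


def replication_group(rank: int, replication: int) -> int:
    """Index of the group of ranks that receive identical samples.

    Assumption for illustration: consecutive ranks are grouped together.
    """
    if replication < 1:
        raise ValueError('replication must be >= 1')
    return rank // replication


def num_unique_sample_streams(world_size: int, replication: int) -> int:
    """Number of distinct sample streams across the whole job."""
    if replication < 1 or world_size % replication != 0:
        raise ValueError('world_size must be divisible by replication')
    return world_size // replication


world_size, replication = 8, 2
groups: List[int] = [replication_group(r, replication) for r in range(world_size)]
print(groups)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(num_unique_sample_streams(world_size, replication))  # 4
```

Under this sketch, a job with 8 ranks and a replication of 2 has 4 groups, so only 4 ranks' worth of unique samples is consumed per step. With the library itself, the value would be passed as the replication argument to StreamingDataset.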
2. Overhauled Streaming Documentation
New and improved streaming documentation can be found here -- please submit issues with any feedback.
- Major overhaul of Streaming documentation by @snarayan21 in #636
3. batch_size is now required for StreamingDataset
We have seen multiple errors and performance degradations caused by users not setting the batch_size argument to StreamingDataset, so batch_size must now be set in order to iterate over the dataset.
- You must set batch size. There is no other way. by @snarayan21 in #624
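One way to adapt to this requirement is to check batch_size up front, before the dataset is built. The sketch below uses a hypothetical helper, streaming_dataset_kwargs, that is not part of the library. It only assembles keyword arguments and fails early if batch_size is missing. The path is a placeholder, and the commented-out lines show how the result would likely be used.

```python
from typing import Any, Dict, Optional


def streaming_dataset_kwargs(local: str,
                             batch_size: Optional[int],
                             remote: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble StreamingDataset kwargs, failing early
    if batch_size is missing instead of at iteration time."""
    if batch_size is None or batch_size < 1:
        raise ValueError('batch_size must be a positive int; '
                         'StreamingDataset requires it to iterate.')
    return {'local': local, 'remote': remote, 'batch_size': batch_size}


kwargs = streaming_dataset_kwargs('/tmp/my_dataset', batch_size=32)
print(kwargs['batch_size'])  # 32

# With the library installed, the kwargs would be used roughly like this:
# from streaming import StreamingDataset
# from torch.utils.data import DataLoader
# dataset = StreamingDataset(**kwargs)
# loader = DataLoader(dataset, batch_size=kwargs['batch_size'])
```

Passing the same value to both the dataset and the DataLoader keeps the two in agreement, which is the usual setup when batch_size is the per-device batch size.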
4. Support for Python 3.11, deprecate Python 3.8
- Add support for Python 3.11 and deprecate Python 3.8 by @karan6181 in #586
🐛 Bug Fixes
- [easy typo fix] fix f-string by @bigning in #596
- Change comparison in partitions to include equals by @JAEarly in #587
- Use type int when initializing SharedMemory size by @bchiang2 in #604
- COCO Dataset fix -- avoids allow_unsafe_types=True by @snarayan21 in #647
🔧 Improvements
- Allow writers to overwrite existing data by @JAEarly in #594
- Update careers link by @milocress in #611
- Update license by @b-chu in #568
- Updated documentation for S3-compatible object stores by @AIproj in #592
- Make yamllint consistent with Composer by @b-chu in #583
- Switch linting workflows to ci-testing repo by @b-chu in #616
What's Changed
- Bump uvicorn from 0.26.0 to 0.27.1 by @dependabot in #599
- Bump pytest-split from 0.8.1 to 0.8.2 by @dependabot in #581
- Update ruff to 0.2.2 by @Skylion007 in #608
- Bump fastapi from 0.109.0 to 0.110.0 by @dependabot in #610
- Bump yamllint from 1.33.0 to 1.35.1 by @dependabot in #601
- Bump uvicorn from 0.27.1 to 0.28.0 by @dependabot in #626
- Update moto requirement from <5,>=4.0 to >=4.0,<6 by @dependabot in #580
- Bump furo from 2023.7.26 to 2024.1.29 by @dependabot in #631
- Bump pypandoc from 1.12 to 1.13 by @dependabot in #630
- Bump databricks-sdk from 0.14.0 to 0.22.0 by @dependabot in #629
- Add batch_size to 1 if not provided for regression testing by @karan6181 in #635
- Fixed docstring note for getting sequential sample ordering by @snarayan21 in #632
- Bump pytest and fix failing test by @snarayan21 in #642
- Update pytest-cov requirement from <5,>=4 to >=4,<6 by @dependabot in #638
- Bump pydantic from 2.5.3 to 2.6.4 by @dependabot in #639
- Bump uvicorn from 0.28.0 to 0.29.0 by @dependabot in #640
- Bump databricks-sdk from 0.22.0 to 0.23.0 by @dependabot in #644
- Version bump to 0.7.5 by @snarayan21 in #650
New Contributors
- @bigning made their first contribution in #596
- @JAEarly made their first contribution in #587
- @AIproj made their first contribution in #592
- @milocress made their first contribution in #611
- @bchiang2 made their first contribution in #604
Full Changelog: v0.7.4...v0.7.5