
v0.6.1

@karan6181 released this 18 Oct 21:28
8827d7a

🚀 Streaming v0.6.1

Streaming v0.6.1 is released! Install via pip:

```shell
pip install --upgrade mosaicml-streaming==0.6.1
```

💎 New Features

🚃 Merge the metadata (index files) of datasets stored in sub-directories to form one unified dataset. (#449)

  • Added the merge_index() utility method, which merges the index files of an MDS dataset's subdirectories into a single index. The subdirectories can be local paths or URLs on any supported cloud provider.
  • Check out the dataset conversion and Spark DataFrame to MDS Jupyter notebooks to see it in action.
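
The notes above do not spell out merge_index()'s signature, so instead of guessing at it, here is a minimal, hypothetical sketch of the underlying idea for local directories only. The function name merge_local_indexes is ours, not Streaming's API, and the index layout it reads (an index.json per subdirectory with a top-level "shards" list whose entries carry raw_data/zip_data dicts with a "basename" key, plus a "version" field) is an assumption about the MDS format.

```python
import json
import os
import tempfile


def merge_local_indexes(root: str) -> dict:
    """Hypothetical sketch: merge <root>/<subdir>/index.json files into <root>/index.json."""
    shards = []
    for subdir in sorted(os.listdir(root)):
        index_path = os.path.join(root, subdir, 'index.json')
        if not os.path.isfile(index_path):
            continue  # not an MDS sub-dataset (or a file such as a previous index.json)
        with open(index_path) as f:
            index = json.load(f)
        for shard in index['shards']:
            # Prefix each shard file name with its subdirectory so it resolves from <root>.
            for key in ('raw_data', 'zip_data'):
                if shard.get(key):
                    shard[key]['basename'] = f"{subdir}/{shard[key]['basename']}"
            shards.append(shard)
    merged = {'version': 2, 'shards': shards}  # assumed index format version
    with open(os.path.join(root, 'index.json'), 'w') as f:
        json.dump(merged, f)
    return merged


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        # Two toy sub-datasets, each with a one-shard index (layout assumed).
        for name in ('part0', 'part1'):
            os.makedirs(os.path.join(root, name))
            toy = {'version': 2, 'shards': [
                {'raw_data': {'basename': 'shard.00000.mds'}, 'zip_data': None}]}
            with open(os.path.join(root, name, 'index.json'), 'w') as f:
                json.dump(toy, f)
        merged = merge_local_indexes(root)
        print([s['raw_data']['basename'] for s in merged['shards']])
        # → ['part0/shard.00000.mds', 'part1/shard.00000.mds']
```

Prefixing basenames instead of copying shard files leaves every shard where it was written; the merged index simply points into each subdirectory. The real utility also handles cloud URLs, which this sketch does not attempt.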

🔁 Retry uploading a file to a cloud provider path. (#448)

  • Added upload retry logic with backoff and jitter during dataset conversion, controlled by the new retry parameter of Writer.
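
```python
from streaming import MDSWriter

# `...` stands in for the other MDSWriter arguments (e.g., out and columns),
# and `dataset` is any iterable of samples matching those columns.
# retry=3 sets how many times a failed file upload is retried.
with MDSWriter(..., retry=3) as out:
    for sample in dataset:
        out.write(sample)
```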
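
The release notes don't describe Streaming's exact retry schedule. As a rough illustration of what "retry with backoff and jitter" means, here is a minimal, hypothetical sketch: upload_with_retry, base_delay, and max_delay are our own names, not part of Streaming's API, and the "full jitter" policy shown is one common choice, not necessarily the one Streaming uses.

```python
import random
import time
from typing import Callable


def upload_with_retry(upload: Callable[[], None], retry: int = 2,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical sketch: call `upload`, retrying up to `retry` extra times.

    Returns the number of attempts made; re-raises the last error if all attempts fail.
    """
    if retry < 0:
        raise ValueError('retry must be non-negative')
    for attempt in range(retry + 1):
        try:
            upload()
            return attempt + 1
        except Exception:
            if attempt == retry:
                raise  # out of retries: let the caller abort the conversion
            # Exponential backoff capped at max_delay, with "full jitter":
            # sleep a random duration in [0, min(max_delay, base_delay * 2**attempt)].
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Randomizing the delay keeps many writers that failed at the same moment from retrying in lockstep against the same bucket; the `sleep` hook exists only so the sketch can be exercised without real waiting.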

🐛 Bug Fixes

  • Validate Writer arguments and raise a ValueError if any argument is invalid. (#434)
  • Terminate the main process if one of the upload threads raises an exception during dataset conversion. (#448)
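
The second fix above concerns errors raised on background upload threads, which are easy to lose silently. The general fail-fast pattern looks roughly like the following hypothetical sketch built on the standard library's concurrent.futures; upload_all and its parameters are our own illustration, not Streaming's code.

```python
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait
from typing import Callable, Iterable


def upload_all(files: Iterable[str], upload: Callable[[str], None], workers: int = 4) -> None:
    """Hypothetical sketch: upload files on worker threads and fail fast on the first error."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload, f) for f in files]
        # Block until every upload finishes or any one of them raises.
        done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
        for fut in not_done:
            fut.cancel()  # drop uploads that have not started yet
        for fut in done:
            exc = fut.exception()
            if exc is not None:
                raise exc  # surface the worker's error in the main thread
```

Re-raising in the main thread means an unhandled upload failure ends the conversion instead of letting it finish with missing shards.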

🔧 Improvements

  • Better balance inter-node downloading in the py1e shuffling algorithm by varying shard sample ranges, which helps reduce throughput drops at scale. (#442)


New Contributors

  • @Hubert-Bonisseur made their first contribution in #450

Full Changelog: v0.6.0...v0.6.1