Out-of-memory sorting of large datasets map / reduce style processing

10XGenomics/rust-shardio

rust-shardio


Library for out-of-memory sorting of large datasets which need to be processed in multiple map / sort / reduce passes.

You write a stream of items of type T, where T implements Serialize and Deserialize, to a ShardWriter. The items are buffered, sorted according to a customizable sort key, then serialized to disk in chunks with serde + lz4, while an index records the position and key range of each chunk. You then use a ShardReader to stream through the items in a selected interval of the key space, in sorted order.
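The buffer / sort / chunk / index / range-read flow described above can be illustrated with a small standard-library sketch. This is not the shardio API: `Chunk`, `ChunkWriter`, and `read_range` are hypothetical names invented for illustration, chunks are kept in memory instead of being serialized with serde + lz4, and the sort key is assumed to be a `u64`. It only shows why a per-chunk key range lets a reader skip chunks and merge the rest in sorted order.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::ops::Range;

/// Hypothetical stand-in for one on-disk chunk: a sorted run of (key, item)
/// pairs plus the key range it covers (the "index" entry for this chunk).
struct Chunk<T> {
    min_key: u64,
    max_key: u64,
    items: Vec<(u64, T)>,
}

/// Hypothetical writer: buffers items and, whenever the buffer is full,
/// sorts it by key and emits it as a chunk.
struct ChunkWriter<T, F: Fn(&T) -> u64> {
    key: F,
    capacity: usize,
    buffer: Vec<(u64, T)>,
    chunks: Vec<Chunk<T>>,
}

impl<T, F: Fn(&T) -> u64> ChunkWriter<T, F> {
    fn new(capacity: usize, key: F) -> Self {
        ChunkWriter { key, capacity, buffer: Vec::new(), chunks: Vec::new() }
    }

    /// Buffer one item together with its sort key; flush when the buffer is full.
    fn send(&mut self, item: T) {
        let k = (self.key)(&item);
        self.buffer.push((k, item));
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Sort the buffered items by key and record them as a chunk with its key range.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut items = std::mem::take(&mut self.buffer);
        items.sort_by_key(|(k, _)| *k);
        let (min_key, max_key) = (items[0].0, items[items.len() - 1].0);
        self.chunks.push(Chunk { min_key, max_key, items });
    }

    /// Flush any remaining items and hand back all chunks.
    fn finish(mut self) -> Vec<Chunk<T>> {
        self.flush();
        self.chunks
    }
}

/// Return all items whose key lies in `range`, in sorted key order,
/// by k-way merging the chunks that overlap the range.
fn read_range<T: Clone>(chunks: &[Chunk<T>], range: Range<u64>) -> Vec<T> {
    // Min-heap of (key, chunk index, position within chunk).
    let mut heap = BinaryHeap::new();
    for (ci, chunk) in chunks.iter().enumerate() {
        // Use the per-chunk key range to skip chunks that cannot match.
        if chunk.max_key < range.start || chunk.min_key >= range.end {
            continue;
        }
        // Binary search for the first item with key >= range.start.
        let pos = chunk.items.partition_point(|(k, _)| *k < range.start);
        if pos < chunk.items.len() {
            heap.push(Reverse((chunk.items[pos].0, ci, pos)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((k, ci, pos))) = heap.pop() {
        if k >= range.end {
            break; // smallest remaining key is past the range, so all others are too
        }
        out.push(chunks[ci].items[pos].1.clone());
        if pos + 1 < chunks[ci].items.len() {
            heap.push(Reverse((chunks[ci].items[pos + 1].0, ci, pos + 1)));
        }
    }
    out
}
```

In this sketch, a caller would create a `ChunkWriter` with a buffer capacity and a key closure, call `send` for each item, call `finish` to get the chunks, and then call `read_range` with a half-open key interval. The real library additionally compresses and persists each chunk, so memory use stays bounded by the buffer size rather than the dataset size.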

See Docs for API and examples.

Note: To turn on some long-running stress tests, enable the 'full-test' feature and run the tests in release mode, for example with cargo test --release --features full-test.
