Partition IO-bound tasks by a key and execute each within a coroutine, maintaining FIFO-order per group and increased throughput compared to thread-based approaches.
val partitionQueue = CoroutinePartitionQueue(5) // Configure number of partitions
val processRow: suspend (row: Row) -> Unit = {} // Any suspendable function
for(row in data) {
partitionQueue.enqueue(row.key) {
// This callback will be executed within a coroutine context
proccessRow(row)
}
}
// Now wait for all rows to be processed
partitionQueue.await()
Add the jitpack repository to your build.gradle
repositories {
...
maven { url 'https://jitpack.io' }
}
and then add the repository as a dependency in your project
dependencies {
implementation 'com.github.champloo11:coroutine-partition-queue:v1.0.0'
}
Often (particularly in batch processing) we find ourselves needing to process a batch of records where:
- The records need to be processed in a particular order
- But the records do not need to be processed in a global order
This means each record can be divied up into a group of tasks (partitioned), and as long as all the groups are processed order respective to all the records in the group, we will have properly processed them.
Often (but not always) the records are:
- Being streamed into our code (via a database, object store, or client)
- Being written back out through the network to a database, object store, or client
This library has uses in:
- Extract-Transform-Load (ETL) processing
- Server side request/response queuing
- etc...
This library is NOT meant for processing large amounts of CPU-intensive data, and is better suited towards workloads that send a lot of data over IO.
There are four primary components CoroutinePartitionQueue
Tasks are units of execution that can be partitioned into groups, known as a key.
A thread that loops through each partition queue and pops off a task when that partition is ready for more work, and executes that task in a coroutine.
Task distribution is handled by the high quality, fast, non-cryptographic hash function MurmurHash3 modulus the number of partitions.
A ConcurrentLinkedQueue that can be written to safely while the executor is pulling work off of the queue.