Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SkipTo for iter::Iter #287

Closed
wants to merge 16 commits into from
Closed
16 changes: 16 additions & 0 deletions src/bitmap/container.rs
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,13 @@ impl Container {
}
};
}

pub(crate) fn skip_to_iter(&self, n: u16) -> Iter<'_> {
Iter {
key: self.key,
inner: self.store.skip_to_iter(n),
}
}
}

impl BitOr<&Container> for &Container {
Expand Down Expand Up @@ -313,3 +320,12 @@ impl fmt::Debug for Container {
format!("Container<{:?} @ {:?}>", self.len(), self.key).fmt(formatter)
}
}

impl<'a> Iter<'a> {
pub fn empty() -> Self {
Self {
key: 0,
inner: store::Iter::Empty,
}
}
}
67 changes: 65 additions & 2 deletions src/bitmap/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,17 @@
use core::slice;

use super::container::Container;
use crate::{NonSortedIntegers, RoaringBitmap};
use crate::{NonSortedIntegers, RoaringBitmap, SkipTo};

#[cfg(not(feature = "std"))]
use alloc::vec::Vec;
use crate::bitmap::container;
use crate::bitmap::util::split;

/// An iterator for `RoaringBitmap`.
pub struct Iter<'a> {
// needed to support skip_to
containers: &'a [Container],
inner: iter::Flatten<slice::Iter<'a, Container>>,
size_hint: u64,
}
Expand All @@ -23,7 +27,7 @@
impl Iter<'_> {
fn new(containers: &[Container]) -> Iter {
let size_hint = containers.iter().map(|c| c.len()).sum();
Iter { inner: containers.iter().flatten(), size_hint }
Iter { containers, inner: containers.iter().flatten(), size_hint }
}
}

Expand Down Expand Up @@ -298,3 +302,62 @@
Ok(count)
}
}

impl<'a> SkipTo for Iter<'a> {
type Iter = SkipToIter<'a>;

fn skip_to(self, n: u32) -> Self::Iter {
SkipToIter::new(&self.containers, n)

Check failure on line 310 in src/bitmap/iter.rs

View workflow job for this annotation

GitHub Actions / ci (nightly)

this expression creates a reference which is immediately dereferenced by the compiler
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: using the full unmodified self.containers means this might skip backwards.

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator. :/

Copy link
Contributor Author

@grelner grelner Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? self.containers is just a reference to RoaringBitmap.containers. It was added in this PR
edit: ah, self.containers is RoaringBitmap.containers here. Then there's something I don't understand here if they can be backwards 😅

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.
Still, are all these changes worth it to have a double ended iterator and cover the kinda weird case of skip_to to a point the iterator has already passed?

Copy link
Member

@Kerollmops Kerollmops Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @grelner 👋

Sorry for the late reply, I'm in vacation 🌞

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator.

I agree with what @Dr-Emann explains here. This is, to me, the best way to implement this iterator. Flatten is not hard to implement. We can have a special Flatten type with all the original Flatten type logic but dedicated to slices and bitmaps so that it can skip groups according to the split high group (skipping whole slices or bitmaps if the number we want to find is not in this group).

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.

Let's forget about the SkipTo wrapper and only implement a non-lazy method like Iterator::advance_by method that directly advances the iterator to the requested number. This method, skip_to, is only available on the bitmap::Iter/IntoIter types, so there is no need to think about std::iter::Rev iterator adaptator.

pub struct SkipToIter<'a> {
container_index: usize,
containers: &'a [Container],
iter: container::Iter<'a>
}

impl<'a> SkipToIter<'a> {
fn new(containers: &'a [Container], n: u32) -> Self {
let (key, v) = split(n);
match containers.binary_search_by_key(&key, |e| e.key) {
Ok(index) | Err(index) => {
if index == containers.len() {
// there are no containers with a key <= n. Return empty
Self {
container_index: 0,
containers: &[],
iter: container::Iter::empty(),
}
} else {
Self {
container_index: 0,
containers: &containers[index..],
iter: (&containers[index]).skip_to_iter(v),

Check failure on line 335 in src/bitmap/iter.rs

View workflow job for this annotation

GitHub Actions / ci (nightly)

this expression borrows a value the compiler would automatically borrow
}
}
}
}
}
}

impl Iterator for SkipToIter<'_> {
type Item = u32;

fn next(&mut self) -> Option<Self::Item> {
if self.containers.is_empty() {
return None
}
let next = self.iter.next();
if next.is_some() {
next
} else if self.container_index == self.containers.len() - 1 {
None
} else {
self.container_index += 1;
self.iter = (&self.containers[self.container_index]).into_iter();
// we can call self.iter.next() directly here if we know that a
// container can't be empty. Do we?
self.next()
}
}
}
43 changes: 42 additions & 1 deletion src/bitmap/store/array_store/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@

#[cfg(not(feature = "std"))]
use alloc::boxed::Box;

use super::bitmap_store::{bit, key, BitmapStore, BITMAP_LENGTH};

#[derive(Clone, Eq, PartialEq)]
Expand Down Expand Up @@ -227,6 +226,18 @@
self.vec.iter()
}

pub fn skip_to_iter(&self, n: u16) -> core::slice::Iter<u16> {
match self.vec.as_slice().binary_search(&n) {
Ok(index) | Err(index) => {
if index == self.vec.len() {
[].iter()
} else {
(&self.vec[index..]).into_iter()

Check failure on line 235 in src/bitmap/store/array_store/mod.rs

View workflow job for this annotation

GitHub Actions / ci (nightly)

this `.into_iter()` call is equivalent to `.iter()` and will not consume the `slice`
}
}
}
}

pub fn into_iter(self) -> alloc::vec::IntoIter<u16> {
self.vec.into_iter()
}
Expand Down Expand Up @@ -434,6 +445,36 @@
}
}

#[test]
fn test_skip_to_iter() {
let mut store = ArrayStore::new();
store.insert_range(0..=16);
store.insert_range(32..=64);

// test direct hits
let mut iter = store.skip_to_iter(32);
for n in 32..=64 {
assert_eq!(iter.next(), Some(&n));
}

// test index in gap
let mut i = store.skip_to_iter(17);
assert_eq!(i.next(), Some(&32));

// test index after end
let mut i = store.skip_to_iter(65);
assert_eq!(i.next(), None);

// test last value
let mut i = store.skip_to_iter(64);
assert_eq!(i.next(), Some(&64));

// test first value
let mut i = store.skip_to_iter(0);
assert_eq!(i.next(), Some(&0));

}

#[test]
#[allow(clippy::reversed_empty_ranges)]
fn test_array_insert_invalid_range() {
Expand Down
61 changes: 61 additions & 0 deletions src/bitmap/store/bitmap_store.rs
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,10 @@ impl BitmapStore {
BitmapIter::new(&self.bits)
}

pub(crate) fn skip_to_iter(&self, n: u16) -> BitmapIter<&[u64; BITMAP_LENGTH]> {
BitmapIter::new_from(&self.bits, n)
}

pub fn into_iter(self) -> BitmapIter<Box<[u64; BITMAP_LENGTH]>> {
BitmapIter::new(self.bits)
}
Expand Down Expand Up @@ -421,6 +425,34 @@ impl<B: Borrow<[u64; BITMAP_LENGTH]>> BitmapIter<B> {
bits,
}
}

fn new_from(bits: B, n: u16) -> BitmapIter<B> {
let key = key(n);
let mut value = bits.borrow()[key];

// There is probably a way to calculate value much more efficiently
// using bit(n) and some bit magic, but I haven't figured that one out
let mut prev_value;
loop {
if value == 0 {
prev_value = 0;
break;
}
let index = value.trailing_zeros() as usize;
prev_value = value;
value &= value - 1;
if (64 * key + index) as u16 >= n {
break;
}
}
BitmapIter {
key,
value: prev_value,
key_back: BITMAP_LENGTH - 1,
value_back: bits.borrow()[BITMAP_LENGTH - 1],
bits,
}
}
}

impl<B: Borrow<[u64; BITMAP_LENGTH]>> Iterator for BitmapIter<B> {
Expand Down Expand Up @@ -556,6 +588,35 @@ impl BitXorAssign<&ArrayStore> for BitmapStore {
mod tests {
use super::*;

#[test]
fn test_skip_to_iter() {
let mut store = BitmapStore::new();
store.insert_range(0..=16);
store.insert_range(32..=64);

// test direct hits
let mut iter = store.skip_to_iter(32);
for n in 32..=64 {
assert_eq!(iter.next(), Some(n));
}

// test index in gap
let mut i = store.skip_to_iter(17);
assert_eq!(i.next(), Some(32));

// test index after end
let mut i = store.skip_to_iter(65);
assert_eq!(i.next(), None);

// test last value
let mut i = store.skip_to_iter(64);
assert_eq!(i.next(), Some(64));

// test first value
let mut i = store.skip_to_iter(0);
assert_eq!(i.next(), Some(0));

}
#[test]
fn test_bitmap_remove_smallest() {
let mut store = BitmapStore::new();
Expand Down
14 changes: 14 additions & 0 deletions src/bitmap/store/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ pub enum Iter<'a> {
Vec(vec::IntoIter<u16>),
BitmapBorrowed(BitmapIter<&'a [u64; BITMAP_LENGTH]>),
BitmapOwned(BitmapIter<Box<[u64; BITMAP_LENGTH]>>),
Empty
}

impl Store {
Expand Down Expand Up @@ -218,6 +219,17 @@ impl Store {
Bitmap(_) => self.clone(),
}
}

pub(crate) fn skip_to_iter(&self, n: u16) -> Iter<'_> {
match &self {
Array(a) => {
Iter::Array(a.skip_to_iter(n))
}
Bitmap(bm) => {
Iter::BitmapBorrowed(bm.skip_to_iter(n))
}
}
}
}

impl Default for Store {
Expand Down Expand Up @@ -506,6 +518,7 @@ impl<'a> Iterator for Iter<'a> {
Iter::Vec(inner) => inner.next(),
Iter::BitmapBorrowed(inner) => inner.next(),
Iter::BitmapOwned(inner) => inner.next(),
Iter::Empty => None,
}
}
}
Expand All @@ -517,6 +530,7 @@ impl DoubleEndedIterator for Iter<'_> {
Iter::Vec(inner) => inner.next_back(),
Iter::BitmapBorrowed(inner) => inner.next_back(),
Iter::BitmapOwned(inner) => inner.next_back(),
Iter::Empty => None,
}
}
}
9 changes: 9 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ extern crate byteorder;

#[macro_use]
extern crate alloc;
extern crate core;

use core::fmt;

Expand Down Expand Up @@ -93,3 +94,11 @@ pub trait MultiOps<T>: IntoIterator<Item = T> {
/// The `symmetric difference` between all elements.
fn symmetric_difference(self) -> Self::Output;
}

/// Get an iterator that yields values >= `n`
pub trait SkipTo {
/// The type of iterator that will be returned
type Iter: Iterator;
/// Get an iterator that yields values >= `n`
fn skip_to(self, n: u32) -> Self::Iter;
}
18 changes: 18 additions & 0 deletions tests/skip_to.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
use roaring::{RoaringBitmap, SkipTo};

#[test]
fn basic() {
let bm = RoaringBitmap::from([1,2,3,4,11,12,13,14]);
let mut i = bm.iter().skip_to(10);
for n in 11..=14 {
assert_eq!(i.next(), Some(n))
}
assert_eq!(i.next(), None);

}

#[test]
fn empty() {
let bm = RoaringBitmap::new();
assert_eq!(bm.iter().skip_to(31337).next(), None)
}
Loading