Encode values one by one to avoid allocating memory for posting lists #267

Merged: 1 commit, Jan 3, 2025
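Below is a minimal sketch of the idea in the title. It assumes an encoder that exposes both a per-value and a batch entry point. `DeltaVarintEncoder` is a hypothetical stand-in (delta + LEB128 varint), not the crate's `IntSeqEncoder` implementation. Only the method names `encode_value`, `encode_batch` and `len` mirror the diff; their signatures here are assumptions.

The sketch shows why the change matters. A batch call needs every value in one contiguous slice, so a posting list spread over two pages must first be copied into a temporary `Vec<u64>`. The per-value call can instead be fed straight from an iterator over the pages.

```rust
/// Hypothetical stand-in for a `C: IntSeqEncoder` (delta + LEB128 varint).
/// Only the encoded bytes are buffered, never the input values.
struct DeltaVarintEncoder {
    prev: u64,
    buf: Vec<u8>,
}

impl DeltaVarintEncoder {
    fn new() -> Self {
        Self { prev: 0, buf: Vec::new() }
    }

    /// Encode a single value; input must be sorted ascending.
    fn encode_value(&mut self, value: &u64) -> Result<(), String> {
        if *value < self.prev {
            return Err(format!("unsorted input: {} after {}", value, self.prev));
        }
        let mut delta = *value - self.prev;
        self.prev = *value;
        // LEB128: 7 bits per byte, high bit set on all but the last byte.
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                self.buf.push(byte);
                return Ok(());
            }
            self.buf.push(byte | 0x80);
        }
    }

    /// Batch API: the caller must already hold every value in one slice.
    fn encode_batch(&mut self, values: &[u64]) -> Result<(), String> {
        values.iter().try_for_each(|v| self.encode_value(v))
    }

    fn len(&self) -> usize {
        self.buf.len()
    }
}

fn main() {
    // One posting list whose values sit on two separate pages.
    let page_a = [3u64, 7, 200];
    let page_b = [201u64, 1000];

    // Before: copy both pages into a temporary Vec, then batch-encode.
    let materialized: Vec<u64> = page_a.iter().chain(page_b.iter()).copied().collect();
    let mut batch = DeltaVarintEncoder::new();
    batch.encode_batch(&materialized).unwrap();

    // After: feed values one by one straight from the pages; no Vec<u64>.
    let mut streaming = DeltaVarintEncoder::new();
    for val in page_a.iter().chain(page_b.iter()) {
        streaming.encode_value(val).unwrap();
    }

    assert_eq!(batch.buf, streaming.buf);
    println!("{} bytes either way", streaming.len());
}
```

Both paths produce identical bytes. The streaming path never holds more than one input value at a time; only the encoded output is buffered.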
rs/index/src/ivf/writer.rs: 16 changes (6 additions, 10 deletions)
@@ -183,19 +183,15 @@ impl<Q: Quantizer, C: IntSeqEncoder + 'static> IvfWriter<Q, C> {
         metadata_bytes_written +=
             wrap_write(&mut metadata_writer, &num_posting_lists.to_le_bytes())?;
         for i in 0..num_posting_lists {
-            // TODO(tyb): we need to materialize the posting list here since we are
-            // not sure the whole list is on the same page. Optimize this in a separate PR
-            let posting_list = ivf_builder
-                .posting_lists()
-                .get(i as u32)?
-                .iter()
-                .collect::<Vec<_>>();
+            let posting_list = ivf_builder.posting_lists().get(i as u32)?;
             let mut encoder = C::new_encoder(
-                *posting_list.last().unwrap_or(&0) as usize,
-                posting_list.len(),
+                posting_list.last().unwrap_or(0) as usize,
+                posting_list.elem_count,
             );
             // Encode to get the length of the encoded data
-            encoder.encode_batch(&posting_list)?;
+            for val in posting_list.iter() {
+                encoder.encode_value(&val)?;
+            }
             // Write the length of the encoded posting list
             metadata_bytes_written +=
                 wrap_write(&mut metadata_writer, &encoder.len().to_le_bytes())?;
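The removed TODO explains why the old code collected into a `Vec`: a posting list may not lie entirely on one page. After the change, the writer uses the value returned by `posting_lists().get(i)` directly. It relies on three things:

- `iter()`;
- `last()` returning the value itself, hence `unwrap_or(0)` instead of `*posting_list.last().unwrap_or(&0)`;
- an `elem_count` field.

That type is not part of this diff, so the following is a hypothetical sketch of such a view. `PagedPostingList` and its layout are assumptions, and the pages are assumed to be available as `&[u64]` slices.

```rust
use std::iter::{Copied, Flatten};
use std::slice::Iter;

/// Hypothetical read-only view of one posting list whose values are spread
/// over several pages (e.g. of a memory-mapped file). Only page slices are
/// held; the values themselves are never copied.
struct PagedPostingList<'a> {
    pages: Vec<&'a [u64]>,
    elem_count: usize,
}

impl<'a> PagedPostingList<'a> {
    fn new(pages: Vec<&'a [u64]>) -> Self {
        let elem_count = pages.iter().map(|p| p.len()).sum();
        Self { pages, elem_count }
    }

    /// Walk the pages in order, yielding values by copy.
    fn iter(&self) -> Copied<Flatten<Copied<Iter<'_, &'a [u64]>>>> {
        self.pages.iter().copied().flatten().copied()
    }

    /// Last value of the list, skipping trailing empty pages.
    /// Returned by value, so callers write `unwrap_or(0)`.
    fn last(&self) -> Option<u64> {
        self.pages.iter().rev().find_map(|page| page.last().copied())
    }
}

fn main() {
    let page_a: Vec<u64> = vec![3, 7, 200];
    let page_b: Vec<u64> = vec![201, 1000];
    let list = PagedPostingList::new(vec![&page_a[..], &page_b[..]]);

    // The same two arguments the diff passes to C::new_encoder.
    let universe = list.last().unwrap_or(0) as usize;
    let num_elem = list.elem_count;

    // Stand-in for the `encoder.encode_value(&val)` loop.
    let sum: u64 = list.iter().sum();
    println!("universe={}, elem_count={}, sum={}", universe, num_elem, sum);
}
```

Only the small vector of page slices is allocated. Iterating, counting, and taking the last element all read the pages in place.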