Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump load #15

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Dump load #15

wants to merge 1 commit into from

Conversation

yegct
Copy link
Contributor

@yegct yegct commented Apr 13, 2022

What

This PR allows BitArray to be dumped to disk and subsequently loaded from disk. It also adds the ability to union two bitarrays. And there's a new class, BitArrayFile, allowing for read-only access to a dumped BitArray, providing a means to query the array without having to load it in to RAM.

Tests

$ ruby test/test_bit_array.rb 
Run options: --seed 52069

# Running:

.................

Finished in 0.007989s, 2127.8684 runs/s, 527461.0175 assertions/s.

17 runs, 4214 assertions, 0 failures, 0 errors, 0 skips


$ ruby test/test_bit_array_file.rb 
Run options: --seed 36015

# Running:

....

Finished in 0.003689s, 1084.2876 runs/s, 37950.0675 assertions/s.

4 runs, 140 assertions, 0 failures, 0 errors, 0 skips

Additionally, I tested this with a bitarray consisting of a little over 2 billion bits. I was able to dump, load, and use BitArrayFile to query the bits.

attr_reader :field, :reverse_byte, :size
include Enumerable

VERSION = "1.4.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from 1.3.0.

include Enumerable

VERSION = "1.4.0"
HEADER_LENGTH = 8 + 1 # QC (@size, @reverse_byte)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use this here, but need it for BitArrayFile.


def ==(rhs)
@size == rhs.size && @reverse_byte == rhs.reverse_byte && @field == rhs.field
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the ability to compare two BitArray objects. This makes it a bit easier to add tests.

end

combined
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this because it allows parallelising bloom filter creation. I don't personally need this method, but figured it might be useful for others.

private def byte_position(position)
@reverse_byte ? position : 7 - position
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two methods, dump and load, are new.

private def seek_to(position)
@io.seek(position + HEADER_LENGTH)
end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually fairly easy to implement the rest of BitArray, allowing this to be read/write instead of read-only. But it's quite a bit slower, at least for large bitarrays and nobody's likely to care for small bitarrays. Seemed to be unnecessary bloat.

@@ -0,0 +1,65 @@
require "minitest/autorun"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm more familiar with rspec. Please excuse me if these tests aren't how most people would write Minitest tests.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way, Minitest is to RSpec as Sinatra is to Rails. Minitest is far less opinionated and freeform which has upsides and downsides, but a big upside is pretty much any approach that works and fits into a project's own style is all good! :)

@yegct yegct mentioned this pull request Apr 13, 2022
@peterc
Copy link
Owner

peterc commented May 3, 2022

I've merged the previous one but this will take a little bit more review - thanks for now though!

@yegct yegct force-pushed the dump_load branch 2 times, most recently from 73f65aa to 9c070df Compare May 3, 2022 20:37
@yegct
Copy link
Contributor Author

yegct commented May 3, 2022

I've merged the previous one but this will take a little bit more review - thanks for now though!

Rebased off of main. If you don't like/want the union, let me know and I'll drop that (and associated tests). If you want me to do some cleanup, happy to do that too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants