Packed Boolean Arrays #56

Open
wants to merge 4 commits into
base: master

@Davipb Davipb commented Feb 25, 2017

Right now, when a boolean array is serialized, every boolean is written to a byte, even though only one bit of the byte is ever used, leading to a waste of space.

This pull request introduces PackAttribute. When a boolean array is marked with [Pack], its contents are "packed" into the bits of the resulting bytes, so up to 8 booleans fit in a single byte (reducing space usage by up to 8 times!).

For example, the array {true, false, true, false, true, false, true, false} would normally be serialized into 8 bytes: 0x01 0x00 0x01 0x00 0x01 0x00 0x01 0x00. If marked with [Pack], it now fits into a single byte: 1010 1010 or 0xAA (in Big-Endian mode, see below).

Endianness is respected. In Little-Endian mode, the bools are written from the Least-Significant Bit (LSB) to the Most-Significant Bit (MSB): 0000 000X, 0000 00X0, 0000 0X00, etc. In Big-Endian mode, they're written from the MSB to the LSB: X000 0000, 0X00 0000, 00X0 0000, etc.
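
To make the two bit orders concrete, here is a minimal sketch in Python. It is for illustration only: the library itself is C#, and `pack_bools` and its `big_endian` flag are hypothetical names, not code from this pull request.

```python
def pack_bools(bools, big_endian=True):
    """Pack booleans into bytes, 8 per byte (hypothetical helper)."""
    # One byte per group of up to 8 booleans, rounded up.
    out = bytearray((len(bools) + 7) // 8)
    for i, value in enumerate(bools):
        if value:
            # Big-Endian fills each byte from the MSB down,
            # Little-Endian from the LSB up.
            bit = 7 - (i % 8) if big_endian else i % 8
            out[i // 8] |= 1 << bit
    return bytes(out)


flags = [True, False, True, False, True, False, True, False]
print(pack_bools(flags, big_endian=True).hex())   # aa
print(pack_bools(flags, big_endian=False).hex())  # 55
```

The same eight booleans give 0xAA in Big-Endian mode and 0x55 in Little-Endian mode, since only the bit position within each byte changes.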

Boolean arrays not marked with [Pack] are serialized exactly as before, which means this addition is 100% backwards-compatible.

@jefffhaynes
Owner

jefffhaynes commented Feb 25, 2017 via email

@Davipb
Author

Davipb commented Feb 25, 2017

This is mostly a way to save space when storing a boolean array, not an ad-hoc bitfield creator. A bitfield lets you change a fixed set of flags, but a packed boolean array allows you to store any arbitrary number of booleans in a manner that saves space.

The "bit manipulation" is only done inside the boolean array's serializer, so it doesn't affect anything else in the library. An array with 9 booleans will take 2 bytes (7 bits of the second byte go unused), and the deserializer will use the Field Count to extract the proper number of booleans from the bytes.
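
As a rough sketch of the reading side, the Python below shows how a known element count lets a deserializer ignore the unused trailing bits. It is illustrative only and not the library's C# code: `packed_size` and `unpack_bools` are hypothetical names, and `count` stands in for the Field Count.

```python
def packed_size(count):
    """Number of bytes needed to hold `count` packed booleans."""
    return (count + 7) // 8


def unpack_bools(data, count, big_endian=True):
    """Read exactly `count` booleans back out of packed bytes."""
    result = []
    for i in range(count):
        # Mirror the writer's bit order: MSB first for Big-Endian,
        # LSB first for Little-Endian.
        bit = 7 - (i % 8) if big_endian else i % 8
        result.append(bool(data[i // 8] & (1 << bit)))
    return result


# 9 booleans (alternating, starting with true) packed in Big-Endian mode.
data = bytes([0b10101010, 0b10000000])
print(packed_size(9))              # 2
print(unpack_bools(data, 9))
```

Because the count is supplied separately, the 7 leftover bits in the second byte are never read as booleans.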

It can be used, for example, to store a 2D game map's tile collision data, which would be a large boolean array with hundreds or thousands of entries. Going from 10,000 bytes for a 100 x 100 map to 1,250 bytes is a huge improvement.

As for the endianness, it seemed like a good way to let the user configure the order in which the booleans are stored (since the library's goal is to give you as much control over the output as possible). But the same could be done by adding a property to PackAttribute, or by just defining a standard storage order, so that can easily be changed.

@jefffhaynes
Owner

jefffhaynes commented Feb 25, 2017 via email

@Davipb Davipb force-pushed the packed-bools branch 3 times, most recently from 238a66f to d399a46 Compare March 4, 2017 15:32
@Davipb Davipb force-pushed the packed-bools branch 2 times, most recently from 760f928 to ff1e13c Compare March 18, 2017 19:21
@ChrisonSimtian

o think about i

Hey, just wondering, are you still considering adding this to your serializer?
I have an actual use case where I communicate with a PLC (a low-level CPU from Siemens for controlling machines and electrical components). Those folks don't have as much memory as we do on a PC, so saving a byte here and there is actually crucial to them. And they also love to do crazy things like having 30 spare bools in their data packets...

3 participants