Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up alignment #115

Open
lwwmanning opened this issue Mar 18, 2024 · 2 comments
Open

Clean up alignment #115

lwwmanning opened this issue Mar 18, 2024 · 2 comments

Comments

@lwwmanning
Copy link
Member

lwwmanning commented Mar 18, 2024

We need to decide to what level of fidelity we want to enforce alignment. Currently, we enforce nothing but then occasionally run into interop issues with Arrow. To work around these, we ensure that Arrow does the allocation (e.g., on the VarBinView builder). This is kludgy and error-prone.

An alternative option would be to require or encourage all buffers allocated by the project to be aligned to some value (e.g., the platform-specific arrow_buffer::alloc::ALIGNMENT, or the max of that and 64/128 bytes, etc).

If we require all of our buffers to be aligned, then that will imply copying unaligned Vecs passed by users (arrow-rs doesn't copy), so we should probably make it obvious & less ergonomic to do so.

@lwwmanning lwwmanning changed the title clean up Alignment clean up alignment Mar 18, 2024
@lwwmanning lwwmanning changed the title clean up alignment Clean up alignment Mar 18, 2024
@lwwmanning
Copy link
Member Author

Related to #7

@gatesn
Copy link
Contributor

gatesn commented Dec 17, 2024

Current plan:

  • VortexReadAt should not assume alignment of the result bytes.
  • MessageReader copies to enforce alignment as necessary. The reader is configurable with minimum alignment (default to 64 bytes), and we should also specify minimum alignment in the Buffer flatbuffer. This means we can support VarBin's buffers having 1 byte alignment and never copying, and FastLanes buffers requiring 128 byte alignment.
  • File/MessageWriter can be configured to include padding or not. With padding included, the read may have slightly fewer copies (though the reader will always check min alignment). This is similar concept to the writer choosing compression codecs that are faster/slower to decode. Writer's choice. Always. Note we also need to be able to pass the underlying file / stream position to the encoder so it can be globally aligned, not just locally.
  • Update vortex-buffer to use Arc<dyn BytesOwner>, and if the performance hit is too large, then use a VTable approach similar to bytes::Bytes. The justification for vortex-buffer is that we want the ability to call into_mut on aligned things. bytes::Bytes has kept its VTable private, so we cannot do this for anything other than Vec, for which we cannot configure larger alignment.
  • Introduce an ArrayBuffer struct that wraps ArrayBuffer{ buffer: Buffer, min_align: usize } so we can capture this information in an ArrayData.
  • Add a new ScalarBuffer for passing around typed buffers.
  • Move IoBuf trait into vortex-io

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants