Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IPC] Can we serialize just arrays or scalars? #44736

Open
JerAguilon opened this issue Nov 14, 2024 · 0 comments
Open

[IPC] Can we serialize just arrays or scalars? #44736

JerAguilon opened this issue Nov 14, 2024 · 0 comments
Labels
Component: C++ Type: usage Issue is a user question

Comments

@JerAguilon
Copy link
Contributor

Describe the usage question you have. Please include as many useful details as possible.

I have a use case where I want to send a fairly small arrays/scalars via Arrow IPC. Memory efficiency is quite important for us.

One simple way of doing this would be to put the ints in a Record batch and serialize.

However, I notice that serializing an int, 3 chars, and a bool takes a whopping 512 bytes on my machine:

https://gist.github.com/JerAguilon/8c70107a576c0b04b5846ad8207651e7

STDOUT:
batch: pyarrow.RecordBatch
a: int64
b: string
c: bool, num_rows: 1
buf.size: 512

Things like the schema are well known ahead of time, I'd like to minimize as much data in the payload as possible.

In Arrow IPC (the C++ library works for me), is there some way to serialize just a scalar or array? I see only a RecordBatchWriter exposed:

https://arrow.apache.org/docs/cpp/api/ipc.html#_CPPv416MakeStreamWriterPN2io12OutputStreamERKNSt10shared_ptrI6SchemaEERK15IpcWriteOptions

Component(s)

C++

@JerAguilon JerAguilon added the Type: usage Issue is a user question label Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C++ Type: usage Issue is a user question
Projects
None yet
Development

No branches or pull requests

1 participant