Implement iterator for StructArray #593
Comments
I am curious exactly what type of struct you're iterating over and what you're trying to produce? There are a number of strategies for doing this depending on what you need (and if you know at compile time what your types are).
Here's some code that I specifically crafted in my last video on nanoarrow. Essentially I am going through each stream of a pa.Table, going column-by-column, and then iterating over the values within each column. The stream and array value iteration already have C++ iterators, but the column-by-column iteration is a classic loop:

    for (const auto &chunk : array_stream) {
      for (decltype(schema->n_children) i = 0; i < schema->n_children; ++i) {
        nanoarrow::UniqueArrayView array_view;
        ArrowArrayViewInitFromSchema(array_view.get(), schema->children[i], &error);
        NANOARROW_THROW_NOT_OK(
            ArrowArrayViewSetArray(array_view.get(), chunk.children[i], &error));
        for (const auto value :
             nanoarrow::ViewArrayAs<int64_t>(array_view.get())) {
          // do something with the values of each array here
        }
      }
    }
I am probably the wrong person to ask here since I don't mind classic loops and the iteration that I usually have to do is to convert between row-oriented systems and Arrow (e.g., database drivers). There is definitely appetite to interact with Arrow from C++ and I'm not sure I have the answers about the scope of that or if nanoarrow is the right place! Tiny nit: you can re-use the same
Yeah, if you do row-oriented iteration I think there is less value. Maybe there should be a way to differentiate how you want to iterate? For column iteration, I think something of the following form would make for an idiomatic C++ solution:

    for (const auto &chunk : array_stream) {
      for (const auto& [schema_view, array_view] : chunk.Columns()) {
        // maybe do something with the schema here, like init an ArrowDecimal from precision / scale
        for (const auto value :
             nanoarrow::ViewArrayAs<int64_t>(array_view.get())) {
          // do something with the values of each array here
        }
      }
    }
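For illustration, here is a minimal sketch of how a column range like the proposed `chunk.Columns()` might be built. It is not nanoarrow API: `ColumnRange`, `SchemaStub`, `ArrayStub`, and `SumEachColumn` are hypothetical names, and the stub structs carry only a few fields in place of the real `ArrowSchema` / `ArrowArray` structs so that the example compiles on its own. It assumes C++17 for structured bindings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ArrowSchema / ArrowArray. Only the fields the
// sketch needs are present; `values` is a simplification of the real
// buffer layout.
struct SchemaStub {
  const char* name;
  int64_t n_children;
  SchemaStub** children;
};

struct ArrayStub {
  int64_t length;
  const int64_t* values;
  int64_t n_children;
  ArrayStub** children;
};

// Hypothetical range that yields one (child schema, child array) pair per
// column of a struct-typed chunk.
class ColumnRange {
 public:
  class iterator {
   public:
    using value_type = std::pair<const SchemaStub*, const ArrayStub*>;

    iterator(const SchemaStub* schema, const ArrayStub* array, int64_t i)
        : schema_(schema), array_(array), i_(i) {}

    value_type operator*() const {
      return {schema_->children[i_], array_->children[i_]};
    }
    iterator& operator++() {
      ++i_;
      return *this;
    }
    bool operator!=(const iterator& other) const { return i_ != other.i_; }

   private:
    const SchemaStub* schema_;
    const ArrayStub* array_;
    int64_t i_;
  };

  ColumnRange(const SchemaStub* schema, const ArrayStub* array)
      : schema_(schema), array_(array) {
    // The schema and the array must describe the same number of columns.
    assert(schema->n_children == array->n_children);
  }

  iterator begin() const { return {schema_, array_, 0}; }
  iterator end() const { return {schema_, array_, schema_->n_children}; }

 private:
  const SchemaStub* schema_;
  const ArrayStub* array_;
};

// Example consumer: sums the int64 values of each column and returns the
// result keyed by column name.
inline std::vector<std::pair<std::string, int64_t>> SumEachColumn(
    const SchemaStub& schema, const ArrayStub& chunk) {
  std::vector<std::pair<std::string, int64_t>> sums;
  for (const auto& [child_schema, child_array] : ColumnRange(&schema, &chunk)) {
    int64_t total = 0;
    for (int64_t i = 0; i < child_array->length; ++i) {
      total += child_array->values[i];
    }
    sums.emplace_back(child_schema->name, total);
  }
  return sums;
}
```

With the real structs, each pair would presumably hold the child `ArrowSchema` and `ArrowArray`, and the loop body could initialize an array view per column the same way the classic loop does.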
Right now if you were to use the C++ library with nanoarrow and read in a stream of two-dimensional objects, you would:

1. Use the ViewArrayStream class to iterate the stream
2. Use a classic for loop to go column-by-column within each chunk
3. Use the ViewArrayAs class to iterate the individual array views

Not sure if this falls in the scope of nanoarrow or if it's something for sparrow, but I think it would make sense to add an iterator for step 2.