Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support datatype struct reorder #4962

Closed
wants to merge 2 commits into from

Conversation

fansehep
Copy link
Contributor

Which issue does this PR close?

Closes #4908.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 20, 2023
Comment on lines +91 to +94
pub fn reverse(&mut self) {
let new_fields: Vec<FieldRef> = self.iter().rev().map(|f| f.clone() as FieldRef).collect();
self.0 = Arc::new(new_fields);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here only support reverse, but should we provide more function like this?

@fansehep
Copy link
Contributor Author

Looks i break the python binding API. 😢

@@ -41,7 +41,7 @@ use std::sync::Arc;
#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "serde", serde(transparent))]
pub struct Fields(Arc<[FieldRef]>);
pub struct Fields(Arc<Vec<FieldRef>>);
Copy link
Contributor

@Folyd Folyd Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think change [FieldRef] to Vec<FieldRef] is necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think change [FieldRef] to Vec<FieldRef] is necessary

Rust [] is a fixed size vector in compile time, if we want to push a field, should use ‘Vec’?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary, as a result of unsized coercion - https://doc.rust-lang.org/std/ops/trait.CoerceUnsized.html

https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html provides an easy to use interface on top of this, but Fields also implements FromIterator and so can be collected into directly

@Folyd
Copy link
Contributor

Folyd commented Oct 20, 2023

This PR try to fix the failed CI: #4957

@fansehep
Copy link
Contributor Author

fansehep commented Oct 20, 2023

This PR try to fix the failed CI: #4957

hope it can merge quickly.

self.0 = Arc::new(new_fields);
}

pub fn push(&mut self, field: Field) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be extremely inefficient, performing multiple allocations each time, I would recommend using https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But SchemaBuilder also use Vec<FieldRef>, And Vector is not expanding too often.

Copy link
Contributor

@tustvold tustvold Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal in this PR will add an additional pointer indirection to field access, whilst still requiring atomics for every call to push.

I think the better question is, why not use SchemaBuilder, it will be faster, less complicated, and avoid having to make changes to Fields?

TLDR is I don't think any of the changes in this PR should be necessary for #4908

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better question is, why not use SchemaBuilder, it will be faster, less complicated, and avoid having to make changes to Fields?

The problem is that the transformation of Fields into SchemaBuilder appears to be performance intensive.
Here it look like we can only use

SchemaBuilder::from(self.clone());

The result could be something like this:

    pub fn push(&mut self, field: Field) {
        let mut new_fields = SchemaBuilder::from(self.clone());
        new_fields.push(field);
        *self = new_fields.finish().fields;
    }

I am not sure why ShemaBuilder is better than old?

Copy link
Contributor

@tustvold tustvold Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting not adding these methods to Schema and instead using SchemaBuilder instead where you were intending to use them.

As you've discovered Schema, like the arrays, is not designed to be mutated in place, especially given it is normally wrapped in an Arc itself as a SchemaRef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting not adding these methods to Schema and instead using SchemaBuilder instead where you were intending to use them.

Really thanks for you. ❤️ So we need an function like

pub fn push(&mut self, builder: &mut SchemaBuilder, field: Field) { }

When the user want to push a new field, should take the builder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they'd just use https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html#method.push

I'm suggesting not trying to make Schema support in place mutation, it isn't designed for it, SchemaBuilder is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaBuilder has support push, so what else do i need to do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing, I am suggesting we close this PR, and you work on implementing #4908 using SchemaBuilder

@@ -50,9 +50,13 @@ impl std::fmt::Debug for Fields {
}

impl Fields {
pub fn new(fields: Arc<Vec<FieldRef>>) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We provide From implementations that would supercede this

@fansehep fansehep closed this Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support StructArray in Cast Kernel
3 participants