-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support datatype struct reorder #4962
Conversation
Signed-off-by: fan <[email protected]>
pub fn reverse(&mut self) { | ||
let new_fields: Vec<FieldRef> = self.iter().rev().map(|f| f.clone() as FieldRef).collect(); | ||
self.0 = Arc::new(new_fields); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here only support reverse, but should we provide more function like this?
Signed-off-by: fan <[email protected]>
Looks i break the python binding API. 😢 |
@@ -41,7 +41,7 @@ use std::sync::Arc; | |||
#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Hash)] | |||
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] | |||
#[cfg_attr(feature = "serde", serde(transparent))] | |||
pub struct Fields(Arc<[FieldRef]>); | |||
pub struct Fields(Arc<Vec<FieldRef>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think change [FieldRef]
to Vec<FieldRef]
is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think change
[FieldRef]
toVec<FieldRef]
is necessary
Rust [] is a fixed size vector in compile time, if we want to push a field, should use ‘Vec’?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary, as a result of unsized coercion - https://doc.rust-lang.org/std/ops/trait.CoerceUnsized.html
https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html provides an easy to use interface on top of this, but Fields
also implements FromIterator
and so can be collected into directly
This PR try to fix the failed CI: #4957 |
hope it can merge quickly. |
self.0 = Arc::new(new_fields); | ||
} | ||
|
||
pub fn push(&mut self, field: Field) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be extremely inefficient, performing multiple allocations each time, I would recommend using https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But SchemaBuilder
also use Vec<FieldRef>
, And Vector
is not expanding too often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal in this PR will add an additional pointer indirection to field access, whilst still requiring atomics for every call to push.
I think the better question is, why not use SchemaBuilder, it will be faster, less complicated, and avoid having to make changes to Fields?
TLDR is I don't think any of the changes in this PR should be necessary for #4908
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the better question is, why not use SchemaBuilder, it will be faster, less complicated, and avoid having to make changes to Fields?
The problem is that the transformation of Fields into SchemaBuilder appears to be performance intensive.
Here it look like we can only use
SchemaBuilder::from(self.clone());
The result could be something like this:
pub fn push(&mut self, field: Field) {
let mut new_fields = SchemaBuilder::from(self.clone());
new_fields.push(field);
*self = new_fields.finish().fields;
}
I am not sure why ShemaBuilder
is better than old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting not adding these methods to Schema and instead using SchemaBuilder instead where you were intending to use them.
As you've discovered Schema, like the arrays, is not designed to be mutated in place, especially given it is normally wrapped in an Arc itself as a SchemaRef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting not adding these methods to Schema and instead using SchemaBuilder instead where you were intending to use them.
Really thanks for you. ❤️ So we need an function like
pub fn push(&mut self, builder: &mut SchemaBuilder, field: Field) { }
When the user want to push a new field, should take the builder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they'd just use https://docs.rs/arrow-schema/latest/arrow_schema/struct.SchemaBuilder.html#method.push
I'm suggesting not trying to make Schema support in place mutation, it isn't designed for it, SchemaBuilder is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaBuilder has support push
, so what else do i need to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing, I am suggesting we close this PR, and you work on implementing #4908 using SchemaBuilder
@@ -50,9 +50,13 @@ impl std::fmt::Debug for Fields { | |||
} | |||
|
|||
impl Fields { | |||
pub fn new(fields: Arc<Vec<FieldRef>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We provide From
implementations that would supercede this
Which issue does this PR close?
Closes #4908.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?