-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44065: [Java] Implement C Data Interface for RunEndEncodedVector #44241
Conversation
ViggoC
commented
Sep 26, 2024
•
edited by github-actions
bot
Loading
edited by github-actions
bot
- GitHub Issue: [Java] Request to support Run-End Encoded Layout in Java C Data interface #44065
@ViggoC it would be better to add a test here as well https://github.com/apache/arrow/blob/main/java/c/src/test/python/integration_tests.py |
@vibhatha Your response is really quickly. I am still trying to understand your PR #41967, as well as how to run tests. If you could give me some guidance, it would be of great help. |
Right, so there are few things we have to do here. For integration tests, you need to make sure to use One more important thing is the If you have any questions, please ask here. And thanks for pushing this effort. |
cc @raulcd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only went through one quick pass, I will review again.
@github-actions crossbow submit -g java |
Revision: b34aca3 Submitted crossbow builds: ursacomputing/crossbow @ actions-302154c076 |
java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Override | ||
public void splitAndTransfer(int startIndex, int length) { | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue filed for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have, but we should generally complete split and transfer features before we finalise the C Data interface. IMHO it is important to cover that in this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least this was the approach we took for StringView and ListView before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only implemented the necessary functions to pass through RoundTripTest. I'll implement it in this PR, if you all think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ViggoC if this requires a lot more code or time, let's do it later, it is not a must. But later means as an immediate next feature. I was merely mentioning my practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not time consuming, I'll implement it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll wait for that.
|
||
@Override | ||
public void copyValueSafe(int from, int to) { | ||
this.to.copyFrom(from, to, RunEndEncodedVector.this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will just throw because we don't implement copyFrom, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just waiting for splitAndTransfer though let me know if you'd prefer to split it out after all.
*/ | ||
@Override | ||
public void splitAndTransfer(int startIndex, int length) { | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll wait for that.
java/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments.
int physicalEndIndex, | ||
int physicalLength) { | ||
toRunEndVector.setValueCount(physicalLength); | ||
toRunEndVector.getValidityBuffer().setOne(0, toRunEndVector.getValidityBuffer().capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this. Could you please take a look how other vectors are doing this for validity buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ListVector, it slice the validityBuffer if startIndex % 8 == 0, and copy data one by one when the first bit starts from the middle of a byte.
But for run end encoded vector, the element of run end vector can never be null so I just set validity buffer of RunEndVector to 1.
What's your concern about this code? The memory of validity buffer should be reused when startIndex % 8 == 0, Or we should not set the bit beyond the physical length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ViggoC not saying it's wrong, I mentioned I wasn't sure.
java/vector/src/test/java/org/apache/arrow/vector/TestRunEndEncodedVector.java
Outdated
Show resolved
Hide resolved
fix nits comment Co-authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
java/vector/src/main/java/org/apache/arrow/vector/complex/RunEndEncodedVector.java
Outdated
Show resolved
Hide resolved
int physicalEndIndex, | ||
int physicalLength) { | ||
toRunEndVector.setValueCount(physicalLength); | ||
toRunEndVector.getValidityBuffer().setOne(0, toRunEndVector.getValidityBuffer().capacity()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine
int physicalLastIndex = physicalLength - 1; | ||
if (toRunEndVector instanceof SmallIntVector) { | ||
byte typeWidth = SmallIntVector.TYPE_WIDTH; | ||
for (int i = 0; i < physicalLastIndex; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the loop go from [0, physicalLength)
or [physicalStartIndex, physicalLastIndex)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it through [0, physicalLength - 1), and handle physicalLength - 1
separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this loop is from [0, physicalLastIndex)
- aren't we "crossing indices" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
physicalLastIndex = physicalLength - 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Sorry, the naming is confusing, IMO - physicalLastIndex
feels like it should go with physicalStartIndex
and not physicalLength
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the naming is kind of confusing, I renamed them, does it make sense for you now?
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b175463. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |