Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement basic struct handling #91

Merged
merged 18 commits into from
Oct 9, 2024

Conversation

EpsilonPrime
Copy link
Contributor

@EpsilonPrime EpsilonPrime commented Oct 8, 2024

Adds the capability to create structs one level deep using the Spark struct() data frame API.

To assist with this functionality the field handling data structure has been upgraded from a
string reference into a full type (Field) allowing for tracking of additional names today and
precise type tracking in the future.

Future PRs will add arbitrary type adding and will make getField() work on structures.

@EpsilonPrime EpsilonPrime marked this pull request as draft October 8, 2024 04:15
Copy link

github-actions bot commented Oct 8, 2024

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@EpsilonPrime EpsilonPrime marked this pull request as ready for review October 8, 2024 07:11
@EpsilonPrime
Copy link
Contributor Author

@mbrobbel Could you take a look at this PR please?

src/backends/arrow_tools.py Show resolved Hide resolved
src/backends/tests/arrow_tools_test.py Outdated Show resolved Hide resolved
src/backends/tests/arrow_tools_test.py Outdated Show resolved Hide resolved
src/backends/tests/arrow_tools_test.py Show resolved Hide resolved
Comment on lines 150 to 152
if not current_symbol:
raise InternalError(
f'Could not find plan id {self._current_plan_id} constructed earlier.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this lookup code to a separate method? Because it's used in many places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few approaches here. All of this is added just to make pyright happy so I could just remove all of the checks and the code would run fine. I also considered moving the check into the code that does the lookup but there are a few cases where None is appropriate. I can remove these for now and can add back a better solution when a pyright github action is added.

Copy link
Member

@mbrobbel mbrobbel Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I would be in favor of keeping them until we can improve.
Edit: looks like you've already removed them, maybe add a task to track adding these checks again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did add an issue for adding pyright, that would require fixing all of the issues (about 100) mostly involving type|None mismatches.

src/gateway/converter/spark_to_substrait.py Show resolved Hide resolved
src/gateway/converter/symbol_table.py Show resolved Hide resolved
@EpsilonPrime EpsilonPrime requested a review from mbrobbel October 9, 2024 06:20
@EpsilonPrime
Copy link
Contributor Author

Thanks for the review @mbrobbel ! Could you merge it please? Thanks!

@mbrobbel mbrobbel merged commit 23e2fd7 into voltrondata:main Oct 9, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants