Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Throw an error if a RecordBatchReader is read more than once #118

Open
wants to merge 1 commit into
base: python_relation_dependencies
Choose a base branch
from

Conversation

Tishj
Copy link
Owner

@Tishj Tishj commented Mar 29, 2024

Since PyArrow RecordBatchReader objects are destructive (because they are stateful), they don't behave the same as native DuckDB tables, or Pandas/Polars DataFrames.

When we detect that a RecordBatchReader is encountered twice in a replacement scan, we throw an error.

Remaining Issues

State is Connection-local

This state is kept inside the PythonContextState, which is created for every separate connection, meaning that this error does not occur when two separate connections read from the same record batch reader.
We should instead make the RecordBatchReaderRegistry global on the module level, using locks to ensure multiple connections can make use of it at the same time.

Canceled Relations

We record the RecordBatchReader the moment it is encountered in the replacement scan, relations could theoretically be canceled, so they never run.
Subsequent relations that reference the RecordBatchReader then cause an error to be thrown.
We should differentiate between found and consumed RecordBatchReaders so this doesn't happen.

Another somewhat related issue to "Canceled Relations", using a LIMIT it could be possible to read from the same record batch reader and have it not be an error, if queries are issued to partially read from the RecordBatchReader

… scan, if the same record batch reader is encountered more than once, we throw an error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant