Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] - DBR 14.3 support - foreachbatch impacts #56

Open
riccamini opened this issue Aug 1, 2024 · 1 comment
Open

[FEATURE] - DBR 14.3 support - foreachbatch impacts #56

riccamini opened this issue Aug 1, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@riccamini
Copy link
Contributor

Is your feature request related to a problem? Please describe.

This issue is related to #33. Goal is to investigate the impacts on foreachbatch calls in the code.

@riccamini riccamini added the enhancement New feature or request label Aug 1, 2024
@riccamini
Copy link
Contributor Author

I have found only one reference to DataFrame.foreachbatch function in the StreamWriter class (spark.writers.stream.py:292)

One additional consideration after reading the DOC

    This function behaves differently in Spark Connect mode. See examples.
    In Connect, the provided function doesn't have access to variables defined outside of it.

    Examples
    --------
    >>> import time
    >>> df = spark.readStream.format("rate").load()
    >>> my_value = -1
    >>> def func(batch_df, batch_id):
    ...     global my_value
    ...     my_value = 100
    ...     batch_df.collect()
    ...
    >>> q = df.writeStream.foreachBatch(func).start()
    >>> time.sleep(3)
    >>> q.stop()
    >>> # if in Spark Connect, my_value = -1, else my_value = 100

This is not happening in Koheesio, but maybe it should be made explicit in the StreamWriter documentation for the field batch_function.

SynchronizeDeltaToSnowflakeTask does not have additional calls to foreachbatch as it reuses StreamWriter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant