[SPARK-49191][SS] Add support for reading transformWithState map state variables with state data source reader

### What changes were proposed in this pull request?

Add support for reading transformWithState map state variables with the state data source reader.

### Why are the changes needed?

These changes integrate state reading with the new operator metadata and state schema format for the map state types used in state variables within transformWithState.

### Does this PR introduce _any_ user-facing change?

No.

In the same way as reading valueState, users can now read a mapState state variable as:

```
spark
  .read
  .format("statestore")
  .option("operatorId", <operatorId>)
  .option("stateVarName", <mapStateVarName>)
  .load(<state path>)
```

The output DataFrame will look like:

```
+----+---------------------------+------------+
|key |map_value                  |partition_id|
+----+---------------------------+------------+
|{k1}|{{v2} -> {5}, {v1} -> {10}}|0           |
|{k2}|{{v2} -> {3}}              |0           |
+----+---------------------------+------------+
```

Or like this if TTL is enabled:

```
+----+------------------------------------------------+------------+
|key |map_value                                       |partition_id|
+----+------------------------------------------------+------------+
|{k1}|{{key2} -> {{2}, 61000}, {key1} -> {{1}, 61000}}|0           |
+----+------------------------------------------------+------------+
```

An example schema for the output DataFrame:

```
root
 |-- key: struct (nullable = true)                      # grouping key row
 |    |-- value: string (nullable = true)
 |-- map_value: map (nullable = true)
 |    |-- key: struct                                   # user key row
 |    |    |-- value: string (nullable = true)
 |    |-- value: struct (valueContainsNull = false)     # value row in state store
 |    |    |-- value: struct (nullable = true)          # value row
 |    |    |    |-- value: integer (nullable = false)
 |    |    |-- ttlExpirationMs: long (nullable = true)  # ttl column
 |-- partition_id: integer (nullable = true)
```

### How was this patch tested?

Unit tests in `StateDataSourceTransformWithStateSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.
Closes #48000 from jingz-db/map-state-rebase.

Lead-authored-by: jingz-db <[email protected]>
Co-authored-by: Jing Zhan <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
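A common next step after reading map state is to flatten `map_value` into one row per user key. The sketch below is not part of this commit. It assumes Spark's `explode` function applied to the map column, and the object `FlattenMapState`, its methods, and the `user_key`/`user_value` column names are hypothetical. So that it runs without a real checkpoint, it builds an in-memory DataFrame shaped like the non-TTL output above instead of calling the `statestore` reader.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{IntegerType, MapType, StringType, StructField, StructType}

object FlattenMapState {
  // Hypothetical helper: one output row per (grouping key, user key) entry.
  // explode on a map column yields two columns, aliased here as user_key/user_value.
  def flattenMapState(stateDf: DataFrame): DataFrame =
    stateDf.select(
      col("key"),
      explode(col("map_value")).as(Seq("user_key", "user_value")),
      col("partition_id"))

  // Stand-in for spark.read.format("statestore")...load(...): a hand-built
  // DataFrame shaped like the non-TTL output, so the sketch needs no checkpoint.
  def sampleStateDf(spark: SparkSession): DataFrame = {
    val strStruct = StructType(Seq(StructField("value", StringType)))
    val intStruct = StructType(Seq(StructField("value", IntegerType, nullable = false)))
    val schema = StructType(Seq(
      StructField("key", strStruct),
      StructField("map_value", MapType(strStruct, intStruct, valueContainsNull = false)),
      StructField("partition_id", IntegerType)))
    val rows = Seq(
      Row(Row("k1"), Map(Row("v2") -> Row(5), Row("v1") -> Row(10)), 0),
      Row(Row("k2"), Map(Row("v2") -> Row(3)), 0))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("flatten-map-state").getOrCreate()
    try flattenMapState(sampleStateDf(spark)).show(truncate = false)
    finally spark.stop()
  }
}
```

Against a real checkpoint, the DataFrame returned by the `statestore` reader would be passed to `flattenMapState` instead. With TTL enabled, each exploded `user_value` is itself a struct holding `value` and `ttlExpirationMs` (per the example schema above), so the expiration could be selected as `col("user_value.ttlExpirationMs")`.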
1 parent c4a396b, commit 8732528
Showing 3 changed files with 322 additions and 38 deletions.