This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-50851][ML][CONNECT][PYTHON] Express ML params with `proto.Expression.Literal`

### What changes were proposed in this pull request?

Express ML params with `proto.Expression.Literal`:

1. Introduce `Literal.SpecializedArray` for large primitive literal arrays (e.g. the initial model coefficients, which can be large):

```
message SpecializedArray {
  oneof value_type {
    Bools bools = 1;
    Ints ints = 2;
    Longs longs = 3;
    Floats floats = 4;
    Doubles doubles = 5;
    Strings strings = 6;
  }

  message Bools { repeated bool values = 1; }
  message Ints { repeated int32 values = 1; }
  message Longs { repeated int64 values = 1; }
  message Floats { repeated float values = 1; }
  message Doubles { repeated double values = 1; }
  message Strings { repeated string values = 1; }
}
```

2. Replace `proto.Param` with `proto.Expression` to be consistent with the SQL side.

For `Param[Vector]` and `Param[Matrix]`, apply `proto.Expression.Literal.Struct` with the underlying schema of `VectorUDT` and `MatrixUDT`. E.g. for a `Param[Vector]` with value `Vectors.sparse(4, [(1, 1.0), (3, 5.5)])`, the message looks like:

```
literal {
  struct {
    struct_type {
      struct {
        ... <- schema of VectorUDT
      }
    }
    elements {
      byte: 0
    }
    elements {
      integer: 4
    }
    elements {
      specialized_array {
        ints {
          values: 1
          values: 3
        }
      }
    }
    elements {
      specialized_array {
        doubles {
          values: 1
          values: 5.5
        }
      }
    }
  }
}
```

### Why are the changes needed?

1. To optimize large literal arrays, for both ML and SQL (it can be applied on the SQL side later).
2. To be consistent with the SQL side, e.g. the parameterized SQL:

```
// (Optional) A map of parameter names to expressions.
// It cannot coexist with `pos_arguments`.
map<string, Expression.Literal> named_arguments = 4;

// (Optional) A sequence of expressions for positional parameters in the SQL query text.
// It cannot coexist with `named_arguments`.
repeated Expression pos_arguments = 5;
```

3. To minimize the protobuf changes.

### Does this PR introduce _any_ user-facing change?

No, refactor-only.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49529 from zhengruifeng/ml_proto_expr.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
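To make the two encodings in this commit message concrete, here is a minimal Python sketch that builds plain dicts shaped like the protobuf text format above. It is not the PySpark or Spark Connect implementation: the helpers `infer_array_field`, `specialized_array`, `sparse_vector_literal` and `dense_vector_literal` are hypothetical, `None` stands in for a null literal, and the `VectorUDT` schema is simplified to four `(name, type)` pairs, assuming the usual layout (type byte 0 for sparse and 1 for dense, with `size` and `indices` null for dense vectors).

```python
from typing import Any, Dict, List, Optional, Sequence

# Assumption: plain dicts mirroring the protobuf text format, NOT the
# generated Spark Connect message classes.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

# Converter per SpecializedArray oneof field. Python has no 32-bit float,
# so "floats" values are kept as (double-precision) Python floats here.
_CONVERTERS = {
    "bools": bool,
    "ints": int,
    "longs": int,
    "floats": float,
    "doubles": float,
    "strings": str,
}


def infer_array_field(values: Sequence[Any]) -> str:
    """Hypothetical helper: pick a SpecializedArray field for a homogeneous list."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot infer the element type of an empty list")
    if all(isinstance(v, bool) for v in vals):  # check bool first: bool subclasses int
        return "bools"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in vals):
        return "ints" if all(INT32_MIN <= v <= INT32_MAX for v in vals) else "longs"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in vals):
        return "doubles"
    if all(isinstance(v, str) for v in vals):
        return "strings"
    raise TypeError("expected a homogeneous list of bool, int, float or str")


def specialized_array(field: str, values: Sequence[Any]) -> Dict[str, Any]:
    """Hypothetical helper: pack all values into ONE SpecializedArray element."""
    if field not in _CONVERTERS:
        raise ValueError(f"unknown SpecializedArray field: {field}")
    converted = [_CONVERTERS[field](v) for v in values]
    if field == "ints" and any(not INT32_MIN <= v <= INT32_MAX for v in converted):
        raise OverflowError("value out of int32 range; use 'longs' instead")
    return {"specialized_array": {field: {"values": converted}}}


# Simplified stand-in for the VectorUDT schema (assumed layout).
VECTOR_UDT_FIELDS = [
    ("type", "byte"),               # 0 = sparse, 1 = dense
    ("size", "integer"),            # null for dense vectors
    ("indices", "array<integer>"),  # null for dense vectors
    ("values", "array<double>"),
]


def _vector_struct(elements: List[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    # Wrap the elements in a struct literal; None marks a null element.
    return {"literal": {"struct": {"struct_type": {"struct": VECTOR_UDT_FIELDS},
                                   "elements": elements}}}


def sparse_vector_literal(size: int, indices: Sequence[int],
                          values: Sequence[float]) -> Dict[str, Any]:
    """Hypothetical: encode a sparse vector as a VectorUDT-shaped struct literal."""
    return _vector_struct([
        {"byte": 0},
        {"integer": size},
        specialized_array("ints", indices),
        specialized_array("doubles", values),
    ])


def dense_vector_literal(values: Sequence[float]) -> Dict[str, Any]:
    """Hypothetical: encode a dense vector; size and indices are null."""
    return _vector_struct([
        {"byte": 1},
        None,
        None,
        specialized_array("doubles", values),
    ])
```

For example, `sparse_vector_literal(4, [1, 3], [1.0, 5.5])` produces the same element sequence as the `Vectors.sparse(4, [(1, 1.0), (3, 5.5)])` message shown earlier. The point of `SpecializedArray` is visible in `specialized_array`: a whole coefficient array becomes one repeated primitive field instead of one nested literal message per element.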
1 parent 10dd350, commit 2721a50

Showing 23 changed files with 906 additions and 871 deletions.