[SPARK-42907][CONNECT][PYTHON] Implement Avro functions
### What changes were proposed in this pull request?

Implement Avro functions.

### Why are the changes needed?

For function parity.

### Does this PR introduce _any_ user-facing change?

Yes, new APIs.

### How was this patch tested?

Added doctest and manually checked:

```
(spark_dev) ➜  spark git:(connect_avro_functions) ✗ bin/pyspark --remote "local[*]" --jars connector/avro/target/scala-2.12/spark-avro_2.12-3.5.0-SNAPSHOT.jar
Python 3.9.16 (main, Mar  8 2023, 04:29:24)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.11.0 -- An enhanced Interactive Python. Type '?' for help.
23/03/23 16:28:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0.dev0
      /_/

Using Python version 3.9.16 (main, Mar  8 2023 04:29:24)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.

In [1]: >>> from pyspark.sql import Row
   ...: >>> from pyspark.sql.avro.functions import from_avro, to_avro
   ...: >>> data = [(1, Row(age=2, name='Alice'))]
   ...: >>> df = spark.createDataFrame(data, ("key", "value"))
   ...: >>> avroDf = df.select(to_avro(df.value).alias("avro"))

In [2]: avroDf.collect()
Out[2]: [Row(avro=bytearray(b'\x00\x00\x04\x00\nAlice'))]
```

Closes apache#40535 from zhengruifeng/connect_avro_functions.

Authored-by: Ruifeng Zheng <[email protected]>

Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent aacac46 · commit 5a56c17 · 10 changed files with 208 additions and 4 deletions.
@@ -0,0 +1,18 @@

```python
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""Spark Connect Python Client - Avro Functions"""
```
@@ -0,0 +1,114 @@

```python
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
A collection of builtin avro functions
"""

from pyspark.sql.connect.utils import check_dependencies

check_dependencies(__name__)

from typing import Dict, Optional, TYPE_CHECKING

from pyspark.sql.avro import functions as PyAvroFunctions

from pyspark.sql.connect.column import Column
from pyspark.sql.connect.functions import _invoke_function, _to_col, _options_to_col, lit

if TYPE_CHECKING:
    from pyspark.sql.connect._typing import ColumnOrName


def from_avro(
    data: "ColumnOrName", jsonFormatSchema: str, options: Optional[Dict[str, str]] = None
) -> Column:
    if options is None:
        return _invoke_function("from_avro", _to_col(data), lit(jsonFormatSchema))
    else:
        return _invoke_function(
            "from_avro", _to_col(data), lit(jsonFormatSchema), _options_to_col(options)
        )


from_avro.__doc__ = PyAvroFunctions.from_avro.__doc__


def to_avro(data: "ColumnOrName", jsonFormatSchema: str = "") -> Column:
    if jsonFormatSchema == "":
        return _invoke_function("to_avro", _to_col(data))
    else:
        return _invoke_function("to_avro", _to_col(data), lit(jsonFormatSchema))


to_avro.__doc__ = PyAvroFunctions.to_avro.__doc__


def _test() -> None:
    import os
    import sys
    from pyspark.testing.utils import search_jar

    avro_jar = search_jar("connector/avro", "spark-avro", "spark-avro")

    print()
    print(avro_jar)
    print(avro_jar)
    print(avro_jar)
    print()

    if avro_jar is None:
        print(
            "Skipping all Avro Python tests as the optional Avro project was "
            "not compiled into a JAR. To run these tests, "
            "you need to build Spark with 'build/sbt -Pavro package' or "
            "'build/mvn -Pavro package' before running this test."
        )
        sys.exit(0)
    else:
        existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
        jars_args = "--jars %s" % avro_jar
        os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join([jars_args, existing_args])

    import doctest
    from pyspark.sql import SparkSession as PySparkSession
    import pyspark.sql.connect.avro.functions

    globs = pyspark.sql.connect.avro.functions.__dict__.copy()

    globs["spark"] = (
        PySparkSession.builder.appName("sql.connect.avro.functions tests")
        .remote("local[4]")
        .getOrCreate()
    )

    (failure_count, test_count) = doctest.testmod(
        pyspark.sql.connect.avro.functions,
        globs=globs,
        optionflags=doctest.ELLIPSIS
        | doctest.NORMALIZE_WHITESPACE
        | doctest.IGNORE_EXCEPTION_DETAIL,
    )

    globs["spark"].stop()

    if failure_count:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
```
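The `from_avro`/`to_avro` wrappers above share one dispatch pattern: optional arguments are forwarded to `_invoke_function` only when the caller actually supplied them. A minimal pure-Python sketch of that pattern, with a toy stand-in for `_invoke_function` (no Spark dependency; the real helper builds a Connect protocol message, and all names here are illustrative):

```python
from typing import Dict, Optional, Tuple


def _invoke_function(name: str, *args: str) -> Tuple[str, ...]:
    # Toy stand-in: just record the function name and its arguments,
    # where the real helper would build an unresolved function call.
    return (name,) + args


def from_avro(
    data: str, jsonFormatSchema: str, options: Optional[Dict[str, str]] = None
) -> Tuple[str, ...]:
    # Mirror the two-branch dispatch: omit the options argument
    # entirely when none were supplied.
    if options is None:
        return _invoke_function("from_avro", data, jsonFormatSchema)
    return _invoke_function("from_avro", data, jsonFormatSchema, repr(options))


print(from_avro("value", '{"type": "string"}'))
print(from_avro("value", '{"type": "string"}', {"mode": "PERMISSIVE"}))
```

The same shape appears in `to_avro`, which treats an empty schema string as "not supplied" and invokes the single-argument form.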