[SPARK-45107][PYTHON][DOCS] Refine docstring of explode #42860

Closed

allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR refines the docstring of the explode function by adding more examples.

Why are the changes needed?

To improve the PySpark documentation.

Does this PR introduce any user-facing change?

No

How was this patch tested?

doctest

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang
Contributor

Could you re-trigger the CI, @allisonwang-db?

@allisonwang-db allisonwang-db force-pushed the spark-45107-refine-explode branch from f48f7d7 to b49fd09 Compare September 11, 2023 18:24
>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([Row(a=1, list1=[1, 2], list2=[3, 4])])
>>> df.select(sf.explode(df.list1).alias("list1"), "list2") \
... .select("list1", sf.explode(df.list2).alias("list2")).show()

It seems the test failure is related to this example:

**********************************************************************
File "/__w/spark/spark/python/pyspark/sql/functions.py", line 286, in pyspark.sql.functions.explode
Failed example:
    df.select(sf.explode(df.list1).alias("list1"), "list2")     ...     .select("list1", sf.explode(df.list2).alias("list2")).show()
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/pypy/pypy3.8/lib/pypy3.8/doctest.py", line 1338, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pyspark.sql.functions.explode[19]>", line 1
        df.select(sf.explode(df.list1).alias("list1"), "list2")     ...     .select("list1", sf.explode(df.list2).alias("list2")).show()
                                                                    ^
    SyntaxError: invalid syntax
**********************************************************************
   1 of  33 in pyspark.sql.functions.explode
***Test Failed*** 1 failures.
/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py:127: RuntimeWarning: 'pyspark.sql.functions' found in sys.modules after import of package 'pyspark.sql', but prior to execution of 'pyspark.sql.functions'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
/__w/spark/spark/python/pyspark/sql/udtf.py:163: UserWarning: Arrow optimization for Python UDTFs cannot be enabled: PyArrow >= 4.0.0 must be installed; however, it was not found.. Falling back to using regular Python UDTFs.
  warnings.warn(

Had test failures in pyspark.sql.functions with pypy3; see logs.
Error:  running /__w/spark/spark/python/run-tests --modules=pyspark-sql,pyspark-testing --parallelism=1 ; received return code 255
Error: Process completed with exit code 19.
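
One plausible reading of the log above: the failed example shows the `>>>` line and the `...` continuation line fused into a single line. That is what happens when a trailing backslash sits inside a regular (non-raw) docstring, because Python consumes the backslash-newline pair while building the string, before doctest ever parses it. The sketch below reproduces the effect without Spark. The functions `joined` and `kept` and the helper `run_docstring` are hypothetical names used only for illustration, and this thread does not state which fix the PR finally adopted.

```python
import doctest


def joined():
    """Regular docstring: the backslash-newline below is swallowed by the
    string literal, so doctest sees one long line containing '...'.

    >>> total = 1 + \
    ... 2
    """


def kept():
    r"""Raw docstring: the backslash survives, so doctest sees a real
    line continuation spread over a '>>>' line and a '...' line.

    >>> total = 1 + \
    ... 2
    >>> total
    3
    """


def run_docstring(func):
    """Run the doctest examples in func's docstring and return the results.

    Hypothetical helper for this sketch; failure reports are discarded.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None)


# Show the source that doctest extracts from each docstring.
joined_source = doctest.DocTestParser().get_examples(joined.__doc__)[0].source
kept_source = doctest.DocTestParser().get_examples(kept.__doc__)[0].source
print(repr(joined_source))  # the '...' prompt ends up inside the statement
print(repr(kept_source))    # a backslash continuation followed by '2'

print("joined failures:", run_docstring(joined).failed)
print("kept failures:", run_docstring(kept).failed)
```

If this is indeed the cause, either marking the docstring as raw or wrapping the chained call in parentheses (so no backslash is needed) should let the example parse.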

@zhengruifeng
Contributor

starting python compilation test...
python compilation succeeded.

starting black test...
black checks failed:
Oh no! 💥 💔 💥 The required version `23.9.1` does not match the running version `22.6.0`!
Please run 'dev/reformat-python' script.
1
Error: Process completed with exit code 1.

Please rebase.

@zhengruifeng
Contributor

Thanks, merged to master.
