[SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot

### What changes were proposed in this pull request?
- Update the documentation for barh plot to clarify the difference between axis interpretation in Plotly and Matplotlib.
- Test multiple columns as value axis.

The parameter difference is demonstrated below.
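```py
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> # Plotly (the default backend): x takes the values, y the categories.
>>> df.plot.barh(x='val', y='lab').show()  # plot1

>>> # Matplotlib: x takes the categories, y the values.
>>> ps.set_option('plotting.backend', 'matplotlib')
>>> import matplotlib.pyplot as plt
>>> df.plot.barh(x='lab', y='val')
>>> plt.show()  # plot2
```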

plot1
![newplot (5)](https://github.com/user-attachments/assets/f1b6fabe-9509-41bb-8cfb-0733f65f1643)

plot2
![Figure_1](https://github.com/user-attachments/assets/10e1b65f-6116-4490-9956-29e1fbf0c053)

### Why are the changes needed?
The barh plot’s x and y axis behavior differs between Plotly and Matplotlib, which may confuse users. The updated documentation and tests help ensure clarity and prevent misinterpretation.
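One way to keep calling code independent of the backend is a small helper that maps "values" and "categories" onto the right keywords. The sketch below is illustrative only: `barh_axis_kwargs` is a hypothetical name, not part of pandas-on-Spark, and it simply encodes the mapping described in this PR.

```python
def barh_axis_kwargs(values, categories, backend="plotly"):
    """Return the x/y keyword arguments for ``DataFrame.plot.barh``.

    Hypothetical helper (not a pandas-on-Spark API). It encodes the
    mapping documented in this PR:
      * plotly:     x = value column(s), y = category column
      * matplotlib: x = category column, y = value column(s)
    """
    if backend == "plotly":
        return {"x": values, "y": categories}
    if backend == "matplotlib":
        return {"x": categories, "y": values}
    raise ValueError(f"unsupported plotting backend: {backend!r}")


# Single value column, default (plotly) mapping.
plotly_kwargs = barh_axis_kwargs("val", "lab")

# Same columns under the matplotlib mapping: the roles are swapped.
mpl_kwargs = barh_axis_kwargs("val", "lab", backend="matplotlib")

# Several value columns, as exercised by the new test case.
multi_kwargs = barh_axis_kwargs(["val", "val2"], "lab")
```

With a pandas-on-Spark DataFrame `df`, this could be used as `df.plot.barh(**barh_axis_kwargs("val", "lab", ps.get_option("plotting.backend")))`, so the call site names the value and category columns once and the helper picks the axis keywords.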

### Does this PR introduce _any_ user-facing change?
No. Doc change only.

### How was this patch tested?
Unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48161 from xinrong-meng/ps_barh.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
xinrong-meng authored and dongjoon-hyun committed Sep 19, 2024
1 parent f0fb0c8 commit 92cad2a
Showing 2 changed files with 13 additions and 5 deletions.
13 changes: 10 additions & 3 deletions python/pyspark/pandas/plot/core.py
@@ -756,10 +756,10 @@ def barh(self, x=None, y=None, **kwargs):
         Parameters
         ----------
-        x : label or position, default DataFrame.index
-            Column to be used for categories.
-        y : label or position, default All numeric columns in dataframe
+        x : label or position, default All numeric columns in dataframe
             Columns to be plotted from the DataFrame.
+        y : label or position, default DataFrame.index
+            Column to be used for categories.
         **kwds
             Keyword arguments to pass on to
             :meth:`pyspark.pandas.DataFrame.plot` or :meth:`pyspark.pandas.Series.plot`.
@@ -770,6 +770,13 @@ def barh(self, x=None, y=None, **kwargs):
             Return an custom object when ``backend!=plotly``.
             Return an ndarray when ``subplots=True`` (matplotlib-only).
+        Notes
+        -----
+        In Plotly and Matplotlib, the interpretation of `x` and `y` for `barh` plots differs.
+        In Plotly, `x` refers to the values and `y` refers to the categories.
+        In Matplotlib, `x` refers to the categories and `y` refers to the values.
+        Ensure correct axis labeling based on the backend used.
         See Also
         --------
         plotly.express.bar : Plot a vertical bar plot using plotly.
5 changes: 3 additions & 2 deletions python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
@@ -105,9 +105,10 @@ def check_barh_plot_with_x_y(pdf, psdf, x, y):
             self.assertEqual(pdf.plot.barh(x=x, y=y), psdf.plot.barh(x=x, y=y))

         # this is testing plot with specified x and y
-        pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})
+        pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20], "val2": [1.1, 2.2, 3.3]})
         psdf1 = ps.from_pandas(pdf1)
-        check_barh_plot_with_x_y(pdf1, psdf1, x="lab", y="val")
+        check_barh_plot_with_x_y(pdf1, psdf1, x="val", y="lab")
+        check_barh_plot_with_x_y(pdf1, psdf1, x=["val", "val2"], y="lab")

     def test_barh_plot(self):
         def check_barh_plot(pdf, psdf):