
TST (string dtype): resolve all xfails in JSON IO tests #60318

Merged: 2 commits, merged Nov 15, 2024


WillAyd (Member) commented Nov 14, 2024

No description provided.

WillAyd added this to the 2.3 milestone Nov 15, 2024
jorisvandenbossche changed the title "String dtype: enable in JSON IO + resolve all xfails" to "TST (string dtype): resolve all xfails in JSON IO tests" Nov 15, 2024
jorisvandenbossche (Member) left a comment:

Looks good!

@@ -710,6 +707,9 @@ def test_series_roundtrip_object(self, orient, dtype, object_series):
        if orient != "split":
            expected.name = None

        if using_string_dtype():
            expected = expected.astype(pd.StringDtype(na_value=np.nan))
Member:

BTW, you can also use the shorter astype("str") in general cases where you want to ensure the default string dtype is used. It is also more forward-looking: if the default na_value changes at some point, we would have less work updating the tests.

Member (author):

Good to know. I missed that on the prior SQL PR, so I will go back and change it there too, so that we have one consistent method internally.

Member:

I think on the SQL PR you wanted to construct the actual dtype instance? In that case the above is fine. I mostly use "str" in the tests.

jorisvandenbossche (Member):

While looking at centralizing the to_pandas() conversion, I noticed that for JSON we do not yet handle the string dtype for the pyarrow engine (the engine, not dtype_backend), so it's actually a bit surprising that all tests pass here...

WillAyd (Member, author) commented Nov 15, 2024:

Interesting, I didn't even realize we had a pyarrow engine for JSON!

Is your expectation that those cases would be returning dtype=object and cause a failure?

jorisvandenbossche (Member):

> Is your expectation that those cases would be returning dtype=object and cause a failure?

Yes, at least if we have proper testing of it.

jorisvandenbossche (Member):

But let's do that in a follow-up PR; I'm going to merge this one already.

jorisvandenbossche merged commit 9bc88c7 into pandas-dev:main Nov 15, 2024
51 checks passed
jorisvandenbossche added the labels "IO JSON" (read_json, to_json, json_normalize) and "Strings" (String extension data type and string data) Nov 15, 2024
lumberbot-app bot commented Nov 15, 2024

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
     git checkout 2.3.x
     git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
     git cherry-pick -x -m1 9bc88c79e6fd146a44970309bacc90490fdec590
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
     git commit -am 'Backport PR #60318: TST (string dtype): resolve all xfails in JSON IO tests'
  4. Push to a named branch:
     git push YOURFORK 2.3.x:auto-backport-of-pr-60318-on-2.3.x
  5. Create a PR against branch 2.3.x. I would have named this PR:

"Backport PR #60318 on branch 2.3.x (TST (string dtype): resolve all xfails in JSON IO tests)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

WillAyd deleted the string-dtype-json branch November 15, 2024 14:51
WillAyd (Member, author) commented Nov 15, 2024:

Alright, I'll take a look. I wonder if there's some post-processing in JSON that is masking that...

I'll take care of the backport as well. Thanks for merging!

WillAyd added a commit to WillAyd/pandas that referenced this pull request Nov 15, 2024
mroeschke pushed a commit that referenced this pull request Nov 15, 2024
…fails in JSON IO tests) (#60327)

Backport PR #60318: TST (string dtype): resolve all xfails in JSON IO tests

(cherry picked from commit 9bc88c7)
jorisvandenbossche (Member):

Manual backport -> #60327

2 participants