Problem: JSON Serialization Issue with Polars Time Data Type in Error Reporting #1841

Open
2 of 3 tasks
ccosming opened this issue Oct 26, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

When pandera validates a Polars DataFrame that contains a pl.Time column and the validation fails, a PanicException is raised during error reporting because the failing values cannot be serialized to JSON.

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample:

Steps to Reproduce:

  • Define a pandera schema with a column of type pl.Time.
  • Create a Polars DataFrame with data that violates the schema constraint for the Time column.
  • Call validate() on the DataFrame using the defined schema.
from datetime import time, datetime

import polars as pl
import pandera.polars as pa
import pandera.typing.polars as pat

class CalendarModel(pa.DataFrameModel):
    date: pat.Series[pl.Date] = pa.Field(nullable=False, coerce=True, unique=True)
    open: pat.Series[pl.Time] = pa.Field(nullable=False, coerce=True)
    close: pat.Series[pl.Time] = pa.Field(nullable=False, coerce=True)

    # DataFrame-level check: every row must close strictly after it opens.
    @pa.dataframe_check
    def open_lower_than_close(cls, data: pa.PolarsData):
        return data.lazyframe.select(pl.col("close").gt(pl.col("open")))

# Both rows fail the check: the first has close == open, the second has close < open.
df = pl.DataFrame({
    "date": [datetime(2025, 1, 1), datetime(2025, 2, 1)],
    "open": [time(9, 0, 0), time(16, 30, 0)],
    "close": [time(9, 0, 0), time(8, 30, 0)],
})

# lazy=True collects every failure into one report instead of stopping at the first.
CalendarModel.to_schema().validate(df, lazy=True)

Expected behavior

pandera should gracefully handle the validation error and provide a detailed error report, including information about the failed values.

Actual behavior

A PanicException is raised with the message "not yet implemented: Writing Time64(Nanosecond) to JSON". Polars represents Time internally as Time64(Nanosecond), which its JSON writer does not yet support, and pandera's error reporting mechanism relies on JSON serialization.

Workaround:

A current workaround is to catch the PanicException raised by validate() and handle the error reporting manually. This is not ideal, because it requires custom code.

Additional Context:

This issue likely stems from the inherent limitation of JSON in representing specific data types like Time. Addressing this within pandera would significantly improve its usability for Polars users.

@ccosming ccosming added the bug Something isn't working label Oct 26, 2024