Bugfix/1446: Ensure Pydantic Models Can Be Created with `typing.pyspark.DataFrame` or `typing.pyspark_sql.DataFrame` Generic #1447

brayan07 · 2023-12-15T15:00:16Z

In this PR we resolve the issue reported in #1446, where any Pydantic model with a field annotated as pandera.typing.pyspark.DataFrame or pandera.typing.pyspark_sql.DataFrame raises a confusing ValidationError.

For clarity, we want to make sure the following leads to the expected behavior:

import pyspark.sql.types as T

from pandera.pyspark import DataFrameModel, Field
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel
from pyspark.sql import SparkSession


class SampleSchema(DataFrameModel):
    """
    Sample schema model with data checks.
    """

    product: T.StringType() = Field()
    price: T.IntegerType() = Field()


class PydanticContainer(BaseModel):
    """
    Pydantic container with a DataFrameModel as a field.
    """

    data: DataFrame[SampleSchema]

    class Config:
        arbitrary_types_allowed = True

We do this by creating a _PydanticIntegrationMixIn that can be used by both pandera.typing.pyspark_sql.DataFrame and pandera.typing.pyspark.DataFrame.

The content of the mixin is a variation of the methods used in pandera.typing.pandas.DataFrame.

Note:
We assume that any pyspark dataframe used in a pydantic model will be validated eagerly for both pyspark.pandas and pyspark_sql. The default behavior for pyspark_sql dataframes is normally lazy validation, but this makes less sense to me when using a Pydantic model.
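To make the mixin idea and the eager-validation note concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: the class names `PydanticHooksSketch` and `ToySqlFrame`, the `check` method, and the use of a plain dict in place of a Spark DataFrame are all hypothetical. It only shows the general pattern of putting the Pydantic v1 hook `__get_validators__` on a shared base class whose validator fails immediately rather than collecting errors.

```python
from typing import Any, Callable, Iterator


class PydanticHooksSketch:
    """Hypothetical stand-in for a shared integration mixin (not pandera's code)."""

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], Any]]:
        # Pydantic v1 looks for this hook and calls each yielded validator.
        yield cls.pydantic_validate

    @classmethod
    def pydantic_validate(cls, obj: Any) -> Any:
        # Eager validation: raise as soon as the check fails instead of
        # attaching errors to the object for later inspection.
        cls.check(obj)
        return obj

    @classmethod
    def check(cls, obj: Any) -> None:
        raise NotImplementedError


class ToySqlFrame(PydanticHooksSketch):
    """Toy 'dataframe type': a dict mapping column name to values (assumption)."""

    required = {"product", "price"}

    @classmethod
    def check(cls, obj: Any) -> None:
        missing = cls.required - set(obj)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
```

Because the hook lives on the base class, any generic type that inherits from it picks up the same behavior, which is the reason for sharing one mixin between the two pyspark typing modules.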

Signed-off-by: Brayan Jaramillo <[email protected]>

* Disable irrelevant pylint warnings Signed-off-by: Brayan Jaramillo <[email protected]>

cosmicBboy · 2023-12-18T16:25:19Z

Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run the tests and set up pre-commit in your dev env to make sure those are passing.

brayan07 · 2023-12-19T15:08:31Z

Still running into issues with tests unrelated to new code locally. Will try to resolve before pushing again. Thanks!

brayan07 · 2023-12-19T15:48:07Z

I'm getting the same failed tests locally for the main branch, as well as for this branch, with make nox-conda. I don't think it's what I added but something in the dev setup. Would it be alright if we ran the CI workflow one more time to help me debug?

cosmicBboy · 2024-04-13T15:57:35Z

Hi @brayan07 sorry for the delayed review on this!

I believe the test errors are coming from the `from pydantic import GetCoreSchemaHandler` import. We will need to move that import into the `PYDANTIC_V2` conditional.
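A rough sketch of what guarding that import could look like. The flag name `PYDANTIC_V2` comes from the comment above; how it is computed here (the `detect_pydantic_v2` helper reading the installed package version) is an assumption for illustration and may differ from how the project defines it.

```python
from importlib import metadata


def detect_pydantic_v2() -> bool:
    """Hypothetical helper: True when the installed pydantic is major version 2+."""
    try:
        version = metadata.version("pydantic")
    except metadata.PackageNotFoundError:
        return False
    return int(version.split(".")[0]) >= 2


PYDANTIC_V2 = detect_pydantic_v2()

if PYDANTIC_V2:
    # This name only exists in pydantic v2, so importing it unconditionally
    # fails with an ImportError on v1 installs.
    from pydantic import GetCoreSchemaHandler
else:
    GetCoreSchemaHandler = None  # placeholder so later references do not fail
```

With the import inside the conditional, environments that still pin pydantic v1 can import the module without error, and v2-only code paths can check the flag before using the handler.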

brayan07 added 2 commits December 12, 2023 10:28

Solve problem for both pysparksql and pyspark typing

dc1eabc

Signed-off-by: Brayan Jaramillo <[email protected]>

Implement Pydantic Integration MixIn

2e633c6

Signed-off-by: Brayan Jaramillo <[email protected]>

brayan07 mentioned this pull request Dec 15, 2023

Cannot create a pydantic model with a pandera.typing.pyspark.DataFrame type. #1446

Open

3 tasks

brayan07 added 2 commits December 15, 2023 10:24

Move mix in outside of if statement

d391ea9

* Disable irrelevant pylint warnings Signed-off-by: Brayan Jaramillo <[email protected]>

Update module docstrings

6d97c23

Signed-off-by: Brayan Jaramillo <[email protected]>

brayan07 added 2 commits December 19, 2023 09:55

Fix linting issues

e3cb340

Signed-off-by: Brayan Jaramillo <[email protected]>

Merge branch 'main' into bugfix/1446

cadb145

cosmicBboy closed this Jan 25, 2024

cosmicBboy reopened this Jan 25, 2024

cosmicBboy added 2 commits April 14, 2024 11:46

update pydantic imports

dd8a0d7

Merge branch 'main' into bugfix/1446

2d1a3b2
