Parse function of pa.DataFrameModel is called twice #1842

Open
3 tasks done
TimotejPalus opened this issue Oct 28, 2024 · 2 comments
Labels
bug Something isn't working

Comments

TimotejPalus commented Oct 28, 2024

Describe the bug
Hello,
It seems that the parser function is called twice for the specified column of a given pandas DataFrame. Please see the sample code and sample output below.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

Slightly modified example from https://pandera.readthedocs.io/en/stable/parsers.html#parsers-in-dataframemodel

import pandas as pd
import pandera as pa

data = pd.DataFrame({
    "a": [2.0, 4.0, 9.0],
    "b": [2.0, 4.0, 9.0],
    "c": [2.0, 4.0, 9.0],
})

class DFModel(pa.DataFrameModel):
    a: float
    b: float
    c: float


    @pa.parser("b")
    def negate(cls, series):
        # Identity parser: prints its input so each invocation is visible.
        print(series)
        return series

DFModel.validate(data)
Printed to console
0    2.0
1    4.0
2    9.0
Name: b, dtype: float64
0    2.0
1    4.0
2    9.0
Name: b, dtype: float64

Expected behavior

The console output shows that negate runs twice. I would expect the parser to run only once.
I could not find an explanation for this in the documentation. While searching, I found a similar issue:
#1707

Additional context

pandera version: '0.20.4'

Thank you very much :)

@TimotejPalus TimotejPalus added the bug Something isn't working label Oct 28, 2024

TimotejPalus commented Oct 28, 2024

The parser does seem to run twice, but the resulting pd.DataFrame contains only the output of the first run:

code:

import pandas as pd
import pandera as pa

data = pd.DataFrame({
    "a": [2.0, 4.0, 9.0],
    "b": [2.0, 4.0, 9.0],
    "c": [2.0, 4.0, 9.0],
})

class DFModel(pa.DataFrameModel):
    a: float
    b: float
    c: float

    @pa.parser("b")
    def negate(cls, series):
        print(
            '\n -------------',
            f'\nbefore parsing: {series.tolist()}',
            f'\nafter parsing: {(series + 1).tolist()}',
        )
        return series + 1


data = DFModel.validate(data)
print('\n -------------',f'\nResulting "b" column in the "data" pd.DataFrame: {data["b"].tolist()}')

console:

 ------------- 
before parsing: [2.0, 4.0, 9.0] 
after parsing: [3.0, 5.0, 10.0]
 ------------- 
before parsing: [3.0, 5.0, 10.0] 
after parsing: [4.0, 6.0, 11.0]
 ------------- 
Resulting "b" column in the "data" pd.DataFrame: [3.0, 5.0, 10.0]

Girmii commented Nov 7, 2024

Also mentioned in #1684
