`float("nan")` not always converted to `pd.NA` inside series with pint dtype #238

scanzy · 2024-06-26T20:41:34Z

Hello,
I am facing this issue while building a pd.Series with pint dtype.

When float("nan") is alone, it remains float("nan").
When float("nan") is with other values, it is converted into pd.NA.

This is not evident when printing the series (the formatting always shows nan), but values or tolist() reveal the difference.

import pint as pt
import pandas as pd
import pint_pandas

# case 1: float nan alone
print(pd.Series([float("nan")], dtype="pint[MW]").tolist())
# gives: [<Quantity(nan, 'megawatt')>]

# case 2: float nan with other values
print(pd.Series([float("nan"), 0.0], dtype="pint[MW]").tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0.0, 'megawatt')>]

I assumed that float("nan") was the default value meaning "magnitude not set".
The fact that nan is converted to pd.NA depending on the other values in the series looks a bit tricky to me: is it intended?

I am looking for a way to keep not-set values consistent (either all float("nan"), or all pd.NA), but:

Trying to convert pd.NA to float("nan") has no effect.
If I try to convert float("nan") to pd.NA, I get a ValueError.

# test 1: trying to convert pd.NA to nan
s = pd.Series([float("nan"), 0.0], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]

print(s.fillna(float("nan")).tolist())
# gives the same: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]


# test 2: trying to convert nan to pd.NA
s = pd.Series([float("nan")], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(nan, 'megawatt')>]

s.fillna(pd.NA)
# gives: ValueError: float() argument must be a string or a real number, not 'NAType'

versions:
- Python 3.11.2
- pandas 2.2.2
- Pint 0.24.1
- Pint-Pandas 0.6

andrewgsavage · 2024-07-02T18:00:03Z

The difference is due to the underlying data type:

s = pd.Series([float("nan"), 0.0], dtype="pint[MW]")
s.values.data
<FloatingArray>
[<NA>, 0.0]
Length: 2, dtype: Float64


s = pd.Series([float("nan")], dtype="pint[MW]")
s.values.data
<NumpyExtensionArray>
[nan]
Length: 1, dtype: float64
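
The same split can be reproduced with plain pandas, without pint-pandas involved. The sketch below assumes pandas 2.2 behaviour; `describe_missing` is a hypothetical helper written for this illustration, not part of pandas or pint-pandas. It shows that a masked `Float64` array stores a NaN input as `pd.NA`, while a NumPy-backed `float64` array keeps it as an ordinary NaN.

```python
import math

import pandas as pd

# Masked (nullable) float array: NaN in the input is stored as pd.NA.
masked = pd.array([float("nan"), 0.0], dtype="Float64")

# NumPy-backed float array: NaN stays an ordinary floating-point NaN.
plain = pd.array([float("nan")], dtype="float64")


def describe_missing(value):
    """Hypothetical helper: classify one element as "NA", "nan" or "value"."""
    if value is pd.NA:
        return "NA"
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    return "value"


masked_kinds = [describe_missing(v) for v in masked]
plain_kinds = [describe_missing(v) for v in plain]

print(type(masked).__name__, masked_kinds)
print(type(plain).__name__, plain_kinds)
```

This would also explain why fillna appears to do nothing in one case and raises in the other: the fill value has to be representable in whichever underlying array the series happens to hold.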

andrewgsavage · 2024-07-02T18:05:47Z

I think pint-pandas should:

- By default, convert data to a FloatingArray
- Have an option to change the conversion to some other dtype
- Have an option to prevent conversion, allowing any dtype as the underlying data dtype. In this case, specify the underlying dtype in the pint dtype, e.g. 'pint[MW][Float64]'
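
Until something like this exists, one possible way to keep missing values consistent is to normalise the magnitudes with plain pandas before attaching units. This is only a sketch, assuming pandas 2.2; `to_masked` and `to_numpy_float` are hypothetical helper names, and nothing here is pint-pandas API.

```python
import numpy as np
import pandas as pd


def to_masked(values):
    """Hypothetical helper: return a Float64 array where every missing value is pd.NA."""
    return pd.array(values, dtype="Float64")


def to_numpy_float(values):
    """Hypothetical helper: return a float64 ndarray where every missing value is NaN."""
    masked = pd.array(values, dtype="Float64")
    return masked.to_numpy(dtype="float64", na_value=np.nan)


all_na = to_masked([float("nan")])        # single NaN becomes pd.NA
all_nan = to_numpy_float([pd.NA, 0.0])    # pd.NA becomes NaN

print(all_na)
print(all_nan)
```

The normalised magnitudes could then be used as the data for the series. Whether pint-pandas keeps that underlying array unchanged is an assumption here; the outputs above only show that a FloatingArray is kept in the mixed case.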

andrewgsavage mentioned this issue Aug 5, 2024

subdtype #247