Describe the bug:
Int64 columns in a dataframe are treated as float: "samples" lists float values, and so do all the statistics.
data_label = FLOA
Is it possible that "Int64" (capital I) is not supported?
To Reproduce:
Expected behavior:
treat Int64 as Int64...
Screenshots:
Additional context:
I can't show much detail of the data, but I'll try to explain.
I have selected data from a database into a Python dataframe. Most of the DB fields are nullable, so they end up in Python as float, because float supports NaN.
To use the correct data type, I converted those fields to Int64 in Python.
This yields for example:
Name: F*****, Length: 147190, dtype: Int64
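The conversion described above can be sketched roughly as follows. This is a minimal illustration, assuming the dataframe is a pandas DataFrame; the series name `field` and its values are made up and only stand in for the masked column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a nullable DB field (hypothetical name and values).
raw = pd.Series([1.0, 4.0, np.nan, 2.0], name="field")
print(raw.dtype)  # float64, because NaN forces a float column

# Convert to pandas' nullable integer extension dtype (capital "I").
converted = raw.astype("Int64")
print(converted.dtype)  # Int64
print(converted)
```

The real column reported in this issue is the masked one shown above, not this toy series.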
For this variable (F*****), the profile report says "data_type": "string".
In "statistics", it shows "min": 1.0, "max": 4.0, "sum": 117454.0, which look like floats.
The histogram bin_edges are all float, which I wouldn't expect for an int value.
"categorical": true is reported for all int variables. That could be OK, but in most cases these are not categories but simple measurement values.
"null_count": if this is supposed to count NaN entries, it does not count correctly for Int64. Missing Int64 values seem to be denoted as <NA> (I haven't yet found out whether there is a way to influence the NaN representation of Int64).
For the variable above the count is not available in the report (as is the case for all Int64 variables).
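For comparison, here is a small sketch of what pandas itself reports for a nullable Int64 column, assuming pandas is the dataframe library in use. The series name and values are hypothetical, and this says nothing about how the profiler computes its own statistics internally.

```python
import pandas as pd

# Hypothetical nullable integer column with one missing entry.
s = pd.Series([1, 4, None, 2], dtype="Int64", name="field")

# Missing entries in an Int64 column are pd.NA, displayed as <NA>.
print(str(pd.NA))

# isna() recognises pd.NA, so the null count can be taken directly.
null_count = int(s.isna().sum())

# Reductions skip missing values by default.
col_min = s.min()
col_max = s.max()
col_sum = s.sum()
print(null_count, col_min, col_max, col_sum)
```

In recent pandas versions these reductions should come back as integers rather than floats, which is the behaviour I would expect the report to mirror.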
Hope this helps...
Please fill out the rest of the issue template with a code snippet to reproduce / example data you are using -- that helps us work through issues a bit quicker. Might be slightly delayed in my responses this week as well but will do my best to keep an eye on any updates here. Best!