Fix deprecated Pandas syntax (up to v2.2.2) #164

tompollard · 2024-06-04T03:35:10Z

Minor syntax updates to address the warnings raised when running pytest with Pandas v2.2.2:

⏚ [tompollard:~/projects/tableone] [env] main* 5s ± pytest
======================================= test session starts ========================================
platform darwin -- Python 3.9.19, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/tompollard/projects/tableone
collected 30 items                                                                                 

tests/unit/test_tableone.py ..............................                                   [100%]

========================================= warnings summary =========================================
tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <function mean at 0x103d48790> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <function median at 0x104646550> is currently using DataFrameGroupBy.median. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "median" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <built-in function min> is currently using DataFrameGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <built-in function max> is currently using DataFrameGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py::TestTableOne::test_tableone_row_sort_pn
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:486: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
    tableone_rows = pd.unique([x[0] for x in table.tableone.index.values])

tests/unit/test_tableone.py::TestTableOne::test_tableone_row_sort_pn
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:494: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
    tableone_rows = pd.unique([x[0] for x in table.tableone.index.values])

tests/unit/test_tableone.py::TestTableOne::test_string_data_as_continuous_error
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:122: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'could not measure' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    data_mixed.loc[1, 'mixed numeric data'] = 'could not measure'

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tableone/tableone.py:1176: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
  A typical example is when you are setting values in a column of a DataFrame, like:
  
  df["col"][row_indexer] = value
  
  Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
  
  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
    df[colname.format(p[0], p[1])].loc[v] = smd

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tableone/tableone.py:1190: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
  A typical example is when you are setting values in a column of a DataFrame, like:
  
  df["col"][row_indexer] = value
  
  Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
  
  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
    df[colname.format(p[0], p[1])].loc[v] = smd  # type: ignore

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:970: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    smd = t.tableone.loc[k, 'Grouped by MechVent']['SMD (0,1)'][0]

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:998: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    smd = t.tableone.loc[k, 'Grouped by MechVent']['SMD (0,1)'][0]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html


tompollard added 5 commits June 3, 2024 23:15

use string representations of the function names.

f8ccdfb
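
For illustration, the `pd.pivot_table` warnings above suggest passing aggregation names as strings rather than callables. The sketch below is not the code from this commit; the data frame, its column names, and the `index`/`values` arguments are hypothetical and only show the general pattern.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
cont_data = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "age": [30.0, 40.0, 50.0, 70.0],
})

# Passing callables such as np.mean, np.median, min, or max raises the
# FutureWarning shown in the log under pandas 2.2. String names keep
# the current groupby-based behaviour without the warning.
df_cont = pd.pivot_table(
    cont_data,
    index="group",
    values="age",
    aggfunc=["mean", "median", "min", "max"],
)

# Columns are a MultiIndex of (aggregation name, value column).
print(df_cont)
```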

unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated.

beef5d8
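
As a rough sketch of one way to avoid this warning, the list can be wrapped in a NumPy array before it is handed to `pd.unique`. The `index_values` list below is made up to stand in for `table.tableone.index.values`; the actual change in the commit may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for table.tableone.index.values (a sequence of tuples).
index_values = [("age", ""), ("age", ""), ("sex", "F"), ("sex", "M")]

# pd.unique on a plain Python list is deprecated; wrapping the list in
# an np.ndarray gives pd.unique one of its supported input types.
first_levels = np.array([x[0] for x in index_values])
tableone_rows = pd.unique(first_levels)

# pd.unique preserves order of first appearance.
print(list(tableone_rows))
```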

Fix Chained Assignment Error.

f7ebfaa
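
The warning text itself recommends a single `.loc` call in place of chained assignment. A minimal sketch of that pattern follows, reusing the names `colname`, `p`, `v`, and `smd` from the flagged line; the data frame and the values assigned here are assumptions made for the example.

```python
import pandas as pd

# Hypothetical table with one SMD column, initially empty.
df = pd.DataFrame(
    {"SMD (0,1)": [float("nan"), float("nan")]},
    index=["age", "sex"],
)

colname = "SMD ({},{})"
p = (0, 1)      # hypothetical pair of group labels
v = "age"       # hypothetical row label
smd = 0.25      # hypothetical standardized mean difference

# Chained form (deprecated): df[colname.format(p[0], p[1])].loc[v] = smd
# Single-step form: row and column are selected in one .loc call, so
# the assignment still updates df under Copy-on-Write.
df.loc[v, colname.format(p[0], p[1])] = smd

print(df)
```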

Use iloc to index by position.

4f049e4
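
A small sketch of the positional-indexing change, assuming a Series with a non-integer index. The Series below is invented for the example and is not taken from the test suite.

```python
import pandas as pd

# Hypothetical Series with string labels.
ser = pd.Series([0.25, 0.5], index=["first", "second"])

# ser[0] relies on the deprecated fallback that treats an integer key
# as a position. .iloc makes the positional lookup explicit.
smd = ser.iloc[0]

print(smd)
```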

Add note on future deprecation of incompatible dtypes.

5c28200
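
The commit above only adds a note. For reference, one possible way to avoid the incompatible-dtype warning is to cast the column to `object` before assigning a string. This is a sketch under that assumption, with a made-up data frame that reuses the column name from the log.

```python
import pandas as pd

# Hypothetical float column, matching the column name in the warning.
data_mixed = pd.DataFrame({"mixed numeric data": [1.0, 2.0, 3.0]})

# Assigning a string into a float64 column is deprecated. Casting to
# object first makes the mixed content explicit.
data_mixed["mixed numeric data"] = data_mixed["mixed numeric data"].astype(object)
data_mixed.loc[1, "mixed numeric data"] = "could not measure"

print(data_mixed["mixed numeric data"].tolist())
```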

tompollard merged commit 0ec30a2 into main Jun 4, 2024
2 checks passed

tompollard deleted the tp/bump_pandas_2_2_2 branch June 4, 2024 03:40
