Fix unbounded sequence issue in DataFrame constructor (#13811)
In `cudf`, we currently have a hang in this scenario:
```python
In [1]: import cudf

In [2]:     class A:
   ...:         def __getitem__(self, key):
   ...:             return 1
   ...: 

In [3]: cudf.DataFrame([A()])
```

This PR adds a type check that raises `TypeError` before list-like inputs are passed on to `itertools.zip_longest` for transposing.
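As a minimal sketch of why the scenario above hangs (plain Python, no `cudf` needed): an object that defines only `__getitem__` is still accepted by `iter()` through the legacy sequence protocol, which keeps calling `obj[0]`, `obj[1]`, ... until `IndexError` is raised. Class `A` never raises it, so iteration is unbounded. The `collections.abc` checks do not recognize this protocol, which is what makes an `isinstance` guard effective here. The names `obj`, `first_five`, `is_iterable`, and `is_sequence` are illustrative only.

```python
import itertools
from collections import abc


class A:
    # Only __getitem__ is defined: no __iter__ and no __len__.
    def __getitem__(self, key):
        return 1


obj = A()

# iter() falls back to the legacy sequence protocol. Because __getitem__
# never raises IndexError, the iterator never ends, so we cap it with islice.
first_five = list(itertools.islice(obj, 5))

# The ABCs look for __iter__ (Iterable) or explicit inheritance/registration
# (Sequence), so neither check matches this object.
is_iterable = isinstance(obj, abc.Iterable)
is_sequence = isinstance(obj, abc.Sequence)

print(first_five, is_iterable, is_sequence)
```

Without the `islice` cap, `list(obj)` would never return, which matches the hang described above.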

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #13811
galipremsagar authored Aug 3, 2023
1 parent f46cb31 commit 11fd25c
Showing 2 changed files with 18 additions and 0 deletions.
6 changes: 6 additions & 0 deletions python/cudf/cudf/core/dataframe.py
@@ -843,6 +843,12 @@ def _init_from_list_like(self, data, index=None, columns=None):
            data = DataFrame.from_pandas(pd.DataFrame(data))
            self._data = data._data
        else:
            if any(
                not isinstance(col, (abc.Iterable, abc.Sequence))
                for col in data
            ):
                raise TypeError("Inputs should be an iterable or sequence.")

            data = list(itertools.zip_longest(*data))

            if columns is not None and len(data) == 0:
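The guard in this hunk can be sketched outside of `cudf` as follows. This is an illustration under assumptions, not the library's code: `transpose_rows` is a hypothetical helper, and only the `isinstance` check, the error message, and the `zip_longest` call are taken from the diff. Note that `abc.Sequence` is a subclass of `abc.Iterable`, so the tuple is kept here only to mirror the change.

```python
import itertools
from collections import abc


def transpose_rows(rows):
    """Hypothetical helper (not part of cudf): validate rows, then transpose."""
    # Reject any element that the ABCs do not recognize as iterable. This
    # catches objects that are iterable only via the legacy __getitem__ path.
    if any(not isinstance(row, (abc.Iterable, abc.Sequence)) for row in rows):
        raise TypeError("Inputs should be an iterable or sequence.")
    # zip_longest pads shorter rows with None while transposing.
    return list(itertools.zip_longest(*rows))


class A:
    def __getitem__(self, key):
        return 1


# Ordinary ragged rows transpose as expected.
columns = transpose_rows([[1, 2, 3], [4, 5]])

# The unbounded object is rejected up front instead of being iterated forever.
try:
    transpose_rows([A()])
except TypeError as exc:
    message = str(exc)

print(columns, message)
```

Placing the check before the `zip_longest` call matters: once `zip_longest` starts consuming an unbounded input inside `list(...)`, there is no point at which it can stop.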
12 changes: 12 additions & 0 deletions python/cudf/cudf/tests/test_dataframe.py
@@ -10243,3 +10243,15 @@ def test_dataframe_init_columns_named_index():
    pdf = pd.DataFrame(data, columns=columns)

    assert_eq(gdf, pdf)


def test_dataframe_constructor_unbounded_sequence():
    class A:
        def __getitem__(self, key):
            return 1

    with pytest.raises(TypeError):
        cudf.DataFrame([A()])

    with pytest.raises(TypeError):
        cudf.DataFrame({"a": A()})
