[FEA] Support more flexible construction of nested columns in pylibcudf #17192
Labels: feature request, libcudf, pylibcudf, Python
Is your feature request related to a problem? Please describe.
Currently, pylibcudf exposes only a subset of the factories in libcudf. When they were added in #15257, we omitted the factories for nested types due to various difficulties around ownership and around what such columns should be constructible from. We also have not strongly considered how to create pylibcudf columns of list or string types whose underlying data and offset arrays are views into other arrays. This type of construction could be done by manually creating a `column_view` in libcudf, but that requires a thorough understanding of Arrow data layouts as well as their implementation in libcudf (especially for strings since the large strings refactor). All of these holes are particularly problematic because strings, lists, and structs are the data types for which pylibcudf may have the most to offer: beyond simply providing a higher-performance, low-level API that cudf users could reach for when necessary, for these types pylibcudf can offer various bits of libcudf functionality that simply have no home in cudf at all. Therefore, making it possible to work with these types transparently in pylibcudf is of high importance for satisfying use cases for which we have no satisfactory solution at present.
Describe the solution you'd like
We should investigate the best ways to enable construction of pylibcudf columns of nested types, including from other data sources like pairs of cupy arrays, and we should make these constructors as easy to use as possible.
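To make the "pair of arrays" idea concrete, here is a minimal host-side sketch of the Arrow layout that such a constructor would have to accept: an offsets array plus a child values (or character) buffer. It is pure Python and does not use pylibcudf or cupy. The function names `validate_offsets`, `decode_list_column`, and `decode_string_column` are hypothetical and exist only for illustration. A real constructor would presumably wrap device buffers without copying rather than materialize Python objects.

```python
from typing import List, Sequence


def validate_offsets(offsets: Sequence[int], child_length: int) -> None:
    """Check the basic Arrow invariants for a list/string offsets buffer.

    Hypothetical helper: a column with n rows has n + 1 offsets, which must be
    non-decreasing and must stay within the bounds of the child buffer.
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one entry")
    if offsets[0] < 0 or offsets[-1] > child_length:
        raise ValueError("offsets must lie within the child buffer")
    for lo, hi in zip(offsets, offsets[1:]):
        if hi < lo:
            raise ValueError("offsets must be non-decreasing")


def decode_list_column(offsets: Sequence[int], values: Sequence[int]) -> List[list]:
    """Interpret (offsets, values) as a list column; row i is values[offsets[i]:offsets[i + 1]]."""
    validate_offsets(offsets, len(values))
    return [list(values[lo:hi]) for lo, hi in zip(offsets, offsets[1:])]


def decode_string_column(offsets: Sequence[int], chars: bytes) -> List[str]:
    """Interpret (offsets, chars) as a string column of UTF-8 encoded rows."""
    validate_offsets(offsets, len(chars))
    return [chars[lo:hi].decode("utf-8") for lo, hi in zip(offsets, offsets[1:])]


# Three list rows, the middle one empty, sharing one flat values buffer.
list_rows = decode_list_column([0, 2, 2, 5], [1, 2, 3, 4, 5])

# Two string rows sharing one flat character buffer.
string_rows = decode_string_column([0, 2, 5], b"hiyou")
```

The validation step is the part an easy-to-use constructor could take off the user's hands. Whether those checks should run eagerly on device data is an open design question that this sketch does not address.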
Additional context
Where appropriate, we should consider adding constructors directly to libcudf as well. While it is possible to do everything we need with low-level libcudf APIs, one of the major synergies I anticipate between pylibcudf and libcudf is that pylibcudf will motivate usability improvements in libcudf that might otherwise have little impetus behind them. This is one such case where improving constructors directly in libcudf to help pylibcudf users can help a wider range of users, so we should seize the opportunity if it presents itself.