You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm looking for documentation on how to implement an ExtensionArray which supports join functionality.
Particularly, I'd like to join a table which includes a FixedShapeTensorArray column with another table.
Here's a simple example which does not work.
importnumpyasnpimportpyarrowaspa# First dim is the batch dimtensors=np.arange(3*10*10).reshape((3, 10, 10)).astype(np.uint8)
tensor_array=pa.FixedShapeTensorArray.from_numpy_ndarray(tensors)
ids=pa.array([1,2,3], type=pa.uint8())
table=pa.Table.from_arrays([ids, tensor_array], names=["id", "tensor"])
print(table.schema)
classes=pa.array(["one", "two", "three"], type=pa.string())
table_2=pa.Table.from_arrays([ids, classes], names=["id", "name"])
print(table_2.schema)
table.join(table_2, keys=["id"], join_type="full outer")
This raises the error
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
Cell In[42], [line 1](vscode-notebook-cell:?execution_count=42&line=1)
----> [1](vscode-notebook-cell:?execution_count=42&line=1) table.join(table_2, keys=["id"], join_type="full outer")
File ~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/table.pxi:5570, in pyarrow.lib.Table.join()
File ~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:247, in _perform_join(join_type, left_operand, left_keys, right_operand, right_keys, left_suffix, right_suffix, use_threads, coalesce_keys, output_type)
[242](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:242) projection = Declaration(
[243](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:243) "project", ProjectNodeOptions(projections, projected_col_names)
[244](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:244) )
[245](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:245) decl = Declaration.from_sequence([decl, projection])
--> [247](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:247) result_table = decl.to_table(use_threads=use_threads)
[249](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:249) if output_type == Table:
[250](https://file+.vscode-resource.vscode-cdn.net/Users/nw/workspaces/main/pytorc/projects/next_gen_data/next_gen_data/~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/acero.py:250) return result_table
File ~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()
File ~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File ~/.pyenv/versions/next_gen_data_38/lib/python3.8/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowInvalid: Data type extension<arrow.fixed_shape_tensor[value_type=uint8, shape=[10,10], permutation=[0,1]]> is not supported in join non-key field tensor
How can I make this work? The individual tensors I want to store are rather small (single-digit-dimensions), but the join may lead to list aggregation of a few hundred rows.
I've tagged this as a python question because I don't know what level of API needs to be adjusted to add this functionality.
Component(s)
Python
The text was updated successfully, but these errors were encountered:
Describe the enhancement requested
I'm looking for documentation on how to implement an ExtensionArray which supports
join
functionality.Particularly, I'd like to join a table which includes a
FixedShapeTensorArray
column with another table.Here's a simple example which does not work.
This raises the error
How can I make this work? The individual tensors I want to store are rather small (single-digit-dimensions), but the join may lead to list aggregation of a few hundred rows.
I've tagged this as a python question because I don't know what level of API needs to be adjusted to add this functionality.
Component(s)
Python
The text was updated successfully, but these errors were encountered: