GH-37484: [Python] Add a FixedSizeTensorScalar class #37533
Conversation
Would the numpy array API or https://data-apis.org/array-api/latest/purpose_and_scope.html add any value here?
@alippai This PR would effectively implement a …
Thanks for working on this! Added two suggestions, otherwise the Python part LGTM.
I am very much hoping we could implement DLPack in Arrow: #33984. Especially for the new tensor arrays, it would be very beneficial!
+1
One more thing, can the change in the C++ code (…)
Didn't yet look in detail, but added some quick drive-by comments. And thanks for working on this!
Can you also add some tests for the new Scalar class?
Currently, for the Python bindings, you added a `get_tensor(i)` method on the array class, but wouldn't it make sense to (also/instead) add a `to_tensor()` method on the scalar class, since this is to get a Tensor for a single element (scalar) of the array?
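To make the suggestion concrete, here is a minimal hedged sketch of the two accessor shapes (the construction uses pyarrow's `fixed_shape_tensor` type; the accessor names are the ones discussed in this thread, not a confirmed API):

```python
import pyarrow as pa

# Build a fixed shape tensor array of two 2x2 tensors from its storage array.
tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
storage = pa.array([[1, 2, 3, 4], [5, 6, 7, 8]], pa.list_(pa.int32(), 4))
arr = pa.ExtensionArray.from_storage(tensor_type, storage)

# Array-level accessor in the current diff (hypothetical, per this thread):
# tensor = arr.get_tensor(0)
# Scalar-level accessor suggested here (hypothetical, per this thread):
# tensor = arr[0].to_tensor()
```

A scalar-level accessor keeps element access consistent with indexing, since `arr[i]` already returns a scalar.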
Thanks for the helpful review @pitrou, I'm happy to see this moving forward! I've addressed your points, please let me know if more changes are needed.
```python
shape = obj.shape[1:]
size = obj.size / obj.shape[0]
shape = np.take(obj.shape, permutation)
values = np.ravel(obj, order="K")
```
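For readers skimming the diff, a standalone numpy illustration (not the PR code itself; the batch shape and permutation are made up, and integer division is used for clarity) of what these calls compute:

```python
import numpy as np

# A made-up batch of two 3x4 tensors; the first axis indexes array elements.
obj = np.arange(24).reshape(2, 3, 4)
permutation = [0, 1, 2]                    # hypothetical axis order (identity)

shape = obj.shape[1:]                      # (3, 4): shape of each tensor
size = obj.size // obj.shape[0]            # 12: values per tensor
shape = np.take(obj.shape, permutation)    # full shape reordered by the permutation
values = np.ravel(obj, order="K")          # values flattened in memory order
```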
`as_strided` can be a later PR if desired. The docstring addition is good for now!
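Since `as_strided` is only floated as a possible follow-up, here is a hedged sketch of the kind of zero-copy view it could provide (illustrative buffer, shape, and strides, not code from this PR):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# View a flat buffer of 24 values as two 3x4 tensors without copying.
flat = np.arange(24, dtype=np.int64)
n = flat.itemsize                          # 8 bytes per int64
stack = as_strided(flat, shape=(2, 3, 4), strides=(12 * n, 4 * n, n))
assert stack[1, 0, 0] == 12                # second tensor starts at element 12
```

Because `as_strided` makes it easy to create out-of-bounds views, deferring it to a separate PR seems reasonable.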
@github-actions crossbow submit -g python -g wheel
It may be nice to later add a doc section for tensors here:
Revision: bf2ca0e Submitted crossbow builds: ursacomputing/crossbow @ actions-2a16c8cab9
Added an issue for the docs: #39998
```
and the rest of the dimensions will match the permuted shape of the fixed
shape tensor.

The conversion is zero-copy.
```
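As a concrete illustration of the docstring wording (a minimal sketch assuming pyarrow's `FixedShapeTensorArray` helpers, with no permutation set):

```python
import numpy as np
import pyarrow as pa

# Two 2x3 tensors stored in one array; the converted ndarray keeps the
# element count as its first dimension and the tensor shape after that.
data = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
arr = pa.FixedShapeTensorArray.from_numpy_ndarray(data)
assert arr.to_numpy_ndarray().shape == (2, 2, 3)
```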
Small nit: this is only if the conversion to numpy is zero-copy (i.e. primitive numeric data without nulls)
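For context, a quick illustration of that caveat with plain pyarrow arrays (illustrative values only):

```python
import pyarrow as pa

# Primitive numeric data without nulls converts to numpy without a copy...
pa.array([1, 2, 3], pa.int32()).to_numpy(zero_copy_only=True)
# ...while the presence of nulls forces a copy (and a different dtype).
pa.array([1, None, 3], pa.int32()).to_numpy(zero_copy_only=False)
```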
Good point, added to VariableShapeTensor PR 8ca3bf7
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 026188e. There were 9 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them.
### Rationale for this change

When working with `FixedSizeTensorArray` we want to access individual tensors. This would be enabled by adding:

```python
class FixedSizeTensorScalar(pa.ExtensionScalar):
    def to_numpy_ndarray(self):
        ...
```

See apache#37484.

### What changes are included in this PR?

This adds `FixedSizeTensorScalar` and tests for it.

### Are there any user-facing changes?

Yes, when calling `FixedSizeTensorArray[i]` we would get back `FixedSizeTensorScalar` instead of `ExtensionScalar`.

* Closes: apache#37484

Lead-authored-by: Rok Mihevc <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
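A hedged sketch of what the user-facing change looks like in practice (the scalar-side accessor name follows the PR description and may differ in the released API):

```python
import numpy as np
import pyarrow as pa

# Indexing the extension array is expected to return a tensor scalar rather
# than a plain ExtensionScalar after this PR.
arr = pa.FixedShapeTensorArray.from_numpy_ndarray(
    np.arange(8, dtype=np.int32).reshape(2, 2, 2))
scalar = arr[0]
# Per-element tensor access, name as given in the description (may differ):
# scalar.to_numpy_ndarray()
```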