-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Cython's array
to back Py_ssize_t[::1]
#504
Conversation
35da294
to
2544909
Compare
In `Array`, `Py_ssize_t[::1]` objects are currently backed by CPython `array`'s with some internal bits expressed in Cython. However these are not compatible with Python's Limited API and Stable ABI. To address that, switch to Cython's own`array` type. As this is baked into Cython and doesn't use anything special, it is compatible with Python's Limited API and Stable ABI. https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cython-arrays
2544909
to
10af4c2
Compare
The performance of the two approaches can be compared with the following Cython test code # filename: new_arr.pyx
from cpython.array cimport array, newarrayobject
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from cpython.object cimport PyObject
from cython.view cimport array as cvarray
cdef array array_Py_ssize_t = array("q")
cpdef Py_ssize_t[::1] new_Py_ssize_t_array_0(Py_ssize_t n):
return newarrayobject(
(<PyObject*>array_Py_ssize_t).ob_type, n, array_Py_ssize_t.ob_descr
)
cdef sizeof_Py_ssize_t = sizeof(Py_ssize_t)
cpdef Py_ssize_t[::1] new_Py_ssize_t_array_1(Py_ssize_t n):
cdef cvarray a = cvarray((n,), sizeof_Py_ssize_t, b"q", "c", False)
a.data = <char*>PyMem_Malloc(n * sizeof(Py_ssize_t))
a.callback_free_data = PyMem_Free
return a Running it in IPython shows results like this In [1]: from new_arr import new_Py_ssize_t_array_0, new_Py_ssize_t_array_1
In [2]: %timeit new_Py_ssize_t_array_0(1)
179 ns ± 0.952 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [3]: %timeit new_Py_ssize_t_array_1(1)
232 ns ± 1.11 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [4]: %timeit new_Py_ssize_t_array_0(5)
178 ns ± 0.74 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: %timeit new_Py_ssize_t_array_1(5)
235 ns ± 1.95 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [6]: %timeit new_Py_ssize_t_array_0(1_000)
213 ns ± 2.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [7]: %timeit new_Py_ssize_t_array_1(1_000)
284 ns ± 5.85 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) This is ~1/3 slower across cases. That said, the runtime difference is ~60ns. So roughly the time of 1-2 attribute access calls on an object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @jakirkham
/merge |
Thanks Mads! 🙏 |
In
Array
,Py_ssize_t[::1]
objects are currently backed by CPythonarray
's with some internal bits expressed in Cython. However these are not compatible with Python's Limited API and Stable ABI. To address that, switch to Cython's ownarray
type. As this is baked into Cython and doesn't use anything special, it is compatible with Python's Limited API and Stable ABI.xref: rapidsai/build-planning#42