v0.16.0

@antonwolfy antonwolfy released this 15 Oct 14:25
cc58db0

Summary

This release reaches an important milestone by making offloading fully asynchronous. Calls to dpnp submit tasks to the DPC++ runtime for execution and return without waiting for those tasks to finish. The sequential semantics that a user expects from a Python script are nevertheless preserved.
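
The behaviour described above can be pictured with a standard-library analogy. This is only a sketch of the general idea (submit work, return immediately, keep results in program order), not dpnp's or the DPC++ runtime's actual mechanism. The single-worker executor stands in for an in-order device queue, and `slow_square` and `slow_sum` are hypothetical placeholders for device kernels.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-in for an in-order device queue:
# a single worker executes tasks in the order they were submitted.
queue = ThreadPoolExecutor(max_workers=1)


def slow_square(values):
    time.sleep(0.05)  # pretend this is a kernel running on a device
    return [v * v for v in values]


def slow_sum(pending):
    # Runs after slow_square because the queue is in-order,
    # so the earlier result is already available here.
    return sum(pending.result())


# Both calls only enqueue work and hand back a future right away.
squares = queue.submit(slow_square, [1, 2, 3])
total = queue.submit(slow_sum, squares)

# The host blocks only at the point where it actually needs the value.
result = total.result()
queue.shutdown()
print(result)
```

The script reads top to bottom like ordinary sequential code, yet the host thread is free between submission and the final `result()` call.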

In addition, this release completes the implementation of the dpnp.fft module and adds several new array manipulation, indexing, and elementwise routines. It also adds support for building dpnp for Nvidia GPUs.

DPNP is now compatible with NumPy 2.0.

Details

Added

  • Added implementation of dpnp.gradient function #1859
  • Added implementation of dpnp.sort_complex function #1864
  • Added implementation of dpnp.fft.fft and dpnp.fft.ifft functions #1879
  • Added implementation of dpnp.isneginf and dpnp.isposinf functions #1888
  • Added implementation of dpnp.fft.fftfreq and dpnp.fft.rfftfreq functions #1898
  • Added implementation of dpnp.fft.fftshift and dpnp.fft.ifftshift functions #1900
  • Added implementation of dpnp.isreal, dpnp.isrealobj, dpnp.iscomplex, and dpnp.iscomplexobj functions #1916
  • Added support to build dpnp for Nvidia GPU #1926
  • Added implementation of dpnp.fft.rfft and dpnp.fft.irfft functions #1928
  • Added implementation of dpnp.nextafter function #1938
  • Added implementation of dpnp.trim_zeros function #1941
  • Added implementation of dpnp.fft.hfft and dpnp.fft.ihfft functions #1954
  • Added implementation of dpnp.logaddexp2 function #1955
  • Added implementation of dpnp.flatnonzero function #1956
  • Added implementation of dpnp.float_power function #1957
  • Added implementation of dpnp.fft.fft2, dpnp.fft.ifft2, dpnp.fft.fftn, and dpnp.fft.ifftn functions #1961
  • Added implementation of dpnp.array_equal and dpnp.array_equiv functions #1965
  • Added implementation of dpnp.nan_to_num function #1966
  • Added implementation of dpnp.fix function #1971
  • Added implementation of dpnp.fft.rfft2, dpnp.fft.irfft2, dpnp.fft.rfftn, and dpnp.fft.irfftn functions #1982
  • Added implementation of dpnp.argwhere function #2000
  • Added implementation of dpnp.real_if_close function #2002
  • Added implementation of dpnp.ndim and dpnp.size functions #2014
  • Added implementation of dpnp.append and dpnp.asarray_chkfinite functions #2015
  • Added implementation of dpnp.array_split, dpnp.split, dpnp.hsplit, dpnp.vsplit, and dpnp.dsplit functions #2017
  • Added runtime dependency on intel-gpu-ocl-icd-system package #2023
  • Added implementation of dpnp.ravel_multi_index and dpnp.unravel_index functions #2022
  • Added implementation of dpnp.resize and dpnp.rot90 functions #2030
  • Added implementation of dpnp.require function #2036
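
For readers unfamiliar with the index-conversion routines in the list above, the following pure-Python sketch shows the mapping that NumPy defines for `unravel_index` and `ravel_multi_index` in the default row-major (C) order. It assumes the dpnp versions follow the same semantics. The `_ref` helpers are hypothetical reference functions written for illustration and are not part of dpnp.

```python
def unravel_index_ref(flat, shape):
    """Convert a flat index into per-axis coordinates (row-major order)."""
    coords = []
    # Peel off the fastest-varying (last) axis first.
    for dim in reversed(shape):
        flat, remainder = divmod(flat, dim)
        coords.append(remainder)
    return tuple(reversed(coords))


def ravel_multi_index_ref(coords, shape):
    """Convert per-axis coordinates into a flat index (row-major order)."""
    flat = 0
    for coord, dim in zip(coords, shape):
        flat = flat * dim + coord
    return flat


shape = (7, 6)
coords = unravel_index_ref(22, shape)        # 22 == 3 * 6 + 4
flat = ravel_multi_index_ref(coords, shape)  # round-trips back to 22
print(coords, flat)
```

The two helpers are inverses of each other for any in-range index, which is the property the paired dpnp functions are expected to share with their NumPy counterparts.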

Changed

  • Extended pre-commit pylint check to dpnp.fft module #1860
  • Reworked vm vector math backend to reuse dpctl.tensor functions around unary and binary functions #1868
  • Extended dpnp.ndarray.astype method to support device keyword argument #1870
  • Improved performance of dpnp.linalg.solve by implementing a dedicated kernel for its batch implementation #1877
  • Extended dpnp.fabs to support order and out keyword arguments by writing a dedicated kernel for it #1878
  • Extended dpnp.linalg module to support usm_ndarray as input #1880
  • Reworked dpnp.mod implementation to be an alias for dpnp.remainder #1882
  • Removed the legacy implementation of linear algebra functions from the backend #1887
  • Removed the legacy implementation of elementwise functions from the backend #1890
  • Extended dpnp.all and dpnp.any to support out keyword argument #1893
  • Reworked dpnp.repeat to add an explicit type check of input array #1894
  • Improved performance of different functions by adopting asynchronous implementation of dpctl #1897
  • Extended dpnp.fmax and dpnp.fmin to support order and out keyword arguments by writing dedicated kernels for them #1905
  • Removed the legacy implementation of array creation and manipulation functions from the backend #1903
  • Extended dpnp.extract implementation to align with NumPy #1906
  • Reworked backend implementation to align with non-backward compatible changes in DPC++ 2025.0 #1907
  • Removed the legacy implementation of indexing functions from the backend #1908
  • Extended dpnp.take implementation to align with NumPy #1909
  • Extended dpnp.place implementation to align with NumPy #1912
  • Reworked the implementation of indexing functions to avoid unnecessary casting to dpnp_array when input is usm_ndarray #1913
  • Reduced code duplication in the implementation of sorting functions #1914
  • Removed the obsolete dparray interface #1915
  • Improved performance of dpnp.linalg module for BLAS routines by adopting asynchronous implementation of dpctl #1919
  • Relocated dpnp.einsum utility functions to a separate file #1920
  • Improved performance of dpnp.linalg module for LAPACK routines by adopting asynchronous implementation of dpctl #1922
  • Reworked dpnp.matmul to allow larger batch size to be used #1927
  • Removed data synchronization where it is not needed #1930
  • Leveraged dpctl.tensor implementation for dpnp.where to support scalar as input #1932
  • Improved performance of dpnp.linalg.eigh by implementing a dedicated kernel for its batch implementation #1936
  • Reworked dpnp.isclose and dpnp.allclose to comply with compute follows data approach #1937
  • Extended dpnp.deg2rad and dpnp.radians to support order and out keyword arguments by writing dedicated kernels for them #1943
  • dpnp uses pybind11 2.13.1 #1944
  • Extended dpnp.degrees and dpnp.rad2deg to support order and out keyword arguments by writing dedicated kernels for them #1949
  • Extended dpnp.unwrap to support all keyword arguments provided by NumPy #1950
  • Leveraged dpctl.tensor implementation for dpnp.count_nonzero function #1962
  • Leveraged dpctl.tensor implementation for dpnp.diff function #1963
  • Leveraged dpctl.tensor implementation for dpnp.take_along_axis function #1969
  • Reworked dpnp.ediff1d implementation through existing functions instead of a separate kernel #1970
  • Reworked dpnp.unique implementation through existing functions when axis is given, otherwise through leveraging dpctl.tensor implementation #1972
  • Improved performance of dpnp.linalg.svd by implementing a dedicated kernel for its batch implementation #1936
  • Leveraged dpctl.tensor implementation for shape.setter method #1975
  • Extended dpnp.ndarray.copy to support compute follows data keyword arguments #1976
  • Reworked dpnp.select implementation through existing functions instead of a separate kernel #1977
  • Leveraged dpctl.tensor implementation for dpnp.from_dlpack and dpnp.ndarray.__dlpack__ functions #1980
  • Reworked dpnp.linalg module backend implementation for BLAS routines to work with OneMKL interfaces #1981
  • Reworked dpnp.ediff1d implementation to reduce code duplication #1983
  • dpnp can be used with any NumPy from 1.23 to 2.0 #1985
  • Reworked dpnp.unique implementation to properly handle NaN values #1972
  • Removed dpnp.issubsctype per NumPy 2.0 recommendation #1996
  • Reworked dpnp.unique implementation to align with NumPy 2.0 #1999
  • Reworked dpnp.linalg.solve backend implementation to work with OneMKL Interfaces #2001
  • Reworked dpnp.trapezoid implementation through existing functions instead of falling back on NumPy #2003
  • Added copy keyword to dpnp.array to align with NumPy 2.0 #2006
  • Extended dpnp.heaviside to support order and out keyword arguments by writing a dedicated kernel for it #2008
  • dpnp uses pybind11 2.13.5 #2010
  • Added COMPILER_VERSION_2025_OR_LATER flag to be able to run dpnp.fft module with both 2024.2 and 2025.0 versions of the compiler #2025
  • Cleaned up the implementation of dpnp.gradient by removing an obsolete TODO that is not going to be done #2032
  • Updated Array Manipulation Routines page in documentation to add missing functions and to remove duplicate entries #2033
  • dpnp uses pybind11 2.13.6 #2041
  • Updated dpnp.fft backend to depend on INTEL_MKL_VERSION flag to ensure that the appropriate code segment is executed based on the version of OneMKL #2035
  • Use dpctl::tensor::alloc_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with life-time management of temporary USM allocations #2058
  • Improved implementation of dpnp.kron to avoid unnecessary copy for non-contiguous arrays #2059
  • Updated the test suite for dpnp.fft module #2071
  • Reworked dpnp.clip implementation to align with Python Array API 2023.12 specification #2048
  • Skipped outdated tests for dpnp.linalg.solve due to compatibility issues with NumPy 2.0 #2074
  • Updated installation instructions #2098

Fixed

  • Resolved an issue with dpnp.matmul when an f_contiguous out keyword is passed to the function #1872
  • Resolved a possible race condition in dpnp.inv #1940
  • Resolved an issue with failing tests for dpnp.append when running on a device without fp64 support #2034
  • Resolved an issue with input array of usm_ndarray passed into dpnp.ix_ #2047
  • Added a workaround to prevent crash in tests on Windows in internal CI/CD (when running on either Lunar Lake or Arrow Lake) #2062
  • Fixed a crash in dpnp.choose caused by missing control of releasing temporary allocated device memory #2063
  • Resolved a compilation warning and an error when building in debug mode #2066
  • Fixed an issue with asynchronous execution in dpnp.fft module #2067

Full Changelog: 0.15.0...0.16.0