CHANGELOG

[1.0.0-beta.6] - 2024-01-10

  • Do not create CPU copy of grad array when calling array.numpy()
  • Fix assert_np_equal() bug
  • Support Linux AArch64 platforms, including Jetson/Tegra devices
  • Add parallel testing runner (invoke with python -m warp.tests, use warp/tests/unittest_serial.py for serial testing)
  • Fix support for function calls in range()
  • matmul adjoints now accumulate
  • Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
  • Fix multi-gpu synchronization issue in sparse.py
  • Add depth rendering to OpenGLRenderer, document warp.render
  • Make atomic_min, atomic_max differentiable
  • Fix error reporting using the exact source segment
  • Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
  • Address multiple differentiability issues
  • Fix backpropagation for returning array element references
  • Support passing the return value to adjoints
  • Add point basis space and explicit point-based quadrature for warp.fem
  • Support overriding the LLVM project source directory path using build_lib.py --build_llvm --llvm_source_path=
  • Fix the error message for accessing non-existing attributes
  • Flatten faces array for Mesh constructor in URDF parser

[1.0.0-beta.5] - 2023-11-22

  • Fix for kernel caching when function argument types change
  • Fix code-gen ordering of dependent structs
  • Fix for wp.Mesh build on MGPU systems
  • Fix for name clash bug with adjoint code: NVIDIA/warp#154
  • Add wp.frac() for returning the fractional part of a floating point value
  • Add support for custom native CUDA snippets using @wp.func_native decorator
  • Add support for batched matmul with batch size > 2^16-1
  • Add support for transposed CUTLASS wp.matmul() and additional error checking
  • Add support for quad and hex meshes in wp.fem
  • Detect and warn when C++ runtime doesn't match compiler during build, e.g.: libstdc++.so.6: version `GLIBCXX_3.4.30' not found
  • Documentation update for wp.BVH
  • Documentation and simplified API for runtime kernel specialization wp.Kernel
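
As a rough illustration of what "fractional part" can mean for the wp.frac() entry above, the sketch below shows two common conventions in plain Python. The helper names are hypothetical, and this changelog does not say which convention wp.frac() follows for negative inputs, so treat both as assumptions rather than a description of Warp's behavior.

```python
import math


def frac_trunc(x: float) -> float:
    # Convention 1 (assumption): drop the integer part, keeping the sign of x.
    return x - math.trunc(x)


def frac_floor(x: float) -> float:
    # Convention 2 (assumption): distance above the floor, always in [0, 1).
    return x - math.floor(x)


positive = (frac_trunc(2.75), frac_floor(2.75))    # both conventions agree
negative = (frac_trunc(-1.25), frac_floor(-1.25))  # the conventions differ
```

The two conventions only diverge for negative non-integer inputs, which is the case worth checking against the wp.frac() documentation.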

[1.0.0-beta.4] - 2023-11-01

  • Add wp.cbrt() for cube root calculation
  • Add wp.mesh_furthest_point_no_sign() to compute furthest point on a surface from a query point
  • Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
  • Add support for chained comparisons, e.g.: 0 < x < 2
  • Add support for running warp.fem examples headless
  • Fix for unit test determinism
  • Fix for possible GC collection of array during graph capture
  • Fix for wp.utils.array_sum() output initialization when used with vector types
  • Coverage and documentation updates
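
The chained comparison entry above refers to standard Python syntax. The plain-Python sketch below (outside any Warp kernel, with hypothetical function names) shows the equivalence between the chained form and the expanded form that previously had to be written by hand.

```python
def in_open_interval(x: float) -> bool:
    # Chained form: reads like the math notation 0 < x < 2.
    return 0 < x < 2


def in_open_interval_expanded(x: float) -> bool:
    # Expanded form: two comparisons joined by `and`.
    return 0 < x and x < 2


samples = [-1.0, 0.0, 0.5, 1.999, 2.0, 3.0]
chained = [in_open_interval(x) for x in samples]
expanded = [in_open_interval_expanded(x) for x in samples]
```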

[1.0.0-beta.3] - 2023-10-19

  • Add support for code coverage scans (test_coverage.py), coverage at 85% in omni.warp.core
  • Add support for named component access for vector types, e.g.: a = v.x
  • Add support for lvalue expressions, e.g.: array[i] += b
  • Add casting constructors for matrix and vector types
  • Add support for type() operator that can be used to return type inside kernels
  • Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
  • Fix for multi-process initialization warnings
  • Fix alignment issues with empty wp.struct
  • Fix for return statement warning with tuple-returning functions
  • Fix for wp.batched_matmul() registering the wrong function in the Tape
  • Fix and document wp.sim forward + inverse kinematics
  • Fix for wp.func to return a default value if function does not return on all control paths
  • Refactor wp.fem support for new basis functions, decoupled function spaces
  • Optimizations for wp.noise functions, up to 10x faster in most cases
  • Optimizations for type_size_in_bytes() used in array construction

Breaking Changes

  • To support grid-stride kernels, wp.tid() can no longer be called inside wp.func functions.
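
For readers unfamiliar with the term, the sketch below emulates a grid-stride loop in plain Python: a fixed number of threads each visit indices tid, tid + stride, tid + 2 * stride, and so on. In this pattern the work index is a loop variable of the kernel body rather than a fixed per-thread value, which is presumably why helper functions now need it passed in explicitly. All names are hypothetical, and this is not Warp's generated code.

```python
from __future__ import annotations


def grid_stride_indices(thread_id: int, num_threads: int, total_work: int) -> list[int]:
    """Return the work items a single thread visits in a grid-stride loop."""
    return list(range(thread_id, total_work, num_threads))


def run_grid_stride(num_threads: int, total_work: int) -> list[int]:
    """Emulate every thread of a launch and count visits per work item."""
    visits = [0] * total_work
    for thread_id in range(num_threads):
        for i in grid_stride_indices(thread_id, num_threads, total_work):
            visits[i] += 1
    return visits


# 4 emulated threads cover 10 work items, each item exactly once.
visits = run_grid_stride(num_threads=4, total_work=10)
thread_one = grid_stride_indices(1, 4, 10)
```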

[1.0.0-beta.2] - 2023-09-01

  • Fix for passing bool into wp.func functions
  • Fix for deprecation warnings appearing on stderr, now redirected to stdout
  • Fix for using for i in wp.hash_grid_query(..) syntax

[1.0.0-beta.1] - 2023-08-29

  • Fix for wp.float16 being passed as kernel arguments
  • Fix for compile errors with kernels using structs in backward pass
  • Fix for wp.Mesh.refit() not being CUDA graph capturable due to synchronous temp. allocs
  • Fix for flickering and MGPU crashes in the dynamic texture example demo in Kit by reusing ui.DynamicImageProvider instances
  • Fix for a regression that disabled bundle change tracking in samples
  • Fix for incorrect surface velocities when meshes are deforming in OgnClothSimulate
  • Fix for incorrect lower-case when setting USD stage "up_axis" in examples
  • Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
  • Fix for adding open edges when building cloth constraints from meshes in wp.sim.ModelBuilder.add_cloth_mesh()
  • Add support for wp.fabricarray to directly access Fabric data from Warp kernels, see https://omniverse.gitlab-master-pages.nvidia.com/usdrt/docs/usdrt_prim_selection.html for examples
  • Add support for user defined gradient functions, see @wp.func_replay, and @wp.func_grad decorators
  • Add support for more OG attribute types in omni.warp.from_omni_graph()
  • Add support for creating NanoVDB wp.Volume objects from dense NumPy arrays
  • Add support for wp.volume_sample_grad_f() which returns the value + gradient efficiently from an NVDB volume
  • Add support for LLVM fp16 intrinsics for half-precision arithmetic
  • Add implementation of stochastic gradient descent, see wp.optim.SGD
  • Add warp.fem framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/_build/html/modules/fem.html)
  • Optimizations for omni.warp extension load time (2.2s to 625ms cold start)
  • Make all omni.ui dependencies optional so that Warp unit tests can run headless
  • Deprecation of wp.tid() outside of kernel functions, users should pass tid() values to wp.func functions explicitly
  • Deprecation of wp.sim.Model.flatten() for returning all contained tensors from the model
  • Add support for clamping particle max velocity in wp.sim.Model.particle_max_velocity
  • Remove dependency on urdfpy package, improve MJCF parser handling of default values

[0.10.1] - 2023-07-25

  • Fix for large multidimensional kernel launches (> 2^32 threads)
  • Fix for module hashing with generics
  • Fix for unrolling loops with break or continue statements (will skip unrolling)
  • Fix for passing boolean arguments to build_lib.py (previously ignored)
  • Fix build warnings on Linux
  • Fix for creating array of structs from NumPy structured array
  • Fix for regression on kernel load times in Kit when using warp.sim
  • Update warp.array.reshape() to handle -1 dimensions
  • Update margin used for mesh queries when using wp.sim.create_soft_body_contacts()
  • Improvements to gradient handling with warp.from_torch(), warp.to_torch() plus documentation
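
The reshape entry above mentions -1 dimensions. Assuming this follows the NumPy convention (at most one dimension may be -1, and it is inferred from the total element count), the inference can be sketched in plain Python as follows. The function name resolve_shape is hypothetical and not part of Warp.

```python
from __future__ import annotations

from math import prod


def resolve_shape(size: int, shape: tuple[int, ...]) -> tuple[int, ...]:
    """Replace a single -1 entry in `shape` so the product equals `size`."""
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension may be -1")
    if not unknown:
        if prod(shape) != size:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    # Product of the known dimensions (1 if -1 is the only entry).
    known = prod(d for d in shape if d != -1)
    if known == 0 or size % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    resolved = list(shape)
    resolved[unknown[0]] = size // known
    return tuple(resolved)


flat = resolve_shape(12, (-1,))
rows = resolve_shape(12, (3, -1))
middle = resolve_shape(24, (2, -1, 3))
```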

[0.10.0] - 2023-07-05

  • Add support for macOS universal binaries (x86 + aarch64) for M1+ support
  • Add additional methods for SDF generation, see the following new methods:
    • wp.mesh_query_point_nosign() - closest point query with no sign determination
    • wp.mesh_query_point_sign_normal() - closest point query with sign from angle-weighted normal
    • wp.mesh_query_point_sign_winding_number() - closest point query with fast winding number sign determination
  • Add CSR/BSR sparse matrix support, see warp.sparse module:
    • wp.sparse.BsrMatrix
    • wp.sparse.bsr_zeros(), wp.sparse.bsr_set_from_triplets() for construction
    • wp.sparse.bsr_mm(), wp.sparse.bsr_mv() for matrix-matrix and matrix-vector products respectively
  • Add array-wide utilities:
    • wp.utils.array_scan() - prefix sum (inclusive or exclusive)
    • wp.utils.array_sum() - sum across array
    • wp.utils.radix_sort_pairs() - in-place radix sort (key,value) pairs
  • Add support for calling @wp.func functions from Python (outside of kernel scope)
  • Add support for recording kernel launches using a wp.Launch object that can be replayed with low overhead, use wp.launch(..., record_cmd=True) to generate a command object
  • Optimizations for wp.struct kernel arguments, up to 20x faster launches for kernels with large structs or number of params
  • Refresh USD samples to use bundle based workflow + change tracking
  • Add Python API for manipulating mesh and point bundle data in OmniGraph, see omni.warp.nodes module
    • See omni.warp.nodes.mesh_create_bundle(), omni.warp.nodes.mesh_get_points(), etc.
  • Improvements to wp.array:
    • Fix a number of array methods misbehaving with empty arrays
    • Fix a number of bugs and memory leaks related to gradient arrays
    • Fix array construction when creating arrays in pinned memory from a data source in pageable memory
    • wp.empty() no longer zeroes-out memory and returns an uninitialized array, as intended
    • array.zero_() and array.fill_() work with non-contiguous arrays
    • Support wrapping non-contiguous NumPy arrays without a copy
    • Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
    • Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
    • array.fill_() can now take lists or other sequences when filling arrays of vectors or matrices, e.g. arr.fill_([[1, 2], [3, 4]])
    • array.fill_() now works with arrays of structs (pass a struct instance)
    • wp.copy() gracefully handles copying between non-contiguous arrays on different devices
    • Add wp.full() and wp.full_like(), e.g., a = wp.full(shape, value)
    • Add optional device argument to wp.empty_like(), wp.zeros_like(), wp.full_like(), and wp.clone()
    • Add indexedarray methods .zero_(), .fill_(), and .assign()
    • Fix indexedarray methods .numpy() and .list()
    • Fix array.list() to work with arrays of any Warp data type
    • Fix array.list() synchronization issue with CUDA arrays
    • array.numpy() called on an array of structs returns a structured NumPy array with named fields
    • Improve the performance of creating arrays
  • Fix for Error: No module named 'omni.warp.core' when running some Kit configurations (e.g.: stubgen)
  • Fix for wp.struct instance address being included in module content hash
  • Fix codegen with overridden function names
  • Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
  • Fix for wp.BVH.refit() when executed on the CPU
  • Fix adjoint of wp.struct constructor
  • Fix element accessors for wp.float16 vectors and matrices in Python
  • Fix wp.float16 members in structs
  • Remove deprecated wp.ScopedCudaGuard(), please use wp.ScopedDevice() instead
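
The array_scan() entry above distinguishes inclusive and exclusive prefix sums. The plain-Python sketch below illustrates the difference on a small list. The helper names are hypothetical, and the code says nothing about the signature or output layout of wp.utils.array_scan() itself.

```python
from itertools import accumulate


def inclusive_scan(values):
    # out[i] = values[0] + ... + values[i]
    return list(accumulate(values))


def exclusive_scan(values):
    # out[i] = values[0] + ... + values[i - 1], with out[0] = 0
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


data = [3, 1, 4, 1, 5]
inclusive = inclusive_scan(data)
exclusive = exclusive_scan(data)
```

An exclusive scan is often the more convenient form for computing write offsets, since element i receives the total of everything before it.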

[0.9.0] - 2023-06-01

  • Add support for in-place modifications to vector, matrix, and struct types inside kernels (will warn during backward pass with wp.verbose if using gradients)
  • Add support for step-through VSCode debugging of kernel code with standalone LLVM compiler, see wp.breakpoint(), and walkthrough_debug.py
  • Add support for default values on built-in functions
  • Add support for multi-valued @wp.func functions
  • Add support for pass, continue, and break statements
  • Add missing __sincos_stret symbol for macOS
  • Add support for gradient propagation through wp.Mesh.points, and other cases where arrays are passed to native functions
  • Add support for Python @ operator as an alias for wp.matmul()
  • Add XPBD support for particle-particle collision
  • Add support for individual particle radii: ModelBuilder.add_particle has a new radius argument, Model.particle_radius is now a Warp array
  • Add per-particle flags as a Model.particle_flags Warp array, introduce PARTICLE_FLAG_ACTIVE to define whether a particle is being simulated and participates in contact dynamics
  • Add support for Python bitwise operators &, |, ~, <<, >>
  • Switch to using standalone LLVM compiler by default for cpu devices
  • Split omni.warp into omni.warp.core for Omniverse applications that want to use the Warp Python module with minimal additional dependencies
  • Disable kernel gradient generation by default inside Omniverse for improved compile times
  • Fix for bounds checking on element access of vector/matrix types
  • Fix for stream initialization when a custom (non-primary) external CUDA context has been set on the calling thread
  • Fix for duplicate @wp.struct registration during hot reload
  • Fix for array unot() operator so kernel writers can use if not array: syntax
  • Fix for case where dynamic loops are nested within unrolled loops
  • Change wp.hash_grid_point_id() to return -1 if the wp.HashGrid has not been reserved before
  • Deprecate wp.Model.soft_contact_distance which is now replaced by wp.Model.particle_radius
  • Deprecate single scalar particle radius (should be a per-particle array)
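
The per-particle flags and bitwise operators listed above fit together naturally. The plain-Python sketch below shows how a flag such as PARTICLE_FLAG_ACTIVE might be tested and cleared with &, | and ~. The bit values and the second flag are assumptions made for illustration only and are not taken from this changelog.

```python
PARTICLE_FLAG_ACTIVE = 1 << 0  # assumed bit value, for illustration only
PARTICLE_FLAG_PINNED = 1 << 1  # hypothetical second flag, not a Warp name


def is_active(flags: int) -> bool:
    # A particle counts as active when the ACTIVE bit is set.
    return (flags & PARTICLE_FLAG_ACTIVE) != 0


particle_flags = [
    PARTICLE_FLAG_ACTIVE,
    0,
    PARTICLE_FLAG_ACTIVE | PARTICLE_FLAG_PINNED,
]
active_before = [is_active(f) for f in particle_flags]

# Deactivate particle 2 while leaving its other bits untouched.
particle_flags[2] &= ~PARTICLE_FLAG_ACTIVE
active_after = [is_active(f) for f in particle_flags]
```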

[0.8.2] - 2023-04-21

  • Add ModelBuilder.soft_contact_max to control the maximum number of soft contacts that can be registered. Use Model.allocate_soft_contacts(new_count) to change count on existing Model objects.
  • Add support for bool parameters
  • Add support for logical boolean operators with int types
  • Fix for wp.quat() default constructor
  • Fix conditional reassignments
  • Add sign determination using angle weighted normal version of wp.mesh_query_point() as wp.mesh_query_sign_normal()
  • Add sign determination using winding number of wp.mesh_query_point() as wp.mesh_query_sign_winding_number()
  • Add query point without sign determination wp.mesh_query_no_sign()

[0.8.1] - 2023-04-13

  • Fix for regression when passing flattened numeric lists as matrix arguments to kernels
  • Fix for regressions when passing wp.struct types with uninitialized (None) member attributes

[0.8.0] - 2023-04-05

  • Add Texture Write node for updating dynamic RTX textures from Warp kernels / nodes
  • Add multi-dimensional kernel support to Warp Kernel Node
  • Add wp.load_module() to pre-load specific modules (pass recursive=True to load recursively)
  • Add wp.poisson() for sampling Poisson distributions
  • Add support for UsdPhysics schema, see warp.sim.parse_usd()
  • Add XPBD rigid body implementation plus diff. simulation examples
  • Add support for standalone CPU compilation (no host-compiler) with LLVM backend, enable with --standalone build option
  • Add support for per-timer color in wp.ScopedTimer()
  • Add support for row-based construction of matrix types outside of kernels
  • Add support for setting and getting row vectors for Python matrices, see matrix.get_row(), matrix.set_row()
  • Add support for instantiating wp.struct types within kernels
  • Add support for indexed arrays, slice = array[indices] will now generate a sparse slice of array data
  • Add support for generic kernel params, use def compute(param: Any):
  • Add support for with wp.ScopedDevice("cuda") as device: syntax (same for wp.ScopedStream(), wp.Tape())
  • Add support for creating custom length vector/matrices inside kernels, see wp.vector(), and wp.matrix()
  • Add support for creating identity matrices in kernels with, e.g.: I = wp.identity(n=3, dtype=float)
  • Add support for unary plus operator (wp.pos())
  • Add support for wp.constant variables to be used directly in Python without having to use .val member
  • Add support for nested wp.struct types
  • Add support for returning wp.struct from functions
  • Add --quick build for faster local dev. iteration (uses a reduced set of SASS arches)
  • Add optional requires_grad parameter to wp.from_torch() to override gradient allocation
  • Add type hints for generic vector / matrix types in Python stubs
  • Add support for custom user function recording in wp.Tape()
  • Add support for registering CUTLASS wp.matmul() with tape backward pass
  • Add support for grids with > 2^31 threads (each dimension may be up to INT_MAX in length)
  • Add CPU fallback for wp.matmul()
  • Optimizations for wp.launch(), up to 3x faster launches in common cases
  • Fix wp.randf() conversion to float to reduce bias for uniform sampling
  • Fix capture of wp.func and wp.constant types from inside Python closures
  • Fix for CUDA on WSL
  • Fix for matrices in structs
  • Fix for transpose indexing for some non-square matrices
  • Enable Python faulthandler by default
  • Update to VS2019

Breaking Changes

  • wp.constant variables can now be treated as their true type, accessing the underlying value through constant.val is no longer supported
  • wp.sim.model.ground_plane is now a wp.array to support gradient, users should call builder.set_ground_plane() to create the ground
  • wp.sim capsule, cones, and cylinders are now aligned with the default USD up-axis

[0.7.2] - 2023-02-15

  • Reduce test time for vec/math types
  • Clean-up CUDA disabled build pipeline
  • Remove extension.gen.toml to make Kit packages Python version independent
  • Handle additional cases for array indexing inside Python

[0.7.1] - 2023-02-14

  • Disabling some slow tests for Kit
  • Make unit tests run on first GPU only by default

[0.7.0] - 2023-02-13

  • Add support for arbitrary length / type vector and matrices e.g.: wp.vec(length=7, dtype=wp.float16), see wp.vec(), and wp.mat()
  • Add support for array.flatten(), array.reshape(), and array.view() with NumPy semantics
  • Add support for slicing wp.array types in Python
  • Add wp.from_ptr() helper to construct arrays from an existing allocation
  • Add support for break statements in ranged-for and while loops (backward pass support currently not implemented)
  • Add built-in mathematical constants, see wp.pi, wp.e, wp.log2e, etc.
  • Add built-in conversion between degrees and radians, see wp.degrees(), wp.radians()
  • Add security pop-up for Kernel Node
  • Improve error handling for kernel return values

[0.6.3] - 2023-01-31

  • Add DLPack utilities, see wp.from_dlpack(), wp.to_dlpack()
  • Add Jax utilities, see wp.from_jax(), wp.to_jax(), wp.device_from_jax(), wp.device_to_jax()
  • Fix for Linux Kit extensions OM-80132, OM-80133

[0.6.2] - 2023-01-19

  • Updated wp.from_torch() to support more data types
  • Updated wp.from_torch() to automatically determine the target Warp data type if not specified
  • Updated wp.from_torch() to support non-contiguous tensors with arbitrary strides
  • Add CUTLASS integration for dense GEMMs, see wp.matmul() and wp.batched_matmul()
  • Add QR and Eigen decompositions for mat33 types, see wp.qr3(), and wp.eig3()
  • Add default (zero) constructors for matrix types
  • Add a flag to suppress all output except errors and warnings (set wp.config.quiet = True)
  • Skip recompilation when Kernel Node attributes are edited
  • Allow optional attributes for Kernel Node
  • Allow disabling backward pass code-gen on a per-kernel basis, use @wp.kernel(enable_backward=False)
  • Replace Python imp package with importlib
  • Fix for quaternion slerp gradients (wp.quat_slerp())

[0.6.1] - 2022-12-05

  • Fix for non-CUDA builds
  • Fix strides computation in array_t constructor, fixes a bug with accessing mesh indices through mesh.indices[]
  • Disable backward pass code generation for kernel node (4-6x faster compilation)
  • Switch to linbuild for universal Linux binaries (affects TeamCity builds only)

[0.6.0] - 2022-11-28

  • Add support for CUDA streams, see wp.Stream, wp.get_stream(), wp.set_stream(), wp.synchronize_stream(), wp.ScopedStream
  • Add support for CUDA events, see wp.Event, wp.record_event(), wp.wait_event(), wp.wait_stream(), wp.Stream.record_event(), wp.Stream.wait_event(), wp.Stream.wait_stream()
  • Add support for PyTorch stream interop, see wp.stream_from_torch(), wp.stream_to_torch()
  • Add support for allocating host arrays in pinned memory for asynchronous data transfers, use wp.array(..., pinned=True) (default is non-pinned)
  • Add support for direct conversions between all scalar types, e.g.: x = wp.uint8(wp.float64(3.0))
  • Add per-module option to enable fast math, use wp.set_module_options({"fast_math": True}), fast math is now disabled by default
  • Add support for generating CUBIN kernels instead of PTX on systems with older drivers
  • Add user preference options for CUDA kernel output ("ptx" or "cubin", e.g.: wp.config.cuda_output = "ptx" or per-module wp.set_module_options({"cuda_output": "ptx"}))
  • Add kernel node for OmniGraph
  • Add wp.quat_slerp(), wp.quat_to_axis_angle(), wp.rotate_rodriguez() and adjoints for all remaining quaternion operations
  • Add support for unrolling for-loops when range is a wp.constant
  • Add support for arithmetic operators on built-in vector / matrix types outside of wp.kernel
  • Add support for multiple solution variables in wp.optim Adam optimization
  • Add nested attribute support for wp.struct attributes
  • Add missing adjoint implementations for spatial math types, and document all functions with missing adjoints
  • Add support for retrieving NanoVDB tiles and voxel size, see wp.Volume.get_tiles(), and wp.Volume.get_voxel_size()
  • Add support for store operations on integer NanoVDB volumes, see wp.volume_store_i()
  • Expose wp.Mesh points, indices, as arrays inside kernels, see wp.mesh_get()
  • Optimizations for wp.array construction, 2-3x faster on average
  • Optimizations for URDF import
  • Fix various deployment issues by statically linking with all CUDA libs
  • Update warp.so/warp.dll to CUDA Toolkit 11.5

[0.5.1] - 2022-11-01

  • Fix for unit tests in Kit

[0.5.0] - 2022-10-31

  • Add smoothed particle hydrodynamics (SPH) example, see example_sph.py
  • Add support for accessing array.shape inside kernels, e.g.: width = arr.shape[0]
  • Add dependency tracking to hot-reload modules if dependencies were modified
  • Add lazy acquisition of CUDA kernel contexts (save ~300 MB of GPU memory in MGPU environments)
  • Add BVH object, see wp.Bvh and bvh_query_ray(), bvh_query_aabb() functions
  • Add component index operations for spatial_vector, spatial_matrix types
  • Add wp.lerp() and wp.smoothstep() builtins
  • Add wp.optim module with implementation of the Adam optimizer for float and vector types
  • Add support for transient Python modules (fix for Houdini integration)
  • Add wp.length_sq(), wp.trace() for vector / matrix types respectively
  • Add missing adjoints for wp.quat_rpy(), wp.determinant()
  • Add wp.atomic_min(), wp.atomic_max() operators
  • Add vectorized version of warp.sim.model.add_cloth_mesh()
  • Add NVDB volume allocation API, see wp.Volume.allocate(), and wp.Volume.allocate_by_tiles()
  • Add NVDB volume write methods, see wp.volume_store_i(), wp.volume_store_f(), wp.volume_store_v()
  • Add MGPU documentation
  • Add example showing how to compute Jacobian of multiple environments in parallel, see example_jacobian_ik.py
  • Add wp.Tape.zero() support for wp.struct types
  • Make SampleBrowser an optional dependency for Kit extension
  • Make wp.Mesh object accept both 1d and 2d arrays of face vertex indices
  • Fix for reloading of class member kernel / function definitions using importlib.reload()
  • Fix for hashing of wp.constants() not invalidating kernels
  • Fix for reload when multiple .ptx versions are present
  • Improved error reporting during code-gen
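
For reference, the wp.lerp() and wp.smoothstep() builtins listed above correspond to two widely used interpolation formulas. The plain-Python sketch below uses the conventional definitions (linear blend and clamped cubic Hermite). The argument order shown is an assumption and should be checked against the Warp documentation.

```python
def lerp(a: float, b: float, t: float) -> float:
    # Linear blend: returns a at t = 0 and b at t = 1.
    return a * (1.0 - t) + b * t


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    # Normalize x to [0, 1] between the edges, clamp, then apply 3t^2 - 2t^3.
    t = (x - edge0) / (edge1 - edge0)
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


quarter = lerp(0.0, 10.0, 0.25)
midpoint = smoothstep(0.0, 1.0, 0.5)
below = smoothstep(0.0, 1.0, -1.0)
above = smoothstep(0.0, 1.0, 2.0)
```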

[0.4.3] - 2022-09-20

  • Update all samples to use GPU interop path by default
  • Fix for arrays > 2GB in length
  • Add support for per-vertex USD mesh colors with warp.render class

[0.4.2] - 2022-09-07

  • Register Warp samples to the sample browser in Kit
  • Add NDEBUG flag to release mode kernel builds
  • Fix for particle solver node when using a large number of particles
  • Fix for broken cameras in Warp sample scenes

[0.4.1] - 2022-08-30

  • Add geometry sampling methods, see wp.sample_unit_cube(), wp.sample_unit_disk(), etc
  • Add wp.lower_bound() for searching sorted arrays
  • Add an option for disabling code-gen of backward pass to improve compilation times, see wp.set_module_options({"enable_backward": False}), True by default
  • Fix for using Warp from Script Editor or when module does not have a __file__ attribute
  • Fix for hot reload of modules containing wp.func() definitions
  • Fix for debug flags not being set correctly on CUDA when wp.config.mode == "debug", this enables bounds checking on CUDA kernels in debug mode
  • Fix for code gen of functions that do not return a value
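
The wp.lower_bound() entry above refers to a search over sorted data. In its conventional meaning (as in C++ std::lower_bound), a lower bound is the index of the first element that is not less than the query value. The plain-Python sketch below shows that meaning using the standard bisect module. Whether wp.lower_bound() uses exactly this return convention is not stated in this changelog, so treat it as an assumption.

```python
from bisect import bisect_left


def lower_bound(sorted_values, value):
    """Index of the first element >= value, or len(sorted_values) if none."""
    return bisect_left(sorted_values, value)


keys = [1, 3, 3, 7, 9]
first_three = lower_bound(keys, 3)   # lands on the first of the duplicates
between = lower_bound(keys, 4)       # insertion point between 3 and 7
before_all = lower_bound(keys, 0)
after_all = lower_bound(keys, 10)
```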

[0.4.0] - 2022-08-09

  • Fix for FP16 conversions on GPUs without hardware support
  • Fix for runtime = None errors when reloading the Warp module
  • Fix for PTX architecture version when running with older drivers, see wp.config.ptx_target_arch
  • Fix for USD imports from __init__.py, defer them to individual functions that need them
  • Fix for robustness issues with sign determination for wp.mesh_query_point()
  • Fix for wp.HashGrid memory leak when creating/destroying grids
  • Add CUDA version checks for toolkit and driver
  • Add support for cross-module @wp.struct references
  • Support running even if CUDA initialization failed, use wp.is_cuda_available() to check availability
  • Statically linking with the CUDA runtime library to avoid deployment issues

Breaking Changes

  • Removed wp.runtime reference from the top-level module, as it should be considered private

[0.3.2] - 2022-07-19

  • Remove Torch import from __init__.py, defer import to wp.from_torch(), wp.to_torch()

[0.3.1] - 2022-07-12

  • Fix for marching cubes reallocation after initialization
  • Add support for closest point between line segment tests, see wp.closest_point_edge_edge() builtin
  • Add support for per-triangle elasticity coefficients in simulation, see wp.sim.ModelBuilder.add_cloth_mesh()
  • Add support for specifying default device, see wp.set_device(), wp.get_device(), wp.ScopedDevice
  • Add support for multiple GPUs (e.g., "cuda:0", "cuda:1"), see wp.get_cuda_devices(), wp.get_cuda_device_count(), wp.get_cuda_device()
  • Add support for explicitly targeting the current CUDA context using device alias "cuda"
  • Add support for using arbitrary external CUDA contexts, see wp.map_cuda_device(), wp.unmap_cuda_device()
  • Add PyTorch device aliasing functions, see wp.device_from_torch(), wp.device_to_torch()

Breaking Changes

  • A CUDA device is used by default, if available (aligned with wp.get_preferred_device())
  • wp.ScopedCudaGuard is deprecated, use wp.ScopedDevice instead
  • wp.synchronize() now synchronizes all devices; for finer-grained control, use wp.synchronize_device()
  • Device alias "cuda" now refers to the current CUDA context, rather than a specific device like "cuda:0" or "cuda:1"

[0.3.0] - 2022-07-08

  • Add support for FP16 storage type, see wp.float16
  • Add support for per-dimension byte strides, see wp.array.strides
  • Add support for passing Python classes as kernel arguments, see @wp.struct decorator
  • Add additional bounds checks for builtin matrix types
  • Add additional floating point checks, see wp.config.verify_fp
  • Add interleaved user source with generated code to aid debugging
  • Add generalized GPU marching cubes implementation, see wp.MarchingCubes class
  • Add additional scalar*matrix vector operators
  • Add support for retrieving a single row from builtin types, e.g.: r = m33[i]
  • Add wp.log2() and wp.log10() builtins
  • Add support for quickly instancing wp.sim.ModelBuilder objects to improve env. creation performance for RL
  • Remove custom CUB version and improve compatibility with CUDA 11.7
  • Fix to preserve external user-gradients when calling wp.Tape.zero()
  • Fix to only allocate gradient of a Torch tensor if requires_grad=True
  • Fix for missing wp.mat22 constructor adjoint
  • Fix for ray-cast precision in edge case on GPU (watertightness issue)
  • Fix for kernel hot-reload when definition changes
  • Fix for NVCC warnings on Linux
  • Fix for generated function names when kernels are defined as class functions
  • Fix for reload of generated CPU kernel code on Linux
  • Fix for example scripts to output USD at 60 timecodes per-second (better Kit compatibility)
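
The per-dimension byte strides mentioned above describe how many bytes separate consecutive elements along each axis. The plain-Python sketch below computes strides for a contiguous row-major array and uses them to locate an element. The helper names are hypothetical, and the row-major layout is an assumption for illustration, not a statement about how wp.array.strides is populated.

```python
def contiguous_strides(shape, itemsize):
    """Byte strides of a contiguous row-major array with the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of the element at `index` from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


# A 2 x 3 x 4 array of 4-byte values.
strides = contiguous_strides((2, 3, 4), itemsize=4)
offset = byte_offset((1, 2, 3), strides)

# Swapping the strides of a 2D array describes its transpose without a copy.
transposed_strides = tuple(reversed(contiguous_strides((3, 4), itemsize=4)))
```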

[0.2.3] - 2022-06-13

  • Fix for incorrect 4d array bounds checking
  • Fix for wp.constant changes not updating module hash
  • Fix for stale CUDA kernel cache when CPU kernels launched first
  • Array gradients are now allocated along with the arrays and accessible as wp.array.grad, users should take care to always call wp.Tape.zero() to clear gradients between different invocations of wp.Tape.backward()
  • Added wp.array.fill_() to set all entries to a scalar value (4-byte values only currently)

Breaking Changes

  • Tape capture option has been removed, users can now capture tapes inside existing CUDA graphs (e.g.: inside Torch)
  • Scalar loss arrays should now explicitly set requires_grad=True at creation time

[0.2.2] - 2022-05-30

  • Fix for from import * inside Warp initialization
  • Fix for body space velocity when using deforming Mesh objects with scale
  • Fix for noise gradient discontinuities affecting wp.curlnoise()
  • Fix for wp.from_torch() to correctly preserve shape
  • Fix for URDF parser incorrectly passing density to scale parameter
  • Optimizations for startup time from 3s -> 0.3s
  • Add support for custom kernel cache location, Warp will now store generated binaries in the user's application directory
  • Add support for cross-module function references, e.g.: call another module's @wp.func functions
  • Add support for overloading @wp.func functions based on argument type
  • Add support for calling built-in functions directly from Python interpreter outside kernels (experimental)
  • Add support for auto-complete and docstring lookup for builtins in IDEs like VSCode, PyCharm, etc
  • Add support for doing partial array copies, see wp.copy() for details
  • Add support for accessing mesh data directly in kernels, see wp.mesh_get_point(), wp.mesh_get_index(), wp.mesh_eval_face_normal()
  • Change to only compile for targets where kernel is launched (e.g.: will not compile CPU unless explicitly requested)

Breaking Changes

  • Builtin methods such as wp.quat_identity() now call the Warp native implementation directly and will return a wp.quat object instead of NumPy array
  • NumPy implementations of many builtin methods have been moved to warp.utils and will be deprecated
  • Local @wp.func functions should not be namespaced when called, e.g.: previously wp.myfunc() would work even if myfunc() was not a builtin
  • Removed wp.rpy2quat(), please use wp.quat_rpy() instead

[0.2.1] - 2022-05-11

  • Fix for unit tests in Kit

[0.2.0] - 2022-05-02

Warp Core

  • Fix for unrolling loops with negative bounds
  • Fix for unresolved symbol hash_grid_build_device() not found when lib is compiled without CUDA support
  • Fix for failure to load nvrtc-builtins64_113.dll when user has a newer CUDA toolkit installed on their machine
  • Fix for conversion of Torch tensors to wp.arrays() with a vector dtype (incorrect row count)
  • Fix for warp.dll not found on some Windows installations
  • Fix for macOS builds on Clang 13.x
  • Fix for step-through debugging of kernels on Linux
  • Add argument type checking for user defined @wp.func functions
  • Add support for custom iterable types, supports ranges, hash grid, and mesh query objects
  • Add support for multi-dimensional arrays, for example use x = array[i,j,k] syntax to address a 3-dimensional array
  • Add support for multi-dimensional kernel launches, use launch(kernel, dim=(i,j,k), ...) and i,j,k = wp.tid() to obtain thread indices
  • Add support for bounds-checking array memory accesses in debug mode, use wp.config.mode = "debug" to enable
  • Add support for differentiating through dynamic and nested for-loops
  • Add support for evaluating MLP neural network layers inside kernels with custom activation functions, see wp.mlp()
  • Add additional NVDB sampling methods and adjoints, see wp.volume_sample_i(), wp.volume_sample_f(), and wp.volume_sample_vec()
  • Add support for loading zlib compressed NVDB volumes, see wp.Volume.load_from_nvdb()
  • Add support for triangle intersection testing, see wp.intersect_tri_tri()
  • Add support for NVTX profile zones in wp.ScopedTimer()
  • Add support for additional transform and quaternion math operations, see wp.inverse(), wp.quat_to_matrix(), wp.quat_from_matrix()
  • Add fast math (--fast-math) to kernel compilation by default
  • Add warp.torch import by default (if PyTorch is installed)
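
The multi-dimensional launch entry above says that i,j,k = wp.tid() yields one index per launch axis. As a rough illustration only, the pure-Python sketch below shows how a flat thread index could be unpacked into per-axis indices. It assumes a row-major ordering (last axis varies fastest), which this changelog does not state, and `unflatten_tid` is a hypothetical helper, not a Warp function.

```python
def unflatten_tid(tid, dim):
    """Map a flat thread index to per-axis indices.

    Assumption: row-major ordering, i.e. the last axis varies fastest.
    """
    indices = []
    for extent in reversed(dim):
        # Peel off the fastest-varying axis first.
        tid, idx = divmod(tid, extent)
        indices.append(idx)
    return tuple(reversed(indices))


dim = (2, 3, 4)  # hypothetical launch dimensions
total_threads = dim[0] * dim[1] * dim[2]

# Every flat index in [0, total_threads) maps to a distinct (i, j, k) triple.
coords = [unflatten_tid(t, dim) for t in range(total_threads)]
print(coords[0], coords[5], coords[23])
```

Inside a real kernel the unpacking is done for you; the sketch only shows why a launch with dim=(i,j,k) covers every element of a 3-dimensional array exactly once.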

Warp Kit

  • Add Kit menu for browsing Warp documentation and example scenes under 'Window->Warp'
  • Fix for OgnParticleSolver.py example when collider is coming from Read Prim into Bundle node

Warp Sim

  • Fix for joint attachment forces
  • Fix for URDF importer and floating base support
  • Add examples showing how to use differentiable forward kinematics to solve inverse kinematics
  • Add examples for URDF cartpole and quadruped simulation

Breaking Changes

  • wp.volume_sample_world() is now replaced by wp.volume_sample_f/i/vec() which operate in index (local) space. Users should use wp.volume_world_to_index() to transform points from world space to index space before sampling.
  • wp.mlp() expects multi-dimensional arrays instead of one-dimensional arrays for inference, all other semantics remain the same as earlier versions of this API.
  • wp.array.length member has been removed, please use wp.array.shape to access array dimensions, or use wp.array.size to get total element count
  • Marking dense_gemm(), dense_chol(), etc methods as experimental until we revisit them
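
For the removal of wp.array.length noted above, the sketch below may help when migrating. It uses `ArrayInfo`, a hypothetical stand-in class (not part of Warp) that exposes only the two attributes named in the entry, and it assumes that the total element count equals the product of the dimensions.

```python
import math


class ArrayInfo:
    """Hypothetical stand-in exposing only `shape` and `size`."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        # Assumption: total element count is the product of all dimensions.
        self.size = math.prod(self.shape)


arr = ArrayInfo((4, 3))

# Code that previously read a length now has to choose which meaning it wants:
rows = arr.shape[0]   # extent of the first dimension
total = arr.size      # total number of elements across all dimensions
print(rows, total)
```

For a one-dimensional array the two values coincide, so either replacement works; for multi-dimensional arrays they differ, which is presumably why the single length member was dropped.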

[0.1.25] - 2022-03-20

  • Add support for class methods to be Warp kernels
  • Add HashGrid reserve() so it can be used with CUDA graphs
  • Add support for CUDA graph capture of tape forward/backward passes
  • Add support for Python 3.8.x and 3.9.x
  • Add hyperbolic trigonometric functions, see wp.tanh(), wp.sinh(), wp.cosh()
  • Add support for floored division on integer types
  • Move tests into core library so they can be run in Kit environment
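
The floored-division entry above does not spell out the rounding rule. Assuming it follows Python's `//` semantics (round toward negative infinity), the sketch below contrasts that with C-style truncation, which rounds toward zero. Both helper names are hypothetical and written in plain Python rather than as Warp kernels.

```python
def floored_div(a, b):
    """Integer division rounding toward negative infinity (Python's `//`)."""
    return a // b


def truncated_div(a, b):
    """Integer division rounding toward zero (C-style `/` on ints)."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q


# The two rules agree when both operands share a sign...
print(floored_div(7, 2), truncated_div(7, 2))
# ...and differ by one when the signs differ and the division is inexact.
print(floored_div(-7, 2), truncated_div(-7, 2))
```

The difference matters mainly for index arithmetic with negative offsets, where code ported from C or CUDA may have relied on truncation.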

[0.1.24] - 2022-03-03

Warp Core

  • Add NanoVDB support, see wp.volume_sample*() methods
  • Add support for reading compile-time constants in kernels, see wp.constant()
  • Add support for cuda_array_interface protocol for zero-copy interop with PyTorch, see wp.torch.to_torch()
  • Add support for additional numeric types, i8, u8, i16, u16, etc
  • Add better checks for device strings during allocation / launch
  • Add support for sampling random numbers with a normal distribution, see wp.randn()
  • Upgrade to CUDA 11.3
  • Update example scenes to Kit 103.1
  • Deduce array dtype from np.array when one is not provided
  • Fix for ranged for loops with negative step sizes
  • Fix for 3d and 4d spherical gradient distributions

[0.1.23] - 2022-02-17

Warp Core

  • Fix for generated code folder being removed during Showroom installation
  • Fix for macOS support
  • Fix for dynamic for-loop code gen edge case
  • Add procedural noise primitives, see noise(), pnoise(), curlnoise()
  • Move simulation helpers out of test into warp.sim module

[0.1.22] - 2022-02-14

Warp Core

  • Fix for .so reloading on Linux
  • Fix for while loop code-gen in some edge cases
  • Add rounding functions round(), rint(), trunc(), floor(), ceil()
  • Add support for printing strings and formatted strings from kernels
  • Add MSVC compiler version detection and require a minimum version

Warp Sim

  • Add support for universal and compound joint types

[0.1.21] - 2022-01-19

Warp Core

  • Fix for exception on shutdown in empty wp.array objects
  • Fix for hot reload of CPU kernels in Kit
  • Add hash grid primitive for point-based spatial queries, see hash_grid_query(), hash_grid_query_next()
  • Add new PRNG methods using PCG-based generators, see rand_init(), randf(), randi()
  • Add support for AABB mesh queries, see mesh_query_aabb(), mesh_query_aabb_next()
  • Add support for all Python range() loop variants
  • Add builtin vec2 type and additional math operators, pow(), tan(), atan(), atan2()
  • Remove dependency on CUDA driver library at build time
  • Remove unused NVRTC binary dependencies (50mb smaller Linux distribution)

Warp Sim

  • Bundle import of multiple shapes for simulation nodes
  • New OgnParticleVolume node for sampling shapes -> particles
  • New OgnParticleSolver node for DEM style granular materials

[0.1.20] - 2021-11-02

  • Updates to the ripple solver for GTC (support for multiple colliders, buoyancy, etc)

[0.1.19] - 2021-10-15

  • Publish from 2021.3 to avoid omni.graph database incompatibilities

[0.1.18] - 2021-10-08

  • Enable Linux support (tested on 20.04)

[0.1.17] - 2021-09-30

  • Fix for 3x3 SVD adjoint
  • Fix for A6000 GPU (bump compute model to sm_52 minimum)
  • Fix for .dll unload on rebuild
  • Fix for possible array destruction warnings on shutdown
  • Rename spatial_transform -> transform
  • Documentation update

[0.1.16] - 2021-09-06

  • Fix for case where simple assignments (a = b) incorrectly generated reference rather than value copy
  • Handle passing zero-length (empty) arrays to kernels

[0.1.15] - 2021-09-03

  • Add additional math library functions (asin, etc)
  • Add builtin 3x3 SVD support
  • Add support for named constants (True, False, None)
  • Add support for if/else statements (differentiable)
  • Add custom memset kernel to avoid CPU overhead of cudaMemset()
  • Add rigid body joint model to warp.sim (based on Brax)
  • Add Linux, macOS support in core library
  • Fix for incorrectly treating pure assignment as reference instead of value copy
  • Remove the need to transfer array to CPU before NumPy conversion (will be done implicitly)
  • Update the example OgnRipple wave equation solver to use bundles

[0.1.14] - 2021-08-09

  • Fix for out-of-bounds memory access in CUDA BVH
  • Better error checking after kernel launches (use warp.config.verify_cuda=True)
  • Fix for vec3 normalize adjoint code

[0.1.13] - 2021-07-29

  • Remove OgnShrinkWrap.py test node

[0.1.12] - 2021-07-29

  • Switch to Woop et al.'s watertight ray-tri intersection test
  • Disable --fast-math in CUDA compilation step for improved precision

[0.1.11] - 2021-07-28

  • Fix for mesh_query_ray() returning incorrect t-value

[0.1.10] - 2021-07-28

  • Fix for OV extension fwatcher filters to avoid hot-reload loop due to OGN regeneration

[0.1.9] - 2021-07-21

  • Fix for loading sibling DLL paths
  • Better type checking for built-in function arguments
  • Add runtime docs, can now list all builtins using wp.print_builtins()

[0.1.8] - 2021-07-14

  • Fix for hot-reload of CUDA kernels
  • Add Tape object for replaying differentiable kernels
  • Add helpers for Torch interop (convert torch.Tensor to wp.array)

[0.1.7] - 2021-07-05

  • Switch to NVRTC for CUDA runtime
  • Allow running without host compiler
  • Disable asserts in kernel release mode (small perf. improvement)

[0.1.6] - 2021-06-14

  • Look for CUDA toolchain in target-deps

[0.1.5] - 2021-06-14

  • Rename OgLang -> Warp
  • Improve CUDA environment error checking
  • Clean-up some logging, add verbose mode (warp.config.verbose)

[0.1.4] - 2021-06-10

  • Add support for mesh raycast

[0.1.3] - 2021-06-09

  • Add support for unary negation operator
  • Add support for mutating variables during dynamic loops (non-differentiable)
  • Add support for in-place operators
  • Improve kernel cache start up times (avoids adjointing before cache check)
  • Update README.md with requirements / examples

[0.1.2] - 2021-06-03

  • Add support for querying mesh velocities
  • Add CUDA graph support, see warp.capture_begin(), warp.capture_end(), warp.capture_launch()
  • Add explicit initialization phase, warp.init()
  • Add variational Euler solver (sim)
  • Add contact caching, switch to nonlinear friction model (sim)
  • Fix for Linux/macOS support

[0.1.1] - 2021-05-18

  • Fix bug with conflicting CUDA contexts

[0.1.0] - 2021-05-17

  • Initial publish for alpha testing