Improvements to NonStationaryConvolve2D #466
@cako can you look at the last point and let me know what you think :)
Yup, that is a good idea. In fact, that is what I first thought the operator did hahaha. So it would be nice to have both.
@cako I am struggling a bit to implement a bilinear interpolator that is called by all versions (numpy, numba, cuda-numba)... More specifically, leaving cuda-numba aside for a moment, I would like to have a single function that performs the interpolation (…). Any smart ideas?
How about this:

```python
from math import floor

import numba
import numpy as np


class InterpolateAt:
    # A JIT function cannot be a standard method (it takes `self`,
    # which Numba can't handle) or a classmethod (takes `cls`,
    # which Numba can't handle). At most, it can be a staticmethod.
    # So, here are three options:
    # 1. Create a non-JIT and a JIT staticmethod, and bind the
    #    correct one at runtime depending on the engine.
    # 2. Create two classes, e.g., InterpolateAtNumpy and
    #    InterpolateAtNumba, and return the appropriate one based
    #    on the engine upon the call to self.__new__. See:
    #    https://santoshk.dev/posts/2022/__init__-vs-__new__-and-when-to-use-them/
    # 3. Bind only the non-JIT functions as staticmethods when
    #    possible, and create all other JIT-ed functions when
    #    calling self.__init__.
    #
    # This class follows approach 3.

    def __init__(self, i, engine="numpy"):
        # Local, possibly JIT-ed interpolation.
        _interpolate_local = (
            numba.jit(nopython=True)(self._interpolate)
            if engine == "numba"
            else self._interpolate
        )
        # This is not necessary, but replaces the original
        # staticmethod. The replacement is not a staticmethod,
        # but that probably doesn't matter.
        self._interpolate = _interpolate_local
        # Alternatively, you can apply staticmethod, but you
        # lose the Numba attributes:
        # self._interpolate = staticmethod(_interpolate_local)

        # Create a closure with `i` bound. This allows
        # compilation with Numba.
        def _matvec_rmatvec_local(x):
            return _interpolate_local(x, i)

        # Attach the possibly JIT-ed `_matvec_rmatvec_local` to
        # the instance, so it is accessible to external and
        # internal methods.
        self._matvec_rmatvec = (
            numba.jit(nopython=True)(_matvec_rmatvec_local)
            if engine == "numba"
            else _matvec_rmatvec_local
        )

    # This staticmethod decorator can be removed if not required.
    # Alternatively, _interpolate can be defined outside the class
    # if it needs to be used by other classes.
    @staticmethod
    def _interpolate(ary, i):
        # Linear interpolation at fractional index `i`, returning
        # NaN outside the valid range.
        ifloor = floor(i)
        w = i - ifloor
        if ifloor < 0 or ifloor + 1 > len(ary) - 1:
            return np.nan
        return (1 - w) * ary[ifloor] + w * ary[ifloor + 1]

    # Cannot be JIT-ed, but is a thin wrapper over a JIT-ed
    # function.
    def _matvec(self, x):
        return self._matvec_rmatvec(x)


print(InterpolateAt._interpolate)
# <function InterpolateAt._interpolate at 0x7ff8fc2b9360>
# Note: InterpolateAt._matvec_rmatvec does not exist yet at this
# point; accessing it would raise AttributeError.

x = np.arange(10)

IOpPy = InterpolateAt(1.5, engine="numpy")
assert (
    IOpPy._matvec_rmatvec(x) == IOpPy._matvec(x) == IOpPy._interpolate(x, 1.5)
)
print(IOpPy._interpolate(x, 1.5))
# 1.5
print(type(IOpPy._matvec_rmatvec))
# <class 'function'>
print(type(IOpPy._interpolate))
# <class 'function'>

IOpNB = InterpolateAt(1.5, engine="numba")
assert (
    IOpNB._matvec_rmatvec(x) == IOpNB._matvec(x) == IOpNB._interpolate(x, 1.5)
)
print(IOpNB._interpolate(x, 1.5))
# 1.5
print(type(IOpNB._matvec_rmatvec))
# <class 'numba.core.registry.CPUDispatcher'>
print(type(IOpNB._interpolate))
# <class 'numba.core.registry.CPUDispatcher'>
print(IOpNB._matvec_rmatvec.signatures)
# [(array(int64, 1d, C),)]
```
Mmmh, I am not sure I understand how this fits in the bigger picture. Are you suggesting to make a new class just for the interpolation part, and then feed it to the main operator? I just worry that, to avoid repeating 20 lines of code between the CPU and GPU implementations, we are going to add 200 more that are very complicated to understand and more likely to break in the future if any change is introduced to Numba... Also, remembering the end goal (a shared interpolate function for the numpy, numba, and numba-cuda implementations), are you sure this can easily be integrated within the numba-cuda code? I have never tried to work with classes in numba-cuda, so it may be ok...
Could you try taking the current …?
Yup, I can have a look :)
First implementation of non-stationary filter estimation operators (point 4 in PyLops#466).

* feat: added NonStationaryFilters1D and NonStationaryFilters2D, a first implementation of non-stationary filter estimation operators (point 4 in #466)
* minor: added tests for nonstatfilters operators
* minor: added checks that filter locations are within dims
* minor: reintroduced parallel in rmatvec
This issue collects a few improvements for NonStationaryConvolve2D, as raised in #465 (plus a couple of others):
Refactor bilinear interpolation
The CUDA bilinear interpolation is the same as the CPU one, so it makes sense to wrap the shared code in a single function. This can be done by coding a pure Python function and then defining a device function from it. For example:
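The snippet that originally accompanied this point is not included in this thread dump; the following is a minimal sketch of the pattern, where the `_bilinear_interp` name and its exact boundary handling are illustrative rather than the operator's actual code:

```python
import math

from numba import cuda


def _bilinear_interp(ary, ix, iz):
    # Pure-Python bilinear interpolation of a 2D array at the
    # fractional index (ix, iz); callers are expected to keep
    # the indices within bounds.
    ix0, iz0 = int(math.floor(ix)), int(math.floor(iz))
    wx, wz = ix - ix0, iz - iz0
    return (
        (1 - wx) * (1 - wz) * ary[ix0, iz0]
        + wx * (1 - wz) * ary[ix0 + 1, iz0]
        + (1 - wx) * wz * ary[ix0, iz0 + 1]
        + wx * wz * ary[ix0 + 1, iz0 + 1]
    )


# The same source serves both backends: call `_bilinear_interp`
# directly (or numba.njit it) on the CPU, and compile it into a
# CUDA device function for use inside GPU kernels.
_bilinear_interp_device = cuda.jit(device=True)(_bilinear_interp)
```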
Move CPU to auxiliary file
Since the GPU functions already live in another file, and taking the above refactor into account, it makes sense to move the CPU functions to an auxiliary file as well.
Improve thread launch parameters and iteration model
Currently, there is a limit on the size of the model, imposed by the number of blocks and threads in the kernel call. There are two approaches to fixing this. The first is to make the total number of threads depend on the model; for example, if the model is 1000 x 1000, launch a grid totalling 1024 x 1024 threads. A problem appears when very large models are used: the number of threads required exceeds what the GPU can provide. A hypothetical helper illustrating this first approach is sketched below.
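A minimal sketch of a one-thread-per-point launch configuration (the helper name and the 16 x 16 block shape are illustrative assumptions):

```python
import math


def one_thread_per_point(nx, nz, threads_per_block=(16, 16)):
    # One thread per model point: the block grid grows with the
    # model, which breaks down once the grid exceeds the limits
    # of the device.
    blocks = (
        math.ceil(nx / threads_per_block[0]),
        math.ceil(nz / threads_per_block[1]),
    )
    return blocks, threads_per_block


# A 1000 x 1000 model with 16 x 16 blocks launches a 63 x 63 grid,
# i.e., 1008 x 1008 >= 1000 x 1000 threads.
print(one_thread_per_point(1000, 1000))
# ((63, 63), (16, 16))
```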
Grid-stride looping can help here, as it allows arbitrarily sized models to be processed by a smaller number of threads: for example, with a fixed launch of 1024 x 1024 threads you can process a model of any size. The issue with naive grid-stride looping is that it may not be optimal in the number of threads; for example, a 32 x 32 model would still launch 1024 x 1024 threads. A self-contained sketch of a grid-stride kernel follows.
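A minimal sketch of 2D grid-stride looping with numba.cuda; the kernel just scales a model in place and stands in for the actual convolution kernel:

```python
import numpy as np
from numba import cuda


@cuda.jit
def scale_kernel(model, factor):
    # Each thread starts at its global index and strides by the
    # total grid size, so a fixed launch covers any model size.
    ix0, iz0 = cuda.grid(2)
    sx, sz = cuda.gridsize(2)
    for ix in range(ix0, model.shape[0], sx):
        for iz in range(iz0, model.shape[1], sz):
            model[ix, iz] *= factor


model = np.ones((2000, 2000))
d_model = cuda.to_device(model)
# A modest fixed launch (32 x 32 blocks of 16 x 16 threads, i.e.,
# 512 x 512 threads) still processes the full 2000 x 2000 model.
scale_kernel[(32, 32), (16, 16)](d_model, 2.0)
model = d_model.copy_to_host()
```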
A solution to that is to combine both techniques: use an input-dependent number of blocks/threads (powers of two), but set a maximum number of total threads launched. The cap can even be set per GPU, but it is probably ok to hardcode a reasonable number obtained from the MAX_GRID_DIM_X attribute (see here). A sketch of such a capped launch configuration is shown below.
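One possible sketch of a capped, power-of-two launch configuration; the helper name and the fallback cap are assumptions, and the grid-stride loop in the kernel is what picks up the points the cap leaves uncovered:

```python
import math


def capped_launch(dim, threads_per_block=32, max_blocks=65536):
    # Round the required number of blocks up to a power of two,
    # then cap it; a grid-stride kernel absorbs whatever the
    # cap leaves uncovered.
    needed = math.ceil(dim / threads_per_block)
    blocks = 1 << max(0, math.ceil(math.log2(needed)))
    return min(blocks, max_blocks), threads_per_block


# The cap could instead be queried per GPU, e.g., via
# numba.cuda.get_current_device().MAX_GRID_DIM_X.
print(capped_launch(1000))   # (32, 32): power-of-two cover of 1000
print(capped_launch(10**9))  # (65536, 32): capped, relies on striding
```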
Create operator that takes filters as input
Currently, our operator takes a generic 2D signal as input and the filters as a parameter. However, an alternative operator that takes the 2D signal as the parameter and the filters as the input is desirable, as it could be used to estimate the non-stationary filters that best match given data and an input signal. This is roughly equivalent to pylops.avo.prestack.PrestackWaveletModelling for the 1D case. I suggest making a completely new operator and calling it NonStationaryWavelets2D or NonStationaryFilters2D.
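To illustrate why such an operator enables filter estimation, here is a toy 1D, stationary analogue (not the proposed operator itself): convolution is linear in the filter, so fixing the signal and treating the filter as the model lets us recover the filter by least squares.

```python
import numpy as np
from scipy.linalg import convolution_matrix

rng = np.random.default_rng(0)
x = rng.standard_normal(100)             # known signal (parameter)
h_true = np.exp(-np.arange(11) / 3.0)    # unknown filter (model)
y = np.convolve(x, h_true, mode="same")  # observed data

# With x fixed, convolution is a linear map acting on h, so it can
# be written as a matrix A with y = A @ h and inverted for h.
A = convolution_matrix(x, len(h_true), mode="same")
h_est, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(h_est, h_true)
```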