Motivation
Following up on a discussion started in #617: the current design of PyLops allows one to solve inverse problems in two modalities, end-to-end CPU and end-to-end GPU.
In practice, when working with arrays that do not fit in the memory of a single machine, the problem of interest can be lifted one level up using pylops-mpi. This applies to both CPU and GPU solutions.

However, there are scenarios where the model fits on a single GPU but the data does not. This case cannot be handled by the current version of PyLops. Similarly, one may want to run only part of an operator on the GPU, or to run a stack of operators on the GPU serially, moving onto the GPU only the portion of the model/data needed by each operator.
The same approach can, of course, be lifted one level up again if the entire data does not fit in the memory of a single CPU machine either: the data is split across machines with pylops-mpi, and on each machine the strategy above is used whenever the local portion of the data does not fit on a single GPU.

Solution
I suggest creating a simple operator called ToCupy, whose forward simply lifts a numpy array to a cupy array and whose adjoint does the reverse. This operator can be combined with a VStack operator by placing it before each operator (and its adjoint after it), so that when the VStack operator is invoked the different operators run sequentially, but each of them can internally operate on cupy arrays.

To ease understanding, I attach a visual description of the two currently available use cases and the two new use cases (the first is already working; the second requires some changes in the design choices of some operators and solvers... I am not sure yet whether it is worth it).
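The following is a minimal sketch of the ToCupy/VStack pattern that deliberately avoids PyLops so it can run on its own. The class names (`ToCupySketch`, `DeviceMatrixOp`, `GpuBlock`, `SerialVStack`) are hypothetical stand-ins that only expose `matvec`/`rmatvec`; they are not the PyLops API. Each block composes ToCupy(data).H @ Op @ ToCupy(model), and the stack evaluates blocks one after another, so only one block's output lives on the GPU at any time. As an assumption for portability, the sketch falls back to NumPy when CuPy is not installed, in which case the "lifting" is a no-op.

```python
import numpy as np

try:
    import cupy as cp  # GPU backend, if installed
except ImportError:  # assumption: CPU-only fallback, "device" arrays are plain NumPy
    cp = None


def to_device(x):
    """Lift a host array onto the GPU (plain NumPy when CuPy is missing)."""
    return cp.asarray(x) if cp is not None else np.asarray(x)


def to_host(x):
    """Bring a device array back to host memory as a NumPy array."""
    return cp.asnumpy(x) if cp is not None else np.asarray(x)


class ToCupySketch:
    """Hypothetical ToCupy: identity operator that only changes where data lives."""

    def __init__(self, n):
        self.shape = (n, n)

    def matvec(self, x):   # forward: numpy -> cupy
        return to_device(x)

    def rmatvec(self, y):  # adjoint: cupy -> numpy
        return to_host(y)


class DeviceMatrixOp:
    """Hypothetical operator that expects and returns device arrays.

    The matrix is kept on the host and lifted per call, standing in for an
    operator (e.g. Kirchhoff) that computes its action on the fly on the GPU.
    """

    def __init__(self, A):
        self.A = np.asarray(A)
        self.shape = self.A.shape

    def matvec(self, x):
        return to_device(self.A) @ x

    def rmatvec(self, y):
        return to_device(self.A).conj().T @ y


class GpuBlock:
    """Hypothetical stacked block: ToCupy(ndata).H @ Op @ ToCupy(nmodel)."""

    def __init__(self, op):
        self.op = op
        self.cmodel = ToCupySketch(op.shape[1])
        self.cdata = ToCupySketch(op.shape[0])
        self.shape = op.shape

    def matvec(self, x):   # host model -> GPU -> host data
        return self.cdata.rmatvec(self.op.matvec(self.cmodel.matvec(x)))

    def rmatvec(self, y):  # host data -> GPU -> host model
        return self.cmodel.rmatvec(self.op.rmatvec(self.cdata.matvec(y)))


class SerialVStack:
    """Hypothetical VStack-like container evaluating its blocks one at a time."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.nrows = [b.shape[0] for b in blocks]
        self.shape = (sum(self.nrows), blocks[0].shape[1])

    def matvec(self, x):
        # Each block's output is moved back to the host before the next block runs.
        return np.concatenate([b.matvec(x) for b in self.blocks])

    def rmatvec(self, y):
        # Split the host data into per-block chunks and sum the adjoints on the host.
        chunks = np.split(y, np.cumsum(self.nrows)[:-1])
        return sum(b.rmatvec(yi) for b, yi in zip(self.blocks, chunks))


# Usage: two blocks sharing one model vector of length 3.
rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
Vop = SerialVStack([GpuBlock(DeviceMatrixOp(A1)), GpuBlock(DeviceMatrixOp(A2))])
x = rng.standard_normal(3)
d = Vop.matvec(x)  # NumPy array of length 9, computed block by block
```

Because ToCupy is an identity map that only changes where the array lives, its adjoint is also its inverse, which is why one object can move data in both directions. In this arrangement the outer container, and any solver driving it, only ever sees NumPy arrays.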
A prototype solution is available in commit fbdbcfb, with an example use case in https://github.com/PyLops/pylops_notebooks/blob/master/developement-cupy/Kirchhoff.ipynb