Skip to content

Commit

Permalink
doc: added graphics of cpu-gpu scenarios
Browse files Browse the repository at this point in the history
  • Loading branch information
mrava87 committed Nov 24, 2024
1 parent 331b14c commit e99a54a
Show file tree
Hide file tree
Showing 4 changed files with 169 additions and 147 deletions.
Binary file added docs/source/_static/cupy_diagram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/source/_static/numpy_cupy_bd_diagram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/source/_static/numpy_cupy_vs_diagram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
316 changes: 169 additions & 147 deletions docs/source/gpu.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,172 @@ provide data vectors to the solvers, e.g., when using
For JAX, apart from following the same procedure described for CuPy, the PyLops operator must
be also wrapped into a :class:`pylops.JaxOperator`.

See below for a comphrensive list of supported operators and additional functionalities for both the
``cupy`` and ``jax`` backends.


Examples
--------

Let's now briefly look at some use cases.

End-to-end GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First we consider the most common scenario when both the model and data
vectors fit onto the GPU memory. We can therefore simply replace all our
``numpy`` arrays with ``cupy`` arrays and solve the inverse problem of
interest end-to-end on the GPU.

.. image:: _static/cupy_diagram.png
:width: 600
:align: center

Let's first write a code snippet using ``numpy`` arrays, which PyLops
will run on your CPU:

.. code-block:: python
ny, nx = 400, 400
G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = np.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
Now we write a code snippet using ``cupy`` arrays, which PyLops will run on
your GPU:

.. code-block:: python
ny, nx = 400, 400
G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = cp.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
The code is almost unchanged apart from the fact that we now use ``cupy`` arrays,
PyLops will figure this out.

Similarly, we write a code snippet using ``jax`` arrays which PyLops will run on
your GPU/TPU:

.. code-block:: python
ny, nx = 400, 400
G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
x = jnp.ones(nx, dtype=np.float32)
# Create operator
Gop = JaxOperator(MatrixMult(G, dtype='float32'))
# Create data and invert
y = Gop @ x
xest = Gop / y
# Adjoint via AD
xadj = Gop.rmatvecad(x, y)
Again, the code is almost unchanged apart from the fact that we now use ``jax`` arrays.


Mixed CPU-GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us now consider a more intricate scenario where we have access to
a GPU-powered operator, however the model and/or data vectors are too large
to fit onto the GPU memory (or VRAM).

For the sake of clarity, we consider a problem where
the operator can be written as a :class:`pylops.BlockDiag` of
PyLops operators. Note how, by simply sandwiching any of the GPU-powered
operator within two :class:`pylops.ToCupy` operators, we are
able to tell PyLops to transfer to the GPU only the part of the model vector
required by a given operator and transfer back the output to the CPU before
forming the combine output vector (i.e., the output vector of the
:class:`pylops.BlockDiag`).

.. image:: _static/numpy_cupy_bd_diagram.png
:width: 1000
:align: center

.. code-block:: python
nops, n = 5, 4
Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) \
for i in range(nops)]
Ms = [M.T @ M for M in Ms]
# Create operator
Mops = []
for iop in range(nops):
Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
Top = ToCupy(Mop.dims, dtype=dtype)
Top1 = ToCupy(Mop.dimsd, dtype=dtype)
Mop = Top1.H @ Mop @ Top
Mops.append(Mop)
Mops = BlockDiag(Mops, forceflat=True)
# Create data and invert
x = np.ones(n * nops, dtype=dtype)
y = Mops @ x.ravel()
xest = Mops / y
Finally, let us consider a problem where
the operator can be written as a :class:`pylops.VStack` of
PyLops operators and the model vector can be fully transferred to the GPU.
We can use again the :class:`pylops.ToCupy` operator, however this
time we will only use it to move the output of each operator to the CPU.
Since we are now in a special scenario, where the input of the overall
operator sits on the GPU and the output on the
CPU, we need to inform the :class:`pylops.VStack` operator about this.
This can be easily done using the additional ``inoutengine`` parameter. Let's
see this with an example.

.. image:: _static/numpy_cupy_vs_diagram.png
:width: 1000
:align: center

.. code-block:: python
nops, n, m = 3, 4, 5
Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]
# Create operator
Mops = []
for iop in range(nops):
Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
Top1 = ToCupy(Mop.dimsd, dtype=dtype)
Mop = Top1.H @ Mop
Mops.append(Mop)
Mops = VStack(Mops, inoutengine=("numpy", "cupy"))
# Create data and invert
x = cp.ones(m, dtype=dtype)
y = Mops @ x.ravel()
xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]
These features are currently not available for ``jax`` arrays.


.. note::

More examples for the CuPy and JAX backends be found at `link1 <https://github.com/PyLops/pylops_notebooks/tree/master/developement-cupy>`_
and `link2 <https://github.com/PyLops/pylops_notebooks/tree/master/developement/Basic_JAX.ipynb>`_.


Supported Operators
-------------------

In the following, we provide a list of methods in :class:`pylops.LinearOperator` with their current status (available on CPU,
GPU with CuPy, and GPU with JAX):
Expand Down Expand Up @@ -195,6 +361,7 @@ Smoothing and derivatives:
- |:white_check_mark:|
- |:white_check_mark:|


Signal processing:

.. list-table::
Expand Down Expand Up @@ -322,6 +489,7 @@ Signal processing:
- |:white_check_mark:|
- |:white_check_mark:|


Wave-Equation processing

.. list-table::
Expand Down Expand Up @@ -369,6 +537,7 @@ Wave-Equation processing
- |:red_circle:|
- |:red_circle:|


Geophysical subsurface characterization:

.. list-table::
Expand Down Expand Up @@ -408,150 +577,3 @@ Geophysical subsurface characterization:
in point 1 for the :class:`pylops.signalprocessing.Convolve1D` operator employed
when ``explicit=False``.


Examples
--------

Finally, let's briefly look at some example.

End-to-end GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First we consider the most common scenario when both the model and data
vectors fit onto the GPU memory. We can therefore simply replace all our
``numpy`` arrays with ``cupy`` arrays and solve the inverse problem of
interest end-to-end on the GPU.

Let's first write a code snippet using ``numpy`` arrays, which PyLops
will run on your CPU:

.. code-block:: python
ny, nx = 400, 400
G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = np.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
Now we write a code snippet using ``cupy`` arrays, which PyLops will run on
your GPU:

.. code-block:: python
ny, nx = 400, 400
G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = cp.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
The code is almost unchanged apart from the fact that we now use ``cupy`` arrays,
PyLops will figure this out.

Similarly, we write a code snippet using ``jax`` arrays which PyLops will run on
your GPU/TPU:

.. code-block:: python
ny, nx = 400, 400
G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
x = jnp.ones(nx, dtype=np.float32)
# Create operator
Gop = JaxOperator(MatrixMult(G, dtype='float32'))
# Create data and invert
y = Gop @ x
xest = Gop / y
# Adjoint via AD
xadj = Gop.rmatvecad(x, y)
Again, the code is almost unchanged apart from the fact that we now use ``jax`` arrays.


Mixed CPU-GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us now consider a more intricate scenario where we have acess to
a GPU-powered operator, however the model and/or data vectors are too large
to fit onto the memory.

For the sake of clarity, we consider a problem where
the operator can be written as a :class:`pylops.BlockDiag` of
PyLops operators. Note how, by simply sandwitching any of the GPU-powered
operator within two :class:`pylops.ToCupy` operators, we are
able to tell PyLops to transfer to the GPU only the part of the model vector
required by a given operator and transfer back the output to the CPU before
forming the combine output vector (i.e., the output vector of the
:class:`pylops.BlockDiag`)

.. code-block:: python
nops, n = 5, 4
Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) \
for i in range(nops)]
Ms = [M.T @ M for M in Ms]
# Create operator
Mops = []
for iop in range(nops):
Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
Top = ToCupy(Mop.dims, dtype=dtype)
Top1 = ToCupy(Mop.dimsd, dtype=dtype)
Mop = Top1.H @ Mop @ Top
Mops.append(Mop)
Mops = BlockDiag(Mops, forceflat=True)
# Create data and invert
x = np.ones(n * nops, dtype=dtype)
y = Mops @ x.ravel()
xest = Mops / y
Finally, let us consider a problem where
the operator can be written as a :class:`pylops.VStack` of
PyLops operators and the model vector can be fully transferred to the GPU.
We can use again the :class:`pylops.ToCupy` operator, however this
time we will only use it to move the output of each operator to the CPU.
Since we are now in a special scenario, where the input of the overall
operator sits on the GPU and the output on the
CPU, we need to inform the :class:`pylops.VStack` operator about this.
This can be easily done using the additional ``inoutengine`` parameter. Let's
see this with an example:

.. code-block:: python
nops, n, m = 3, 4, 5
Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]
# Create operator
Mops = []
for iop in range(nops):
Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
Top1 = ToCupy(Mop.dimsd, dtype=dtype)
Mop = Top1.H @ Mop
Mops.append(Mop)
Mops = VStack(Mops, inoutengine=("numpy", "cupy"))
# Create data and invert
x = cp.ones(m, dtype=dtype)
y = Mops @ x.ravel()
xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]
**Note:**: this feature is currently not available for ``jax`` arrays.


.. note::

More examples for the CuPy and JAX backends be found `here <https://github.com/PyLops/pylops_notebooks/tree/master/developement-cupy>`_
and `here <https://github.com/PyLops/pylops_notebooks/tree/master/developement/Basic_JAX.ipynb>`_.

0 comments on commit e99a54a

Please sign in to comment.