Missing NumPy functions needed for SLAC applications #1116

Open
15 of 22 tasks
syamajala opened this issue Jan 25, 2024 · 14 comments
@syamajala

syamajala commented Jan 25, 2024

I'm opening this issue so @manopapad and I can keep track of what needs to be implemented for the different cunumeric SLAC applications.

For psana we need:

  • numpy.ma.median
  • np.unique(return_index=True)

@manopapad has a patch that tries to improve single-index accesses to arrays, although that code will be removed once np.unique(return_index=True) is implemented. All the psana kernels run on a single GPU and do not need to be distributed.
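
For reference, the semantics that np.unique(return_index=True) has to reproduce can be sketched with a stable sort. This is only an illustration in plain NumPy, not the cuNumeric implementation; the helper name `unique_with_index` is hypothetical, and NaN handling is not addressed.

```python
import numpy as np

def unique_with_index(a):
    """Sketch of np.unique(a, return_index=True) for a flattened array.

    Hypothetical helper for illustration only; NaNs are not treated specially.
    """
    a = np.asarray(a).ravel()
    if a.size == 0:
        return a, np.empty(0, dtype=np.intp)
    # A stable sort keeps equal values in their original order, so the
    # first element of each run is the first occurrence in `a`.
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    # Mark the position where each run of equal values starts.
    is_first = np.empty(sorted_a.shape, dtype=bool)
    is_first[0] = True
    is_first[1:] = sorted_a[1:] != sorted_a[:-1]
    return sorted_a[is_first], order[is_first]

data = np.array([3, 1, 3, 2, 1, 2, 2])
values, first_idx = unique_with_index(data)
ref_values, ref_idx = np.unique(data, return_index=True)
```

The returned indices point at the first occurrence of each unique value in the original array, which is the property the single-index access pattern mentioned above relies on.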

For HDF5 analysis we need gpu and distributed versions of:

For HDF5 analysis we need the following extensions:

  • nansum not reducing over multiple dimensions
  • nanpercentile falling back to numpy -- fixed; coming in next weekly drop

For a custom curve_fit implementation we need:

  • np.linalg.inv
  • np.correlate
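
To illustrate why a matrix inverse shows up in a curve fit, here is a minimal Gauss-Newton sketch in plain NumPy. It is not the SLAC implementation: the exponential `model`, its `jacobian`, and `gauss_newton_fit` are hypothetical names chosen for the example, and np.correlate is not exercised here.

```python
import numpy as np

def model(x, a, b):
    # Hypothetical two-parameter model: exponential decay.
    return a * np.exp(-b * x)

def jacobian(x, a, b):
    # Partial derivatives of the model with respect to a and b.
    e = np.exp(-b * x)
    return np.stack([e, -a * x * e], axis=1)

def gauss_newton_fit(x, y, p0, n_iter=20):
    """Undamped Gauss-Newton iteration; assumes p0 is close to the optimum."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        residual = y - model(x, *p)
        J = jacobian(x, *p)
        # Normal equations: (J^T J) dp = J^T r, solved via an explicit inverse.
        p = p + np.linalg.inv(J.T @ J) @ (J.T @ residual)
    # Covariance estimate from the final Jacobian and residual variance.
    J = jacobian(x, *p)
    residual = y - model(x, *p)
    dof = x.size - p.size
    pcov = np.linalg.inv(J.T @ J) * (residual @ residual) / dof
    return p, pcov

x = np.linspace(0.0, 4.0, 50)
y = model(x, 2.5, 1.3)  # noise-free synthetic data
popt, pcov = gauss_newton_fit(x, y, p0=(2.0, 1.0))
```

The explicit inverse is used twice: once per iteration for the parameter update and once at the end for the covariance matrix, which is the same kind of output scipy.optimize.curve_fit returns as its second value.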
@rohany
Member

rohany commented Jan 25, 2024

I have a pending PR for np.diff against cuNumeric that can be dusted off and merged.

@syamajala
Author

We need scipy.optimize.curve_fit. Under the hood this seems to use MINPACK. Depending on which options you pass to curve_fit, I think it might also need a Cholesky factorization.

@JosephGuman

At present it seems like DeferredArray's unary_reduction() implementation doesn't allow reducing over multiple dimensions, which would probably be needed to average over an arbitrary subset of axes. Is this important to address in this issue, or is it not required for SLAC's application?

@syamajala
Author

We do not need to average over an arbitrary subset of axes. Averaging over just the 0th axis is enough.

@syamajala
Author

We have a need for scipy.optimize.curve_fit.

@manopapad
Contributor

@syamajala all the functions required for the base HDF5 processing script have been merged.

@syamajala
Author

Ok. Will give them a try early next week.

@JosephGuman

I might take a look at np.unique(return_index=True) if nobody else is working on it right now.

@syamajala
Author

We still need to investigate performance issues related to the functions that were implemented in this ticket.

Here is a profile from before the missing functions were implemented:
https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof/

And a profile from after:
https://legion.stanford.edu/prof-viewer/?url=https://sapling.stanford.edu/~seshu/xpp/legion_prof.1/

@rohany
Member

rohany commented May 8, 2024

There might still be some missing functions that correspond to the regions of high Python utilization, but I don't really see a performance issue in this profile other than the problem size being too small (especially for the public Python core).

@manopapad added the SLAC label Jun 13, 2024
@syamajala
Author

syamajala commented Jul 15, 2024

The following functions are missing:

Used by the custom curve_fit implementation:

  • np.linalg.inv
  • np.correlate

Used by SLAC code directly:

  • np.fft.fftshift
  • np.round
  • np.unravel_index
  • np.meshgrid
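
As a rough illustration of what two of these functions compute, the sketch below expresses a 1-D fftshift and a single-index unravel_index in terms of simpler operations. It runs against plain NumPy; the helper names are hypothetical, and whether each building block (for example np.roll) is available and efficient in cuNumeric is an assumption that has not been checked here.

```python
import numpy as np

def fftshift_1d(x):
    # Move the zero-frequency bin to the centre by rotating n // 2 places.
    x = np.asarray(x)
    return np.roll(x, x.shape[0] // 2)

def unravel_index_c(flat_index, shape):
    # Convert one flat index into C-order (row-major) coordinates.
    coords = []
    for dim in reversed(shape):
        flat_index, c = divmod(flat_index, dim)
        coords.append(c)
    return tuple(reversed(coords))

freqs = np.fft.fftfreq(8)
shifted = fftshift_1d(freqs)
coords = unravel_index_c(17, (3, 4, 5))
```

np.round and np.meshgrid are not covered by this sketch.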

@qldnfox

qldnfox commented Jul 16, 2024

Also missing the following:

  • np.fft.ifft

  • np.rot90

@syamajala
Author

nansum does not support reducing over multiple dimensions:

  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_module/math_sum_prod_diff.py", line 951, in nansum
    return a._nansum(
           ^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/array.py", line 3580, in _nansum
    return perform_unary_reduction(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_array/thunk.py", line 233, in perform_unary_reduction
    result._thunk.unary_reduction(
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 148, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sdf/group/lcls/ds/tools/conda_envs/cunumeric-mec/lib/python3.12/site-packages/cunumeric/_thunk/deferred.py", line 3192, in unary_reduction
    raise NotImplementedError(
NotImplementedError: Need support for reducing multiple dimensions

Also, for some reason nanpercentile is still falling back to NumPy in cunumeric 24.06.00. It looks like the fix was merged above, though?
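
Until multi-axis reductions are supported, one possible stopgap is to chain single-axis nansum calls, since a sum over several axes equals successive sums over each axis. The sketch below uses plain NumPy; the helper name `nansum_multi_axis` is hypothetical, and swapping the import for cunumeric is an untested assumption.

```python
import numpy as np

def nansum_multi_axis(a, axes):
    """Reduce over several axes using only single-axis nansum calls."""
    # Normalise negative axes, then reduce the highest axis first so the
    # remaining axis numbers stay valid after each reduction.
    ordered = sorted((ax % a.ndim for ax in axes), reverse=True)
    out = a
    for ax in ordered:
        out = np.nansum(out, axis=ax)
    return out

rng = np.random.default_rng(0)
cube = rng.normal(size=(3, 4, 5))
cube[0, 1, 2] = np.nan
cube[2, 3, 4] = np.nan
chained = nansum_multi_axis(cube, (0, 2))
reference = np.nansum(cube, axis=(0, 2))
```

The result can differ from a fused multi-axis reduction in the last few floating-point bits because the summation order changes, so comparisons should use a tolerance.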

@syamajala
Author

Adding these to the list:

  • histogram2d
  • median
