
Let's add Numba to the core! #2230

Closed
h-mayorquin opened this issue Nov 20, 2023 · 10 comments
Labels
core Changes to core module discussion General discussions and community feedback packaging Related to packaging/style

Comments

@h-mayorquin
Collaborator

Coming to this from this discussion:

#2175

The goal of this issue is to discuss the pros and cons of adding numba to the core.

Numba is good and sweet and has a central place in the scientific software stack for algorithms that are not easy to vectorize. We already use numba with great success to make some of the library's functionality really fast. See here and here for some examples of large performance improvements thanks to numba.

However, since it is not part of the core, it requires some care with the imports; see here and here for two strategies. This is for two reasons: first, reducing import time, and, more importantly, not crashing the library at import.

Adding numba to the core will remove some of the pain of the second, but we might still have to be careful with the former. We have come a long way from #1597, but we might still want to go further.

So, numba is good, but how much does it cost? Let's discuss this. I gathered some statistics that people can run on their own machines if they want. Here is some data for popular packages in the scientific stack, with sizes and stand-alone import times:

Package: numpy
  Import Time: 174.2818 ms
  Package Size: 28.666 MB
  Package Info:
    Version: 1.23.5
    Dependencies:
Package: scipy
  Import Time: 15.7995 ms
  Package Size: 84.891 MB
  Package Info:
    Version: 1.10.1
    Dependencies:
      numpy
Package: numba
  Import Time: 190.3138 ms
  Package Size: 17.655 MB
  Package Info:
    Version: 0.56.4
    Dependencies:
      llvmlite
      numpy
      setuptools
Package: llvmlite
  Import Time: 0.0782 ms
  Package Size: 8.705 MB
  Package Info:
    Version: 0.39.1
    Dependencies:
Package: sparse
  Import Time: 72.1004 ms
  Package Size: 0.576 MB
  Package Info:
    Version: 0.14.0
    Dependencies:
      numba
      numpy
      scipy
Package: pandas
  Import Time: 246.9504 ms
  Package Size: 53.281 MB
  Package Info:
    Version: 1.5.3
    Dependencies:
      numpy
      python-dateutil
      pytz

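The import-time numbers above can be reproduced with something like the following rough sketch (the author's exact script is not shown; this one measures each import in a fresh interpreter so already-cached modules don't skew the result):

```python
import subprocess
import sys
import time

def standalone_import_time_ms(package: str) -> float:
    """Time `import <package>` in a fresh interpreter (includes startup overhead)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {package}"], check=True)
    return (time.perf_counter() - start) * 1000.0

# Interpreter startup is included in each run; subtracting a cheap
# baseline import gives a fairer per-package number.
baseline = standalone_import_time_ms("sys")
for pkg in ["json", "decimal"]:  # swap in numpy, scipy, numba, ...
    print(f"{pkg}: {standalone_import_time_ms(pkg) - baseline:+.1f} ms over baseline")
```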
So numba imports about as fast as numpy and is also around the same size once you include llvmlite. If there are concerns about import speed, we can keep using any of the techniques described above to avoid the import-time costs.

So what are the downsides? I remember that @samuelgarcia had some concerns.

To add something to the negative side, I think that people can be too quick to use numba when something simpler could be done with numpy instead. The current state of affairs discourages this somewhat, but it is not a great barrier anyway. That said, sometimes people take numpy too far as well.
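A toy illustration of that point (my own example, not from the codebase): a loop that might tempt someone to reach for `@numba.njit` is often a one-line vectorized numpy call.

```python
import numpy as np

def count_above_loop(x, threshold):
    # explicit loop: tempting to decorate with @numba.njit
    count = 0
    for value in x:
        if value > threshold:
            count += 1
    return count

def count_above_numpy(x, threshold):
    # equivalent plain-numpy one-liner: no JIT compilation needed
    return int(np.count_nonzero(x > threshold))

x = np.array([0.1, 0.9, 0.5, 0.7])
assert count_above_loop(x, 0.6) == count_above_numpy(x, 0.6) == 2
```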

@h-mayorquin h-mayorquin added core Changes to core module packaging Related to packaging/style discussion General discussions and community feedback labels Nov 20, 2023
@zm711
Collaborator

zm711 commented Nov 20, 2023

I'm in favor of adding it. I've asked this question a few times myself, so I'm happy to hear the concerns. At this point numba is used in so many submodules that just adding it to core would simplify a great deal.

@DradeAW
Contributor

DradeAW commented Nov 20, 2023

sometimes people do take numpy too far as well.

I feel personally attacked 😅

I am also in favor of adding numba to core.

@h-mayorquin
Collaborator Author

@DradeAW
Haha, it was not directed at you; I have been there as well :P

@samuelgarcia
Member

The design from the beginning is that the core exposes objects so that third-party software can depend on it without dependency headaches.

So all computation has been kept outside the core (except very simple things).
I thought, and still think, that this is a good choice.

Trends in computation tooling change very often and should not affect the core.
For GPU, cupy was very trendy a few years ago, then torch, then jax.
numba is the same for CPU; this will change over time, I think, so keeping it out of the core is still a very good choice.

The recent addition by Aurélien of numba in the core must be an exception and must stay an exception.
And, more importantly, it should be a weak dependency.
Personally, I chose numba because it is easy to install in the anaconda ecosystem (same team). Maybe it is not the best choice (pythran, pyccel, or cython could be other good choices).

So tying the core to any one of these tools is not a good long-term choice.

@h-mayorquin
Collaborator Author

h-mayorquin commented Nov 20, 2023

Numba was started in 2012; I don't think it is at all like the other packages you mentioned. Those are newer and from a very different environment (cupy and torch are not even doing the same thing). I don't see strong arguments for claiming it is a fad; it is just not as popular as those other packages because it is not associated with something high-status/fancy (deep learning). In fact, from my reading around, it is kind of... unpopular. And, as you mentioned, it is backed by Continuum Analytics, so I don't expect it to go away. What would convince you here? I am wondering what would convince me that it is indeed a fad. Maybe if there were competing packages, or if it were a hot field, but everything seems quite stable there.

The comparison with cython is more apt, but it is an easy one to defend against. Numba is far easier to write and read for most people, which is why we are writing code in python in the first place. See Jake VanderPlas (a cython expert), who was already beating cython with numba back in 2013:

https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/

So, I do think that if we want a tool for speeding up non-vectorizable code, Numba is the best choice in terms of readability and speed. It is backed by a big player and can now be easily installed with pip (now that people are vendoring everything).

@zm711
Collaborator

zm711 commented Nov 20, 2023

In support of Sam's point, there were issues with needing the numba and numpy versions to sync up (I think?). Wasn't there a spell where we had to cap numpy and numba to make sure everything would play nicely together? I feel like that was an issue when I was first starting to help out with stuff (so I'm not super clear on it).

@JoeZiminski
Collaborator

As far as I can tell, the key negative of adding Numba to core would be that if the Numba install fails, it will break the spikeinterface install, which could be frustrating for users who have no intention of using the numba-dependent code.

My understanding is that these compiling optimisers are typically more difficult to deploy cross-platform, and that their releases typically lag other packages in terms of python version support (e.g. here) or new OS releases (e.g. here). However, this is just an impression, and I'm not sure whether Numba is any less robust than other packages that ship compiled code, like SciPy; I'm sure I could find many posts about scipy install problems (on that subject, this is quite an interesting read).

Anyway, if Numba releases are as robust as, say, SciPy's in terms of release cycle and breadth of supported platforms, I cannot see an issue with moving Numba to core and saving the import headache. However, I am not sure how best to interrogate this.

@zm711
Collaborator

zm711 commented Dec 6, 2023

While trying #2304 locally, I accidentally installed python 3.12, and the install initially failed because numba is not ready for 3.12 yet. So I'm not sure how quickly SI wants to move to support new python versions, but numba would definitely slow down switching over to new versions of python itself.

@JoeZiminski
Collaborator

@h-mayorquin what do you reckon on this? Based on @zm711's last comment, my thoughts would be to err on the side of not adding it for now and possibly reconsider in the future if the numba installation becomes more streamlined.

@h-mayorquin
Collaborator Author

I am unconvinced that numba is worse in terms of releases than other libraries, but that may be the case. I guess that, after your and @zm711's comments, the burden of proof is on me.

I wanted to have some debate about what our policy is for including something in the core. I think we have two good arguments:

  • We should not add libraries that are unreliable on their installation on the core.
  • We should not jump too quickly into trends.

I think Sam's point that we should not jump into trendy things is a good one, but I don't really think numba is in the same reference category as the other libraries he mentioned, and I would not know what a good comparison would be to tell whether it is trendy.

Anyway, this should have been a discussion and not an issue. I will close it, as right now I am more interested in wrapping numba properly to reduce import times.

5 participants