Let's add Numba to the core! #2230
I'm in favor of adding. I've asked this question a few times myself, so happy to hear the concerns. At this point numba is used in so many submodules that it will simplify so much to just add it to core.
I feel personally attacked 😅 I am also in favor of adding.
@DradeAW
The design from the beginning is that the core exposes objects so that third-party software can depend on it without dependency headaches. So all computation has been kept outside the core (unless very simple). Computation tool trends change very, very often and should not affect the core. The recent addition by Aurelien of numba in the core must be an exception and must stay an exception. So sticking to something in the core is not a long-term choice.
Numba was founded in 2012; I don't think it is at all like the other packages that you mentioned. Those are newer and arose in a very different environment (cupy and torch are not even doing the same thing). I don't think there are strong arguments to claim that it is a fad; it is just not as popular as any of those other packages because it is not associated with something high status / fancy (deep learning). In fact, in my readings around, it is kind of... unpopular. And as you mentioned, it is backed by Continuum Analytics, so I don't expect it to go away.

What would convince you here? I am wondering what would convince myself that it is indeed a fad. Maybe if there were competing packages, or if it was a hot field, but everything seems quite stable there.

The comparison with cython is more apt, but it is just an easy one to defend. Numba is way, way easier to write and read for most people, which is why we are writing code in python anyway. See Jake VanderPlas (a cython expert), who was already beating cython with numba in 2013: https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/

So, I do think that if we want a non-vectorized optimizer, Numba is the best choice in terms of readability and speed. It is backed by a big player and can be easily installed with pip now (now that people are vendorizing everything).
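To make the readability claim concrete, here is a minimal sketch (a hypothetical example, not code from spikeinterface) of the kind of loop numba handles well: it stays plain, readable Python and becomes fast by adding a single decorator, where the cython version would need a separate compilation setup.

```python
import numpy as np
from numba import njit

@njit
def pairwise_min_distance(points):
    # O(n^2) nested loop that is awkward to vectorize without
    # allocating a large (n, n) temporary distance matrix
    n = points.shape[0]
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(points.shape[1]):
                diff = points[i, k] - points[j, k]
                d += diff * diff
            if d < best:
                best = d
    return np.sqrt(best)

print(pairwise_min_distance(np.random.rand(500, 3)))
```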
In favor of Sam's point, there were issues with needing the numba and numpy versions to sync up (I think?). Wasn't there a spell where we had to cap numpy and numba to make sure everything would play nice together? I feel like that was an issue when I was first starting to help out with stuff (so I'm not super clear on it).
As far as I can tell, the key negative of adding Numba to core would be that if the Numba install fails it will break the spikeinterface install, which could be frustrating for users who have no intention of using the numba-dependent code. My understanding is that these compiling optimisers are typically more difficult to deploy cross-platform, and their releases typically lag other packages in terms of python version support (e.g. here) or new OS releases (e.g. here). However, this is just an impression I have, and I'm not sure if Numba is any less robust than other packages that include compiled code, like SciPy; I'm sure I could find many posts with scipy install problems (on that subject, this is quite an interesting read). Anyway, if Numba releases are as robust as, say, SciPy's in terms of release cycle and breadth of supported platforms, I cannot see an issue with moving Numba to core and saving the import headache. However, I am not sure how best to interrogate this.
While trying #2304 locally, I accidentally installed python 3.12, and the install initially failed because numba is not ready for 3.12 yet. So I'm not sure how quickly SI wants to move to new python version support, but numba would definitely slow down switching over to new versions of python itself.
@h-mayorquin what do you reckon on this? Based on @zm711's last comment, my thoughts would be to err on not adding it for now and possibly reconsider in future if the …
I am unconvinced that numba is worse in terms of releases than other libraries, but that may be the case. But I guess that after your and @zm711's comments the burden of proof is on me. I wanted to have some debate about what our policy is for including something in the core. I think we have two good arguments:
I think that Sam's point that we should not jump into trendy things is a good one, but I don't really think that numba is in the same reference category as the other libraries he mentioned, and I would not know what a good comparison is to tell if it is trendy. Anyway, this should be a discussion and not an issue. I will close it, as right now I am more interested in wrapping numba properly to reduce import times.
Coming to this from this discussion:
#2175
The goal of this issue is to discuss the pros and cons of adding numba to the core.
Numba is good and sweet and has a central place in the scientific software stack for algorithms that are not easy to vectorize. We already use numba with great success to make some of the library's functionality really fast. See here and here for some examples of large performance improvements thanks to numba.
However, as it is not part of the core, it requires some care with the imports; see here and here for two strategies. This is for two reasons: first, reducing import time, and more importantly, not crashing the library at import.
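For illustration, here is a minimal sketch of this kind of guarded import (function names are hypothetical, not spikeinterface's actual helpers): numba is only imported on first use, so a missing or broken numba neither slows down the library import nor crashes it.

```python
import numpy as np

HAVE_NUMBA = None  # tri-state: None = not checked yet, then True/False


def have_numba():
    """Import numba on first call only, so the library import stays cheap."""
    global HAVE_NUMBA
    if HAVE_NUMBA is None:
        try:
            import numba  # noqa: F401  -- deferred import, allowed to fail
            HAVE_NUMBA = True
        except ImportError:
            HAVE_NUMBA = False
    return HAVE_NUMBA


def sum_of_squares(x):
    # Fast path: compile the kernel with numba if it is available.
    # (Real code would define the jitted kernel at module scope inside the
    # guard, to avoid re-wrapping it on every call.)
    if have_numba():
        from numba import njit

        @njit
        def _kernel(x):
            total = 0.0
            for v in x:
                total += v * v
            return total

        return _kernel(x)
    # Otherwise fall back to plain numpy, so nothing crashes at import time.
    return float(np.sum(np.asarray(x, dtype=np.float64) ** 2))


print(sum_of_squares(np.arange(5, dtype=np.float64)))  # 30.0
```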
Adding numba to the core will reduce some of the pain with the second, but we might still have to be careful with the first. We have come a long way from #1597, but we might still want to go further.
So, numba is good, but how much does it cost? Let's discuss this. I gathered some statistics that people can run on their machines if they want. Here we have some data for popular packages in the scientific stack, with sizes and stand-alone import times:
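As a sketch of one way to reproduce such numbers (my own snippet, not necessarily how the statistics were gathered): time each import in a fresh interpreter so nothing is pre-cached. For a more detailed breakdown there is also `python -X importtime -c "import numba"`.

```python
import subprocess
import sys

# Time each import in a fresh subprocess so earlier imports don't warm the cache
for pkg in ["numpy", "scipy", "pandas", "numba"]:
    snippet = (
        "import time; t0 = time.perf_counter(); "
        f"import {pkg}; "
        "print(f'{time.perf_counter() - t0:.3f}')"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    print(f"{pkg:>6}: {result.stdout.strip()} s")
```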
So numba has the import speed of numpy and is also around the same size when you include llvmlite. If there are concerns about import speed, we can keep using any of the techniques described above to avoid the import-time costs. So what are the downsides? I remember that @samuelgarcia had some concerns.
To add something to the negative side: I think that people can be too quick to reach for numba when something simpler could be done with numpy instead. The current state of affairs avoids this somewhat, but it is not a great barrier anyway. That said, sometimes people do take numpy too far as well.
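As a hypothetical illustration of that point (not code from the repo): the loop below is the kind of thing that tempts a `@njit` decorator, but a numpy one-liner is simpler, needs no compilation step, and is plenty fast.

```python
import numpy as np

def threshold_count_loop(x, thr):
    # The "reach for numba" version: an explicit element-wise loop
    count = 0
    for v in x:
        if v > thr:
            count += 1
    return count

def threshold_count_numpy(x, thr):
    # The simpler vectorized version: no decorator, no compile step
    return int(np.sum(x > thr))

x = np.random.randn(1_000_000)
assert threshold_count_loop(x, 0.5) == threshold_count_numpy(x, 0.5)
```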