Initial cut at benchmarking #198
base: master
Conversation
Add some benchmark tests to compare PintArray performance against NumPy arrays of quantities. Over time we should be able to show specific patterns where PintArrays confer substantial performance advantages over naive use of Quantities with Pandas. Signed-off-by: Michael Tiemann <[email protected]>
Duplicate code from pint/pint/testsuite/conftest.py. Ruff and Black conspired to delete simple imports of relevant fixture definitions. Signed-off-by: Michael Tiemann <[email protected]>
Progress report: there was redundancy and confusion because the benchmarks tried to mix and match not only the sensible left-hand/right-hand parameters of a particular data type and data length, but also across species of tests. I've separated that out.
Compare:
The numbers below show how much better performance is with Pint-aware arrays than with simply fobbing quantities off into ndarrays and Series without any help. Here's the subset that deals with "meters OP kilometers":
It's interesting that addition takes 2x-3x the work of multiplication. Also interesting: PintArrays have half the performance of Quantity arrays, Pint-Pandas has half the performance of PintArrays, and NumPy and Pandas without PintArrays have 200x the overhead of Pandas + PintArrays (and roughly 1000x the overhead of Quantity arrays). N.b.: I still don't know how to fix the CI/CD problem of getting pytest-benchmark to install properly. I have installed it manually on my system, but I don't see the magic to make pyproject.toml or conftest.py do the right thing.
Properly tease apart species of tests so we can compare Pint vs Numpy vs Pandas each in their own lanes. I still don't know how to get pytest-benchmark to properly install within the CI/CD system. Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
Signed-off-by: Michael Tiemann <[email protected]>
I don't think there's much point comparing to Quantity arrays; that will mostly measure the overheads of the elementwise operations on each pint Quantity and of pandas dispatching to them. It's faster, as you'd expect, and there's no reason to use arrays of quantities unless you have to. I do think it would be good to compare a PintArray to a standard pandas array, as that may show operations that could be faster with a better implementation.
I missed that you'd compared to Q(np.array([1,2]), "m"). That is an interesting comparison!
The `Check dependency specification` job is failing because I don't know how to get the unit-test bits of pyproject.toml to respect the dependencies of the `test` declaration. The `pytest` install is not installing any of the components I've asked for. `pre-commit run --all-files` runs with no errors.
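One plausible shape for the `test` declaration, assuming a PEP 621 `[project.optional-dependencies]` layout (a guess about this repo's pyproject.toml structure, not its actual contents):

```toml
[project.optional-dependencies]
test = [
    "pytest",
    "pytest-cov",
    "pytest-benchmark",
]
```

With that in place, `pip install -e ".[test]"` (note the quoted extra) pulls in pytest-benchmark; a CI job that installs only the bare package will not see the `test` extras unless it requests them explicitly.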