Using `ndarray` and a parallelization library to speed up calculations was brought up in #4. It makes sense to use an external crate for matrix features instead of rolling our own, but I'm not sure how much performance there is to gain, so I'd like to see measurements; the algorithms may not be amenable to easy parallelization.
Benching/Testing
I'm fine with using the nightly `cargo bench` instead of bringing in criterion for now. Some experimentation will be needed to wire up the benches and figure out what size image makes sense to iterate on (CC0 images preferred). The benches shouldn't take too long to run, but they should run long enough to support reasonable deductions about performance changes.
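For reference, a minimal nightly `cargo bench` harness could look like the sketch below. `quantize_image` is only a hypothetical stand-in for whichever quantization entry point we end up benchmarking, and the synthetic gradient would be replaced by a real CC0 image file once one is added.

```rust
// benches/quant.rs -- minimal sketch of a nightly bench harness.
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Stand-in for the real quantization entry point (hypothetical); it just
// masks each channel so this file compiles on its own.
fn quantize_image(pixels: &[u8], _width: usize, _height: usize) -> Vec<u8> {
    pixels.iter().map(|&p| p & 0xF0).collect()
}

// Synthetic RGB gradient so the bench is self-contained; a real CC0 image
// would replace this.
fn make_fixture(width: usize, height: usize) -> Vec<u8> {
    (0..width * height * 3).map(|i| (i % 256) as u8).collect()
}

#[bench]
fn bench_quantize_128(b: &mut Bencher) {
    let pixels = make_fixture(128, 128);
    // black_box keeps the optimizer from eliding the measured work.
    b.iter(|| black_box(quantize_image(black_box(&pixels), 128, 128)));
}
```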
Many of the calculations operate on nested loop indices rather than on the matrix collection elements themselves. Because of that, I'm not sure whether an external matrix library would add overhead or improve performance. For multi-threading, we need to make sure we're not changing calculations that rely on being computed in a particular order.
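To illustrate the distinction (this is not code from the crate): the first loop below has no cross-iteration dependency and could safely be split across threads, while the second carries state between iterations, so reordering it would change the result.

```rust
// Order-independent: each element is updated from its own value alone,
// so iterations could run in any order or in parallel.
fn scale_in_place(row: &mut [f64], factor: f64) {
    for value in row.iter_mut() {
        *value *= factor;
    }
}

// Order-dependent: element i needs the running total of elements 0..i,
// so a naive per-element parallel split would compute the wrong thing.
fn prefix_sums(row: &[f64]) -> Vec<f64> {
    let mut total = 0.0;
    row.iter()
        .map(|&v| {
            total += v;
            total
        })
        .collect()
}
```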
The crate could use some better test coverage but the quantization functions are fairly opaque to me.
Things to do

- Add benchmarks, add image file(s), tests
- Exclude the image data folder from Cargo.toml

Avenues of exploration

- External crates like `ndarray` and `rayon` should be behind optional feature gates at first (see the sketch after this list)
- Investigate where parallelization would help in the `quant` or `quant::utility` functions
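A possible shape for the gating, as a sketch only: the `parallel` feature name, the optional `rayon` dependency, and the `for_each_row` helper are all assumptions here, not existing crate API.

```rust
// lib.rs sketch: rayon usage compiled in only when a (hypothetical) cargo
// feature is enabled, so the default build stays dependency-free.
//
// Cargo.toml would pair this with something like:
//   [dependencies]
//   rayon = { version = "1", optional = true }
//   [features]
//   parallel = ["rayon"]

#[cfg(feature = "parallel")]
use rayon::prelude::*;

/// Apply `f` to every row of a row-major buffer, in parallel when the
/// `parallel` feature is enabled and serially otherwise.
pub fn for_each_row<F>(data: &mut [f64], width: usize, f: F)
where
    F: Fn(&mut [f64]) + Sync + Send,
{
    #[cfg(feature = "parallel")]
    data.par_chunks_mut(width).for_each(|row| f(row));

    #[cfg(not(feature = "parallel"))]
    data.chunks_mut(width).for_each(|row| f(row));
}
```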
This comment will be updated with any changes or suggestions.
I did some profiling, and a lot of the time is spent in bounds checking of the matrix. You could remove most of it by providing `get_row` and `get_row_mut` methods, pre-checking the bounds of loops, and then using `get_unchecked`.
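A sketch of what that could look like, assuming a row-major `Matrix2d` (the field names here are made up): one bounds check per row replaces a check on every element access.

```rust
pub struct Matrix2d<T> {
    data: Vec<T>,
    width: usize,
    height: usize,
}

impl<T> Matrix2d<T> {
    /// Returns a whole row as a slice after a single bounds check.
    pub fn get_row(&self, y: usize) -> Option<&[T]> {
        if y >= self.height {
            return None;
        }
        let start = y * self.width;
        // Safety: `y < height` implies `start + width <= data.len()`
        // for a well-formed matrix, so the range is in bounds.
        Some(unsafe { self.data.get_unchecked(start..start + self.width) })
    }

    /// Mutable counterpart of `get_row`.
    pub fn get_row_mut(&mut self, y: usize) -> Option<&mut [T]> {
        if y >= self.height {
            return None;
        }
        let start = y * self.width;
        Some(unsafe { self.data.get_unchecked_mut(start..start + self.width) })
    }
}
```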
> I did some profiling, and a lot of the time is spent in bounds checking of the matrix. You could remove most of it by providing `get_row` and `get_row_mut` methods
That's a good idea. I have a branch where I've added those methods to `Matrix2d` and rewritten some of the loops to use chunks where possible. There is a small speedup as image resolution increases. The PR is #11.
I experimented with `get_unchecked`, but it wasn't decisively faster, so I didn't push those changes and would prefer to stay with safe code for now. I'm not sure where I'd have to sprinkle in asserts or bounds checks to give the compiler more hints. While a lot of time is spent in bounds checking, I believe the algorithm overhead is large enough to outweigh it; however, there may be ways to rewrite or optimize parts of the algorithm. More extensive profiling is needed when I get the time.
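For anyone following along, the kind of rewrite involved looks roughly like this (illustrative only, not the crate's actual loops): the indexed version bounds-checks every `data[y * width + x]` access, while the chunked version hands the optimizer whole rows, so most checks can be elided in safe code.

```rust
// Index-based: each `data[y * width + x]` access is bounds-checked.
fn sum_weighted_indexed(data: &[f64], width: usize, height: usize) -> f64 {
    let mut total = 0.0;
    for y in 0..height {
        for x in 0..width {
            total += data[y * width + x] * (x + y) as f64;
        }
    }
    total
}

// Chunk-based: iterating over whole rows lets the compiler drop most checks.
fn sum_weighted_chunked(data: &[f64], width: usize) -> f64 {
    let mut total = 0.0;
    for (y, row) in data.chunks_exact(width).enumerate() {
        for (x, &value) in row.iter().enumerate() {
            total += value * (x + y) as f64;
        }
    }
    total
}
```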