-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow generators and iterators #194
Conversation
@mkborregaard You seemed to be interested in this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look at every detail; it generally looks good to me. Approval of the goal.
Might need a benchmark to detect potential regression.
ImageDistances might get even simplified with this change.
Codecov Report
@@ Coverage Diff @@
## master #194 +/- ##
==========================================
+ Coverage 96.74% 97.97% +1.22%
==========================================
Files 8 8
Lines 675 739 +64
==========================================
+ Hits 653 724 +71
+ Misses 22 15 -7
Continue to review full report at Codecov.
|
I haven't generalized |
This is ready for a thorough review. Additional use cases and tests to be considered would be also welcome. |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
8624383
to
85cdb1b
Compare
Given that using |
src/generic.jl
Outdated
colwise(r::AbstractMatrix, metric::PreMetric, | ||
a::AbstractMatrix, b::AbstractMatrix) | ||
|
||
Compute distances between each corresponding columns of `a` and `b` according |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to note is that these methods are inconsistent with the ones treating a
and b
as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move to requiring explicitly writing pairwise(d, eachol(a), eachcol(b))
in the longer term. That way we won't need dims
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be very nice, because it redirects the matrix-based method to the iterator-based method, and one could get rid of the matrix-based ones. The only issue I see is that for the specialized *Euclidean
(and a few others) distances, where we do need the underlying matrix for performance reason, I don't seem to be able to unwrap it from the eachcol
generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately JuliaLang/julia#32310 should allow us to retrieve the underlying matrix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I looked at that one a bit today. Will they let it go into v1.6? I wonder how the ecosystem is going to adapt to v1.6 being the new LTS, and how fast packages will really drop 1.6- support. In many cases, there is no hard reason, only soft ones.
src/generic.jl
Outdated
Compute distances between each pair of rows (if `dims=1`) or columns (if `dims=2`) | ||
in `a` and `b` according to distance `metric`. If a single matrix `a` is provided, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to note is that these methods are inconsistent with the ones treating a
and b
as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move to requiring explicitly writing pairwise(d, eachol(a), eachcol(b))
in the longer term. That way we won't need dims
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure I understand the inconsistency, actually. Could you please sketch an application case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, pairwise(d, [a, b, c, d])
vs. pairwise(d, reshape([a, b, c, d], 2, 2))
with a
, b
, c
and d
vectors of numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then I understood you correctly. Out of the two calls you mentioned, only the first one works. The second one fails because it treats a
, b
, c
and d
like numbers, but then calls like abs(a)
(or whatever necessary) fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But if we were fully consistent, the second call would be equivalent to the first one, since matrices are just one kind of iterator.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Alright, I better stop polishing now. This now fixes a whole bunch of issues and/or adds new features. I believe this is in good shape now. Comments and reviews are welcome. Shall we bump the patch number, or the minor version? It's rather difficult to propagate "next level versions" through the ecosystem, and since this doesn't break any existing functionality, maybe a patch is enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just two small things.
test/test_dists.jl
Outdated
@testset "CartesianIndex" begin | ||
A = reshape(collect(1:9), 3, 3) | ||
inds1 = findall(iseven, A) | ||
inds2 = findall(isodd, A) | ||
@test sum(pairwise(SqEuclidean(), inds1, inds2)) == 52 | ||
@test euclidean(inds1[1], inds1[1]) === 0.0 | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cover a real use case? I find it surprising that somebody would want to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a feature request in #177. But we could remove it for now and clarify there what the actual intention and the use are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd leave it out until we have a clearer view of why that would be useful. The fact that CartesianIndex
isn't iterable seems to indicate that it's not intended to be used that way, and it would be absurd to have packages work around that everywhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good if you leave CartesianIndex
out for now. Thanks!
This reverts commit 5be0415.
I'd merge and release today. Any opinions on how to bump the version number? |
I merged the PR as is. We may include some others and then decide about the version bump. |
@dkarrasch thanks you are a legend. With 1 PR closed like 10 issues! |
Thanks, @Datseris. My next projects are world peace and healing cancer... in one PR! 🤣 |
Thanks! Regarding the release number, I think we have two options:
|
This PR gives so many possibilities and thus a new start, so I'm voting for 1.0.0 😆 |
How about tagging 0.10.1, then have a v0.11 that deprecates the dims keyword and redirects to |
Yes, if we anticipate breaking changes soon, better not tag 1.0. Maybe file an issue so that we can discuss these changes? |
This adds support for iterable objects as arguments to
evaluate
,(EDIT: andpairwise
andcolwise
).I haven't touched pair- and colwise stuff, because that is (partially/completely?) addressed by #188.Closes #187. Closes #162. Closes #165. Closes #152. Closes #190. Closes #192. Closes #188.