Allow generators and iterators #194

dkarrasch · 2020-12-05T14:36:41Z

This adds support for iterable objects as arguments to evaluate (EDIT: and to pairwise and colwise). ~~I haven't touched pair- and colwise stuff, because that is (partially/completely?) addressed by #188.~~

Closes #187. Closes #162. Closes #165. Closes #152. Closes #190. Closes #192. Closes #188.

dkarrasch · 2020-12-05T14:46:41Z

@mkborregaard You seemed to be interested in this.

src/generic.jl

test/test_dists.jl

johnnychen94

I didn't look at every detail, but it generally looks good to me. I approve of the goal.

This might need a benchmark to detect potential regressions.

ImageDistances might even get simpler with this change.

codecov-io · 2020-12-05T17:58:19Z

Codecov Report

Merging #194 (1350d47) into master (be3a901) will increase coverage by 1.22%.
The diff coverage is 99.38%.

@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
+ Coverage   96.74%   97.97%   +1.22%     
==========================================
  Files           8        8              
  Lines         675      739      +64     
==========================================
+ Hits          653      724      +71     
+ Misses         22       15       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/metrics.jl | `97.73% <98.24%> (+0.99%)` | ⬆️ |
| src/Distances.jl | `100.00% <100.00%> (ø)` | |
| src/bhattacharyya.jl | `100.00% <100.00%> (+13.04%)` | ⬆️ |
| src/bregman.jl | `100.00% <100.00%> (ø)` | |
| src/common.jl | `94.52% <100.00%> (+0.23%)` | ⬆️ |
| src/generic.jl | `98.62% <100.00%> (+0.70%)` | ⬆️ |
| src/haversine.jl | `100.00% <100.00%> (ø)` | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

dkarrasch · 2020-12-06T17:34:24Z

I haven't generalized *Mahalanobis, because that looks so linear-algebra-y. Otherwise, I think, all metrics now work with abstract iterables, and are tested for Iterators and Generators. I'll try to do some benchmarks. In a few cases, I have left both a specific AbstractVector version and an unconstrained version. I'll try to benchmark whether there is any benefit from the constrained versions. Any comments or concerns are welcome, of course.
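To illustrate what "works with abstract iterables" can mean, here is a minimal sketch of a distance written against the iteration protocol only. The function name `euclidean_iter` is hypothetical and this is not the package's implementation; it uses plain Base Julia and assumes both arguments yield the same number of elements (`zip` silently stops at the shorter one).

```julia
# Hypothetical sketch: Euclidean distance over any two iterables.
# Only iteration is required, so generators, tuples and ranges all work.
function euclidean_iter(a, b)
    s = 0.0
    for (x, y) in zip(a, b)      # walk both iterables in lockstep
        s += abs2(x - y)         # accumulate squared differences
    end
    return sqrt(s)
end

gen = (x for x in 1:3)           # a generator, never materialized as an array
tup = (1.0, 2.0, 5.0)            # an NTuple
d = euclidean_iter(gen, tup)     # differences are 0, 0, -2
```

A real implementation would additionally have to decide on the result type and on how to treat length mismatches, which this sketch ignores.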

dkarrasch · 2020-12-07T10:33:31Z

This is ready for a thorough review. Additional use cases and tests to consider would also be welcome.

src/bhattacharyya.jl

src/generic.jl

src/haversine.jl

src/metrics.jl

test/test_dists.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

src/generic.jl

src/bhattacharyya.jl

src/metrics.jl

This reverts commit 0227942.

src/generic.jl

src/metrics.jl

nalimilan · 2020-12-14T21:39:39Z

I got a little obsessive about the "lazy evaluation" approach. If you look closer, colwise is nothing but map(metric, zip(eachcol(A), eachcol(B))) (and similarly for iterators), and pairwise(..., dims=2) is nothing but map(metric, Iterators.product(eachcol(A), eachcol(B))) (or eachrow for dims=1, and similarly for iterators), up to splatting each pair into metric. Replacing map by Iterators.map makes the whole construct lazy, i.e., a generator. Does anybody see any value in exposing this lazy construct, which could, by default, be eagerly evaluated via collect? One naive use case I can imagine is computing k-nearest neighbors within the data set by brute force. That shouldn't require having the full distance matrix in memory?

Given that using Iterators.map manually is relatively easy, is it really worth providing something else? I guess we could add an argument to pairwise (say, lazy), but that can be done at any time if somebody requests it.
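The manual route mentioned here can be sketched as follows. This is only an illustration in Base Julia (it assumes Julia 1.6 or later for `Iterators.map`); `sqeuclid` and `lazy_pairwise` are hypothetical names standing in for a metric and for the lazy construct, not functions of the package.

```julia
# Hypothetical stand-in for a metric: squared Euclidean distance on iterables.
sqeuclid(a, b) = sum(abs2(x - y) for (x, y) in zip(a, b))

# Lazy pairwise over columns: builds a generator, computes nothing yet.
# Each element of the product is a pair (a, b), destructured before the call.
lazy_pairwise(metric, A, B) =
    Iterators.map(((a, b),) -> metric(a, b),
                  Iterators.product(eachcol(A), eachcol(B)))

A = [0 1 3;
     0 0 0]                      # three points (columns) in 2D

lazy = lazy_pairwise(sqeuclid, A, A)
D = collect(lazy)                # eager evaluation yields the 3x3 distance matrix

# Brute-force nearest-neighbor distance of the first column,
# without ever storing the full matrix.
nn1 = minimum(sqeuclid(view(A, :, 1), c) for c in Iterators.drop(eachcol(A), 1))
```

Because the product of two column iterators carries a two-dimensional shape, `collect` returns a matrix here; iterators of unknown size would collect into a vector instead.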

src/generic.jl

nalimilan · 2020-12-14T21:34:45Z

src/generic.jl

+    colwise(r::AbstractMatrix, metric::PreMetric,
+              a::AbstractMatrix, b::AbstractMatrix)
+
+Compute distances between each corresponding columns of `a` and `b` according


Something to note is that these methods are inconsistent with the ones treating a and b as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move to requiring explicitly writing pairwise(d, eachcol(a), eachcol(b)) in the longer term. That way we won't need dims anymore.

That would be very nice, because it redirects the matrix-based method to the iterator-based method, and one could get rid of the matrix-based ones. The only issue I see is that for the specialized *Euclidean distances (and a few others), where we do need the underlying matrix for performance reasons, I don't seem to be able to unwrap it from the eachcol generator.

Fortunately JuliaLang/julia#32310 should allow us to retrieve the underlying matrix!

Yes, I looked at that one a bit today. Will they let it go into v1.6? I wonder how the ecosystem is going to adapt to v1.6 being the new LTS, and how fast packages will really drop support for pre-1.6 versions. In many cases, there is no hard reason, only soft ones.

nalimilan · 2020-12-14T21:35:32Z

src/generic.jl

 Compute distances between each pair of rows (if `dims=1`) or columns (if `dims=2`)
 in `a` and `b` according to distance `metric`. If a single matrix `a` is provided,


Something to note is that these methods are inconsistent with the ones treating a and b as iterators of columns: a matrix of vectors will be treated differently from a vector of the same vectors. That's probably OK in practice, but that's one of the reasons why I'd like to move to requiring explicitly writing pairwise(d, eachcol(a), eachcol(b)) in the longer term. That way we won't need dims anymore.

I'm not exactly sure I understand the inconsistency, actually. Could you please sketch an application case?

For example, pairwise(d, [a, b, c, d]) vs. pairwise(d, reshape([a, b, c, d], 2, 2)) with a, b, c and d vectors of numbers.

Ok, then I understood you correctly. Out of the two calls you mentioned, only the first one works. The second one fails because it treats a, b, c and d like numbers, but then calls like abs(a) (or whatever necessary) fail.

Yes. But if we were fully consistent, the second call would be equivalent to the first one, since matrices are just one kind of iterator.
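The two readings of a matrix of vectors can be made concrete with Base Julia alone. This is a sketch with hypothetical data; it does not call the package, it only shows what each interpretation iterates over.

```julia
# Four hypothetical data points, each a vector of numbers.
a, b, c, e = [1, 2], [3, 4], [5, 6], [7, 8]
points = [a, b, c, e]            # a vector of vectors
M = reshape(points, 2, 2)        # the same vectors arranged as a 2x2 matrix

# Reading 1: the matrix is just another iterator of points.
# Iteration visits the same four vectors, in column-major order.
as_points = [p for p in vec(M)]

# Reading 2: the matrix holds data column by column.
# Each "point" is then a column, i.e. a vector whose entries are vectors.
cols = collect(eachcol(M))
```

Under the first reading the matrix call would agree with the vector call; under the second, the metric receives columns whose entries are themselves vectors, which matches the failure described in this thread.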

src/generic.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

dkarrasch · 2020-12-15T14:53:49Z

Alright, I'd better stop polishing now. This fixes a whole bunch of issues and/or adds new features, and I believe it is in good shape. Comments and reviews are welcome. Shall we bump the patch number, or the minor version? It's rather difficult to propagate "next level versions" through the ecosystem, and since this doesn't break any existing functionality, maybe a patch is enough?

nalimilan

Looks good. Just two small things.

nalimilan · 2020-12-15T21:56:18Z

test/test_dists.jl

+@testset "CartesianIndex" begin
+    A = reshape(collect(1:9), 3, 3)
+    inds1 = findall(iseven, A)
+    inds2 = findall(isodd, A)
+    @test sum(pairwise(SqEuclidean(), inds1, inds2)) == 52
+    @test euclidean(inds1[1], inds1[1]) === 0.0
+end


Does this cover a real use case? I find it surprising that somebody would want to do that.

There was a feature request in #177. But we could remove it for now and clarify there what the actual intention and the use are.

Yes, I'd leave it out until we have a clearer view of why that would be useful. The fact that CartesianIndex isn't iterable seems to indicate that it's not intended to be used that way, and it would be absurd to have packages work around that everywhere...
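For reference, the value in the removed test can be reproduced without any special handling by converting the indices to tuples first, since `Tuple(I)` is the documented way to get at the components of a `CartesianIndex`. The sketch below uses only Base Julia; `sqeuclid` is a hypothetical stand-in for the metric.

```julia
# Hypothetical stand-in for SqEuclidean, working on any two iterables.
sqeuclid(a, b) = sum(abs2(x - y) for (x, y) in zip(a, b))

A = reshape(collect(1:9), 3, 3)

# CartesianIndex is not iterable, so convert each index to a plain tuple.
evens = Tuple.(findall(iseven, A))   # positions of 2, 4, 6, 8
odds  = Tuple.(findall(isodd, A))    # positions of 1, 3, 5, 7, 9

# Sum of squared distances over all (even, odd) position pairs.
total = sum(sqeuclid(p, q) for p in evens, q in odds)
```

This keeps the conversion at the call site, which is in line with the view that packages should not have to work around the non-iterability themselves.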

src/generic.jl

nalimilan

Looks good if you leave CartesianIndex out for now. Thanks!

This reverts commit 5be0415.

dkarrasch · 2020-12-17T08:06:03Z

I'd merge and release today. Any opinions on how to bump the version number?

dkarrasch · 2020-12-18T11:36:44Z

I merged the PR as is. We may include some others and then decide about the version bump.

Datseris · 2020-12-18T12:05:05Z

@dkarrasch thanks, you are a legend. One PR closed like 10 issues!

dkarrasch · 2020-12-18T12:09:36Z

Thanks, @Datseris. My next projects are world peace and healing cancer... in one PR! 🤣

nalimilan · 2020-12-18T12:44:50Z

Thanks! Regarding the release number, I think we have two options:

- tag 0.10.1 to avoid forcing dependencies to update their version requirements (since this isn't breaking)
- tag 1.0.0 so that we can more easily bump either the minor or patch version as appropriate in the future

johnnychen94 · 2020-12-18T12:49:41Z

This PR gives so many possibilities and thus a new start, so I'm voting for 1.0.0 😆

dkarrasch · 2020-12-18T12:54:48Z

How about tagging 0.10.1, then have a v0.11 that deprecates the dims keyword and redirects to eachcol/eachrow, and then have a v0.12/v1 for a version that only contains iterator-based methods, except for array-based specializations for improved performance?
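The intermediate step of this plan, deprecating `dims` and redirecting to `eachcol`/`eachrow`, could look roughly like the sketch below. All names (`sqeuclid`, `pairwise_iter`, `pairwise_dims`) are hypothetical and the code is not taken from the package; it only illustrates the redirection pattern with `Base.depwarn`.

```julia
# Hypothetical stand-in for a metric.
sqeuclid(a, b) = sum(abs2(x - y) for (x, y) in zip(a, b))

# Iterator-based method: `as` and `bs` are iterables of points.
pairwise_iter(metric, as, bs) = [metric(a, b) for a in as, b in bs]

# Matrix method whose `dims` keyword is deprecated and merely redirects.
function pairwise_dims(metric, A::AbstractMatrix, B::AbstractMatrix; dims::Integer)
    Base.depwarn("`dims` is deprecated; pass `eachrow(A)` or `eachcol(A)` explicitly.",
                 :pairwise_dims)
    dims == 1 && return pairwise_iter(metric, eachrow(A), eachrow(B))
    dims == 2 && return pairwise_iter(metric, eachcol(A), eachcol(B))
    throw(ArgumentError("dims must be 1 or 2, got $dims"))
end

A = [1 2;
     3 4]
by_cols = pairwise_dims(sqeuclid, A, A; dims=2)   # columns [1,3] and [2,4]
by_rows = pairwise_dims(sqeuclid, A, A; dims=1)   # rows [1,2] and [3,4]
```

Array-based specializations for performance, as mentioned in the plan, would sit alongside the generic iterator method and are not covered by this sketch.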

nalimilan · 2020-12-18T12:58:37Z

Yes, if we anticipate breaking changes soon, better not tag 1.0. Maybe file an issue so that we can discuss these changes?

johnnychen94 reviewed Dec 5, 2020

johnnychen94 approved these changes Dec 5, 2020

nalimilan mentioned this pull request Dec 8, 2020

Add pairwise nalimilan/FreqTables.jl#54

Draft

nalimilan reviewed Dec 8, 2020

johnnychen94 mentioned this pull request Dec 11, 2020

Relax type signature for result_type and allow it to act on numbers #192

Closed

dkarrasch mentioned this pull request Dec 12, 2020

Added generic function for abstractvectors #188

Closed

dkarrasch changed the title ~~Allow generators and iterators in evaluate~~ Allow generators and iterators Dec 12, 2020

dkarrasch and others added 12 commits December 12, 2020 17:26

Allow generators and iterators in evaluate

10c5c2b

fix test

d3dd6e4

fix one type-thing

27699ff

include result_type proposal, add hamming tests

e127fb4

include renyi_divergence, haversine, bregman

948291d

include bhattacharyya / hellinger

243b7b0

Update test/test_dists.jl

8101bb3

Co-authored-by: Milan Bouchet-Valat <[email protected]>

include some review comments

8f44a30

relax parameter types

c055d4d

clean up UnionMetric evaluate

0227942

include iterator-based pair- and colwise

9b34ed9

simplify/optimize pairwise

85cdb1b

dkarrasch force-pushed the dk/generators branch from 8624383 to 85cdb1b Compare December 12, 2020 16:28

nalimilan reviewed Dec 13, 2020

nalimilan reviewed Dec 13, 2020

dkarrasch added 3 commits December 13, 2020 21:12

include generic result_type tests

46a91c2

Revert "clean up UnionMetric evaluate"

8fb5108

This reverts commit 0227942.

minor UnionMetric edits

5d04ff0

nalimilan reviewed Dec 14, 2020

dkarrasch and others added 3 commits December 15, 2020 10:47

Apply suggestions from code review

5b096d4

Co-authored-by: Milan Bouchet-Valat <[email protected]>

simplify _eltype, add a note to colwise docstring

518fd3d

fix typo

18f17af

dkarrasch mentioned this pull request Dec 15, 2020

Why don't you support NTuples? #162

Closed

dkarrasch added 2 commits December 15, 2020 13:46

transpose -> permutedims

8640d36

handle CartesianIndex

5be0415

dkarrasch mentioned this pull request Dec 15, 2020

Euclidean on Cartesian indices #177

Open

increase code coverage

4333bda

nalimilan reviewed Dec 15, 2020

fix docstrings

8219ad0

nalimilan approved these changes Dec 16, 2020

dkarrasch added 2 commits December 16, 2020 17:49

Revert "handle CartesianIndex"

5402ed8

This reverts commit 5be0415.

rm redundant tests

1350d47

dkarrasch merged commit f6ee353 into JuliaStats:master Dec 18, 2020

dkarrasch deleted the dk/generators branch December 18, 2020 11:36

johnnychen94 mentioned this pull request Mar 3, 2021

Hausdorff distance not working JuliaImages/ImageDistances.jl#59

Closed

theogf mentioned this pull request Mar 16, 2021

Allow iterators for colwise #212

Closed

willtebbutt mentioned this pull request Aug 17, 2021

Qualify pairwise call JuliaGaussianProcesses/KernelFunctions.jl#360

Open
