A General Optimization Framework for Dynamic Time Warping #16
Oh, I just saw the code doesn't exist... https://github.com/dderiso/gdtw 😢
Thanks for the reference Eric, I had not seen it before (I had never looked at DTW until a few weeks ago). Their approach is probably more principled than what we have here. The barycenter algorithm in this package comes from the authors of the repo I forked and is the simplest DTW barycenter algorithm; it's known to produce weird barycenters (but is reasonably fast). I'll have a look at the article you linked and see if some of their stuff would fit nicely in here :)
I've only recently gotten into DTW too. I wanted to do a comparison to see if a 50x speedup was a good sign, so I did some simple benchmarks:

julia> using PyCall, DynamicAxisWarping, BenchmarkTools, Distances
julia> const FASTDTW = pyimport_conda("fastdtw", "fastdtw")
PyObject <module 'fastdtw' from '/home/eric/.julia/conda/3/lib/python3.7/site-packages/fastdtw/__init__.py'>
julia> x = rand(20, 100);
julia> y = rand(20, 100);
julia> r = 100
100
julia> xp = permutedims(x);
julia> yp = permutedims(y);
julia> @btime DynamicAxisWarping.dtw($x,$y, $(SqEuclidean()));
88.584 μs (11 allocations: 85.33 KiB)
julia> @btime DynamicAxisWarping.fastdtw($x,$y, $(SqEuclidean()), $r);
88.693 μs (11 allocations: 85.33 KiB)
julia> @btime DynamicAxisWarping.dtw_cost($x,$y, $(SqEuclidean()), $r, s1 = $(zeros(2r+1)), s2 = $(zeros(2r+1)));
128.094 μs (0 allocations: 0 bytes)
julia> @btime FASTDTW.fastdtw($xp, $yp; radius = $r, dist=$(SqEuclidean()));
139.339 ms (861612 allocations: 36.06 MiB)
julia> r = 20
20
julia> @btime DynamicAxisWarping.fastdtw($x,$y, $(SqEuclidean()), $r);
280.930 μs (213 allocations: 179.83 KiB)
julia> @btime DynamicAxisWarping.dtw_cost($x,$y, $(SqEuclidean()), $r, s1 = $(zeros(2r+1)), s2 = $(zeros(2r+1)));
47.091 μs (0 allocations: 0 bytes)
julia> @btime FASTDTW.fastdtw($xp, $yp; radius = $r, dist=$(SqEuclidean()));
256.915 ms (1104904 allocations: 46.42 MiB)

A couple of things stood out to me. First, we're seeing three orders of magnitude of perf improvement in DynamicAxisWarping.jl over the python package. Second, with the smaller radius r = 20, fastdtw here is actually slower than plain dtw. Third, dtw_cost with the small radius is the fastest of all. Anyway, if the three orders of magnitude of perf gains over the python package are correct, then only a 50x speedup means maybe their implementation is actually quite slow... However, the algorithm could still be fast when implemented well.
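For anyone following along, the quadratic-time dynamic program that plain DTW implements (and that these timings exercise) is easy to sketch. Below is a minimal, hypothetical Python/NumPy version with a squared-Euclidean local cost; it illustrates the recurrence, it is not the package's actual implementation:

```python
import numpy as np

def dtw_cost(x, y):
    """Plain O(n*m) DTW cost between multivariate series.
    x: (d, n), y: (d, m); columns are time steps; local cost is squared Euclidean."""
    n, m = x.shape[1], y.shape[1]
    # full local-cost matrix: D[i, j] = ||x[:, i] - y[:, j]||^2
    D = ((x[:, :, None] - y[:, None, :]) ** 2).sum(axis=0)
    # first column and first row: only one path reaches each cell
    for i in range(1, n):
        D[i, 0] += D[i - 1, 0]
    for j in range(1, m):
        D[0, j] += D[0, j - 1]
    # standard recurrence: best of the three predecessor cells
    for i in range(1, n):
        for j in range(1, m):
            D[i, j] += min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]
```

Identical series (or series that only repeat samples) get cost zero, since the warping path can follow zero-cost matches.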
Oh, thanks for the benchmarks, nice to see we're doing well :)

imin, imax = radiuslimits(20, 100, 100); plot([imin imax])
dtw(x, y, dist, imin, imax) # cost equivalent to dtw_cost(x, y, dist, 20)

Note also that some implementations normalize the input series before calculating the distance; by default, we do not do that here.
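For context, radius-limited DTW in the Sakoe-Chiba style only evaluates cells with |i - j| <= r, treating everything outside the band as unreachable. The sketch below is a hypothetical Python illustration of that idea for equal-length 1-D series, not the package's radiuslimits/dtw_cost code:

```python
import numpy as np

def dtw_cost_banded(x, y, r):
    """DTW cost restricted to a Sakoe-Chiba band |i - j| <= r.
    x, y: equal-length 1-D sequences; cells outside the band stay infinite."""
    n = len(x)
    assert len(y) == n
    D = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(max(0, i - r), min(n, i + r + 1)):
            c = (x[i] - y[j]) ** 2
            if i == 0 and j == 0:
                D[i, j] = c
            else:
                prev = min(
                    D[i - 1, j] if i > 0 else np.inf,
                    D[i, j - 1] if j > 0 else np.inf,
                    D[i - 1, j - 1] if (i > 0 and j > 0) else np.inf,
                )
                D[i, j] = c + prev
    return D[-1, -1]
```

With r = n - 1 this reduces to unconstrained DTW; smaller r both restricts the warping and shrinks the work to O(n*r).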
Ah, thanks very much for the clarifications. I didn't realize that difference. Btw, regarding DynamicAxisWarping.jl/src/dtw.jl line 28 (at 4030604): should that be an AbstractVector? :)
Thanks :)
Good catch! I'm not often a complex kind of guy, but we should definitely do the right thing here.
True, but I like to think about it like this: If you do have an exotic number type, you are probably going to look at a Julia implementation, because what options do you really have? 😄
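The complex-number pitfall being discussed can be made concrete: for complex inputs, a squared Euclidean distance must use the conjugate (|d|^2 = d * conj(d)), not a plain square, which is exactly where a missing adjoint causes silent bugs. A small NumPy sketch (an illustration, not the Distances.jl code):

```python
import numpy as np

def sq_euclidean(a, b):
    """Squared Euclidean distance that is correct for complex inputs.
    np.vdot conjugates its first argument, so vdot(d, d) = sum(|d_i|^2)."""
    d = a - b
    return float(np.real(np.vdot(d, d)))
```

Using `d @ d` instead of `vdot(d, d)` would return `-1` for `d = [1j]`, a negative "distance".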
True :). I've run into this exact issue before as well (on Convex.jl, which added complex number support over a GSoC a couple of years ago but missed an adjoint somewhere, leading to lots of confusing bugs). Looks like there are some extra adjoints in Distances.jl too; I'll try to file an issue or PR next week. By the way, I've been using the following locally for a fast zero-allocation squared-Euclidean DTW, modified from this package and Distances.jl, for computing DTWs between a bunch of timeseries that are all the same size. It's a bit faster:

using Distances, LinearAlgebra
lastlength(x) = size(x, ndims(x))
# allocation-free version of `out = sum(abs2, in, dims=1)`
function sum_abs2_cols!(out, in)
    @inbounds for j = 1:size(in, 2)
        outj = zero(eltype(out))
        for i = 1:size(in, 1)
            outj += abs2(in[i, j])
        end
        out[j] = outj
    end
    nothing
end
function pairwise_sq_euclidean!(r::AbstractMatrix, sa2::AbstractVector, sb2::AbstractVector,
                                a::AbstractMatrix, b::AbstractMatrix)
    # r[i, j] = ||a_i||^2 + ||b_j||^2 - 2 a_i'b_j, via a single mul!
    mul!(r, transpose(a), b)
    sum_abs2_cols!(sa2, a)
    sum_abs2_cols!(sb2, b)
    for j = 1:size(r, 2)
        sb = sb2[j]
        @simd for i = 1:size(r, 1)
            # clamp at 0 to guard against tiny negative rounding errors
            @inbounds r[i, j] = max(sa2[i] + sb - 2 * r[i, j], 0)
        end
    end
    nothing
end
function dtw_cost_matrix!(D::AbstractMatrix{T}, sa2::AbstractVector{T}, sb2::AbstractVector{T},
                          seq1::AbstractArray{T}, seq2::AbstractArray{T}) where T
    m = lastlength(seq2)
    n = lastlength(seq1)
    # Build the local cost matrix
    pairwise_sq_euclidean!(D, sa2, sb2, seq2, seq1)
    # pairwise!(D, dist, seq2, seq1, dims=2)
    @assert size(D) == (m, n)
    # Initialize first column and first row
    @inbounds for r = 2:m
        D[r, 1] += D[r-1, 1]
    end
    @inbounds for c = 2:n
        D[1, c] += D[1, c-1]
    end
    # Complete the cost matrix
    @inbounds for c = 2:n
        for r = 2:m
            D[r, c] += min(D[r-1, c], D[r-1, c-1], D[r, c-1])
        end
    end
    nothing
end
# Holds caches
struct DTW{T} <: Distances.SemiMetric
    D::Matrix{T}
    sa2::Vector{T}
    sb2::Vector{T}
end

DTW{T}(n) where {T} = DTW{T}(zeros(T, n, n), zeros(T, n), zeros(T, n))
DTW(n) = DTW{Float32}(n)

function (dist::DTW)(a, b)
    dtw_cost_matrix!(dist.D, dist.sa2, dist.sb2, a, b)
    dist.D[end, end]
end
This code appears to be some 10% faster when run single-threaded, but I can imagine it gets quite a bit faster if run with multiple threads, since the allocations go away. I'm curious though, what application are you using this on that does not warrant any restrictions at all on the warping path?
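As a side note, the pairwise_sq_euclidean! trick above rests on the identity ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i'b_j, which replaces a double loop of distance computations with a single matrix product. A NumPy sketch of the same idea (an illustration, not the Julia code above):

```python
import numpy as np

def pairwise_sq_euclidean(a, b):
    """All pairwise squared Euclidean distances between columns of a and b,
    using one matrix product: ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i.b_j."""
    sa2 = (a * a).sum(axis=0)  # squared column norms of a, shape (n,)
    sb2 = (b * b).sum(axis=0)  # squared column norms of b, shape (m,)
    r = sa2[:, None] + sb2[None, :] - 2.0 * (a.T @ b)
    return np.maximum(r, 0.0)  # clamp tiny negatives caused by rounding
```

The clamp matters in practice: cancellation in the subtraction can produce values like -1e-15 for identical columns, which would break a subsequent sqrt.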
Good question. I'm not really sure the application (comparing spikes in EEG signals) warrants no restrictions at all, but I want to try out different methods and figured I'd start without restrictions. I think some regularization would make sense (which is why I'm a bit interested in this GDTW paper).
Ah, I recently read some paper where they did more or less that. I think they manually aligned the peaks of the heartbeats first and then used some radius-limited or diamond-limited DTW distance. I guess it doesn't make much sense comparing two series with different numbers of heartbeats? It feels like restricting the warping path, at least enough that two beats cannot be matched to one etc., could make sense, and speed things up quite significantly.
So heartbeats is probably ECG (electrocardiogram); I'm looking at EEGs (electroencephalograms), so brain instead of heart, and unfortunately there are no heartbeats to align. But the segments of recording I'm looking at currently are quite short (like my benchmark example) and centered on features, so it's more about matching up features, and I'm not sure restrictions on the path will give a big speedup (judging by the benchmarks above). But such restrictions might help get better matches. I'm also thinking about underlying distances other than Euclidean; I feel like a (time-domain) optimal transport distance with the right cost matrix might be more meaningful.
Ah, my bad, I often confuse the two ^^ One issue I have is with signals that are not really measures: I have to perform some normalization before applying OT. If it's just normalizing the mass I can live with that, or apply unbalanced transport, but if the signals vary around 0, one would have to offset the signals to be positive, and then possibly normalize, and I feel like this is a much bigger hack. My application is spectrograms, which are positive, but not when looked at in log-power. For such signals, I have to decide on a "floor" with which I offset the spectrogram, truncating to 0 below it. I don't know much about EEG, but I can imagine you would face similar issues here?
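The floor-offset-truncate-normalize step described here is simple to write down; the function and its floor argument below are hypothetical names, just to make the "hack" concrete:

```python
import numpy as np

def to_measure(x, floor):
    """Turn a (possibly negative) signal into a probability vector:
    offset by a chosen floor, truncate below the floor to zero,
    then normalize the total mass to 1."""
    m = np.maximum(np.asarray(x, dtype=float) - floor, 0.0)
    total = m.sum()
    if total == 0.0:
        raise ValueError("signal is entirely at or below the floor")
    return m / total
```

The choice of floor is the uncomfortable part: it changes both the support and the relative mass of the resulting measure, so two different floors give genuinely different OT distances.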
Thanks for the links! Yes, I have that issue too; the signals are zero-centered. So maybe OT isn't really the right approach...
Hi @baggepinnen,
I came across this paper today, https://arxiv.org/abs/1905.12893, and thought it was quite nice; it seemed like a sensible framework for introducing regularization similar to your transportcost parameter. I'm also not quite sure how their "time centered mean" corresponds to your barycenters. They also claim 50x speedups compared to the python FastDTW package, but I'm not sure it would be faster than your code here. I thought I'd mention it here in case it was interesting to you as well.

Cheers,
Eric