
A General Optimization Framework for Dynamic Time Warping #16

Closed
ericphanson opened this issue May 13, 2020 · 13 comments · Fixed by #22

Comments

@ericphanson
Contributor

Hi @baggepinnen,

I came across this paper today, https://arxiv.org/abs/1905.12893, and thought it was quite nice; it seems like a sensible framework for introducing regularization, similar to your transportcost parameter. I'm also not quite sure how their "time centered mean" corresponds to your barycenters. They also claim 50x speedups compared to the Python FastDTW package, but I'm not sure it would be faster than your code here. I thought I'd mention it here in case it's interesting to you as well.
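For concreteness, here's a rough sketch of one way such a regularization can enter the classic DTW recursion: a made-up multiplicative penalty λ on off-diagonal steps. This is purely illustrative, not their formulation or the exact semantics of transportcost, and dtw_penalized is a name I just made up:

# Sketch: DTW between two vectors, with a penalty λ ≥ 1 applied to the
# accumulated cost whenever the path takes a non-diagonal step.
# Illustrative only; `dtw_penalized` and `λ` are made up for this example.
function dtw_penalized(x::AbstractVector, y::AbstractVector; λ = 1.0)
    n, m = length(x), length(y)
    D = fill(Inf, n, m)
    D[1, 1] = abs2(x[1] - y[1])
    for j in 1:m, i in 1:n
        (i, j) == (1, 1) && continue
        best = Inf
        i > 1 && j > 1 && (best = min(best, D[i-1, j-1]))  # diagonal step
        i > 1 && (best = min(best, λ * D[i-1, j]))         # vertical step
        j > 1 && (best = min(best, λ * D[i, j-1]))         # horizontal step
        D[i, j] = abs2(x[i] - y[j]) + best
    end
    return D[n, m]
end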

Cheers,
Eric

@ericphanson
Contributor Author

Oh, I just saw the code doesn't exist... https://github.com/dderiso/gdtw 😢

@baggepinnen
Owner

Thanks for the reference Eric, I had not seen it before (I had never looked at DTW until a few weeks ago). Their approach is probably more principled than transportcost, which is a completely ad hoc addition, but it seems to do what I wanted 👼

The barycenter algorithm in this package comes from the authors of the repo I forked and is the simplest DTW barycenter algorithm; it's known to produce weird barycenters (but it's reasonably fast). I'll have a look at the article you linked and see if some of their stuff would fit nicely in here :)
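For reference, here's a minimal sketch of that naive scheme as I understand it (warp every series onto a reference and average whatever gets matched to each reference index). This is just to illustrate the idea, not necessarily this package's exact implementation; it assumes dtw returns the cost plus the two index sequences of the warping path, and naive_barycenter is a made-up name:

using DynamicAxisWarping, Distances

# Naive DTW barycenter sketch: align each series to a reference, then
# average everything matched to each reference index.
function naive_barycenter(series::Vector{<:AbstractVector})
    ref = series[1]                  # arbitrary choice of reference
    sums = float.(ref)               # reference contributes once to each index
    counts = ones(length(ref))
    for s in Iterators.drop(series, 1)
        _, i1, i2 = dtw(ref, s, SqEuclidean())
        for (i, j) in zip(i1, i2)    # walk the warping path
            sums[i] += s[j]
            counts[i] += 1
        end
    end
    return sums ./ counts
end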

@ericphanson
Contributor Author

I've only recently gotten into DTW too.

I wanted to do a comparison to see if a 50x speedup was a good sign, so I did some simple benchmarks:

julia> using PyCall, DynamicAxisWarping, BenchmarkTools, Distances

julia> const FASTDTW = pyimport_conda("fastdtw", "fastdtw")
PyObject <module 'fastdtw' from '/home/eric/.julia/conda/3/lib/python3.7/site-packages/fastdtw/__init__.py'>

julia> x = rand(20, 100);

julia> y = rand(20, 100);

julia> r = 100
100

julia> xp = permutedims(x);

julia> yp = permutedims(y);

julia> @btime DynamicAxisWarping.dtw($x,$y, $(SqEuclidean()));
  88.584 μs (11 allocations: 85.33 KiB)

julia> @btime DynamicAxisWarping.fastdtw($x,$y, $(SqEuclidean()), $r);
  88.693 μs (11 allocations: 85.33 KiB)

julia> @btime DynamicAxisWarping.dtw_cost($x,$y, $(SqEuclidean()), $r, s1 = $(zeros(2r+1)), s2 = $(zeros(2r+1)));
  128.094 μs (0 allocations: 0 bytes)

julia> @btime FASTDTW.fastdtw($xp, $yp; radius = $r, dist=$(SqEuclidean()));
  139.339 ms (861612 allocations: 36.06 MiB)

julia> r = 20
20

julia> @btime DynamicAxisWarping.fastdtw($x,$y, $(SqEuclidean()), $r);
  280.930 μs (213 allocations: 179.83 KiB)

julia> @btime DynamicAxisWarping.dtw_cost($x,$y, $(SqEuclidean()), $r, s1 = $(zeros(2r+1)), s2 = $(zeros(2r+1)));
  47.091 μs (0 allocations: 0 bytes)

julia> @btime FASTDTW.fastdtw($xp, $yp; radius = $r, dist=$(SqEuclidean()));
  256.915 ms (1104904 allocations: 46.42 MiB)

A couple of things stood out to me. First, we're seeing three orders of magnitude of performance improvement in DynamicAxisWarping.jl over the Python fastdtw! But there are also a ton of allocations on the Python side; I wonder if I'm not calling it correctly? (The permutedims calls are to align with numpy's row-majorness, and are needed to get the same answers.) My understanding is that PyCall shares memory between Julia arrays and numpy arrays, so it should be pretty efficient, I'd think.

Second, I see worse performance in both DynamicAxisWarping.fastdtw and FASTDTW.fastdtw with a smaller radius. Is that an algorithmic problem? It seems surprising, since I thought the algorithm should scale like n*r.

Third, DynamicAxisWarping.dtw_cost is slower than DynamicAxisWarping.dtw at r = 100; that seemed strange to me too. (Also, DynamicAxisWarping.dtw doesn't accept a radius argument?)

Anyway, if the three orders of magnitude of performance gains over the Python package are correct, then a mere 50x speedup suggests their implementation is actually quite slow... However, the algorithm could still be fast when implemented well.

@baggepinnen
Owner


Oj, thanks for the benchmarks, nice to see we're doing well :)
A few things come to mind

  • I know very little about calling Python from Julia, but I've also understood that memory should be shared, not copied. Python code is seldom written with allocations in mind, so I'm not surprised their code allocates. For the code in this package, I've made sure it's possible to allocate almost nothing. It might also be that the Python package is just very slow; see slaypni/fastdtw#38.
  • FastDTW can be faster than vanilla DTW, but not always; see https://arxiv.org/pdf/2003.11246.pdf for an analysis. I haven't spent much time on the fastdtw implementation, so it's possible it can be improved.
  • dtw_cost should be faster than dtw if the radius is small in relation to the length of the series. If the radius is comparable to the length of the series, I'm not surprised if dtw is as fast or faster, because it uses SIMD to calculate all distances in a batch very efficiently (for >1 dim, like your example, BLAS is used). If you need a very large radius, you might actually need a sliding distance à la dtwnn.
  • dtw is much more flexible than dtw_cost and accepts arbitrary bounds on the warping path. To construct simple radius bounds, try
imin, imax = radiuslimits(20, 100, 100); plot([imin imax])
dtw(x, y, dist, imin, imax) # cost equivalent to dtw_cost(x, y, dist, 20)

Note also that some implementations normalize the input series before calculating the distance; by default, we do not do that here.
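If you want behaviour comparable to such implementations, you can normalize the series yourself before calling dtw. A minimal sketch for vector-valued series (znorm is just a helper defined here, not a package function):

using Statistics, DynamicAxisWarping, Distances

znorm(x) = (x .- mean(x)) ./ std(x)  # zero mean, unit variance

x, y = randn(100), randn(100)
cost, i1, i2 = dtw(znorm(x), znorm(y), SqEuclidean())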

@ericphanson
Contributor Author

Ah, thanks very much for the clarifications. I didn't realize that difference with dtw, but it makes sense; I see it uses the pairwise methods defined in Distances. And great job on the performance, btw!

Btw, should

Distances.pairwise(d::PreMetric, s1::AbstractVector, s2::AbstractVector; dims=2) = evaluate.(Ref(d), s1, s2')
have a transpose instead of an adjoint? Probably complex-valued data isn't super common, but you never know what'll be in an AbstractVector :).
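(The difference only shows up for complex eltypes; a quick illustration:)

s = [1 + 2im]
s'            # adjoint: row vector containing 1 - 2im (conjugated)
transpose(s)  # transpose: row vector containing 1 + 2im (unchanged)

So with s2', the broadcasted evaluate would see conjugated elements of s2.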

@baggepinnen
Owner

Thanks :)

have a transpose instead of an adjoint?

Good catch! I'm not often a complex kind-of guy, but we should definitely do the right thing here.

Probably complex-valued data isn't super common

True, but I like to think about it like this: If you do have an exotic number type you are probably going to look at a Julia implementation, because what options do you really have? 😄

@ericphanson
Contributor Author

True :). I've run into this exact issue before as well (on Convex.jl, which gained complex-number support during a GSoC a couple of years ago but missed an adjoint somewhere, leading to lots of confusing bugs). It looks like there are some extra adjoints in Distances.jl too; I'll try to file an issue or PR next week.

By the way, I've been using the following locally for fast zero-allocation squared-Euclidean DTW, modified from this package and Distances.jl, for computing DTWs between a bunch of time series that are all the same size. It's a bit faster than dtw while still using BLAS. Putting caches in the distance object feels like a nice pattern, since you can just create as many distance objects as you have threads to run it multithreaded (see the usage sketch after the code). I'm not sure it's worth putting in DynamicAxisWarping though, since you might not want to have to make a new distance object for each length of time series. (I've also only tested it on matrices.) But I figured I'd put it here in case it's useful.

using LinearAlgebra
using Distances

lastlength(x) = size(x, ndims(x))

# allocation-free version of `out = sum(abs2, A, dims=1)`
function sum_abs2_cols!(out, A)
    @inbounds for j = 1:size(A, 2)
        outj = zero(eltype(out))
        for i = 1:size(A, 1)
            outj += abs2(A[i, j])
        end
        out[j] = outj
    end
    nothing
end

function pairwise_sq_euclidean!(r::AbstractMatrix, sa2::AbstractVector, sb2::AbstractVector,
    a::AbstractMatrix, b::AbstractMatrix)
    mul!(r, transpose(a), b)
    sum_abs2_cols!(sa2, a)
    sum_abs2_cols!(sb2, b)
    for j = 1:size(r, 2)
        sb = sb2[j]
        @simd for i = 1:size(r, 1)
            @inbounds r[i, j] = max(sa2[i] + sb - 2 * r[i, j], 0)
        end
    end
    nothing
end

function dtw_cost_matrix!(D::AbstractMatrix{T}, sa2::AbstractVector{T}, sb2::AbstractVector{T}, seq1::AbstractArray{T}, seq2::AbstractArray{T}) where T
    # Build the cost matrix
    m = lastlength(seq2)
    n = lastlength(seq1)

    # Fill D with the pairwise squared Euclidean costs
    pairwise_sq_euclidean!(D, sa2, sb2, seq2, seq1)
    # (equivalent to `pairwise!(D, SqEuclidean(), seq2, seq1, dims=2)`)
    @assert size(D) == (m, n)

    # Initialize first column and first row cumulatively
    @inbounds for r=2:m
        D[r,1] += D[r-1,1]
    end
    @inbounds for c=2:n
        D[1,c] += D[1,c-1]
    end

    # Complete the cost matrix
    @inbounds for c = 2:n
        for r = 2:m
            D[r, c] += min(D[r-1, c], D[r-1, c-1], D[r, c-1])
        end
    end

    nothing
end

# Holds caches
struct DTW{T}  <: Distances.SemiMetric
    D::Matrix{T}
    sa2::Vector{T}
    sb2::Vector{T}
end

DTW{T}(n) where {T} = DTW{T}(zeros(T, n, n), zeros(T, n), zeros(T, n))

DTW(n) = DTW{Float32}(n)

function (dist::DTW)(a,b)
    dtw_cost_matrix!(dist.D, dist.sa2, dist.sb2, a, b)
    dist.D[end,end]
end
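A hypothetical usage sketch of that per-thread-cache pattern (the series, sizes, and names here are made up, and the usual caveats about indexing shared state by threadid apply):

using Base.Threads

series = [rand(Float32, 20, 100) for _ in 1:1000]
query = rand(Float32, 20, 100)

# One distance object, and hence one set of caches, per thread.
dists = [DTW(100) for _ in 1:nthreads()]
costs = zeros(Float32, length(series))
@threads for i in eachindex(series)
    costs[i] = dists[threadid()](series[i], query)
end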

@baggepinnen
Owner

This code appears to be some 10% faster when run single-threaded, but I can imagine it gets quite a bit faster when run with multiple threads, since the allocations go away. I'm curious though: what application are you using this on that doesn't warrant any restrictions at all on the warping path?

@ericphanson
Contributor Author

Good question. I'm not really sure the application (comparing spikes in EEG signals) warrants no restrictions at all, but I want to try out different methods and figured I'd start without restrictions. I think some regularization would make sense (which is why I'm a bit interested in this GDTW paper).

@baggepinnen
Owner

Ah, I recently read some paper where they did more or less that. I think they manually aligned the peaks of the heartbeats first and then used some radius-limited or diamond-limited DTW distance. I guess it doesn't make much sense to compare two series with different numbers of heartbeats? It feels like restricting the warping path, at least enough that two beats cannot be matched to one etc., could make sense, and could speed things up quite significantly.

@ericphanson
Contributor Author

So heartbeats is probably ECG (electrocardiogram); I'm looking at EEGs (electroencephalograms), so brain instead of heart, and unfortunately there are no heartbeats to align. But the segments of recording I'm looking at currently are quite short (like my benchmark example) and centered on features, so it's more about matching up features, and I'm not sure restrictions on the path will give a big speedup (judging by the benchmarks above). But such restrictions might help get better matches. I'm also thinking about underlying distances other than Euclidean; I feel like a (time-domain) optimal transport distance with the right cost matrix might be more meaningful.

@baggepinnen
Owner

Ah, my bad, I often confuse the two ^^
I have experimented a bit with combining DTW and OT; here's an example, and here's another one in the form of a notebook.

One issue I have is with signals that are not really measures: I have to perform some normalization before applying OT. If it's just normalizing the mass I can live with that, or apply unbalanced transport, but if the signals vary around 0, one has to offset the signals to be positive and then possibly normalize, and that feels like a much bigger hack. My application is spectrograms, which are positive, but not when viewed in log-power. For such signals, I have to decide on a "floor" by which I offset the spectrogram, truncating to 0 below it.
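Concretely, the preprocessing I have in mind looks something like this (the floor value is an arbitrary choice, and S stands in for a log-power spectrogram matrix):

floor_db = -60.0                    # arbitrary floor
S2 = max.(S, floor_db) .- floor_db  # truncate below the floor, shift to ≥ 0
S2 ./= sum(S2)                      # normalize total mass for OT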

I don't know much about EEG, but I can imagine you would face similar issues here?

@ericphanson
Contributor Author

Thanks for the links! Yes, I have that issue too, they are zero-centered. So maybe OT isn't really the right approach...
