Update paper.md
method sub-headings
beckyperriment authored Mar 11, 2024
1 parent 2b4fd83 commit 36611c1
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions joss/paper.md
@@ -52,6 +52,8 @@ The current functionality of the software is as follows:

# Mathematical background

## Dynamic time warping

Consider a time series to be a vector of some arbitrary length, and suppose we have $p$ such vectors in total, each possibly differing in length. To find $k$ clusters within the set of $p$ vectors using a MIP formulation, we must first make ${p \choose 2} = \frac{1}{2}p(p-1)$ pairwise comparisons between all vectors within the total set and find the 'similarity' between each pair. In this case, the similarity is defined as the DTW distance. Consider two time series $x$ and $y$ of differing lengths $n$ and $m$ respectively,

$$
@@ -83,6 +85,8 @@ Finding the optimal warping arrangement is an optimisation problem that can be s

The final element $c_{n,m}$ is then the total cost, $C_{x,y}$, which provides the comparison metric between the two series $x$ and $y$. \autoref{fig:warping_signals} shows an example of this cost matrix $C$ and the warping path through it.

## Clustering

For the clustering problem, only this final cost for each pairwise comparison is required; the actual warping path (or mapping of each point in one time series to the other) is superfluous for k-medoids clustering. The memory complexity of the cost matrix $C$ is $O(nm)$, so the memory required grows with the product of the two series lengths. Therefore, significant reductions in memory can be made by not storing the entire $C$ matrix. When the warping path is not required, only a vector containing the previous row needs to be kept at each step of the dynamic programming sub-problem, because each new element depends only on the three neighbouring values $c_{i-1,j-1}$, $c_{i-1,j}$ and $c_{i,j-1}$, as indicated in \autoref{c}.
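
As an illustration only, the row-by-row evaluation described above could be sketched in Python as follows. This is not the software's implementation: the function name `dtw_cost` is hypothetical, and the local cost is assumed here to be the squared difference $(x_i - y_j)^2$, which may differ from the definition used elsewhere in the paper.

```python
def dtw_cost(x, y):
    """Return the DTW cost between sequences x and y using O(m) memory.

    Assumption: the local cost between two points is their squared
    difference. Only the previous and current rows of the cost matrix
    are kept, so the warping path cannot be recovered afterwards.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    inf = float("inf")
    # prev[j] holds c_{i-1,j}; index 0 is a padding column.
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # allows the path to start at c_{1,1}

    for i in range(1, n + 1):
        curr = [inf] * (m + 1)  # curr[j] holds c_{i,j}
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            # The three neighbours: c_{i-1,j-1}, c_{i-1,j}, c_{i,j-1}.
            curr[j] = local + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr

    return prev[m]  # final element c_{n,m}
```

Under these assumptions, two series that differ only by a repeated point, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have a cost of zero, since the warping absorbs the repetition.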

The DTW distance $C_{x,y}$ is found for each pairwise comparison. As shown in \autoref{fig:c_to_d}, pairwise distances are then stored in a separate symmetric matrix, $D^{p\times p}$, where $p$ is the total number of time series in the clustering exercise. In other words, the element $d_{i,j}$ gives the distance between time series $i$ and $j$.
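
A minimal sketch of how such a matrix could be assembled is given below, again in Python and purely for illustration. The names `dtw_cost` and `distance_matrix` are hypothetical, and the squared-difference local cost is an assumption. Each unordered pair is evaluated once and mirrored across the diagonal, which exploits the symmetry of $D$.

```python
from itertools import combinations


def dtw_cost(x, y):
    """DTW cost with a rolling row (assumed squared-difference local cost)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(y)
    for xi in x:
        curr = [inf] * (len(y) + 1)
        for j, yj in enumerate(y, start=1):
            curr[j] = (xi - yj) ** 2 + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def distance_matrix(series):
    """Build the symmetric p-by-p matrix of pairwise DTW costs."""
    p = len(series)
    d = [[0.0] * p for _ in range(p)]  # diagonal stays zero
    # Visit each unordered pair (i, j) with i < j exactly once.
    for i, j in combinations(range(p), 2):
        d[i][j] = d[j][i] = dtw_cost(series[i], series[j])
    return d


series = [[1, 2, 3], [1, 2, 2, 3], [0, 0]]
D = distance_matrix(series)
```

Because the pairs are independent of one another, this loop is also a natural candidate for parallel execution, although the sketch above runs serially.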