Dynamic Time Warping (DTW)
Date: February 24, 2025 3:57 PM
Created by Kaley Joss, 08.15.2024
Sources: https://www.cs.unm.edu/~mueen/DTW.pdf
Contents
Dynamic Time Warping (DTW)
How to write it accurately, and efficiently.
Why use DTW?
Comparing raw time series with DTW almost always wins over attempting to create feature vectors that characterize the data
Why efficiency?
- Efficiency: Using all the “tricks” shown in this tutorial, DTW can search one billion subsequences in under two minutes. A naïve, off-the-shelf recursion-based DTW implementation would take roughly six years.
- Effectiveness: While DTW is quite robust, there are some “silly” things that can cripple its effectiveness: not z-normalizing, setting the wrong warping window, or enforcing the endpoint constraint on certain datasets.
How is DTW Calculated?
- For two 1-D time series Q and C, of lengths len(Q) and len(C), DTW builds a matrix of size len(Q) × len(C) and fills it with the Euclidean (or some other) distance between every pair of points, not just the pairs at the same time index.
- Then a warping path that touches every time point of both series is found through this distance matrix, chosen so that its total summed distance is the least possible.
Visualization:
Functions to achieve this:
Recursive Function
“Starting at $(i, j) = (0, 0)$, search all possible ‘steps’ to the adjoining cells of the matrix, offsets $(1,1)$, $(1,0)$, $(0,1)$, and take the one with the least distance.”
- Using a warping constraint $w$, the step costs can be weighted toward the diagonal step $(1,1)$ to discourage ‘warping’ away from time-locked movement (a sketch of the recursion follows this list)
- In data mining, ‘seems to make little difference’?
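A minimal sketch of that recursion in Python (illustrative names, not the tutorial’s code), using squared Euclidean point distance and memoization so the toy example finishes quickly; the cumulative cost of cell $(i, j)$ is the local distance between $q_i$ and $c_j$ plus the cheapest of the three adjoining cells:

```python
import math
from functools import lru_cache

def dtw_recursive(q, c):
    """Naive recursive DTW between two 1-D sequences q and c."""
    @lru_cache(maxsize=None)
    def d(i, j):
        dist = (q[i] - c[j]) ** 2          # local (squared Euclidean) distance
        if i == 0 and j == 0:
            return dist
        if i == 0:
            return dist + d(0, j - 1)
        if j == 0:
            return dist + d(i - 1, 0)
        return dist + min(d(i - 1, j - 1),  # diagonal step
                          d(i - 1, j),      # advance in q only
                          d(i, j - 1))      # advance in c only
    return math.sqrt(d(len(q) - 1, len(c) - 1))

# Example: two similar shapes, one shifted in time
print(dtw_recursive([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))
```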
Iterative Function
- Equivalent to recursive function, no logical difference
- Orders of magnitude faster
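A sketch of the equivalent iterative version: the same recurrence, but filled bottom-up into a cumulative-cost table with NumPy (names and the squared-distance choice are mine):

```python
import numpy as np

def dtw_iterative(q, c):
    """Iterative DTW: same recurrence as the recursive sketch, filled bottom-up."""
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative cost, padded with an inf border
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = dist + min(D[i - 1, j - 1],  # diagonal
                                 D[i - 1, j],      # advance in q only
                                 D[i, j - 1])      # advance in c only
    return np.sqrt(D[n, m])

print(dtw_iterative([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))
```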
Choices to Consider when Running DTW
DTW: Time and Space Complexity
- The “off-the-shelf” DTW has
- a time complexity of $O(n^2)$ (with a large constant factor)
- a space complexity of $O(n^2)$
- An efficient DTW implementation can have (see the sketch after this list)
- a space complexity of $O(n)$
- an amortized time complexity of $O(n)$
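One way the $O(n)$ space bound can be realized (when only the distance, not the path, is needed): each row of the cumulative-cost table depends only on the row before it, so two rows suffice. A sketch, not the tutorial’s code:

```python
import numpy as np

def dtw_linear_space(q, c):
    """DTW distance using only two rows of the cumulative-cost table (O(n) space)."""
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    prev = np.full(len(c) + 1, np.inf)   # row i-1 of the cumulative-cost table
    prev[0] = 0.0
    for i in range(1, len(q) + 1):
        curr = np.full(len(c) + 1, np.inf)
        for j in range(1, len(c) + 1):
            dist = (q[i - 1] - c[j - 1]) ** 2
            curr[j] = dist + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    return np.sqrt(prev[-1])

print(dtw_linear_space([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))
```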
The Warping Constraint $w$
- The value of w is the maximum amount the warping path is allowed to deviate from the diagonal.
- It is normally expressed as the ratio $w = r/n$ (or as a percentage), where $r$ is the maximum allowed deviation in time steps and $n$ is the series length
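A sketch of the constraint in code, assuming $w$ is given as the $r/n$ ratio above; only cells within $r$ of the diagonal (a Sakoe-Chiba-style band, widened just enough to cope with unequal lengths) are evaluated:

```python
import numpy as np

def dtw_windowed(q, c, w=0.1):
    """DTW restricted to a band of half-width r = w * n around the diagonal."""
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    n, m = len(q), len(c)
    r = max(int(w * max(n, m)), abs(n - m))   # band must be wide enough to reach the corner
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            dist = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = dist + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return np.sqrt(D[n, m])

# w = 0 degenerates to Euclidean distance (for equal lengths); a large w recovers unconstrained DTW
print(dtw_windowed([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], w=0.2))
```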
Impact of w:
- The value chosen for w greatly affects accuracy and speed
- The value also is important for clustering
- No warping ($w = 0$) = time-locked comparison; series with the same shape at different times will not cluster together
- Unlimited warping = clusters found by shape alone, independent of time
- Optimal warping = clusters by both time and shape, with alignment allowed only within limited time ranges
How to choose $w$?
- Can test every value of $w$ from $w = 0$ to $w = 20$ and measure accuracy (a sweep like this is sketched after this list)
- Different datasets have different optimal values or ranges of $w$
- The size of the dataset matters: with more data, the error rate should decrease and the optimal value of $w$ should get smaller
- as the dataset gets bigger, DTW converges toward Euclidean distance (the optimal window shrinks toward $w = 0$)
- though DTW would probably still be faster
- Best value for classification (shortest distance between neighbor and exemplar) may not be the best value for clustering (mutual distance of group)
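One way the sweep from the first bullet might look; it reuses the `dtw_windowed` sketch from the warping-constraint section and a made-up two-class toy dataset, so treat it as an illustration of the procedure rather than a result:

```python
import numpy as np
# Assumes dtw_windowed() from the banded-DTW sketch above is in scope.

def loo_1nn_error(X, y, w):
    """Leave-one-out 1-NN classification error under warping window w."""
    errors = 0
    for i, q in enumerate(X):
        dists = [dtw_windowed(q, c, w) if j != i else np.inf
                 for j, c in enumerate(X)]
        errors += y[int(np.argmin(dists))] != y[i]
    return errors / len(X)

# Toy two-class dataset: randomly shifted bumps vs. noisy ramps (made up for illustration)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 40)
bumps = [np.exp(-(t - rng.uniform(0.3, 0.7)) ** 2 / 0.01) + rng.normal(0, 0.05, t.size)
         for _ in range(8)]
ramps = [t + rng.normal(0, 0.05, t.size) for _ in range(8)]
X, y = bumps + ramps, [0] * 8 + [1] * 8

# Sweep w from 0% to 20% of the series length and keep the value with the lowest error
errors = {w / 100: loo_1nn_error(X, y, w / 100) for w in range(21)}
best_w = min(errors, key=errors.get)
print(best_w, errors[best_w])
```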
Endpoints and Unequal Lengths
Possibilities for using DTW on unequal lengths:
- Can compare them at unequal lengths
- Can cut off the longer one so they have equal lengths
- Could resample/interpolate them to have the same length (sketched below)
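A sketch of the resample/interpolate option, using linear interpolation (`np.interp`) to bring both series to a common length; the target length chosen here is arbitrary:

```python
import numpy as np

def resample_to_length(x, target_len):
    """Linearly interpolate a 1-D series onto target_len evenly spaced points."""
    x = np.asarray(x, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(x))
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.interp(new_t, old_t, x)

q = [0, 1, 3, 2, 0]            # length 5
c = [0, 0, 1, 2, 3, 2, 1, 0]   # length 8
target = max(len(q), len(c))   # e.g. stretch the shorter series to the longer length
q_rs = resample_to_length(q, target)
print(len(q_rs), len(c))       # 8 8 — now comparable with plain equal-length DTW
```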
Possible Fixes for Differently-Patterned Start/End
- Use a better extraction so that subsequences are more similar in their start/end point
- Change the boundary condition
Z-Normalization
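Z-normalizing each sequence (or each candidate subsequence during subsequence search) before computing DTW removes offset and amplitude differences, which is why skipping it is listed above as one of the things that cripples effectiveness. A minimal sketch:

```python
import numpy as np

def z_normalize(x, eps=1e-8):
    """Rescale a series to zero mean and unit standard deviation.

    Done per sequence (or per subsequence in subsequence search) so that
    offset and amplitude differences do not dominate the DTW distance.
    The eps guard avoids dividing by zero on constant segments.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)

# Two copies of the same shape at different offsets/scales:
a = z_normalize([10, 11, 13, 12, 10])
b = z_normalize([100, 110, 130, 120, 100])
print(np.allclose(a, b))  # True — after z-normalization the shapes match
```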
Multi-Dimensional DTW (MD DTW)
How to Compute MD DTW: 2 Ways
- Independent: compute the DTW score for each dimension independently, and sum the scores
- Best when the coupled variables are related, but with time lags or non-simultaneous impact
- Dependent: create a single distance matrix whose entries combine all dimensions of each pair of points, then find a single warping path and distance as usual
- Best when the coupled variables are impacted simultaneously by environmental/third-party factors
- If you’re not sure, you can always try both and compare the error rates (both are sketched below)
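A sketch of both strategies, assuming each series is a NumPy array of shape (time, dimensions) and squared Euclidean local distances; `dtw_independent` and `dtw_dependent` are illustrative names for the independent and dependent approaches:

```python
import numpy as np

def _dtw(cost):
    """Shared DP over a precomputed local-cost matrix (squared distances)."""
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return np.sqrt(D[n, m])

def dtw_independent(Q, C):
    """DTW_I: one warping path per dimension, per-dimension scores summed."""
    return sum(_dtw((Q[:, d, None] - C[None, :, d]) ** 2) for d in range(Q.shape[1]))

def dtw_dependent(Q, C):
    """DTW_D: one shared warping path over the multi-dimensional point distances."""
    cost = ((Q[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return _dtw(cost)

# Toy 2-D example (e.g. x/y acceleration), arrays of shape (time, dims)
Q = np.column_stack([np.sin(np.linspace(0, 3, 30)), np.cos(np.linspace(0, 3, 30))])
C = np.column_stack([np.sin(np.linspace(0.3, 3.3, 30)), np.cos(np.linspace(0.3, 3.3, 30))])
print(dtw_independent(Q, C), dtw_dependent(Q, C))
```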
How Many Dimensions?
- Could generalize to 1,000+ dimensions, but it is unlikely that more than 3 are useful: adding dimensions tends to reduce accuracy
- Example: dataset of inertial movement from 36 different places on someone’s body while they perform