Motiflets – Simple and Accurate Detection of Motifs in Time Series
arXiv (2022)
Abstract
Intuitively, a time series motif is a short time series that recurs in
approximately the same form within a larger time series. Such motifs often represent
concealed structures, such as heart beats in an ECG recording, the riff in a
pop song, or sleep spindles in EEG sleep data. Motif discovery (MD) is the task
of finding such motifs in a given input series. As there are varying
definitions of what exactly a motif is, a number of different algorithms exist.
As central parameters they all take the length l of the motif and the maximal
distance r between the motif's occurrences. In practice, however, suitable
values for r in particular are very hard to determine upfront, and the motifs
found vary widely even for very similar values of r. Accordingly, finding an
interesting motif requires extensive trial-and-error.
In this paper, we present a different approach to the MD problem. We define
k-Motiflets as the set of exactly k occurrences of a motif of length l, whose
maximum pairwise distance is minimal. This turns the MD problem upside-down:
The central parameter of our approach is not the distance threshold r, but the
desired number of occurrences k of the motif, which we show is considerably more
intuitive and easier to set. Based on this definition, we present exact and
approximate algorithms for finding k-Motiflets and analyze their complexity. To
further ease the use of our method, we describe statistical tools to
automatically determine meaningful values for its input parameters. By
evaluation on several real-world data sets and comparison to four state-of-the-art MD
algorithms, we show that our proposed algorithm is both quantitatively superior
to its competitors, finding larger motif sets at higher similarity, and
qualitatively better, leading to clearer and easier to interpret motifs without
any need for manual tuning.
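
To make the k-Motiflet definition concrete, the following Python sketch enumerates every set of k subsequences of length l and keeps the set whose maximum pairwise distance is smallest. It is a naive brute-force illustration of the definition only, not the exact or approximate algorithm presented in the paper. Several details are assumptions that the abstract does not state: the z-normalized Euclidean distance, the rule that occurrences must not overlap, and the names `znorm`, `brute_force_k_motiflet`, and `extent`.

```python
from itertools import combinations

import numpy as np


def znorm(x):
    """Z-normalize a subsequence (assumed distance preprocessing)."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def brute_force_k_motiflet(series, l, k):
    """Return (start_indices, extent) of the set of k non-overlapping
    length-l subsequences whose maximum pairwise distance is minimal.

    Exponential in k; only meant for tiny inputs. Requires k >= 2.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - l + 1  # number of candidate start positions
    subs = np.array([znorm(series[i:i + l]) for i in range(n)])

    # Full pairwise Euclidean distance matrix between subsequences.
    diff = subs[:, None, :] - subs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    best_set, best_extent = None, np.inf
    for cand in combinations(range(n), k):
        # Assumption: occurrences may not overlap. Start indices are
        # sorted, so checking neighbours is sufficient.
        if any(b - a < l for a, b in zip(cand, cand[1:])):
            continue
        # "Extent" here means the maximum pairwise distance of the set.
        extent = max(dist[i, j] for i, j in combinations(cand, 2))
        if extent < best_extent:
            best_set, best_extent = cand, extent
    return best_set, best_extent


# Usage: plant the same pattern three times in random noise.
rng = np.random.default_rng(0)
series = rng.normal(size=45)
pattern = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
for start in (5, 20, 35):
    series[start:start + 5] = pattern

motiflet, extent = brute_force_k_motiflet(series, l=5, k=3)
print(motiflet, extent)
```

Note that the only inputs are the motif length l and the number of occurrences k; no distance threshold r is supplied, and the maximum pairwise distance of the returned set falls out as a result rather than being chosen upfront.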