Scalable Time Series Compound Infrastructure

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022

Abstract
Objects ranging from a patient's history of medical tests to an IoT device's series of sensor maintenance records leave digital traces in the form of big time series. These time series objects not only span exceedingly long time periods (sometimes years), but are also characterized by intermittent yet interrelated time series measurements punctuated by long gaps of silence. This prevalent data type, which we refer to as Time Series Compound (TSC) objects, has been largely overlooked in the literature. Unique challenges arise when managing, querying, and analyzing repositories of these big TSC objects. These include appropriate similarity semantics that are resilient to time misalignment, efficient storage of excessively long and complex objects, and TSC-holistic indexing. We demonstrate that state-of-the-art time series systems, although effective at indexing and searching regular time series data, fail to support such big TSC data. In this work, we introduce the first comprehensive solution for managing TSC objects as first-class citizens. We introduce new similarity-match semantics as well as a compact misalignment-resilient representation for TSCs. Upon this foundation, we then design Sloth, a TSC-aware distributed indexing infrastructure that supports scalable storage, indexing, and querying of TB-scale TSC datasets. Our experimental study demonstrates that for TB-scale datasets, the query response time of Sloth is up to one order of magnitude faster than that of existing systems, while the mean average precision (mAP) of Sloth's approximate kNN similarity-match query results is 70% higher than that of existing solutions.
Keywords
Time Series Compound, Distributed Indexing, Similarity Search, kNN Approximate Query, Sloth
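
The abstract reports accuracy as mean average precision (mAP) over approximate kNN query results. The abstract does not give the exact formula used in the paper, so the following is only a minimal sketch of one common definition: each approximate answer list is scored against the exact kNN set for the same query, and the per-query scores are averaged. The function names `average_precision` and `mean_average_precision` are hypothetical and are not part of Sloth.

```python
from typing import Hashable, Sequence


def average_precision(approx: Sequence[Hashable], exact: Sequence[Hashable]) -> float:
    """Average precision of one ranked approximate answer list.

    Assumption: an approximate answer counts as a hit if it appears in the
    exact kNN set, and the approximate list contains no duplicates.
    """
    relevant = set(exact)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(approx, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, accumulated only at ranks that are hits.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    approx_lists: Sequence[Sequence[Hashable]],
    exact_lists: Sequence[Sequence[Hashable]],
) -> float:
    """Mean of the per-query average precision over a query workload."""
    if len(approx_lists) != len(exact_lists):
        raise ValueError("need one exact answer list per approximate answer list")
    if not approx_lists:
        return 0.0
    total = sum(average_precision(a, e) for a, e in zip(approx_lists, exact_lists))
    return total / len(approx_lists)


if __name__ == "__main__":
    # Two toy queries with k = 3 and k = 2; identifiers are made up.
    approx = [["a", "x", "b"], ["p", "q"]]
    exact = [["a", "b", "c"], ["p", "q"]]
    print(mean_average_precision(approx, exact))
```

Under this definition, a query whose approximate answers exactly match the true k nearest neighbors scores 1.0, and missing or misplaced neighbors lower the score.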