Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams.

IPDPS(2023)

Cited 0|Views23
No score
Abstract
Streaming tensor factorization is an effective tool for unsupervised analysis of time-evolving sparse data, which emerge in many critical domains such as cybersecurity and trend analysis. In contrast to traditional tensors, time-evolving tensors demonstrate extreme sparsity and sparsity variation over time, resulting in irregular memory access and inefficient use of parallel computing resources. Additionally, due to the prohibitive cost of dynamically generating compressed sparse tensor formats, the state-of-the-art approaches process streaming tensors in a raw form that fails to capture data locality and suffers from high synchronization cost. To address these challenges, we propose a new dynamic tensor linearization framework that quickly encodes streaming multi-dimensional data on-the-fly in a compact representation, which has substantially lower memory usage and higher data reuse and parallelism than the original raw data. This is achieved by using a spatial sketching algorithm that keeps all incoming nonzero elements but remaps them into a tensor sketch with considerably reduced multi-dimensional image space. Moreover, we present a dynamic time slicing mechanism that uses variable-width time slices (instead of the traditional fixed-width) to balance the frequency of factor updates and the utilization of computing resources. We demonstrate the efficacy of our framework by accelerating two high-performance streaming tensor algorithms, namely, CP-stream and spCP-stream, and significantly improve their performance for a range of real-world streaming tensors. On a modern 56-core CPU, our framework achieves 10.3- 11x and 6.4- 7.2x geometric-mean speedup for the CP-stream and spCP-stream algorithms, respectively.
More
Translated text
Key words
Sparse tensors,tensor factorization,streaming data,multi-core CPU
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined