High Dimensional Data Stream Clustering using Topological Representation Learning.

SSCI (2022)

Abstract
Because of the high dimensionality of the data, storing the entire data set during stream processing is impractical. Therefore, only a summary of the input stream is maintained, which requires specialized data structures that support incremental summarization of the stream. The problem becomes more complex for high-dimensional text data because of its high sparsity. In this paper, we propose a new topological unsupervised learning approach for high-dimensional text data streams. The proposed method simultaneously learns a representation of the stream and clusters the data in a lower-dimensional space. The proposed approach, OTTC (Online Topological Text Clustering), is evaluated and compared with state-of-the-art methods using MOA (Massive Online Analysis), an open-source benchmarking framework for evolving data streams. The proposed approach outperforms the classical methods, and the results are very promising for clustering high-dimensional text data streams.
Keywords
Data stream, Clustering, High-dimensional data
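
The incremental summarization mentioned in the abstract can be illustrated with a generic cluster-feature summary of the kind used by classical stream clustering methods. This is a minimal sketch and not the OTTC data structure, which the abstract does not describe; the class name `ClusterFeature` and its methods are hypothetical names chosen for illustration. It shows how a count, a linear sum, and a squared sum can be updated one point at a time without storing the points themselves.

```python
import math


class ClusterFeature:
    """Additive summary (count, linear sum, squared sum) of a group of points.

    Hypothetical illustration of incremental summarization; not the OTTC method.
    """

    def __init__(self, dim):
        self.n = 0                # number of points absorbed so far
        self.ls = [0.0] * dim     # per-dimension linear sum
        self.ss = [0.0] * dim     # per-dimension sum of squares

    def insert(self, x):
        # Update the summary in O(dim) time; the point itself is discarded.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root of the summed per-dimension variance, computed from the sums only.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


cf = ClusterFeature(dim=2)
for point in [(1.0, 2.0), (3.0, 4.0)]:
    cf.insert(point)
```

Because the three statistics are additive, two such summaries could also be merged by summing their fields, which is what makes this kind of structure suitable for streams. For sparse text vectors, a dictionary keyed by term index would likely be a better fit than dense lists.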