Summarizing Evolving Data Streams Using Dynamic Prefix Trees

WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (2007)

Abstract
In stream data mining it is important to use the most recent data to cope with the evolving nature of the underlying patterns. Simply keeping the most recent records offers no flexibility about which data is kept, and does not exploit even minimal redundancies in the data (a first step towards pattern discovery). This paper focuses on how to efficiently construct and maintain, in one pass, a compact summary for data such as web logs and text streams. The resulting structure is a prefix tree with an ordering criterion that changes over time, such as an activity time stamp or attribute frequency. A detailed analysis of the factors that affect its performance is carried out, including empirical evaluations using the well-known 20 Newsgroups data set. Guidelines for forgetting and tree pruning are also provided. Finally, we use this data structure to discover evolving topics from the 20 Newsgroups data set.
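
As a rough illustration of the idea of a prefix tree whose ordering criterion changes over time, the following minimal Python sketch inserts each record along a path ordered by the item frequencies observed so far. It is not the paper's algorithm: the class and method names (FrequencyOrderedPrefixTree, insert, size) are hypothetical, frequency is used as the only ordering criterion, and existing paths are never restructured, forgotten, or pruned.

```python
from collections import Counter


class Node:
    """One tree node: how many records passed through it, plus its children."""

    def __init__(self):
        self.count = 0
        self.children = {}


class FrequencyOrderedPrefixTree:
    """Hypothetical one-pass summary; an assumption, not the paper's structure."""

    def __init__(self):
        self.root = Node()
        self.freq = Counter()  # running item frequencies over the stream

    def insert(self, record):
        """Insert one record and return the item order that was used."""
        items = set(record)
        self.freq.update(items)
        # The ordering criterion depends on the stream seen so far:
        # most frequent item first, ties broken alphabetically.
        ordered = sorted(items, key=lambda item: (-self.freq[item], item))
        node = self.root
        for item in ordered:
            node = node.children.setdefault(item, Node())
            node.count += 1
        return ordered

    def size(self):
        """Number of nodes in the tree, excluding the root."""
        total = 0
        stack = [self.root]
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total


tree = FrequencyOrderedPrefixTree()
for record in (["web", "log"], ["web", "text"], ["text", "web", "stream"]):
    tree.insert(record)
```

In this sketch, records that share frequent items end up sharing a prefix, which is the redundancy a compact summary can exploit. Because the order is computed from the counts at insertion time, early records may sit on paths that a later ordering would not choose.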
Keywords
association rules, data mining, data structures, frequency, pattern recognition, navigation, clustering, web pages