The beta-binomial mixture model for word frequencies in documents with applications to information retrieval

EUROSPEECH(1999)

Abstract
This paper describes a continuous-mixture statistical model for word occurrence frequencies in documents and applies that model to the DARPA-sponsored TDT topic identification tasks (1). The model was originally proposed in 1990 by L. Gillick (2) as a way to account for variation in word frequencies across documents more accurately than the binomial model does. This paper develops the model further mathematically, leading to the implementation of a topic-tracking system. Performance results for this system on the Tracking Task in the December 1998 DARPA TDT Evaluation are reported and compared with those of Dragon's existing, more complex multinomial-model-based system. (Results from other systems applied to this task are available in (3).) The paper concludes with plans for further development.
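As a rough illustration of why a continuous mixture can capture cross-document variation better than a plain binomial, the sketch below compares the two distributions at the same mean word rate. It uses the standard textbook beta-binomial form (a binomial whose success probability is drawn from a Beta(alpha, beta) distribution); the abstract does not give the paper's parameterization, so the function names, the document length n, and the values of alpha and beta are all assumptions chosen for illustration only.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def binomial_pmf(k, n, p):
    """P(word occurs k times in n tokens) with one fixed rate p for all documents."""
    return exp(log_choose(n, k) + k * log(p) + (n - k) * log(1 - p))


def beta_binomial_pmf(k, n, alpha, beta):
    """P(word occurs k times in n tokens) when each document's rate is Beta(alpha, beta)."""
    return exp(log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def mean_and_variance(pmf, n):
    """Mean and variance of a count distribution supported on 0..n."""
    mean = sum(k * pmf(k) for k in range(n + 1))
    var = sum((k - mean) ** 2 * pmf(k) for k in range(n + 1))
    return mean, var


# Hypothetical values: a 200-token document and a word with average rate 0.01.
n = 200
alpha, beta = 0.5, 49.5
p = alpha / (alpha + beta)  # matching fixed rate for the binomial

bin_mean, bin_var = mean_and_variance(lambda k: binomial_pmf(k, n, p), n)
bb_mean, bb_var = mean_and_variance(lambda k: beta_binomial_pmf(k, n, alpha, beta), n)

print(f"binomial:      mean={bin_mean:.3f} variance={bin_var:.3f}")
print(f"beta-binomial: mean={bb_mean:.3f} variance={bb_var:.3f}")
```

Both distributions have the same expected count, but the beta-binomial spreads its probability more widely, which is the kind of extra document-to-document variation a single-rate binomial cannot represent.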
Keywords
statistical model, tracking system, word frequency, information retrieval, mixture model, binomial model