Searching Uncertain Data Represented by Non-axis Parallel Gaussian Mixture Models

Data Engineering (2012)

Abstract
Efficient similarity search in uncertain data is a central problem in many modern applications, such as biometric identification, stock market analysis, sensor networks, and medical imaging. In such applications, the feature vector of an object is not known exactly but is instead described by a probability density function such as a Gaussian Mixture Model (GMM). Previous work is limited to axis-parallel Gaussian distributions, so correlations between different features are not considered in the similarity search. In this paper, we propose SUDN, a novel and efficient similarity search technique for general GMMs that makes no independence assumption about the attributes and approximates the actual components of a GMM in a conservative but tight way. A filter-refinement architecture guarantees no false dismissals, owing to the conservativity of the approximations, and good filter selectivity, owing to their tightness. An extensive experimental evaluation of SUDN demonstrates a considerable speed-up of similarity queries on general GMMs and an increase in accuracy compared to existing approaches.
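
The filter-refinement idea described in the abstract can be illustrated with a minimal sketch. This is not the SUDN approximation itself, which the abstract does not specify. As a stand-in, the sketch uses a simple conservative bound: for a Gaussian with full covariance Σ, the Mahalanobis distance is at least the squared Euclidean distance divided by the largest eigenvalue of Σ, so replacing the former by the latter can only overestimate the density. The function names, the 2-D restriction, the threshold query type, and the sample data are all assumptions made for illustration.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Exact density of a 2-D Gaussian with full covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

def gaussian_upper_bound_2d(x, mean, cov):
    """Conservative (never too small) bound on the density above.

    Illustrative stand-in, not the bound from the paper: uses
    maha >= ||x - mean||^2 / lambda_max(cov).
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)
    sq_dist = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-0.5 * sq_dist / lam_max) / (2.0 * math.pi * math.sqrt(det))

def gmm_value(x, gmm, component_fn):
    """Weighted sum over components; gmm is a list of (weight, mean, cov)."""
    return sum(w * component_fn(x, mean, cov) for w, mean, cov in gmm)

def threshold_query(database, q, tau):
    """Return objects whose GMM density at q is at least tau.

    Filter step: prune an object if even its upper bound is below tau.
    Refinement step: evaluate the exact density for the survivors only.
    """
    results, refined = [], 0
    for name, gmm in database.items():
        if gmm_value(q, gmm, gaussian_upper_bound_2d) < tau:
            continue  # safe to discard: exact density <= bound < tau
        refined += 1
        if gmm_value(q, gmm, gaussian_pdf_2d) >= tau:
            results.append(name)
    return results, refined

# Hypothetical toy database of GMMs with correlated (non-axis-parallel) components.
database = {
    "near": [(1.0, (0.0, 0.0), ((1.0, 0.8), (0.8, 1.0)))],
    "far": [(1.0, (10.0, 10.0), ((1.0, -0.5), (-0.5, 2.0)))],
    "mix": [
        (0.5, (0.5, 0.5), ((2.0, 0.3), (0.3, 0.5))),
        (0.5, (8.0, -3.0), ((1.0, 0.0), (0.0, 1.0))),
    ],
}

results, refined = threshold_query(database, q=(0.2, 0.1), tau=0.01)
print(results, refined)
```

Because the bound never underestimates the exact density, the filter step cannot discard a true answer, which is the "no false dismissals" property. How many objects reach the refinement step depends on how tight the bound is; in a real system the eigenvalue and determinant would presumably be precomputed per component, which this sketch does not do.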
Keywords
Gaussian processes, data handling, query formulation, vectors, SUDN, feature vector, filter-refinement architecture, non-axis parallel Gaussian mixture models, probability density function, similarity search, uncertain data searching, MLIQ, Gaussian mixture model, non-axis parallel GMM, uncertain data