Density Estimation Based on Mass

Data Mining (2011)

Abstract
Density estimation is the ubiquitous base modelling mechanism employed for many tasks such as clustering, classification, anomaly detection and information retrieval. Commonly used density estimation methods such as the kernel density estimator and the k-nearest neighbour density estimator have high time and space complexities, which render them inapplicable to problems with large data sizes and even a moderate number of dimensions. This weakness sets the fundamental limit of existing algorithms for all these tasks. We propose the first density estimation method that stretches this fundamental limit to the extent that millions of data points can now be processed easily and quickly. We analyze the error of the new estimate (relative to the true density) using a bias-variance analysis. We then perform an empirical evaluation of the proposed method by replacing the existing density estimators with the new one in two current density-based algorithms, namely DBSCAN and LOF. The results show that the new density estimation method significantly improves the runtime of DBSCAN and LOF, while maintaining or improving their task-specific performance in clustering and anomaly detection, respectively. The new method enables these algorithms, currently limited to small data sizes, to process very large databases, setting a new benchmark for what density-based algorithms can achieve.
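To make the cost argument in the abstract concrete, the following is a minimal sketch of the two baseline estimators it names, written naively in one dimension. It is not the mass-based method proposed in the paper, which the abstract does not specify. The function names, the Gaussian kernel, and the `bandwidth` and `k` parameters are assumptions chosen for illustration.

```python
import math


def gaussian_kde(query, data, bandwidth):
    """Naive 1-D Gaussian kernel density estimate (illustrative helper).

    Every query point is compared with every data point, so the cost is
    O(len(query) * len(data)).
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in query:
        total = 0.0
        for xi in data:
            u = (x - xi) / bandwidth
            total += math.exp(-0.5 * u * u)
        densities.append(norm * total)
    return densities


def knn_density(query, data, k):
    """Naive 1-D k-nearest-neighbour density estimate (illustrative helper).

    Uses k / (n * 2 * r_k), where r_k is the distance to the k-th nearest
    data point and 2 * r_k is the length of the enclosing interval.
    Assumes r_k > 0, i.e. fewer than k data points coincide with the query.
    """
    n = len(data)
    densities = []
    for x in query:
        # Sorting all distances costs O(n log n) per query point.
        dists = sorted(abs(x - xi) for xi in data)
        r_k = dists[k - 1]
        densities.append(k / (n * 2.0 * r_k))
    return densities
```

Evaluating either estimator at every point of a data set of size n takes on the order of n² distance computations when done this way, which is the kind of cost that limits density-based algorithms such as DBSCAN and LOF to small data sets.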
Keywords
dbscan, bias variance analysis, pattern clustering, empirical evaluation, fundamental limit, error analysis, data size, kernel density estimator, density-based algorithms, anomaly detection, density based algorithm, density estimation, ubiquitous computing, k-nearest neighbour density estimator, ubiquitous base modelling mechanism, task specific performance, new density estimation method, lof, existing density estimator, new benchmark, true density, density estimation method, space complexity, information retrieval, kernel density estimate, current density, very large database