Performance Comparison of Julia Distributed Implementations of Dirichlet Process Mixture Models

2019 IEEE International Conference on Big Data (Big Data), 2019

Abstract
The Dirichlet process mixture model (DPMM), a nonparametric Bayesian mixture model, is receiving increasing attention from the statistical learning community and has demonstrated great potential in cluster analysis. Because computational complexity grows with the number of observations and features, serial DPMM algorithms require long processing times and cannot handle large volumes of data on a single machine. To improve computational efficiency, several parallel methods have been proposed and implemented in C++ and Julia by different authors, and are publicly available on GitHub or published as Julia packages. However, multi-core and multi-node scalability has not been thoroughly evaluated and compared across implementations, even among multiple implementations of the same distributed DPMM method. We selected two recent Julia implementations of the parallel sampler via sub-cluster splits proposed by Chang and Fisher and performed a scalability comparison on supercomputer clusters. This paper presents insights into the applicability of both implementations as the dimensionality of the feature space increases, and suggests potential strategies for improving multi-node scalability.
Keywords
Julia, software profiling, performance analysis, DPMM
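
The abstract does not include the benchmark code itself. As a rough illustration only, the sketch below shows the kind of multi-process timing setup such a scalability study exercises, using Julia's standard Distributed library. The function fit_dpmm is a hypothetical placeholder, not the API of either evaluated package (whose entry points the abstract does not name); varying the worker count and the feature dimension D mirrors the axes the paper compares.

```julia
using Distributed

# Launch one worker process per core on this node; on a supercomputer
# cluster, addprocs can instead be driven by a cluster manager.
addprocs(4)

# Hypothetical stand-in for a package's DPMM inference entry point
# (e.g. a sub-cluster splits sampler); real package APIs will differ.
@everywhere function fit_dpmm(data::Matrix{Float64}, alpha::Float64; iters::Int = 100)
    # A real sampler would iterate split/merge moves across workers here.
    return size(data, 2)  # placeholder: just report the observation count
end

# Synthetic data: D-dimensional features in columns, N observations.
D, N = 8, 100_000
data = randn(D, N)

# Time the fit as a scalability study would, varying worker count and D.
stats = @timed fit_dpmm(data, 1.0; iters = 100)
println("elapsed = $(stats.time) s on $(nworkers()) workers")
```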