Using Butterfly-Patterned Partial Sums To Draw From Discrete Distributions

ACM Transactions on Parallel Computing (2017)

Abstract
We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that this technique makes an entire machine-learning application that uses a latent Dirichlet allocation topic model with 1024 topics about 13% faster (when using single-precision floating-point data) or about 35% faster (when using double-precision floating-point data) than doing a straightforward matrix transposition after using coalesced accesses.
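For context, the conventional method the paper improves on draws from a discrete distribution by building a complete table of partial sums of the relative probabilities and binary-searching it with a scaled uniform draw. The sketch below illustrates that baseline only (the paper's contribution is a butterfly-patterned table from which these sums are reconstructed on the fly); function names and the optional `u` parameter are ours, for illustration.

```python
import bisect
import random

def sample_discrete(weights, u=None):
    """Draw an index from a discrete distribution given by relative
    probabilities: build the complete table of partial sums, then
    binary-search it (the baseline the butterfly technique avoids)."""
    # Complete table of partial sums (cumulative relative probabilities).
    sums = []
    total = 0.0
    for w in weights:
        total += w
        sums.append(total)
    # Scale a uniform draw into [0, total) and locate its bin by binary search.
    if u is None:
        u = random.random()
    return bisect.bisect_right(sums, u * total)

# Example: weights 1, 2, 3 give partial sums 1, 3, 6;
# u = 0.4 scales to 2.4, which falls in the (1, 3] bin, i.e. index 1.
```

On a GPU, one such search runs per thread over per-distribution tables; the paper's point is that computing all those tables in the straightforward layout wastes coalesced memory bandwidth.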
Keywords
butterfly,coalesced memory access,discrete distribution,GPU,latent Dirichlet allocation,LDA,machine learning,multithreading,memory bottleneck,parallel computing,random sampling,SIMD,transposed memory access