Using Butterfly-Patterned Partial Sums To Draw From Discrete Distributions

ACM Transactions on Parallel Computing (2017)

Abstract
We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that this technique makes an entire machine-learning application that uses a latent Dirichlet allocation topic model with 1024 topics about 13% faster (when using single-precision floating-point data) or about 35% faster (when using double-precision floating-point data) than doing a straightforward matrix transposition after using coalesced accesses.
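For context, the conventional method the paper improves on draws from a discrete distribution by building a complete table of partial sums of the relative probabilities and binary-searching it with a scaled uniform draw. The sketch below illustrates that baseline only (the paper's contribution is a butterfly-patterned table from which these sums are reconstructed on the fly); function names and the optional `u` parameter are ours, for illustration.

```python
import bisect
import random

def sample_discrete(weights, u=None):
    """Draw an index from a discrete distribution given by relative
    probabilities: build the complete table of partial sums, then
    binary-search it (the baseline the butterfly technique avoids)."""
    # Complete table of partial sums (cumulative relative probabilities).
    sums = []
    total = 0.0
    for w in weights:
        total += w
        sums.append(total)
    # Scale a uniform draw into [0, total) and locate its bin by binary search.
    if u is None:
        u = random.random()
    return bisect.bisect_right(sums, u * total)

# Example: weights 1, 2, 3 give partial sums 1, 3, 6;
# u = 0.4 scales to 2.4, which falls in the (1, 3] bin, i.e. index 1.
```

On a GPU, one such search runs per thread over per-distribution tables; the paper's point is that computing all those tables in the straightforward layout wastes coalesced memory bandwidth.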
Keywords
butterfly,coalesced memory access,discrete distribution,GPU,latent Dirichlet allocation,LDA,machine learning,multithreading,memory bottleneck,parallel computing,random sampling,SIMD,transposed memory access