WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes

2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018

Abstract
Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, the associated memory access patterns during the probing phase are highly irregular, resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory, featuring almost one TB/s bandwidth compared to less than 100 GB/s for the main memory modules of state-of-the-art CPUs. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of available video RAM. Hence, hash map construction and querying that scales across multiple GPUs is urgently needed in order to support structured storage of bigger datasets at high speeds. In this paper, we introduce WarpDrive - a scalable, distributed single-node multi-GPU implementation for the construction and querying of billions of key-value pairs. We propose a novel subwarp-based probing scheme featuring coalesced memory access over consecutive memory regions in order to mitigate the high latency of irregular access patterns. Our implementation achieves 1.4 billion insertions per second in single-GPU mode for a load factor of 0.95, thereby outperforming the GPU-cuckoo implementation of the CUDPP library by a factor of 2.8 on a P100. Furthermore, we present transparent scaling to multiple GPUs within the same node with up to 4.3 billion operations per second for high load factors on four P100 GPUs connected by NVLink technology. WarpDrive is free software and can be downloaded at https://github.com/sleeepyjack/warpdrive.
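To make the subwarp-based probing idea concrete, the following is a minimal CUDA sketch of how a group of threads can cooperatively probe consecutive hash-table slots so that each probing step becomes one coalesced memory transaction. It is not the WarpDrive implementation: the 8-thread group size, the 32-bit key/value slot layout, the sentinel values, and the hash mixer are illustrative assumptions, and the table is assumed to use simple linear probing.

```cuda
#include <cstdint>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Illustrative constants (not taken from the paper): an 8-thread subwarp and a
// 0xFFFFFFFF sentinel marking empty slots / failed lookups.
constexpr int      GROUP_SIZE = 8;
constexpr uint32_t EMPTY_KEY  = 0xFFFFFFFFu;
constexpr uint32_t NOT_FOUND  = 0xFFFFFFFFu;

struct Slot { uint32_t key; uint32_t value; };

__device__ uint32_t hash32(uint32_t k) {      // simple 32-bit mixer (placeholder)
    k ^= k >> 16; k *= 0x85ebca6bu;
    k ^= k >> 13; k *= 0xc2b2ae35u;
    k ^= k >> 16; return k;
}

// One query per subwarp: the GROUP_SIZE threads of a tile load GROUP_SIZE
// consecutive slots per step, so each probing step is a single coalesced
// memory transaction instead of GROUP_SIZE scattered ones.
__global__ void query_kernel(const Slot* table, uint64_t capacity,
                             const uint32_t* queries, uint32_t* results,
                             uint64_t num_queries)
{
    auto tile = cg::tiled_partition<GROUP_SIZE>(cg::this_thread_block());

    const uint64_t tid = blockIdx.x * (uint64_t)blockDim.x + threadIdx.x;
    const uint64_t qid = tid / GROUP_SIZE;        // query handled by this subwarp
    if (qid >= num_queries) return;

    const uint32_t key = queries[qid];
    // Align the start of the probe sequence to a GROUP_SIZE-wide window.
    uint64_t window = (hash32(key) % capacity) / GROUP_SIZE * GROUP_SIZE;

    for (uint64_t probed = 0; probed < capacity; probed += GROUP_SIZE) {
        const uint64_t slot = (window + tile.thread_rank()) % capacity;
        const Slot     s    = table[slot];                 // coalesced load

        if (tile.any(s.key == key)) {                      // some lane found the key
            if (s.key == key) results[qid] = s.value;
            return;
        }
        if (tile.any(s.key == EMPTY_KEY)) {                // empty slot ends the probe chain
            if (tile.thread_rank() == 0) results[qid] = NOT_FOUND;
            return;
        }
        window = (window + GROUP_SIZE) % capacity;         // next consecutive window
    }
    if (tile.thread_rank() == 0) results[qid] = NOT_FOUND;
}
```

In this sketch each subwarp issues one wide, contiguous load per probing step rather than per-thread scattered loads, which is the effect the paper's coalesced probing scheme targets; the actual WarpDrive code additionally handles insertion, tunable group sizes, and multi-GPU partitioning.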
Keywords
hash table, distributed, GPU, CUDA, NVLink