Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs.

ICPP (2021)

Abstract
Texture identification has recently been developed to support one-to-one verification and one-to-many search, which makes it applicable to a much broader range of real-life applications than texture classification. It has demonstrated great potential for enabling product traceability by identifying the unique texture information on the surfaces of target objects. However, existing hardware acceleration schemes are insufficient for large-scale texture identification, especially for the search task, where the number of texture images being searched can reach millions; this creates enormous compute and memory demands and makes real-time texture identification infeasible. To address these problems, we propose a comprehensive toolset with joint hardware and software optimization strategies that delivers optimized GPU acceleration and enables large-scale texture identification with real-time responses. Novel technologies include: 1) a highly optimized cuBLAS implementation for efficiently running the 2-nearest-neighbor algorithm; 2) a hybrid cache design that incorporates host memory for streaming data to GPUs, which delivers a 5x larger memory capacity while running the targeted workloads; 3) a batch process that fully exploits data reuse opportunities while accounting for available compute resources and memory bandwidth constraints; and 4) an asymmetric local feature extraction that reduces the memory footprint of the feature matrices kept for reference texture images. To the best of our knowledge, this work is the first implementation to provide real-time large-scale texture identification on GPUs. By exploring co-optimizations across hardware and software, we deliver 31x faster search and 20x larger feature cache capacity compared to a conventional CUDA implementation.
We also demonstrate the proposed designs by building a distributed texture identification system with 14 Nvidia Tesla P100 GPUs, which can complete 872,984 texture similarity comparisons in just one second.
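The abstract names a cuBLAS-based 2-nearest-neighbor search as the core kernel but does not give its formulation. One common way to map nearest-neighbor search onto a matrix-multiply library is the expansion ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q·r, where the q·r terms for all query/reference pairs form a single matrix product (the part a GEMM routine would run on the GPU). The pure-Python sketch below illustrates that idea on CPU. It assumes SIFT-style descriptors stored as rows of a matrix, and it assumes a Lowe-style ratio test (threshold 0.8) with a "count of good matches" similarity score; the ratio test, the threshold, and all function names are hypothetical illustrations, not details confirmed by the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b^T. On a GPU this is the step a GEMM call (e.g. via cuBLAS) would perform."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def squared_norms(m: Matrix) -> List[float]:
    """Squared L2 norm of every row."""
    return [sum(x * x for x in row) for row in m]


def two_nearest_neighbors(query: Matrix, reference: Matrix) -> List[Tuple[int, float, int, float]]:
    """For each query row, return (best_idx, best_sq_dist, second_idx, second_sq_dist)."""
    if len(reference) < 2:
        raise ValueError("need at least two reference descriptors")
    q_norms = squared_norms(query)
    r_norms = squared_norms(reference)
    dots = matmul_transpose(query, reference)  # all pairwise dot products at once
    results = []
    for i, row in enumerate(dots):
        # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r ; clamp small negatives caused by rounding
        sq = [max(q_norms[i] + r_norms[j] - 2.0 * row[j], 0.0) for j in range(len(reference))]
        order = sorted(range(len(sq)), key=sq.__getitem__)
        best, second = order[0], order[1]
        results.append((best, sq[best], second, sq[second]))
    return results


def count_good_matches(query: Matrix, reference: Matrix, ratio: float = 0.8) -> int:
    """Hypothetical similarity score: number of query rows passing a ratio test."""
    good = 0
    for _, best_sq, _, second_sq in two_nearest_neighbors(query, reference):
        # dist1 < ratio * dist2  is equivalent to  dist1^2 < ratio^2 * dist2^2
        if best_sq < (ratio ** 2) * second_sq:
            good += 1
    return good


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[1.0, 0.1], [5.0, 5.0], [0.0, 1.0], [-3.0, 2.0]]
    print(two_nearest_neighbors(query, reference))
    print(count_good_matches(query, reference))
```

Computing the norms once and folding all dot products into one matrix product is what lets a dense BLAS routine do most of the work; only the cheap per-row selection of the two smallest distances remains outside the GEMM.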
Keywords
texture identification, GPU acceleration, cuBLAS, batching, hybrid cache, nearest neighbor, feature extraction, SIFT