Compute cache for data parallel acceleration

Proceedings of the 12th International Workshop on Network on Chip Architectures (2019)

Abstract
The talk will start with an overview of our Neural Cache architecture, which is capable of fully executing convolutional, fully connected, and pooling layers in-cache, and also supports in-cache quantization. I will then present a versatile Compute Cache architecture named Duality Cache, which re-purposes cache structures to transform them into massively parallel compute units capable of running arbitrary data-parallel workloads, including Deep Neural Networks. Our work presents a holistic approach to building a Compute Cache system stack, with techniques for performing in-cache floating-point and fixed-point arithmetic and transcendental functions, enabling the SIMT execution model, designing a compiler that accepts existing CUDA programs, and providing flexibility in adapting to various workload characteristics. Exposing the massive parallelism available in the Duality Cache architecture improves the performance of GPU benchmarks by 3.6x and OpenACC benchmarks by 3.2x over a server-class GPU. Re-purposing existing caches provides 72.6x better performance for the CPU at only 3.5% area cost. Duality Cache reduces energy by 5.2x over the GPU and 20x over the CPU.
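
In-cache arithmetic of this kind is typically bit-serial: operands are stored transposed so that bit b of every word lies on the same wordline, and each bitline column computes one word's result, one bit per cycle, with all columns operating in lockstep. The host-side C sketch below simulates that dataflow for fixed-point addition. It is a minimal illustration of the general technique, not the paper's circuit; the layout and names (NBITS, NWORDS, bit_serial_add) are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

#define NBITS  8   /* fixed-point word width (cycles per add) */
#define NWORDS 4   /* number of bitline columns = words added in parallel */

/* Transposed layout: a[bit][w] is bit `bit` of word w. In hardware, the
   inner loop body executes in every column w simultaneously; here the
   columns are simulated sequentially. */
static void bit_serial_add(const uint8_t a[NBITS][NWORDS],
                           const uint8_t b[NBITS][NWORDS],
                           uint8_t sum[NBITS][NWORDS]) {
    uint8_t carry[NWORDS] = {0};
    for (int bit = 0; bit < NBITS; ++bit) {     /* one cycle per bit position */
        for (int w = 0; w < NWORDS; ++w) {      /* parallel across columns    */
            uint8_t s   = a[bit][w] ^ b[bit][w] ^ carry[w];
            carry[w]    = (a[bit][w] & b[bit][w]) |
                          (carry[w] & (a[bit][w] ^ b[bit][w]));
            sum[bit][w] = s;
        }
    }
}

int main(void) {
    const uint8_t x[NWORDS] = {3, 70, 200, 15}, y[NWORDS] = {5, 30, 55, 240};
    uint8_t a[NBITS][NWORDS], b[NBITS][NWORDS], s[NBITS][NWORDS];
    for (int bit = 0; bit < NBITS; ++bit)       /* transpose into bit-planes */
        for (int w = 0; w < NWORDS; ++w) {
            a[bit][w] = (uint8_t)((x[w] >> bit) & 1);
            b[bit][w] = (uint8_t)((y[w] >> bit) & 1);
        }
    bit_serial_add(a, b, s);
    for (int w = 0; w < NWORDS; ++w) {          /* re-assemble and check */
        unsigned r = 0;
        for (int bit = 0; bit < NBITS; ++bit) r |= (unsigned)s[bit][w] << bit;
        printf("%u + %u = %u (mod 256)\n", (unsigned)x[w], (unsigned)y[w], r);
    }
    return 0;
}

The property worth noticing is that latency grows with the word width (NBITS cycles) but not with the number of words, so widening the cache's exposed parallelism is essentially free in time, which is where the reported speedups over CPUs come from.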
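
On the software side, the abstract states that the Duality Cache compiler accepts existing CUDA programs under a SIMT execution model. The kernel below is a standard SAXPY example, not code from the paper, shown only as the kind of data-parallel input such a compiler would consume; one plausible mapping, under the bit-serial assumption above, is one logical thread per bitline column of a cache sub-array.

#include <cuda_runtime.h>

/* Standard SAXPY: y[i] = a * x[i] + y[i]. Each thread handles one element,
   the same SIMT structure a Compute Cache compiler would map in-cache. */
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

/* Launch example: saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y); */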