Accelerating Hybrid Quantized Neural Networks on Multi-tenant Cloud FPGA

2022 IEEE 40th International Conference on Computer Design (ICCD) (2022)

Abstract
The increasing adoption of Field-Programmable Gate Arrays (FPGAs) in cloud and data center systems opens the way to unprecedented acceleration of Machine Learning applications. Convolutional Neural Networks (CNNs) have been widely adopted for image classification and object detection. As we head towards FPGA multi-tenancy in the cloud, it becomes necessary to investigate architectures and mechanisms for the efficient deployment of CNNs on multi-tenant FPGA cloud infrastructure. In this work, we propose an FPGA architecture and a design flow that support efficient integration of CNN applications into a cloud infrastructure that exposes multi-tenancy to cloud developers. We prototype the proposed approach on virtual regions randomly allocated to tenants. We study how space-sharing of a single device among multiple cloud tenants influences the design flow, the allocation of resources, and the performance in terms of resource utilization and overall latency, compared to single-tenant deployments. Prototyping results show a latency at most 8% lower than that of single-tenant deployment while achieving higher resource utilization. We also record a maximum frequency up to 12% higher in multi-tenant implementations.
Keywords
FPGAs, Multi-tenancy, CNN Acceleration, Distributed Inference