The Architectural Implications of Facebook's DNN-based Personalized Recommendation.

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)(2019)

引用 261|浏览4
暂无评分
摘要
The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.
更多
查看译文
关键词
data centers,personalized recommendation,content ranking,deep neural networks,recommendation systems,production-scale DNNs,open-source workloads,recommendation models,Facebook,deep learning,latency-bounded throughput
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要