Deep Learning Inference Service at Microsoft

Jonathan Soifer, Jason Li, Mingqin Li, Jeffrey Zhu, Yingnan Li, Yuxiong He, Elton Zheng, Adi Oltean, Maya Mosyak, Chris Barnes, Thomas Liu, Junhua Wang

Proceedings of the 2019 USENIX Conference on Operational Machine Learning (2019)

Abstract
This paper introduces the Deep Learning Inference Service, an online production service at Microsoft for ultra-low-latency deep neural network model inference. We present the system architecture and deep dive into core concepts such as intelligent model placement, heterogeneous resource management, resource isolation, and efficient routing. We also present production scale and performance numbers.