Efficient Data Representation Learning in Google-scale Systems

Derek Zhiyuan Cheng, Ruoxi Wang, Wang-Cheng Kang, Benjamin Coleman, Yin Zhang, Jianmo Ni, Jonathan Valverde, Lichan Hong, Ed H. Chi

Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023 (2023)


Abstract

"Garbage in, garbage out" is a familiar maxim to ML practitioners and researchers, because the quality of a learned data representation is crucial to the quality of any ML model that consumes it as input. To handle systems that serve billions of users at millions of queries per second (QPS), we need representation learning algorithms with significantly improved efficiency. At Google, we have dedicated thousands of iterations to developing a set of powerful techniques that efficiently learn high-quality data representations. We have thoroughly validated these methods through offline evaluation and online A/B testing, and have deployed them in over 50 models across major Google products. In this paper, we consider a generalized data representation learning problem that allows us to identify feature embeddings and feature crosses as common challenges. We propose two solutions: (1) Multi-size Unified Embedding, to learn high-quality embeddings; and (2) Deep Cross Network V2, to learn effective feature crosses. We discuss the practical challenges we encountered and the solutions we developed during deployment to production systems, compare with state-of-the-art (SOTA) methods, and report offline and online experimental results. This work sheds light on the challenges and opportunities for developing next-generation algorithms for web-scale systems.
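The abstract names Deep Cross Network V2 but does not describe how a feature cross is computed. As a minimal sketch, assuming the cross layer published for Deep Cross Network V2, x_{l+1} = x_0 ⊙ (W x_l + b) + x_l, the following pure-Python code may help illustrate the idea. The function names `cross_layer` and `cross_network` and the toy weights are hypothetical and chosen for illustration only; this is not the production implementation discussed in the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cross_layer(x0: Vector, xl: Vector, weight: Matrix, bias: Vector) -> Vector:
    """One cross layer (assumed form): x_{l+1} = x0 * (W @ xl + b) + xl.

    `*` is the elementwise product, so each output coordinate multiplies the
    original input x0 with a learned linear projection of the current layer.
    """
    d = len(x0)
    # Linear projection of the current layer: W @ xl + b
    projected = [
        sum(weight[i][j] * xl[j] for j in range(d)) + bias[i] for i in range(d)
    ]
    # Elementwise product with the original input, plus a residual connection
    return [x0[i] * projected[i] + xl[i] for i in range(d)]


def cross_network(x0: Vector, weights: List[Matrix], biases: List[Vector]) -> Vector:
    """Stack cross layers; each extra layer raises the polynomial degree by one."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy weights, illustration only
    zeros = [0.0, 0.0]
    print(cross_network([1.0, 2.0], [identity, identity], [zeros, zeros]))
```

With identity weights and zero bias, one layer maps each coordinate x to x² + x, which shows how stacking layers produces explicit higher-order interactions of the input features.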


Keywords

Data Representation Learning, Feature Cross, Embedding Learning, Efficiency, Scalability, Search, Ads, Recommendation Systems
