bigg2vec: Fast and Memory-Efficient Representation Learning for Billion-Scale Graphs on a Single Machine.

Big Data (2022)

Abstract
Node embeddings obtained from information networks have been widely adopted for representing knowledge and driving various information retrieval and machine learning tasks. However, training node embeddings is computationally intensive, making it difficult to scale to larger graphs. Most existing works have addressed the scalability challenge by simply adding more hardware resources; for example, a common approach to speeding up training is to distribute model computation across multiple machines and GPUs. This paper takes an orthogonal approach to scalability by addressing the computational complexity of training embeddings. We present bigg2vec for scaling up the embedding training process. bigg2vec introduces a novel polar coordinate-based system for internal representation and computation. It provides the following benefits: (a) it significantly reduces compute and memory requirements while improving embedding quality, and (b) it uses a novel graph organization to generate high-quality negative samples, which reduces the number of negative samples needed for training and is especially beneficial for skewed graphs. We have deployed bigg2vec to generate embeddings for multiple AI models within Visa. Our Global Personalized Restaurant Recommender System (GPR) is one such project; it uses bigg2vec to periodically generate embeddings for over 450 million nodes connected by more than 3 billion edges. bigg2vec generates higher-quality embeddings while training them faster than state-of-the-art methods on a single CPU-based machine.
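The abstract does not specify bigg2vec's polar coordinate-based internal format, so the sketch below is only a hypothetical illustration of the general idea: storing each node embedding as a magnitude plus a reduced-precision unit direction (a polar-style decomposition), so that memory shrinks while inner products stay cheap to compute. The class name, precision choices, and API are assumptions for illustration, not the paper's actual method.

```python
import numpy as np


class PolarEmbeddingTable:
    """Hypothetical sketch (not the bigg2vec format): each embedding is kept as
    a float32 radius plus a float16 unit direction, roughly halving memory
    versus a dense float32 table while preserving dot-product similarity."""

    def __init__(self, num_nodes: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        vecs = rng.standard_normal((num_nodes, dim)).astype(np.float32)
        # Polar-style split: magnitude per node, plus a unit direction vector.
        self.radius = np.linalg.norm(vecs, axis=1)                          # (num_nodes,) float32
        self.direction = (vecs / self.radius[:, None]).astype(np.float16)   # (num_nodes, dim) float16

    def dot(self, u: int, v: int) -> float:
        # <x_u, x_v> = r_u * r_v * <dir_u, dir_v>
        cos_uv = float(np.dot(self.direction[u].astype(np.float32),
                              self.direction[v].astype(np.float32)))
        return float(self.radius[u] * self.radius[v] * cos_uv)


if __name__ == "__main__":
    table = PolarEmbeddingTable(num_nodes=1000, dim=128)
    print("similarity(3, 7) =", table.dot(3, 7))
```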
Keywords
graph representation learning, scalability, embeddings, billion scale