Performance Characterization of Large Language Models on High-Speed Interconnects

2023 IEEE Symposium on High-Performance Interconnects (HOTI), 2023

Abstract
Large Language Models (LLMs) have recently gained significant popularity due to their ability to generate human-like text and perform a wide range of natural language processing tasks. Training these models usually requires substantial computational resources and is often done in a distributed manner. The use of high-speed interconnects can significantly influence the efficiency of distributed training, so systematic studies are needed to explore the distributed training characteristics of these models on high-speed interconnects. This paper presents a comprehensive performance characterization of representative large language models: GPT, BERT, and T5. We evaluate their training performance in terms of iteration time, interconnect utilization, and scalability over different high-speed interconnects and communication protocols, including TCP/IP, IPoIB, and RDMA. We observe that interconnects play a vital role in LLM training. Specifically, RDMA-100 Gbps outperforms IPoIB-100 Gbps and TCP/IP-10 Gbps by an average of 2.51x and 4.79x, respectively, in training iteration time, and achieves the highest interconnect utilization (up to 60 Gbps) in both strong and weak scaling, compared with up to 20 Gbps for IPoIB and up to 9 Gbps for TCP/IP, leading to the shortest training time. We also observe that larger models place higher demands on communication bandwidth, especially for AllReduce during backward propagation, which can take up to 91.12% of training time. Through our evaluation, we identify opportunities to reduce communication time and improve LLM training performance, and we extensively explore and summarize the role communication plays in distributed LLM training.
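The abstract attributes most of the training time to AllReduce during backward propagation and compares protocols (TCP/IP, IPoIB, RDMA). As a rough illustration of how such communication cost can be measured, below is a minimal, hypothetical micro-benchmark sketch using torch.distributed; it is not the paper's harness, and the backend choice, message size, and the BENCH_BACKEND environment variable are assumptions. The "gloo" backend runs over TCP/IP (or IPoIB), while "nccl" can use RDMA over InfiniBand when available.

# Hypothetical micro-benchmark sketch (not from the paper): times a single
# all_reduce of gradient-sized tensors, the collective the abstract
# identifies as dominating backward propagation.
# Launch with, e.g.: torchrun --nproc_per_node=4 allreduce_bench.py
import os
import time

import torch
import torch.distributed as dist


def main() -> None:
    backend = os.environ.get("BENCH_BACKEND", "nccl")  # assumed env var
    dist.init_process_group(backend=backend)
    rank = dist.get_rank()

    device = (torch.device(f"cuda:{rank % torch.cuda.device_count()}")
              if backend == "nccl" else torch.device("cpu"))

    # 256 MB of float32 values -- a stand-in for one bucket of an LLM's
    # gradient all_reduce; the paper's actual message sizes are not given here.
    payload = torch.ones(64 * 1024 * 1024, dtype=torch.float32, device=device)

    # Warm-up iterations so connection setup is not counted.
    for _ in range(5):
        dist.all_reduce(payload)
    if device.type == "cuda":
        torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(payload)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    if rank == 0:
        gbytes = payload.numel() * payload.element_size() / 1e9
        print(f"backend={backend} avg all_reduce: {elapsed * 1e3:.2f} ms "
              f"({gbytes / elapsed:.2f} GB/s effective per-rank bandwidth)")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Running the same script with BENCH_BACKEND=gloo versus nccl gives a coarse sense of how protocol choice affects the AllReduce path, which is the kind of comparison the paper makes at full training scale.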
Keywords
Large language models, Characterization, Transformer, GPT, BERT, T5