InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
CoRR (2024)
Abstract
Large language models (LLMs) with long sequences are beginning to power more and
more of the fundamentally new applications we use every day. Existing methods for
long-sequence LLM training are neither efficient nor compatible with commonly used
training algorithms such as FlashAttention. We design InternEvo to address these
issues. InternEvo decouples all of the sharding dimensions into a new hierarchical
space and systematically analyzes the memory and communication costs of LLM
training. It then generates an effective hybrid parallelism strategy. We design a
new selective overlap mechanism to mitigate the communication overhead introduced
by the hybrid parallelism, and we implement memory management techniques to reduce
GPU memory fragmentation. Evaluation results show that InternEvo generates
parallelization strategies that match or outperform existing methods in model
FLOPs utilization.
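To make the idea of searching a decoupled sharding space concrete, the sketch below shows one possible way such a search could look: enumerate candidate data/tensor/pipeline/sequence-parallel and optimizer-state sharding degrees, reject configurations that do not fit in GPU memory, and pick the one with the lowest estimated communication cost. This is a minimal illustration under assumed names and toy cost formulas (Config, memory_gb, comm_cost, search), not the paper's actual cost model or algorithm.

# Hypothetical sketch of a hierarchical parallelism search; all formulas are toy estimates.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    dp: int    # data-parallel degree
    tp: int    # tensor-parallel degree
    pp: int    # pipeline-parallel degree
    sp: int    # sequence-parallel degree
    zero: int  # optimizer-state sharding degree within the data-parallel group

def valid(cfg: Config, world_size: int) -> bool:
    # Degrees must cover all GPUs, and the optimizer-state shard
    # degree must divide the data-parallel degree.
    return (cfg.dp * cfg.tp * cfg.pp * cfg.sp == world_size
            and cfg.dp % cfg.zero == 0)

def memory_gb(cfg: Config, params_b: float, seq_len: int) -> float:
    # Toy per-GPU memory estimate (GB): sharded model states plus activations.
    model_states = 16 * params_b / (cfg.tp * cfg.pp * cfg.zero)
    activations = 0.002 * seq_len * params_b / (cfg.tp * cfg.sp * cfg.pp)
    return model_states + activations

def comm_cost(cfg: Config, params_b: float, seq_len: int) -> float:
    # Toy communication score: tensor/sequence parallelism exchange activations
    # every layer; data-parallel and sharded optimizer states sync once per step.
    per_layer = seq_len * (cfg.tp - 1) / cfg.tp + seq_len * (cfg.sp - 1) / cfg.sp
    per_step = params_b * (cfg.dp - 1) / cfg.dp + params_b * (cfg.zero - 1) / cfg.zero
    return per_layer + per_step

def search(world_size: int, params_b: float, seq_len: int, mem_cap_gb: float) -> Config:
    # Brute-force enumeration of power-of-two degrees; pick the cheapest feasible config.
    degrees = [d for d in (1, 2, 4, 8, 16, 32, 64) if d <= world_size]
    best, best_cost = None, float("inf")
    for dp, tp, pp, sp, zero in product(degrees, repeat=5):
        cfg = Config(dp, tp, pp, sp, zero)
        if not valid(cfg, world_size) or memory_gb(cfg, params_b, seq_len) > mem_cap_gb:
            continue
        cost = comm_cost(cfg, params_b, seq_len)
        if cost < best_cost:
            best, best_cost = cfg, cost
    return best

if __name__ == "__main__":
    # Example: a 7B-parameter model, 32K-token sequences, 64 GPUs with 80 GB each.
    print(search(world_size=64, params_b=7.0, seq_len=32768, mem_cap_gb=80.0))

The point of the sketch is only the structure of the decision: once the sharding dimensions are treated as independent axes, strategy selection reduces to a constrained search over a small discrete space guided by memory and communication estimates.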