Quantifying Multilingual Performance of Large Language Models Across Languages
arXiv (2024)
Abstract
Training Large Language Models (LLMs) requires extensive text corpora.
However, these data are often unevenly distributed across languages. As a
result, LLMs perform well on high-resource languages, such as English, German,
and French, but perform poorly on low-resource languages. To date, no work has
quantitatively measured the performance of LLMs on low-resource languages. To
fill this gap, we propose the Language Ranker, which benchmarks and ranks
languages according to the performance of LLMs on those languages. We use the
LLM's performance on the English corpus as a baseline against which its
performance on other languages is compared. We report three findings: 1. The
performance rankings of different LLMs across all languages are roughly the
same. 2. LLMs of different sizes exhibit the same partial order of
performance. 3. There is a strong correlation between LlaMa2's performance in
different languages and the proportion of each language in the pre-training
corpus. These findings suggest that the Language Ranker can be used as an
indicator of the language performance of LLMs.
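
The ranking and correlation steps described above can be illustrated with a minimal sketch. The abstract does not specify the scoring metric, so the sketch assumes each language already has a single numeric score relative to the English baseline. The function names, language labels, and numbers are hypothetical placeholders, not values or code from the paper. Spearman rank correlation is used here as one plausible way to check the third finding; the paper may use a different statistic.

```python
def rank_languages(scores):
    """Return language labels sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def average_ranks(values):
    """Assign ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical placeholder inputs, NOT results from the paper.
score_vs_english = {"lang_a": 0.9, "lang_b": 0.4, "lang_c": 0.7}
corpus_share = {"lang_a": 0.10, "lang_b": 0.01, "lang_c": 0.05}

ranking = rank_languages(score_vs_english)
langs = list(score_vs_english)
rho = spearman([score_vs_english[l] for l in langs],
               [corpus_share[l] for l in langs])
```

With these placeholder inputs, `ranking` orders the languages by score and `rho` measures how closely that order tracks corpus share. Comparing the `ranking` lists produced for two different models would be one simple way to examine the first two findings.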