Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models
CoRR (2024)
Abstract
We present Gyan AI Paramanu ("atom"), a family of novel language models for
Indian languages. It is a collection of auto-regressive monolingual, bilingual,
and multilingual Indic language models pretrained from scratch on a single GPU
for 10 Indian languages (Assamese, Bangla, Hindi, Konkani, Maithili, Marathi,
Odia, Sanskrit, Tamil, Telugu) across 5 scripts (Bangla, Devanagari, Odia,
Tamil, Telugu), with sizes ranging from 13.29M to 367.5M parameters. The models
are pretrained with a context size of 1024. They are efficient, small, fast,
and powerful. We have also developed an efficient, advanced Indic tokenizer
that can tokenize even unseen languages. In order to
avoid the "curse of multi-linguality" in our multilingual mParamanu model, we
pretrained on comparable corpora by typological grouping using the same script.
We performed human evaluation of our pretrained models for open-ended text
generation on grammar, coherence, creativity, and factuality metrics for
Bangla, Hindi, and Sanskrit. Our Bangla, Hindi, and Sanskrit models
outperformed the GPT-3.5-Turbo (ChatGPT), Bloom 7B, LLaMa-2 7B, OPT 6.7B,
GPT-J 6B, GPTNeo 1.3B, and GPT2-XL large language models (LLMs) by a large
margin despite being 20 to 66 times smaller than standard 7B LLMs. Inference
on our pretrained models runs on a CPU; no GPU is needed. We
also instruction-tuned our pretrained Bangla, Hindi, Marathi, Tamil, and Telugu
models on 23k instructions in their respective languages. Our pretrained and
instruction-tuned models are the first of their kind and the most powerful
efficient small generative language models developed for Indic languages. Our
results lead to the conclusion that high-quality generative language models
are possible without large amounts of compute or a very large number of
parameters. We plan to release our models at https://www.bharatgpts.com.