Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
CoRR(2024)
摘要
We introduce Reka Core, Flash, and Edge, a series of powerful multimodal
language models trained from scratch by Reka. Reka models are able to process
and reason with text, images, video, and audio inputs. This technical report
discusses details of training some of these models and provides comprehensive
evaluation results. We show that Reka Edge and Reka Flash are not only
state-of-the-art but also outperform many much larger models, delivering
outsized values for their respective compute class. Meanwhile, our most capable
and largest model, Reka Core, approaches the best frontier models on both
automatic evaluations and blind human evaluations. On image question answering
benchmarks (e.g. MMMU, VQAv2), Core performs competitively to GPT4-V.
Meanwhile, on multimodal chat, Core ranks as the second most preferred model
under a blind third-party human evaluation setup, outperforming other models
such as Claude 3 Opus. On text benchmarks, Core not only performs competitively
to other frontier models on a set of well-established benchmarks (e.g. MMLU,
GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question
answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped
in production at http://chat.reka.ai . A showcase of non cherry picked
qualitative examples can also be found at http://showcase.reka.ai .
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要