M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Speech and language models are advancing towards universality. A single model can now handle translation across 200 languages and transcription for over 100 languages. Universal models simplify development and deployment and, importantly, transfer knowledge to less-resourced languages and modalities. This paper introduces M2BART, a streamlined multilingual and multimodal framework for encoder-decoder models. It employs a self-supervised speech tokenizer to bridge speech and text, and is pre-trained with a unified objective covering unimodal and multimodal, unsupervised and supervised data. When tested on Spanish-to-English and English-to-Hokkien translation, M2BART consistently surpassed competing systems. We also showcase an innovative translation model that enables zero-shot transfer even without labeled data.
Keywords
Speech translation, multimodal, pre-training
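The abstract's central mechanism, a self-supervised speech tokenizer that lets speech and text share one token vocabulary, can be illustrated with a minimal sketch. The cluster count, the synthetic features, and the `<unit_k>` token scheme below are assumptions for illustration; the abstract does not specify M2BART's actual tokenizer design.

```python
# Minimal sketch of a unit-based speech tokenizer: self-supervised speech
# features are clustered into discrete units, so an utterance becomes a token
# sequence in the same vocabulary space as text. All specifics here (codebook
# size, synthetic features, <unit_k> naming) are illustrative assumptions,
# not M2BART's actual implementation.
import numpy as np
from sklearn.cluster import KMeans

N_UNITS = 8  # assumed codebook size; real systems typically use hundreds

# Stand-in for self-supervised speech representations (e.g., one vector per
# ~20 ms frame). Here: random data shaped (frames, feature_dim).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))

# Quantize frames into discrete units with k-means, a common choice in
# unit-based speech tokenizers.
kmeans = KMeans(n_clusters=N_UNITS, n_init=10, random_state=0).fit(features)
unit_ids = kmeans.predict(features)

# Collapse consecutive duplicate units and render them as pseudo-text tokens,
# so the utterance reads like "<unit_3> <unit_7> ...".
deduped = [int(u) for i, u in enumerate(unit_ids) if i == 0 or u != unit_ids[i - 1]]
speech_tokens = [f"<unit_{u}>" for u in deduped]

print(speech_tokens[:10])
```

Once utterances are rendered as unit tokens, a single BART-style denoising objective can in principle be applied uniformly to speech, text, and paired data, which is the unification the abstract describes.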