M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Speech and language models are advancing towards universality. A single model can now handle translation across 200 languages and transcription for over 100 languages. Universal models simplify development and deployment and, importantly, transfer knowledge to less-resourced languages and modalities. This paper introduces M2BART, a streamlined multilingual and multimodal framework for encoder-decoder models. It employs a self-supervised speech tokenizer to bridge speech and text, and is pre-trained with a unified objective covering unimodal and multimodal, unsupervised and supervised data. When tested on Spanish-to-English and English-to-Hokkien translation, M2BART consistently surpassed competing systems. We also showcase an innovative translation model that enables zero-shot transfer even without labeled data.
Keywords
Speech translation, multimodal, pre-training
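The abstract's central mechanism, a self-supervised speech tokenizer that lets speech and text share one token vocabulary, can be illustrated with a minimal sketch. The cluster count, the synthetic features, and the `<unit_k>` token scheme below are assumptions for illustration; the abstract does not specify M2BART's actual tokenizer design.

```python
# Minimal sketch of a unit-based speech tokenizer: self-supervised speech
# features are clustered into discrete units, so an utterance becomes a token
# sequence in the same vocabulary space as text. All specifics here (codebook
# size, synthetic features, <unit_k> naming) are illustrative assumptions,
# not M2BART's actual implementation.
import numpy as np
from sklearn.cluster import KMeans

N_UNITS = 8  # assumed codebook size; real systems typically use hundreds

# Stand-in for self-supervised speech representations (e.g., one vector per
# ~20 ms frame). Here: random data shaped (frames, feature_dim).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))

# Quantize frames into discrete units with k-means, a common choice in
# unit-based speech tokenizers.
kmeans = KMeans(n_clusters=N_UNITS, n_init=10, random_state=0).fit(features)
unit_ids = kmeans.predict(features)

# Collapse consecutive duplicate units and render them as pseudo-text tokens,
# so the utterance reads like "<unit_3> <unit_7> ...".
deduped = [int(u) for i, u in enumerate(unit_ids) if i == 0 or u != unit_ids[i - 1]]
speech_tokens = [f"<unit_{u}>" for u in deduped]

print(speech_tokens[:10])
```

Once utterances are rendered as unit tokens, a single BART-style denoising objective can in principle be applied uniformly to speech, text, and paired data, which is the unification the abstract describes.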