Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS (2023)

Abstract
In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and well-designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without the need for an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. Results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method is successfully extended to multilingual pretrained models, such as XLM-RoBERTa, and evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning, and showcases the effectiveness and extensibility of the s2s-ft method.
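The "well-designed self-attention masks" mentioned in the abstract refer to a unified, UniLM-style sequence-to-sequence mask over the concatenated source and target: source tokens attend bidirectionally within the source segment, while target tokens attend to the full source and causally to the target prefix. The sketch below is a minimal illustration of that masking idea, not the authors' released implementation; the function name build_s2s_attention_mask and the PyTorch formulation are assumptions made here for clarity.

```python
# Minimal sketch (illustrative only) of a UniLM/s2s-ft-style attention mask:
# source positions attend bidirectionally to the source segment, target
# positions attend to the full source plus the already-generated target prefix.
import torch

def build_s2s_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Return a (src_len + tgt_len) x (src_len + tgt_len) boolean mask,
    where True means the query position may attend to the key position."""
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Source queries: full bidirectional attention over the source segment.
    mask[:src_len, :src_len] = True

    # Target queries: attend to every source token ...
    mask[src_len:, :src_len] = True
    # ... and causally (lower-triangular) within the target segment.
    mask[src_len:, src_len:] = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.bool)
    )
    return mask

if __name__ == "__main__":
    m = build_s2s_attention_mask(src_len=4, tgt_len=3)
    print(m.int())  # rows = query positions, columns = key positions
```

Running the example prints a 7x7 matrix whose upper-left 4x4 block is all ones (bidirectional source attention) and whose lower-right 3x3 block is lower-triangular (causal generation over the target), which is the mask shape that lets a single encoder act as both encoder and decoder.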
Keywords
Pretrained models,Transformer,Natural language generation,Document summarization,Question generation,Multilingual