BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

International Conference on Machine Learning, Vol. 97 (2019)

Abstract
Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or 'projected attention layers', we match the performance of separately fine-tuned models on the GLUE benchmark with approximately 7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
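
To make the idea concrete, below is a minimal sketch (not the authors' code) of a projected attention layer as described in the abstract: a small task-specific module that projects the shared BERT hidden states down to a low dimension, applies multi-head attention in that space, and projects back up, so only the small module is task-specific while the large BERT weights stay shared. The dimensions and names (hidden_size, pal_size, num_heads, ProjectedAttentionLayer) are illustrative assumptions, not taken from the paper's released implementation.

```python
# Hedged sketch of a PAL-style adapter in PyTorch; sizes are illustrative.
import torch
import torch.nn as nn


class ProjectedAttentionLayer(nn.Module):
    def __init__(self, hidden_size=768, pal_size=204, num_heads=12):
        super().__init__()
        # Project the hidden state down to a small dimension.
        self.down = nn.Linear(hidden_size, pal_size, bias=False)
        # Multi-head attention applied in the low-dimensional space.
        self.attn = nn.MultiheadAttention(pal_size, num_heads, batch_first=True)
        # Project back up to the shared model's hidden size.
        self.up = nn.Linear(pal_size, hidden_size, bias=False)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) from the shared BERT layer
        low = self.down(hidden_states)
        attended, _ = self.attn(low, low, low)
        return self.up(attended)


# Usage: add the task-specific PAL output to the shared layer's output,
# keeping one small PAL per task while the BERT backbone is shared.
pal = ProjectedAttentionLayer()
h = torch.randn(2, 16, 768)
adapted = h + pal(h)
print(adapted.shape)  # torch.Size([2, 16, 768])
```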