Less is more: Pruning BERTweet architecture in Twitter sentiment analysis

Ricardo Moura, Jonnathan Carvalho, Alexandre Plastino, Aline Paes

Information Processing & Management (2024)

Abstract
Transformer-based models have been scaled up to absorb more information and improve their performance. However, several studies have called attention to their overparametrization and the costs of experimenting with such huge models. This paper investigates the overparametrization of BERTweet, a transformer-based model trained on Twitter data, focusing on the prevalent task of tweet sentiment analysis. The paper contributes a pruning method that reduces the size of BERTweet before tuning it to a downstream task. Using twenty-two datasets of tweets, the experiments evaluated several pruned models, which, after finetuning, achieved performance even superior to that of the complete finetuned model. After applying the method to BERTweet, the pruned model with the best overall predictive performance was obtained by pruning 47.22% of all heads (68 of 144). In the generalization check, the time spent finetuning this pruned model was reduced by at least 10% while achieving the same or better predictive performance than the original model at a significance level of 0.05. The proposed pruning method can also be applied to other transformer-based models or tasks to find pruned models that perform similarly to the complete one. A straightforward version of the method yielded a highly pruned model, with a 74.31% reduction (107 of 144 heads), while still reaching high predictive performance.
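
As a rough illustration of the kind of attention-head pruning the abstract describes, the sketch below removes heads from BERTweet before finetuning, using the Hugging Face Transformers prune_heads API. The vinai/bertweet-base checkpoint is the publicly available BERTweet base model (12 layers x 12 heads = 144 heads, matching the abstract); the specific heads listed are hypothetical placeholders, not the heads selected by the paper's method.

```python
# Minimal sketch: prune attention heads from BERTweet, then finetune the smaller model.
from transformers import AutoModelForSequenceClassification

# BERTweet base (RoBERTa architecture): 12 layers, 12 heads per layer = 144 heads.
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=3
)

# Hypothetical pruning plan: {layer_index: [head_indices_to_remove]}.
# The paper's method decides which heads to prune; these indices are placeholders.
heads_to_prune = {0: [1, 5], 3: [0, 2, 7], 11: [4]}

# prune_heads() physically removes the listed heads, shrinking the model
# before any finetuning on the downstream sentiment-analysis task.
model.prune_heads(heads_to_prune)

# The pruned model can then be finetuned as usual (e.g., with the Trainer API).
```
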
Keywords
Sentiment analysis, Twitter, Language model, Transformer, Model pruning, Finetuning