EL-CodeBert: Better Exploiting CodeBert to Support Source Code-Related Classification Tasks

Asia-Pacific Symposium on Internetware (Internetware), 2022

Abstract
With the development of deep learning and natural language processing techniques, the performance of many source code-related tasks can be improved by using pre-trained models. Among these pre-trained models, CodeBert is a bi-modal pre-trained model for programming languages and natural languages that has been successfully applied to source code-related tasks. Previous studies mainly use the output vector of CodeBert's last layer as the code semantic representation when fine-tuning for downstream source code-related tasks. However, this setting may miss valuable representational information captured by the other layers of CodeBert. To better exploit the representational information in each layer of CodeBert when fine-tuning for downstream source code-related tasks, we propose EL-CodeBert. Our approach first extracts the representational information from each layer of CodeBert and treats it as a sequence. It then learns the importance of each layer's representation through a bidirectional recurrent neural network (i.e., Bi-LSTM) and an attention mechanism. To verify the effectiveness of our proposed approach, we select four downstream source code-related classification tasks (i.e., code smell classification, code language classification, technical debt classification, and code comment classification). Compared with state-of-the-art baselines for these tasks, EL-CodeBert achieves better performance on most performance measures. Finally, we conduct ablation studies to verify the rationality of the component settings in our proposed approach.
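The layer-aggregation idea described in the abstract (per-layer representations fed to a Bi-LSTM and pooled with attention before classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name ELCodeBertSketch, the use of each layer's [CLS] vector as that layer's representation, the hidden sizes, and the microsoft/codebert-base checkpoint are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class ELCodeBertSketch(nn.Module):
    """Illustrative sketch: aggregate all CodeBert layer outputs with a
    Bi-LSTM plus attention, then classify. Names and sizes are assumptions."""

    def __init__(self, num_classes, hidden_size=768, lstm_hidden=384):
        super().__init__()
        # output_hidden_states=True exposes the output of every encoder layer
        self.encoder = AutoModel.from_pretrained(
            "microsoft/codebert-base", output_hidden_states=True)
        self.bilstm = nn.LSTM(hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask)
        # hidden_states: tuple of (embedding output + 12 layers), each (B, T, H);
        # take the [CLS] vector of each encoder layer as that layer's representation
        layer_cls = torch.stack(
            [h[:, 0, :] for h in outputs.hidden_states[1:]], dim=1)   # (B, L, H)
        lstm_out, _ = self.bilstm(layer_cls)                          # (B, L, 2*H')
        weights = torch.softmax(self.attn(lstm_out), dim=1)          # (B, L, 1)
        pooled = (weights * lstm_out).sum(dim=1)                     # (B, 2*H')
        return self.classifier(pooled)


# Example usage (hypothetical two-class task, e.g., technical debt classification)
tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
batch = tok(["def add(a, b): return a + b"], return_tensors="pt", padding=True)
model = ELCodeBertSketch(num_classes=2)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

The design point the sketch tries to capture is that the layer dimension, rather than the token dimension, is treated as the sequence: the Bi-LSTM and attention weigh how much each CodeBert layer contributes to the final code representation.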
Keywords
Source code-related task, Pre-trained model, CodeBert, Fine-tuning