Interpreting Pretrained Source-code Models using Neuron Redundancy Analyses

CoRR (2023)

Abstract
Neural code intelligence models remain 'black boxes' to the human programmer. This opacity limits their application to code intelligence tasks, particularly safety-critical applications such as vulnerability detection, where a model's reliance on spurious correlations can be harmful. We introduce a neuron-level approach to interpreting neural code intelligence models that eliminates redundancy arising from highly similar or task-irrelevant neurons within these networks. We evaluate the remaining important neurons using probing classifiers, which are commonly used to ascertain whether certain properties are encoded within the latent representations of neural models. However, probing accuracies may be artificially inflated by the repetitive and deterministic nature of tokens in code datasets. We therefore adapt the selectivity metric, originally introduced in NLP to account for probe memorization, when formulating our source-code probing tasks. Through our neuron analysis, we find that more than 95% of the neurons are redundant with respect to our code intelligence tasks and can be eliminated without significant loss in accuracy. We further trace individual neurons, and subsets of important neurons, to specific code properties that could be used to influence model predictions. We demonstrate that it is possible to identify 'number' neurons, 'string' neurons, and higher-level 'text' neurons responsible for specific code properties; these could potentially be used to modify neurons whose predictions rest on incorrect signals. Additionally, the distribution and concentration of the important neurons across different source-code embeddings can serve as measures of task complexity, to compare source-code embeddings and guide training choices for transfer learning on similar tasks.
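The selectivity control mentioned in the abstract can be illustrated with a minimal probing sketch: train a linear probe on the real task labels and on randomly assigned control labels, then report the accuracy gap. The data below is synthetic stand-in activations, not the paper's pretrained code-model representations; the probe setup (logistic regression, control labels) follows the standard NLP selectivity recipe the abstract refers to.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for neuron activations: 1000 tokens x 64 neurons.
X = rng.normal(size=(1000, 64))
# Task labels depend on one neuron (e.g. a hypothetical 'number' neuron).
y_task = (X[:, 0] > 0).astype(int)
# Control task: random labels of the same shape; a probe can only
# score above chance here by memorizing, not by decoding a property.
y_ctrl = rng.integers(0, 2, size=1000)

def probe_accuracy(X, y):
    """Train a linear probe and return its held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

task_acc = probe_accuracy(X, y_task)
ctrl_acc = probe_accuracy(X, y_ctrl)
# Selectivity = task accuracy minus control accuracy: high selectivity
# indicates the probe reads a real encoded property rather than
# benefiting from repetitive, memorizable tokens.
selectivity = task_acc - ctrl_acc
```

A large `selectivity` suggests the probed property is genuinely encoded in the representations, which is the safeguard the abstract proposes against inflated probing accuracies on deterministic code tokens.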
Keywords
neuron redundancy