Data Augmentation via Back-translation for Aspect Term Extraction.

IJCNN(2023)

Abstract
We tackle Aspect Term Extraction (ATE), the task of automatically recognizing aspect terms based on an understanding of word-level semantics. Because it enriches the linguistic phenomena available for learning, data augmentation helps build robust ATE models. In this paper, we propose to leverage back-translation to augment the training data for ATE. The approach rests on the observation that back-translated instances generally read as paraphrases, providing diverse pragmatic modes for learning while the semantics remain unchanged. This helps ATE models recognize aspect terms when varied contexts and morphologically different words occur at test time. In our experiments, we apply an off-the-shelf Neural Machine Translation (NMT) model for back-translation, using French, Chinese and German, respectively, as interlanguages. In addition, word alignment is used to designate aspect terms in the back-translated instances. Experimental results on SemEval benchmarks show that retraining with the augmented data produces substantial improvements, of up to 3.46%. The experiments further suggest that 1) interlanguages from the same language family are more beneficial than non-family ones for this kind of data augmentation, and 2) selective sampling has a positive effect in low-resource settings. Back-translation has been explored for data augmentation in other fields, with the aim of enhancing neural language modeling; nevertheless, it has not yet been systematically studied for the ATE task. Although the method presented in this paper is compact, we conduct a comprehensive analysis covering interlanguage selection, low-resource application, and compatibility with both conventional and pretrained neural models, in addition to the usual comparison and ablation experiments. All models and code used in the experiments will be made publicly available to support reproducible research.
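The augmentation step described in the abstract (back-translate a sentence, then re-designate its aspect terms) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the NMT round trip is replaced by a hypothetical canned lookup (`toy_back_translate`), and word alignment is replaced by plain case-insensitive matching of the aspect-term tokens, which is a much cruder stand-in. All function names here are assumptions introduced for the example.

```python
from typing import Callable, List, Optional, Tuple


def extract_terms(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Collect aspect terms (as token lists) from B/I/O tags."""
    terms, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(current)
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                terms.append(current)
            current = []
    if current:
        terms.append(current)
    return terms


def project_tags(terms: List[List[str]], new_tokens: List[str]) -> Optional[List[str]]:
    """Re-tag new_tokens by exact matching; return None if any term is lost.

    Assumption: exact matching stands in for the word alignment
    mentioned in the abstract.
    """
    tags = ["O"] * len(new_tokens)
    lowered = [t.lower() for t in new_tokens]
    for term in terms:
        needle = [t.lower() for t in term]
        n = len(needle)
        found = False
        for i in range(len(lowered) - n + 1):
            free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == needle and free:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                found = True
                break
        if not found:
            return None
    return tags


def augment(
    tokens: List[str],
    tags: List[str],
    back_translate: Callable[[List[str]], List[str]],
) -> Optional[Tuple[List[str], List[str]]]:
    """Return a paraphrased, re-tagged instance, or None if unusable."""
    new_tokens = back_translate(tokens)
    if new_tokens == tokens:
        return None  # the round trip produced no new paraphrase
    new_tags = project_tags(extract_terms(tokens, tags), new_tokens)
    if new_tags is None:
        return None  # an aspect term could not be located again
    return new_tokens, new_tags


def toy_back_translate(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for an English -> interlanguage -> English round trip."""
    canned = {"The battery life is great .": "The battery life is excellent ."}
    sentence = " ".join(tokens)
    return canned.get(sentence, sentence).split()


tokens = "The battery life is great .".split()
tags = ["O", "B", "I", "O", "O", "O"]
print(augment(tokens, tags, toy_back_translate))
```

In this sketch, instances whose aspect terms cannot be found again after the round trip are simply dropped. A real word aligner could also recover terms that the translation rewords, which exact matching cannot do.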
Keywords
aspect term extraction,back-translation,data augmentation,word alignment