A multi-granularity semantic space learning approach for cross-lingual open domain question answering

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2021)

Abstract
Cross-lingual Open Domain Question Answering (Cross-lingual Open-QA) has been under development since it was proposed in the mid-1990s. It can be divided into two mainstream tasks according to the training corpus used in the answer extraction stage. In the first, both the training and testing data are in the target language. In the second, the training data is in the source language and the testing data is in the target language. For a long time, the former was studied mainly through translation-based approaches. The latter emerged only in 2019, when non-translation-based approaches became available thanks to the multilingual BERT model. As a result, the two tasks have been discussed separately, which motivates our work on whether both can be achieved simultaneously without any additional transformation. The multilingual BERT model makes it possible to establish a unified framework. However, using the multilingual BERT model directly raises two problems. First, in the document retrieval stage, directly using a multilingual pretrained model for similarity calculation results in insufficient retrieval accuracy. Second, in the answer extraction stage, the answers involve different levels of abstraction relative to the retrieved documents, which requires deeper exploration. This paper puts forward a multi-granularity semantic space learning based approach for cross-lingual Open-QA. It consists of the Match-Retrieval module and the Multi-granularity-Extraction module. The matching network in the retrieval module applies heuristic adjustment and expansion to the learned features to improve retrieval quality. In the answer extraction module, deep semantic features are reused at the network structure level through cross-layer concatenation, which enables the model to learn a multi-granularity semantic space. Experimental results on two public cross-lingual Open-QA datasets show the superiority of the proposed approach over state-of-the-art methods.
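The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: "heuristic adjustment and expansion" is illustrated with the commonly used matching features [q; d; |q - d|; q * d], and "cross-layer concatenation" with a per-token concatenation of hidden vectors taken from several encoder layers. The function names `heuristic_match_features` and `cross_layer_concat` are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def heuristic_match_features(q: Vector, d: Vector) -> Vector:
    """Expand a (question, document) vector pair into [q; d; |q - d|; q * d].

    Assumption: this is one common form of heuristic matching, used here
    only as an illustration of expanding learned features.
    """
    if len(q) != len(d):
        raise ValueError("q and d must have the same dimension")
    diff = [abs(a - b) for a, b in zip(q, d)]  # element-wise absolute difference
    prod = [a * b for a, b in zip(q, d)]       # element-wise product
    return q + d + diff + prod


def cross_layer_concat(layers: List[List[Vector]]) -> List[Vector]:
    """Concatenate each token's hidden vectors across several encoder layers.

    layers[k][t] is the hidden vector of token t at layer k. The result has
    one vector per token whose length is the sum of the per-layer dimensions,
    so shallow and deep features are available side by side.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    num_tokens = len(layers[0])
    if any(len(layer) != num_tokens for layer in layers):
        raise ValueError("all layers must cover the same number of tokens")
    return [
        [x for layer in layers for x in layer[t]]
        for t in range(num_tokens)
    ]
```

In a full system the expanded matching vector would typically feed a small scoring network for retrieval, and the concatenated token vectors would feed the answer span predictor; both downstream components are outside this sketch.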
Keywords
Cross-lingual open-QA, Multi-granularity semantic feature, Heuristic adjustment, Cross-layer concatenation