A Novel Pipeline for Domain Detection and Selecting In-domain Sentences in Machine Translation Systems
Computational Linguistics in the Netherlands (2021)
Abstract
General-domain corpora are becoming increasingly available for Machine Translation
(MT) systems. However, it is the use of corpora covering the same or comparable
domains that allows domain-specific MT to achieve high translation quality.
Domain-specific corpora are often scarce and cannot be used in isolation to train
domain-specific MT systems effectively. This work aims to improve in-domain MT with
(i) a novel unsupervised pipeline for identifying the distributions of different
domains within a corpus and (ii) a data selection technique that leverages in-domain
monolingual or parallel data to select domain-specific sentences from general corpora
according to the distributions identified in (i).
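
The abstract does not say how sentences are represented or scored, so the following is only a minimal sketch of the general idea behind step (ii), not the authors' method. It assumes a plain bag-of-words representation and cosine similarity to an in-domain centroid as a stand-in for whatever domain distribution the pipeline actually produces. All names (`bow`, `cosine`, `select_in_domain`) and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words vector as a token -> count mapping."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_in_domain(in_domain, general, top_k):
    """Return the top_k general-corpus sentences closest to the in-domain centroid.

    Assumption: the centroid of in-domain word counts stands in for the
    domain distribution; the paper may use a different representation.
    """
    centroid = Counter()
    for sentence in in_domain:
        centroid.update(bow(sentence))
    ranked = sorted(general, key=lambda s: cosine(bow(s), centroid), reverse=True)
    return ranked[:top_k]


# Toy data, invented for illustration only.
in_domain = [
    "the patient received a dose of insulin",
    "the doctor prescribed a new drug to the patient",
]
general = [
    "the parliament voted on the budget",
    "the patient was given the drug twice a day",
    "the football match ended in a draw",
]
selected = select_in_domain(in_domain, general, top_k=1)
print(selected)
```

With this toy data the medical sentence ranks first, because it shares the most vocabulary with the in-domain sample. A realistic system would likely replace the word counts with learned sentence embeddings, but the ranking-and-cutoff structure would stay the same.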