A Large Spanish-Catalan Parallel Corpus Release for Machine Translation.

Computing and Informatics (2014)

Abstract
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The resulting corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) under catalog number ELRA-W0053.
Keywords
Catalan-Spanish parallel corpus, machine translation