TDDC - Timely Disclosure Documents Corpus.

LREC (2020)

Abstract
In this paper, we describe the details of the Timely Disclosure Documents Corpus (TDDC). TDDC was manually organized by aligning the sentences from past Japanese and English timely disclosure documents in PDF format published by companies listed on the Tokyo Stock Exchange. TDDC consists of approximately 1.4 million parallel sentences in Japanese and English. TDDC was used as the official dataset for the 6th Workshop on Asian Translation to encourage the advancement of machine translation.
Keywords
Parallel corpus, Machine translation, Asian language, Stock exchange, Investor Relations