OBC306 - A Large-Scale Oracle Bone Character Recognition Dataset.

ICDAR(2019)

引用 22|浏览12
暂无评分
摘要
The oracle bone script from ancient China is among the worldu0027s most famous ancient writing systems. Identifying and deciphering oracle bone scripts is one of the most important topics in oracle bone study and requires a deep familiarity with the culture of ancient China. This task remains very challenging for two reasons. The first is that it is executed mainly by humans and requires a high level of experience, aptitude, and commitment. The second is due to the scarcity of domain-specific data, which hinders the advancement of automatic recognition research. A collection of well-labeled oracle-bone data is necessary to bridge the oracle bone and information processing fields; however, such a dataset has not yet been presented. Hence, in this paper, we construct a new large-scale dataset of oracle bone characters called OBC306. We also present the standard deep convolutional neural network-based evaluation for this dataset to serve as a benchmark. Through statistical and visual analyses, we describe the inherent difficulties of oracle bone recognition and propose future challenges for and extensions of oracle bone study using information processing. This dataset contains more than 300,000 character-level samples cropped from oracle-bone rubbings or images. It covers 306 glyph classes and is the largest existing raw oracle-bone character set, to the best of our knowledge. It is anticipated the publication of this dataset will facilitate the development of oracle bone research and lead to optimal algorithmic solutions.
更多
查看译文
关键词
oracle bone script,dataset,character-level
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要