How Much Do Synthetic Datasets Matter in Handwritten Text Recognition?

ICONIP(2021)

Abstract
This paper explores the use of synthetic image generators in preparing datasets for training handwritten character recognition models. We examined the most popular deep neural network architectures and present a method based on an autoencoder architecture and a schematic character generator. As a baseline for comparison, we used a classifier trained on the whole NIST set of handwritten letters from the Latin alphabet. In our experiments, a training dataset made up of 80% synthetic images achieved very high model accuracy, close to that of a training dataset made up of 100% handwritten images. Our results indicate that the cost of creating, gathering, and annotating handwritten datasets can be reduced fivefold, with only a 5% loss in accuracy. Our method appears to be beneficial for part of the training process and avoids unnecessary manual annotation work.
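The fivefold figure follows from the mixing ratio: if 80% of the training set is synthetic, only 20% (one fifth) has to be collected and annotated by hand. The sketch below illustrates that arithmetic on toy data. It is not the paper's pipeline: the function name `mix_training_set`, its parameters, and the placeholder sample pools are all hypothetical, and real samples would be labeled images rather than tuples.

```python
import random


def mix_training_set(handwritten, synthetic, synthetic_fraction, total, seed=0):
    """Draw `total` samples with the requested share of synthetic items.

    Hypothetical helper for illustration only; the abstract does not
    describe how the authors actually assembled their training sets.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    n_synth = round(total * synthetic_fraction)
    n_real = total - n_synth
    if n_synth > len(synthetic) or n_real > len(handwritten):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    mixed = rng.sample(synthetic, n_synth) + rng.sample(handwritten, n_real)
    rng.shuffle(mixed)  # interleave the two sources
    return mixed


# Toy pools: (source, index) tuples standing in for labeled images.
handwritten_pool = [("handwritten", i) for i in range(1000)]
synthetic_pool = [("synthetic", i) for i in range(1000)]

train = mix_training_set(handwritten_pool, synthetic_pool, 0.8, 500)
n_real = sum(1 for source, _ in train if source == "handwritten")

# 500 samples in total, 100 of them handwritten, so the manual
# collection effort is one fifth of an all-handwritten set.
print(len(train), n_real, len(train) / n_real)  # 500 100 5.0
```

Fixing the total size and varying only `synthetic_fraction` makes it straightforward to compare models trained at different mixing ratios on equal footing.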
Keywords
Handwritten text, Pattern recognition, Image processing, Data augmentation, Synthetic dataset, Deep learning, Autoencoder