The Adaptability of a Transformer-Based OCR Model for Historical Documents.

ICDAR Workshops (1)(2023)

Abstract
We tested the capabilities of Transformer-based text recognition technology when dealing with (multilingual) real-world datasets. This is a crucial aspect for libraries and archives that must digitise various sources. The digitisation process cannot rely solely on manual transcription due to the complexity and diversity of historical materials. Therefore, text recognition models must be able to adapt to various printed texts and manuscripts, especially regarding different handwriting styles. Our findings demonstrate that Transformer-based models can recognise text from printed and handwritten documents, even in multilingual environments. These models require minimal training data and are a suitable solution for digitising libraries and archives. However, it is essential to note that the quality of the recognised text can be affected by the handwriting style.
Keywords
OCR model, Transformer-based