
An Optical Character Recognition Post-processing Method for Technical Documents

Anais Estendidos da XXXVI Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2023) (2023)

Abstract
Methods for correcting errors generated by Optical Character Recognition (OCR) systems have been developed for a long time, with interesting results in their applications. However, these methods tend to work only on text whose words belong to an existing language and are strongly related to one another semantically. In this work, we propose an error correction method that targets documents lacking these strong semantic relationships. Specifically, we focus on sparse text whose words tend to have little semantic relationship to one another. The proposed method uses machine learning to train classifiers that detect errors in the OCR output, then runs an isolated execution of the OCR system to fix each detected error. The final results indicate an accuracy of 88.24% for error detection and an improvement in the character error rate (CER) of 14.2%.
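
The abstract reports its improvement in terms of CER but does not say how the metric is computed. As a minimal sketch, assuming the common definition (Levenshtein edit distance between the OCR output and the ground truth, divided by the ground-truth length), it could be calculated as below. The function names and the sample strings are hypothetical illustrations, not taken from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning hypothesis
    into reference."""
    # prev[j] holds the distance between the reference prefix
    # processed so far and the first j characters of hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance divided by
    the number of characters in the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical sparse label, as might appear in a technical drawing:
# the OCR output confuses the digit 0 with the letter O twice.
ground_truth = "R101 10k"
ocr_output = "R1O1 1Ok"
print(character_error_rate(ground_truth, ocr_output))  # 0.25
```

Under this definition a lower CER is better, so a correction step helps when the CER of the corrected text is lower than the CER of the raw OCR output.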