Extracting Descriptive Words from Untranscribed Handwritten Images.

Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) (2022)

Abstract
Extracting descriptive text from manuscripts for inclusion in their metadata is an important task, generally performed in archives and libraries by experts with a wealth of knowledge of the manuscripts' contents. Unfortunately, many manuscript collections are so vast that it is not feasible to rely solely on experts to perform this task. To our knowledge, this is the first work aiming at automatic extraction of descriptive text from untranscribed text images. An obvious first step toward such a task would be to transcribe the handwritten images into text, but achieving sufficiently accurate transcripts is generally infeasible for large sets of historical manuscripts. We propose new approaches to automatically extract descriptive words that do not rely on any explicit image transcripts. They are based on "probabilistic indexing", a relatively novel technology that effectively represents the intrinsic word-level uncertainty generally exhibited by handwritten text images. We assess the performance of these approaches on samples of a large collection of complex manuscripts from the Spanish Archivo General de Indias. Since no standard metrics exist for the novel task considered in this work, we propose two new evaluation measures that assess the quality of the detected descriptive words in terms close to the practical usage of these words. Using these metrics, we report promising preliminary results.
Keywords
Descriptive words, Content-based image retrieval, Historical manuscripts