Segmentation of Large Historical Manuscript Bundles into Multi-page Deeds

IbPRIA (2023)

Abstract
Archives around the world hold vast uncatalogued series of image bundles of digitized historical manuscripts containing, among other documents, notarial records also known as “deeds” or “acts”. One of the first steps towards providing metadata that describe the contents of these bundles is to segment each bundle into its individual deeds. Even when deeds are page-aligned, as in the bundles considered in the present work, this is a time-consuming task, often prohibitively so given the huge scale of the manuscript series involved. Unlike traditional Layout Analysis methods for page-level segmentation, our approach goes beyond the scope of a single page image, providing consistent deed detection results on full bundles. This is achieved in two tightly integrated steps: first, the probability that each bundle image is an “initial”, “middle” or “final” page of a deed is estimated; then, an optimal sequence of page labels is computed at the whole-bundle level. Empirical results show that this approach achieves almost perfect segmentation of bundles from a massive Spanish series of historical notarial records.
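The second step (choosing one consistent sequence of page labels for the whole bundle) can be illustrated with a Viterbi-style dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the per-page probabilities of "initial" (I), "middle" (M) and "final" (F) are already given, treats pages as independent given their labels, and assumes a hypothetical deed grammar in which every deed is one I page, zero or more M pages and one F page (so single-page deeds are not modeled). The names `decode_bundle`, `labels_to_deeds` and `ALLOWED` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ("I", "M", "F")
# Hypothetical deed grammar (assumption): each deed is I M* F, deeds follow each other.
ALLOWED = {
    "I": {"M", "F"},
    "M": {"M", "F"},
    "F": {"I"},
}


def decode_bundle(probs: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-probability label sequence that obeys ALLOWED,
    starts with "I" and ends with "F". probs[t][k] = P(LABELS[k] | page t)."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two pages under the I M* F assumption")
    neg_inf = float("-inf")

    def log(p: float) -> float:
        return math.log(p) if p > 0 else neg_inf

    # score[t][k]: best log-probability of a valid prefix ending at page t with label k
    score = [[neg_inf] * 3 for _ in range(n)]
    back = [[-1] * 3 for _ in range(n)]
    score[0][0] = log(probs[0][0])  # the bundle must start with an "I" page
    for t in range(1, n):
        for k, lab in enumerate(LABELS):
            for j, prev in enumerate(LABELS):
                if lab in ALLOWED[prev] and score[t - 1][j] > neg_inf:
                    cand = score[t - 1][j] + log(probs[t][k])
                    if cand > score[t][k]:
                        score[t][k] = cand
                        back[t][k] = j
    k = 2  # the bundle must end with an "F" page
    if score[n - 1][k] == neg_inf:
        raise ValueError("no valid segmentation for these probabilities")
    path = [k]
    for t in range(n - 1, 0, -1):  # follow back-pointers to page 0
        k = back[t][k]
        path.append(k)
    return [LABELS[i] for i in reversed(path)]


def labels_to_deeds(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Turn a valid label sequence into (first_page, last_page) deed spans."""
    deeds: List[Tuple[int, int]] = []
    start = 0
    for i, lab in enumerate(labels):
        if lab == "I":
            start = i
        elif lab == "F":
            deeds.append((start, i))
    return deeds


# Invented example probabilities (columns: I, M, F). Page 4 is locally most
# likely "I", which would give the invalid sequence I M F I I F.
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.6, 0.3, 0.1],
    [0.45, 0.35, 0.2],
    [0.2, 0.2, 0.6],
]
labels = decode_bundle(probs)
print(labels, labels_to_deeds(labels))
```

With these made-up numbers the decoder returns `['I', 'M', 'F', 'I', 'M', 'F']`, i.e. two deeds spanning pages 0–2 and 3–5, overriding the locally preferred "I" on page 4 because it would break the deed structure. This is the kind of bundle-level consistency that per-page decisions alone cannot guarantee.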
Key words
historical manuscript bundles, segmentation, multi-page