Transcribing Medieval Manuscripts for Machine Learning

arXiv (2023)

Abstract
In the early twentieth century, many scholars focused on preparing editions and translations of texts previously available only to the few specialists able to read archaic hands and privileged enough to travel to work with the manuscripts in person. Valuable scholarship in its own right, the preparation of these editions and translations, for particular texts deemed important enough to justify the effort and time, laid the foundation for generations of scholarship in medieval studies. On the other hand, for many materials in historical archival collections, including already digitised collections, medievalists have only had the time to create partial transcriptions, if any at all. Access to textual material from the medieval period has increased greatly in recent years with digitisation, and we are able to imagine many new research projects in decades to come. What challenges do new frontiers of automation in the archives raise with respect to medieval studies, and in particular to the ways we transcribe? In this article, we argue that if medievalists hope to pursue the kinds of analysis that go on in advanced computational research, we will need new kinds of transcriptions, intentionally theorized not only for human reading but also for machine processing. We already have mature methods, such as Optical Character Recognition (OCR), for remediating generations of editions of medieval works, but we can ask ourselves whether these are the kinds of text we want to use for future computational analysis. We suggest instead that one way forward is by going back to the scriptorium.