🔬 About Project

The overall goal of this project is to develop a cost- and time-effective solution for transcribing, integrating, and subsequently using historical population sources that have a regular tabular structure. The key element is a sustainable and robust instrument for automatic handwritten text recognition (HTR) of parish records and censuses from the modern era. The project's methodology will follow best practices in the field and uphold standards of reproducibility, transparency, and accessibility. Machine learning models for HTR of Romanian, Hungarian, German, and Latin tabular documents written in Latin script, Cyrillic script, or Kurrentschrift will be trained within an existing commercially developed tool (Transkribus). In parallel, customized proprietary solutions will be developed for specific aspects that go beyond the capabilities of current instruments (e.g. vertical text or atypical overlaps between text and columns). The project will demonstrate measurable impact foremost in historical demography and digital humanities by providing standards, best practices, and typical workflows for HTR of historical population sources. It will also redefine the parameters of public engagement for such scholarly endeavours by providing a publicly accessible, transparent, and highly advanced HTR platform.

The project is highly interdisciplinary, bringing together humanities scholars and computer scientists in the emerging discipline of Digital Humanities. The handwriting recognition tool will facilitate the transcription and integration of various documents. For example, it will allow consistent original demographic information covering a large number of localities across Transylvania to be incorporated into HPDT, and it will also allow the integration of economic, financial, and administrative decisions that help explain past realities. The solution developed for the automatic transcription of vital records will be open access. Moreover, the project involves building a customized and highly functional HTR tool for documents with a specific tabular structure, such as parish records. This tool can then be adapted to semi-structured or unstructured documents and employed in other institutional settings such as archives or libraries, where extensive document collections could rapidly become available to a much wider audience than before.

🎯 Objectives
