My doctoral research focuses on computational linguistic analysis of the Syrohexaplaric version of Ben Sira. While the manual transcription of the complete text (50 chapters from the 8th-century Codex Ambrosianus) is already nearly complete, I recently conducted a side project evaluating Handwritten Text Recognition (HTR) tools for Syriac Estrangelo manuscripts. The question was simple: could HTR technology offer a viable alternative for similar future projects, saving the many hours of tedious work that manual transcription typically requires?

I evaluated two major HTR platforms: Transkribus, using the Vienna Syriac New Testament model, and Kraken, using the Sophro Mhiro models developed through the 2023 Princeton Transcribathon (Beth Mardutho / Princeton University / Université PSL). The test corpus was Ben Sira 12:18–14:17, which I carefully transcribed as ground truth (2,851 characters). Performance was measured using Character Error Rate (CER), where lower percentages indicate better accuracy.
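For readers unfamiliar with the metric, CER is conventionally computed as the Levenshtein (edit) distance between the HTR output and the ground truth, divided by the length of the ground truth. The sketch below is a minimal illustration of that definition, not the scoring script used in this evaluation. The function names are hypothetical, and it assumes both strings have already been normalized in the same way (Unicode form, whitespace, treatment of diacritical points).

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Assumption for illustration: characters are counted as Unicode code points.
    print(cer("ܫܠܡܐ", "ܫܠܡ"))  # one missing letter out of four
```

Because extra characters in the output also count as errors, a CER computed this way can exceed 100% when the recognizer produces far more text than the manuscript contains.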

At first, with standard-resolution images, the results were promising for Transkribus but quite disappointing for Kraken. Transkribus achieved 40.83% CER after manual adjustments, while Kraken reached 71.94% CER, which is too high to be really useful for research. I also tested general-purpose Vision LLMs (GPT-4 and Claude to be precise), but they proved entirely unsuitable, reaching 77-82% CER with severe hallucinations: mixing multiple writing systems, inventing theological content, and producing outputs that bore little resemblance to the actual manuscript text. At the current stage, these hallucinations alone disqualify these tools for academic purposes.

The breakthrough came when Constantijn Sikkel generously offered advice and provided high-resolution images extracted from the Ceriani Syrohexaplaric edition. Retesting both tools with these images produced transformative results. Transkribus’s error rate dropped from 40.83% to 21.71%. But this improvement came with an important warning: Transkribus’s default “reading order” algorithm failed on high-resolution images of multi-column manuscripts. Lines were recognized accurately but ordered incorrectly, producing an initially poor 54% CER despite good character recognition.

The solution turned out to be quite simple. A configuration change in Transkribus’s layout settings (switching from row-based to column-based reading order, with right-to-left direction) reduced errors by 31 percentage points. This single parameter adjustment proved as impactful as the choice of HTR tool itself, a finding that suggests published HTR comparisons may need to document configuration details more carefully.
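To illustrate why reading order alone can inflate CER, the sketch below contrasts a row-based ordering with a column-based, right-to-left ordering on a toy two-column page. It is not Transkribus's algorithm: the `Line` class, the coordinate convention, and the `tolerance` parameter are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    x: float  # horizontal centre of the line's bounding box (pixels)
    y: float  # top edge of the line's bounding box (pixels)


def row_based_order(lines):
    """Top to bottom across the whole page; ties broken right to left."""
    return sorted(lines, key=lambda line: (line.y, -line.x))


def column_based_rtl_order(lines, tolerance=100.0):
    """Group lines into columns by x position, read columns right to left,
    and read each column top to bottom."""
    columns = []
    for line in sorted(lines, key=lambda line: -line.x):  # rightmost first
        # Same column if the x centre is close to the previous line's centre.
        if columns and abs(columns[-1][-1].x - line.x) <= tolerance:
            columns[-1].append(line)
        else:
            columns.append([line])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda line: line.y))
    return ordered


# Toy page: the right-hand column (R) should be read before the left-hand one (L).
page = [
    Line("L1", x=300, y=102), Line("R1", x=800, y=100),
    Line("L2", x=300, y=151), Line("R2", x=800, y=150),
]

print([line.text for line in row_based_order(page)])         # interleaves columns
print([line.text for line in column_based_rtl_order(page)])  # keeps columns intact
```

In the row-based result the two columns are interleaved line by line, so every line is followed by text from the wrong column. Each line can be recognized perfectly and the page-level comparison against the ground truth will still show a large edit distance.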

Most striking, however, was Kraken’s performance with high-resolution images. The open-source tool went from 71.94% CER to 28.10% CER. High-resolution imaging transformed Kraken from an unusable tool into a genuine alternative, now only 6 percentage points behind Transkribus. This is particularly significant for projects with limited budgets or requiring complete data sovereignty.
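The gains can be read in two ways: as a drop in percentage points, or as a relative reduction of the original error. The short calculation below makes both explicit using only the CER figures reported above. The "implied edits" value is an approximation that assumes CER equals edit operations divided by the 2,851-character ground truth; the function and variable names are hypothetical.

```python
REFERENCE_CHARS = 2851  # size of the ground-truth transcription

# (CER with standard images, CER with high-resolution images), in percent
results = {
    "Transkribus": (40.83, 21.71),
    "Kraken": (71.94, 28.10),
}


def summarize(before: float, after: float):
    """Return the drop in percentage points, the relative reduction in
    percent, and the approximate number of edits still needed."""
    drop_points = before - after
    relative_reduction = 100.0 * drop_points / before
    implied_edits = round(after / 100.0 * REFERENCE_CHARS)
    return drop_points, relative_reduction, implied_edits


for tool, (before, after) in results.items():
    points, relative, edits = summarize(before, after)
    print(f"{tool}: {points:.2f} points lower, "
          f"{relative:.1f}% relative reduction, "
          f"about {edits} edits remaining")
```

On these figures, Kraken's relative error reduction (about 61%) is larger than Transkribus's (about 47%), even though Transkribus keeps the lower absolute CER.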

The evaluation also revealed some important pitfalls to avoid. In my tests, JPEG input caused a 27-point CER degradation with Transkribus compared to PNG. Row-based layout algorithms fail on high-resolution images of multi-column manuscripts. And Vision LLMs, despite their impressive capabilities in other domains, remain fundamentally unsuited for precise manuscript transcription. Faced with a difficult manuscript in a low-resource language, they tend to hallucinate rather than transcribe accurately.
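Since the image format matters this much, a simple pre-flight check before uploading a batch can catch JPEG files, including ones that have merely been renamed to `.png`. The sketch below inspects the standard file signatures rather than trusting extensions; the function names and folder layout are hypothetical. Note that re-saving a JPEG as PNG does not undo compression artefacts, so the check is only useful for tracing images back to a lossless source.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # first 3 bytes of every JPEG file


def sniff_format(header: bytes) -> str:
    """Classify an image from its leading bytes."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"


def find_non_png(folder: str) -> list[Path]:
    """Return the files in `folder` whose content is not PNG."""
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                if sniff_format(handle.read(8)) != "png":
                    suspects.append(path)
    return suspects
```

Calling `find_non_png("scans/")` on a hypothetical folder of page images returns the files that should be re-exported from the original source before recognition.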

Summary of Results

The following table summarizes the key findings for practitioners working with Syriac manuscripts:

| Configuration | CER | Best suited for |
|---|---|---|
| Transkribus HD (PNG, column-based layout) | 21.71% | Maximum accuracy |
| Kraken HD (PNG, 1-column model) | 28.10% | Good results and data sovereignty, as it can be run locally |
| Transkribus, standard image quality (PNG) | 40.83% | If high-resolution images are unavailable |
| Kraken, standard image quality (PNG) | 71.94% | Not recommended |
| Vision LLMs (GPT-4, Claude) | 77–82% | Not suitable (hallucination issues) |

For future Syriac manuscript projects, a fully automated Transkribus HD workflow could significantly reduce manual transcription work. Projects with tighter budgets, or with a need to have complete control over the process, can also consider Kraken as a viable free alternative, accepting a modest increase in error rate (roughly 28% versus 22%) in exchange for zero cost and technical autonomy.

This research was conducted as part of my joint doctorate (cotutelle) between Université de Lorraine and VU Amsterdam, supervised by Prof. Claire Placial, Prof. Frédérique Rey and Prof. Willem van Peursen. Special thanks to Constantijn Sikkel for providing high-resolution manuscript images, to George Kiraz and Christine Roughan for developing and sharing the Sophro Mhiro Kraken models (Princeton Transcribathon 2023, Beth Mardutho), to Daniele Reano for guiding me through Transkribus configuration and sharing insights from the Vienna Syriac model development (HTR Winter School, Austrian Academy of Sciences), to Ephrem Aboud Ishac for the introduction to the Vienna HTR team, and to the ETCBC team for their support and feedback. Full technical documentation of this evaluation, including all test results and reproducible workflows, is available upon request.

Matthias Benabdellah is a doctoral researcher at the Eep Talstra Centre for Bible and Computer (ETCBC) and Université de Lorraine, working on computational linguistic analysis of the Syrohexaplaric version of Ben Sira.