Beyond 2022 releases the Transkribus English-language model for Handwritten Text Recognition, created by Dr David Brown.

(This blog first appeared on the readcoop.eu website, 30 June 2021)

‘B2022 English M4

English with some Latin characters. By the Beyond 2022 project,
Trinity College Dublin. Early 17th to late 19th century, 40 hands
using Transkribus base model and 750,000 word ground truth. The model
is particularly effective on secretary and copperplate texts and can
be used as a base model for more cursive scripts.’


As part of our commemorations to mark the 99th anniversary of the
destruction of the Public Record Office of Ireland on 30 June 1922,
Beyond 2022 is today making publicly available our ground truth that
enables the automatic transcription of handwritten historical texts.


A ground truth, in this context, is a collection of images of
handwriting samples, with their corresponding transcriptions, taken
from many styles of document across a range of media. The images and
transcriptions are used in Transkribus, a machine learning system
developed by the University of Innsbruck, to train a computer to
recognise and transcribe a broad range of historical documents in
English, 1600-1900.


A 750,000-word ground truth for early modern text was developed by
Beyond 2022's archival discovery team, and this has been combined with
a 1 million-word corpus of 19th-century text, provided by Transkribus,
to create an accurate and flexible ‘model’ that will automatically
recognise and transcribe most early official documents.


Since 2018, Beyond 2022 has been working with the READ consortium, a
project funded by the European Research Council to develop the
Transkribus platform. In 2020, the Library of Trinity College Dublin
helped to sustain this work by joining the READ cooperative, which
maintains and further develops Transkribus, and by providing vital
financial support as READ transitioned from a European Research
Council-funded project to its present incarnation.


Transkribus can be downloaded from https://readcoop.eu/. Once you are
set up and have uploaded your scanned images, you can select our
model, B2022 English M4, in the application's RUN HTR menu.