Working on Digitised Manuscripts with Transkribus
This course will explain and demonstrate Transkribus, a Handwritten Text Recognition (HTR) platform that has been a popular tool since its release for making historical documents more readable and accessible. Transkribus currently has over 1,700 regular users, representing 80 institutions, and is regularly used in crowdsourcing projects on a range of collections. These two sessions will help you avoid the common pitfalls of automatic transcription, so that you can work creatively with your material without worrying about errors.
The first session will cover how HTR technologies have filled the gaps left by Optical Character Recognition (OCR) and how the two differ as tools. It will focus on how to upload documents to Transkribus; how to identify text zones and separate images from text; how to segment these text blocks into lines to be transcribed; how to work with distorted or broken text to segment the material accurately; and how to transcribe some written material by hand in the transcription panel underneath, including difficult characters and abbreviations.
In the second session, we will use the material we created in the first session to train and run an HTR model to automatically produce transcripts. We will go through the process of achieving a suitable Character Error Rate (CER), ways to improve the CER and the model, and tips for manually correcting the output. Other powerful functions of Transkribus will be highlighted, such as the keyword spotting tool, and further resources will be provided.
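For readers who have not met CER before, the short Python sketch below shows how the figure is commonly defined: the number of single-character edits (insertions, deletions and substitutions) needed to turn the automatic transcript into the correct one, divided by the length of the correct transcript. This is an illustration of the general definition only. The function names are made up for this example, and Transkribus may apply its own normalisation before reporting a CER.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Count the fewest single-character edits turning hypothesis into reference."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct (reference) text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the model reads "ri" as "n", a typical confusion
# in handwriting. That is one substitution plus one deletion, so 2 edits
# over 10 reference characters.
print(character_error_rate("manuscript", "manuscnpt"))
```

A lower value is better: a CER of 0.2 means roughly one character in five needs correcting, while 0.0 means the automatic transcript matches the manual one exactly.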
Transkribus is a community project at heart, and any transcriptions made, even of a few pages, further the effort to produce accurate software. Taking part in these sessions will not only introduce you to an essential transcription tool but also help others to improve their own projects on the back of your work.
This is a beginner-level workshop. No previous knowledge of the topic is required, but if you want to familiarise yourself with other techniques of automatic character recognition, you can have a look at Image to Tech: Introduction to Text Extraction.
Those who have registered to take part will receive an email with full details on how to get ready for the course.
After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows.
If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.
Digital Scholarship Centre
Digital Scholarship Centre, 6th floor
University of Edinburgh
Edinburgh EH8 9LJ