From image to text: experience of an Optical Character Recognition Intern


Since April, PhD student Ash Charlton has been an Intern with the University of Edinburgh’s Cultural Heritage Digitisation Service (CHDS) and the Centre for Data, Culture and Society (CDCS), looking into text extraction processes at the University through Optical Character Recognition (OCR).

This internship saw Ash working with the CHDS to examine their past and current uses of OCR software, and what their future solutions might be in the Library and University Collections (L&UC). As part of her project, Ash has also designed a workshop on text extraction for the Centre, focusing on an introduction to text recognition and its history, important considerations during the process, and how you can convert images of text to machine-readable text yourself.

“My internship with CDCS and CHDS has been a fantastic opportunity to work directly with teams in the cultural heritage sector as those creating the text, and on the digital scholarship teaching and learning side, in looking at how we engage with the texts, or carry out text extraction ourselves as researchers.” - Ash Charlton

This internship builds on previous work with CDCS to create a training pathway for Managing Digitised Documents, which can be found here. Through internships like these, CDCS is proud to support our community in gaining new skills, exploring how digital methods can be applied in real-world contexts, and sharing knowledge across the University and beyond. You can find out more about past internships on our blog. CDCS also offers support in many other forms, including funding opportunities like our bursary. We offer project support from expert members of the team and have helped to develop a number of projects through our sandpit. We also relish opportunities to collaborate and aim to foster an environment where researchers can freely exchange ideas and support. If you have an idea for a collaborative internship focused on data or digital methods, drop us a line!

Read all about Ash’s experience on the CDCS blog here.