Working with OCR
In Person
Optical character recognition (OCR), sometimes expanded as optical character reader, is a technique for converting images of text (typically printed, but also typed or handwritten) into machine-encoded text.
Starting from a scan or a photo of a piece of text, OCR technologies will allow you to generate a digitally searchable version of the text.
This two-class course gives you a step-by-step introduction to this technique. In the first class, we will look at the principles of OCR, with a focus on good practices and possible pitfalls. The second class will be more hands-on: we will demonstrate how to process images of text with both Python and R.
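To give a flavour of what "converting an image of text into machine-encoded text" means, here is a minimal, purely illustrative sketch in Python. It is not the course material and it does not use a real OCR engine: it assumes tiny hand-made bitmaps for three letters and simply picks the closest template for each glyph. All names in it (`TEMPLATES`, `recognise_glyph`, `split_glyphs`, `recognise_line`) are hypothetical.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Assumption: each glyph is a 5-row by 3-column bitmap,
# where '#' is ink and '.' is background.
TEMPLATES = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise_glyph(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))


def split_glyphs(image, width=3, gap=1):
    """Cut a one-line image into fixed-width glyphs separated by a blank column."""
    step = width + gap
    count = (len(image[0]) + gap) // step
    return [
        tuple(row[i * step : i * step + width] for row in image)
        for i in range(count)
    ]


def recognise_line(image):
    """Turn a one-line bitmap into a string of recognised characters."""
    return "".join(recognise_glyph(g) for g in split_glyphs(image))


# A made-up "scan" of three letters; one pixel of the last letter is missing,
# standing in for the kind of noise a real scan would contain.
scan = (
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#...#.#",
    "###.###.#..",
)

print(recognise_line(scan))
```

The missing pixel shows why OCR output can contain errors: the recogniser chooses the nearest match rather than an exact one, so noisier input makes wrong matches more likely.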
This is an intermediate-level course, and some basic knowledge of coding is required in order to follow it. Attendees do not need to know both R and Python. No previous knowledge of OCR is required.
Those who have registered to take part will receive an email with full details on how to get ready for this course.
After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows.
More details about Data Surgeries.
If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.
If you're interested in other training on digitised documents, have a look at the following:
- Silent Disco: Cleaning OCR’d Data with Regex
- Silent Disco: Linked Open Data
- Can you just Digitise? Introduction to digitised documents
- Working on digitised Manuscripts with Transkribus
Return to the Training Homepage to see other available events.
Digital Scholarship Centre
Digital Scholarship Centre, 6th floor
Main Library
University of Edinburgh
Edinburgh EH8 9LJ