From Images to Text: Working with OCR


In Person 

Do you need to analyse text in your research, but only have images or scanned PDFs to work with? This 2-session course is an introduction to extracting text from images using Optical Character Recognition (OCR) techniques.  

In this course, we will give you an overview of text extraction and demonstrate how OCR works in practice using free, widely available, ready-to-use tools. We will also discuss challenges you might face when performing OCR and explore ways to address them using R and Python OCR modules. Participants will leave the workshop with the skills to use OCR in their own projects.

 

This course will be taught by Joy Lan. 

After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows.  

More details about Data Surgeries.  

Those who have registered to take part will receive an email with full details on how to get ready for this course.  

If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.  

 

Level   

This is a beginner-friendly course. No previous knowledge of the topic is required, and the trainer will cover the basics of the method.  

If you are new to R and Python, do not worry: we have created an easy-to-use link that will let you view and experiment with the code for this course. If you want to learn the basics of Python or R, you can take our introductory courses, Getting Started with Python for Research and Getting Started with R for Research.

 

Learning Outcomes  

  • Demonstrate an understanding of how OCR works and its key features
  • Apply ready-to-use OCR tools to extract text from images and PDFs
  • Recognise potential challenges in OCR projects and employ appropriate strategies to address them

     

Skills   

By attending this course, you will become familiar with the following skills:

  • Extracting text from images and scanned PDFs
  • Preparing and cleaning OCR data for analysis
  • Using free, out-of-the-box OCR tools for research
  • Performing basic image/file OCR processing with R or Python

 


Room 4.35, Edinburgh Futures Institute

This room is on Level 4, on the north-east side of the building.

When you enter via the Level 2 East entrance on Middle Meadow Walk, the room will be on the 4th floor straight ahead.

When you enter via the Level 2 North entrance on Lauriston Place underneath the clock tower, the room will be on the 4th floor to your left.

When you enter via the Level 0 South entrance on Porters Walk (opposite Tribe Yoga), the room will be on the 4th floor to your right.
