Silent Disco: Introduction to LDA Topic Modelling

17 Mar 2026, 14:00 – 16:00

Book now: https://www.events.ed.ac.uk/index.cfm?event=book&scheduleID=82675

Online

In this Silent Disco session, we focus on topic modelling with Latent Dirichlet Allocation (LDA), a widely used unsupervised probabilistic model for uncovering thematic patterns in text. LDA assumes that each document is a mixture of latent topics and that each topic is characterised by a probability distribution over words. The fitted model therefore assigns probabilistic weights that indicate how strongly particular words are associated with each topic and how strongly each topic is present in each document. Data-driven methods such as LDA can usefully complement other approaches to assigning themes or topics, particularly when a corpus is too large to read closely or when its themes are ambiguous.
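
The generative assumption behind LDA can be illustrated with a small toy sketch. This is not the session's script: the vocabulary, the two topics, and the function names `sample_dirichlet` and `generate_document` are all hypothetical, chosen only to show a document being drawn as a mixture of topics, with each topic being a distribution over words. It uses only the Python standard library.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, doc_topic_prior, length, rng):
    """Generate one toy document following LDA's generative story."""
    # Each document gets its own mixture over topics.
    theta = sample_dirichlet(doc_topic_prior, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        k = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[k])[0]
        words.append(w)
    return theta, words


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical vocabulary and two hypothetical topics; each topic is a
# probability distribution over the vocabulary (each row sums to 1).
vocab = ["ship", "harbour", "sail", "vote", "election", "party"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "seafaring" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "politics" topic
]

theta, words = generate_document(topic_word, vocab, [0.5, 0.5], 8, rng)
print("topic mixture:", [round(p, 2) for p in theta])
print("document:", words)
```

Fitting an LDA model runs this story in reverse: given only the observed words, it estimates the topic mixtures and the topic-word distributions that plausibly produced them.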

Participants will follow a guided script that walks them through the core steps of LDA-based topic modelling. The script begins with essential text pre-processing routines, moves on to approaches for selecting an appropriate number of topics, and then applies LDA to extract themes from a corpus. It concludes with guidance on how to read, interpret, and critically assess the resulting topics. By the end, participants will have worked through the full workflow independently and at their own pace, while gaining a clearer sense of how LDA differs from other text-analysis techniques and of what it can, and cannot, reveal about textual data.
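
As a rough idea of what the pre-processing step can look like, here is a minimal sketch that lowercases and tokenises text, removes stop words, and builds the document-term count matrix that LDA implementations typically take as input. It is an assumption-laden illustration, not the session's script: the stop-word list, the example documents, and the helper names `preprocess` and `to_counts` are hypothetical, and real workflows usually rely on a dedicated text-processing library.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


# Hypothetical three-document corpus.
docs = [
    "The ships sail in the harbour.",
    "The party won the election on a close vote.",
    "Ships and sailors returned to the harbour.",
]

tokenised = [preprocess(d) for d in docs]

# Build a fixed vocabulary and map each word to a column index.
vocab = sorted({t for doc in tokenised for t in doc})
index = {word: i for i, word in enumerate(vocab)}


def to_counts(tokens):
    """Turn one tokenised document into a row of word counts."""
    row = [0] * len(vocab)
    for token, n in Counter(tokens).items():
        row[index[token]] = n
    return row


# Document-term matrix: one row per document, one column per vocabulary word.
dtm = [to_counts(doc) for doc in tokenised]

for tokens, row in zip(tokenised, dtm):
    print(tokens, row)
```

Choices made here, such as which words count as stop words or how short a token may be, change the vocabulary and therefore the topics a model can find, which is one reason the interpretation step matters.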

This course will be taught by Aybuke Atalay and Somya Iqbal.

After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows.

More details about Data Surgeries.

Those who have registered to take part will receive an email with full details on how to get ready for this course.

If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.

Level

This workshop requires the following pre-knowledge:

Participants should be familiar with the basics of R or Python and with running code on platforms such as Jupyter Notebook, Google Colab, or the University's Noteable service.

Learning Outcomes

Understand the core principles of LDA topic modelling
Apply the key steps of text pre-processing and implement an LDA model on a given corpus
Interpret topic outputs and assess model choices, including the selection of an appropriate number of topics

Skills

An ability to pre-process data for LDA analysis
Application of LDA in a programmatic workflow (in either R or Python)
Competent interpretation of outputs from LDA-based analyses
