Silent Disco: Introduction to LDA Topic Modelling

Online
In this Silent Disco session, we focus on topic modelling using Latent Dirichlet Allocation (LDA), a widely used unsupervised probabilistic model for uncovering thematic patterns in text. LDA assumes that documents are mixtures of latent topics and that each topic is characterised by a distribution over words. This allows the model to assign probabilistic weights indicating how strongly particular words or documents are linked to specific themes. Data-driven methods such as LDA can usefully complement other approaches to assigning themes or topics, particularly when the corpus is large or its content is ambiguous.
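To make the "mixture of topics" assumption concrete, the following is a minimal sketch of the generative story LDA assumes, written with the Python standard library only. It is not the session's script: the two topics, their word probabilities, and the Dirichlet parameter `alpha` are hypothetical values chosen for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one toy document following LDA's generative assumptions."""
    # 1. Each document gets its own mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Each word position first picks a topic from that mixture...
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. ...then picks a word from the chosen topic's word distribution.
        vocab = list(topics[k].keys())
        probs = list(topics[k].values())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = [
    {"election": 0.5, "vote": 0.3, "party": 0.2},
    {"match": 0.4, "goal": 0.4, "team": 0.2},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same document
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic mixture:", [round(t, 2) for t in theta])
print("document:", " ".join(words))
```

Fitting an LDA model runs this story in reverse: given only the observed words, it estimates the topic mixtures and the word distributions that plausibly produced them.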
Participants will follow a guided script that walks them through the core steps of LDA-based topic modelling. The session begins with essential text-preprocessing routines, moves on to approaches for selecting an appropriate number of topics, and then introduces the application of LDA to extract themes from a corpus. The script concludes with guidance on how to read, interpret, and critically assess the resulting topics. By the end, participants will have worked through the full workflow independently, at their own pace, while gaining a clearer sense of how LDA differs from other text-analysis techniques and what it can, and cannot, reveal about textual data.
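As a rough illustration of the first step in that workflow, the sketch below shows one common way to pre-process text into the bag-of-words counts that LDA takes as input. It uses only the Python standard library; the stop-word list, the minimum token length, and the example sentences are assumptions made for illustration and are not taken from the session's script. Fitting the model itself would normally be done with a dedicated library, which is not shown here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real workflows usually use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}

def preprocess(text):
    """Lowercase, tokenise on letters, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def document_term_counts(docs):
    """Return a shared vocabulary and one bag-of-words Counter per document."""
    tokenised = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    counts = [Counter(doc) for doc in tokenised]
    return vocab, counts

# Hypothetical two-document corpus.
docs = [
    "The election is on Thursday and the party leaders vote early.",
    "A late goal in the match sent the team to the final.",
]
vocab, counts = document_term_counts(docs)
print(vocab)
print(counts[0])
```

Word order is discarded at this stage, which is why LDA is often described as a bag-of-words model: only how often each term appears in each document is passed on.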
This course will be taught by Aybuke Atalay and Somya Iqbal.
After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows.
More details about Data Surgeries.
Those who have registered to take part will receive an email with full details on how to get ready for this course.
If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.
Level
This workshop requires the following prior knowledge:
- Participants should be familiar with the basics of R or Python, and with using them on platforms such as Jupyter Notebook, Google Colab or the University service Noteable.
Learning Outcomes
- Understand the core principles of LDA topic modelling
- Apply the key steps of text pre-processing and implement an LDA model on a given corpus
- Interpret topic outputs and assess model choices, including the selection of an appropriate number of topics
Skills
- Pre-processing data for LDA analysis
- Applying LDA in a programmatic workflow (in either R or Python)
- Interpreting outputs from LDA-based analyses
Explore More Training
- Comparing Sentiment Analysis Models in R
- Digital Method of the Month: Text Analysis
- Getting Started with Text Analysis with Python
- Modelling Unstructured Data with BERT
- Silent Disco: Understanding and Creating Word Embeddings