Silent Disco: Introduction to LDA Topic Modelling

A sepia photograph of people working at desks in a large hall with overhead lamps. A large green ampersand featuring an illustration of Ada Lovelace is placed on the left. The logo of the Centre for Data, Culture & Society (DCS) appears in the top right corner.

 

Online 

In this Silent Disco session, we focus on topic modelling using Latent Dirichlet Allocation (LDA), a widely used unsupervised probabilistic model for uncovering thematic patterns in text. LDA assumes that documents are mixtures of latent topics and that each topic is characterised by a distribution over words. This allows the model to assign probabilistic weights indicating how strongly particular words or documents are linked to specific themes. Data-driven methods like LDA can usefully complement other approaches to assigning themes or topics, particularly when a corpus is large or its thematic structure is ambiguous.
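As a rough illustration of the assumption described above, the following minimal Python sketch generates a toy document the way LDA imagines documents arise: it first draws a mixture over topics for the document, then draws each word from one of those topics. The two topics, their word probabilities, and the `alpha` values are hypothetical and chosen only for illustration; this is not the script used in the session, and it shows the generative story rather than how a model is fitted.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet distribution.

    Uses the standard construction: independent Gamma draws, normalised.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate one document under LDA's generative assumptions."""
    # Each document gets its own mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Choose a topic for this word position according to the mixture.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Choose a word from that topic's distribution over words.
        vocab = list(topics[k].keys())
        probs = list(topics[k].values())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Hypothetical toy topics: each is a probability distribution over words.
TOPICS = [
    {"ship": 0.5, "harbour": 0.3, "cargo": 0.2},
    {"poem": 0.4, "verse": 0.4, "rhyme": 0.2},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting an LDA model runs this story in reverse: given only the observed words, it estimates the topic mixtures and the word distributions that plausibly produced them.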

Participants will follow a guided script that walks them through the core steps of LDA-based topic modelling. The script begins with essential text pre-processing routines, moves on to approaches for selecting an appropriate number of topics, and then applies LDA to extract themes from a corpus. It concludes with guidance on how to read, interpret, and critically assess the resulting topics. By the end, participants will have worked through the full workflow independently, at their own pace, while gaining a clearer sense of how LDA differs from other text-analysis techniques and what it can and cannot reveal about textual data.
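To give a flavour of the first step only, here is a minimal Python sketch of text pre-processing that turns raw documents into the word-count table an LDA implementation typically takes as input. The stop-word list, the minimum token length, and the two example sentences are assumptions made for illustration; the session script may use different routines and libraries.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real workflows use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def document_term_matrix(docs):
    """Return a sorted vocabulary and one row of word counts per document."""
    tokenised = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    rows = []
    for doc in tokenised:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


docs = [
    "The ship is in the harbour.",
    "A poem and a verse of the poem.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row of the table describes one document as counts over the shared vocabulary. Word order is discarded, which is the "bag of words" simplification that LDA relies on.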

 

This course will be taught by Aybuke Atalay and Somya Iqbal.

 

After taking part in this event, you may decide that you need some further help in applying what you have learnt to your research. If so, you can book a Data Surgery meeting with one of our training fellows. 

More details about Data Surgeries. 

Those who have registered to take part will receive an email with full details on how to get ready for this course. 

If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event. 

 

Level  

This workshop requires the following prior knowledge:

  • Participants should be familiar with the basics of R or Python and with using them on platforms such as Jupyter Notebook, Google Colab, or the University service Noteable.

  

Learning Outcomes 

  • Understand the core principles of LDA topic modelling
  • Apply the key steps of text pre-processing and implement an LDA model on a given corpus
  • Interpret topic outputs and assess model choices, including the selection of an appropriate number of topics. 

 

Skills  

  • An ability to pre-process data for LDA analysis
  • An ability to apply LDA in a programmatic workflow (in either R or Python)
  • An ability to interpret outputs from LDA-based analyses competently

 

Explore More Training

 

Return to the Training Homepage to see other available events 

 

 

You might be interested in

Foundations of Machine Learning

Modelling Unstructured Data with Bert

Beyond Social Networks: Advanced Uses of Gephi in Humanities Research

Systematic Literature Review with R

Building Personal and Project Websites

Getting Started with Bayesian Statistics

Good Data Visualisation with R

Explainable Machine Learning (XAI)

Getting Started with Text Analysis with Python

A Gentle Introduction to Causal Inference