Is Open Data Really Open? Web-scraping the Hansard for 'UK Parliament Discourse on Immigration (2007-2024)'

Over the past two decades, a series of sociopolitical events has led to the increased displacement of people, contributing to a rise in political rhetoric about immigration. This can be seen in the recent growth of right-wing and populist parties, which often characterise immigrants and asylum seekers as a threat to society. In his project 'UK Parliament Discourse on Immigration (2007-2024) in a Time of Populist and Post-Truth Politics', Dr Kenneth Fordyce, Co-Head of the Institute for Language Education and Lecturer in Language Education at Moray House, has been examining shifts in UK discourse on immigration and asylum in this context.

Dr Fordyce has been working with the CDCS team to collect a dataset of full-text parliamentary transcripts from House of Commons debates on immigration. Although these records are considered public in the UK, the Hansard database where they are stored did not provide a clear pathway for downloading full-text transcripts. We therefore developed a web-scraping pipeline that successfully acquired the data. We then extracted relevant metadata for each debate, including each speaker's identity and political party, and stored it in an accessible and portable spreadsheet format. Dr Fordyce will analyse the dataset using text analysis software as part of a pilot study for a planned funding bid exploring political discourse about immigration and asylum across a range of European contexts.
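To give a rough idea of what the metadata-extraction step can look like, here is a minimal sketch using only the Python standard library. It is not the CDCS pipeline: it assumes a hypothetical page layout in which each contribution is a `<div class="contribution">` carrying `data-speaker` and `data-party` attributes, and the names `ContributionParser` and `contributions_to_csv` are invented for illustration. Real Hansard pages are structured differently, and fetching them is a separate step not shown here.

```python
import csv
import io
from html.parser import HTMLParser

# HYPOTHETICAL markup for illustration only; real Hansard pages differ,
# so the tag/attribute checks below would need adapting.
SAMPLE_HTML = """
<div class="contribution" data-speaker="Member A" data-party="Party X">
  <p>Text of the first speech.</p>
</div>
<div class="contribution" data-speaker="Member B" data-party="Party Y">
  <p>Text of the reply.</p>
</div>
"""


class ContributionParser(HTMLParser):
    """Collect one {speaker, party, text} row per contribution div."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # row being built, or None outside a contribution
        self._depth = 0       # nested divs inside the current contribution

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1  # a div nested inside a contribution
            return
        attrs = dict(attrs)
        if "contribution" in (attrs.get("class") or "").split():
            self._current = {
                "speaker": attrs.get("data-speaker", ""),
                "party": attrs.get("data-party", ""),
                "text": [],
            }
            self._depth = 0

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
        else:
            # Closing the contribution div: flatten text and store the row.
            self._current["text"] = " ".join(self._current["text"])
            self.rows.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current["text"].append(data.strip())


def contributions_to_csv(html: str) -> str:
    """Parse contributions from HTML and return them as CSV text."""
    parser = ContributionParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["speaker", "party", "text"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(contributions_to_csv(SAMPLE_HTML))
```

CSV is used here simply because it opens in any spreadsheet application and is easy to load into text analysis tools; the format the team actually used is described above only as a portable spreadsheet.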

"Working with the Centre for Data, Culture & Society has led to a major breakthrough in my research on political discourse. [...] I hope very much that I will be able to continue to work with Lucia, Jessica and CDCS in the future to build on this first stage and potentially develop tools for analysing political discourse which have wider applicability." - Dr Ken Fordyce

The team will present a paper based on this work, ‘Is Open Data Really Open? The Hansard Parliamentary Data Case Study’, at the Alliance of Digital Humanities Organizations DH2025 Conference in Portugal this summer.