ProQuest Newspaper Data now available for Text Mining Research

The Centre for Data, Culture & Society is delighted to have been able to purchase a number of large ProQuest newspaper datasets to enable machine processing by researchers at the University of Edinburgh.


In recent decades, many large newspaper collections around the world have been digitised. They are fantastic resources for scholars across disciplines, revealing how events were perceived as they unfolded and providing multiple viewpoints on an issue. They also allow researchers to trace the historical development of a subject through time and across geographical areas. While University of Edinburgh researchers have been able to access newspaper databases and read articles via publisher web interfaces, building up a collection of the data behind those interfaces will allow them to work at scale, querying multiple publications and analysing tens of thousands of articles for a single project.

The Centre for Data, Culture and Society can currently provide access to the following titles, and welcomes enquiries from any University of Edinburgh researcher who is interested in working with these collections.

From ProQuest:

  • The Scotsman (1817-1933)
  • The Guardian/Observer (1821-1912)
  • The Washington Post (1877-1938) 
  • The New York Times (1851-1938) 

In addition we can provide access to:

  • British Library Newspapers

1TB of digitised British newspapers from the 18th Century to the early 20th Century

  • Papers Past: New Zealand and Pacific Newspapers 

Over 5 million pages of New Zealand and Pacific newspapers from the 19th and 20th Centuries

  • National Library of Australia Newspapers

Article text from a wide range of Australian papers, mostly from the 20th Century but some from as early as the 1830s

  • Chronicling North America 

Newspapers printed in the United States, primarily from 1836 to 1922, from the Library of Congress and partner organisations

The ProQuest purchase was supported by the Data Driven Innovation (DDI) Initiative and was made as part of a project funded by the DDI's 'Building Back Better' open funding programme, helping to transform the City region into the data capital of Europe. The 'Building Back Better' programme was supported by Scottish Funding Council Covid-19 Recovery funding to the University of Edinburgh.

Find out about other CDCS Datasets