Explore Unstructured Data: The Secret World of XML

29 Nov 2024, 10:00 – 12:00


In Person

eXtensible Markup Language (XML) is one of the secret ingredients of modern computing. Despite being little known, it underpins all sorts of critical digital infrastructure. A Word document or Excel spreadsheet is in fact constructed from XML data. SharePoint is, at its core, an XML data platform. All sorts of mainstream applications use XML in one form or another.

In Digital Scholarship, XML is used in a vast range of fields, for example text transcription (TEI-XML), mathematical notation (MathML), and metadata in various guises, such as bibliographic and archival catalogues (Dublin Core, Bib-XML, MODS/METS, etc.). It is, in short, an essential bit of digital apparatus with a wide range of roles and potential uses.

Understand XML and you will gain access to a niche but very potent digital framework.

The session will provide an overview of XML data structures and the background to the technology. It will also offer a brief introduction to constructing XML data sets.
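To give a flavour of what constructing an XML data set can look like, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module. The `catalogue`, `item`, `title` and `year` element names and the sample records are hypothetical, chosen purely for illustration; they are not taken from the workshop materials.

```python
import xml.etree.ElementTree as ET


def build_catalogue(records):
    """Build a <catalogue> element from (identifier, title, year) tuples.

    The element and attribute names are illustrative assumptions.
    """
    root = ET.Element("catalogue")
    for identifier, title, year in records:
        # Each record becomes an <item> with an id attribute and two children.
        item = ET.SubElement(root, "item", attrib={"id": identifier})
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "year").text = str(year)
    return root


# Hypothetical sample records.
records = [
    ("ms-001", "Letter to a friend", 1820),
    ("ms-002", "Field notebook", 1845),
]

catalogue = build_catalogue(records)

# Serialise the element tree to a string of XML markup.
xml_text = ET.tostring(catalogue, encoding="unicode")
print(xml_text)
```

Each record becomes a nested element, which shows how XML holds hierarchical data without forcing it into rows and columns.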

We will also introduce XPath, the language used to navigate the tree structure of XML, and offer a taster of one of the technologies that can make use of XML data: XSLT (eXtensible Stylesheet Language Transformations).
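As a rough illustration of how XPath navigates an XML document, the sketch below runs three path expressions against a small, made-up catalogue. It assumes Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath; full XPath and XSLT generally need a third-party library such as lxml. The sample data and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document used only for this illustration.
SAMPLE = """<catalogue>
  <item id="ms-001"><title>Letter to a friend</title><year>1820</year></item>
  <item id="ms-002"><title>Field notebook</title><year>1845</year></item>
</catalogue>"""

root = ET.fromstring(SAMPLE)

# ".//title" selects every <title> element at any depth below the root.
all_titles = [t.text for t in root.findall(".//title")]

# "[@id='ms-002']" is a predicate: keep only items with that attribute value.
notebook = root.find("./item[@id='ms-002']/title").text

# "[year='1820']" keeps items whose <year> child has exactly that text.
early_ids = [i.get("id") for i in root.findall("./item[year='1820']")]

print(all_titles)
print(notebook)
print(early_ids)
```

A path expression describes where in the document to look, and predicates in square brackets narrow the selection by attribute or child value.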

This is an intermediate-level workshop. In order to undertake this session, it is recommended that you have a basic understanding of Python and the Python command line. You can build up familiarity with Python by attending our Introduction to Programming with Python course.

Some familiarity with HTML would also be useful, as it shares many principles with XML. Some aspects of working with HTML will be covered in the course Build Your Personal or Project Website with GitHub Pages. If you want to familiarise yourself with HTML, you can have a look at the W3Schools tutorials.

XML is useful for anyone interested in the following:

Text analysis, text mining, text production (transcriptions, scholarly editions, including non-extant languages).
Non-relational data structures.
Archival or collection metadata.
Semantic web technologies.
Modern computing infrastructure.

This workshop will be taught by Ed MacKenzie.

Those who have registered to take part will receive an email with full details on how to get ready for this workshop.

If you’re new to this training event format, or to CDCS training events in general, read more on what to expect from CDCS training. Here you will also find details of our cancellation and no-show policy, which applies to this event.

If you are interested in other training on working with unstructured data, return to the Training Homepage to see other available events.

Room 4.35, Edinburgh Futures Institute

This room is on Level 4, in the North East side of the building.

When you enter via the level 2 East entrance on Middle Meadow Walk, the room will be on the 4th floor straight ahead.

When you enter via the level 2 North entrance on Lauriston Place underneath the clock tower, the room will be on the 4th floor to your left.

When you enter via the level 0 South entrance on Porters Walk (opposite Tribe Yoga), the room will be on the 4th floor to your right.