Innovative knowledge graph tool unlocks new connections previously hidden in data sets

Last month (30 June, 2020) marked the 98th anniversary of the Four Courts fire during the Civil War that destroyed the Irish Public Archives and 700 years of history along with it. The Beyond 2022 team of researchers discussed their progress so far on recreating the lost archive in virtual reality. As this is a truly interdisciplinary project that combines cutting-edge technology with historical research, Dr Christophe Debruyne, ADAPT Centre Computer Scientist, and Dr Lynn Kilgallon from the Department of History at Trinity College Dublin jointly presented their collaborative approach to knowledge graphs, the means by which information about persons and places is structured and linked. Tools like these may help historians analyse the complex networks “hidden” in textual material.

For anyone interested in Irish history, the knowledge graphs will be key not only to navigating these documents, but also to creating links between various records and the individuals, places, and events contained within them. This element of the project has come about through close collaboration between the computer science and humanities disciplines.

A knowledge graph is defined as “a set of interconnected typed entities and their attributes and relationships.” The links are established through the use of vocabularies, which standardise the relationships. For example, the image here shows the person “James Audley” in the centre, and the relationships between him and the other entities are labelled accordingly in the Resource Description Framework (RDF). As he held the position of Justiciar of Ireland, that link is marked as such. These data points can then easily be linked to external sources, as in the area shaded in pink, which points to the man’s biography elsewhere on the internet.
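To make the idea concrete, here is a minimal sketch of how such a graph can be held as subject-predicate-object statements, the basic shape of RDF data. The `http://example.org/` namespace, the predicate names and the biography address are placeholders invented for illustration; they are not the vocabularies the project actually used.

```python
# Hypothetical namespace: a stand-in for real vocabulary URIs.
EX = "http://example.org/"

AUDLEY = EX + "person/james-audley"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (AUDLEY, EX + "vocab/name", "James Audley"),
    (AUDLEY, EX + "vocab/heldOffice", EX + "office/justiciar-of-ireland"),
    # Link to an external description of the same person (placeholder URL).
    (AUDLEY, EX + "vocab/sameAs", EX + "external/biography"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


offices = objects(triples, AUDLEY, EX + "vocab/heldOffice")
print(offices)
```

Because every relationship uses a named predicate from a shared vocabulary, the same lookup works for any person or office added to the graph later.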

The collaboration between these two very different disciplines worked as follows: the computer scientists standardised the rules for structuring the information to ensure it was machine readable. Through consultations and multiple rounds of feedback with the historians, these guidelines were refined and optimised, and then implemented by the researchers, who went through and labelled each record individually. These researchers are historians, not computer experts, and they do not need to learn any extra coding in order to classify the data.

The example document that Lynn and Christophe started with to test this whole construction was the Irish Exchequer Payments, 1270-1446. The process began by hand, with Lynn sifting through over 2,000 records and entering the data into different fields in an Excel spreadsheet. Once the spreadsheet was complete, it went to Christophe, who transformed it into RDF.

Two standard vocabularies were used to give the data semantic meaning, and these were then extended by a third, specific to the Beyond 2022 ontology. The spreadsheet was first checked for errors and then converted to RDF. The RDF was then checked again, both for its internal structure and against pre-existing external factoids for accuracy. Once this was completed, the RDF was stored.
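The first two steps of that pipeline (check the spreadsheet, then convert it) might look roughly like the sketch below. It is an illustration only: the column names, the `http://example.org/` namespace, the sample rows and the specific checks are all assumptions, and the later validation of the RDF against external sources is not covered. Output lines follow the N-Triples layout, one statement per line.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace, not the project's

# Invented sample rows standing in for a spreadsheet exported as CSV.
SAMPLE = """record_id,person,office,year
r1,Person A,Treasurer of Ireland,1300
r2,,Chancellor of Ireland,1310
r3,Person B,Justiciar of Ireland,1500
"""


def slug(text):
    """Turn a label into a URI-friendly fragment."""
    return "-".join(text.lower().split())


def check_row(row):
    """Return a list of problems found in one spreadsheet row."""
    errors = []
    for field in ("record_id", "person", "office", "year"):
        if not (row.get(field) or "").strip():
            errors.append(f"missing {field}")
    year = (row.get("year") or "").strip()
    # The source document covers 1270-1446, so other years are suspect.
    if year and not (year.isdigit() and 1270 <= int(year) <= 1446):
        errors.append(f"year out of range: {year}")
    return errors


def row_to_ntriples(row):
    """Serialise one clean row as N-Triples lines.

    Assumes names contain no quotes or backslashes (no escaping is done).
    """
    name = row["person"].strip()
    person = f"<{EX}person/{slug(name)}>"
    office = f"<{EX}office/{slug(row['office'])}>"
    return [
        f'{person} <{EX}vocab/name> "{name}" .',
        f"{person} <{EX}vocab/heldOffice> {office} .",
    ]


def convert(text):
    """Convert clean rows and collect the errors of rejected ones."""
    lines, rejected = [], {}
    for row in csv.DictReader(io.StringIO(text)):
        errors = check_row(row)
        if errors:
            rejected[row["record_id"]] = errors
        else:
            lines.extend(row_to_ntriples(row))
    return lines, rejected


ntriples, rejected = convert(SAMPLE)
print("\n".join(ntriples))
print(rejected)
```

Rejected rows are reported rather than silently dropped, so they can go back to the historian for correction before the conversion is run again.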

Organising the data from this single source in this way enabled the researchers to demonstrate how knowledge graphs can pull out information and make connections instantly, whereas creating these links manually would be very time consuming and they would often simply be missed. For example, they pulled out all of the people who held the title Treasurer of Ireland and compared them with the records associated with the title Chancellor of Ireland during the medieval period. This produced a list of individuals noted as having held both offices. Finally, they pulled a third title, Justiciar of Ireland, linked it with the previous two, and found a single person who held all three positions.
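The logic of that comparison is a set intersection, sketched below. The names and office assignments are invented placeholders rather than data from the Exchequer records, and a real RDF store would typically answer the same question with a graph query instead of in-memory sets.

```python
from collections import defaultdict

# Invented (person, office) pairs for illustration only.
records = [
    ("Person A", "Treasurer of Ireland"),
    ("Person A", "Chancellor of Ireland"),
    ("Person A", "Justiciar of Ireland"),
    ("Person B", "Treasurer of Ireland"),
    ("Person B", "Chancellor of Ireland"),
    ("Person C", "Justiciar of Ireland"),
]

# Group people by the office they held.
holders = defaultdict(set)
for person, office in records:
    holders[office].add(person)

# People who held both the first and second office.
both = holders["Treasurer of Ireland"] & holders["Chancellor of Ireland"]

# Narrow that list to those who also held the third office.
all_three = both & holders["Justiciar of Ireland"]

print(sorted(both))
print(sorted(all_three))
```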

From this example, it’s clear that this method of organising data will enable researchers to explore records and make connections that previously could not have been discovered. It will allow clusters of related data to emerge and new conclusions to be drawn in fields beyond history, such as business and finance.

This is only the beginning of the Beyond 2022 project. Once this project is complete, the data will be opened to other researchers who will then be able to grow the network and add more data points to the web.