Finding the right gear

6 November 2020 Blogpost

Jauco Noordzij

Scholars and software engineers have long found it difficult to collaborate effectively. If this were merely a matter of different starting positions, they would probably have bridged the gap by now. Instead, collaboration proves stubbornly hard to achieve even when those involved on either side take the time to come together and make an honest attempt at learning to speak each other’s language.

To find the root of the problem obstructing collaboration between science and software engineering, it might be productive to take a step back and take a sober look at what, exactly, it is that scientists and engineers do, and how they do it. As for the ‘what’, it would seem that that is the same for both scholars and engineers: (1) creating data, (2) interpreting data and translating it into information and knowledge, and (3) using that acquired knowledge to transform the data in order to make it easier to process for those to whom we hand it over.

The ‘how’ is where it gets interesting. Scholars try to understand the data, and to use that understanding to process it. They will do so at different speeds. They might use an algorithm that extracts some generic statistics in order to get a ‘feel’ for the data. Or they might execute a very specific transformation, such as normalising 16th-century spelling variations. The specific approach chosen will depend on the data as well as on the scholars’ current understanding of the purpose for which the data is being used. There is no single predefined approach.

The task for the engineers, then, is not to interpret the data, but rather to create the tools that allow the scholars to work with the data. Moreover, the engineers should make it easy for the scholars to adapt the tools to the different speeds of working with the data. Different constraints will be placed upon a specific tool, depending on how it is used. A tool for gathering statistics might require less configuration than a more specialised tool (such as one for normalising spelling), because its objective is to produce standardised output for comparisons; at the same time, its use on a wide variety of data may require a high degree of stability. Conversely, the more specialised tool may be used only once, for one specific dataset, and may therefore require a higher degree of configurability but can perhaps be less robust.

Unfortunately, creating software that can easily ‘switch gears’ has proven to be a fundamental problem in the practice of software engineering. Theoretically, the problem has been solved since the 1970s, when computer scientists actively researched the issue and arrived at the conclusion that software should be a collection of interconnecting components and that a tool is constructed by selecting the necessary components. In practical terms, however, the challenge faced by software engineers was not resolved by this insight. Defining what good software looks like is a far cry from creating it – just as the mere ability to recognise good 17th-century paintings, or even to read analyses of them, does not instantly make one a good artist.
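To make the idea of ‘a collection of interconnecting components’ a little more tangible, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any existing tool: the names `build_tool`, `tokenize`, `normalise_spelling` and `word_frequencies` are hypothetical, and the spelling table is a made-up example.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable, Dict, List

# A component is any function that takes data and returns transformed data.
Component = Callable[[Any], Any]


def build_tool(*components: Component) -> Component:
    """Construct a tool by chaining the selected components, left to right."""
    def tool(data: Any) -> Any:
        return reduce(lambda value, step: step(value), components, data)
    return tool


def tokenize(text: str) -> List[str]:
    """Generic component: lower-case the text and split it on whitespace."""
    return text.lower().split()


def normalise_spelling(variants: Dict[str, str]) -> Component:
    """Specialised component: configured with a dataset-specific table
    that maps spelling variants to a normalised form (hypothetical data)."""
    def step(tokens: List[str]) -> List[str]:
        return [variants.get(token, token) for token in tokens]
    return step


def word_frequencies(tokens: List[str]) -> Counter:
    """Generic component: count how often each token occurs."""
    return Counter(tokens)


# One 'gear': a generic statistics tool that needs no configuration.
survey = build_tool(tokenize, word_frequencies)

# Another 'gear': the same components plus a configurable, specialised step.
normalised_survey = build_tool(
    tokenize,
    normalise_spelling({"ende": "en"}),
    word_frequencies,
)
```

Calling `survey("Ende hy ende sy")` counts the tokens as they appear, while `normalised_survey` first maps ‘ende’ to ‘en’ and then counts. The point of the sketch is that switching gears means selecting a different set of components, not writing a new tool from scratch. As the essay goes on to argue, doing this well in real software is much harder than the sketch suggests.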

The problem, therefore, is not that there is a ‘gap’ between science and humanities, or that the two sides speak different languages, or even that they might disagree on what the problem is. The real issue bedevilling collaboration between scholars and software engineers is the complexity involved in the specifics of their collaboration. Depending on the type of research and the phase of the research process, different qualities may be required from the software. Sometimes it may have to be flexible, and sometimes it may have to be robust. The challenge is to create software that has the right qualities at the right time, and it is no easy challenge.

This is the problem that software engineers try to solve in their attempts to help scholars. When software engineers say they are creating a ‘digital infrastructure’, they are not referring to roads or bridges. The term ‘infrastructure’ literally means ‘that which is below the structure’ – in other words, that which is below the arrangement of and relations between the parts or elements of a complex entity. Software engineers do not merely aim to create a well-stocked toolbox. They also seek to provide a coherent arrangement of components, so that the software can ‘switch gears’ in tandem with the scholar.

Jauco Noordzij (MSc) studied Information Science at Utrecht University. He graduated with a thesis on usability engineering and information visualisation. Since 2007 he has worked on contract as a software developer and technical project lead at startups such as Ampelmann and multinationals such as CSM and IKEA. Since 2015 he has been employed by Huygens ING as the architect and lead engineer for the Timbuctoo team, and since April 2018 he has been head of the Product Development department at the Humanities Cluster, overseeing the work of about 20 software developers. Jauco is also work package leader on both the Dutch national infrastructure project Golden Agents and the Dutch CLARIAH project.

On Wednesday 12 December 2018, the KNAW Humanities Cluster presented HuC LIVE! At this event, the departments DHLab and Digital Infrastructure presented their innovative research and infrastructure. The main theme of the afternoon was bridging the gulf between science and humanities. In this series of blogs, our guest speakers talk about why they bring science and humanities together.

