Scholarly Web Annotation

Marijn Koolen


Annotation is a fundamental activity in humanities research. Across the various disciplines in the humanities, scholars use annotations to reference passages of books in their own articles, add tags or codes to correspondences of historical persons to trace topical discussions in social networks, transcribe relevant parts of a radio interview, or link objects in film posters to scenes in their corresponding films.

With the turn to digital research, there is a great need for digital annotation tools that help scholars make, organise and analyse annotations for their own research, as well as share and reuse them in collaborative projects. This need is not limited to humanities research. In areas such as psychology and the social and cognitive sciences, extensive use is made of annotation of human expressions in digital materials. Through digital humanities projects, humanities scholars are also increasingly collaborating with these research areas to study how people experience films and novels, how language evolves across different communities, and how politicians use historical events to gather support or dismiss an opponent’s argument.

At the Digital Infrastructure department of the Humanities Cluster, we are developing a highly flexible, open source annotation tool that is easily incorporated in websites with relevant content for research, can make many different types of annotations on textual, image and audiovisual objects, and allows researchers to store, query, share and reuse these annotations across the web.
Many resources are available online and accessible via a web browser, which means annotation support should work well in the browser. However, there are many challenges that current annotation tools cannot handle, or can handle only partially. Browser-based annotation tools use the location and layout of a web page to identify which part of which web page has been selected and annotated by a researcher. This becomes a problem when the layout of the page changes, when the page is moved to a different location, or when the same digital object is also displayed on a different page elsewhere.

Part of our solution is to encode semantic information about the digital object in the underlying structure of the web page, together with persistent identifiers linked to the objects. Researchers see a human-readable presentation of the digital object, while the annotation tool sees the content of the object as well as the semantic structure of its parts. The tool can therefore point an annotation to a persistent identifier of the object, regardless of where and how that object is presented on the web.
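
To make the idea concrete, here is a minimal sketch in Python. It is not the tool's actual data model: the `Annotation` class, the `urn:example:` identifier, the two page URLs and the `annotated_text` helper are all hypothetical, chosen only to illustrate how an annotation that targets a persistent identifier, rather than a page location, can be resolved on any page that displays the same object.

```python
from dataclasses import dataclass
from typing import Optional

LETTER_ID = "urn:example:letter:42"  # hypothetical persistent identifier
LETTER_TEXT = "Dear friend, I write from Amsterdam."


@dataclass(frozen=True)
class Annotation:
    target_id: str  # persistent identifier of the annotated object
    start: int      # character offsets within the object's own text,
    end: int        # not within the layout of any particular page
    body: str       # the scholar's note or tag


# Two hypothetical pages that present the same object at different URLs.
# Each page maps the identifiers of the objects it displays to their content.
PAGES = {
    "https://example.org/letters/42": {LETTER_ID: LETTER_TEXT},
    "https://example.org/exhibit?item=7": {LETTER_ID: LETTER_TEXT},
}


def annotated_text(page_url: str, annotation: Annotation) -> Optional[str]:
    """Return the annotated passage if the page displays the target object."""
    text = PAGES[page_url].get(annotation.target_id)
    if text is None:
        return None  # this page does not contain the annotated object
    return text[annotation.start:annotation.end]


# Annotate the word "Amsterdam" in the letter, independent of any page.
start = LETTER_TEXT.index("Amsterdam")
note = Annotation(LETTER_ID, start, start + len("Amsterdam"), "place of writing")

for url in PAGES:
    print(url, "->", annotated_text(url, note))
```

Because the annotation stores the object's identifier and offsets into the object's own content, the same annotation resolves to the same passage on both pages. An annotation keyed on a page URL and layout position would break as soon as the page moved or its layout changed.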

This greatly increases the opportunities for sharing annotations within and across projects, and for allowing them to be reused by others. It can also benefit memory and heritage institutions and other content providers: beyond making it possible for scholars to annotate their collections, they may also want to use those annotations to enrich their collections with additional knowledge and interpretations.

Moreover, as the manual annotations have persistent links to digital resources, they can serve as training data for machine learning algorithms to scale up annotation processes, and to identify new patterns and relations in large collections of research materials. This makes our approach to annotation also useful for data scientists and artificial intelligence researchers, who often need human judgements as gold standards to test and evaluate new algorithms.

The needs and uses of digital annotation show the many ties between humanities research and other research areas. Our aim is to develop the technology to strengthen these ties and to lower the threshold for exploring the possibilities of multi-disciplinary research collaborations.


Marijn Koolen is a researcher and developer at the Department of Digital Infrastructure of the KNAW Humanities Cluster. He has a background in Information Retrieval and Artificial Intelligence and got his PhD at the University of Amsterdam for work on using hyperlink structure to improve web search engines. He was assistant professor of information science and digital humanities at the University of Amsterdam and worked as R&D engineer on the CLARIAH digital research infrastructure for media scholars at the Netherlands Institute for Sound and Vision. He currently works on infrastructure for scholarly annotations, transparent recommendation systems and digital data and tool criticism for digital humanities research methodology. He is also involved in digital literary studies projects on the impact that reading fiction has on readers and the prediction of bestsellers.


On Wednesday, December 12, 2018, the KNAW Humanities Cluster presented HuC LIVE!. At this event, the departments DHLab and Digital Infrastructure presented their innovative research and infrastructure. The main theme of the afternoon was bridging the gulf between science and humanities. In this series of blogs, our guest speakers talk about why they bring science and humanities together.

Find more information here.
