Domain-specific software
We are working on CLaaS, CLARIAH as a Service. It will provide Domain Services, like Data & Models (Audio, Video, Text, Images, Structured Data), Transformation (Workflow, Provenance, Curation and Evaluation) and Interaction (Workspace, Execution, CMS, UI & UX).
The domain-specific software we develop is specialised enough to be useful for specific groups of researchers and generic enough to support a viable number of users. Here are some examples.
OCR and HTR software
We develop and customise software to improve the OCR output for historic newspapers. The typefaces in those newspapers may look like calligraphy, the ink may have faded, the column-style layout may pose problems, and advertisements may be misidentified as articles because advertisements in those days had no illustrations. The same goes for medieval manuscripts and early modern documents: we adjust Handwritten Text Recognition software to recognise each character despite the unique detailing every individual clerk adds.
Extracting and linking entities
We link entities end-to-end. First we extract entities using customised NLP tools such as Named Entity Recognition. Then, to link these named entities correctly, we develop tools for name disambiguation and word-sense disambiguation, in close cooperation with our linguists at the Meertens Institute and digital humanities researchers at DHLab.
Geo-toolkit and fuzzy matching
We also develop a geo-toolkit for all disciplines of history, at every geographic level. Last but not least, we apply fuzzy matching when linking our data. This allows for matches that are less than 100% perfect when finding correspondences between segments of a text and entries in a database.
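To give an idea of what fuzzy matching involves, here is a minimal sketch in Python. It is an illustration only, not our actual tooling: the function name `best_match`, the 0.8 threshold and the list of place names are assumptions made for this example, and the similarity measure is the one provided by Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def best_match(segment, entries, threshold=0.8):
    """Return (entry, score) for the most similar entry, or None.

    `threshold` is an illustrative cut-off: candidates scoring below it
    are treated as "no match" rather than linked.
    """
    best_entry, best_score = None, 0.0
    for entry in entries:
        # ratio() yields a similarity between 0.0 (nothing shared) and 1.0 (identical)
        score = SequenceMatcher(None, segment.lower(), entry.lower()).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_score >= threshold:
        return best_entry, best_score
    return None


# Hypothetical database entries and a misspelled text segment
place_names = ["Amsterdam", "Rotterdam", "Leeuwarden"]
result = best_match("Amsterdm", place_names)
no_result = best_match("Xyz", place_names)
```

In this sketch the misspelled segment "Amsterdm" is still linked to the entry "Amsterdam", while "Xyz" is rejected because nothing in the list is similar enough. In practice the threshold has to be tuned per dataset: set it too low and unrelated names get linked, too high and genuine spelling variants are missed.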