Domain specific software
We are working on CLaaS, CLARIAH as a Service. It will provide Domain Services, like Data & Models (Audio, Video, Text, Images, Structured Data), Transformation (Workflow, Provenance, Curation and Evaluation) and Interaction (Workspace, Execution, CMS, UI & UX).
The domain-specific software we develop is specialised enough to be useful for specific groups of researchers and generic enough to support a viable number of users. Here are some examples.
OCR and HTR software
We develop and customise software to improve the OCR output for historical newspapers. These newspapers are hard to process: the typeface may look like calligraphy, the ink may have faded, the column layout may pose problems, and advertisements may be mistaken for articles because advertisements of that period carried no illustrations. Medieval manuscripts and early modern documents pose similar problems. We adjust Handwritten Text Recognition (HTR) software to recognise each character despite the unique details every individual clerk adds.
Extracting and linking entities
We link entities end-to-end. First we extract entities using customised NLP tools such as Named Entity Recognition. Then, to link these named entities correctly, we develop tools for name disambiguation and word-sense disambiguation in close cooperation with our linguists at the Meertens Institute and digital humanities researchers at DHLab.
Geo-toolkit and fuzzy matching
We also develop a geo-toolkit for all disciplines of history at every geographic level. Finally, we apply fuzzy matching when linking our data: in finding correspondences between segments of a text and entries in a database, it allows matches that are less than 100% perfect.
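To illustrate the idea of fuzzy matching, here is a minimal sketch in Python. It is not our production tooling: the function names, the example place names and the 0.8 threshold are assumptions chosen for illustration, and the similarity measure is the one offered by Python's standard difflib module.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(segment: str, entries: list[str], threshold: float = 0.8):
    """Return (entry, score) for the closest database entry.

    Returns None when there are no entries or when the best score
    stays below the (hypothetical) threshold.
    """
    if not entries:
        return None
    scored = [(entry, similarity(segment, entry)) for entry in entries]
    entry, score = max(scored, key=lambda pair: pair[1])
    return (entry, score) if score >= threshold else None


# Hypothetical database entries and a historical spelling variant.
place_names = ["Amsterdam", "Rotterdam", "Leeuwarden"]
match = best_match("Amsteldam", place_names)
no_match = best_match("Utrecht", place_names)
```

In this sketch the spelling variant "Amsteldam" is linked to "Amsterdam" even though the strings differ, while "Utrecht" finds no entry above the threshold. In practice the threshold would have to be tuned per dataset to balance missed links against false ones.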
E-data & Research – June 2021: Magazine about data and research in the humanities and social sciences
Menno Rasch new director of Digital Infrastructure
We are pleased to announce that on June 1st, 2021, Menno Rasch started at the KNAW Humanities Cluster as the new director of Digital Infrastructure.