“I’m afraid I can do that”
Antal van den Bosch
How big is the revolution brought about by the digital computer? It is impossible to tell, and we ain’t seen nothing yet. Humanity is slowly unwrapping the gift of the universal information-processing machine. Every step in this unwrapping process has been shocking, awe-inspiring, and disruptive.
If the humanities were ever needed in this process, it is now. The most advanced front of development in computer science, artificial intelligence, needs counselling. After decades of technological positivism and mass adoption of new computing-based affordances, the next step for all of this technology is to become culturally aware. Artificial intelligence needs ethics and an awareness of gender issues, diversity and inclusiveness, and it needs to be multilingual.
We need culturally aware AI, or cultural AI for short, because we need to understand how people can be influenced by the way their social media timelines present information. We need cultural AI to understand when language is perceived as toxic. We need it to understand how a story or a meme, fake or real, can go viral in the internet ecology. When a computer is able to profile people by processing the texts they read and write, and to identify with some degree of certainty their gender, age, and personality traits, we need this computer to say, “I’m afraid I can do that – please check that what you want me to do is ethical and does not violate basic principles”.
At the KNAW Humanities Cluster we develop cultural AI methods and perform cultural AI experiments. To get a rough idea of what we do, imagine that we have at our disposal an immense, seemingly endless sandpit, one that no human means would ever allow us to cross. There is no limit to the number of dimensions we can work in, and in general we can do superhuman things in this space. We can work with any piece of digital data that we have gathered or are able to simulate, and we can perform basic manipulations such as counting and changing pieces of data billions of times per second per processing unit. We can also have many processing units working in parallel. What this sandpit allows us to do is run simulations to test hypotheses, and discover knowledge in large amounts of data, in a fraction of the time.
In their book The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft researchers Hey, Tansley & Tolle conjecture that the age of the computer has given us, on top of the two basic scientific paradigms of theorising and experimentation, the third paradigm of computational simulation, and the fourth paradigm of data-driven scientific discovery. While the book aims to herald the ‘big data’ age, I would like to stress that cultural AI as we see it is firmly rooted in combining theory with experimentation, and not in blind reliance on seemingly unbiased data-driven discoveries. What we do is always linked to thoughts and theories from our underlying fields of humanities expertise. It is computational and superhuman, nonetheless, and can therefore be placed in the third scientific paradigm.
For instance, some of our recent work with the University of Antwerp has focused on using the computational sandpit to generate synthetic literature (science fiction) and poetry (hip hop). As Mike Kestemont and Folgert Karsdorp argued in their keynote address earlier this year at the Digital Humanities Benelux conference, the common definition of the humanities as the study of products of the human mind may very well be turned around: why not model the mind instead, and have these models of the mind synthesize products such as a science fiction novel or hip hop lyrics? Modelling the mind is something we share with the field of cognitive science, and it brings us close to neuroscience as well. These are inviting bridges to cross, and crossing them has become entirely possible.
Computational modelling also allows us to model phenomena that we humans have a hard time imagining or computing, such as randomness. A valid question in trying to understand the outcome of a cultural evolution process is how much of that outcome is caused by randomness. Running thousands or millions of simulations of cultural evolution processes, fed by artificial or real data, with or without specific non-random drivers, may show that what we perceive as the outcome of a cultural process with hidden but distinct causes may partly or wholly be explained by randomness. Being able to do this has far-reaching myth-busting potential. How will we react when the computer says “I’m afraid it was all a coincidence”?
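To give a feel for such simulations, here is a minimal sketch, not a model used at the Humanities Cluster. It assumes a simple random-copying model of cultural transmission: in each generation every agent copies the variant of a randomly chosen agent from the previous generation. An optional, hypothetical `bias` parameter plays the role of a non-random driver by pushing agents toward variant 0. Running many simulations with `bias=0.0` shows how often a single variant comes to dominate by chance alone; all function names and parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_copying(n_agents, n_variants, n_generations, bias, rng):
    """Simulate cultural transmission by copying.

    Each generation, every agent adopts variant 0 with probability `bias`
    (a hypothetical non-random driver) or otherwise copies the variant of a
    randomly chosen agent from the previous generation (pure chance).
    With bias=0.0 this is a neutral random-copying model.
    """
    # Start with all variants equally common.
    population = [i % n_variants for i in range(n_agents)]
    for _ in range(n_generations):
        # The comprehension reads the previous generation before rebinding.
        population = [
            0 if rng.random() < bias else rng.choice(population)
            for _ in range(n_agents)
        ]
    return population


def top_share(population):
    """Fraction of the population holding the most common variant."""
    return max(Counter(population).values()) / len(population)


def dominance_rate(n_runs, threshold, seed=1, **params):
    """Share of simulation runs in which one variant reaches `threshold`."""
    rng = random.Random(seed)
    hits = sum(
        top_share(simulate_copying(rng=rng, **params)) >= threshold
        for _ in range(n_runs)
    )
    return hits / n_runs


if __name__ == "__main__":
    settings = dict(n_agents=100, n_variants=5, n_generations=100)
    for bias in (0.0, 0.05):
        rate = dominance_rate(n_runs=100, threshold=0.8, bias=bias, **settings)
        print(f"bias={bias}: one variant reached 80% in {rate:.0%} of runs")
```

Comparing the dominance rate with and without a driver is the kind of contrast the paragraph describes: if chance alone regularly produces a dominant variant, a dominant variant in real data is weak evidence for a hidden cause.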
Extra reading: Reading beyond the female
The literary work of women is still considered to be less ‘literary’ than that written by men. Corina Koolen, post-doc researcher at the Huygens ING, studied the roots of this difference. She defended her thesis in May 2018 at the University of Amsterdam.
“Sales and library lending statistics for original Dutch literary novels written by female authors are low”, says Koolen. “Men and translated female authors sell more.” A large national survey among readers, the Nationale Lezersonderzoek (National Reader Survey), revealed that both male and female readers rate the work of female authors as being of lower literary quality. “Women are even more critical about female authors than men.”
It is difficult, however, to pinpoint the root cause of this pattern. All parties involved, from readers to juries of literary prizes, seem to contribute to a vicious circle. To investigate one important possibility, namely whether the root cause lies in the text itself, Koolen used computational analyses to determine whether the gender of the author can be linked to aspects of the text. Is the style of female authors intrinsically different from that of male authors? Her research shows it is not, Koolen says.
While the computer analyses did point to some differences between female and male authors, the style of a text was found to be determined much more by genre, dialogue, and narrative than by author gender. “Female authors write in the style dictated by the genre, just like their male colleagues.” Within-gender differences are greater than cross-gender differences, yet we pay more attention to the latter. When a male author engages in soul searching, he writes a ‘Bildungsroman’; when a female author does the same, her writing is considered to be a ‘women’s book’.
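The claim that within-group differences exceed cross-group differences can be made concrete by comparing average distances between stylistic profiles. The sketch below is not Koolen's method. It assumes a hypothetical feature set, the relative frequencies of a handful of English function words (a common kind of feature in stylometry), naive whitespace tokenization, and plain Euclidean distance. If the mean distance within groups is as large as or larger than the mean distance between groups, the grouping explains little of the stylistic variation.

```python
import math
from itertools import combinations

# Hypothetical feature set: relative frequencies of a few English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "he", "she", "i", "you"]


def style_profile(text):
    """Relative frequency of each function word (naive whitespace tokenization)."""
    tokens = text.lower().split()
    total = len(tokens) or 1
    return [tokens.count(w) / total for w in FUNCTION_WORDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_distance(pairs):
    pairs = list(pairs)
    return sum(euclidean(u, v) for u, v in pairs) / len(pairs)


def within_vs_between(profiles_by_group):
    """Mean pairwise distance within groups versus across groups."""
    within = [
        pair
        for profiles in profiles_by_group.values()
        for pair in combinations(profiles, 2)
    ]
    groups = list(profiles_by_group.values())
    between = [(u, v) for g1, g2 in combinations(groups, 2) for u in g1 for v in g2]
    return mean_distance(within), mean_distance(between)


if __name__ == "__main__":
    # Invented placeholder snippets; a real study would use full novels per author.
    corpus = {
        "group_a": ["she walked to the door and opened it",
                    "the rain fell in the night and he slept"],
        "group_b": ["i told you of the plan and you laughed",
                    "a man in a coat stood to the side of the road"],
    }
    profiles = {g: [style_profile(t) for t in ts] for g, ts in corpus.items()}
    within, between = within_vs_between(profiles)
    print(f"mean within-group distance:  {within:.3f}")
    print(f"mean between-group distance: {between:.3f}")
```

Each group needs at least two texts for the within-group average to be defined; with genre, dialogue, or narrative labels in place of gender labels, the same comparison would show which grouping accounts for more of the variation.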
In conclusion, Koolen proposes reading beyond the female as the only way to break the vicious circle.
Source: Based on University of Amsterdam press release, ‘Kijk nu eens daadwerkelijk naar literaire kwaliteit – Stereotypen belemmeren eerlijke beoordeling vrouwelijke auteurs’ [Literary quality should be the only criterion – Stereotypes are adversely affecting the reception of work by female authors]
Extra reading: Rap bot fools Lowlands
In the summer of 2018, a research team from the Meertens Institute and the University of Antwerp took a synthetic hip hop generator to Lowlands, the annual three-day music and performing arts festival at Biddinghuizen in the Netherlands, and invited festival-goers to take the so-called MC Turing test: distinguishing human-written lyrics from computer-generated ones.
Some 800 festival-goers participated in the experiment, including two rappers well known in the Netherlands, Leafs and Sticks. By correctly identifying who wrote which text in a long unbroken streak, Sticks achieved a score of 30, which, several months after the event, is still the all-time high score. The average respondent achieved a streak of only about 5 to 6. The results have yet to be analysed in detail, but this finding alone is already remarkable: the average human cannot distinguish computer-generated rap texts, synthesized from real hip hop lyrics, from human-written ones, but a true hip hop expert can.
The texts produced by the rap bot were based on three different models. The simplest is a recurrent neural network that generates text letter by letter, with each letter a likely continuation of the sequence of letters produced up to that point. The same research team used this model in 2017, when they trained a science fiction-writing bot to generate an 8,000-word short story in collaboration with Dutch author Ronald Giphart. “The model is capable of producing fluent sentences, but it has the memory of a fruit fly”, says Folgert Karsdorp, post-doc researcher at the Meertens Institute.
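The principle of generating text letter by letter, each letter a likely continuation of what came before, can be illustrated without a neural network. The sketch below deliberately swaps in a much simpler technique, a character-level Markov (n-gram) model, instead of the recurrent network the team used; the function names, the `order` parameter, and the training line are hypothetical. Unlike a recurrent network, whose memory is learned, this model only ever sees the last `order` characters, which gives a crude feel for what a short memory means for generated text.

```python
import random
from collections import Counter, defaultdict

PAD = "\x00"  # start-of-text padding; assumed not to occur in the training text


def train_char_model(text, order=3):
    """Count which character follows each context of `order` characters."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(Counter)
    padded = PAD * order + text
    for i in range(len(text)):
        context = padded[i:i + order]
        model[context][padded[i + order]] += 1
    return model


def generate(model, order, length, rng):
    """Sample characters one at a time, each conditioned on the previous `order`."""
    context = PAD * order
    out = []
    for _ in range(length):
        counts = model.get(context)
        if not counts:  # context never seen in training: stop early
            break
        chars = list(counts)
        next_char = rng.choices(chars, weights=[counts[c] for c in chars], k=1)[0]
        out.append(next_char)
        context = (context + next_char)[-order:]
    return "".join(out)


if __name__ == "__main__":
    # Placeholder training text; a real model would be trained on a large lyrics corpus.
    lyrics = "we keep it moving, we keep it real, we never stop grooving, that's the deal\n"
    order = 3
    model = train_char_model(lyrics * 3, order)
    print(generate(model, order, length=120, rng=random.Random(7)))
```

Raising `order` makes the output more fluent but also more likely to copy the training text verbatim; a recurrent network sidesteps this fixed window by learning what to remember.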
The second model is a neural network that generates syllables, which in theory should make it easier for the model to stick to a rhythm and a rhyme. However, the second model is boring: “it favours frequent words and produces run-of-the-mill hip hop. The letter-based model is much more creative, leading to funny and absurd results,” says Karsdorp. The third model combines the first two, resulting in creative syllables and words.
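Why syllables should help with rhythm and rhyme can be shown in a few lines: once a line is expressed in syllable units, its length in syllables and its final syllable are easy to read off. This is a hypothetical illustration, not the team's model. It uses a deliberately crude heuristic (one syllable per cluster of vowel letters, English spelling only) and treats the last vowel cluster of a line's final word, plus whatever follows it, as the rhyme. A more serious approach would rely on pronunciation data rather than spelling.

```python
import re

VOWEL_CLUSTER = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[a-z']+")


def syllable_count(word):
    """Naive estimate: one syllable per cluster of vowel letters (English spelling)."""
    return max(1, len(VOWEL_CLUSTER.findall(word.lower())))


def rhyme_key(line):
    """Last vowel cluster of the line's final word plus everything after it."""
    words = WORD.findall(line.lower())
    if not words:
        return ""
    last = words[-1]
    clusters = list(VOWEL_CLUSTER.finditer(last))
    return last[clusters[-1].start():] if clusters else last


def line_profile(line):
    """(syllable count, rhyme key): the two properties rhythm and rhyme depend on."""
    words = WORD.findall(line.lower())
    return sum(syllable_count(w) for w in words), rhyme_key(line)


if __name__ == "__main__":
    # Placeholder couplet for illustration.
    couplet = ["we keep it moving, we keep it real",
               "we never stop grooving, that's the deal"]
    for line in couplet:
        print(line_profile(line), line)
```

A generator working in such units can check syllable counts and rhyme keys as it goes, which is much harder when every step produces a single letter.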
This type of experiment offers a new perspective in the humanities, says Karsdorp. “Instead of studying the works of J.K. Rowling, you aim to mimic the writer by training a machine to synthesize new works in the style of J.K. Rowling. This is exciting stuff.”
Source: based on Mathilde Jansen, ‘Rapbot weet Lowlandsgangers om de tuin te leiden’ [Rap bot manages to fool Lowlands visitors], 7 September 2018.
Antal van den Bosch (MA, Computational Linguistics, 1992; Ph.D., Computer Science, 1997) is director of the Meertens Institute of the Royal Netherlands Academy of Arts and Sciences, Amsterdam, and Professor of Language and Speech Technology in the Faculty of Arts at Radboud University, Nijmegen. His work lies at the intersection of artificial intelligence (machine learning, natural language processing) and the humanities. He held research positions at Tilburg University, the Netherlands, and the Université Libre de Bruxelles (1992-1994), Universiteit Maastricht (1994-1997) and Tilburg University (1997-2011). He is guest professor at the Computational Linguistics and Psycholinguistics Research Centre at the University of Antwerp, Belgium, a member of the Royal Netherlands Academy of Arts and Sciences, and fellow of the European Association for Artificial Intelligence.
On Wednesday 12 December 2018, the KNAW Humanities Cluster presented HuC LIVE! At this event, the departments DHLab and Digital Infrastructure presented their innovative research and infrastructure. The main theme of the afternoon was bridging the gulf between science and the humanities. In this series of blogs, our guest speakers talk about why they bring science and the humanities together.