
Want to discuss an idea? Arrange an appointment on my Calendly.
Social media
Connect with me on Twitter, Mastodon, or LinkedIn.
Memberships and affiliations
Member of the DSI Community Libraries.
Member, co-founder, and co-coordinator of the DSI Community Digital Humanities.
Member of the working group Editions of DARIAH-CH.
Member of the Zentrum Altertumswissenschaften Zürich (ZAZH).
Member of the Ägyptologie-Forum Zürich.
Member of Digital Humanities im deutschsprachigen Raum (DHd).
Member of SSHOC-CH.
News!
Check out our AISL Podcast!
Current project(s) and activities
Please also look at the Data page for current and past projects' datasets, models, and code.
- In my habilitation project, I will look at ekphrasis and the ability of LLMs to identify descriptions of art and architecture in ancient texts.
- In July 2024, I joined Prof. Dr. Felix K. Maier's chair of Ancient History to establish the AIncient Studies Lab, in which we focus on bringing together Artificial Intelligence and historical studies.
- In December 2023, the DIZH innovation program project by Prof. Dr. Felix K. Maier was approved. The Re-Experiencing History project uses AI technology to recreate historical events and moments visually. It aims to develop a web application that converts historical texts into storyboards, comics, or individual images, addressing the challenges of representing the 'lost' history of the pre-modern era. The project will semantically enrich historical content and make it accessible through innovative visual interfaces, tackling issues like copyright and historical distortions from AI-generated images. It collaborates with libraries, archives, and historians, bridging computer technology, history, and citizen science.
- We obtained funding for a second phase of the project Bullinger Digital (see "Past projects" below). In this second part, we focus on the semantic indexing of the correspondence. We aim to automatically classify the letters (via various topic modelling techniques) and recognise and link named entities. The second phase also comprises a citizen science campaign during which citizens assist us in correcting and linking named entities and events in the correspondence. We will employ various AI techniques to translate letters from Latin and Early New High German to English and standard German, generate automatic summaries, and allow novel access to this treasure of the 16th century.
Past projects
- In September 2023, Innosuisse approved our application for an Innovation Cheque project with Locomot GmbH. This project aims to make photo archives more accessible to the general public. We will use digitised index cards, a photo, a building, and a card index combined with large language models to re-tell the history of Davos.
- In January 2021, I joined the Bullinger Digital project, which is dedicated to digitising the correspondence of the Swiss reformer Heinrich Bullinger. The project aims to apply Handwritten Text Recognition (HTR) to about 3,000 of the over 12,000 letters that Bullinger wrote to and received from many colleagues of his time from all over Europe. The project is kindly funded by the Hasler Foundation, among others, and also involves a partnership with Andreas Fischer from the Department of Informatics at the University of Fribourg and Tobias Hodel from the Digital Humanities group at the University of Bern. We obtained further funding for a second project phase (see current projects above).
- Since September 2017, I have been collaborating in the impresso project. impresso stands for Integrated Monitoring of Historical Press Corpora. During a three-year project phase, financed by an SNSF Sinergia grant, the DHLAB at the EPFL, the C2DH at the University of Luxembourg, and our department will work on text mining of historical newspapers. My main contribution to this undertaking was the lexical semantic indexing of texts and topic modelling of historical newspaper articles.
- From 2015 to 2017, I was responsible for the Text+Berg pipeline. The Text+Berg corpus consists of the yearbooks from the Swiss Alpine Club (SAC) and is a big multilingual collection of texts with mountaineering as their main topic.
- From 2014 to 2016, I helped compile the Credit Suisse Corpus, which features text in multiple languages from the web news and the available PDF files, as well as scans of the world's oldest banking magazine from 1895.
- From 2015 to 2016, I was a research assistant at the URPP Language and Space and helped build the ArchiMob corpus of recorded speech from people who lived through the Second World War in Switzerland.
Right now I am ...
... starting a collaboration with the Department of History at the UZH.
... collaborating with the Chair of Systems Design at ETH and the Heidelberger Akademie der Wissenschaften to track the spread of ideas in correspondence data during the Reformation.
... collaborating with Patrick J. Burns from NYU on Latin language models.
Upcoming Events
- CAS Hochschuldidaktik at PHZ, June 2024 - June 2025.
Past Events
2024 |
EACL in Malta, March 18-22. Poster presentation of our paper about employing GPT for POS tagging of 16th-century Latin. |
Open Up Digital Editions Conference in Zürich, January 24-26. Poster presentation entitled: Lessons Learnt from Bullinger Digital. See the book of abstracts. |
Invited talk (online) at the South African Center for Digital Language Resources. Title: Innovating Historical Scholarship: The Bullinger Digital Project. January 31, 2024, 10am SAST. |
2023 |
CAIDAS Workshop in Würzburg, February 6-8. Presentation entitled: Bullinger Digital - Texterkennung in einem reformatorischen Briefwechselkorpus. |
Bullinger Digital: 500 Jahre Bullingerbriefwechsel in Zurich, February 24. Presentation entitled: Bullinger Digital 2.0. |
DHd 2023 in Trier, March 13-17. Presentation entitled: Bullingers Briefwechsel zugänglich machen: Stand der Handschriftenerkennung. |
DaSCHCon on Digital Editions and Interoperability in Bern, March 24. Participation. |
PhD Defense, May 26, Zürich. Passed. |
June 9: Invited talk at the "Text Recognition and Cultural Heritage" workshop organised by DIZH at the Zurich State Archives, about the state of the art of handwritten text recognition in the project Bullinger Digital. |
August 25: Participation in the ADAPDA Workshop at ICDAR in San José, USA, with a paper about the adaptability of TrOCR for historical handwriting. |
Transkriptionen zeitgemäss mit Transkribus. October 4, Zentralbibliothek Zürich. Workshop organiser. |
Brücken bauen: Einblicke in die Vermittlung von ATR an Geisteswissenschaftler. November 29, Zentralbibliothek Zürich. Invited talk at PATT workshop. |
2022 |
COMHUM 2022 in Lausanne, June 9-10. Presentation entitled: Transformer-based HTR for Historical Documents. |
LREC 2022 in Marseille, June 21-23. Presentation entitled: Evaluation of HTR models without Ground Truth Material. |
DARIAH-CH Study Day in Mendrisio, October 20. Presentation entitled: Bullinger Digital – The Transformation and Expansion of an Analogue Edition into the Digital Age. |
2021 |
Einführung in Theorie und Praxis der OCR mit neuronalen Netzwerken in Zurich, October 4. Workshop organiser. |
2020 |
LREC 2020 in Marseille (cancelled due to the Corona Pandemic). Presentation at conference. Paper title: How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR. |
2019 |
Digital Humanities 2019 - Presentation at Conference, July 8-12. Title: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images. Slides. |
2018 |
Vom Diarium zum Digitarium - Invited Talk at Workshop, April 24-25. Slides (in German). |
2017 |
Sinergia: Kick-Off Workshop at the EPFL, October 24-25. |
Digital Humanities Austria 2017 - Invited Talk at a workshop about "Building Bridges", December 4-6. Slides. |
Teaching
AS 2016 | Teaching Assistant in Einführung in die Multilinguale Textanalyse, with Martin Volk |
SS 2017 | Teaching Assistant in Semantische Rollen und relationale Fakten, with Simon Clematide |
SS 2018 | Teaching Assistant in Sentimentanalyse und Media Monitoring, with Simon Clematide |
AS 2018 | Teaching Assistant in Deep Learning in der Sprachtechnologie, with Simon Clematide |
AS 2018 | Teaching Assistant in Automated Media Content Analysis, with Gerold Schneider |
SS 2019 | Teaching Assistant in Machine Learning for Natural Language Processing 1, with Simon Clematide |
AS 2021 | Online teaching of the course Einführung in die Computerlinguistik at the University of Innsbruck |
SS 2023 | Teaching of the course Creation and Annotation of Linguistic Resources with George Yong |
SS 2024 | Teaching of the course Creation and Annotation of Linguistic Resources with Martin Volk |
Supervision
Semester | Student, Thesis Type | Status | Topic (Thesis Title) |
Spring 2023 | Elina Stüssi, BA | done | Part-of-Speech Tagging for Early Modern Latin Correspondence |
Autumn 2023 | Nikolaj Bauer, MA | ongoing | Exploring the Capabilities of LLMs in Supporting Scholarly Editions |
Autumn 2023 | Olga Shpakova | done | Dokumentbasierter LLM-gestützter Chatbot als Rechtsassistenz: Entwicklung einer chat-basierten API für den Zugriff auf juristische Datenbank. |
Autumn 2023 | Yung-Hsin Chen, MA | ongoing | Improvements in the Adaptation of TrOCR Models for Non-English OCR/HTR |
Spring 2024 | Zejie Guo, MA | ongoing | Faithful Image Generation of Historical Events via Prompt Engineering |
Spring 2024 | Ülkü Karagöz, MA | done | Reconstructing Ancient Rome: Historical Accuracy in AI-Generated Images |
Autumn 2024 | Mo Zhang, MA | ongoing | Faithful Image Generation for Illustrating 16th-Century Correspondence |
Autumn 2024 | Zhaoyi Cheng, MA | ongoing | Automated Character Avatar Generation from Fiction Books |
Review Activities
I served as a reviewer/on a PC for the following conferences/workshops:
Conference/Journal/Event | # of reviews |
---|---|
ACL 2022 | 2 |
EMNLP 2022 -- Track: Multilinguality | 2 |
ACL 2023 -- Track: Resources and Evaluation | 5 |
EMNLP 2023 | 5 |
EMNLP 2023 Industry Track | TBD |
Begriffe der Digital Humanities by ZfdG | 1 |
DHd 2024 | 5 |
ACL ARR 2023 August | 2 |
ACL ARR 2023 October | 3 |
Digital Humanities Quarterly, October 2023 cycle | 2 |
ACL ARR 2023 December | 3 |
Journal of Open Humanities Data, February cycle | 1 |
ACL ARR 2024 February | 4 |
ACL ARR 2024 June | 2 |
MDPI Journal 'Heritage' | 1 |
ACL ARR 2024 October | 4 |
Total | 41 |
Publications
ZORA Publication List
- Re-experiencing history: a platform for the re-enactment of historical events with multimodal large language models. In: DHd 2025 Under Construction, Bielefeld, 3 March 2025 - 6 March 2025. Zenodo, 510-511.
- 50 years of editorial practice: a footnote analysis of the Heinrich Bullinger Briefwechsel. In: DHd 2025, Bielefeld, 2025. Zenodo, 207-212.
- Bringing Rome to life: evaluating historical image generation. In: Proceedings of the Computational Humanities Research Conference 2024, Aarhus, 4 December 2024 - 6 December 2024. CEUR-WS, 113-126.
- LLM-based Translation Across 500 Years. The Case for Early New High German. In: 20th Conference on Natural Language Processing (KONVENS 2024), Vienna, Austria, 10 September 2024 - 13 September 2024. Association for Computational Linguistics, 368-375.
- LLM-based Machine Translation and Summarization for Latin. In: Third Workshop on Language Technologies for Historical and Ancient Languages -- LT4HALA (at LREC/COLING), Torino, 25 May 2024.
- Multilingual Workflows in Bullinger Digital: Data Curation for Latin and Early New High German. Journal of Open Humanities Data, 10(12):12.
- New “ArchAIval” Practices: Using GPT for OCR and Historical Narration of Index Cards. In: Linking Theory and Practice of Digital Libraries. TPDL 2024, Ljubljana, 2024. Springer (Bücher), 183-192.
- Decoding 16th-Century Letters: From Topic Models to GPT-Based Keyword Mapping. In: 20th Conference on Natural Language Processing (KONVENS 2024), Vienna, Austria, 10 September 2024 - 13 September 2024. Association for Computational Linguistics, 209-221.
- TrOCR Meets Language Models: An End-to-End Post-correction Approach. In: ADAPDA, Athens, 2024. Springer (Bücher), 12-26.
- Part-of-Speech Tagging of 16th-Century Latin with GPT. In: 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), Malta, 22 March 2024. Association for Computational Linguistics, 196-206.
- Bullingers Briefwechsel zugänglich machen: Stand der Handschriftenerkennung. In: DHd 2023, Trier, 13 March 2023 - 17 March 2023.
- Lessons Learnt from Bullinger Digital. In: Open Up Digital Editions Conference 2024, Zurich, 24 January 2024 - 26 January 2024. Center Digital Editions & Edition Analytics (University Library Zurich) and Research and Infrastructure Support RISE (University of Basel), 75-76.
- Evaluating State-of-the-Art Handwritten Text Recognition (HTR) Engines with Large Language Models (LLMs) for Historical Document Digitisation. In: Conference on Computational Humanities Research, Paris, 2023.
- The Bullinger Dataset: A Writer Adaptation Challenge. In: Document Analysis and Recognition - ICDAR 2023, San José, 2023. Springer, 397-410.
- The Adaptability of a Transformer-Based OCR Model for Historical Documents. In: Document Analysis and Recognition -- ICDAR 2023 Workshops, San José, 2023. Springer, 34-48.
- Flexible Techniques for Automatic Text Recognition of Historical Documents. 2023, University of Zurich, Faculty of Arts.
- Transformer-based HTR for Historical Documents. In: Workshop on Computational Methods in the Humanities 2022, Lausanne, 9 June 2022 - 10 June 2022.
- Evaluation of HTR models without Ground Truth Material. In: LREC 2022, Marseille, 21 June 2022 - 23 June 2022, European Language Resources Association.
- Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters. In: Proceedings of LREC-2022, Marseille, 21 June 2022 - 26 June 2022, LREC.
- Ein Briefwechsel-Korpus des 16. Jahrhunderts in Frühneuhochdeutsch. In: Kupietz, Marc; Schmidt, Thomas. Neue Entwicklungen in der Korpuslandschaft der Germanistik. Tübingen: Narr Francke Attempto GmbH + Co. KG, 33-42.