Workshop program "LLM fails"

Note: Anonymized abstracts are available by clicking the titles of the talks.

Tuesday, 8 April 2025

Time Session
13:00-13:15h Welcome (Henning Lobin, Scientific Director of the IDS) & Introduction (organizing team)
  Block 1: Complex semantics
13:15-13:45h The Struggles of Large Language Models with Zero- and Few-Shot (Extended) Metaphor Detection (Sebastian Reimann, Ruhr University Bochum, Germany / Tatjana Scheffler, Ruhr University Bochum, Germany)
13:45-14:15h Do LLMs fail in bridging generation? (Natalia Skachkova, German Research Centre for Artificial Intelligence, Saarland Informatics Campus, Saarbrücken, Germany)
14:15-14:45h Fine-grain semantic analysis by LLMs: real-world technical reports vs simple synthetic sentences (Mariame Maarouf, Université Toulouse Jean Jaurès, France / Ludovic Tanguy, Université Toulouse Jean Jaurès, France)
14:45-15:05h Discussion
15:05-15:20h Coffee break
  Block 2: Beyond text
15:20-15:50h Pictorial constituents & the metalinguistic performance of LLMs (John David Storment, Stony Brook University, New York, USA)
15:50-16:20h GPT makes a poor AMR parser (Yanming Li, Inria Saclay, France / Meaghan Fowlie, Utrecht University, Netherlands)
16:20-16:35h Coffee break
  Block 3: Emulating people
16:35-17:05h Failure to Teach Communicative Competence in an AI-assisted (ChatGPT) Mode (Alexey Tymbay, Technical University of Liberec, Czech Republic)
17:05-17:35h Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations (Simon Münker, Department of Computational Linguistics, Trier University, Germany)
17:35-18:00h Discussion
19:00h Dinner

Wednesday, 9 April 2025

Time Session
  Block 4: NER & terminology
09:00-09:30h Challenges for Large Language Models in Identifying Named Entities in Climate Change and Biodiversity Texts: A Study of Errors (Elena Volkanovska, Technical University of Darmstadt, Germany)
09:30-10:00h Replacement talk: Is Llama 3-8b a reliable annotator for a sentiment analysis? (Ngoc Duyen Tanja Tu, Leibniz Institute for the German Language, Germany) [replaces: The knight's many names – using an LLM as named entity recognizer in 16th century novels (Sven Kraus, Humboldt University of Berlin, Germany)]
10:00-10:30h Large language models for terminology work: A question of the right prompt? (Barbara Heinisch, Eurac Research, Italy)
10:30-10:50h Discussion
10:50-11:20h Coffee break & Best paper award
  Block 5: Prompting
11:20-11:50h Exploring the Limits of LLMs in German Text Classification: Prompting and Fine-tuning Strategies Across Small and Medium-sized Datasets (Elena Leitner, German Research Centre for Artificial Intelligence, Berlin, Germany / Georg Rehm, German Research Centre for Artificial Intelligence, Berlin, Germany)
11:50-12:20h Reassessing the Role of Prompt Engineering for Small-Scale Language Models (Valentin Noël, Devoteam, France / Elimane Yassine Seidou, Devoteam, France)
12:20-12:50h Discussion
12:50-13:00h Closing words