
ELOQUENT 2024 - Robustness Task

Authors :
Sahlgren, Magnus
Karlgren, Jussi
Dürlich, Luise
Gogoulou, Evangelia
Talman, Aarne
Zahra, Shorouq
Publication Year :
2024

Abstract

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experience from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented so as to require minimal human assessment effort and to work in a multilingual setting. One of the tasks in the first year of ELOQUENT was the robustness task, in which we assessed the robustness and consistency of model output given variation in the input prompts. We found that consistency did indeed vary, both across prompt items and across models. On a methodological note, we find that using an oracle model to assess the submitted responses is feasible, and we intend to investigate the consistency of such assessments across different oracle models. We intend to run this task in coming editions of ELOQUENT to establish a solid methodology for further assessing consistency, which we believe to be a crucial component of trustworthiness as a top-level quality characteristic of generative language models.

This lab has been supported by the European Commission through the DeployAI project (grant number 101146490), by the Swedish Research Council (grant number 2022-02909), and by UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee [grant number 10039436 (Utter)].

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1457591476
Document Type :
Electronic Resource