Analyzing Gender Translation Errors to Identify Information Flows between the Encoder and Decoder of a NMT System

Authors :
Wisniewski, Guillaume
Zhu, Lichao
Ballier, Nicolas
Yvon, François
Laboratoire de Linguistique Formelle (LLF - UMR7110)
Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité)
Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (URP_3967))
Université Paris Cité (UPCité)
Traitement du Langage Parlé (TLP)
Laboratoire Interdisciplinaire des Sciences du Numérique (LISN)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Sciences et Technologies des Langues (STL)
Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Source :
Actes de BlackboxNLP 2022, Dec 2022, Abu Dhabi, United Arab Emirates
Publication Year :
2022
Publisher :
HAL CCSD, 2022.

Abstract

Multiple studies have shown that existing NMT systems demonstrate some kind of "gender bias". As a result, MT output appears to err more often for feminine forms and to amplify social gender misrepresentations, which is potentially harmful to users and practitioners of these technologies. This paper continues this line of investigation and reports results obtained with a new test set in strictly controlled conditions. This setting allows us to better understand the multiple inner mechanisms that cause these biases, which include the linguistic expressions of gender, the unbalanced distribution of masculine and feminine forms in the language, the modelling of morphological variation, and the dynamics of the training process. To counterbalance these effects, we formulate several proposals and notably show that modifying the training loss can effectively mitigate such biases.

Details

Language :
English
Database :
OpenAIRE
Journal :
Actes de BlackboxNLP 2022, Dec 2022, Abu Dhabi, United Arab Emirates
Accession number :
edsair.od.......165..dd90365b94559c049e022785c3793eb2