
A statistical approach to detect sensitive features in a group fairness setting

Authors :
Pelegrina, Guilherme Dean
Couceiro, Miguel
Duarte, Leonardo Tomazeli
Universidade Estadual de Campinas = University of Campinas (UNICAMP)
Knowledge representation, reasoning (ORPAILLEUR)
Inria Nancy - Grand Est
Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Work supported by the São Paulo Research Foundation (FAPESP) under grants #2020/09838-0 (BI0S - Brazilian Institute of Data Science), #2020/10572-5 and #2021/11086-0. The research of the second named author was partially supported by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under GA No. 952215, and the Inria Project Lab 'Hybrid Approaches for Interpretable AI' (HyAIAI).
Publication Year :
2023
Publisher :
HAL CCSD, 2023.

Abstract

The use of machine learning models in decision support systems with high societal impact has raised concerns about unfair (disparate) results for different groups of people. When evaluating such unfair decisions, one generally relies on predefined groups that are determined by a set of features considered sensitive. However, such an approach is subjective and guarantees neither that these features are the only ones that should be considered sensitive nor that they entail unfair (disparate) outcomes. In this paper, we propose a preprocessing step to address the task of automatically recognizing sensitive features that does not require a trained model to verify unfair results. Our proposal is based on the Hilbert-Schmidt independence criterion, which measures the statistical dependence of variable distributions. We hypothesize that if the dependence between the label vector and a candidate sensitive feature is high, then the information provided by this feature will entail disparate performance measures between groups. Our empirical results support our hypothesis and show that several features considered sensitive in the literature do not necessarily entail disparate (unfair) results.
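The record itself contains no code, but the abstract's core statistic is a standard one. As a rough sketch of the kind of dependence measure described, the biased empirical HSIC estimator of Gretton et al. (2005) can be computed from kernel matrices of the candidate feature and the label vector. The Gaussian (RBF) kernel, median bandwidth heuristic, function names, and synthetic data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(x, sigma=None):
    # Pairwise squared Euclidean distances between rows of x.
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * x @ x.T
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption here,
        # not necessarily the paper's choice).
        sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y):
    # Biased empirical HSIC estimate (Gretton et al., 2005):
    # trace(K H L H) / (n - 1)^2, with H the centering matrix.
    n = x.shape[0]
    K = rbf_kernel(x.reshape(n, -1))
    L = rbf_kernel(y.reshape(n, -1))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustrative use: a feature correlated with the labels scores high,
# an independent one scores near zero, suggesting a ranking of
# candidate sensitive features by their dependence on the labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200).astype(float)
correlated = labels + 0.5 * rng.standard_normal(200)
independent = rng.standard_normal(200)
print(hsic(correlated, labels), hsic(independent, labels))
```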

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....d11d8f62e591aee89310173daf7f50da