Publication: Implicit Stereotypes in Pre-Trained Classifiers
Issued Date
2021-01-01
ISSN
2169-3536
Other identifier(s)
2-s2.0-85122065960
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
IEEE Access. Vol.9, (2021), 167936-167947
Suggested Citation
Nassim Dehouche. Implicit Stereotypes in Pre-Trained Classifiers. IEEE Access. Vol.9, (2021), 167936-167947. doi:10.1109/ACCESS.2021.3136898. Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/76724
Title
Implicit Stereotypes in Pre-Trained Classifiers
Author(s)
Nassim Dehouche
Abstract
Pre-trained deep learning models underpin many public-facing applications, and their propensity to reproduce implicit racial and gender stereotypes is an increasing source of concern. The risk of large-scale, unfair outcomes resulting from their use thus raises the need for technical tools to test and audit these systems. In this work, a dataset of 10,000 portrait photographs was generated and classified, using CLIP (Contrastive Language-Image Pre-training), according to six pairs of opposing labels describing a subject's gender, ethnicity, attractiveness, friendliness, wealth, and intelligence. Label correlation was analyzed, and significant associations, corresponding to common implicit stereotypes in culture and society, were found at the 99% significance level. Strong positive correlations were notably found between the labels Female and Attractive, Male and Rich, and White Person and Attractive. These results highlight the risk of seemingly innocuous labels serving as partial euphemisms for protected attributes. Moreover, some limitations of common definitions of algorithmic fairness, as they apply to general-purpose pre-trained systems, are analyzed, and controlling for bias at the point of deployment of these systems, rather than during data collection and training, is put forward as a possible mitigation.
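The correlation-with-significance analysis described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: it assumes each of the 10,000 portraits receives a binary assignment for each label pair (e.g. Female = 1/0), computes the phi coefficient between two such binary label vectors, and tests the association with a chi-squared statistic (1 degree of freedom, critical value 6.635 for the 99% level). The label vectors used in the usage note are made up for demonstration.

```python
# Illustrative sketch: association between two binary CLIP labels
# (e.g. "Female" vs "Attractive") at the 99% significance level.
# Assumes hard binary label assignments; the paper's exact procedure may differ.

def phi_coefficient(x, y):
    """Phi coefficient: Pearson correlation specialized to two binary variables,
    computed from the 2x2 contingency table of label co-occurrences."""
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = n - n11 - n10 - n01
    num = n11 * n00 - n10 * n01
    den = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return num / den if den else 0.0

def significant_at_99(x, y):
    """Chi-squared test of independence for a 2x2 table: chi2 = n * phi^2,
    with 1 degree of freedom; the p < 0.01 critical value is 6.635."""
    chi2 = len(x) * phi_coefficient(x, y) ** 2
    return chi2 > 6.635
```

For example, two strongly co-occurring labels over 100 portraits (45 of 50 "Female" portraits also labeled "Attractive", and 45 of 50 "Male" portraits labeled "Unattractive") give phi = 0.8 and chi2 = 64, well past the 99% threshold, whereas independent labels give phi near 0 and fail the test.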