Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings

Katja Geertruida Schmahl; Tom Julian Viering; Stavros Makrodimitris; Arman Naseri Jahfari; David Tax; Marco Loog

doi:10.18653/v1/2020.nlpcss-1.11

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and in downstream applications in the field of natural language processing (NLP). To minimize the effect of gender bias in these settings, more insight is needed into where and how biases manifest themselves in the text corpora employed. This paper contributes by showing how gender bias in word embeddings from Wikipedia has developed over time. Quantifying the gender bias over time shows that art-related words have become more female-biased. Family and science words have stereotypical biases towards female and male words, respectively. These biases seem to have decreased since 2006, but these changes are not more extreme than those seen in random sets of words. Career-related words are more strongly associated with male than with female; this difference has only become smaller in recently written articles. These developments provide additional understanding of what can be done to make Wikipedia more gender-neutral and how important time of writing can be when considering biases in word embeddings trained from Wikipedia or from other text corpora.
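
As a rough illustration of how a gender association in word embeddings can be quantified, the sketch below computes a WEAT-style score: the mean cosine similarity of a set of target words to male attribute words, minus the mean similarity to female attribute words. The abstract does not state which metric the paper uses, so the scoring function, the function names, and the toy 3-dimensional vectors are all assumptions made for illustration; they are not the authors' method or data.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(word_vec, male_vecs, female_vecs):
    """Mean similarity to male words minus mean similarity to female words.

    Positive values mean the word sits closer to the male attribute words,
    negative values mean it sits closer to the female attribute words.
    """
    male = np.mean([cosine(word_vec, m) for m in male_vecs])
    female = np.mean([cosine(word_vec, f) for f in female_vecs])
    return float(male - female)


def category_bias(target_vecs, male_vecs, female_vecs):
    """Average association over all words in a category (e.g. career words)."""
    scores = [association(t, male_vecs, female_vecs) for t in target_vecs]
    return float(np.mean(scores))


# Toy embeddings (hypothetical values, NOT taken from Wikipedia or the paper).
male = [np.array([1.0, 0.0, 0.0])]
female = [np.array([0.0, 1.0, 0.0])]
career = [np.array([0.9, 0.1, 0.2]), np.array([0.8, 0.3, 0.1])]
family = [np.array([0.1, 0.9, 0.2]), np.array([0.2, 0.7, 0.3])]

career_bias = category_bias(career, male, female)
family_bias = category_bias(family, male, female)
print(f"career bias: {career_bias:+.3f}")
print(f"family bias: {family_bias:+.3f}")
```

Under these assumptions, repeating the computation on embeddings trained from snapshots of the corpus at different dates would give one score per category per date, which is one way to trace a change in bias over time.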

Original language	English
Title of host publication	Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Place of Publication	Online
Publisher	Association for Computational Linguistics
Pages	94-103
Number of pages	10
DOIs	https://doi.org/10.18653/v1/2020.nlpcss-1.11
Publication status	Published - 1 Nov 2020

Access to Document

10.18653/v1/2020.nlpcss-1.11

2020.nlpcss-1.11: Final published version, 441 KB

Cite this

Schmahl, K. G., Viering, T. J., Makrodimitris, S., Naseri Jahfari, A., Tax, D., & Loog, M. (2020). Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science (pp. 94-103). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.nlpcss-1.11

Schmahl, Katja Geertruida ; Viering, Tom Julian ; Makrodimitris, Stavros et al. / Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science. Online : Association for Computational Linguistics, 2020. pp. 94-103

@inproceedings{dc49c8ac387a486592e81604dcaf1bd8,

title = "Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings",

abstract = "Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and downstream applications in the field of natural language processing (NLP). To minimize the effect of gender bias in these settings, more insight is needed when it comes to where and how biases manifest themselves in the text corpora employed. This paper contributes by showing how gender bias in word embeddings from Wikipedia has developed over time. Quantifying the gender bias over time shows that art related words have become more female biased. Family and science words have stereotypical biases towards respectively female and male words. These biases seem to have decreased since 2006, but these changes are not more extreme than those seen in random sets of words. Career related words are more strongly associated with male than with female, this difference has only become smaller in recently written articles. These developments provide additional understanding of what can be done to make Wikipedia more gender neutral and how important time of writing can be when considering biases in word embeddings trained from Wikipedia or from other text corpora.",

author = "Schmahl, {Katja Geertruida} and Viering, {Tom Julian} and Stavros Makrodimitris and {Naseri Jahfari}, Arman and David Tax and Marco Loog",

year = "2020",

month = nov,

day = "1",

doi = "10.18653/v1/2020.nlpcss-1.11",

language = "English",

pages = "94--103",

booktitle = "Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science",

publisher = "Association for Computational Linguistics",

}

Schmahl, KG, Viering, TJ, Makrodimitris, S, Naseri Jahfari, A, Tax, D & Loog, M 2020, Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. in Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science. Association for Computational Linguistics, Online, pp. 94-103. https://doi.org/10.18653/v1/2020.nlpcss-1.11

Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. / Schmahl, Katja Geertruida; Viering, Tom Julian; Makrodimitris, Stavros et al.
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science. Online: Association for Computational Linguistics, 2020. p. 94-103.

TY - GEN

T1 - Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings

AU - Schmahl, Katja Geertruida

AU - Viering, Tom Julian

AU - Makrodimitris, Stavros

AU - Naseri Jahfari, Arman

AU - Tax, David

AU - Loog, Marco

PY - 2020/11/1

Y1 - 2020/11/1

N2 - Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and downstream applications in the field of natural language processing (NLP). To minimize the effect of gender bias in these settings, more insight is needed when it comes to where and how biases manifest themselves in the text corpora employed. This paper contributes by showing how gender bias in word embeddings from Wikipedia has developed over time. Quantifying the gender bias over time shows that art related words have become more female biased. Family and science words have stereotypical biases towards respectively female and male words. These biases seem to have decreased since 2006, but these changes are not more extreme than those seen in random sets of words. Career related words are more strongly associated with male than with female, this difference has only become smaller in recently written articles. These developments provide additional understanding of what can be done to make Wikipedia more gender neutral and how important time of writing can be when considering biases in word embeddings trained from Wikipedia or from other text corpora.

AB - Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and downstream applications in the field of natural language processing (NLP). To minimize the effect of gender bias in these settings, more insight is needed when it comes to where and how biases manifest themselves in the text corpora employed. This paper contributes by showing how gender bias in word embeddings from Wikipedia has developed over time. Quantifying the gender bias over time shows that art related words have become more female biased. Family and science words have stereotypical biases towards respectively female and male words. These biases seem to have decreased since 2006, but these changes are not more extreme than those seen in random sets of words. Career related words are more strongly associated with male than with female, this difference has only become smaller in recently written articles. These developments provide additional understanding of what can be done to make Wikipedia more gender neutral and how important time of writing can be when considering biases in word embeddings trained from Wikipedia or from other text corpora.

U2 - 10.18653/v1/2020.nlpcss-1.11

DO - 10.18653/v1/2020.nlpcss-1.11

M3 - Conference contribution

SP - 94

EP - 103

BT - Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

PB - Association for Computational Linguistics

CY - Online

ER -

Schmahl KG, Viering TJ, Makrodimitris S, Naseri Jahfari A, Tax D, Loog M. Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science. Online: Association for Computational Linguistics. 2020. p. 94-103. doi: 10.18653/v1/2020.nlpcss-1.11
