All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness

Michael D. Ekstrand; Mucun Tian; Ion Madrazo Azpiazu; Jennifer D. Ekstrand; Oghenemaro Anuyah; David McNeill; Maria Soledad Pera

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.

Original language	Undefined/Unknown
Title of host publication	Proceedings of the 1st Conference on Fairness, Accountability and Transparency
Editors	Sorelle A. Friedler, Christo Wilson
Publisher	PMLR
Pages	172-186
Number of pages	15
Volume	81
Publication status	Published - 1 Nov 2018
Externally published	Yes

Access to Document

https://proceedings.mlr.press/v81/ekstrand18b.html

Cite this

Ekstrand, M. D., Tian, M., Azpiazu, I. M., Ekstrand, J. D., Anuyah, O., McNeill, D., & Pera, M. S. (2018). All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In S. A. Friedler, & C. Wilson (Eds.), Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Vol. 81, pp. 172-186). PMLR. https://proceedings.mlr.press/v81/ekstrand18b.html

@inproceedings{e5017d1ed48e463ca921e17ab8d62a26,
  title = "All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness",
  abstract = "In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex---and likely detrimental---interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond na{\"i}ve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.",
  author = "Ekstrand, {Michael D.} and Mucun Tian and Azpiazu, {Ion Madrazo} and Ekstrand, {Jennifer D.} and Oghenemaro Anuyah and David McNeill and Pera, {Maria Soledad}",
  year = "2018",
  month = nov,
  day = "1",
  language = "Undefined/Unknown",
  volume = "81",
  pages = "172--186",
  editor = "Friedler, {Sorelle A.} and Christo Wilson",
  booktitle = "Proceedings of the 1st Conference on Fairness, Accountability and Transparency",
  publisher = "PMLR",
}

Ekstrand, MD, Tian, M, Azpiazu, IM, Ekstrand, JD, Anuyah, O, McNeill, D & Pera, MS 2018, All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. in SA Friedler & C Wilson (eds), Proceedings of the 1st Conference on Fairness, Accountability and Transparency. vol. 81, PMLR, pp. 172-186. <https://proceedings.mlr.press/v81/ekstrand18b.html>

All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. / Ekstrand, Michael D.; Tian, Mucun; Azpiazu, Ion Madrazo et al.
Proceedings of the 1st Conference on Fairness, Accountability and Transparency. ed. / Sorelle A. Friedler; Christo Wilson. Vol. 81 PMLR, 2018. p. 172-186.

TY - GEN
T1 - All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness
AU - Ekstrand, Michael D.
AU - Tian, Mucun
AU - Azpiazu, Ion Madrazo
AU - Ekstrand, Jennifer D.
AU - Anuyah, Oghenemaro
AU - McNeill, David
AU - Pera, Maria Soledad
PY - 2018/11/1
Y1 - 2018/11/1

N2 - In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes. For example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.

M3 - Conference contribution
VL - 81
SP - 172
EP - 186
BT - Proceedings of the 1st Conference on Fairness, Accountability and Transparency
A2 - Friedler, Sorelle A.
A2 - Wilson, Christo
PB - PMLR
ER -