Abstract
Crowdsourcing is a popular technique to collect large amounts of human-generated labels, such as the relevance judgments used to create information retrieval (IR) evaluation collections. Previous research has shown that collecting high-quality labels from a crowdsourcing platform can be challenging. Existing quality assurance techniques focus on answer aggregation or on the use of gold questions, where ground-truth data allows checking the quality of the responses. In this paper, we present qualitative and quantitative results revealing how different crowd workers adopt different work strategies to complete relevance judgment tasks efficiently, and the consequent impact on quality. We delve into the techniques and tools that highly experienced crowd workers use to be more efficient in completing crowdsourcing micro-tasks. To this end, we use both qualitative results from worker interviews and surveys, and the results of a data-driven study of behavioral log data (i.e., clicks, keystrokes, and keyboard shortcuts) collected from crowd workers performing relevance judgment tasks. Our results highlight the presence of frequently used shortcut patterns that can speed up task completion, thus increasing the hourly wage of efficient workers. We observe how crowd work experience results in different working strategies, productivity levels, and quality and diversity of the crowdsourced judgments.
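To make the kind of behavioral log analysis described in the abstract concrete, the following is a minimal, purely illustrative sketch, not the authors' actual pipeline: it counts modifier-key shortcut combinations (e.g., Ctrl+C) occurring in a sequence of timestamped key events. The event format, the `max_gap_ms` threshold, and the function name `find_shortcut_patterns` are all hypothetical assumptions for this example.

```python
from collections import Counter

# Hypothetical log format: each event is (timestamp_ms, key), as might be
# captured from a worker's browser session during a relevance judgment task.
events = [
    (1000, "ctrl"), (1040, "c"),   # copy
    (2000, "ctrl"), (2035, "v"),   # paste
    (5000, "ctrl"), (5050, "c"),
    (6000, "ctrl"), (6030, "v"),
    (9000, "tab"),                 # plain keypress, not a shortcut
]

def find_shortcut_patterns(events, max_gap_ms=200):
    """Count modifier+key combinations pressed within max_gap_ms of each other."""
    counts = Counter()
    for (t1, k1), (t2, k2) in zip(events, events[1:]):
        if k1 in {"ctrl", "alt", "meta"} and t2 - t1 <= max_gap_ms:
            counts[f"{k1}+{k2}"] += 1
    return counts

print(find_shortcut_patterns(events))
# Counter({'ctrl+c': 2, 'ctrl+v': 2})
```

Frequencies of such patterns per worker could then be compared across experience levels, which is the type of aggregate the paper's data-driven study reports on.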
| Original language | English |
| --- | --- |
| Pages | 241-249 |
| Number of pages | 9 |
| Publication status | Published - 2020 |
| Externally published | Yes |
| Event | 13th ACM International Conference on Web Search and Data Mining (WSDM 2020), 3 Feb 2020 → 7 Feb 2020 |
Conference

| Conference | 13th ACM International Conference on Web Search and Data Mining (WSDM 2020) |
| --- | --- |
| Abbreviated title | WSDM '20 |
| Period | 3/02/20 → 7/02/20 |
Keywords
- Crowdsourcing
- IR evaluation
- Relevance judgment
- User behavior