Active learning from crowd in document screening

Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon, Zoltán Szlávik

Research output: Contribution to journal, Conference article, Scientific, peer-review

Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where documents must be screened with a set of machine-learning filters. Specifically, we focus on building a set of machine-learning classifiers that evaluate documents and then screen them efficiently. This is a challenging task because the budget is limited and there are countless ways to spend it. We propose a screening-specific sampling technique for multi-label active learning, objective-aware sampling, for selecting unlabelled documents to annotate. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate, so as to minimize the risk of overall classification errors rather than the error of a single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
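
To make the contrast between minimizing overall screening error and minimizing a single filter's error more concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a document is included only if it passes every filter, that filter predictions are independent, and that each classifier outputs a pass probability. The names `screening_uncertainty` and `pick_query` and the scoring rule are hypothetical, chosen only for illustration.

```python
from math import prod


def screening_uncertainty(pass_probs):
    """Uncertainty of the overall include/exclude decision for one document.

    pass_probs[f] is the estimated probability that the document passes
    filter f. Assumption: the document is included only if it passes every
    filter, and the filters are treated as independent.
    """
    p_include = prod(pass_probs)
    # 1.0 when p_include == 0.5, falling to 0.0 as p_include approaches 0 or 1.
    return 1.0 - abs(2.0 * p_include - 1.0)


def pick_query(prob_matrix):
    """Return the (document index, filter index) pair to annotate next.

    Illustrative heuristic only: choose the document whose overall screening
    decision is least certain, then the filter that is least certain about it.
    """
    doc = max(range(len(prob_matrix)),
              key=lambda i: screening_uncertainty(prob_matrix[i]))
    filt = min(range(len(prob_matrix[doc])),
               key=lambda f: abs(prob_matrix[doc][f] - 0.5))
    return doc, filt


# Rows are documents, columns are filters (made-up probabilities).
probs = [
    [0.50, 0.05],  # filter 0 is unsure, but filter 1 almost surely excludes
    [0.90, 0.60],  # overall decision is genuinely unclear
    [0.95, 0.97],  # almost surely included
]
print(pick_query(probs))
```

Under these assumptions, plain uncertainty sampling on filter 0 alone would query the first document, because its pass probability is exactly 0.5, even though filter 1 already makes its exclusion nearly certain. The sketch instead spends the label on the second document, where the overall screening decision is still open.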

Original language: English
Pages (from-to): 19-25
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 2736
Publication status: Published - 2020
Event: 2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation - Vancouver, Canada
Duration: 11 Dec 2020 → 11 Dec 2020
