Active learning from crowd in document screening

Evgeny Krivosheev; Burcu Sayin; Alessandro Bozzon; Zoltán Szlávik


Human-Centred Artificial Intelligence

Research output: Contribution to journal › Conference article › Scientific › peer-review


Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.

Original language	English
Pages (from-to)	19-25
Number of pages	7
Journal	CEUR Workshop Proceedings
Volume	2736
Publication status	Published - 2020
Event	2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation - Vancouver, Canada
Duration	11 Dec 2020 → 11 Dec 2020

Access to Document

paper4: Final published version, 504 KB. Licence: CC BY

Cite this

@article{bbc00d0afe75447583d9ffb177a325b7,
title = "Active learning from crowd in document screening",
abstract = "In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.",
author = "Evgeny Krivosheev and Burcu Sayin and Alessandro Bozzon and Zolt{\'a}n Szl{\'a}vik",
year = "2020",
language = "English",
volume = "2736",
pages = "19--25",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
note = "2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation, CSW 2020 ; Conference date: 11-12-2020 Through 11-12-2020",
}

TY - JOUR
T1 - Active learning from crowd in document screening
AU - Krivosheev, Evgeny
AU - Sayin, Burcu
AU - Bozzon, Alessandro
AU - Szlávik, Zoltán
PY - 2020
Y1 - 2020
N2 - In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
AB - In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label, screening-specific active learning sampling technique, objective-aware sampling, for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
UR - http://www.scopus.com/inward/record.url?scp=85097872359&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85097872359
SN - 1613-0073
VL - 2736
SP - 19
EP - 25
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2020 Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation
Y2 - 11 December 2020 through 11 December 2020
ER -
