A variance maximization criterion for active learning

Yazhou Yang; Marco Loog

doi:10.1016/j.patcog.2018.01.017

Yazhou Yang*, Marco Loog

*Corresponding author for this work

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

35 Citations (Scopus)

56 Downloads (Pure)

Abstract

Active learning aims to train a classifier as fast as possible with as few labels as possible. The core element in virtually any active learning strategy is the criterion that measures the usefulness of the unlabeled data and, on that basis, determines which new points are picked for labeling. We propose a novel approach, which we refer to as maximizing variance for active learning, or MVAL for short. MVAL measures the value of unlabeled instances by evaluating the rate of change of output variables caused by changes in the next sample to be queried and its potential labeling. In a sense, this criterion measures how unstable the classifier's output is for the unlabeled data points under perturbations of the training data. MVAL maintains what we refer to as retraining information matrices to keep track of these output scores and exploits two kinds of variance to measure informativeness and representativeness, respectively. By fusing these variances, MVAL is able to select instances that are both informative and representative. We employ our technique in combination with both logistic regression and support vector machines and demonstrate that MVAL achieves state-of-the-art performance in experiments on a large number of standard benchmark datasets.
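The retraining idea described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not define the two variances or how they are fused, so the two quantities below (variance across the unlabeled pool and variance across hypothesised labels) and their product are assumptions made purely for illustration. The function names (`fit_logreg`, `retraining_matrices`, `variance_scores`) and the toy data are likewise hypothetical.

```python
import math
from statistics import pvariance


def predict_proba(w, x):
    """P(y=1 | x) for weights w, whose last entry is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, lr=0.5, steps=300, l2=0.01):
    """L2-regularised logistic regression fitted by plain gradient descent."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [l2 * wi for wi in w]
        grad[-1] = 0.0  # the bias term is not regularised
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for k in range(d):
                grad[k] += err * xi[k] / n
            grad[-1] += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def retraining_matrices(X_lab, y_lab, X_unl, labels=(0, 1)):
    """One matrix per hypothesised label c.

    R[c][i][j] is the output P(y=1 | x_j) on unlabeled point x_j after
    retraining with candidate x_i added to the labeled set under label c.
    """
    R = {}
    for c in labels:
        rows = []
        for xi in X_unl:
            w = fit_logreg(X_lab + [xi], y_lab + [c])
            rows.append([predict_proba(w, xj) for xj in X_unl])
        R[c] = rows
    return R


def variance_scores(R):
    """Assumed scoring: product of two variances per candidate."""
    labels = list(R)
    n = len(R[labels[0]])
    scores = []
    for i in range(n):
        # Variance of the outputs across the pool, averaged over labels.
        across_pool = sum(pvariance(R[c][i]) for c in labels) / len(labels)
        # Variance of each output across labels, averaged over the pool.
        across_labels = sum(
            pvariance([R[c][i][j] for c in labels]) for j in range(n)
        ) / n
        scores.append(across_pool * across_labels)  # assumed fusion rule
    return scores


# Toy 1-D example: two labeled points and three unlabeled candidates.
X_lab, y_lab = [[-2.0], [2.0]], [0, 1]
X_unl = [[-1.5], [0.1], [1.8]]
scores = variance_scores(retraining_matrices(X_lab, y_lab, X_unl))
query_index = max(range(len(scores)), key=scores.__getitem__)
print(scores, query_index)
```

The sketch retrains the classifier once per candidate and per hypothesised label, so its cost grows with the pool size times the number of labels. It is meant only to make the phrase "retraining information matrices" concrete.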

Original language	English
Pages (from-to)	358-370
Number of pages	13
Journal	Pattern Recognition
Volume	78
DOIs	https://doi.org/10.1016/j.patcog.2018.01.017
Publication status	Published - 2018

Bibliographical note

Accepted Author Manuscript

Keywords

Active learning
Retraining information matrix
Variance maximization

Access to Document

10.1016/j.patcog.2018.01.017

47686957 - MVAL (Accepted author manuscript, 453 KB, Licence: CC BY-NC-ND)

Cite this

@article{1c0e80f90a8e4e349c0c188f868a86ad,

title = "A variance maximization criterion for active learning",

abstract = "Active learning aims to train a classifier as fast as possible with as few labels as possible. The core element in virtually any active learning strategy is the criterion that measures the usefulness of the unlabeled data based on which new points to be labeled are picked. We propose a novel approach which we refer to as maximizing variance for active learning or MVAL for short. MVAL measures the value of unlabeled instances by evaluating the rate of change of output variables caused by changes in the next sample to be queried and its potential labelling. In a sense, this criterion measures how unstable the classifier's output is for the unlabeled data points under perturbations of the training data. MVAL maintains, what we refer to as, retraining information matrices to keep track of these output scores and exploits two kinds of variance to measure the informativeness and representativeness, respectively. By fusing these variances, MVAL is able to select the instances which are both informative and representative. We employ our technique both in combination with logistic regression and support vector machines and demonstrate that MVAL achieves state-of-the-art performance in experiments on a large number of standard benchmark datasets.",

keywords = "Active learning, Retraining information matrix, Variance maximization",

author = "Yazhou Yang and Marco Loog",

note = "Accepted Author Manuscript",

year = "2018",

doi = "10.1016/j.patcog.2018.01.017",

language = "English",

volume = "78",

pages = "358--370",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier",

}

TY - JOUR

T1 - A variance maximization criterion for active learning

AU - Yang, Yazhou

AU - Loog, Marco

N1 - Accepted Author Manuscript

PY - 2018

Y1 - 2018

N2 - Active learning aims to train a classifier as fast as possible with as few labels as possible. The core element in virtually any active learning strategy is the criterion that measures the usefulness of the unlabeled data based on which new points to be labeled are picked. We propose a novel approach which we refer to as maximizing variance for active learning or MVAL for short. MVAL measures the value of unlabeled instances by evaluating the rate of change of output variables caused by changes in the next sample to be queried and its potential labelling. In a sense, this criterion measures how unstable the classifier's output is for the unlabeled data points under perturbations of the training data. MVAL maintains, what we refer to as, retraining information matrices to keep track of these output scores and exploits two kinds of variance to measure the informativeness and representativeness, respectively. By fusing these variances, MVAL is able to select the instances which are both informative and representative. We employ our technique both in combination with logistic regression and support vector machines and demonstrate that MVAL achieves state-of-the-art performance in experiments on a large number of standard benchmark datasets.

KW - Active learning

KW - Retraining information matrix

KW - Variance maximization

UR - http://www.scopus.com/inward/record.url?scp=85042348817&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2018.01.017

DO - 10.1016/j.patcog.2018.01.017

M3 - Article

AN - SCOPUS:85042348817

SN - 0031-3203

VL - 78

SP - 358

EP - 370

JO - Pattern Recognition

JF - Pattern Recognition

ER -
