Projected estimators for robust semi-supervised classification

Jesse H. Krijthe; Marco Loog

doi:10.1007/s10994-017-5626-8

Corresponding author: Jesse H. Krijthe

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

For semi-supervised techniques to be applied safely in practice, we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a higher quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice.
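
The projection idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from this record: a single feature with no intercept, labels encoded as 0/1, and a constraint set read as "all least squares slopes reachable by assigning soft labels in [0, 1] to the unlabeled points". The function names are hypothetical. In one dimension that set is an interval, so projecting the supervised estimate onto it reduces to clamping.

```python
from itertools import product


def supervised_estimate(x_lab, y_lab):
    """Least squares slope through the origin, fitted on labeled data only."""
    return sum(x * y for x, y in zip(x_lab, y_lab)) / sum(x * x for x in x_lab)


def constraint_interval(x_lab, y_lab, x_unl):
    """Slopes reachable by giving each unlabeled point a soft label in [0, 1].

    Assumption: this is one way to read "constraints imposed by the
    unlabeled data"; it is not spelled out in the abstract.
    """
    sxx = sum(x * x for x in x_lab) + sum(x * x for x in x_unl)
    base = sum(x * y for x, y in zip(x_lab, y_lab))
    # The slope is smallest when only negative inputs get label 1,
    # and largest when only positive inputs get label 1.
    low = (base + sum(x for x in x_unl if x < 0)) / sxx
    high = (base + sum(x for x in x_unl if x > 0)) / sxx
    return low, high


def projected_estimate(x_lab, y_lab, x_unl):
    """Project the supervised slope onto the interval (a clamp in 1-D)."""
    beta_sup = supervised_estimate(x_lab, y_lab)
    low, high = constraint_interval(x_lab, y_lab, x_unl)
    return min(max(beta_sup, low), high)


def quadratic_loss(beta, xs, ys):
    """Sum of squared errors of the linear prediction x * beta."""
    return sum((x * beta - y) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical toy data, chosen only to make the projection active.
    x_lab, y_lab = [1.0, -0.5, 2.0], [1, 0, 1]
    x_unl = [0.5, -1.0, 1.5, -2.0]
    beta_sup = supervised_estimate(x_lab, y_lab)
    beta_semi = projected_estimate(x_lab, y_lab, x_unl)
    print("supervised:", beta_sup, "projected:", beta_semi)
    # Compare losses on labeled + unlabeled data for every possible labeling.
    for labels in product([0, 1], repeat=len(x_unl)):
        xs, ys = x_lab + x_unl, y_lab + list(labels)
        print(labels, quadratic_loss(beta_sup, xs, ys),
              quadratic_loss(beta_semi, xs, ys))
```

Running the script compares both estimates under every possible labeling of the unlabeled points, which mirrors the kind of guarantee described in the abstract: whatever the true labels turn out to be, the projected estimate should not have a higher quadratic loss on the combined data than the supervised one. The multivariate case needs a proper projection step rather than a clamp and is not covered by this sketch.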

Original language: English
Pages (from-to): 993-1008
Number of pages: 16
Journal: Machine Learning
Volume: 106
Issue number: 7
DOI: https://doi.org/10.1007/s10994-017-5626-8
Publication status: Published - 1 Jul 2017

Keywords

Least squares classification
Projection
Semi-supervised learning

Access to Document

10.1007/s10994-017-5626-8

10.1007_s10994-017-5626-8: Final published version, 545 KB, Licence: CC BY

Cite this

@article{becaa8c97e5a46c3aaca87763b45a28a,
title = "Projected estimators for robust semi-supervised classification",

abstract = "For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice.",

keywords = "Least squares classification, Projection, Semi-supervised learning",
author = "Krijthe, {Jesse H.} and Loog, Marco",
year = "2017",
month = jul,
day = "1",
doi = "10.1007/s10994-017-5626-8",
language = "English",
volume = "106",
pages = "993--1008",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer",
number = "7",
}

TY - JOUR
T1 - Projected estimators for robust semi-supervised classification
AU - Krijthe, Jesse H.
AU - Loog, Marco
PY - 2017/7/1
Y1 - 2017/7/1

N2 - For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice.

AB - For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice.

KW - Least squares classification
KW - Projection
KW - Semi-supervised learning
UR - http://resolver.tudelft.nl/uuid:becaa8c9-7e5a-46c3-aaca-87763b45a28a
UR - http://www.scopus.com/inward/record.url?scp=85016999214&partnerID=8YFLogxK
U2 - 10.1007/s10994-017-5626-8
DO - 10.1007/s10994-017-5626-8
M3 - Article
AN - SCOPUS:85016999214
SN - 0885-6125
VL - 106
SP - 993
EP - 1008
JO - Machine Learning
JF - Machine Learning
IS - 7
ER -
