Improved Generalization in Semi-Supervised Learning: A Survey of Theoretical Results

Alexander Mey; Marco Loog

doi:10.1109/TPAMI.2022.3198175

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Semi-supervised learning is the learning setting in which we have both labeled and unlabeled data at our disposal. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For all practical purposes, it is therefore instructive to have an understanding of the underlying theory and the possible learning behavior that comes with it. This survey gathers results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: what are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? As, indeed, the precise assumptions made are of the essence, this is where the survey's particular attention goes out to.

Original language	English
Pages (from-to)	4747-4767
Number of pages	21
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	45
Issue number	4
DOIs	https://doi.org/10.1109/TPAMI.2022.3198175
Publication status	Published - 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Complexity theory
Geometry
Manifolds
Semisupervised learning
Standards
Supervised learning
Task analysis

Access to Document

10.1109/TPAMI.2022.3198175

Improved_Generalization_in_Semi-Supervised_Learning_A_Survey_of_Theoretical_Results (Final published version, 638 KB)

Cite this

@article{ceb50b783fe64459a62236e0baeb9380,

title = "Improved Generalization in Semi-Supervised Learning: A Survey of Theoretical Results",

abstract = "Semi-supervised learning is the learning setting in which we have both labeled and unlabeled data at our disposal. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For all practical purposes, it is therefore instructive to have an understanding of the underlying theory and the possible learning behavior that comes with it. This survey gathers results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: what are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? As, indeed, the precise assumptions made are of the essence, this is where the survey's particular attention goes out to.",

keywords = "Complexity theory, Geometry, Manifolds, Semisupervised learning, Standards, Supervised learning, Task analysis",

author = "Alexander Mey and Marco Loog",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} -- Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2022",

doi = "10.1109/TPAMI.2022.3198175",

language = "English",

volume = "45",

pages = "4747--4767",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE",

number = "4",

}

TY - JOUR

T1 - Improved Generalization in Semi-Supervised Learning

T2 - A Survey of Theoretical Results

AU - Mey, Alexander

AU - Loog, Marco

N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - Semi-supervised learning is the learning setting in which we have both labeled and unlabeled data at our disposal. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For all practical purposes, it is therefore instructive to have an understanding of the underlying theory and the possible learning behavior that comes with it. This survey gathers results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: what are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? As, indeed, the precise assumptions made are of the essence, this is where the survey's particular attention goes out to.

AB - Semi-supervised learning is the learning setting in which we have both labeled and unlabeled data at our disposal. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For all practical purposes, it is therefore instructive to have an understanding of the underlying theory and the possible learning behavior that comes with it. This survey gathers results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: what are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? As, indeed, the precise assumptions made are of the essence, this is where the survey's particular attention goes out to.

KW - Complexity theory

KW - Geometry

KW - Manifolds

KW - Semisupervised learning

KW - Standards

KW - Supervised learning

KW - Task analysis

UR - http://www.scopus.com/inward/record.url?scp=85136844917&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2022.3198175

DO - 10.1109/TPAMI.2022.3198175

M3 - Article

AN - SCOPUS:85136844917

SN - 0162-8828

VL - 45

SP - 4747

EP - 4767

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 4

ER -
