Effects of sampling skewness of the importance-weighted risk estimator on model selection

Wouter Kouw; Marco Loog

doi:10.1109/ICPR.2018.8546186

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

3 Citations (Scopus)

Abstract

Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for data sets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to sub-optimal regularization parameters when used for importance-weighted validation.
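The abstract refers to the importance-weighted risk estimator without writing it out. A common form, assumed here, is R_W(h) = (1/n) Σ_i w(x_i) ℓ(h(x_i), y_i) with w(x) = p_T(x) / p_S(x), the ratio of target to source input densities. The Python sketch below is a hypothetical toy and not the paper's experimental setup. It assumes source inputs x ~ N(0, 1), target inputs x ~ N(0, 1.2²), labels y = x + Gaussian noise, a fixed predictor h(x) = 0.5x under squared loss, and the exact density ratio (in practice w has to be estimated). It draws many small source data sets (n = 10), computes the weighted risk on each, and summarizes the resulting sampling distribution. The function names (`gaussian_pdf`, `iw_risk`, `sample_skewness`, `simulate`) and all parameter values are illustrative assumptions.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated element-wise at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def iw_risk(losses, weights, axis=None):
    """Importance-weighted empirical risk: mean of w(x_i) * loss_i along `axis`."""
    return np.mean(np.asarray(weights, dtype=float) * np.asarray(losses, dtype=float), axis=axis)


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return float(m3 / m2 ** 1.5)


def simulate(n=10, n_repeats=20000, src_std=1.0, tgt_std=1.2,
             noise_std=0.5, slope=0.5, seed=0):
    """Hypothetical toy: sampling distribution of the IW risk of h(x) = slope * x.

    Source x ~ N(0, src_std^2), target x ~ N(0, tgt_std^2), y = x + N(0, noise_std^2).
    p(y | x) is shared by both domains (covariate shift), so exact weights
    make the estimator unbiased for the target risk.
    """
    rng = np.random.default_rng(seed)
    # Residual y - slope * x = (1 - slope) * x + eps, so under the target
    # E_T[loss] = (1 - slope)^2 * tgt_std^2 + noise_std^2.
    true_risk = (1.0 - slope) ** 2 * tgt_std ** 2 + noise_std ** 2

    # Each row is one labeled source data set of size n.
    x = rng.normal(0.0, src_std, size=(n_repeats, n))
    y = x + rng.normal(0.0, noise_std, size=(n_repeats, n))
    losses = (y - slope * x) ** 2

    # Exact density ratio p_T(x) / p_S(x); in practice it has to be estimated.
    weights = gaussian_pdf(x, 0.0, tgt_std) / gaussian_pdf(x, 0.0, src_std)
    estimates = iw_risk(losses, weights, axis=1)
    return true_risk, estimates


if __name__ == "__main__":
    true_risk, est = simulate()
    print(f"true target risk        : {true_risk:.3f}")
    print(f"mean of IW estimates    : {est.mean():.3f}")
    print(f"median of IW estimates  : {np.median(est):.3f}")
    print(f"share overestimating    : {np.mean(est > true_risk):.2f}")
    print(f"skewness of estimates   : {sample_skewness(est):.2f}")
```

The mean of the estimates should sit close to the true target risk (unbiasedness), while the median, the share of data sets that overestimate, and the skewness describe the body and tail of the sampling distribution. Which side the body falls on depends on where the weights are large: in this toy the weights grow with |x|, which typically gives a right-skewed distribution, whereas the paper reports overestimates in the body and large underestimates in the tail for its sample selection bias settings.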

Original language	English
Title of host publication	2018 24th International Conference on Pattern Recognition (ICPR)
Publisher	IEEE
Pages	1468-1473
Number of pages	6
ISBN (Electronic)	978-1-5386-3788-3
ISBN (Print)	978-1-5386-3789-0
DOIs	https://doi.org/10.1109/ICPR.2018.8546186
Publication status	Published - 2018
Event	2018 24th International Conference on Pattern Recognition, ICPR 2018, Beijing, China, 20 Aug 2018 → 24 Aug 2018

Conference

Conference	2018 24th International Conference on Pattern Recognition, ICPR 2018
Country/Territory	China
City	Beijing
Period	20/08/18 → 24/08/18

Cite this

@inproceedings{707b7230157541ad8d1706985e2e2bc9,

title = "Effects of sampling skewness of the importance-weighted risk estimator on model selection",

abstract = "Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for data sets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to sub-optimal regularization parameters when used for importance-weighted validation.",

author = "Wouter Kouw and Marco Loog",

year = "2018",

doi = "10.1109/ICPR.2018.8546186",

language = "English",

isbn = "978-1-5386-3789-0",

pages = "1468--1473",

booktitle = "2018 24th International Conference on Pattern Recognition (ICPR)",

publisher = "IEEE",

address = "United States",

note = "2018 24th International Conference on Pattern Recognition, ICPR 2018; Conference date: 20-08-2018 through 24-08-2018",

}

TY  - GEN
T1  - Effects of sampling skewness of the importance-weighted risk estimator on model selection
AU  - Kouw, Wouter
AU  - Loog, Marco
PY  - 2018
Y1  - 2018
N2  - Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for data sets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to sub-optimal regularization parameters when used for importance-weighted validation.
AB  - Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for data sets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to sub-optimal regularization parameters when used for importance-weighted validation.
U2  - 10.1109/ICPR.2018.8546186
DO  - 10.1109/ICPR.2018.8546186
M3  - Conference contribution
SN  - 978-1-5386-3789-0
SP  - 1468
EP  - 1473
BT  - 2018 24th International Conference on Pattern Recognition (ICPR)
PB  - IEEE
T2  - 2018 24th International Conference on Pattern Recognition, ICPR 2018
Y2  - 20 August 2018 through 24 August 2018
ER  -