Prototype Selection for Finding Efficient Representations of Dissimilarity Data

EM Pekalska; RPW Duin


Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

10 Citations (Scopus)

Abstract

The nearest neighbor (NN) rule is a simple and intuitive method for solving classification problems. In its original form, it uses distances to the complete training set. It performs well; however, it is sensitive to noisy objects because it operates on local neighborhoods only. A more global approach is possible by mapping the distance data onto a pseudo-Euclidean space such that the distances are preserved as well as possible. A classifier built in such a space can then outperform the NN rule. However, all training objects are again needed to project new data into that space. This paper addresses the issue of reducing the training set while preserving, as far as possible, the original structure of the mapped data. Several criteria are introduced and evaluated on two problems: polygon recognition and digit recognition. Our experiments show that the representation mismatch criterion is beneficial for the applications considered. Moreover, the linear classifier built in the pseudo-Euclidean space determined by 20%–25% of the training objects outperforms the NN rule based on all of them.
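The abstract describes two ingredients: the NN rule applied directly to distances, and a mapping of distance data onto a pseudo-Euclidean space in which new objects are projected using only a chosen set of prototypes. A minimal NumPy sketch of that general technique is given below, assuming a symmetric dissimilarity matrix with zero diagonal. It uses the standard classical-scaling construction (double-centring the squared dissimilarities and splitting the eigenvalues by sign). The function names `pe_embed`, `pe_project`, `pe_sqdist` and `nn_rule` are hypothetical, the eigenvalue tolerance is an arbitrary choice, and the prototype set is simply whatever subset is passed in: the paper's own selection criteria, including the representation mismatch criterion, are not reproduced here.

```python
import numpy as np


def pe_embed(D_rr, tol=1e-10):
    """Embed prototypes R, given their symmetric zero-diagonal dissimilarities
    D(R, R), into a pseudo-Euclidean space via classical scaling.

    Returns the embedded prototypes X, the signature s (+1 for positive axes,
    -1 for negative axes) and the auxiliary data needed by pe_project.
    """
    n = D_rr.shape[0]
    D2 = np.asarray(D_rr, dtype=float) ** 2   # squared dissimilarities
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ D2 @ J                     # possibly indefinite "Gram" matrix
    lam, Q = np.linalg.eigh(B)
    # Drop (near-)zero eigenvalues; the relative threshold is an arbitrary choice.
    keep = np.abs(lam) > tol * np.abs(lam).max()
    lam, Q = lam[keep], Q[:, keep]
    order = np.argsort(-lam)                  # positive axes first, negative last
    lam, Q = lam[order], Q[:, order]
    s = np.sign(lam)
    X = Q * np.sqrt(np.abs(lam))              # satisfies X @ diag(s) @ X.T == B
    return X, s, (D2, J, Q, lam)


def pe_project(D_new_r, model):
    """Project new objects using only their dissimilarities D(new, R) to the prototypes."""
    _, s, (D2, J, Q, lam) = model
    D2_new = np.asarray(D_new_r, dtype=float) ** 2
    # Centre the new squared dissimilarities consistently with the training B.
    B_new = -0.5 * (D2_new - D2.mean(axis=0)) @ J
    return (B_new @ Q) / np.sqrt(np.abs(lam)) * s


def pe_sqdist(A, C, s):
    """Squared pseudo-Euclidean distances: positive axes add, negative axes subtract."""
    diff = A[:, None, :] - C[None, :, :]
    return np.einsum("ijk,k->ij", diff ** 2, s)


def nn_rule(D_test_train, y_train):
    """1-NN rule applied directly to a dissimilarity matrix D(test, train)."""
    return np.asarray(y_train)[np.argmin(D_test_train, axis=1)]


# Usage: L1 distances between the corners of a unit square are not Euclidean,
# so the embedding needs one negative axis.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = np.abs(corners[:, None, :] - corners[None, :, :]).sum(axis=2)
X, s, _ = pe_embed(D)
print(s.tolist())                                # [1.0, 1.0, -1.0]
print(np.allclose(pe_sqdist(X, X, s), D ** 2))   # True
```

In this sketch only D(R, R) is eigendecomposed, so the cost of projecting a new object grows with the number of prototypes rather than with the full training set, which is the motivation the abstract gives for reducing the training set. Training a classifier on the projected vectors (for example a linear one, as in the abstract) is left out; which linear classifier the authors used is not stated in this record.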

Original language	Undefined/Unknown
Title of host publication	ICPR16, Proceedings
Editors	R Kasturi, D Laurendeau, C Suen
Place of Publication	Los Alamitos, CA
Publisher	IEEE
Pages	37-40
Number of pages	4
ISBN (Print)	0-7695-1696-3
Publication status	Published - 2002
Event	16th International Conference on Pattern Recognition (Quebec City, Canada), vol. III - Los Alamitos, CA
Duration	11 Aug 2002 → 15 Aug 2002

Publication series

Publisher	IEEE Computer Society Press

Name	International Conference on Pattern Recognition
Volume	3
ISSN (Print)	1051-4651

Conference

Conference	16th International Conference on Pattern Recognition (Quebec City, Canada), vol. III
Period	11/08/02 → 15/08/02

Bibliographical note

ISSN 1051-4651, phpub 45

Keywords

conference contrib. refereed
Conf.proc. > 3 pag

Cite this

@inproceedings{0c21e366472446ed94fdd1d174244372,

title = "Prototype Selection for Finding Efficient Representations of Dissimilarity Data",

abstract = "The nearest neighbor (NN) rule is a simple and intuitive method for solving classification problems. In its original form, it uses distances to the complete training set. It performs well; however, it is sensitive to noisy objects because it operates on local neighborhoods only. A more global approach is possible by mapping the distance data onto a pseudo-Euclidean space such that the distances are preserved as well as possible. A classifier built in such a space can then outperform the NN rule. However, all training objects are again needed to project new data into that space. This paper addresses the issue of reducing the training set while preserving, as far as possible, the original structure of the mapped data. Several criteria are introduced and evaluated on two problems: polygon recognition and digit recognition. Our experiments show that the representation mismatch criterion is beneficial for the applications considered. Moreover, the linear classifier built in the pseudo-Euclidean space determined by 20\%--25\% of the training objects outperforms the NN rule based on all of them.",

keywords = "conference contrib. refereed, Conf.proc. > 3 pag",

author = "EM Pekalska and RPW Duin",

note = "ISSN 1051-4651, phpub 45; 16th International Conference on Pattern Recognition (Quebec City, Canada), vol. III; Conference date: 11-08-2002 through 15-08-2002",

year = "2002",

language = "Undefined/Unknown",

isbn = "0-7695-1696-3",

publisher = "IEEE",

pages = "37--40",

editor = "R Kasturi and D Laurendeau and C Suen",

booktitle = "ICPR16, Proceedings",

address = "Los Alamitos, CA",

}

Prototype Selection for Finding Efficient Representations of Dissimilarity Data. / Pekalska, EM; Duin, RPW.
ICPR16, Proceedings. ed. / R Kasturi; D Laurendeau; C Suen. Los Alamitos, CA: IEEE, 2002. p. 37-40 (International Conference on Pattern Recognition; Vol. 3).


TY  - GEN
T1  - Prototype Selection for Finding Efficient Representations of Dissimilarity Data
AU  - Pekalska, EM
AU  - Duin, RPW
N1  - ISSN 1051-4651, phpub 45
PY  - 2002
Y1  - 2002
N2  - The nearest neighbor (NN) rule is a simple and intuitive method for solving classification problems. In its original form, it uses distances to the complete training set. It performs well; however, it is sensitive to noisy objects because it operates on local neighborhoods only. A more global approach is possible by mapping the distance data onto a pseudo-Euclidean space such that the distances are preserved as well as possible. A classifier built in such a space can then outperform the NN rule. However, all training objects are again needed to project new data into that space. This paper addresses the issue of reducing the training set while preserving, as far as possible, the original structure of the mapped data. Several criteria are introduced and evaluated on two problems: polygon recognition and digit recognition. Our experiments show that the representation mismatch criterion is beneficial for the applications considered. Moreover, the linear classifier built in the pseudo-Euclidean space determined by 20%–25% of the training objects outperforms the NN rule based on all of them.
AB  - The nearest neighbor (NN) rule is a simple and intuitive method for solving classification problems. In its original form, it uses distances to the complete training set. It performs well; however, it is sensitive to noisy objects because it operates on local neighborhoods only. A more global approach is possible by mapping the distance data onto a pseudo-Euclidean space such that the distances are preserved as well as possible. A classifier built in such a space can then outperform the NN rule. However, all training objects are again needed to project new data into that space. This paper addresses the issue of reducing the training set while preserving, as far as possible, the original structure of the mapped data. Several criteria are introduced and evaluated on two problems: polygon recognition and digit recognition. Our experiments show that the representation mismatch criterion is beneficial for the applications considered. Moreover, the linear classifier built in the pseudo-Euclidean space determined by 20%–25% of the training objects outperforms the NN rule based on all of them.
KW  - conference contrib. refereed
KW  - Conf.proc. > 3 pag
UR  - http://www.computer.org/proceedings/icpr/1695/volume3/169530037abs.htm
M3  - Conference contribution
SN  - 0-7695-1696-3
SP  - 37
EP  - 40
BT  - ICPR16, Proceedings
A2  - Kasturi, R
A2  - Laurendeau, D
A2  - Suen, C
PB  - IEEE
CY  - Los Alamitos, CA
T2  - 16th International Conference on Pattern Recognition (Quebec City, Canada), vol. III
Y2  - 11 August 2002 through 15 August 2002
ER  -
