Join Path-Based Data Augmentation for Decision Trees

Andra Ionescu; Rihan Hai; Marios Fragkoulis; Asterios Katsifodimos

doi:10.1109/ICDEW55742.2022.00018

Web Information Systems

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the richness of dataset relationships. With relational data, the challenge lies in identifying join paths that best augment a feature table to increase the performance of a model. In this paper we propose a two-step, automated data augmentation approach for relational data that involves: (i) enumerating join paths of various lengths given a base table and (ii) ranking the join paths using filter methods for feature selection. We show that our approach can improve prediction accuracy and reduce runtime compared to the baseline approach.
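The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the join graph, the table and column names, and the data are hypothetical; absolute Pearson correlation stands in as one example of a filter method; scoring a path by its best single feature is an assumption; and feature columns are assumed to be already aligned row by row with the base table, so no actual join is executed.

```python
from collections import deque
from statistics import mean, pstdev


def enumerate_join_paths(join_graph, base, max_length):
    """Step (i): list every cycle-free path from `base` with 1..max_length joins."""
    paths = []
    queue = deque([[base]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 == max_length:
            continue  # path already uses the maximum number of joins
        for neighbour in join_graph.get(path[-1], []):
            if neighbour not in path:  # never revisit a table
                queue.append(path + [neighbour])
    return paths


def abs_pearson(xs, ys):
    """Filter score: absolute Pearson correlation (0.0 for constant columns)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def rank_join_paths(paths, features, label):
    """Step (ii): rank paths by the best filter score among the features they add."""
    scored = []
    for path in paths:
        scores = [
            abs_pearson(column, label)
            for table in path[1:]  # skip the base table itself
            for column in features.get(table, {}).values()
        ]
        scored.append((path, max(scores, default=0.0)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical join graph: base joins with A and B, and A joins with C.
join_graph = {"base": ["A", "B"], "A": ["C"], "B": [], "C": []}

# Hypothetical feature columns, already aligned with the four base-table rows.
features = {
    "A": {"a1": [1, 2, 3, 4]},
    "B": {"b1": [5, 5, 5, 5]},
    "C": {"c1": [0, 0, 1, 1]},
}
label = [0, 0, 1, 1]

paths = enumerate_join_paths(join_graph, "base", max_length=2)
ranked = rank_join_paths(paths, features, label)
for path, score in ranked:
    print(" -> ".join(path), round(score, 3))
```

Because a filter score is computed from the data alone, the paths can be ranked without training a model for every candidate, which is one plausible reading of the runtime reduction the abstract reports.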

Original language	English
Title of host publication	Proceedings of the 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)
Editors	L. O'Conner
Place of Publication	Piscataway
Publisher	IEEE
Pages	84-88
Number of pages	5
ISBN (Electronic)	978-1-6654-8104-5
ISBN (Print)	978-1-6654-8105-2
DOIs	https://doi.org/10.1109/ICDEW55742.2022.00018
Publication status	Published - 2022
Event	2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW), Kuala Lumpur, Malaysia
Duration	9 May 2022 → 9 May 2022
Conference number	38th

Conference

Conference	2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)
Country/Territory	Malaysia
City	Kuala Lumpur
Period	9/05/22 → 9/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Access to Document

10.1109/ICDEW55742.2022.00018

Join_Path-Based_Data_Augmentation_for_Decision_Trees: Final published version, 375 KB

Cite this

@inproceedings{f3800d88755e46e8927d5e70188fc47d,

title = "Join Path-Based Data Augmentation for Decision Trees",

abstract = "Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the richness of dataset relationships. With relational data, the challenge lies in identifying join paths that best augment a feature table to increase the performance of a model. In this paper we propose a two-step, automated data augmentation approach for relational data that involves: (i) enumerating join paths of various lengths given a base table and (ii) ranking the join paths using filter methods for feature selection. We show that our approach can improve prediction accuracy and reduce runtime compared to the baseline approach.",

author = "Andra Ionescu and Rihan Hai and Marios Fragkoulis and Asterios Katsifodimos",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW) ; Conference date: 09-05-2022 Through 09-05-2022",

year = "2022",

doi = "10.1109/ICDEW55742.2022.00018",

language = "English",

isbn = "978-1-6654-8105-2",

pages = "84--88",

editor = "L. O'Conner",

booktitle = "Proceedings of the 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)",

publisher = "IEEE",

address = "United States",

}

Ionescu, A, Hai, R, Fragkoulis, M & Katsifodimos, A 2022, Join Path-Based Data Augmentation for Decision Trees. in L O'Conner (ed.), Proceedings of the 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW), 9814493, IEEE, Piscataway, pp. 84-88, 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW), Kuala Lumpur, Malaysia, 9/05/22. https://doi.org/10.1109/ICDEW55742.2022.00018

Join Path-Based Data Augmentation for Decision Trees. / Ionescu, Andra; Hai, Rihan; Fragkoulis, Marios et al.
Proceedings of the 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW). ed. / L. O'Conner. Piscataway: IEEE, 2022. p. 84-88, 9814493.

TY - GEN

T1 - Join Path-Based Data Augmentation for Decision Trees

AU - Ionescu, Andra

AU - Hai, Rihan

AU - Fragkoulis, Marios

AU - Katsifodimos, Asterios

N1 - Conference code: 38th

PY - 2022

Y1 - 2022

N2 - Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the richness of dataset relationships. With relational data, the challenge lies in identifying join paths that best augment a feature table to increase the performance of a model. In this paper we propose a two-step, automated data augmentation approach for relational data that involves: (i) enumerating join paths of various lengths given a base table and (ii) ranking the join paths using filter methods for feature selection. We show that our approach can improve prediction accuracy and reduce runtime compared to the baseline approach.

AB - Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the richness of dataset relationships. With relational data, the challenge lies in identifying join paths that best augment a feature table to increase the performance of a model. In this paper we propose a two-step, automated data augmentation approach for relational data that involves: (i) enumerating join paths of various lengths given a base table and (ii) ranking the join paths using filter methods for feature selection. We show that our approach can improve prediction accuracy and reduce runtime compared to the baseline approach.

UR - http://www.scopus.com/inward/record.url?scp=85134879140&partnerID=8YFLogxK

U2 - 10.1109/ICDEW55742.2022.00018

DO - 10.1109/ICDEW55742.2022.00018

M3 - Conference contribution

SN - 978-1-6654-8105-2

SP - 84

EP - 88

BT - Proceedings of the 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)

A2 - O'Conner, L.

PB - IEEE

CY - Piscataway

T2 - 2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)

Y2 - 9 May 2022 through 9 May 2022

ER -
