TY - GEN
T1 - Amalur
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
AU - Hai, Rihan
AU - Koutras, Christos
AU - Ionescu, Andra
AU - Li, Ziyu
AU - Sun, Wenbo
AU - van Schijndel, Jessie
AU - Kang, Yan
AU - Katsifodimos, Asterios
N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY - 2023
Y1 - 2023
AB - Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the premises of data silos, hence model training should proceed in a decentralized manner. In this work, we present a vision of how to bridge the traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. Towards this direction, we analyze two common use cases over data silos, feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight new research opportunities from the aspects of systems, representations, factorized learning and federated learning.
UR - http://www.scopus.com/inward/record.url?scp=85167664964&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00301
DO - 10.1109/ICDE55515.2023.00301
M3 - Conference contribution
AN - SCOPUS:85167664964
SN - 979-8-3503-2228-6
T3 - Proceedings - International Conference on Data Engineering
SP - 3729
EP - 3739
BT - Proceedings of the 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
PB - IEEE
CY - Piscataway
Y2 - 3 April 2023 through 7 April 2023
ER -