Human-in-the-Loop Feature Discovery for Tabular Data

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

In recent years, researchers have developed several methods to automate discovering datasets and augmenting features for training Machine Learning (ML) models. Together with feature selection, these efforts have paved the way towards what is termed the feature discovery process. Data scientists and engineers use automated feature discovery over tabular datasets to add new features from different sources and enrich training data. By surveying data practitioners, we have observed that automated feature discovery approaches do not allow data scientists to use their domain knowledge during the feature discovery process. In addition, automated feature discovery methods can leak private features or introduce biased ones.

In this paper, we introduce HILAutoFeat, the first user-driven human-in-the-loop feature discovery method. We demonstrate how HILAutoFeat combines automated feature discovery with user-driven insights. Our demonstration is centred around two scenarios: (i) an automated feature discovery scenario, in which HILAutoFeat acts as a steward in a large data lake where the user is unaware of the quality and relevance of the data, and (ii) a collaborative scenario, in which the user drives the feature discovery process by contributing domain and business knowledge while HILAutoFeat performs the intensive computations.
Original language: English
Title of host publication: CIKM '24
Subtitle of host publication: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Place of Publication: New York, NY
Publisher: ACM
Pages: 5215-5219
Number of pages: 5
ISBN (Electronic): 979-8-4007-0436-9
DOIs
Publication status: Published - 2024
Event: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 2024 to 25 Oct 2024

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/Territory: United States
City: Boise
Period: 21/10/24 to 25/10/24

Keywords

  • AutoML
  • data science
  • feature discovery
  • human-in-the-loop
  • tabular data
