Exploring the Feasibility of Crowd-Powered Decomposition of Complex User Questions in Text-to-SQL Tasks

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

3 Citations (Scopus)
78 Downloads (Pure)

Abstract

Natural Language Interfaces to Databases (NLIDBs), also known as Text-to-SQL models, enable users with different levels of knowledge of Structured Query Language (SQL) to access relational databases without any programming effort. By translating natural language into SQL queries, NLIDBs not only relieve users of the burden of memorizing database schemas and writing complex SQL queries, but also allow non-experts to retrieve information from databases in natural language. However, existing NLIDBs largely fail to translate natural-language questions into SQL when those questions are complex, which prevents them from being deployed in real-world scenarios and from generalizing to unseen complex databases. In this paper, we explored the feasibility of decomposing complex user questions into multiple sub-questions, each of reduced complexity, as a means to circumvent the problem of complex SQL generation. Specifically, we investigated whether complex user questions can be decomposed so that each sub-question is simple enough for existing NLIDBs to generate correct SQL queries, using non-expert crowd workers in juxtaposition with SQL experts. Through an empirical study on an NLIDB benchmark dataset, we found that crowd-powered decomposition of complex user questions raised the accuracy of an existing Text-to-SQL pipeline from 30% to 59% (a 96% relative accuracy boost). Decomposition by SQL experts raised the accuracy further, to 76% (a 153% relative boost). Our findings suggest that crowd-powered decomposition can be a scalable way to produce the training data needed to build machine learning models that automatically decompose complex user questions, thereby improving Text-to-SQL pipelines.
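The relative accuracy boosts quoted in the abstract correspond to the accuracy gain divided by the baseline accuracy. The short Python sketch below reproduces that arithmetic from the rounded percentages stated in the abstract. The helper name `relative_boost` is hypothetical and not taken from the paper. Because the inputs are rounded, the output can differ slightly from the reported figures.

```python
def relative_boost(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent.

    Both arguments are accuracies expressed in percent (e.g. 30.0 for 30%).
    """
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (improved - baseline) / baseline * 100.0


# Accuracies (in %) as rounded in the abstract.
baseline = 30.0  # existing Text-to-SQL pipeline, no decomposition
crowd = 59.0     # with crowd-powered decomposition
expert = 76.0    # with decomposition by SQL experts

print(f"crowd:  {relative_boost(baseline, crowd):.1f}%")
print(f"expert: {relative_boost(baseline, expert):.1f}%")
```

With these rounded inputs the sketch yields roughly 96.7% for crowd decomposition and 153.3% for expert decomposition, in line with the 96% and 153% reported above.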

Original language: English
Title of host publication: HT 2022
Subtitle of host publication: 33rd ACM Conference on Hypertext and Social Media - Co-located with ACM WebSci 2022 and ACM UMAP 2022
Publisher: Association for Computing Machinery (ACM)
Pages: 154-165
Number of pages: 12
ISBN (Electronic): 978-1-4503-9233-4
Publication status: Published, 2022
Event: 33rd ACM Conference on Hypertext and Social Media, HT 2022 - Co-located with ACM WebSci 2022 and ACM UMAP 2022 - Virtual, Online, Spain
Duration: 28 Jun 2022 to 1 Jul 2022

Publication series

Name: HT 2022: 33rd ACM Conference on Hypertext and Social Media - Co-located with ACM WebSci 2022 and ACM UMAP 2022

Conference

Conference: 33rd ACM Conference on Hypertext and Social Media, HT 2022 - Co-located with ACM WebSci 2022 and ACM UMAP 2022
Country/Territory: Spain
City: Virtual, Online
Period: 28/06/22 to 1/07/22

Keywords

  • Corpus Annotation
  • Crowdsourcing
  • Human Computation
  • Natural Language Interface to Databases
  • Semantic Parsing
  • Text-to-SQL
