Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training

Peide Zhu, Claudia Hauff

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

1 Citation (Scopus)
64 Downloads (Pure)

Abstract

Question generation (QG) approaches based on large neural models require (i) large-scale and (ii) high-quality training data. These two requirements pose difficulties for specific application domains where training data is expensive and difficult to obtain. The trained QG models' effectiveness can degrade significantly when they are applied on a different domain due to domain shift. In this paper, we explore an unsupervised domain adaptation approach to combat the lack of training data and domain shift issue with domain data selection and self-training. We first present a novel answer-aware strategy for domain data selection to select data with the most similarity to a new domain. The selected data are then used as pseudo in-domain data to retrain the QG model. We then present generation confidenceguided self-training with two generation confidence modeling methods: (i) generated questions' perplexity and (ii) the fluency score. We test our approaches on three large public datasets with different domain similarities, using a transformer-based pre-trained QG model. The results show that our proposed approaches outperform the baselines, and show the viability of unsupervised domain adaptation with answer-aware data selection and self-training on the QG task. The code is available at https://github.com/zpeide/transfer_qg.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics: NAACL 2022
Subtitle of host publicationNAACL 2022 - Findings
PublisherAssociation for Computational Linguistics (ACL)
Pages2388-2401
Number of pages14
ISBN (Electronic)9781955917766
Publication statusPublished - 2022
Event2022 Findings of the Association for Computational Linguistics: NAACL 2022 - Seattle, United States
Duration: 10 Jul 202215 Jul 2022

Publication series

NameFindings of the Association for Computational Linguistics: NAACL 2022 - Findings

Conference

Conference2022 Findings of the Association for Computational Linguistics: NAACL 2022
Country/TerritoryUnited States
CitySeattle
Period10/07/2215/07/22

Fingerprint

Dive into the research topics of 'Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training'. Together they form a unique fingerprint.

Cite this