Radial Graph Convolutional Network for Visual Question Generation

Xing Xu, Tan Wang, Yang Yang, Alan Hanjalic, Heng Tao Shen

Research output: Contribution to journal › Article › Scientific › peer-review

2 Citations (Scopus)

Abstract

In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. Existing approaches typically treat the VQG task as a reverse visual question answering (VQA) task, requiring exhaustive matching between all the image regions and the given answer. To reduce this complexity, we propose an innovative answer-centric approach termed radial graph convolutional network (Radial-GCN) that focuses only on the relevant image regions. Our Radial-GCN method can quickly find the core answer area in an image by matching the latent answer with the semantic labels learned from all image regions. A novel sparse graph with a radial structure is then naturally built to capture the associations between the core node (i.e., the answer area) and peripheral nodes (i.e., other areas); graph attention is subsequently adopted to steer the convolutional propagation toward potentially more relevant nodes for final question generation. Extensive experiments on three benchmark data sets show the superiority of our approach over the reference methods. Even in the challenging, previously unexplored zero-shot VQA task, the questions synthesized by our method remarkably boost the performance of several state-of-the-art VQA methods from 0% to over 40%. The implementation code of our proposed method and the successfully generated questions are available at https://github.com/Wangt-CN/VQG-GCN.
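The radial structure described above can be illustrated with a minimal sketch: a star graph whose core node (the answer-matched region) aggregates messages from peripheral nodes (other regions) via attention-weighted propagation. This is an illustrative toy in NumPy, not the authors' implementation; the function name `radial_gcn_step` and the parameters `W` and `a` are hypothetical placeholders for the learned weights described in the paper.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def radial_gcn_step(core_feat, peri_feats, W, a):
    """One attention-weighted propagation step over a radial (star) graph.

    core_feat:  (d,) feature of the core node (the answer-matched region)
    peri_feats: (n, d) features of the peripheral nodes (other regions)
    W:          (d, d) shared projection matrix (illustrative)
    a:          (2d,) attention vector scoring each core-peripheral edge
    Returns the updated core-node feature of shape (d,).
    """
    h_core = W @ core_feat               # project the core node
    h_peri = peri_feats @ W.T            # project peripheral nodes, (n, d)
    # attention logit for each core-peripheral edge
    logits = np.array([a @ np.concatenate([h_core, h]) for h in h_peri])
    alpha = softmax(logits)              # normalized edge weights, (n,)
    # aggregate attention-weighted peripheral messages into the core node
    return np.tanh(h_core + alpha @ h_peri)

rng = np.random.default_rng(0)
d, n = 8, 5                              # feature dim, number of peripheral nodes
core = rng.normal(size=d)
peri = rng.normal(size=(n, d))
W = rng.normal(size=(d, d)) * 0.1
a = rng.normal(size=2 * d)
updated = radial_gcn_step(core, peri, W, a)
print(updated.shape)
```

In the actual model the core node is selected by matching the answer against learned region labels, and the updated representation feeds a question decoder; the sketch only shows the sparse, core-centered propagation that distinguishes the radial graph from a fully connected region graph.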

Original language: English
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOIs
Publication status: E-pub ahead of print - 2020

Keywords

  • Cross-media understanding
  • graph convolutional network (GCN)
  • visual question generation (VQG)

