Con-Text: Text Detection for Fine-Grained Object Classification

Sezer Karaoğlu, Ran Tao, Jan van Gemert, Theo Gevers

Research output: Contribution to journal › Article › Scientific › peer-review

26 Citations (Scopus)


This paper focuses on fine-grained object classification using scene text recognized in natural images. While the state of the art relies on visual cues only, this paper is the first to propose combining textual and visual cues. Another novelty is the textual cue extraction: unlike state-of-the-art text detection methods, we focus on the background rather than on the text regions. Once text regions are detected, they are processed by two methods to perform text recognition, namely the ABBYY commercial OCR engine and a state-of-the-art character recognition algorithm. Then, to encode the textual cues, bigrams and trigrams are formed between the recognized characters subject to the proposed spatial pairwise constraints. Finally, the extracted visual and textual cues are combined for fine-grained classification. The proposed method is validated on four publicly available data sets: ICDAR03, ICDAR13, Con-Text, and Flickr-logo. We improve the state of the art in end-to-end character recognition by a large margin of 15% on ICDAR03. We show that textual cues are useful in addition to visual cues for fine-grained classification, and that they are also useful for logo retrieval. Combining textual and visual cues outperforms both visual-only and textual-only approaches in fine-grained classification (70.7% versus 60.3%) and logo retrieval (57.4% versus 54.8%).
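The textual-cue encoding and fusion steps described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the `Char` box format, the thresholds in `is_pair` (horizontal gap at most half the character height, height ratio at most 1.5, vertical centers aligned), the L1-normalized n-gram histogram, and the weighted late fusion of per-class scores in `late_fusion` are all assumptions chosen for clarity. The paper's actual spatial pairwise constraints and fusion scheme may differ.

```python
"""Hypothetical sketch: character bi-/trigrams under spatial pairwise
constraints, plus weighted late fusion with visual scores.
All names, thresholds, and the fusion rule are illustrative assumptions."""
from collections import Counter
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Char:
    """A recognized character and its box; (x, y) is the top-left corner."""
    label: str
    x: float
    y: float
    w: float
    h: float


def is_pair(a: Char, b: Char, max_gap: float = 0.5, max_height_ratio: float = 1.5) -> bool:
    """Assumed spatial constraint: b directly follows a on the same text line."""
    if b.x <= a.x:  # b must start to the right of a
        return False
    h_max, h_min = max(a.h, b.h), min(a.h, b.h)
    gap = b.x - (a.x + a.w)  # horizontal gap between the two boxes
    same_size = h_max / h_min <= max_height_ratio
    close = gap <= max_gap * h_max
    aligned = abs((a.y + a.h / 2) - (b.y + b.h / 2)) <= 0.5 * h_max
    return same_size and close and aligned


def char_ngrams(chars: Sequence[Char]) -> Counter:
    """Count bigrams (valid pairs) and trigrams (chains of two valid pairs)."""
    counts: Counter = Counter()
    for a, b in permutations(chars, 2):
        if is_pair(a, b):
            counts[a.label + b.label] += 1
            for c in chars:
                if is_pair(b, c):  # is_pair forces c to lie right of b
                    counts[a.label + b.label + c.label] += 1
    return counts


def encode(counts: Counter, vocabulary: List[str]) -> List[float]:
    """L1-normalized histogram over a fixed n-gram vocabulary."""
    vec = [float(counts.get(g, 0)) for g in vocabulary]
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else vec


def late_fusion(visual: Dict[str, float], textual: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum of per-class scores; alpha weights the visual cue."""
    classes = visual.keys() | textual.keys()
    return {k: alpha * visual.get(k, 0.0) + (1 - alpha) * textual.get(k, 0.0)
            for k in classes}


if __name__ == "__main__":
    word = [Char("C", 0, 0, 10, 14), Char("A", 12, 0, 10, 14), Char("T", 24, 0, 10, 14)]
    grams = char_ngrams(word)
    print(sorted(grams))  # ['AT', 'CA', 'CAT']
    scores = late_fusion({"cafe": 0.2, "bakery": 0.6}, {"cafe": 0.9, "bakery": 0.1})
    print(max(scores, key=scores.get))  # cafe
```

Because trigrams are built only by chaining valid pairs, a character that fails the spatial test (for example, one on a different text line, or too far from its neighbour) never contributes to an n-gram, which limits spurious combinations from unrelated text in the scene.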

Original language: English
Article number: 7933250
Pages (from-to): 3965-3980
Number of pages: 16
Journal: IEEE Transactions on Image Processing
Issue number: 8
Publication status: Published - 2017


Keywords:
  • fine-grained classification
  • logo retrieval
  • multimodal fusion
  • text detection
  • text saliency


