Deep Visual City Recognition Visualization

Xiangwei Shi; Seyran Khademi; Jan van Gemert

Pattern Recognition and Bioinformatics

Research output: Conference contribution (Chapter in Book/Conference proceedings/Edited volume); scientific, peer-reviewed

Abstract

Understanding how cities visually differ from each other is of interest to planners, residents, and historians. We investigate the interpretation of deep features learned by convolutional neural networks (CNNs) for city recognition. Given a trained city recognition network, we first generate weighted masks using the Grad-CAM technique to select the most discriminative regions in the image. Since the image classification label is the city name, it carries no information about which objects are class-discriminative; we therefore investigate the interpretability of deep representations with two methods: (i) an unsupervised method clusters the objects appearing in the visual explanations, and (ii) a pretrained semantic segmentation model labels objects at the pixel level, after which we introduce statistical measures to quantitatively evaluate the interpretability of the discriminative objects. We also study how network architecture and random initialization during training influence the interpretability of CNN features for city recognition. The results suggest that network architecture affects the interpretability of the learned visual representations more than different random initializations do.
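The two computations the abstract relies on can be sketched as follows. The first is a Grad-CAM map from one convolutional layer. The second is a per-class summary of that map over a pixel-level segmentation.

This is a minimal NumPy sketch. It assumes the layer activations and the gradients of the city score have already been extracted from the trained network, for example with framework hooks. The helper names `grad_cam`, `upsample_nearest`, and `class_share_in_mask` are hypothetical. The per-class weight share is an illustrative statistic, not necessarily one of the measures defined in the paper. Nearest-neighbour upsampling stands in for the bilinear resizing usually paired with Grad-CAM.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one class score from a single conv layer.

    activations: feature maps A^k of shape (K, H, W).
    gradients:   d y^c / d A^k for the city score y^c, same shape.
    Returns an (H, W) map normalised to [0, 1].
    """
    # Channel weights alpha_k: global average of the gradients over space.
    alphas = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of the feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a map to the image / segmentation grid."""
    rows = np.arange(out_h) * cam.shape[0] // out_h
    cols = np.arange(out_w) * cam.shape[1] // out_w
    return cam[np.ix_(rows, cols)]


def class_share_in_mask(cam_full: np.ndarray, seg_labels: np.ndarray,
                        num_classes: int) -> np.ndarray:
    """Hypothetical statistic: fraction of total Grad-CAM weight on each class.

    seg_labels holds non-negative integer class ids from a semantic
    segmentation model, with the same (H, W) shape as cam_full.
    """
    totals = np.bincount(seg_labels.ravel(), weights=cam_full.ravel(),
                         minlength=num_classes)
    total = totals.sum()
    return totals / total if total > 0 else totals
```

Consider a 2x2 layer in which the only channel with a positive mean gradient fires in the top-left cell. In that case `grad_cam` returns a map that is 1 in that cell and 0 elsewhere. Upsampling it and passing a segmentation map to `class_share_in_mask` then reports which semantic classes receive that discriminative weight.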

Original language	English
Title of host publication	NCCV 2019 – The Netherlands Conference on Computer Vision
Pages	1-6
Number of pages	6
Publication status	Published - 2019
Event	NCCV 2019 – The Netherlands Conference on Computer Vision, Wageningen, Netherlands (16 Dec 2019 → 17 Dec 2019)

Conference

Conference	NCCV 2019 – The Netherlands Conference on Computer Vision
Abbreviated title	NCCV 2019
Country/Territory	Netherlands
City	Wageningen
Period	16/12/19 → 17/12/19

Access to Document

Shi_Deep_Visual_City_Recognition_Visualization_CVPRW_2019_paper (Final published version, 4.82 MB)

Cite this

@inproceedings{3c430c3c1a2c438596818d3f16752697,

title = "Deep Visual City Recognition Visualization",

author = "Xiangwei Shi and Seyran Khademi and {van Gemert}, Jan",

year = "2019",

language = "English",

pages = "1--6",

booktitle = "NCCV 2019 – The Netherlands Conference on Computer Vision",

note = "NCCV 2019 – The Netherlands Conference on Computer Vision, NCCV 2019; Conference date: 16-12-2019 through 17-12-2019",

}

TY - GEN

T1 - Deep Visual City Recognition Visualization

AU - Shi, Xiangwei

AU - Khademi, Seyran

AU - van Gemert, Jan

PY - 2019

Y1 - 2019

M3 - Conference contribution

SP - 1

EP - 6

BT - NCCV 2019 – The Netherlands Conference on Computer Vision

T2 - NCCV 2019 – The Netherlands Conference on Computer Vision

Y2 - 16 December 2019 through 17 December 2019

ER -
