Abstract
In this work, we leverage estimated depth to boost self-supervised contrastive learning for segmentation of urban scenes, where unlabeled videos are readily available for training self-supervised depth estimation. We argue that the semantics of a coherent group of pixels in 3D space are self-contained and invariant to the contexts in which they appear. Given their estimated depth, we group semantically related pixels into coherent depth regions and use copy-paste to synthetically vary their contexts. In this way, cross-context correspondences are built for contrastive learning and a context-invariant representation is learned. For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art baseline by +7.14% mIoU on Cityscapes and +6.65% on KITTI. When fine-tuned for segmentation on Cityscapes and KITTI, our method is competitive with existing models while requiring no ImageNet or COCO pre-training and being more computationally efficient. Our code is available at https://github.com/LeungTsang/CPCDR.
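The core idea can be illustrated with a minimal sketch (not the released implementation): a region derived from an estimated depth map is copy-pasted onto a different image to change its context, and the region's pooled features from the two views are pulled together with an InfoNCE-style contrastive loss. The region-extraction heuristic, the tiny encoder, the placeholder negatives, and all hyper-parameters below are assumptions for illustration only.

```python
# Illustrative sketch only (not the authors' released code): depth-derived
# copy-paste to build a cross-context positive pair, plus a region-level
# InfoNCE loss. Region extraction, encoder, and negatives are placeholders.
import torch
import torch.nn.functional as F

def depth_region_mask(depth, threshold=0.05):
    """Toy stand-in for coherent depth region extraction: marks pixels whose
    depth is close to the depth at the image centre. A real implementation
    would group pixels by depth continuity instead."""
    h, w = depth.shape
    ref = depth[h // 2, w // 2]
    return (torch.abs(depth - ref) < threshold).float()

def copy_paste(src_img, mask, dst_img):
    """Paste the masked region of src_img onto dst_img (same spatial size)."""
    m = mask.unsqueeze(0)                     # (1, H, W), broadcasts over channels
    return m * src_img + (1.0 - m) * dst_img

def region_embedding(encoder, img, mask):
    """Average-pool encoder features over the (resized) region mask."""
    feat = encoder(img.unsqueeze(0))          # (1, C, h, w)
    m = F.interpolate(mask[None, None], size=feat.shape[-2:], mode="nearest")
    pooled = (feat * m).sum(dim=(2, 3)) / m.sum().clamp(min=1.0)
    return F.normalize(pooled, dim=1)         # (1, C)

def info_nce(anchor, positive, negatives, tau=0.1):
    """Standard InfoNCE: the anchor should match the positive (index 0)
    against the negatives."""
    logits = torch.cat([positive, negatives], dim=0) @ anchor.t() / tau   # (1+N, 1)
    labels = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.t(), labels)

# --- toy usage with random data and a tiny conv encoder ---
encoder = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
img_a, img_b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
depth_a = torch.rand(64, 64)

mask = depth_region_mask(depth_a)
pasted = copy_paste(img_a, mask, img_b)       # same region, new context

z_orig = region_embedding(encoder, img_a, mask)
z_ctx = region_embedding(encoder, pasted, mask)
negs = F.normalize(torch.randn(8, 16), dim=1)  # placeholder negatives

loss = info_nce(z_orig, z_ctx, negs)
loss.backward()
```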
Original language | English |
---|---|
Title of host publication | 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022
Publisher | BMVA Press |
Number of pages | 18 |
Publication status | Published - 2022 |
Event | 33rd British Machine Vision Conference 2022 - London, United Kingdom. Duration: 21 Nov 2022 → 24 Nov 2022. Conference number: 33
Conference
Conference | 33rd British Machine Vision Conference 2022 |
---|---|
Abbreviated title | BMVC 2022 |
Country/Territory | United Kingdom |
City | London |
Period | 21/11/22 → 24/11/22 |