Cross-View Matching for Vehicle Localization by Learning Geographically Local Representations

Zimin Xia; Olaf Booij; Marco Manfredi; Julian Kooij

doi:10.1109/LRA.2021.3088076

Intelligent Vehicles

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Cross-view matching aims to learn a shared image representation between ground-level images and satellite or aerial images at the same locations. In robotic vehicles, matching a camera image to a database of geo-referenced aerial imagery can serve as a method for self-localization. However, existing work on cross-view matching only aims at global localization, and overlooks the easily accessible rough location estimates from GNSS or temporal filtering. We argue that the availability of coarse location estimates at test time should already be considered during training. We adopt a simple but effective adaptation to the common triplet loss, resulting in an image representation that is more discriminative within the geographically local neighborhood, without any modifications to a baseline deep neural network. Experiments on the CVACT dataset confirm that the improvements generalize across spatial regions. On a new benchmark constructed from the Oxford RobotCar dataset, we also show generalization across recording days within the same region. Finally, we validate that improvements on these image-retrieval benchmarks also translate to a real-world localization task. Using a particle filter to fuse the cross-view matching scores of a vehicle's camera stream with real GPS measurements, our learned geographically local representation reduces the mean localization error by 17% compared to the standard global representation learned by the current state-of-the-art.
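
The abstract does not spell out how the triplet loss is adapted, so the following is only an illustrative sketch of one plausible reading: negatives are restricted to samples whose locations lie within a fixed radius of the anchor, so the embedding is trained to separate geographically nearby places. The function name `local_triplet_loss`, the hinge form of the loss, and the parameters `radius_m` and `margin` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def local_triplet_loss(ground, aerial, positions, radius_m=100.0, margin=0.2):
    """Mean hinge triplet loss using only geographically nearby negatives.

    ground, aerial: (N, D) embeddings; row i of each depicts the same location.
    positions: (N, 2) planar coordinates in metres, one per ground/aerial pair.
    radius_m, margin: hypothetical hyperparameters chosen for illustration.
    """
    ground = np.asarray(ground, dtype=float)
    aerial = np.asarray(aerial, dtype=float)
    positions = np.asarray(positions, dtype=float)

    # Squared Euclidean distance between every ground and aerial embedding.
    diff = ground[:, None, :] - aerial[None, :, :]
    emb_dist = np.sum(diff ** 2, axis=-1)            # shape (N, N)
    pos_dist = np.diag(emb_dist)[:, None]            # matching pairs, shape (N, 1)

    # Geographic distance between the locations of every pair of samples.
    geo = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

    # Valid negatives: a different sample whose location is within the radius.
    mask = (geo <= radius_m) & ~np.eye(len(ground), dtype=bool)
    if not mask.any():
        return 0.0

    # Standard hinge triplet term, averaged over the local negatives only.
    hinge = np.maximum(0.0, margin + pos_dist - emb_dist)
    return float(hinge[mask].mean())
```

Setting `radius_m` larger than the extent of the dataset makes every other sample a valid negative, which recovers an ordinary (global) batch triplet loss under the same assumptions.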

Original language	English
Pages (from-to)	5921-5928
Journal	IEEE Robotics and Automation Letters
Volume	6
Issue number	3
DOIs	https://doi.org/10.1109/LRA.2021.3088076
Publication status	Published - 2021

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Benchmark testing
Global navigation satellite system
Intelligent Transportation Systems
Localization
Location awareness
Representation Learning
Satellites
Sensors
Task analysis
Training

Cite this

@article{5c854dd6f69a4302ab84aac2ac3fd8d9,

title = "Cross-View Matching for Vehicle Localization by Learning Geographically Local Representations",

abstract = "Cross-view matching aims to learn a shared image representation between ground-level images and satellite or aerial images at the same locations. In robotic vehicles, matching a camera image to a database of geo-referenced aerial imagery can serve as a method for self-localization. However, existing work on cross-view matching only aims at global localization, and overlooks the easily accessible rough location estimates from GNSS or temporal filtering. We argue that the availability of coarse location estimates at test time should already be considered during training. We adopt a simple but effective adaptation to the common triplet loss, resulting in an image representation that is more discriminative within the geographically local neighborhood, without any modifications to a baseline deep neural network. Experiments on the CVACT dataset confirm that the improvements generalize across spatial regions. On a new benchmark constructed from the Oxford RobotCar dataset, we also show generalization across recording days within the same region. Finally, we validate that improvements on these image-retrieval benchmarks also translate to a real-world localization task. Using a particle filter to fuse the cross-view matching scores of a vehicle's camera stream with real GPS measurements, our learned geographically local representation reduces the mean localization error by 17\% compared to the standard global representation learned by the current state-of-the-art.",

keywords = "Benchmark testing, Global navigation satellite system, Intelligent Transportation Systems, Localization, Location awareness, Representation Learning, Satellites, Sensors, Task analysis, Training",

author = "Zimin Xia and Olaf Booij and Marco Manfredi and Julian Kooij",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2021",

doi = "10.1109/LRA.2021.3088076",

language = "English",

volume = "6",

pages = "5921--5928",

journal = "IEEE Robotics and Automation Letters",

issn = "2377-3766",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "3",

}

TY - JOUR

T1 - Cross-View Matching for Vehicle Localization by Learning Geographically Local Representations

AU - Xia, Zimin

AU - Booij, Olaf

AU - Manfredi, Marco

AU - Kooij, Julian

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2021

Y1 - 2021

AB - Cross-view matching aims to learn a shared image representation between ground-level images and satellite or aerial images at the same locations. In robotic vehicles, matching a camera image to a database of geo-referenced aerial imagery can serve as a method for self-localization. However, existing work on cross-view matching only aims at global localization, and overlooks the easily accessible rough location estimates from GNSS or temporal filtering. We argue that the availability of coarse location estimates at test time should already be considered during training. We adopt a simple but effective adaptation to the common triplet loss, resulting in an image representation that is more discriminative within the geographically local neighborhood, without any modifications to a baseline deep neural network. Experiments on the CVACT dataset confirm that the improvements generalize across spatial regions. On a new benchmark constructed from the Oxford RobotCar dataset, we also show generalization across recording days within the same region. Finally, we validate that improvements on these image-retrieval benchmarks also translate to a real-world localization task. Using a particle filter to fuse the cross-view matching scores of a vehicle's camera stream with real GPS measurements, our learned geographically local representation reduces the mean localization error by 17\% compared to the standard global representation learned by the current state-of-the-art.

KW - Benchmark testing

KW - Global navigation satellite system

KW - Intelligent Transportation Systems

KW - Localization

KW - Location awareness

KW - Representation Learning

KW - Satellites

KW - Sensors

KW - Task analysis

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85111034542&partnerID=8YFLogxK

U2 - 10.1109/LRA.2021.3088076

DO - 10.1109/LRA.2021.3088076

M3 - Article

AN - SCOPUS:85111034542

SN - 2377-3766

VL - 6

SP - 5921

EP - 5928

JO - IEEE Robotics and Automation Letters

JF - IEEE Robotics and Automation Letters

IS - 3

ER -
