DDL-MVS: Depth Discontinuity Learning for Multi-View Stereo Networks

N. Ibrahimli*, H. Ledoux, J.F.P. Kooij, L. Nan

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We propose an enhancement module called depth discontinuity learning (DDL) for learning-based multi-view stereo (MVS) methods. Traditional methods are known for their accuracy but struggle with completeness. While recent learning-based methods have improved completeness at the cost of accuracy, our DDL approach aims to improve accuracy while retaining completeness in the reconstruction process. To achieve this, we introduce the joint estimation of depth and boundary maps, where the boundary maps are explicitly utilized for further refinement of the depth maps. We validate our idea by integrating it into an existing learning-based MVS pipeline where the reconstruction depends on high-quality depth map estimation. Extensive experiments on various datasets, namely DTU, ETH3D, “Tanks and Temples”, and BlendedMVS, show that our method improves reconstruction quality compared to our baseline, PatchmatchNet. Our ablation study demonstrates that incorporating the proposed DDL significantly reduces the depth map error, for instance, by more than 30% on the DTU dataset, and leads to improved depth map quality in both smooth and boundary regions. Additionally, our qualitative analysis shows that the reconstructed point cloud exhibits enhanced quality without any significant compromise on completeness. Finally, the experiments reveal that our proposed model and strategies generalize well across the various datasets.
Original language: English
Article number: 2970
Number of pages: 18
Journal: Remote Sensing
Volume: 15
Issue number: 12
DOIs
Publication status: Published - 2023

Keywords

  • multi-view stereo
  • 3D reconstruction
  • depth map refinement
  • depth boundary estimation
