Conference Presentation (After Call) FZJ-2025-00071

Multi-Source Auxiliary Tasks supported Monocular Depth Estimation


2024

Helmholtz AI Conference, Düsseldorf, Germany, 12 Jun 2024 - 14 Jun 2024

Abstract: Monocular depth estimation (MDE) is a challenging task in computer vision, often hindered by the cost and scarcity of high-quality labeled datasets. We tackle this challenge by using auxiliary datasets from related vision tasks to jointly train a shared decoder on top of a pre-trained vision foundation model, while giving a higher weight to MDE. In particular, we use a frozen DINOv2 ViT Giant model as a feature extractor, which removes the need for fine-tuning, and jointly train a shared DPT decoder on auxiliary datasets from related tasks to improve MDE. We illustrate the qualitative and quantitative improvements of our method over the DINOv2 MDE baseline in Figures 1 and 2, respectively. Notably, in contrast to the recent Depth Anything, which reports no improvements when using a jointly fine-tuned DINOv2 ViT Large with task-specific decoders, our method successfully leverages auxiliary tasks. Through extensive experiments we demonstrate that incorporating various auxiliary datasets and tasks improves MDE quality by ~11% on average for related datasets. Our experimental analysis shows that auxiliary tasks differ in their impact, confirming the importance of task selection and highlighting that quality gains are not achieved by merely adding data. Remarkably, our study reveals that treating semantic segmentation datasets as multi-label dense classification often yields additional quality gains.
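
The two ideas in the abstract that lend themselves to a small illustration are the higher loss weight for MDE during joint training and the treatment of semantic segmentation as multi-label dense classification. The following is a minimal sketch of both, not the authors' implementation: the function names, the task names, and the weight values are hypothetical, since the abstract states only that MDE is weighted higher than the auxiliary tasks.

```python
import math

# Hypothetical weights: the abstract says only that MDE gets a higher weight.
TASK_WEIGHTS = {"depth": 1.0, "segmentation": 0.5}


def weighted_joint_loss(task_losses, weights=TASK_WEIGHTS):
    """Combine per-task scalar losses into one training loss."""
    return sum(weights[task] * loss for task, loss in task_losses.items())


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over all pixels and classes.

    logits and targets are nested lists shaped [pixel][class]. Each target
    is 0 or 1, so one pixel may carry several labels at once (multi-hot),
    unlike single-label softmax segmentation.
    """
    total, count = 0.0, 0
    for pixel_logits, pixel_targets in zip(logits, targets):
        for z, y in zip(pixel_logits, pixel_targets):
            # Numerically stable BCE on logits:
            # max(z, 0) - z*y + log(1 + exp(-|z|))
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


# Usage: one pixel, two classes, untrained (zero) logits.
seg_loss = multilabel_bce([[0.0, 0.0]], [[1, 0]])
total = weighted_joint_loss({"depth": 2.0, "segmentation": seg_loss})
```

In an actual training setup the per-task losses would be computed on the outputs of the shared decoder, with the backbone kept frozen; that part is omitted here because the abstract gives no further detail.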


Note: The original abstract contains figures that cannot be shown here.

Contributing Institute(s):
  1. Datenanalyse und Maschinenlernen (IAS-8)
  2. Computational and Systems Neuroscience (IAS-6)
  3. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  2. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  3. SLNS - SimLab Neuroscience (Helmholtz-SLNS)

Appears in the scientific report 2024

The record appears in these collections:
Document types > Presentations > Conference Presentations
Institute Collections > IAS > IAS-6
Institute Collections > IAS > IAS-8
Workflow collections > Public records
Institute Collections > JSC
Publications database

 Record created 2025-01-06, last modified 2025-02-03


