TY - CONF
AU - Quercia, Alessio
AU - Yildiz, Erenus
AU - Cao, Zhuo
AU - Morrison, Abigail
AU - Krajsek, Kai
AU - Assent, Ira
AU - Scharr, Hanno
TI - Multi-Source Auxiliary Tasks supported Monocular Depth Estimation
M1 - FZJ-2025-00071
PY - 2024
N1 - The original abstract contains figures that cannot be shown here.
AB - Monocular depth estimation (MDE) is a challenging task in computer vision, often hindered by the cost and scarcity of high-quality labeled datasets. We tackle this challenge using auxiliary datasets from related vision tasks for joint training of a shared decoder on top of a pre-trained vision foundation model, while giving a higher weight to MDE. In particular, we leverage a frozen DINOv2 ViT Giant model as a feature extractor, bypassing the need for fine-tuning, and jointly train a shared DPT decoder with auxiliary datasets from related tasks to improve MDE. We illustrate the qualitative and quantitative improvements of our method over the DINOv2 MDE baseline in Figures 1 and 2, respectively. Notably, compared to the recent Depth Anything, which reports no improvements using a jointly fine-tuned DINOv2 ViT Large and task-specific decoders, our method successfully leverages auxiliary tasks. Through extensive experiments, we demonstrate the benefits of incorporating various auxiliary datasets and tasks, improving MDE quality on average by ~11% for related datasets. Our experimental analysis shows that auxiliary tasks have different impacts, confirming the importance of task selection and highlighting that quality gains are not achieved by merely adding data. Remarkably, our study reveals that using semantic segmentation datasets as multi-label dense classification often results in additional quality gains.
T2 - Helmholtz AI Conference
CY - 12 Jun 2024 - 14 Jun 2024, Düsseldorf (Germany)
Y2 - 12 Jun 2024 - 14 Jun 2024
M2 - Düsseldorf, Germany
LB - PUB:(DE-HGF)6
UR - https://juser.fz-juelich.de/record/1034963
ER -