Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks

Quercia, Alessio; Morrison, Abigail; Cao, Zhuo; Yildiz, Erenus; Assent, Ira; Scharr, Hanno; Krajsek, Kai
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Quercia:1033882,
      author       = {Quercia, Alessio and Yildiz, Erenus and Cao, Zhuo and
                      Krajsek, Kai and Morrison, Abigail and Assent, Ira and
                      Scharr, Hanno},
      title        = {{E}nhancing {M}onocular {D}epth {E}stimation with
                      {M}ulti-{S}ource {A}uxiliary {T}asks},
      reportid     = {FZJ-2024-06720},
      pages        = {8},
      year         = {2025},
      abstract     = {Monocular depth estimation (MDE) is a challenging task in
                      computer vision, often hindered by the cost and scarcity of
                      high-quality labeled datasets. We tackle this challenge
                      using auxiliary datasets from related vision tasks for an
                      alternating training scheme with a shared decoder built on
                      top of a pre-trained vision foundation model, while giving a
                      higher weight to MDE. Through extensive experiments we
                      demonstrate the benefits of incorporating various in-domain
                      auxiliary datasets and tasks to improve MDE quality on
                      average by $\sim$11\%. Our experimental analysis shows that
                      auxiliary tasks have different impacts, confirming the
                      importance of task selection, highlighting that quality
                      gains are not achieved by merely adding data. Remarkably,
                      our study reveals that using semantic segmentation datasets
                      as Multi-Label Dense Classification (MLDC) often results in
                      additional quality gains. Lastly, our method significantly
                      improves the data efficiency for the considered MDE
                      datasets, enhancing their quality while reducing their size
                      by at least 80\%. This paves the way for using auxiliary
                      data from related tasks to improve MDE quality despite
                      limited availability of high-quality labeled data. Code is
                      available at https://jugit.fz-juelich.de/ias-8/mdeaux.},
      month        = {Feb},
      date         = {2025-02-28},
      organization = {IEEE/CVF Winter Conference on Applications of Computer
                      Vision, Tucson (USA), 28 Feb 2025 - 4 Mar 2025},
      cin          = {IAS-8 / IAS-6 / JSC},
      cid          = {I:(DE-Juel1)IAS-8-20210421 / I:(DE-Juel1)IAS-6-20130828 /
                      I:(DE-Juel1)JSC-20090406},
      pnm          = {5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs)
                      and Research Groups (POF4-511) / SLNS - SimLab Neuroscience
                      (Helmholtz-SLNS) / 5111 - Domain-Specific Simulation \&
                      Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)},
      pid          = {G:(DE-HGF)POF4-5112 / G:(DE-Juel1)Helmholtz-SLNS /
                      G:(DE-HGF)POF4-5111},
      typ          = {PUB:(DE-HGF)8},
      url          = {https://juser.fz-juelich.de/record/1033882},
}