% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Wang:1037214,
      author       = {Wang, Qin and Krajsek, Kai and Scharr, Hanno},
      title        = {{Equivariant Representation Learning for
                      Augmentation-based Self-Supervised Learning via
                      Image Reconstruction}},
      reportid     = {FZJ-2025-00547},
      pages        = {12},
      year         = {2024},
      abstract     = {Augmentation-based self-supervised learning methods have
                      shown remarkable success in visual representation
                      learning, excelling at learning invariant
                      features but often neglecting equivariant ones. This
                      limitation reduces the generalizability of foundation
                      models, particularly for downstream tasks requiring
                      equivariance. We propose integrating an image reconstruction
                      task as an auxiliary component in augmentation-based
                      self-supervised learning algorithms to facilitate
                      equivariant feature learning without additional parameters.
                      Our method implements a cross-attention mechanism to blend
                      features learned from two augmented views, subsequently
                      reconstructing one of them. This approach is adaptable to
                      various datasets and augmented-pair-based learning methods.
                      We evaluate its effectiveness in learning equivariant
                      features through multiple linear regression tasks and
                      downstream applications on both artificial (3DIEBench) and
                      natural (ImageNet) datasets. Results consistently
                      demonstrate significant improvements over standard
                      augmentation-based self-supervised learning methods and
                      state-of-the-art approaches, with particularly strong gains
                      in scenarios involving combined augmentations. Our method
                      enhances the learning of both invariant and equivariant
                      features, leading to more robust and generalizable visual
                      representations for computer vision tasks.},
      month        = {Dec},
      date         = {2024-12-10},
      organization = {The Thirty-Eighth Annual Conference on Neural
                      Information Processing Systems Workshop:
                      Self-Supervised Learning - Theory and Practice,
                      Vancouver (Canada), 10 Dec 2024 - 15 Dec 2024},
      cin          = {IAS-8 / JSC},
      cid          = {I:(DE-Juel1)IAS-8-20210421 / I:(DE-Juel1)JSC-20090406},
      pnm          = {5111 - Domain-Specific Simulation \& Data Life Cycle Labs
                      (SDLs) and Research Groups (POF4-511) / 5112 - Cross-Domain
                      Algorithms, Tools, Methods Labs (ATMLs) and Research Groups
                      (POF4-511) / SLNS - SimLab Neuroscience (Helmholtz-SLNS)},
      pid          = {G:(DE-HGF)POF4-5111 / G:(DE-HGF)POF4-5112 /
                      G:(DE-Juel1)Helmholtz-SLNS},
      typ          = {PUB:(DE-HGF)8},
      doi          = {10.34734/FZJ-2025-00547},
      url          = {https://juser.fz-juelich.de/record/1037214},
}
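
% Illustrative note (placed outside the entry, where BibTeX ignores it): the
% abstract above describes blending the features of two augmented views with
% cross-attention and reconstructing one view as an auxiliary objective. The
% PyTorch sketch below is one plausible reading of that idea, not the
% authors' released implementation; the module names, feature shapes, the
% toy linear decoder, and the loss weight are all assumptions. The paper
% reports adding no extra parameters, whereas this standalone sketch
% instantiates its own modules purely for clarity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionReconstruction(nn.Module):
    # Hypothetical auxiliary head: blend token features of two augmented
    # views via cross-attention, then decode the blend back to image space.
    def __init__(self, dim=256, heads=4, image_size=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.decoder = nn.Linear(dim, 3 * image_size * image_size)
        self.image_size = image_size

    def forward(self, feats_v1, feats_v2):
        # feats_v1, feats_v2: (batch, tokens, dim) outputs of a shared
        # encoder applied to the two augmented views of the same image.
        blended, _ = self.attn(query=feats_v1, key=feats_v2, value=feats_v2)
        recon = self.decoder(blended.mean(dim=1))  # pool tokens, decode
        return recon.view(-1, 3, self.image_size, self.image_size)

def total_loss(ssl_loss, recon, target_view, weight=1.0):
    # Reconstruction term added to a standard augmentation-based SSL
    # objective (e.g. a contrastive loss); `weight` is a hypothetical
    # trade-off hyperparameter.
    return ssl_loss + weight * F.mse_loss(recon, target_view)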