Self-Supervised Learning Based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation

Wang, Qin; Morrison, Abigail; Scharr, Hanno; Krajsek, Kai; Quercia, Alessio; Bruns, Benjamin

doi:10.1609/aaai.v40i31.39849

Contribution to a conference proceedings/Contribution to a book

FZJ-2026-02194

Wang, Q. (Corresponding author, FZJ); Quercia, A. (FZJ); Bruns, B. (FZJ); Morrison, A. (FZJ); Scharr, H. (FZJ); Krajsek, K. (FZJ)

2026

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)
The 40th Annual AAAI Conference on Artificial Intelligence, Singapore, Singapore, 20 Jan 2026 - 27 Jan 2026 40(31), 26425 - 26434 (2026) [10.1609/aaai.v40i31.39849]

Please use a persistent id in citations: doi:10.1609/aaai.v40i31.39849

Abstract: Self-supervised learning (SSL) methods have achieved remarkable success in learning image representations that are invariant to transformations, but they thereby discard transformation information that some computer vision tasks actually require. While recent approaches attempt to address this limitation by learning equivariant features using linear operators in feature space, they impose restrictive assumptions that constrain flexibility and generalization. We introduce a weaker definition of the transformation relation between image and feature space, denoted as equivariance-coherence. We propose a novel SSL auxiliary task that learns equivariance-coherent representations through intermediate transformation reconstruction and can be integrated with existing joint embedding SSL methods. Our key idea is to reconstruct images at intermediate points along transformation paths; for example, when training on 30° rotations, we reconstruct the 10° and 20° rotation states. Reconstructing intermediate states requires the transformation information used in augmentations, rather than suppressing it, and therefore fosters features containing the augmented transformation information. Our method decomposes feature vectors into invariant and equivariant parts, training them with standard SSL losses and reconstruction losses, respectively. We demonstrate substantial improvements on synthetic equivariance benchmarks while maintaining competitive performance on downstream tasks requiring invariant representations. The approach integrates with existing SSL methods (iBOT, DINOv2) and consistently enhances performance across diverse tasks, including segmentation, detection, depth estimation, and video dense prediction. Our framework provides a practical way to augment SSL methods with equivariant capabilities while preserving invariant performance.
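
The intermediate-reconstruction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the even spacing of intermediate angles, the index-based feature split, and the use of 2D points in place of an image are all assumptions made for illustration. It only shows that intermediate targets (10° and 20° for a 30° rotation) cannot be matched by a prediction that ignores the transformation.

```python
import math


def intermediate_angles(target_deg, num_steps):
    """Return num_steps evenly spaced angles strictly between 0 and target_deg.

    Even spacing is an assumption; the abstract only gives the 10/20/30 example.
    """
    step = target_deg / (num_steps + 1)
    return [step * k for k in range(1, num_steps + 1)]


def rotate_points(points, angle_deg):
    """Rotate 2D points about the origin (a toy stand-in for rotating an image)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def split_features(features, num_invariant):
    """Split a feature vector into (invariant, equivariant) parts by index.

    Splitting by index is a hypothetical choice made for this sketch.
    """
    return features[:num_invariant], features[num_invariant:]


def mean_squared_distance(prediction, target):
    """Mean squared Euclidean distance between paired 2D points."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(prediction, target))
    return total / len(target)


# Toy "image": three 2D points. The training pair is the 0 and 30 degree views.
image = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
angles = intermediate_angles(30, 2)

# Reconstruction targets are the intermediate rotation states.
targets = {angle: rotate_points(image, angle) for angle in angles}

# A prediction that ignores the rotation (the unrotated image) misses every target.
loss_if_ignored = {angle: mean_squared_distance(image, targets[angle])
                   for angle in angles}

# Hypothetical 4-dimensional feature vector, first two entries treated as invariant.
invariant_part, equivariant_part = split_features([0.1, 0.2, 0.3, 0.4], 2)
```

In this sketch the invariant part would be trained with a standard SSL loss and the equivariant part with the reconstruction loss, as the abstract describes; how the actual method conditions the decoder on the features is not specified in this record.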

Appears in the scientific report 2026

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Document types > Books > Contribution to a book
Institute Collections > IAS > IAS-6
Institute Collections > IAS > IAS-8
Workflow collections > Public records
Institute Collections > JSC
Publications database

Record created 2026-04-09, last modified 2026-06-02
