Poster (After Call) FZJ-2025-02904

ConText Transformer: Text-guided Instance Segmentation in Scientific Imaging


2025

Helmholtz AI Conference 2025 (HAICON25), Karlsruhe, Germany, 3 Jun 2025 - 5 Jun 2025

Abstract: Scientific imaging gives rise to a multitude of different segmentation tasks, many of which involve manually annotated datasets. We have collected a large number of such heterogeneous datasets, comprising over 10 million instance annotations, and demonstrate that in a multi-task setting, segmentation models at this scale cannot be effectively trained using solely image-based supervised learning. A major reason is that images from the same domain may be used to address different research questions, with varying annotation procedures and styles. For example, images of biological tissue may be evaluated for nuclei or for cell bodies, even though the image modality is the same. To overcome these challenges, we propose using simple text-based task descriptions to provide models with the context needed to solve a given objective. We introduce the ConText Transformer, which implements a dual-stream architecture that processes and fuses both image and text data. Based on the provided textual descriptions, the model learns to adapt its internal feature representations and thereby switch between segmenting the different classes and annotation styles observed in the datasets. These descriptions range from simple class names (e.g., “white blood cells”), prompting the model to segment only the referenced class, to more nuanced formulations such as toggling the use of overlapping segmentations in model predictions or segmenting a nucleus even in the absence of cytoplasm or membrane, as is common in datasets like TissueNet but omitted in Cellpose. Since interpreting these descriptions is part of model training, dedicated terms can also be defined to abbreviate very complex descriptions. The ConText Transformer is designed for compatibility: it can be used with existing segmentation frameworks, including the Contour Proposal Network (CPN) and Mask R-CNN. Our experiments on over 10 million instance annotations show that ConText Transformer models achieve competitive segmentation performance and outperform specialized models on several benchmarks, confirming that a single, unified model can effectively handle a wide spectrum of segmentation tasks and may eventually replace specialist models in scientific image segmentation.
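The abstract describes a dual-stream design in which embeddings of the textual task description are fused with image features before a standard segmentation head (e.g., CPN or Mask R-CNN) produces instances. The record contains no code; the PyTorch sketch below is purely illustrative of such a text-conditioned fusion step, and all module names, dimensions, and the choice of cross-attention are assumptions for demonstration, not details of the actual ConText Transformer.

    # Illustrative sketch only: a minimal dual-stream image/text fusion block.
    # All names and sizes are assumptions; this is not the authors' implementation.
    import torch
    import torch.nn as nn


    class TextConditionedFusion(nn.Module):
        """Fuses image features with text-prompt embeddings via cross-attention."""

        def __init__(self, img_channels=256, text_dim=512, num_heads=8):
            super().__init__()
            self.img_proj = nn.Conv2d(img_channels, text_dim, kernel_size=1)
            self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
            self.out_proj = nn.Conv2d(text_dim, img_channels, kernel_size=1)

        def forward(self, img_feats, text_tokens):
            # img_feats: (B, C, H, W) feature map from the image encoder
            # text_tokens: (B, T, D) token embeddings of the task description
            b, _, h, w = img_feats.shape
            q = self.img_proj(img_feats).flatten(2).transpose(1, 2)   # (B, H*W, D)
            fused, _ = self.cross_attn(query=q, key=text_tokens, value=text_tokens)
            fused = fused.transpose(1, 2).reshape(b, -1, h, w)
            # Residual connection: the text context modulates, rather than replaces,
            # the image features passed on to the segmentation head.
            return img_feats + self.out_proj(fused)


    if __name__ == "__main__":
        block = TextConditionedFusion()
        img = torch.randn(2, 256, 64, 64)    # dummy backbone features
        txt = torch.randn(2, 12, 512)        # dummy text-prompt embeddings
        print(block(img, txt).shape)         # torch.Size([2, 256, 64, 64])

Because the fusion acts on intermediate feature maps rather than on the head itself, a block of this kind can in principle be placed in front of different instance-segmentation frameworks, which matches the compatibility claim in the abstract.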


Contributing Institute(s):
  1. Strukturelle und funktionelle Organisation des Gehirns (INM-1)
Research Program(s):
  1. 5254 - Neuroscientific Data Analytics and AI (POF4-525)
  2. DFG project G:(GEPRIS)313856816 - SPP 2041: Computational Connectomics (313856816)
  3. EBRAINS 2.0 - EBRAINS 2.0: A Research Infrastructure to Advance Neuroscience and Brain Health (101147319)
  4. HIBALL - Helmholtz International BigBrain Analytics and Learning Laboratory (HIBALL) (InterLabs-0015)
  5. Helmholtz AI - Helmholtz Artificial Intelligence Coordination Unit – Local Unit FZJ (E.40401.62)

Appears in the scientific report 2025

The record appears in these collections:
Document types > Presentations > Poster
Institute Collections > INM > INM-1
Workflow collections > Public records
Publications database

 Record created 2025-06-27, last modified 2025-07-16


