ConText Transformer: Text-guided Instance Segmentation in Scientific Imaging

Upschulte, Eric; Amunts, Katrin; Dickscheid, Timo

Marc 21

001			1043528
005			20250716202229.0
037	_	_	\|a FZJ-2025-02904
041	_	_	\|a English
100	1	_	\|a Upschulte, Eric \|0 P:(DE-Juel1)177675 \|b 0 \|e Corresponding author \|u fzj
111	2	_	\|a Helmholtz AI Conference 2025 \|g HAICON25 \|c Karlsruhe \|d 2025-06-03 - 2025-06-05 \|w Germany
245	_	_	\|a ConText Transformer: Text-guided Instance Segmentation in Scientific Imaging
260	_	_	\|c 2025
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a CONFERENCE_POSTER \|2 ORCID
336	7	_	\|a Output Types/Conference Poster \|2 DataCite
336	7	_	\|a Poster \|b poster \|m poster \|0 PUB:(DE-HGF)24 \|s 1752680519_22200 \|2 PUB:(DE-HGF) \|x After Call
520	_	_	\|a Scientific imaging gives rise to a multitude of different segmentation tasks, many of which involve manually annotated datasets. We have collected a large number of such heterogeneous datasets, comprising over 10 million instance annotations, and demonstrate that in a multi-task setting, segmentation models at this scale cannot be effectively trained using solely image-based supervised learning. A major reason is that images from the same domain may be used to address different research questions, with varying annotation procedures and styles. For example, images of biological tissues may be evaluated for nuclei or cell bodies, despite using the same image modality. To overcome these challenges, we propose using simple text-based task descriptions to provide models with the necessary context for solving a given objective. We introduce the ConText Transformer, which implements a dual-stream architecture, processing and fusing both image and text data. Based on the provided textual descriptions, the model learns to adapt its internal feature representations to effectively switch between segmenting different classes and annotation styles observed in the datasets. These descriptions can range from simple class names (e.g., “white blood cells”)—prompting the model to only segment the referenced class—to more nuanced formulations such as toggling the use of overlapping segmentations in model predictions or segmenting a nucleus, even in the absence of cytoplasm or membrane, as is common in datasets like TissueNet but omitted in Cellpose. Since interpreting these descriptions is part of the model training, it is also possible to define dedicated terms abbreviating very complex descriptions. ConText Transformer is designed for compatibility. It can be used with existing segmentation frameworks, including the Contour Proposal Network (CPN) or Mask R-CNN. 
Our experiments on over 10 million instance annotations show that ConText Transformer models achieve competitive segmentation performance and outperform specialized models in several benchmarks. This confirms that a single, unified model can effectively handle a wide spectrum of segmentation tasks and may eventually replace specialist models in scientific image segmentation.
536	_	_	\|a 5254 - Neuroscientific Data Analytics and AI (POF4-525) \|0 G:(DE-HGF)POF4-5254 \|c POF4-525 \|f POF IV \|x 0
536	_	_	\|a DFG project G:(GEPRIS)313856816 - SPP 2041: Computational Connectomics (313856816) \|0 G:(GEPRIS)313856816 \|c 313856816 \|x 1
536	_	_	\|a EBRAINS 2.0 - EBRAINS 2.0: A Research Infrastructure to Advance Neuroscience and Brain Health (101147319) \|0 G:(EU-Grant)101147319 \|c 101147319 \|f HORIZON-INFRA-2022-SERV-B-01 \|x 2
536	_	_	\|a HIBALL - Helmholtz International BigBrain Analytics and Learning Laboratory (HIBALL) (InterLabs-0015) \|0 G:(DE-HGF)InterLabs-0015 \|c InterLabs-0015 \|x 3
536	_	_	\|a Helmholtz AI - Helmholtz Artificial Intelligence Coordination Unit – Local Unit FZJ (E.40401.62) \|0 G:(DE-Juel-1)E.40401.62 \|c E.40401.62 \|x 4
700	1	_	\|a Amunts, Katrin \|0 P:(DE-Juel1)131631 \|b 1 \|u fzj
700	1	_	\|a Dickscheid, Timo \|0 P:(DE-Juel1)165746 \|b 2 \|u fzj
909	C	O	\|o oai:juser.fz-juelich.de:1043528 \|p openaire \|p VDB \|p ec_fundedresources
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)177675
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 1 \|6 P:(DE-Juel1)131631
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 2 \|6 P:(DE-Juel1)165746
913	1	_	\|a DE-HGF \|b Key Technologies \|l Natural, Artificial and Cognitive Information Processing \|1 G:(DE-HGF)POF4-520 \|0 G:(DE-HGF)POF4-525 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Decoding Brain Organization and Dysfunction \|9 G:(DE-HGF)POF4-5254 \|x 0
914	1	_	\|y 2025
920	_	_	\|l yes
920	1	_	\|0 I:(DE-Juel1)INM-1-20090406 \|k INM-1 \|l Strukturelle und funktionelle Organisation des Gehirns \|x 0
980	_	_	\|a poster
980	_	_	\|a VDB
980	_	_	\|a I:(DE-Juel1)INM-1-20090406
980	_	_	\|a UNRESTRICTED
