ConText Transformer: Text-guided Instance Segmentation in Scientific Imaging

Upschulte, Eric; Amunts, Katrin; Dickscheid, Timo
001043529 001__ 1043529
001043529 005__ 20250716202229.0
001043529 037__ $$aFZJ-2025-02905
001043529 041__ $$aEnglish
001043529 1001_ $$0P:(DE-Juel1)177675$$aUpschulte, Eric$$b0$$eCorresponding author$$ufzj
001043529 1112_ $$aHelmholtz Imaging Conference 2025$$cPotsdam$$d2025-06-25 - 2025-06-27$$wGermany
001043529 245__ $$aConText Transformer: Text-guided Instance Segmentation in Scientific Imaging
001043529 260__ $$c2025
001043529 3367_ $$033$$2EndNote$$aConference Paper
001043529 3367_ $$2DataCite$$aOther
001043529 3367_ $$2BibTeX$$aINPROCEEDINGS
001043529 3367_ $$2DRIVER$$aconferenceObject
001043529 3367_ $$2ORCID$$aLECTURE_SPEECH
001043529 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1752680553_22258$$xAfter Call
001043529 520__ $$aScientific imaging gives rise to a multitude of different segmentation tasks, many of which are addressed with manually annotated datasets. We collected a large number of such heterogeneous datasets, comprising over 10 million instance annotations, and demonstrate that, in a multi-task setting, segmentation models at this scale cannot be trained effectively with image-based supervised learning alone. A major reason is that images from the same domain may be used to address different research questions, with varying annotation procedures and styles. For example, images of biological tissues may be evaluated for nuclei or for cell bodies despite using the same staining. To overcome these challenges, we propose using simple text-based task descriptions to provide models with the necessary context for solving a given objective. We introduce the ConText Transformer, which implements a dual-stream architecture that processes and fuses both image and text data. Based on the provided textual descriptions, the model learns to adapt its internal feature representations so that it can switch effectively between segmenting the different classes and annotation styles observed in the datasets. These descriptions can range from simple class names (e.g. “white blood cells”), which prompt the model to segment only the referenced class, to more nuanced formulations, such as toggling overlapping segmentations in model predictions, or segmenting a cell’s nucleus during cell segmentation if the respective cell boundary is not visible, as is common, for example, in the TissueNet dataset. Since interpreting these descriptions is part of model training, it is also possible to define dedicated terms that abbreviate very complex descriptions. ConText Transformer is designed for compatibility: it can be used with existing segmentation frameworks, including the Contour Proposal Network (CPN) and Mask R-CNN.
Our experiments on over 10 million instance annotations show that ConText Transformer models achieve competitive segmentation performance and outperform specialized models in several benchmarks. This confirms that a single, unified model can effectively handle a wide spectrum of segmentation tasks, ultimately making it possible to replace specialist models in scientific image segmentation.
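The abstract states that the image and text streams are fused, but it does not say how. The following is a minimal, hypothetical sketch, not the authors' implementation. It assumes that the task description has already been encoded into text token embeddings, and that fusion is a single-head cross-attention in which each image token queries the text tokens and adds the attended summary back through a residual connection. Learned query/key/value projections, multiple heads, and normalization are deliberately omitted. The names `cross_attend` and `softmax` are illustrative only.

```python
import math
from typing import List

# Hypothetical sketch: the fusion mechanism is an assumption, not taken from the source.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(image_tokens: List[Vector], text_tokens: List[Vector]) -> List[Vector]:
    """Each image token attends over the text tokens (scaled dot-product);
    the weighted text summary is added back to the image token (residual)."""
    d = len(text_tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in image_tokens:
        weights = softmax([dot(q, k) * scale for k in text_tokens])
        # Weighted average of the text tokens, one feature dimension at a time.
        context = [sum(w * t[i] for w, t in zip(weights, text_tokens)) for i in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0]]  # two toy image tokens
    prompt = [[1.0, 0.0]]             # one toy text token
    print(cross_attend(image, prompt))  # [[2.0, 0.0], [1.0, 1.0]]
```

Because the same image tokens produce different fused features for different text embeddings, a downstream segmentation head (for example CPN or Mask R-CNN, as named in the abstract) could, in principle, yield prompt-dependent predictions from a single shared backbone.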
001043529 536__ $$0G:(DE-HGF)POF4-5254$$a5254 - Neuroscientific Data Analytics and AI (POF4-525)$$cPOF4-525$$fPOF IV$$x0
001043529 536__ $$0G:(DE-Juel-1)E.40401.62$$aHelmholtz AI - Helmholtz Artificial Intelligence Coordination Unit – Local Unit FZJ (E.40401.62)$$cE.40401.62$$x1
001043529 536__ $$0G:(DE-HGF)InterLabs-0015$$aHIBALL - Helmholtz International BigBrain Analytics and Learning Laboratory (HIBALL) (InterLabs-0015)$$cInterLabs-0015$$x2
001043529 536__ $$0G:(GEPRIS)313856816$$aDFG project G:(GEPRIS)313856816 - SPP 2041: Computational Connectomics (313856816)$$c313856816$$x3
001043529 7001_ $$0P:(DE-Juel1)131631$$aAmunts, Katrin$$b1$$ufzj
001043529 7001_ $$0P:(DE-Juel1)165746$$aDickscheid, Timo$$b2$$ufzj
001043529 909CO $$ooai:juser.fz-juelich.de:1043529$$pVDB
001043529 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)177675$$aForschungszentrum Jülich$$b0$$kFZJ
001043529 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)131631$$aForschungszentrum Jülich$$b1$$kFZJ
001043529 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165746$$aForschungszentrum Jülich$$b2$$kFZJ
001043529 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5254$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
001043529 9141_ $$y2025
001043529 920__ $$lyes
001043529 9201_ $$0I:(DE-Juel1)INM-1-20090406$$kINM-1$$lStrukturelle und funktionelle Organisation des Gehirns$$x0
001043529 980__ $$aconf
001043529 980__ $$aVDB
001043529 980__ $$aI:(DE-Juel1)INM-1-20090406
001043529 980__ $$aUNRESTRICTED