TY - THES
AU - Kromm, Edward
TI - Data Fusion for Scene Graph Generation: Bridging Simulated and Real-World Datasets
PB - Hochschule Coburg
VL - Masterarbeit
CY - Jülich
M1 - FZJ-2025-03665
SP - 102 pages: Figures, Tables
PY - 2025
N1 - Masterarbeit, Hochschule Coburg, 2025
AB - Scene graph generation has emerged as a powerful tool for AI-driven visual understanding of images by not only detecting objects in an image but also predicting the relationships between them, such as car–stops at–traffic light or pedestrian–crosses–street. This capability is particularly important for autonomous driving, where relational context between road users and infrastructure plays a critical role. However, the application of scene graph generation in this domain is hindered by the scarcity of annotated datasets. Driving simulators such as CARLA provide a scalable alternative, enabling efficient data generation compared to manual annotation. Yet models trained exclusively on simulated data often fail to generalize to real-world data due to the substantial domain gap between the two. This thesis addresses this challenge by proposing a novel data fusion framework that combines simulated and real datasets to construct autonomous driving–specific relationship annotations and subsequently bridge the domain gap for real-world prediction. The work presents the complete pipeline, including dataset generation in simulation, adaptation of publicly available resources, and augmentation strategies. The Relation Transformer model is analyzed in depth, and particular attention is given to interpreting its internal mechanisms by visualizing the learned attention maps as heatmaps. This analysis provides insights into whether the model focuses on semantically meaningful regions when predicting relationships. Building on this understanding, two new approaches are introduced to enable inference on real data while transferring relational knowledge acquired in simulation. An ablation study further quantifies the impact of the domain gap on model performance and highlights the strengths and limitations of the proposed methods. Results demonstrate that one of the developed approaches effectively mitigates the simulation-to-reality gap, and concrete suggestions are provided for advancing this technique toward further uses in AI-driven visual understanding of images in the automotive context.
LB - PUB:(DE-HGF)19
DO - DOI:10.34734/FZJ-2025-03665
UR - https://juser.fz-juelich.de/record/1046025
ER -