Towards generalized machine learning models for dislocation image analysis: a parametric based synthetic data approach

Govind, Kishan; Sandfeld, Stefan; Mayer, Joachim

doi:10.18154/RWTH-2025-08373

Dissertation / PhD Thesis

FZJ-2026-02803

Govind, K. (Corresponding author); Sandfeld, S. (Supervisor, FZJ); Mayer, J. (Supervisor, FZJ)

2025
RWTH Aachen University

RWTH Aachen University, 1 online resource : illustrations (2025) [10.18154/RWTH-2025-08373] = Dissertation, RWTH Aachen, 2026

Please use a persistent id in citations: doi:10.18154/RWTH-2025-08373 doi:10.34734/FZJ-2026-02803

Abstract: Since dislocations were first observed by electron microscopy in the mid-1950s, microscopy techniques have advanced significantly, and high-quality, high-resolution dislocation image data can now be acquired. Today, it is even possible to perform in-situ mechanical testing, which allows the evolution of the dislocation microstructure to be observed during the plastic deformation of materials. The dislocation image data generated in such experiments must be analyzed quantitatively to support meaningful calculations and to understand the underlying mechanisms. Deep learning methods, particularly image segmentation based on convolutional neural networks such as U-Net, offer a powerful tool for segmenting dislocation lines, which in turn makes it possible to represent the dislocations as splines for quantitative studies. However, these methods require substantial amounts of labeled training data, which means many more experiments and labor-intensive manual labeling of dislocation lines. The lack of large quantities of high-quality training data is a significant obstacle to applying state-of-the-art deep learning models to dislocation image data. To address this challenge, we introduce a novel parametric-based synthetic data generation model that creates synthetic training datasets for deep learning on Transmission Electron Microscopy (TEM) images of dislocation microstructures. The model is designed to generate training data that not only replicates the background of TEM images but also renders complex dislocation microstructures, an essential aspect of materials science research. Two distinct methods are used for generating synthetic image backgrounds.
The first method combines Perlin noise with random white noise to create a purely synthetic background, offering a controlled environment for dislocation rendering. The second method, which is much more realistic, takes background patches from real TEM images and reassembles them into realistic-looking backgrounds. This approach mirrors the complexity and variability of real TEM images and provides a more accurate context for the synthetic dislocation structures. The core innovation of this work lies in the modeling of dislocation microstructures for synthetic training data. We start with a dislocation line and model it as a spline defined by support points. These support points can be obtained in two ways: by polynomial approximation of dislocation lines, or by manually selecting key points on dislocations in real TEM images using image annotation tools such as Labelme. By representing dislocations as splines, the model achieves high fidelity in simulating dislocation patterns. This flexibility allows for the creation of a diverse range of dislocation microstructures, such as dislocation pileups with varying slip widths, directions, and dislocation counts. Additionally, two more structures, slip trace lines and grain boundaries, are incorporated into the microstructure, each modeled as a line, which further helps machine learning models learn the characteristics of dislocations and improves predictive accuracy. The ability to generate complex dislocation structures, some of which are challenging or even impossible to observe in actual TEM images, is particularly significant. After the synthetic training data has been generated, the next step is to train machine learning models. In this work, we explore three different machine learning approaches.
The first two approaches, multi-label segmentation and instance segmentation, predict individual dislocations as binary masks, which must be post-processed to represent the dislocations as splines and obtain a digital representation of the image. The third approach is more direct: it estimates the spline support points on the dislocations and thus yields the dislocation splines directly. We conduct extensive studies to demonstrate the use of the synthetic data and show how it can be used as an alternative to real experimental data or in combination with it. This research represents an important step toward developing generalized machine learning models for dislocation analysis by leveraging synthetic data. The novel parametric-based synthetic data generation model addresses the need for high-quality training data for machine learning models, particularly for TEM image analysis. It enables the creation of synthetic images that closely resemble real TEM images while capturing complex dislocation structures. By generating diverse and realistic training datasets, this research opens up new possibilities for applying advanced deep learning methods, such as U-Net and Mask R-CNN, to the segmentation and analysis of dislocations, enabling high-throughput studies. Furthermore, the study demonstrates that machine learning models trained on synthetic data can be used for quantitative analysis of real experimental data, which reinforces the practical applicability of these methods in materials science research and offers valuable insights into the mechanisms of plastic deformation, further contributing to our understanding of material behavior.
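
Illustrative sketch (not taken from the thesis): the abstract describes modeling each dislocation as a spline through support points and building pileups with varying direction and dislocation count, with per-dislocation binary masks as segmentation targets. The code below shows one way this could look, using only the Python standard library. The spline type (Catmull-Rom), the function names `catmull_rom`, `make_pileup` and `rasterize`, the jittered spacing rule, and all numeric values are assumptions made for illustration; the thesis may use a different spline and different parameters.

```python
import math
import random


def catmull_rom(points, samples_per_segment=20):
    """Sample a Catmull-Rom spline that passes through every support point."""
    if len(points) < 2:
        raise ValueError("need at least two support points")
    # Duplicate both end points so the curve starts and ends on them.
    padded = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1], padded[i], padded[i + 1], padded[i + 2]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            curve.append(tuple(
                0.5 * (2 * b + (c - a) * t
                       + (2 * a - 5 * b + 4 * c - d) * t ** 2
                       + (-a + 3 * b - 3 * c + d) * t ** 3)
                for a, b, c, d in zip(p0, p1, p2, p3)))
    curve.append(tuple(points[-1]))
    return curve


def make_pileup(template, count, direction_deg, spacing, rng):
    """Copy a template line `count` times along an assumed slip direction."""
    dx = math.cos(math.radians(direction_deg))
    dy = math.sin(math.radians(direction_deg))
    lines, offset = [], 0.0
    for _ in range(count):
        lines.append([(x + offset * dx, y + offset * dy) for x, y in template])
        # Assumed spacing rule: nominal spacing with +/-20 % random jitter.
        offset += spacing * rng.uniform(0.8, 1.2)
    return lines


def rasterize(curve, width, height):
    """Mark the pixels hit by the sampled curve in a binary mask (row-major)."""
    mask = [[0] * width for _ in range(height)]
    for x, y in curve:
        col, row = int(round(x)), int(round(y))
        if 0 <= col < width and 0 <= row < height:
            mask[row][col] = 1
    return mask


# Hypothetical support points (x, y) in pixels, e.g. as picked in an annotation tool.
template = [(10.0, 5.0), (14.0, 20.0), (11.0, 35.0), (15.0, 50.0)]
rng = random.Random(0)

pileup = make_pileup(template, count=4, direction_deg=0.0, spacing=10.0, rng=rng)
curves = [catmull_rom(line) for line in pileup]
# One binary mask per dislocation, as an instance-segmentation target would need.
masks = [rasterize(curve, width=64, height=64) for curve in curves]
```

Because the support points fully determine each curve, the same list of points can serve both as the label for a mask-based model (after rasterization) and as the direct regression target for a model that predicts support points.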

Keyword(s): AI ; TEM images ; data driven ; dislocations ; instance segmentation ; segmentation