Combining Synthetic Images and Deep Active Learning: Data-Efficient Training of an Industrial Object Detection Model

J Imaging. 2024 Jan 6;10(1):16. doi: 10.3390/jimaging10010016.

Abstract

Generating synthetic data is a promising solution to the challenge of limited training data for industrial deep learning applications. However, training on synthetic data and testing on real-world data creates a sim-to-real domain gap. Research has shown that combining synthetic and real images yields better results than training on either source alone. In this work, the generation of synthetic training images via physics-based rendering is combined with deep active learning for an industrial object detection task to improve model performance iteratively. Our experimental results show that synthetic images improve model performance, especially at the beginning of the model's life cycle, when training data are limited. Furthermore, our implemented hybrid query strategy selects diverse and informative new training images in each active learning cycle and outperforms random sampling. In conclusion, this work presents a workflow to train and iteratively improve object detection models with a small number of real-world images, leading to data-efficient and cost-effective computer vision models.

Keywords: active learning; computer vision; data efficiency; deep active learning; deep learning; image synthesis; industrial application; object detection; synthetic images; turbine blade.
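
The abstract describes a hybrid query strategy that picks images that are both informative and diverse, but it does not spell out the mechanics. The following is a minimal sketch of one common way to build such a strategy, not the authors' implementation: first keep the most uncertain unlabeled images, then choose among them by greedy farthest-point selection in feature space. The function name `hybrid_query`, the `pool_factor` parameter, and the use of Euclidean distance on per-image feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def hybrid_query(
    uncertainty: Sequence[float],
    features: Sequence[Sequence[float]],
    budget: int,
    pool_factor: int = 3,
) -> List[int]:
    """Return indices of `budget` unlabeled images to annotate next.

    Hypothetical two-stage strategy (an assumption, not the paper's method):
    1. informativeness: keep the `budget * pool_factor` most uncertain images;
    2. diversity: greedily pick candidates that are far apart in feature space.
    """
    if len(uncertainty) != len(features):
        raise ValueError("uncertainty and features must have the same length")
    if pool_factor < 1:
        raise ValueError("pool_factor must be at least 1")

    budget = min(budget, len(uncertainty))
    if budget <= 0:
        return []

    # Stage 1: rank images by model uncertainty, highest first.
    ranked = sorted(
        range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True
    )
    candidates = ranked[: budget * pool_factor]

    # Stage 2: start from the most uncertain image, then repeatedly add the
    # candidate whose distance to the already selected set is largest.
    selected = [candidates[0]]
    min_dist = {
        i: math.dist(features[i], features[selected[0]]) for i in candidates[1:]
    }
    while len(selected) < budget:
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            min_dist[i] = min(min_dist[i], math.dist(features[i], features[nxt]))
    return selected
```

In an active learning cycle, `uncertainty` would come from the current detector's predictions on the unlabeled pool and `features` from some image embedding; both sources are left open here. With a pure uncertainty ranking, two near-duplicate images with high scores would both be selected, whereas this sketch skips the second one in favor of a more distant candidate.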