A Combined Frame Difference and Convolution Method for Moving Vehicle Detection in Satellite Videos

Sensors (Basel). 2025 Jan 7;25(2):306. doi: 10.3390/s25020306.

Abstract

Existing detection methods miss moving vehicles in satellite videos because the targets offer few shape and texture features and have blurred boundaries. To address these missed detections, this paper introduces a novel moving vehicle detection approach for satellite videos that uses frame difference and convolution to integrate spatiotemporal information. First, a frame difference module (FDM) that combines frame difference and convolution is designed. This module extracts motion features between adjacent frames by frame differencing, refines them with convolutional layers learned through backpropagation, and integrates them with the current frame to compensate for the motion features that single-frame images lack. Next, a backbone network processes these initial features to extract further spatiotemporal information. The neck incorporates deformable convolution, which adaptively adjusts the sampling positions of the convolution kernel, improving feature representation and enabling effective multiscale information integration. In addition, shallow, large-scale feature maps, whose smaller receptive fields focus on small targets and reduce background interference, are fed into the detection head. To enhance small-target feature representation, a small-target self-reconstruction module (SR-TOD) is introduced between the neck and the detection head. Experiments on the Jilin-1 satellite video dataset show that the proposed method outperforms the comparison models and significantly reduces missed detections caused by weak color and texture features and blurred boundaries. On the satellite-video moving vehicle detection task, the method raises the average F1-score by 3.9% and reduces per-frame processing time by 7 s compared with the next-best model, DSFNet.
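The frame difference step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' FDM: the function names, the use of one previous and one next frame, the fixed 3x3 kernel (standing in for convolution weights that the paper learns through backpropagation), and fusion by simple addition are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[float]]  # grayscale image stored as rows of pixel intensities


def abs_diff(a: Frame, b: Frame) -> Frame:
    """Pixel-wise absolute difference between two equally sized frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3(img: Frame, kernel: Frame) -> Frame:
    """3x3 cross-correlation with zero padding; output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:  # skip zero-padded border
                        acc += img[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def frame_difference_features(prev: Frame, cur: Frame, nxt: Frame,
                              kernel: Frame) -> Frame:
    """Hypothetical stand-in for an FDM: difference, filter, then fuse."""
    # Motion cues from both neighbours of the current frame.
    d_prev = abs_diff(cur, prev)
    d_next = abs_diff(nxt, cur)
    motion = [[p + n for p, n in zip(rp, rn)] for rp, rn in zip(d_prev, d_next)]
    # Fixed kernel here; in a trained network these weights would be learned.
    refined = conv3x3(motion, kernel)
    # Assumed fusion: add the refined motion map to the current frame.
    return [[c + m for c, m in zip(rc, rm)] for rc, rm in zip(cur, refined)]


if __name__ == "__main__":
    def make_frame(col: int) -> Frame:
        """5x5 background of intensity 10 with one bright 'vehicle' pixel."""
        frame = [[10.0] * 5 for _ in range(5)]
        frame[2][col] = 50.0
        return frame

    box = [[1 / 9] * 3 for _ in range(3)]  # simple smoothing kernel
    fused = frame_difference_features(make_frame(1), make_frame(2),
                                      make_frame(3), box)
    for row in fused:
        print([round(v, 1) for v in row])
```

In a static scene the difference maps are zero, so the function returns the current frame unchanged; only pixels that change between adjacent frames are amplified, which is the motion cue a single frame cannot provide.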

Keywords: frame difference method; motion target detection; neural network; satellite video; self-reconfiguration.