Background and purpose: White matter hyperintensities in brain MRI are key indicators of various neurological conditions, and their accurate segmentation is essential for assessing disease progression. This study aims to evaluate the performance of a 3D convolutional neural network and a 3D Transformer-based model for white matter hyperintensities segmentation, focusing on their efficacy with limited datasets and similar computational resources.
Materials and methods: We implemented a convolution-based model (3D ResNet-50 U-Net with spatial and channel squeeze & excitation) and a Transformer-based model (3D Swin Transformer with a convolutional stem). The models were evaluated on two clinical datasets, one from Kaohsiung Chang Gung Memorial Hospital and one from the National Center for High-Performance Computing. Four metrics were used for evaluation: Dice similarity coefficient, lesion segmentation, lesion F1-Score, and lesion sensitivity.
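The abstract does not define how these metrics are computed, so the following is only a minimal sketch under common conventions, not the study's implementation. It assumes binary masks stored as sets of (z, y, x) voxel coordinates, treats each 6-connected component as one lesion, and counts a lesion as detected when it overlaps the other mask by at least one voxel. The function names `dice`, `lesions`, and `lesion_scores` are hypothetical.

```python
from collections import deque


def dice(pred, truth):
    """Voxel-wise Dice similarity coefficient between two foreground voxel sets."""
    if not pred and not truth:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def lesions(mask):
    """Split a set of (z, y, x) voxels into 6-connected components (assumed lesion definition)."""
    remaining = set(mask)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            z, y, x = queue.popleft()
            neighbours = (
                (z + 1, y, x), (z - 1, y, x),
                (z, y + 1, x), (z, y - 1, x),
                (z, y, x + 1), (z, y, x - 1),
            )
            for n in neighbours:
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        components.append(component)
    return components


def lesion_scores(pred, truth):
    """Return (lesion sensitivity, lesion F1) under a one-voxel overlap criterion."""
    true_lesions = lesions(truth)
    pred_lesions = lesions(pred)
    # A reference lesion is detected if any of its voxels is predicted.
    detected = sum(1 for lesion in true_lesions if lesion & pred)
    # A predicted lesion is correct if it touches any reference voxel.
    correct = sum(1 for lesion in pred_lesions if lesion & truth)
    sensitivity = detected / len(true_lesions) if true_lesions else 0.0
    precision = correct / len(pred_lesions) if pred_lesions else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return sensitivity, f1


# Toy example: two reference lesions, two predicted lesions, one match.
truth = {(0, 0, 0), (0, 0, 1), (0, 3, 3)}
pred = {(0, 0, 1), (0, 0, 2), (2, 2, 2)}
print(dice(pred, truth))
print(lesion_scores(pred, truth))
```

In this toy case one of the two reference lesions is found and one of the two predicted lesions is correct, so lesion sensitivity and lesion F1 are both 0.5 while the voxel-wise Dice is 1/3. The gap illustrates why lesion-level metrics are reported alongside Dice: small lesions contribute little to voxel overlap but count equally at the lesion level.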
Results: The Transformer-based model, with appropriate adjustments, outperformed the well-established convolution-based model in foreground Dice similarity coefficient, lesion F1-Score, and sensitivity, demonstrating robust segmentation accuracy. DRLoc enhanced the Transformer's performance, achieving comparable results on internal and benchmark datasets despite limited data availability.
Conclusion: With comparable computational overhead, a Transformer-based model can surpass a well-established convolution-based model in white matter hyperintensities segmentation on small datasets by capturing global context effectively, making it suitable for clinical applications where computational resources are constrained.
Keywords: Brain MRI; Convolutional neural network; Segmentation; Vision transformer; White matter hyperintensities.
© 2025. The Author(s).