External validation of the deep learning system "SpineNet" for grading radiological features of degeneration on MRIs of the lumbar spine

Eur Spine J. 2022 Aug;31(8):2137-2148. doi: 10.1007/s00586-022-07311-x. Epub 2022 Jul 14.

Abstract

Background: Magnetic resonance imaging (MRI) is used to detect degenerative changes of the lumbar spine. SpineNet (SN), a computer vision-based system, performs an automated analysis of degenerative features in MRI scans aiming to provide high accuracy, consistency and objectivity. This study evaluated SN's ratings compared with those of an expert radiologist.

Method: MRIs of 882 patients (mean age, 72 ± 8.8 years) with degenerative spinal disorders from two previous trials carried out in our spine center between 2011 and 2019 were analyzed by an expert radiologist. Lumbar segments (L1/2-L5/S1) were graded for Pfirrmann Grades (PG), Spondylolisthesis (SL) and Central Canal Stenosis (CCS). SN's analysis of the equivalent parameters was generated. Agreement between methods was analyzed using kappa (κ), Spearman correlation (ρ) and Lin's concordance correlation (ρc) coefficients and class average accuracy (CAA).
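
As an illustration of the agreement measures named above, the following is a minimal sketch and not the study's actual analysis code. It assumes an unweighted Cohen's kappa (the abstract does not state whether a weighted variant was used) and takes CAA to be the mean of the per-class accuracies with the radiologist's grades as reference. The function names and the example grades are hypothetical. Spearman correlation is omitted for brevity.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical grades."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement expected from the two raters' marginal frequencies
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


def class_average_accuracy(reference, predicted):
    """Mean of per-class accuracies, computed over classes present in the reference."""
    per_class = []
    for label in sorted(set(reference)):
        idx = [i for i, r in enumerate(reference) if r == label]
        per_class.append(sum(predicted[i] == label for i in idx) / len(idx))
    return sum(per_class) / len(per_class)


# Hypothetical Pfirrmann grades for eight segments (not data from the study)
radiologist = [2, 3, 3, 4, 4, 5, 5, 3]
software = [2, 3, 4, 4, 4, 5, 4, 3]

print("kappa:", cohen_kappa(radiologist, software))
print("Lin's CCC:", lin_ccc(radiologist, software))
print("CAA:", class_average_accuracy(radiologist, software))
```

Averaging accuracy per class rather than over all segments keeps a frequent grade from dominating the score, which matters when some grades are rare.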

Results: A total of 4410 lumbar segments were analyzed. κ statistics showed moderate to substantial agreement in PG between the radiologist and SN depending on spinal level (range κ 0.63-0.77, all levels together 0.72; range CAA 45-68%, all levels 55%), slight to substantial agreement for SL (range κ 0.07-0.60, all levels 0.63; range CAA 47-57%, all levels 56%) and CCS (range κ 0.17-0.57, all levels 0.60; range CAA 35-41%, all levels 43%). SN tended to record more pathological features in PG than did the radiologist, whereas the contrary was the case for CCS. SL showed an even distribution between methods.

Conclusion: SN is a robust and reliable tool with the ability to grade degenerative features such as PG, SL or CCS in lumbar MRIs with moderate to substantial agreement compared with the current gold standard, the radiologist. It is a valuable alternative for analyzing MRIs from large cohorts for diagnostic and research purposes.

Keywords: Automated software; Degenerative spinal disorders; Diagnostic imaging; Disc degeneration; Inter-rater agreement; Machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aged, 80 and over
  • Constriction, Pathologic
  • Deep Learning*
  • Humans
  • Intervertebral Disc Degeneration* / diagnostic imaging
  • Intervertebral Disc Degeneration* / pathology
  • Lumbar Vertebrae / diagnostic imaging
  • Lumbar Vertebrae / pathology
  • Lumbosacral Region / pathology
  • Magnetic Resonance Imaging / methods
  • Middle Aged
  • Spondylolisthesis* / diagnostic imaging
  • Spondylolisthesis* / pathology