High-throughput phenotyping using VIS/NIR spectroscopy in the classification of soybean genotypes for grain yield and industrial traits

Spectrochim Acta A Mol Biomol Spectrosc. 2024 Apr 5:310:123963. doi: 10.1016/j.saa.2024.123963. Epub 2024 Feb 1.

Abstract

Visible and near-infrared (VIS/NIR) sensors used in high-throughput phenotyping provide insight into the relationship between the spectral characteristics of the leaf and grain composition, helping soybean breeders direct their programs towards improving the grain traits of interest. Our research hypothesis is that the leaf reflectance of soybean genotypes can be directly related to industrial grain traits such as protein and fiber contents. Thus, the objectives of the study were: (i) to classify soybean genotypes according to grain yield and industrial traits; (ii) to identify the algorithm(s) with the highest accuracy for classifying genotypes using leaf reflectance as model input; and (iii) to identify the input data that best improve algorithm performance. A field experiment was carried out in a randomized block design with three replications and 32 soybean genotypes. At 60 days after emergence, spectral analysis was carried out on three leaf samples from each plot. A hyperspectral sensor was used to capture reflectance at wavelengths from 450 to 824 nm. Representative spectral bands were selected and summarized as band means. After harvest, grain yield was assessed and laboratory analyses of industrial traits were carried out. Spectral, industrial trait, and yield data were subjected to statistical analysis. Data were analyzed with the following machine learning algorithms: J48 (J48) and REPTree (DT) decision trees, Random Forest (RF), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and conventional Logistic Regression (LR). The clusters formed were used as the model output, and two groups of input data were tested: the noise-free spectral variables (WL) obtained by the sensor (450-828 nm) and the spectral means of the selected bands (SB) (450.0-720.6 nm). Soybean genotypes were grouped according to their grain yield and industrial traits, and the SVM and J48 algorithms performed best at classifying them. Using the spectral bands selected in the study improved the classification accuracy of the algorithms.
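
As an illustration of how the spectral means of selected bands (the SB input) might be derived from a leaf reflectance spectrum, the following is a minimal Python sketch. The abstract does not give the band boundaries or the selection procedure, so the band names and edges in `BANDS`, the function name `band_means`, and the sample spectrum are hypothetical; only the overall 450.0-720.6 nm span comes from the text.

```python
from statistics import mean

# Hypothetical band edges in nm. The abstract states only that the selected
# bands span 450.0-720.6 nm; the individual boundaries below are assumptions.
BANDS = {
    "blue": (450.0, 500.0),
    "green": (500.0, 600.0),
    "red": (600.0, 680.0),
    "red_edge": (680.0, 720.6),
}


def band_means(wavelengths, reflectance, bands=BANDS):
    """Return the mean reflectance within each band.

    A wavelength w belongs to a band (lo, hi) when lo <= w < hi.
    """
    if len(wavelengths) != len(reflectance):
        raise ValueError("wavelengths and reflectance must have equal length")

    means = {}
    for name, (lo, hi) in bands.items():
        values = [r for w, r in zip(wavelengths, reflectance) if lo <= w < hi]
        if not values:
            raise ValueError(f"no readings fall inside band {name!r}")
        means[name] = mean(values)
    return means


# Made-up spectrum for one leaf sample (not data from the study).
sample_wavelengths = [450.0, 475.0, 550.0, 650.0, 700.0]
sample_reflectance = [0.04, 0.06, 0.12, 0.05, 0.30]

sample_means = band_means(sample_wavelengths, sample_reflectance)
```

Each leaf sample is thereby reduced from a full spectrum to one value per band, which could then serve as the input vector for any of the classifiers listed in the abstract.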

Keywords: Crude protein; Decision tree; Hyperspectral sensor; Machine learning; Support vector machine.

MeSH terms

  • Edible Grain / genetics
  • Genotype
  • Glycine max* / genetics
  • Phenotype
  • Spectroscopy, Near-Infrared*