Prediction of Individual Gas Yields of Supercritical Water Gasification of Lignocellulosic Biomass by Machine Learning Models

Molecules. 2024 May 16;29(10):2337. doi: 10.3390/molecules29102337.

Abstract

Supercritical water gasification (SCWG) of lignocellulosic biomass is a promising pathway for hydrogen production. However, SCWG is a complex thermochemical process that is challenging to model with conventional methodologies. Therefore, eight machine learning models (linear regression (LR), Gaussian process regression (GPR), artificial neural network (ANN), support vector machine (SVM), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), and categorical boosting regressor (CatBoost)), combined with particle swarm optimization (PSO) and genetic algorithm (GA) optimizers, were developed and evaluated for the prediction of H2, CO, CO2, and CH4 gas yields from SCWG of lignocellulosic biomass. A total of 12 input features, comprising SCWG process conditions (temperature, time, concentration, pressure) and biomass properties (C, H, N, S, VM, moisture, ash, real feed), were used to predict gas yields from 166 data points. Among the machine learning models, boosting ensemble tree models such as XGB and CatBoost showed the highest predictive power for gas yields. PSO-optimized XGB was the best-performing model for H2 yield, with a test R² of 0.84, and PSO-optimized CatBoost was the best for CH4, CO, and CO2 yields, with test R² values of 0.83, 0.94, and 0.92, respectively. The PSO optimizer improved the prediction ability of the unoptimized machine learning models more than the GA optimizer did for all gas yields. Feature analysis using Shapley additive explanation (SHAP) based on the best-performing models showed that temperature (21.93%), C (24.85%), ash (16.93%), and C (29.73%) were the most dominant features for the prediction of H2, CH4, CO, and CO2 gas yields, respectively.
Even though temperature was the most dominant single feature for H2 yield, the cumulative feature importance of the biomass characteristic variables (C, H, N, S, VM, moisture, ash, real feed) as a group was higher than that of the SCWG process condition variables (temperature, time, concentration, pressure) for the prediction of all gas yields. SHAP two-way analysis confirmed strong interactions among input features in the prediction of gas yields.
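
The distinction between a single dominant feature and a dominant feature group can be illustrated with a short sketch. It assumes that per-feature importance is taken as the mean absolute SHAP value and then normalized to percentages, which is a common convention but is not confirmed by the abstract. The numeric scores, the `real_feed` key spelling, and the function names are hypothetical placeholders chosen only for illustration; they are not results from the study.

```python
# Hypothetical mean |SHAP| scores per input feature (arbitrary units).
# These are illustrative placeholders, NOT values reported in the paper.
mean_abs_shap = {
    "temperature": 0.44, "time": 0.16, "concentration": 0.20, "pressure": 0.10,
    "C": 0.36, "H": 0.20, "N": 0.10, "S": 0.04,
    "VM": 0.14, "moisture": 0.06, "ash": 0.16, "real_feed": 0.04,
}

# Feature groups as listed in the abstract.
PROCESS = ("temperature", "time", "concentration", "pressure")
BIOMASS = ("C", "H", "N", "S", "VM", "moisture", "ash", "real_feed")


def relative_importance(scores):
    """Normalize raw importance scores so that they sum to 100%."""
    total = sum(scores.values())
    return {name: 100.0 * value / total for name, value in scores.items()}


def group_importance(percentages, group):
    """Sum the percentage importance of every feature in a group."""
    return sum(percentages[name] for name in group)


pct = relative_importance(mean_abs_shap)
top_feature = max(pct, key=pct.get)
process_pct = group_importance(pct, PROCESS)
biomass_pct = group_importance(pct, BIOMASS)

print(f"top single feature: {top_feature} ({pct[top_feature]:.2f}%)")
print(f"process conditions: {process_pct:.2f}%")
print(f"biomass properties: {biomass_pct:.2f}%")
```

With these placeholder scores, a process variable ranks first individually while the eight biomass variables together carry the larger share, which is the pattern the abstract describes.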

Keywords: artificial intelligence; biofuel; hydrogen; lignocellulosic biomass; machine learning; supercritical water gasification.

MeSH terms

  • Algorithms
  • Biomass*
  • Carbon Dioxide / analysis
  • Carbon Dioxide / chemistry
  • Gases / analysis
  • Gases / chemistry
  • Hydrogen* / analysis
  • Hydrogen* / chemistry
  • Lignin* / chemistry
  • Machine Learning*
  • Methane / analysis
  • Methane / chemistry
  • Neural Networks, Computer
  • Support Vector Machine
  • Water* / chemistry

Substances

  • Lignin
  • lignocellulose
  • Water
  • Hydrogen
  • Gases
  • Carbon Dioxide
  • Methane