Predicting environmental risk factors in relation to health outcomes among school children from Romania using random forest model - An analysis of data from the SINPHONIE project

Sci Total Environ. 2021 Aug 25:784:147145. doi: 10.1016/j.scitotenv.2021.147145. Epub 2021 Apr 16.

Abstract

Background: Few studies have simultaneously assessed the health impact of school and home environmental factors on children, since handling multiple highly correlated environmental variables is challenging. In this study, we examined indoor home and school environments in relation to health outcomes using machine learning methods and logistic regression.

Methods: We used data collected in Romania by the SINPHONIE (Schools Indoor Pollution and Health: Observatory Network in Europe) project, a multicenter European research study that collected comprehensive information on school and home environments, health symptoms in children, smoking, and school policies. The health outcomes were categorized as any health symptoms, asthma, allergy, and flu-like symptoms. Both logistic regression and random forest (RF) methods were used to predict the four categories of health outcomes, and the predictive performance of the two methods was compared.

Results: The RF analysis showed that common risk factors across the investigated categories of health outcomes included environmental tobacco smoke (ETS), dampness in the indoor school environment, male gender, air freshener use, residence located in proximity to traffic (<200 m), stressful schoolwork, and classroom noise (contributions ranged from 7.91% to 23.12%). Specificity, accuracy and area under the curve (AUC) values for most outcomes were higher with RF than with logistic regression, while sensitivity was similar for both methods.
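
For readers unfamiliar with the performance measures named above, the following is a minimal sketch of how sensitivity, specificity, accuracy and AUC can be computed for a binary outcome. It is not the authors' code: the toy labels, scores, the 0.5 decision threshold, and all function names are assumptions made for illustration, and the sketch assumes both outcome classes are present in the data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from predicted class labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as one half. This pairwise form is quadratic in sample size,
    which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = child reports the symptom, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities from some classifier.
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
```

Computing the same four measures for each model on the same held-out children is one way two classifiers, such as RF and logistic regression, could be compared side by side.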

Conclusion: This study suggests that ETS, dampness in the indoor school environment, use of air fresheners, living in proximity to traffic (<200 m) and noise are common environmental risk factors for the investigated health outcomes. RF yielded better predictive values, sensitivity and accuracy than logistic regression.

Keywords: Child; Exposure; Health outcomes; Logistic regression; Machine learning.

Publication types

  • Multicenter Study

MeSH terms

  • Air Pollution, Indoor* / analysis
  • Child
  • Europe
  • Humans
  • Male
  • Outcome Assessment, Health Care
  • Risk Factors
  • Romania / epidemiology
  • Schools