Into the Bowels of Depression: Unravelling Medical Symptoms Associated with Depression by Applying Machine-Learning Techniques to a Community Based Population Sample

PLoS One. 2016 Dec 9;11(12):e0167055. doi: 10.1371/journal.pone.0167055. eCollection 2016.

Abstract

Background: Depression is commonly comorbid with many somatic diseases and symptoms. Identifying clusters of individuals with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques, using self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression in a large, cross-sectional, community-based population epidemiological study.

Methods: A multi-stage methodology combining ML and traditional statistical techniques was applied to the community-based population sample of the National Health and Nutrition Examination Survey (2009-2010) (N = 3,922). A Self-Organising Map (SOM) ML algorithm, combined with hierarchical clustering, was used to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9≥10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary.
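The first stage of the method, mapping participants onto a self-organising map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_som`, `best_matching_unit`), the 2 x 2 grid, the learning-rate and neighbourhood schedules, and the four-symptom toy data are all assumptions made for illustration. The hierarchical clustering of map nodes and the later regression stages are not shown.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}.

    Hypothetical helper: linear decay of learning rate and neighbourhood
    width is one common choice, not necessarily the one used in the study.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull node towards x
            step += 1
    return weights


# Invented toy data: rows are participants, columns are yes/no symptoms.
group_a = [[1, 1, 0, 0] for _ in range(10)]
group_b = [[0, 0, 1, 1] for _ in range(10)]
som = train_som(group_a + group_b, rows=2, cols=2)
node_a = best_matching_unit(som, [1, 1, 0, 0])
node_b = best_matching_unit(som, [0, 0, 1, 1])
print(node_a, node_b)
```

In this toy setting, participants with different symptom profiles end up on different map nodes. In a study of this kind, the node weight vectors would then be grouped by hierarchical clustering to form the participant clusters.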

Results: Five clusters of participants, based on medical symptoms, were identified as having significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters.
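To make the odds ratios and confidence intervals above concrete, the following sketch computes an unadjusted odds ratio with a Wald 95% interval from a 2 x 2 table. It is an illustration only: the study's estimates come from logistic regression adjusted for sociodemographic confounders, and the function name `odds_ratio_ci` and the counts used here are invented.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: depressed / not depressed participants in the cluster of interest
    c, d: depressed / not depressed participants in the reference cluster
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts, not taken from the study.
estimate, lower, upper = odds_ratio_ci(a=30, b=70, c=10, d=90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Intervals of this form are symmetric on the log scale, not the raw scale, which is why a reported interval such as 1.67 to 24.02 sits unevenly around its point estimate of 6.33. An interval that excludes 1 corresponds to a significantly increased rate at the 5% level.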

Conclusion: This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.

MeSH terms

  • Adult
  • Cluster Analysis
  • Community Networks
  • Comorbidity
  • Cross-Sectional Studies
  • Depression / diagnosis*
  • Depression / epidemiology
  • Depression / psychology
  • Depressive Disorder / classification
  • Depressive Disorder / diagnosis*
  • Depressive Disorder / epidemiology
  • Female
  • Gastrointestinal Diseases / diagnosis
  • Gastrointestinal Diseases / epidemiology
  • Humans
  • Logistic Models*
  • Machine Learning*
  • Male
  • Middle Aged
  • Nutrition Surveys / statistics & numerical data
  • Population Surveillance / methods
  • Self Report

Grants and funding

MB is supported by an NHMRC Senior Principal Research Fellowship 1059660. LJW is supported by an NHMRC Career Development Fellowship 1064272. FNJ is supported by an NHMRC Career Development Fellowship 1108125. The author(s) received no specific funding for this work.