Machine-learning analysis of intrinsically disordered proteins identifies key factors that contribute to neurodegeneration-related aggregation

Akshatha Ganne; Meenakshisundaram Balasubramaniam; Srinivas Ayyadevara; Robert J Shmookler Reis

doi:10.3389/fnagi.2022.938117

Front Aging Neurosci. 2022 Aug 3;14:938117. doi: 10.3389/fnagi.2022.938117. eCollection 2022.

Authors

Akshatha Ganne¹, Meenakshisundaram Balasubramaniam², Srinivas Ayyadevara¹,²,³, Robert J Shmookler Reis¹,²,³

Affiliations

¹ Bioinformatics Program, University of Arkansas for Medical Sciences and University of Arkansas at Little Rock, Little Rock, AR, United States.
² Department of Geriatrics, University of Arkansas for Medical Sciences, Little Rock, AR, United States.
³ Central Arkansas Veterans Healthcare System, Little Rock, AR, United States.

Abstract

Protein structure is determined by the amino acid sequence and a variety of post-translational modifications, and provides the basis for physiological properties. Not all proteins in the proteome attain a stable conformation; roughly one third of human proteins are unstructured or contain intrinsically disordered regions exceeding 40% of their length. Proteins that consist of or contain extensive unstructured regions are termed intrinsically disordered proteins (IDPs). IDPs are known to be overrepresented in protein aggregates of diverse neurodegenerative diseases. We evaluated the importance of disordered proteins in the nematode Caenorhabditis elegans by RNAi-mediated knockdown of IDPs in disease-model strains that mimic aggregation associated with neurodegenerative pathologies. Not all disordered proteins are sequestered into aggregates, and most of the tested aggregate-protein IDPs contribute to important physiological functions such as stress resistance or reproduction. Despite decades of research, it remains unclear which properties of a disordered protein determine its entry into aggregates. We employed machine-learning models to identify factors that predict whether a disordered protein is found in sarkosyl-insoluble aggregates isolated from the brains of patients with neurodegenerative disease (both Alzheimer’s disease, AD, and Parkinson’s disease, PD). Machine-learning predictions, coupled with principal component analysis (PCA), enabled us to identify the physicochemical properties that determine whether a disordered protein will be enriched in neuropathic aggregates.

Keywords: Alzheimer’s disease; Parkinson’s disease; drug screening and discovery; intrinsically disordered proteins (IDPs); misfolding and aggregation; neural network; proteostasis; support vector machine.
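
The general shape of the analysis described in the abstract (a classifier that predicts aggregate membership from protein properties, combined with PCA) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names are hypothetical, the data are randomly generated rather than taken from the study, and the classifier is a plain linear soft-margin SVM trained by subgradient descent, which may differ from the models the authors actually used.

```python
import numpy as np

# Hypothetical feature names; the abstract does not list the features used.
FEATURES = ["hydrophobicity", "net_charge", "disorder_fraction", "log_length"]

def make_synthetic(n=200, seed=0):
    """Random stand-in data; label +1 means 'found in aggregates' (synthetic)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    # Illustration only: labels are made to depend on the first two features.
    y = np.where(X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=n) > 0, 1, -1)
    return X, y

def standardize(X):
    """Scale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(X, k=2):
    """PCA via SVD of centered data.

    Returns the scores, the loadings, and the explained-variance ratio
    of the first k components.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (len(X) - 1)
    return Xc @ Vt[:k].T, Vt[:k], var[:k] / var.sum()

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM: full-batch subgradient descent on hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(X)
        grad_b = -y[viol].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X, y = make_synthetic()
Xs = standardize(X)
scores, loadings, evr = pca(Xs, k=2)

w, b = train_linear_svm(Xs, y)
pred = np.where(Xs @ w + b >= 0, 1, -1)
accuracy = (pred == y).mean()

# Rank features by absolute SVM weight (largest first).
ranking = [FEATURES[i] for i in np.argsort(-np.abs(w))]
print("training accuracy:", round(float(accuracy), 3))
print("features ranked by |weight|:", ranking)
print("explained variance ratio (PC1, PC2):", np.round(evr, 3))
```

In a sketch like this, the absolute weights of the linear model and the PCA loadings give two complementary views of which input properties separate the two classes. On real data, accuracy should be measured on held-out proteins rather than on the training set, as is done here for brevity.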