Machine-learning analysis of intrinsically disordered proteins identifies key factors that contribute to neurodegeneration-related aggregation

Front Aging Neurosci. 2022 Aug 3:14:938117. doi: 10.3389/fnagi.2022.938117. eCollection 2022.

Abstract

Protein structure is determined by the amino acid sequence and a variety of post-translational modifications, and provides the basis for physiological properties. Not all proteins in the proteome attain a stable conformation; roughly one third of human proteins are unstructured or contain intrinsically disordered regions exceeding 40% of their length. Proteins comprising or containing extensive unstructured regions are termed intrinsically disordered proteins (IDPs). IDPs are known to be overrepresented in protein aggregates of diverse neurodegenerative diseases. We evaluated the importance of disordered proteins in the nematode Caenorhabditis elegans, by RNAi-mediated knockdown of IDPs in disease-model strains that mimic aggregation associated with neurodegenerative pathologies. Not all disordered proteins are sequestered into aggregates, and most of the tested aggregate-protein IDPs contribute to important physiological functions such as stress resistance or reproduction. Despite decades of research, we still do not understand what properties of a disordered protein determine its entry into aggregates. We have employed machine-learning models to identify factors that predict whether a disordered protein is found in sarkosyl-insoluble aggregates isolated from neurodegenerative-disease brains (both AD and PD). Machine-learning predictions, coupled with principal component analysis (PCA), enabled us to identify the physiochemical properties that determine whether a disordered protein will be enriched in neuropathic aggregates.

Keywords: Alzheimer’s disease; Parkinson’s disease; drug screening and discovery; intrinsically disordered proteins (IDPs); misfolding and aggregation; neural network; proteostasis; support vector machine.